**Arabicthon**

**Ibn Sidah Team**

Prof. Yaser Hifny
yhifny@yahoo.com

Dr. Waleed Nazeeh
w.nazeeh@gmail.com

Mr. Amr ElGendy
amr.algendy@gmail.com



# 1) Mount Google Drive and define paths

In [1]:
# Print CPU and memory details
from psutil import virtual_memory
ram_gb = virtual_memory().total / 1e9
print('Your runtime has {:.1f} gigabytes of available RAM\n'.format(ram_gb))

if ram_gb < 20:
  print('Not using a high-RAM runtime')
else:
  print('You are using a high-RAM runtime!')

!lscpu |grep 'Model name'

print('Normal CPU')
print('Processor model')
!cat /proc/cpuinfo  | grep 'name'| uniq
print('Number of processors')
!cat /proc/cpuinfo  | grep process| wc -l
print('Memory details')
!free -h

Your runtime has 13.6 gigabytes of available RAM

Not using a high-RAM runtime
Model name:          Intel(R) Xeon(R) CPU @ 2.20GHz
Normal CPU
Processor model
model name	: Intel(R) Xeon(R) CPU @ 2.20GHz
Number of processors
2
Memory details
              total        used        free      shared  buff/cache   available
Mem:            12G        849M         10G        1.2M        1.8G         11G
Swap:            0B          0B          0B


In [2]:
import os
from google.colab import drive, files
# Mount Google Drive
drive.mount('/content/drive')
# Project path
PROJ_PATH = '/content/drive/My Drive/Sense_Gram_Project'
# Set current directory to the project directory
os.chdir(PROJ_PATH)


Mounted at /content/drive
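
`os.chdir` raises `FileNotFoundError` when the target folder is missing, for example if the Drive mount failed or the project folder was renamed. The following is a minimal sketch of a guard around that step; `enter_project_dir` is a hypothetical helper name, not part of the notebook, and the folder name in the demo call is made up.

```python
import os

# Hypothetical helper (not in the original notebook): only change directory
# when the project folder actually exists.
def enter_project_dir(path):
    """Change into `path` if it is a directory; return True on success."""
    if not os.path.isdir(path):
        print("Project folder not found:", path)
        return False
    os.chdir(path)
    return True

# Demo with a folder that is assumed not to exist
ok = enter_project_dir(os.path.join(os.getcwd(), "no_such_folder_for_demo"))
```

In the notebook this would be called as `enter_project_dir(PROJ_PATH)` in place of the bare `os.chdir(PROJ_PATH)`.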


# 2) Prepare SenseGram requirements and load our model. *Takes about 2 minutes.*

In [3]:
!pip install -r requirements.txt
!pip install faiss-cpu
# gensim is listed in requirements.txt; if it does not install correctly, install it explicitly with pip:
#!pip install gensim==3.8.1
!python -m spacy download en_core_web_sm
# Install gradio for user interface
!pip install gradio


Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting clint
  Downloading clint-0.5.1.tar.gz (29 kB)
Collecting bidict
  Downloading bidict-0.22.0-py3-none-any.whl (36 kB)
Collecting stop_words
  Downloading stop-words-2018.7.23.tar.gz (31 kB)
Collecting chinese_whispers
  Downloading chinese_whispers-0.8.0-py3-none-any.whl (7.7 kB)
Collecting args
  Downloading args-0.1.0.tar.gz (3.0 kB)
Building wheels for collected packages: clint, stop-words, args
  Building wheel for clint (setup.py) ... done
  Created wheel for clint: filename=clint-0.5.1-py3-none-any.whl size=34473 sha256=67763775e5689e44ec5f7d7febba37fc36a55ac7447ad45fe044d377e6bd7f46
  Stored in directory: /root/.cache/pip/wheels/29/97/84/72d17bd67a52abe83c647807c3d77dc4d7c1d7709d7077a5f3
  Building wheel for stop-words (setup.py) ... done
  Created wheel for stop-words: filename=stop_words-2018.7.23-py3-none-any.whl size=32911 sha256=0cac6c31814d0

In [4]:
import sensegram
from wsd import WSD
from gensim.models import KeyedVectors

# Model files
sense_vectors_fpath = "./best_sense_gram_model/best_model.sense_vectors"
word_vectors_fpath = "./best_sense_gram_model/best_model.word_vectors"

# Model parameters
max_context_words  = 3
context_window_size = 5
ignore_case = True
lang = "ar" # to filter out stopwords

# Model loading ... takes some time
sv = sensegram.SenseGram.load_word2vec_format(sense_vectors_fpath, binary=False)
wv = KeyedVectors.load_word2vec_format(word_vectors_fpath, binary=False, unicode_errors="ignore")

# Takes a word and its context and returns the model's results as text.
def wsd_method(word, context):
    # Candidate senses of the word with their probabilities
    senses = sv.get_senses(word, ignore_case=ignore_case)
    output = "Probabilities of the senses:\n{}\n\n".format(senses)
    # List the nearest neighbours of each sense
    for sense_id, prob in senses:
        output += sense_id
        output += "\n" + "=" * 20 + "\n"
        for rsense_id, sim in sv.wv.most_similar(sense_id):
            output += "{} {:f}\n".format(rsense_id, sim)
        output += "\n"
    # Disambiguate the word in its context
    wsd_model = WSD(sv, wv, window=context_window_size, lang=lang,
                    max_context_words=max_context_words, ignore_case=ignore_case)
    output += str(wsd_model.disambiguate(context, word))
    return output
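
The last line of the text returned by `wsd_method` is the raw result of `disambiguate`, which in the test output of section 4 has the form `('ابن#2', [0.336..., 0.364...])`: the chosen sense followed by a list of scores. The sketch below pairs each candidate sense with its score so the ranking is easier to read. `rank_senses` is a hypothetical helper, and it assumes the scores are listed in the same order as the senses returned by `get_senses`, which the notebook does not state explicitly.

```python
# Hypothetical helper (not part of SenseGram): rank candidate senses by score.
# Assumption: scores[i] belongs to senses[i].
def rank_senses(senses, scores):
    sense_ids = [sense_id for sense_id, _prob in senses]
    if len(sense_ids) != len(scores):
        raise ValueError("expected one score per sense")
    # Highest score first
    return sorted(zip(sense_ids, scores), key=lambda pair: pair[1], reverse=True)

# Values copied from the first test result in section 4
senses = [("ابن#1", 1.0), ("ابن#2", 1.0)]
scores = [0.33655633916387667, 0.364251329113882]
ranked = rank_senses(senses, scores)
print(ranked[0][0])
```

Under that ordering assumption, the top-ranked sense agrees with the sense reported by the model for this example.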

# 3) Live Demo

In [5]:
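import gradio as gr
# Launch the live demo
# (the Arabic UI strings are kept as-is; English glosses are given in the comments)
demo = gr.Interface(
    fn=wsd_method,
    # Placeholders: "the word" and "the context"
    inputs=[gr.Textbox(lines=1, placeholder="الكلمة"),gr.Textbox(lines=2, placeholder="السياق")],
    outputs="text",
    # Title: "Semantic disambiguation"
    title="فـك الالتباس الدلالي",
    # Description: "Please enter the word, then the context, then press the Submit button; to view the full output, use the scroll-down button."
    description="فضلًا أدخل الكلمة ثم السياق ثم اضغط على زر إرسال، ولاستعراض المخرجات كاملة يرجى استخدام زر التمرير لأسفل.",
)
demo.launch()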


Colab notebook detected. To show errors in colab notebook, set `debug=True` in `launch()`
Running on public URL: https://14283.gradio.app

This share link expires in 72 hours. For free permanent hosting, check out Spaces (https://huggingface.co/spaces)


(<gradio.routes.App at 0x7f6ec0bf08d0>,
 'http://127.0.0.1:7860/',
 'https://14283.gradio.app')

# 4) Test SenseGram on a set of words and their contexts
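
The evaluation cell in this section reads `model_test.txt` and expects one test case per line, with the word and its context separated by a tab; lines without a tab are skipped. The sketch below isolates that parsing step so it can be checked without loading the model. `parse_test_lines` is a hypothetical helper name, and the sample text is shortened from the contexts shown in the output.

```python
# Hypothetical helper (not in the original notebook): parse tab-separated test cases.
def parse_test_lines(text):
    """Return (word, context) pairs, skipping lines without a tab separator."""
    pairs = []
    for line in text.split("\n"):
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # blank or malformed line
        pairs.append((fields[0], fields[1]))
    return pairs

# Two valid lines, one blank line, and one line without a tab
sample = "ابن\tلأنه يقول بحدسه عن ابن الستين\n\nno-tab-line\nأبناء\tكل أبناء المجتمع\n"
pairs = parse_test_lines(sample)
print(len(pairs))
```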

In [6]:
# Evaluate the SenseGram model using a prepared file that holds each test word and its context on one tab-separated line
with open('model_test.txt', 'r', encoding='utf-8') as f:
    input_file = f.read().split('\n')

for line in input_file:
    splits = line.split('\t')
    # Skip lines that lack a word/context pair
    if len(splits) < 2:
        continue

    word = splits[0]
    context = splits[1]
    print('Word: ', word)
    print('Context: ', context)
    print(wsd_method(word, context))
    print("\n"+"@"*20+"\n")



Word:  ابن
Context:  لأنه يقول بحدسه عن ابن الستين إنه قد استوفى عمرين،
Probabilities of the senses:
[('ابن#1', 1.0), ('ابن#2', 1.0)]

ابن#1
لإبن#1 0.989877
فابن#1 0.983247
لابن#1 0.980642
ولابن#1 0.979941
قدادرة#1 0.976854
لأبن#1 0.976541
وابن#1 0.975853
لأبى#1 0.960908
إبن#1 0.959252
بأبي#1 0.946664

ابن#2
أبن#1 0.979652
أخو#1 0.976889
وأخاه#1 0.970255
وأبوه#1 0.969730
وأخوه#1 0.969557
وولده#2 0.963071
وسيده#2 0.962541
وأستاذه#3 0.960610
وإبن#1 0.959008
واخوه#1 0.958246

('ابن#2', [0.33655633916387667, 0.364251329113882])

@@@@@@@@@@@@@@@@@@@@

Word:  أبناء
Context:  ولا بد للمسترسلين في سباتهم العميق أن يفيقوا من غفوتهم قبل أن تقضي الحوادث المرورية على كل أبناء المجتمع.
Probabilities of the senses:
[('أبناء#1', 1.0)]

أبناء#1
اهل#1 0.927013
ابناء#1 0.926594
أهل#1 0.889120
بابناء#2 0.861529
المنصد#7 0.860206
ابناء#2 0.858489
واهل#1 0.857403
وابناء#1 0.844709
اهالي#2 0.838220
مواطنى#1 0.837131

('أبناء#1', [0.0192911149711917])

@@@@@@@@@@@@@@@@@@@@

Word:  أرسل
Context:  وفي الثانية 