## Word Sense Disambiguation Approach using LESK:  
### **Step 1: Get Input**   
  From the JSON input:  
    1. Get the input: the word, its sentence and the paragraph.  
    2. Get all dictionary meanings of the word from the Merriam-Webster Collegiate Dictionary API: https://dictionaryapi.com/products/api-collegiate-dictionary
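
The later cells read the input as `obj['data']`, a list of records with `word`, `paragraph` and `dictionaryData` keys, where each dictionary entry carries `id`, `fl`, `meaning` and `isTeacher`. A minimal sketch of one such record follows. The field names come from the code in this notebook; the ids, the second meaning and the `isTeacher` values are made up for illustration.

```python
import json

# Hypothetical input record: field names follow the notebook's code,
# but every value here is invented for illustration only.
sample_input = {
    "data": [
        {
            "word": "firm",
            "paragraph": "My job at the firm ended after four weeks.",
            "dictionaryData": [
                {"id": 1, "fl": "noun",
                 "meaning": "the name or title under which a company transacts business",
                 "isTeacher": 1},
                {"id": 2, "fl": "adjective",
                 "meaning": "securely or solidly fixed in place",
                 "isTeacher": 0},
            ],
        }
    ]
}

# Round-trip through JSON to mimic reading the input file
obj = json.loads(json.dumps(sample_input))
record = obj["data"][0]
print(record["word"], len(record["dictionaryData"]))
```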
    
### **Step 2: Perform Preprocessing**
Preprocess both **(a) the context (sentence/paragraph)** and **(b) the dictionary data (i.e. POS (`fl`) and meanings).**  
   1. Preprocessing steps on both the context and the dictionary data include tokenization, lower-casing, stop-word filtering and lemmatization (stemming can be explored).  
   2. Generate a bag (list) of signatures (unique words) from each word meaning, together with its POS.  
   For example, one of the meanings of the word 'firm' is:  
           POS & Meaning: 'fl': 'noun', 'meaning': 'the name or title under which a company transacts business'  
           Generate Signature: ['noun', 'name', 'title', 'company', 'transacts', 'business']  
   3. Generate a bag of signatures for the context, with the POS of the word tagged based on the context.   
   For example, for the word 'firm', the sentence is:  
           Sentence: My job at the firm ended after four weeks.  
           Generate Signature: ['noun', 'job', 'firm', 'ended', 'four', 'week']  
   4. Remove the lookup word from both bags (in progress).    
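
The signature steps above can be sketched without NLTK. This is a simplified illustration, not the notebook's implementation: it assumes a tiny hand-written stop-word list, skips lemmatization and stemming, and uses the hypothetical helper name `simple_signature`.

```python
import re

# Tiny illustrative stop-word list (an assumption; the notebook uses NLTK's list)
STOP_WORDS = {"the", "or", "under", "which", "a", "my", "at", "after"}

def simple_signature(pos_label, text, lookup_word=None):
    """Return [POS label] + lower-cased, stop-word-filtered tokens of text."""
    # Tokenize on word characters, which also drops punctuation
    tokens = [t.lower() for t in re.findall(r"\w+", text)]
    kept = [t for t in tokens if t not in STOP_WORDS]
    # Optionally drop the lookup word itself (step 4 above)
    if lookup_word is not None:
        kept = [t for t in kept if t != lookup_word.lower()]
    return [pos_label] + kept

meaning_sig = simple_signature(
    "noun", "the name or title under which a company transacts business")
context_sig = simple_signature(
    "noun", "My job at the firm ended after four weeks.", lookup_word="firm")
print(meaning_sig)
print(context_sig)
```

Because lemmatization is skipped here, the context signature keeps 'weeks' rather than 'week'.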

### **Step 3: Apply Lesk Algorithm**    

   1. Apply the Cosine Lesk algorithm, which scores each meaning by cosine similarity.  
   2. Convert the words to vectors of word counts.  
   3. Compute the cosine similarity.  
  **Cosine similarity** is a metric that measures how similar two documents (in our case, signatures) are, irrespective of their size.  
   Mathematically, it is the cosine of the angle between two vectors projected in a multi-dimensional space.  
   4. Return a dataframe of the meanings ranked by score, highest first. 
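
As a worked example of the scoring step, the two 'firm' signatures from Step 2 share only the POS label 'noun'. A minimal sketch, assuming word-count vectors built with `collections.Counter` (the helper name `cosine_similarity` is illustrative):

```python
import math
from collections import Counter

def cosine_similarity(tokens_a, tokens_b):
    """Cosine of the angle between the word-count vectors of two token lists."""
    vec_a, vec_b = Counter(tokens_a), Counter(tokens_b)
    shared = set(vec_a) & set(vec_b)
    dot = sum(vec_a[t] * vec_b[t] for t in shared)
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    norm = norm_a * norm_b
    # An empty signature has zero length; treat its similarity as 0
    return dot / norm if norm else 0.0

context = ['noun', 'job', 'firm', 'ended', 'four', 'week']
meaning = ['noun', 'name', 'title', 'company', 'transacts', 'business']
score = cosine_similarity(context, meaning)
print(round(score, 4))
```

With one shared word and six distinct words in each signature, the score is 1 / (sqrt(6) * sqrt(6)) = 1/6.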

### **Step 4: Generate and export the output in JSON file format**

#### Possible addition: include synonyms and example sentences from the dictionary data in the input file for better scoring results

In [1]:
# Import relevant libraries
import json

# NLTK libraries used for preprocessing
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.corpus import wordnet as wn
from nltk.stem import WordNetLemmatizer 
from nltk.stem import PorterStemmer

import math
from collections import Counter

import numpy as np
import pandas as pd 

In [2]:
# Takes word meanings and returns their signatures with POS.

#dict_data = obj['dictionaryData']
def meaning_signatures(dict_data, lu_word):
    signatures = []  # Stores meaning signatures with POS
    tokenizer = nltk.RegexpTokenizer(r"\w+")
    # Build the stop-word set, lemmatizer and stemmer once, not once per meaning
    stop_words = set(stopwords.words('english'))
    lmtzr = WordNetLemmatizer()
    ps = PorterStemmer()

    for entry in dict_data:

        # 1. Tokenize and remove punctuation
        dict_meaning = tokenizer.tokenize(entry.get("meaning"))

        # 2. Convert all to lower case
        word_set = [w.lower() for w in dict_meaning]

        # 3. Filter out stop words
        signature = [w for w in word_set if w not in stop_words]

        # 4. Lemmatize the words and prepend the POS label
        lemmatized = [entry.get("fl")] + [lmtzr.lemmatize(w) for w in signature]

        # 5. Stem the words
        stemmed = [ps.stem(w) for w in lemmatized]

        # 6. Remove the lookup word if present in the meaning.
        # Note: this is a substring test against the raw (unstemmed) lookup
        # word, so inflected forms may survive (this step is in progress).
        stemmed = [x for x in stemmed if lu_word not in x]

        # Add the stemmed meaning to the list
        signatures.append(stemmed)

    return signatures

#meaning_signatures(dict_data, word)

In [3]:
# Takes a paragraph and returns the signature.

#paragraph = str(obj['paragraph'])
def para_signature(paragraph, word):
    # 1. Tokenize and remove punctuation
    tokenizer = nltk.RegexpTokenizer(r"\w+")
    new_words = tokenizer.tokenize(paragraph)

    # For POS tagging: return the Penn Treebank tag of the lookup word
    # (the last exact match wins), or "None" if the word is not found
    def find_pos(tokens):
        pos1 = "None"
        for token, tag in nltk.pos_tag(tokens):
            if token == word:
                pos1 = tag
        return pos1

    def add_pos(pos_value):
        # Map a Penn Treebank tag to a dictionary POS label (in progress)
        pos_tag = {'verb': ['VB', 'VBD', 'VBG', 'VBN', 'VBP', 'VBZ'],
                   'noun': ['NN', 'NNS', 'NNP'],
                   'adjective': ['JJ', 'JJR', 'JJS'],
                   'adverb': ['RB', 'RBR', 'RBS'],
                   'None': ['None', '']}  # "masculine noun",

        for label, tags in pos_tag.items():
            if pos_value in tags:
                return label
        # Fall back to 'None' for tags not covered above
        return 'None'

    # 2. POS tagging
    pos = find_pos(new_words)
    pos_to_list = [add_pos(pos)]

    # 3. Convert all to lower case
    words = [w.lower() for w in new_words]

    # 4. Filter out stop words
    stop_words = set(stopwords.words('english'))
    words_rem = [w for w in words if w not in stop_words]

    # 5. Lemmatize the words and prepend the POS label
    lmtzr = WordNetLemmatizer()
    lemmatized = pos_to_list + [lmtzr.lemmatize(w) for w in words_rem]

    # 6. Stem the words
    ps = PorterStemmer()
    sent_sig = [ps.stem(w) for w in lemmatized]

    return sent_sig

# para_signature(paragraph,word)


Cosine similarity measures how similar two vectors are by computing the cosine of the angle between them.  
https://medium.com/@adriensieg/text-similarities-da019229c894

In [4]:
def get_cosine(vec1, vec2):
    # Dot product over the words the two count vectors share
    intersection = set(vec1.keys()) & set(vec2.keys())
    numerator = sum(vec1[x] * vec2[x] for x in intersection)

    # Product of the two vector lengths (Euclidean norms)
    sum1 = sum(vec1[x]**2 for x in vec1.keys())
    sum2 = sum(vec2[x]**2 for x in vec2.keys())
    denominator = math.sqrt(sum1) * math.sqrt(sum2)

    # An empty signature has zero length; define its similarity as 0
    if not denominator:
        return 0.0
    return float(numerator) / denominator

# Compare the sentence signature with each meaning signature and compute scores

def df_ranked_scores(signatures, sent_sig, dict_data):
    # 1. Find scores (the sentence vector is the same for every meaning)
    sent_vector = Counter(sent_sig)
    scores = [get_cosine(sent_vector, Counter(sig)) for sig in signatures]

    # 2. Extract meaning id, meaning and teacher flag from the input JSON
    meaning_ids = [entry.get("id") for entry in dict_data]
    meanings = [entry.get("meaning") for entry in dict_data]
    isteacher = [entry.get("isTeacher") for entry in dict_data]

    # 3. Generate a dataframe of id, meaning, score and isTeacher, highest score first
    zipped = list(zip(meaning_ids, meanings, scores, isteacher))
    features = sorted(zipped, key=lambda x: x[2], reverse=True)
    ranked_meanings = pd.DataFrame(features, columns=['id', 'meaning', 'score', 'isTeacher'])

    return ranked_meanings
#ranked_meanings
#df_ranked_scores(meaning_signatures(dict_data, word), para_signature(paragraph, word), dict_data)

# Perform Lesk
def cos_lesk(word, paragraph, dict_data):
    p_sig = para_signature(paragraph,word)
    m_sig = meaning_signatures(dict_data,word)
    ranking = df_ranked_scores(m_sig,p_sig,dict_data)
    return ranking

## Perform Lesk on Data

In [5]:
# Each helper iterates over its own argument (dicdata), not the global dict_data
def extract_ids(dicdata):
    return [entry['id'] for entry in dicdata]

def extract_meanings(dicdata):
    return [entry['meaning'] for entry in dicdata]

def extract_pos_meanings(dicdata):
    # POS + meaning
    return [entry['fl'] + ' ' + entry['meaning'] for entry in dicdata]

def extract_isteacher(dicdata):
    return [entry['isTeacher'] for entry in dicdata]

### Data with 600 words
1. A delimiter was missing before new words.
2. The following words caused issues and were removed:
    1. 138/139 immobile
    2. 290/291 abstruse
    3. 585/586 assuage

In [6]:
# Load data and store
# Consideration: the top-ranked meaning counts as correct only if isTeacher = 1 and its score is non-zero
# Parse and extract word, paragraph, POS and meanings
with open('data_600.json', 'r', encoding='utf8') as jsonfile:
    obj = json.load(jsonfile)

data = obj['data']
print("Total words: ",len(data)) # Total # of words
count=0
# Store accuracy and MRR in blank lists
acc_score=[] 
mrr_score = []

# Parse data and find best meaning
for each_word_data in data:
    word = each_word_data['word']
    paragraph = each_word_data['paragraph']
    dict_data = each_word_data['dictionaryData']
    #for i in dict_data:
        #ids = extract_ids(dict_data)
        #meanings = extract_meanings(dict_data)
        #pos_meanings = extract_pos_meanings(dict_data)
        #isteacher = extract_isteacher(dict_data)
        
    print("\n\nLookup word: ",word)    
    ranked_meanings= cos_lesk(word, paragraph, dict_data)
    print(ranked_meanings)
    
    ai_response = ranked_meanings.iloc[0]
    count +=1
    print(count)
    
    # Find accuracy
    # Consideration: a top score of 0 counts as a miss, even when isTeacher = 1
    top = ranked_meanings.iloc[0]
    if top['isTeacher'] == 1 and top['score'] != 0:
        acc_score.append(1)
    else:
        acc_score.append(0)
        
        
    # Find MRR score
    istea_index = ranked_meanings[ranked_meanings['isTeacher']==1].index[0]+1
    pred_rank =1/istea_index
    mrr_score.append(pred_rank)

# Return Accuracy and MRR    
#print(acc_score)
avg = sum(acc_score)/len(acc_score)
print("The accuracy score is ", round(avg,4))    


mrr = np.sum(mrr_score)/ len(mrr_score)
print("The MRR is ", round(mrr,4)) 

Total words:  600


Lookup word:  circumvent 
     id                                            meaning  score  isTeacher
0  3230  to avoid being stopped by (something, such as ...    0.0          1
1


Lookup word:  addict
     id                                            meaning     score  \
0  3401  a person who is not able to stop taking drugs ...  0.071919   
1  5884  one exhibiting a compulsive, chronic, physiolo...  0.046065   
2  5885                    to cause addiction in (someone)  0.000000   

   isTeacher  
0        NaN  
1        1.0  
2        NaN  
2


Lookup word:  frustrated
     id                                            meaning     score  \
0  5886  feeling, showing, or characterized by frustrat...  0.097590   
1  5887                   to balk or defeat in an endeavor  0.054554   

   isTeacher  
0        1.0  
1        NaN  
3


Lookup word:  manslaughter
     id                                            meaning     score  \
0  2554  the crime of killing a 

     id                                            meaning     score  \
0  2503  to push against (someone) while moving forward...  0.141264   
1  5966               to come in contact or into collision  0.078087   

   isTeacher  
0        1.0  
1        NaN  
34


Lookup word:  incongruously
     id                                            meaning  score  isTeacher
0  2738  strange because of not agreeing with what is u...    0.0        1.0
1  5967                         lacking congruity: such as    0.0        NaN
35


Lookup word:  expelled
     id               meaning     score  isTeacher
0  5968  to force out : eject  0.091287          1
36


Lookup word:  bullied
     id                                            meaning     score  \
0  5969  to treat (someone) in a cruel, insulting, thre...  0.083333   

   isTeacher  
0          1  
37


Lookup word:  poked
     id                                            meaning     score  \
0  5972                                      

     id                                            meaning     score  \
0  2793  a feeling of dizziness caused especially by be...  0.091670   
1  6020  a sensation of motion in which the individual ...  0.073127   
2  6021  a condition marked by short, recurrent episode...  0.050572   

   isTeacher  
0        1.0  
1        NaN  
2        NaN  
66


Lookup word:  emancipation
     id                                            meaning     score  \
0  5548                 the act or process of emancipating  0.120386   
1  5549         the act or process of emancipating oneself  0.107676   
2  2794  to free (someone) from someone else's control ...  0.000000   
3  2795                                 yourself or itself  0.000000   

   isTeacher  
0        NaN  
1        NaN  
2        1.0  
3        NaN  
67


Lookup word:  massacres
     id                                            meaning     score  \
0  2693                 the violent killing of many people  0.107676   
1  2694  t

     id                                            meaning     score  \
0  6056  marked by greatness especially in size or degr...  0.044992   

   isTeacher  
0          1  
103


Lookup word:  escorting
     id                                            meaning     score  \
0  6057                          to accompany as an escort  0.365148   
1  3422  to go with (someone or something) to give prot...  0.119523   

   isTeacher  
0        NaN  
1        1.0  
104


Lookup word:  prophet
     id                                            meaning     score  \
0  6059       someone who says that bad things will happen  0.084667   
1  6058  one who utters divinely inspired revelations: ...  0.084667   
2  6060  —used as another name for Muhammad, the founde...  0.039193   

   isTeacher  
0        1.0  
1        NaN  
2        NaN  
105


Lookup word:  dangling
     id                                            meaning     score  \
0  6026  to hang loosely and usually so as to be able t

     id                                            meaning     score  \
0  6122  to smile or laugh with facial contortions that...  0.088388   
1  1267  to smile or laugh at someone or something with...  0.072169   

   isTeacher  
0        NaN  
1        1.0  
138


Lookup word:  stiffened
     id                                            meaning     score  \
0  2890  to stop moving and become completely still esp...  0.072548   

   isTeacher  
0          1  
139


Lookup word:  diminutive
     id                                            meaning  score  isTeacher
0  2901                                         very small    0.0        1.0
1  2902  a word or suffix that indicates that something...    0.0        NaN
140


Lookup word:  gesturing
     id                                            meaning     score  \
0  6128                                  to make a gesture  0.204124   
1  6708  direct or invite (someone) to move somewhere s...  0.117851   

   isTeacher  
0        

0  6186  to turn on or as if on a pivot  0.311086          1
172


Lookup word:  captivate
     id                                            meaning     score  \
0  6187  to influence and dominate by some special char...  0.111111   

   isTeacher  
0          1  
173


Lookup word:  zealot
     id                                            meaning     score  \
0  3080  a person who has very strong feelings about so...  0.134704   
1  6188  a zealous person; especially : a fanatical par...  0.073324   
2  6189                     peter —called also Simon Peter  0.063500   

   isTeacher  
0        1.0  
1        NaN  
2        NaN  
174


Lookup word:  guileless
     id                meaning     score  isTeacher
0  3079  very innocent : naive  0.103695        1.0
1  6190        innocent, naive  0.103695        NaN
175


Lookup word:  poked
     id                                            meaning    score  isTeacher
0  5972                                          prod, jab  0.06415

     id                                            meaning     score  \
0  6225                                  a retaliatory act  0.057166   
1  1584  something that is done to hurt or punish someo...  0.049507   

   isTeacher  
0        NaN  
1        1.0  
200


Lookup word:  impel
     id                                            meaning     score  \
0  3105  to cause (someone) to feel a strong need or de...  0.035007   
1  6226  to urge or drive forward or on by or as if by ...  0.033005   

   isTeacher  
0        1.0  
1        NaN  
201


Lookup word:  shrouded
     id                         meaning     score  isTeacher
0  6227  to cut off from view : obscure  0.130189        NaN
1  3110    to cover or hide (something)  0.065094        1.0
202


Lookup word:  endured
     id                               meaning     score  isTeacher
0  6228  to continue in the same state : last  0.065094          1
203


Lookup word:  destitute
     id                                       

     id                 meaning     score  isTeacher
0  6301  disposition to do good  0.069505          1
1  3132       kind and generous  0.000000          0
238


Lookup word:  retaliation
     id                                            meaning     score  \
0  6303  an act of retaliating against a previous retal...  0.085126   
1  6302  to return like for like; especially : to get r...  0.080257   

   isTeacher  
0        NaN  
1        1.0  
239


Lookup word:  rendered
     id                                            meaning  score  isTeacher
0  2881  to cause (someone or something) to be in a spe...    0.0        1.0
1  6305  for something that a person, company, etc., ha...    0.0        NaN
2  6304         to melt down; also : to extract by melting    0.0        NaN
240


Lookup word:  timid
     id                                meaning     score  isTeacher
0  6306  lacking in courage or self-confidence  0.065938          1
241


Lookup word:  mediocre
     id        mean

     id                                            meaning  score  isTeacher
0  3230  to avoid being stopped by (something, such as ...    0.0          1
276


Lookup word:  garb
     id                                            meaning     score  \
0  6368                                    fashion, manner  0.117851   
1  6370  clothing or dress, especially of a distinctive...  0.077152   
2  6369               to cover with or as if with clothing  0.000000   

   isTeacher  
0        NaN  
1        1.0  
2        NaN  
277


Lookup word:  sheepish
     id                                            meaning     score  \
0  3057  showing or feeling embarrassment especially be...  0.136083   
1  6371                        resembling a sheep: such as  0.117851   

   isTeacher  
0        1.0  
1        NaN  
278


Lookup word:  Frenzied
     id                                            meaning     score  \
0  6373                              to affect with frenzy  0.082479   
1  3874 

     id                                    meaning     score  isTeacher
0  6490  not comprehending : lacking understanding  0.130189          1
298


Lookup word:  faltering
     id                     meaning  score  isTeacher
0  4810  to move unsteadily : waver    0.0          1
299


Lookup word:  heaving 
     id                                            meaning     score  \
0  6492  an upthrust of ground or pavement caused by fr...  0.080064   
1  6491                                        lift, raise  0.000000   

   isTeacher  
0        NaN  
1        1.0  
300


Lookup word:  escorted
     id                    meaning     score  isTeacher
0  6493  to accompany as an escort  0.516398          1
301


Lookup word:  frowned
     id                                            meaning    score  isTeacher
0  6494  to contract the brow in displeasure or concent...  0.11547          1
302


Lookup word:  twitched
     id                   meaning     score  isTeacher
0  6495  to move

     id                                            meaning     score  \
0  6574  to release or activate by means of a trigger; ...  0.169842   
1  6573       released, initiated, or set off by a trigger  0.087706   
2  6575  a piece (such as a lever) connected with a cat...  0.000000   

   isTeacher  
0        NaN  
1        1.0  
2        NaN  
334


Lookup word:  futility
     id                               meaning  score  isTeacher
0  4490  having no result or effect : useless    0.0          1
335


Lookup word:  glee
     id                                            meaning  score  isTeacher
0  6578             exultant high-spirited joy : merriment    0.0        1.0
1  6579  a chorus organized for singing usually short p...    0.0        NaN
336


Lookup word:  tincture
     id                                            meaning     score  \
0  6580  a solution of a medicinal substance in an alco...  0.074536   
1  6581              to tint or stain with a color : tinge  0.000

     id                     meaning     score  isTeacher
0  4687  to make indirect reference  0.075378          1
385


Lookup word:  indiscreetly
     id                   meaning  score  isTeacher
0  4826  not discreet : imprudent    0.0          1
386


Lookup word:  designated
     id                                            meaning     score  \
0  5785                              designated in advance  0.150329   
1  1101  to officially choose (someone or something) to...  0.090652   
2  5784  a baseball player designated at the start of t...  0.061372   
3  5782  to indicate and set apart for a specific purpo...  0.046029   
4  5783  a person chosen to abstain from intoxicants (s...  0.037582   
5  1102  a person who agrees not to drink alcohol on a ...  0.000000   
6  1103  a player who is chosen at the beginning of a g...  0.000000   

   isTeacher  
0        NaN  
1        0.0  
2        NaN  
3        1.0  
4        NaN  
5        NaN  
6        NaN  
387


Lookup word:  t

1        1.0  
423


Lookup word:  gaunt
     id                                            meaning  score  isTeacher
0  4513  excessively thin and angular often as a result...    0.0        1.0
1  6678                       excessively thin and angular    0.0        NaN
2  6679  1340—1399 Duke of Lancaster; son of Edward III...    0.0        NaN
3  4514  1340-1399 Duke of Lancaster; son of Edward III...    0.0        NaN
424


Lookup word:  rummage
     id                                            meaning     score  \
0  4516  to make a thorough search especially by moving...  0.150756   
1  4517                a confused miscellaneous collection  0.000000   
2  4518  a sale of miscellaneous and often donated arti...  0.000000   

   isTeacher  
0        1.0  
1        NaN  
2        NaN  
425


Lookup word:  intermediary
     id              meaning     score  isTeacher
0  4708         intermediate  0.150756        NaN
1  4709  mediator go-between  0.000000        1.0
426


Lookup w

     id                                            meaning  score  isTeacher
0  6760  a usually elected public officer who is typica...    0.0          1
471


Lookup word:  autopsy
     id                                            meaning     score  \
0  4552  an examination of a dead body especially to fi...  0.083333   
1  6761  an examination of a body after death to determ...  0.057166   

   isTeacher  
0        1.0  
1        NaN  
472


Lookup word:  dispatched
     id                                            meaning     score  \
0  6061  to send off or away with promptness or speed; ...  0.035533   
1  4722  to send away promptly or rapidly to a particul...  0.000000   

   isTeacher  
0        NaN  
1        1.0  
473


Lookup word:  martyr
     id                                            meaning     score  \
0  3047  a person who is killed or who suffers greatly ...  0.062994   
1  6167  a person who voluntarily suffers death as the ...  0.059761   
2  6168  to put to d

     id                                            meaning     score  \
0  6751  an abode of souls that are according to Roman ...  0.072169   
1  6755  If you say that someone or something is in lim...  0.069338   
2  6752  a dance or contest that involves bending over ...  0.069338   
3  6753  a tree (Bursera simaruba of the family Bursera...  0.055902   
4  6754  in a forgotten or ignored place, state, or sit...  0.000000   

   isTeacher  
0        NaN  
1        1.0  
2        NaN  
3        NaN  
4        NaN  
513


Lookup word:  aggressive
     id                                            meaning     score  \
0  3833  ready and willing to fight, argue, etc. : feel...  0.083333   

   isTeacher  
0          1  
514


Lookup word:  embedded
     id                                            meaning     score  \
0  6847         to enclose closely in or as if in a matrix  0.081111   
1  1469  to place or set (something) firmly in somethin...  0.046829   
2  6846  occurring as a gr

     id                                    meaning  score  isTeacher
0  4585  to attract and win over : charm fascinate    0.0          1
551


Lookup word:  unassailable
     id                                            meaning     score  \
0  6924  not assailable : not liable to doubt, attack, ...  0.050252   

   isTeacher  
0          1  
552


Lookup word:  dignity
     id                                            meaning     score  \
0  2888  a way of appearing or behaving that suggests s...  0.087039   
1  6094  suitable for someone who is considered less im...  0.050252   
2  6093  formal reserve or seriousness of manner, appea...  0.046524   

   isTeacher  
0        1.0  
1        NaN  
2        NaN  
553


Lookup word:  emancipation
     id                                            meaning     score  \
0  5548                 the act or process of emancipating  0.577350   
1  5549         the act or process of emancipating oneself  0.516398   
2  2794  to free (someone) f

      id                                            meaning     score  \
0   3481  something that can be chosen instead of someth...  0.110506   
1   6153  a proposition or situation offering a choice b...  0.077267   
2   6158  a U.S. federal income tax that was originally ...  0.077267   
3   6161  alternative music that blends elements of conv...  0.073302   
4   6154                                        alt-country  0.070535   
5   6155  usable power (such as heat or electricity) tha...  0.040723   
6   6156  a fuel for internal combustion engines that is...  0.029630   
7   6157  any of various systems of healing or treating ...  0.028796   
8   6159  music that is produced by performers who are o...  0.023512   
9   6160  pop music that has broad appeal but that is pr...  0.019080   
10  3480                    offering or expressing a choice  0.000000   

    isTeacher  
0         1.0  
1         NaN  
2         NaN  
3         NaN  
4         NaN  
5         NaN  
6         N

In [7]:
print("The accuracy score is ", round(avg,4)) 
print("The MRR is ", round(mrr,4)) 

The accuracy score is  0.5467
The MRR is  0.8254
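
The two metrics can be illustrated on toy data. This is a sketch, assuming each ranked list is reduced to its `isTeacher` flags (in ranked order) and its top score; the values are invented and the helper names are hypothetical. Unlike the loop above, `reciprocal_rank` returns 0.0 when no meaning is marked as the teacher's choice, which is an assumption made here to avoid an error.

```python
def top1_correct(flags, top_score):
    """1 if the top-ranked meaning is the teacher's and its score is non-zero."""
    return int(flags[0] == 1 and top_score != 0)

def reciprocal_rank(flags):
    """1 / rank of the first meaning flagged isTeacher == 1 (ranks start at 1)."""
    for rank, flag in enumerate(flags, start=1):
        if flag == 1:
            return 1.0 / rank
    return 0.0  # assumption: no teacher meaning contributes 0

# Toy rankings: (isTeacher flags in ranked order, score of the top meaning)
rankings = [([1, 0], 0.2), ([0, 1], 0.3), ([0, 0, 1], 0.1)]

accuracy = sum(top1_correct(f, s) for f, s in rankings) / len(rankings)
mrr = sum(reciprocal_rank(f) for f, _ in rankings) / len(rankings)
print(round(accuracy, 4), round(mrr, 4))
```

A word with a single meaning and a top score of 0 (such as 'circumvent' in the output above) counts as a miss for accuracy but still contributes a reciprocal rank of 1, which is one reason the MRR can sit well above the accuracy.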


In [8]:
# ranked_meanings.to_json('ranked_meaning_lesk.json')

In [9]:
ai_response =  ranked_meanings.head(1).to_dict(orient='index')
ai_response

{0: {'id': 7014,
  'meaning': 'owing gratitude or recognition to another : beholden',
  'score': 0.0,
  'isTeacher': 1}}