# Word Similarity


A similarity score between two word senses can be computed from WordNet. Path similarity is based on the shortest path connecting the two senses in the is-a (hypernym) taxonomy, and the score ranges from 0 (little similarity) to 1 (identity).

In [1]:
from nltk.corpus import wordnet as wn

In [2]:
dog = wn.synset('dog.n.01')
cat = wn.synset('cat.n.01')
dog.path_similarity(cat)

0.2

In [3]:
hit = wn.synset('hit.v.01')
slap = wn.synset('slap.v.01')
wn.path_similarity(hit, slap)

0.14285714285714285

In [4]:
hit = wn.synset('hit.v.01')
strike = wn.synset('strike.v.01')
wn.path_similarity(hit, strike)

0.16666666666666666
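To make the scores above easier to interpret, here is a minimal sketch of how a path-based score can be computed, assuming the score is 1 / (d + 1), where d is the number of edges on the shortest path between the two senses. The `HYPERNYM` dictionary is a hand-written toy fragment, not the real WordNet graph, and the function names are illustrative only.

```python
# Toy hypernym links (child -> parent). Hand-written for illustration;
# this is NOT the real WordNet graph.
HYPERNYM = {
    "dog": "canine",
    "canine": "carnivore",
    "cat": "feline",
    "feline": "carnivore",
    "carnivore": "animal",
}

def ancestors_with_distance(node):
    """Map the node and each of its ancestors to its distance from the node."""
    distances = {}
    steps = 0
    while node is not None:
        distances[node] = steps
        node = HYPERNYM.get(node)
        steps += 1
    return distances

def shortest_path_distance(a, b):
    """Fewest edges joining a and b through a shared ancestor, or None."""
    dist_a = ancestors_with_distance(a)
    dist_b = ancestors_with_distance(b)
    common = set(dist_a) & set(dist_b)
    if not common:
        return None
    return min(dist_a[n] + dist_b[n] for n in common)

def path_similarity(a, b):
    """Assumed formula: 1 / (shortest path length + 1)."""
    d = shortest_path_distance(a, b)
    return None if d is None else 1 / (d + 1)

print(shortest_path_distance("dog", "cat"))  # 4
print(path_similarity("dog", "cat"))         # 0.2
print(path_similarity("dog", "dog"))         # 1.0
```

In this toy graph the shortest route from dog to cat has four edges (dog, canine, carnivore, feline, cat), which gives 1 / 5 = 0.2. A sense compared with itself has distance 0 and therefore scores 1.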

The Wu-Palmer similarity metric is based on the depths of the two senses in the taxonomy and the depth of their most specific common ancestor node.

In [5]:
wn.wup_similarity(dog, cat)

0.8571428571428571

In [6]:
wn.wup_similarity(hit, slap)

0.25
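The Wu-Palmer idea can be sketched on a small tree, assuming the usual form of the score: twice the depth of the most specific common ancestor, divided by the sum of the depths of the two senses. The taxonomy below is a hand-written toy (much shallower than WordNet), depths are counted with the root at depth 1, and all names are illustrative assumptions.

```python
# Toy taxonomy (child -> parent). Hand-written for illustration only;
# real WordNet hierarchies are much deeper, so scores will differ.
HYPERNYM = {
    "dog": "canine",
    "canine": "carnivore",
    "cat": "feline",
    "feline": "carnivore",
    "carnivore": "animal",
    "animal": "entity",
}

def path_to_root(node):
    """Return the chain [node, parent, ..., root]."""
    path = [node]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path

def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(node))

def most_specific_common_ancestor(a, b):
    """First node on b's path to the root that is also an ancestor of a."""
    ancestors_a = set(path_to_root(a))
    for node in path_to_root(b):
        if node in ancestors_a:
            return node
    return None

def wup_similarity(a, b):
    """Assumed formula: 2 * depth(ancestor) / (depth(a) + depth(b))."""
    ancestor = most_specific_common_ancestor(a, b)
    if ancestor is None:
        return None
    return 2 * depth(ancestor) / (depth(a) + depth(b))

print(most_specific_common_ancestor("dog", "cat"))  # carnivore
print(wup_similarity("dog", "cat"))                 # 0.6
```

Here carnivore sits at depth 3 and both dog and cat at depth 5, giving 6 / 10 = 0.6. The value is lower than the WordNet score above only because the toy tree is shallow: the deeper the shared ancestor sits relative to the two senses, the closer the score gets to 1.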

# Lesk algorithm

NLTK also has an implementation of the Lesk algorithm for word sense disambiguation. Given a tokenized context sentence and a target word, the function returns the synset of the target word whose definition shares the most words with the context sentence. You can also provide an optional pos argument to restrict the candidate synsets to one part of speech.

In [7]:
from nltk.wsd import lesk


In [8]:
# look at the definitions for 'bank'
for ss in wn.synsets('bank'):
    print(ss, ss.definition())

Synset('bank.n.01') sloping land (especially the slope beside a body of water)
Synset('depository_financial_institution.n.01') a financial institution that accepts deposits and channels the money into lending activities
Synset('bank.n.03') a long ridge or pile
Synset('bank.n.04') an arrangement of similar objects in a row or in tiers
Synset('bank.n.05') a supply or stock held in reserve for future use (especially in emergencies)
Synset('bank.n.06') the funds held by a gambling house or the dealer in some gambling games
Synset('bank.n.07') a slope in the turn of a road or track; the outside is higher than the inside in order to reduce the effects of centrifugal force
Synset('savings_bank.n.02') a container (usually with a slot in the top) for keeping money at home
Synset('bank.n.09') a building in which the business of banking transacted
Synset('bank.n.10') a flight maneuver; aircraft tips laterally about its longitudinal axis (especially in turning)
Synset('bank.v.01') tip laterally
...

In [9]:
sent = ['I', 'went', 'to', 'the', 'bank', 'to', 'deposit', 'money', '.']
print(lesk(sent, 'bank', 'n'))
print(lesk(sent, 'bank'))

Synset('savings_bank.n.02')
Synset('savings_bank.n.02')
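The overlap counting behind this result can be sketched without NLTK. The example below is a simplified illustration, not NLTK's implementation: it assumes plain whitespace tokenization of each definition, no stemming or stop-word removal, and it uses a hand-copied subset of three of the definitions printed above. The names `GLOSSES`, `overlap_scores`, and `simplified_lesk` are hypothetical.

```python
# Three definitions copied by hand from the listing above, keyed by synset name.
GLOSSES = {
    "bank.n.01": "sloping land (especially the slope beside a body of water)",
    "depository_financial_institution.n.01": (
        "a financial institution that accepts deposits and channels "
        "the money into lending activities"
    ),
    "savings_bank.n.02": (
        "a container (usually with a slot in the top) for keeping money at home"
    ),
}

def overlap_scores(context_tokens, glosses):
    """Count distinct tokens shared by the context and each definition."""
    context = set(context_tokens)
    return {
        sense: len(context & set(gloss.split()))
        for sense, gloss in glosses.items()
    }

def simplified_lesk(context_tokens, glosses):
    """Pick the sense with the largest overlap.

    Tie-breaking is an assumption of this sketch: the first sense in
    dictionary order wins.
    """
    scores = overlap_scores(context_tokens, glosses)
    return max(scores, key=scores.get)

sent = ['I', 'went', 'to', 'the', 'bank', 'to', 'deposit', 'money', '.']
print(overlap_scores(sent, GLOSSES))
# {'bank.n.01': 1, 'depository_financial_institution.n.01': 2, 'savings_bank.n.02': 2}
print(simplified_lesk(sent, GLOSSES))
# depository_financial_institution.n.01
```

With exact token matching, 'deposit' does not match 'deposits', so the financial-institution sense and the money-box sense tie on the tokens 'the' and 'money'. The winner then depends entirely on how ties are resolved, which helps explain why an overlap-based method can select a less intuitive sense for this sentence.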


In [10]:
# senses of 'able'
for ss in wn.synsets('able'):
    print(ss, ss.definition())

Synset('able.a.01') (usually followed by `to') having the necessary means or skill or know-how or authority to do something
Synset('able.s.02') have the skills and qualifications to do things well
Synset('able.s.03') having inherent physical or mental ability or capacity
Synset('able.s.04') having a strong healthy body


In [11]:
sent2 = ['You','should','be','able','to','pass','the','next','quiz','.']
print(lesk(sent2, 'able'))


Synset('able.s.02')


In [12]:
# now specify pos
print(lesk(sent2, 'able', pos='a'))

Synset('able.a.01')
