## Word Sense Disambiguation

Word Sense Disambiguation (WSD) is an NLP task that determines which meaning (sense) of a word is intended in a particular context. Many words have several senses, and NLP systems often struggle to pick the right one. Identifying the specific sense of a word in a sentence is useful in applications such as machine translation, information retrieval, and question answering.

In short, WSD resolves the ambiguity that arises when the same word carries different meanings in different situations.

Reference: https://www.nltk.org/howto/wsd.html


**Lesk Algorithm**

The Lesk algorithm is a classical Word Sense Disambiguation algorithm introduced by Michael E. Lesk in 1986.

The Lesk algorithm is based on the idea that words in the same region of a text tend to share a common topic, so the intended sense of a word should be the one whose dictionary definition best matches its neighboring words. In the Simplified Lesk algorithm, each ambiguous word is assigned the sense whose dictionary definition (gloss) shares the most words with the word's surrounding context.
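To make the overlap idea concrete, here is a minimal sketch of Simplified Lesk in plain Python. It assumes a tiny hand-made sense inventory (two WordNet glosses for "bank") and a small, hypothetical stop-word list; `simplified_lesk` is an illustrative helper, not NLTK's implementation.

```python
import re

# Tiny hand-made sense inventory (assumption for illustration):
# two WordNet glosses for "bank", keyed by synset name.
SENSES = {
    "bank.n.01": "sloping land (especially the slope beside a body of water)",
    "depository_financial_institution.n.01": (
        "a financial institution that accepts deposits and channels "
        "the money into lending activities"
    ),
}

# Hypothetical stop-word list so that function words do not count as overlap.
STOP_WORDS = {"a", "an", "the", "of", "in", "into", "and", "or",
              "to", "at", "i", "we", "on", "that"}


def tokenize(text):
    """Lowercase the text and keep only alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def simplified_lesk(context, senses):
    """Return the sense whose gloss shares the most words with the context.

    Ties go to the sense listed first, because only a strictly larger
    overlap replaces the current best.
    """
    context_words = set(tokenize(context)) - STOP_WORDS
    best_sense, best_overlap = None, -1
    for name, gloss in senses.items():
        gloss_words = set(tokenize(gloss)) - STOP_WORDS
        overlap = len(context_words & gloss_words)
        if overlap > best_overlap:
            best_sense, best_overlap = name, overlap
    return best_sense


print(simplified_lesk("I deposited the money at the bank", SENSES))
print(simplified_lesk("We sat on the bank and watched the water", SENSES))
```

The first sentence matches the financial sense only through "money"; "deposited" does not match "deposits" because this sketch does no stemming or lemmatization. The second sentence matches the river-bank sense through "water".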


We can use the `lesk` function from NLTK's `nltk.wsd` module to apply the algorithm in Python.

In [2]:
import nltk
from nltk.wsd import lesk
from nltk.tokenize import word_tokenize

Let us now proceed with some examples.

In [3]:
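nltk.download('punkt')
nltk.download('punkt_tab')  # needed by word_tokenize in newer NLTK releases
nltk.download('wordnet')    # WordNet supplies the senses and definitions lesk uses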

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package wordnet to /root/nltk_data...


True

In [4]:
# lesk(context_tokens, ambiguous_word) returns the best-matching WordNet Synset
a1 = lesk(word_tokenize('This device is used to jam the signal'), 'jam')
print(a1)
print(a1.definition())

a2 = lesk(word_tokenize('I am stuck in a traffic jam'), 'jam')
print(a2, a2.definition())

Synset('jamming.n.01')
deliberate radiation or reflection of electromagnetic energy for the purpose of disrupting enemy use of electronic devices or systems
Synset('jam.v.05') get stuck and immobilized


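Note that the second result is a verb sense (`jam.v.05`), although "jam" is a noun in "traffic jam": the overlap count does not take part of speech into account unless a `pos` argument is passed to `lesk`.

The definitions for “bank” are: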

In [5]:
from nltk.corpus import wordnet as wn
# Print every WordNet synset for 'bank' together with its definition
for ss in wn.synsets('bank'):
    print(ss, ss.definition())

Synset('bank.n.01') sloping land (especially the slope beside a body of water)
Synset('depository_financial_institution.n.01') a financial institution that accepts deposits and channels the money into lending activities
Synset('bank.n.03') a long ridge or pile
Synset('bank.n.04') an arrangement of similar objects in a row or in tiers
Synset('bank.n.05') a supply or stock held in reserve for future use (especially in emergencies)
Synset('bank.n.06') the funds held by a gambling house or the dealer in some gambling games
Synset('bank.n.07') a slope in the turn of a road or track; the outside is higher than the inside in order to reduce the effects of centrifugal force
Synset('savings_bank.n.02') a container (usually with a slot in the top) for keeping money at home
Synset('bank.n.09') a building in which the business of banking transacted
Synset('bank.n.10') a flight maneuver; aircraft tips laterally about its longitudinal axis (especially in turning)
Synset('bank.v.01') tip laterally
...

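Next, we test disambiguation of "able" with a part-of-speech (POS) tag. WordNet marks head adjectives with `a` and satellite adjectives with `s`: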

In [6]:
[(s, s.pos()) for s in wn.synsets('able')]

[(Synset('able.a.01'), 'a'),
 (Synset('able.s.02'), 's'),
 (Synset('able.s.03'), 's'),
 (Synset('able.s.04'), 's')]
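WordNet synset names follow the pattern `lemma.pos.nn`, where the POS tag is `n` (noun), `v` (verb), `a` (adjective), `s` (adjective satellite) or `r` (adverb). The sketch below works only on name strings (no NLTK needed), using the names from the `wn.synsets('able')` output above. The helpers `split_synset_name`, `group_by_pos` and `filter_by_pos` are hypothetical; `filter_by_pos` imitates the exact-match POS filter that, as far as we can tell from NLTK's source, `lesk` applies when a `pos` argument is given, which is why `pos='a'` leaves out the `s` senses.

```python
from collections import defaultdict


def split_synset_name(name):
    """Split a synset name 'lemma.pos.nn' into (lemma, pos, sense_number).

    rsplit with maxsplit=2 keeps lemmas that themselves contain dots intact.
    """
    lemma, pos, number = name.rsplit(".", 2)
    return lemma, pos, int(number)


def group_by_pos(names):
    """Map each POS tag to the synset names carrying it, preserving order."""
    groups = defaultdict(list)
    for name in names:
        groups[split_synset_name(name)[1]].append(name)
    return dict(groups)


def filter_by_pos(names, pos):
    """Keep only names whose POS tag equals `pos` exactly ('a' does not match 's')."""
    return [name for name in names if split_synset_name(name)[1] == pos]


able_names = ["able.a.01", "able.s.02", "able.s.03", "able.s.04"]
print(group_by_pos(able_names))
print(filter_by_pos(able_names, "a"))
```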

In [8]:
# str.split() keeps punctuation and inflection: tokens include 'books' and 'coding.'
sentence = "I love reading books on coding.".split()
lesk(sentence, 'book').definition()

'a number of sheets (ticket or stamps etc.) bound together on one edge'
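Because overlap is counted by exact string matching (as we read NLTK's source, `lesk` intersects the set of context tokens with the words of each definition), tokenization matters: `str.split()` keeps the period in `'coding.'` and the capital in `'I'`. A minimal normalization sketch, using a hypothetical helper `normalize`:

```python
import string


def normalize(tokens):
    """Lowercase each token and strip leading/trailing punctuation, dropping empties."""
    cleaned = (tok.lower().strip(string.punctuation) for tok in tokens)
    return [tok for tok in cleaned if tok]


raw = "I love reading books on coding.".split()
print(raw)             # ['I', 'love', 'reading', 'books', 'on', 'coding.']
print(normalize(raw))  # ['i', 'love', 'reading', 'books', 'on', 'coding']
```

Normalization alone still leaves `'books'` distinct from `'book'`; a lemmatizer such as NLTK's `WordNetLemmatizer` or `wn.morphy` would be needed to map inflected forms to their base form.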

In [10]:
sentence = "The table was already booked by someone else.".split()
lesk(sentence, 'book').definition()

'arrange for and reserve (something for someone else) in advance'

In [11]:
for ss in wn.synsets('bat'):
    print(ss, ss.definition())

Synset('bat.n.01') nocturnal mouselike mammal with forelimbs modified to form membranous wings and anatomical adaptations for echolocation by which they navigate
Synset('bat.n.02') (baseball) a turn trying to get a hit
Synset('squash_racket.n.01') a small racket with a long handle used for playing squash
Synset('cricket_bat.n.01') the club used in playing cricket
Synset('bat.n.05') a club used for hitting a ball in various games
Synset('bat.v.01') strike with, or as if with a baseball bat
Synset('bat.v.02') wink briefly
Synset('bat.v.03') have a turn at bat
Synset('bat.v.04') use a bat
Synset('cream.v.02') beat thoroughly and conclusively in a competition or fight
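The "bat" senses show a weakness of raw word overlap. The sketch below copies three glosses from the output above and scores the context "the bat flew out of the cave at night" with and without a small, hypothetical stop-word list; `overlap` is an illustrative helper that counts shared whitespace-separated words, lowercased here.

```python
# Three "bat" glosses copied from the WordNet output above.
BAT_GLOSSES = {
    "bat.n.01": ("nocturnal mouselike mammal with forelimbs modified to form "
                 "membranous wings and anatomical adaptations for echolocation "
                 "by which they navigate"),
    "cricket_bat.n.01": "the club used in playing cricket",
    "bat.n.05": "a club used for hitting a ball in various games",
}

# Hypothetical, deliberately tiny stop-word list for illustration.
STOP_WORDS = frozenset({"the", "a", "an", "of", "in", "for", "out", "at",
                        "by", "with", "to", "and"})


def overlap(context, gloss, stop_words=frozenset()):
    """Count distinct lowercase words shared by context and gloss, ignoring stop_words."""
    context_words = set(context.lower().split()) - stop_words
    gloss_words = set(gloss.lower().split()) - stop_words
    return len(context_words & gloss_words)


context = "the bat flew out of the cave at night"
raw = {name: overlap(context, g) for name, g in BAT_GLOSSES.items()}
filtered = {name: overlap(context, g, STOP_WORDS) for name, g in BAT_GLOSSES.items()}

print(raw)       # {'bat.n.01': 0, 'cricket_bat.n.01': 1, 'bat.n.05': 0}
print(filtered)  # {'bat.n.01': 0, 'cricket_bat.n.01': 0, 'bat.n.05': 0}
print(max(raw, key=raw.get))  # cricket_bat.n.01, chosen only because of "the"
```

With raw counts the function word "the" makes the cricket sense win. After removing stop words no gloss overlaps the context at all ("night" and "nocturnal" share no surface form), so the choice falls to whatever tie-breaking rule is used. Short glosses often leave Lesk with little or no evidence.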


In [12]:
sent = 'people should be able to marry a person of their choice'.split()
print(lesk(sent, 'able'), lesk(sent, 'able', pos='a'))  # pos='a' keeps only head-adjective senses


In [13]:
seq1 = 'My mother prepares very yummy jam.'
seq2 = 'Signal jammers are the reason for no signal.'