# **Word Sense Disambiguation**

Words can have different meanings in different contexts. When a word has several
possible meanings and the context does not make the intended one obvious, the result
is word sense ambiguity, which can lead to miscommunication. Syntactic ambiguity is
typically resolved with part-of-speech (POS) tagging, whereas semantic ambiguity is
resolved with word sense disambiguation (WSD). The challenge is to determine which
sense of a word is meant in a given context [[1]](#scrollTo=fPge5oRLQwid).

This notebook shows examples of WSD with the ``pywsd`` library.

### Import libraries

#### Install pywsd

``pywsd`` is a Python library that provides WSD functions as well as several variations of the Lesk algorithm [[1]](#scrollTo=fPge5oRLQwid). For more details about ``pywsd``, please refer to [[2]](https://pypi.org/project/pywsd/).



In [2]:
# Install pywsd
!pip install pywsd  

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting pywsd
  Downloading pywsd-1.2.4.tar.gz (26.8 MB)
Collecting wn
  Downloading wn-0.9.1-py3-none-any.whl (75 kB)
Building wheels for collected packages: pywsd
  Building wheel for pywsd (setup.py) ... done
  Created wheel for pywsd: filename=pywsd-1.2.4-py3-none-any.whl size=26940436 sha256=82fb480a489a5ebe5593773cdc2fe97c0068bbfc38fbc3a324b3fd7254dc59db
  Stored in directory: /root/.cache/pip/wheels/56/67/c0/6e6fa8456d1374b393328368316c3b33844cb4043bd225bc66
Successfully built pywsd
Installing collected packages: wn, pywsd
Successfully installed pywsd-1.2.4 wn-0.9.1


#### Install ``wn``
``wn`` is a Python library for working with wordnets. Unlike previous libraries, ``wn`` is built from the beginning to accommodate multiple wordnets (for multiple languages or multiple versions of the same wordnet) while retaining the ability to query and traverse them independently. For more details about the ``wn`` library, please refer to [[3]](https://pypi.org/project/wn/) and [[4]](https://aclanthology.org/2021.gwc-1.12/). The cell below pins ``wn`` to version 0.0.22, which replaces the 0.9.1 release that was installed together with ``pywsd``.



In [4]:
# Install wn
!pip install wn==0.0.22

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting wn==0.0.22
  Downloading wn-0.0.22.tar.gz (31.5 MB)
Building wheels for collected packages: wn
  Building wheel for wn (setup.py) ... done
  Created wheel for wn: filename=wn-0.0.22-py3-none-any.whl size=31618484 sha256=dd4dddd129974f1683520f474e625f021e87ccc440ebc0edbf7a3276804b5608
  Stored in directory: /root/.cache/pip/wheels/3d/0d/59/4b7902879d8cbad9bb73aaf0cc0a051edc1b18da983889c412
Successfully built wn
Installing collected packages: wn
  Attempting uninstall: wn
    Found existing installation: wn 0.9.1
    Uninstalling wn-0.9.1:
      Successfully uninstalled wn-0.9.1
Successfully installed wn-0.0.22


#### Import ``nltk`` and ``wordnet``

``nltk`` (Natural Language Toolkit) is an open-source Python library for natural language processing. For more details about ``nltk``, please refer to [[5]](https://www.nltk.org/api/nltk.html#nltk.wsd.lesk).

``wordnet`` is a large lexical database of English. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms, each expressing a distinct concept. Synonyms are interlinked by means of conceptual-semantic and lexical relations [[6]](http://www.nltk.org/howto/wsd.html). 
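
To make this structure more concrete, here is a minimal sketch of how synsets group lemmas and link to more general concepts. `ToySynset`, `TOY_WORDNET`, and `senses_of` are hypothetical names invented for this illustration and are not part of ``nltk``. The two entries are abbreviated, hand-written examples rather than data loaded from WordNet.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToySynset:
    """A hypothetical, simplified stand-in for a WordNet synset."""
    name: str                       # identifier of the concept
    lemmas: List[str]               # words that can express the concept
    definition: str                 # short gloss
    hypernym: Optional[str] = None  # name of a more general concept


# Two hand-written entries for illustration only (not loaded from WordNet).
TOY_WORDNET = [
    ToySynset(
        name="depository_financial_institution.n.01",
        lemmas=["bank", "banking_company"],
        definition="a financial institution that accepts deposits",
        hypernym="financial_institution.n.01",
    ),
    ToySynset(
        name="bank.n.01",
        lemmas=["bank"],
        definition="sloping land beside a body of water",
        hypernym="slope.n.01",
    ),
]


def senses_of(word, synsets=TOY_WORDNET):
    """Return every synset that lists `word` among its lemmas."""
    return [s for s in synsets if word in s.lemmas]


for synset in senses_of("bank"):
    print(synset.name, "->", synset.hypernym, "|", synset.definition)
```

The same word ("bank") appears in two synsets, which is exactly the situation that WSD has to resolve.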


In [7]:
# Import the nltk module and its WordNet corpus reader;
# the later cells refer to the reader as "wn" (e.g. wn.synsets)
import nltk
from nltk.corpus import wordnet as wn

# Download the "wordnet" corpus with the nltk downloader
nltk.download('wordnet')

# The "averaged_perceptron_tagger" resource is used for POS tagging
nltk.download('averaged_perceptron_tagger')

# The "punkt" resource is used for tokenizing sentences
nltk.download('punkt')

# Download "omw-1.4" to use the multilingual wordnet data from OMW with newer WordNet versions (December 2021 release)
nltk.download('omw-1.4')

[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package omw-1.4 to /root/nltk_data...


True

#### Import ``simple_lesk``

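The Lesk algorithm is a knowledge-based method that relies on the overlap between dictionary definitions and the context of a word. It assumes that words used together are related to each other, so the sense whose definition shares the most words with the surrounding context is preferred [[1]](#scrollTo=fPge5oRLQwid). ``simple_lesk`` is one of the Lesk variants provided by ``pywsd``.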

In [8]:
# Import the simple_lesk algorithm
from pywsd.lesk import simple_lesk  
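
The overlap idea behind Lesk can be sketched in a few lines of plain Python. This is a deliberately simplified illustration, not the implementation used by ``pywsd``: `TOY_SENSES`, `STOPWORDS`, `tokenize`, and `toy_lesk` are hypothetical names, and the two glosses are hand-written for this sketch rather than taken from WordNet.

```python
# Toy sense inventory: hypothetical glosses written for this sketch, not WordNet data.
TOY_SENSES = {
    "bank": {
        "bank.financial": "a financial institution that accepts deposits and lends money",
        "bank.river": "sloping land beside a body of water such as a river",
    }
}

# A tiny hand-picked stopword list (assumption: enough for the two sentences below).
STOPWORDS = {"a", "an", "the", "to", "of", "my", "i", "and", "that",
             "such", "as", "was", "went"}


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stopwords."""
    words = (w.strip(".,;:!?()\"'").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def toy_lesk(sentence, word, senses=TOY_SENSES):
    """Return the sense whose gloss shares the most words with the sentence."""
    context = tokenize(sentence) - {word}
    best_sense, best_overlap = None, -1
    for sense, gloss in senses[word].items():
        overlap = len(context & tokenize(gloss))
        if overlap > best_overlap:  # ties keep the earlier sense
            best_sense, best_overlap = sense, overlap
    return best_sense


print(toy_lesk("I went to the bank to deposit my money", "bank"))
print(toy_lesk("The river bank was full of dead fishes", "bank"))
```

In this toy setup the first sentence matches the financial gloss through "money" and the second matches the river gloss through "river". Real implementations typically enrich both sides, for example with lemmatization and related synsets, before counting the overlap.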

### WSD application examples

#### Bank

In [9]:
# Create a sample text which contains two sentences
text1 = ['I went to the bank to deposit my money', 'The river bank was full of dead fishes']

# Analyze the first sentence and print the definition of the word "bank"
print("=============== Analyze sentence 1 =================\n")
print("Context-1:", text1[0])
answer1 = simple_lesk(text1[0], 'bank')
print("Sense:", answer1)
print("Definition: ", answer1.definition())

# Analyze the second sentence and print the definition of the word "bank"
print("\n\n=============== Analyze sentence 2 =================\n")
print("Context-2:", text1[1])
answer2 = simple_lesk(text1[1], 'bank')
print("Sense:", answer2)
print("Definition: ", answer2.definition())

# For a general overview, print all definitions of the word "bank"
print("\n\n=============== All definitions of the word 'bank'===============\n")
for s in wn.synsets('bank'):
    print('\t', s, s.definition())


Context-1: I went to the bank to deposit my money
Sense: Synset('depository_financial_institution.n.01')
Definition:  a financial institution that accepts deposits and channels the money into lending activities



Context-2: The river bank was full of dead fishes
Sense: Synset('bank.n.01')
Definition:  sloping land (especially the slope beside a body of water)



	 Synset('bank.n.01') sloping land (especially the slope beside a body of water)
	 Synset('depository_financial_institution.n.01') a financial institution that accepts deposits and channels the money into lending activities
	 Synset('bank.n.03') a long ridge or pile
	 Synset('bank.n.04') an arrangement of similar objects in a row or in tiers
	 Synset('bank.n.05') a supply or stock held in reserve for future use (especially in emergencies)
	 Synset('bank.n.06') the funds held by a gambling house or the dealer in some gambling games
	 Synset('bank.n.07') a slope in the turn of a road or track; the outside is higher than the ins

#### Plant

In [10]:
# Create a sample text which contains two sentences
text2 = ['The workers at the industrial plant were overworked.', 'The plant was no longer bearing flowers.']

# Analyze the first sentence and print the definition of the word "plant"
print("=============== Analyze sentence 1 =================\n")
print("Context-1:", text2[0])
answer1 = simple_lesk(text2[0], 'plant')
print("Sense:", answer1)
print("Definition: ", answer1.definition())

# Analyze the second sentence and print the definition of the word "plant"
print("\n\n=============== Analyze sentence 2 =================\n")
print("Context-2:", text2[1])
answer2 = simple_lesk(text2[1], 'plant')
print("Sense:", answer2)
print("Definition: ", answer2.definition())

# For a general overview, print all definitions of the word "plant"
print("\n\n=============== All definitions of the word 'plant'===============\n")
for s in wn.synsets('plant'):
    print('\t', s, s.definition())


Context-1: The workers at the industrial plant were overworked.
Sense: Synset('plant.n.01')
Definition:  buildings for carrying on industrial labor



Context-2: The plant was no longer bearing flowers.
Sense: Synset('plant.v.01')
Definition:  put or set (seeds, seedlings, or plants) into the ground



	 Synset('plant.n.01') buildings for carrying on industrial labor
	 Synset('plant.n.02') (botany) a living organism lacking the power of locomotion
	 Synset('plant.n.03') an actor situated in the audience whose acting is rehearsed but seems spontaneous to the audience
	 Synset('plant.n.04') something planted secretly for discovery by another
	 Synset('plant.v.01') put or set (seeds, seedlings, or plants) into the ground
	 Synset('implant.v.01') fix or set securely or deeply
	 Synset('establish.v.02') set up or lay the groundwork for
	 Synset('plant.v.04') place into a river
	 Synset('plant.v.05') place something or someone in a certain position in order to secretly observe or deceive
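
Note that the second sentence above was assigned a verb sense (`plant.v.01`) although "plant" is used as a noun there. One common remedy is to restrict the candidate senses to the expected part of speech, which is what the third argument of ``simple_lesk`` does in the "Fair" example. The following plain-Python sketch illustrates the effect under stated assumptions: `TOY_PLANT_SENSES`, `content_words`, and `pick_sense` are hypothetical names, and the glosses are hand-written so that the verb sense wins on raw word overlap. It is not the ``pywsd`` implementation.

```python
# Hypothetical toy inventory: (sense id, part of speech, gloss). Not WordNet data.
TOY_PLANT_SENSES = [
    ("plant.factory", "n", "buildings for carrying on industrial labor"),
    ("plant.organism", "n", "a living organism that produces leaves and flowers"),
    ("plant.sow", "v", "put seeds or flowers into the ground for bearing fruit"),
]

# A tiny hand-picked stopword list (assumption: enough for this example).
STOPWORDS = {"a", "the", "was", "for", "that", "and", "or", "into", "on"}


def content_words(text):
    """Lowercase, strip punctuation, and drop stopwords."""
    words = (w.strip(".,;:!?()").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def pick_sense(sentence, senses, pos=None):
    """Pick the sense with the largest gloss/context overlap.

    If `pos` is given, only senses with that part of speech are considered.
    """
    candidates = [s for s in senses if pos is None or s[1] == pos]
    context = content_words(sentence)
    best = max(candidates, key=lambda s: len(context & content_words(s[2])))
    return best[0]


sentence = "The plant was no longer bearing flowers."
print("Without POS filter:", pick_sense(sentence, TOY_PLANT_SENSES))
print("Nouns only:        ", pick_sense(sentence, TOY_PLANT_SENSES, pos="n"))
```

In this toy inventory the unfiltered call selects the verb sense because its gloss shares two words with the sentence, while the noun-only call selects the organism sense. Whether the same restriction fixes the ``simple_lesk`` result above would need to be checked by rerunning the cell with the part of speech supplied.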
	 

#### Fair

In [11]:
# Create a sample text which contains two sentences
text3 = ['Everyone needs to be given a fair chance in the competition.', 'The annual fair in our city is next weekend.']

# Analyze the first sentence and print the definition of the word "fair"
print("=============== Analyze sentence 1 =================\n")
print("Context-1:", text3[0])
answer1 = simple_lesk(text3[0], 'fair')
print("Sense:", answer1)
print("Definition : ", answer1.definition())

# Analyze the second sentence and print the definition of the word "fair";
# pos='n' restricts the candidate senses to nouns
print("\n\n=============== Analyze sentence 2 =================\n")
print("Context-2:", text3[1])
answer2 = simple_lesk(text3[1], 'fair', pos='n')
print("Sense:", answer2)
print("Definition : ", answer2.definition())

# For a general overview, print all definitions of the word "fair"
print("\n\n=============== All definitions of the word 'fair'===============\n")
for s in wn.synsets('fair'):
    print('\t', s, s.definition())


Context-1: Everyone needs to be given a fair chance in the competition.
Sense: Synset('honest.s.07')
Definition :  gained or earned without cheating or stealing



Context-2: The annual fair in our city is next weekend.
Sense: Synset('fair.n.03')
Definition :  a competitive exhibition of farm products



	 Synset('carnival.n.03') a traveling show; having sideshows and rides and games of skill etc.
	 Synset('fair.n.02') gathering of producers to promote business
	 Synset('fair.n.03') a competitive exhibition of farm products
	 Synset('bazaar.n.03') a sale of miscellany; often for charity
	 Synset('fair.v.01') join so that the external surfaces blend smoothly
	 Synset('fair.a.01') free from favoritism or self-interest or bias or deception; conforming with established standards or rules
	 Synset('fair.a.04') (of a baseball) hit between the foul lines
	 Synset('fair.s.02') not excessive or extreme
	 Synset('bonny.s.01') very pleasing to the eye
	 Synset('average.s.03') lacking exceptional

# **References**

- [1] Course Book "NLP and Computer Vision" (DLMAINLPCV01)
- [2] https://pypi.org/project/pywsd/
- [3] https://pypi.org/project/wn/
- [4] https://aclanthology.org/2021.gwc-1.12/
- [5] https://www.nltk.org/api/nltk.html#nltk.wsd.lesk
- [6] http://www.nltk.org/howto/wsd.html



Copyright © 2022 IU International University of Applied Sciences