<a href="https://colab.research.google.com/github/pranavsrinivas29/Natural-Language-Processing/blob/main/NLP_Ambiguities.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


1. [Ambiguity and Context](#nlp-ambiguities)
  - [Lexical Ambiguity](#nlp-lex-amb)
  - [Word Sense Disambiguation (WSD)](#nlp-wsd)

# <a name="nlp-ambiguities"></a> 3. Ambiguity and Context
A word or sentence can have more than one meaning in the language to which it belongs. This is called **Ambiguity**. The `"Jaguar"` example on page 8 of slide set 1.3 is a case of **Lexical ambiguity**, in which one word of a specific syntactic category has several meanings. As mentioned in the lecture, content communicated via language can be interpreted in different ways depending on the **Context**.



## 3.1 <a name="nlp-lex-amb"></a> Lexical Ambiguity
The traditional approach to resolving lexical ambiguity is to first build a taxonomy, such as [WordNet](https://wordnet.princeton.edu/), that provides a denotational definition for each word and represents interrelations between words in a hierarchical structure. Based on their definitions, WordNet groups words into sets of synonyms. A **Synset** is a set of one or more synonyms, and WordNet is organized as hierarchies of Synsets.

Example: [Look up "star" in WordNet](http://wordnetweb.princeton.edu/perl/webwn?s=star&sub=Search+WordNet&o2=&o0=1&o8=1&o1=1&o7=&o5=&o9=&o6=&o3=&o4=&h=)

WordNet is integrated into the Python [NLTK library](https://www.nltk.org/) and can be accessed through this library.

In [1]:
#First we have to import nltk and download the wordnet package
import nltk
nltk.download('wordnet')

#Next we import wordnet from nltk
from nltk.corpus import wordnet as wn

[nltk_data] Downloading package wordnet to /root/nltk_data...


In [2]:
#We can look up synsets of a specific word
wn.synsets("star")

[Synset('star.n.01'),
 Synset('ace.n.03'),
 Synset('star.n.03'),
 Synset('star.n.04'),
 Synset('star.n.05'),
 Synset('headliner.n.01'),
 Synset('asterisk.n.01'),
 Synset('star_topology.n.01'),
 Synset('star.v.01'),
 Synset('star.v.02'),
 Synset('star.v.03'),
 Synset('leading.s.01')]
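
The synset names above follow the pattern `lemma.pos.number`, where `n` marks a noun, `v` a verb, and `s` an adjective satellite. As a minimal sketch, the following groups those names by part of speech without querying WordNet. The list `names` is copied by hand from the output above, and `group_by_pos` is a hypothetical helper, not an NLTK function.

```python
from collections import defaultdict

# Synset names copied by hand from the output of wn.synsets("star") above
names = [
    "star.n.01", "ace.n.03", "star.n.03", "star.n.04", "star.n.05",
    "headliner.n.01", "asterisk.n.01", "star_topology.n.01",
    "star.v.01", "star.v.02", "star.v.03", "leading.s.01",
]

def group_by_pos(synset_names):
    """Group synset names of the form lemma.pos.number by their pos tag."""
    groups = defaultdict(list)
    for name in synset_names:
        # rsplit from the right keeps the lemma intact even if it contains dots
        lemma, pos, number = name.rsplit(".", 2)
        groups[pos].append(name)
    return dict(groups)

groups = group_by_pos(names)
for pos, members in groups.items():
    print(pos, len(members), members)
```

Even within the noun category alone, `"star"` has several senses, which is exactly the lexical ambiguity described above. With NLTK itself, the lookup can be restricted to one part of speech directly, for example `wn.synsets("star", pos=wn.NOUN)`.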

In [5]:
#We can look up the definition of a specific synset
wn.synset("star.v.01").definition()

'feature as the star'

In [6]:
# or look up all definitions, lexical information and synonyms of a specific synset
for i, sense in enumerate(wn.synsets("star"), start=1):
  print(i,sense.name(),": ",sense.lexname(),", ",sense.definition(),", ",sense.lemma_names())

1 star.n.01 :  noun.object ,  (astronomy) a celestial body of hot gases that radiates energy derived from thermonuclear reactions in the interior ,  ['star']
2 ace.n.03 :  noun.person ,  someone who is dazzlingly skilled in any field ,  ['ace', 'adept', 'champion', 'sensation', 'maven', 'mavin', 'virtuoso', 'genius', 'hotshot', 'star', 'superstar', 'whiz', 'whizz', 'wizard', 'wiz']
3 star.n.03 :  noun.object ,  any celestial body visible (as a point of light) from the Earth at night ,  ['star']
4 star.n.04 :  noun.person ,  an actor who plays a principal role ,  ['star', 'principal', 'lead']
5 star.n.05 :  noun.shape ,  a plane figure with 5 or more points; often used as an emblem ,  ['star']
6 headliner.n.01 :  noun.person ,  a performer who receives prominent billing ,  ['headliner', 'star']
7 asterisk.n.01 :  noun.communication ,  a star-shaped character * used in printing ,  ['asterisk', 'star']
8 star_topology.n.01 :  noun.cognition ,  the topology of a network whose components ar

In [7]:
#and we can look up hypernyms for a given synset
star = wn.synset("star.n.03")
hypernyms = lambda s:s.hypernyms()
list(star.closure(hypernyms))

[Synset('celestial_body.n.01'),
 Synset('natural_object.n.01'),
 Synset('whole.n.02'),
 Synset('object.n.01'),
 Synset('physical_entity.n.01'),
 Synset('entity.n.01')]
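
To illustrate what the closure computes, here is a minimal sketch that walks upward through a hand-written parent map. The map `PARENT` is copied from the output above under the assumption that each synset in this chain has exactly one hypernym; `hypernym_chain` is a hypothetical helper and not part of NLTK.

```python
# Hypothetical parent map, copied by hand from the closure output above.
# Assumption: every synset in this chain has exactly one hypernym.
PARENT = {
    "star.n.03": "celestial_body.n.01",
    "celestial_body.n.01": "natural_object.n.01",
    "natural_object.n.01": "whole.n.02",
    "whole.n.02": "object.n.01",
    "object.n.01": "physical_entity.n.01",
    "physical_entity.n.01": "entity.n.01",
}

def hypernym_chain(name, parent=PARENT):
    """Follow parent links upward until a synset without a hypernym is reached."""
    chain = []
    while name in parent:
        name = parent[name]
        chain.append(name)
    return chain

chain = hypernym_chain("star.n.03")
print(chain)
```

In WordNet a synset can have more than one hypernym, which this toy map does not model; `closure` handles that case by following every hypernym of every synset it reaches.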

## 3.2 <a name="nlp-wsd"></a> Word Sense Disambiguation (WSD)

Given a word and its context, we want to automatically determine which WordNet sense is the context-appropriate one:

`"The astronomer loves the star who plays the lead role"`

Let's determine the correct synset for `"star"`.

**Idea**: Look for the maximum overlap of the sentence (context) and the synset definition, e.g., "an actor who **plays** a principal **role**".

(This is a simplified version of the [Lesk algorithm](https://en.wikipedia.org/wiki/Lesk_algorithm), introduced by Michael E. Lesk in 1986.)
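
To make the overlap idea concrete, the following sketch counts shared words for three senses of `"star"` by hand, without calling WordNet. The definitions and lemma names in `SENSES` are copied from the listing in Section 3.1; the dictionary itself and the helper `overlap` are assumptions made for illustration only.

```python
sentence = "The astronomer loves the star who plays the lead role"
context = set(sentence.split())

# Definitions and lemma names copied by hand from the WordNet listing above
SENSES = {
    "star.n.01": ("(astronomy) a celestial body of hot gases that radiates "
                  "energy derived from thermonuclear reactions in the interior",
                  ["star"]),
    "star.n.04": ("an actor who plays a principal role",
                  ["star", "principal", "lead"]),
    "headliner.n.01": ("a performer who receives prominent billing",
                       ["headliner", "star"]),
}

def overlap(definition, lemmas, context_words):
    """Return the words shared by a sense (definition + lemmas) and the context."""
    return set(definition.split() + lemmas) & context_words

overlaps = {name: overlap(d, l, context) for name, (d, l) in SENSES.items()}
for name, shared in overlaps.items():
    print(name, len(shared), sorted(shared))

# The sense with the largest overlap wins
best = max(overlaps, key=lambda name: len(overlaps[name]))
print(best)
```

Here `star.n.04` shares five words with the sentence (`lead`, `plays`, `role`, `star`, `who`), while the other two senses share only two each, so the actor sense is selected.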

In [8]:
from nltk.corpus import wordnet as wn

#simplified lesk algorithm
def lesk(sentence, ambiguous_word):
  max_overlaps = 0
  #stays None if no sense shares a word with the sentence
  lesk_sense = None
  #the context is composed of all the single words in the sentence
  context = sentence.split()

  #for all synsets of the ambiguous word
  for sense in wn.synsets(ambiguous_word):
    #split the definition into words
    lesk_dictionary = sense.definition().split()
    #add the lemma names (synonyms) that belong to this sense
    lesk_dictionary += sense.lemma_names()
    #collect the distinct words shared by definition and sentence
    overlaps = set(lesk_dictionary).intersection(context)

    if len(overlaps) > max_overlaps:
      #keep the sense with the highest overlap found so far
      lesk_sense = sense
      max_overlaps = len(overlaps)

  return lesk_sense

sentence1 = "The astronomer loves the star who plays the lead role"
ambiguous_word = 'star'

answer1 = lesk(sentence1, ambiguous_word)
print(answer1)
print(answer1.definition())

Synset('star.n.04')
an actor who plays a principal role


In [12]:
#try another sentence
sentence2 = "An actor likes to be starred in the movie"
ambiguous_word = 'star'

answer2 = lesk(sentence2, ambiguous_word)
print(answer2)
print(answer2.definition())

Synset('star.v.02')
be the star in a performance
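
The simplified `lesk` function compares raw whitespace-separated tokens, so `The` and `the` count as different words, `(astronomy)` never matches `astronomy`, and function words such as `the` contribute to the overlap. One possible refinement, sketched below, is to normalize both the sentence and the definition before intersecting them. The stop-word list and the helper `normalize` are assumptions for illustration; they are not part of the notebook's algorithm or of NLTK.

```python
import string

# Small illustrative stop-word list; an assumption, not taken from NLTK
STOP_WORDS = {"a", "an", "the", "of", "in", "to", "be", "who", "that", "as"}

def normalize(text):
    """Lowercase, strip surrounding punctuation, and drop stop words."""
    tokens = set()
    for raw in text.split():
        token = raw.strip(string.punctuation).lower()
        if token and token not in STOP_WORDS:
            tokens.add(token)
    return tokens

context = normalize("The astronomer loves the star who plays the lead role")
gloss = normalize("(astronomy) a celestial body of hot gases that radiates energy "
                  "derived from thermonuclear reactions in the interior")

print(sorted(context))
print("astronomy" in gloss)
```

Whether such preprocessing improves the chosen sense depends on the sentence, so it is worth comparing both variants. NLTK also ships its own implementation of the algorithm as `nltk.wsd.lesk`.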


**More information on how to use WordNet with NLTK:**

Steven Bird, Ewan Klein, and Edward Loper: [Natural Language Processing with Python – Analyzing Text with the Natural Language Toolkit](https://www.nltk.org/book/), O'Reilly Media, 2009.
> Chap 2: [Accessing Text Corpora and Lexical Resources](https://www.nltk.org/book/ch02.html), Section 5: WordNet.





