# Word sense disambiguation

## Lexical ambiguity

Let's refresh our knowledge of the concept of **ambiguity**.
Ambiguity can be
- semantic (a word can belong to more than one category or carry more than one meaning, e.g. noun or verb?)
- syntactic (a sentence can be assigned more than one phrase structure tree)

![picture](https://www.thoughtco.com/thmb/3DSAgVoW-eEc3f3pwxPnjYT_zWw=/1500x1000/filters:fill(auto,1)/ambiguity-language-1692388_FINAL-dd68c7d1dd374507aa633d27539f0e62.png)
(Image source: https://www.thoughtco.com/ambiguity-language-1692388)

![picture](https://i.pinimg.com/564x/af/77/fe/af77fe576a152e4bcbf9e24762a70886.jpg)
(Image source: http://criticalthinkingexamples.blogspot.com/2011/03/semantic-ambiguity-ii.html)

![picture](http://1.bp.blogspot.com/-nbpn8LI40ko/Tocbqxw4DTI/AAAAAAAAAhY/aYP-9EVfYkE/s400/Ambiguity.jpg)
(Image source: http://criticalthinkingexamples.blogspot.com/2011/10/semantic-ambiguity-iii.html)

![picture](https://jennykellerford.files.wordpress.com/2011/10/students-cook-serve-grandparents.jpg)
(Image source: https://jennykellerford.wordpress.com/2011/10/02/students-cook-and-serve-grandparents-and-other-misplaced-modifiers)

CURIOSITY MINUTE: Have you ever heard of the **Oxford comma**? It may help you avoid ambiguity when you write in English!

![picture](https://mereinkling.files.wordpress.com/2019/09/commas.png)
(Image source: https://mereinkling.net/2019/09/17/peculiarities-of-punctuation/)

In this notebook we will focus only on **semantic ambiguity**.
It can be of two types:

- **polysemy** (different meanings of a word are somewhat related)

(Crane: a) a bird with a long neck b) a type of construction equipment which looks like it has a long neck)

- **homonymy** (different meanings of a word are not related)

(peer: a) person belonging to the same group in age and status b) look searchingly)

In a dictionary, the meanings of a polysemous word are usually listed under a single entry, while homonyms are given separate entries.

**Glosses** - definitions for each sense of a word in a dictionary.

## Relations between word senses

**Synonyms** (words with (nearly) identical senses): couch/sofa

Note: it is a relationship between word senses and NOT between words.
Why?

Example:
- "big plane and small plane": in this context, "big" can be replaced by "large" because these two words both have a sense of "significant size", and in this sense you can use either "big" or "large" to describe a plane.
- "she is my big sister": here "big" is used in the sense "older" and cannot be replaced by "large" because "large" does not have such a sense.

**Antonyms** (words with opposite meaning: binary opposition/opposite ends of a scale) (Long/short, Black/white)

**Hyponyms (subordinate)** (one word is a specific case (subclass) of the other) (car is a hyponym of vehicle, mango is a hyponym of fruit).

**Hypernyms (superordinate)** (one word denotes a class to which the other word belongs) (animal is a hypernym of dog, fruit is a hypernym of mango).

**Meronymy** - part-whole relationship (foot/leg)


**Metonymy** - using one aspect of a concept or entity to refer to other aspects of the entity or to the entity itself ("I love Charles Dickens" (meaning "I love books by Charles Dickens"))


## WordNet

Enough linguistics! Let's move on to NLP.

**WordNet** (https://wordnet.princeton.edu/) is the most commonly used **resource for sense relations** in English and many other languages.
It is similar to a traditional **thesaurus** (dictionary of synonyms) but has a richer structure.
The set of near-synonyms for a WordNet sense is called a **synset** (for "synonym set").

Let's play with it a little. Enter your favorite word (do you have one?) and check out its entries in WordNet. http://wordnetweb.princeton.edu/perl/webwn

WordNet is included in NLTK. Below you will find many code examples to analyse word relations. Feel free to replace example words with your own to check your intuitions and have some fun. 

In [None]:
!pip install nltk
import nltk
nltk.download('wordnet')
from nltk.corpus import wordnet as wn

Some words have many senses:

In [None]:
wn.synsets("bank")

Some words only have one sense:

In [None]:
wn.synsets('siege')

You can restrict search by POS:

In [None]:
wn.synsets('dog', pos='v')


**Lemmas** are words in a synset. 

In [None]:
wn.synset('car.n.01').lemmas() 

We can get a **list of lemmas** for a word sense:

In [None]:
wn.synset('car.n.01').lemma_names()


or for all senses of a word at once:

In [None]:
for synset in wn.synsets('car'):
    print(synset.lemma_names())

NOTE: To explore WordNet in languages other than English, we can use [ISO-639 language codes](http://www.loc.gov/standards/iso639-2/php/code_list.php). 


In [None]:
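nltk.download('omw')      # Open Multilingual WordNet
nltk.download('omw-1.4')  # assumption: newer NLTK releases look for this package name instead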
sorted(wn.langs())

Let's try French (and then some other languages if you want):

In [None]:
wn.synset('dog.n.01').lemma_names('fra')


We can get a sense **definition**:

In [None]:
wn.synset('moose.n.01').definition() 

and **examples**:

In [None]:
wn.synset('deer.n.01').examples()

**CODE IT (1):** define a function that takes a word as input and returns a definition and examples for each of its word senses. Try it on a word with many senses and/or with part-of-speech ambiguity (to make it more interesting).

In [None]:
# your code

### WordNet hierarchy

There is a certain hierarchy of concepts in WordNet. Some concepts are very general, such as Entity, State, Event; these are called **unique beginners** or **root synsets**.

![picture](https://www.researchgate.net/profile/Bastian_Entrup/publication/321671739/figure/tbl3/AS:668725552361484@1536448004654/List-of-25-unique-beginners-in-the-noun-set-of-WordNet.png)
(Source: https://www.researchgate.net/figure/List-of-25-unique-beginners-in-the-noun-set-of-WordNet_tbl3_321671739)

Others, such as gas guzzler and hatchback, are much more specific. A small portion of a concept hierarchy is illustrated below:

![picture](https://www.nltk.org/images/wordnet-hierarchy.png)
(Image (and text above) source: https://www.nltk.org/book/ch02.html)
*Fragment of WordNet Concept Hierarchy: nodes correspond to synsets; edges indicate the hypernym/hyponym relation, i.e. the relation between superordinate and subordinate concepts*.



Let's use WordNet to navigate between concepts:



In [None]:
wn.synsets('car')

In [None]:
# get a synset for the first sense of the word "car"
motorcar = wn.synset('car.n.01') 
# get hyponyms (more narrow terms belonging to the more broad concept of a car)
types_of_motorcar = motorcar.hyponyms()
# what types of cars are there in WordNet?
sorted(lemma.name() for synset in types_of_motorcar for lemma in synset.lemmas())

**CODE IT (2)**: adapt the code above to find hyponyms of a word sense of your choice. Be creative!

In [None]:
# your code

We can also explore the hierarchy by checking out **hypernyms** (superordinates, or more general "umbrella terms").


In [None]:
motorcar.hypernyms()

We can get the **most general hypernyms (or root hypernyms)** of a synset as follows:


In [None]:
motorcar.root_hypernyms()


Hypernyms and hyponyms are called **lexical relations**  -  they relate one synset to another. These two relations navigate up and down the "is-a" hierarchy. 

Another way to navigate WordNet is from items to their components (**meronyms**) or to the things they are contained in (**holonyms**).

A **meronym** denotes a component of a larger whole. This is a broad relationship, so it is split into several kinds, including parts of a whole (`part_meronyms()`) and substances a whole is made of (`substance_meronyms()`).

In [None]:
# what parts does a tree consist of?
wn.synset('tree.n.01').part_meronyms()

In [None]:
# what substance does ice consist of?
wn.synset('ice.n.01').substance_meronyms()

In [None]:
wn.synset('kitchen.n.01').part_holonyms() 


In [None]:
wn.synset('dog.n.01').member_holonyms() 


In [None]:
wn.synset('tree.n.01').member_holonyms()


In [None]:
wn.synset('ice.n.01').substance_holonyms()

An **entailment** is an implication. Looking implies seeing. Buying implies choosing and paying.

Take the following sentence: "She walked to her bed and in a few minutes she was snoring loudly". What exactly is implied by "snoring" here? Is it important that she was snoring, or is it just that she was sleeping soundly? Entailment analysis can help investigate it.


In [None]:
wn.synset('snore.v.01').entailments()

Some other examples of verb entailment relationship:

In [None]:
wn.synset('eat.v.01').entailments()


In [None]:
wn.synset('buy.v.01').entailments()


Some lexical relationships hold between lemmas, e.g., **antonymy**:



In [None]:
wn.lemma('supply.n.02.supply').antonyms()


In [None]:
wn.lemma('horizontal.a.01.horizontal').antonyms()


### Semantic similarity in WordNet

Remember we measured semantic similarity with word embeddings? Those measurements were in the context of a specific corpus. WordNet allows us to measure semantic similarity between words based on their definitions and relations as defined in WordNet itself.
It is useful to know which words are semantically related to other words, for example when indexing a corpus of texts (searching for information about dinosaurs? You may be interested in reptiles or Pterodactylus).

Let's explore another example: **Right whales and minke whales** are species of large baleen whales. The **killer whale or orca** is a whale belonging to the oceanic dolphin family.

In [None]:
right_whale = wn.synset('right_whale.n.01')
orca = wn.synset('orca.n.01')
minke_whale = wn.synset('minke_whale.n.01')
tortoise = wn.synset('tortoise.n.01')
novel = wn.synset('novel.n.01')

**lowest_common_hypernyms()** finds the lowest hypernym shared by two synsets, i.e. the **most specific** concept that both of them fall under.



Reminder: **hypernym**: a word with a broad meaning constituting a category into which words with more specific meanings fall (superordinate).
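
To see what "lowest common hypernym" means without WordNet, here is a minimal sketch on a hand-made hierarchy. The `PARENT` dictionary and both function names are assumptions made for illustration; real WordNet synsets can have several hypernyms, while this toy version gives each concept exactly one parent.

```python
# Toy hypernym hierarchy (child -> parent). Hand-made for illustration,
# NOT the real WordNet structure.
PARENT = {
    "right_whale": "baleen_whale",
    "minke_whale": "baleen_whale",
    "baleen_whale": "whale",
    "orca": "toothed_whale",
    "toothed_whale": "whale",
    "whale": "mammal",
    "mammal": "vertebrate",
    "tortoise": "reptile",
    "reptile": "vertebrate",
    "vertebrate": "entity",
    "novel": "book",
    "book": "artifact",
    "artifact": "entity",
}


def ancestors(node):
    """Return the node followed by all of its hypernyms, most specific first."""
    chain = [node]
    while node in PARENT:
        node = PARENT[node]
        chain.append(node)
    return chain


def lowest_common_hypernym(a, b):
    """Return the most specific concept that both a and b fall under."""
    seen = set(ancestors(a))
    for candidate in ancestors(b):  # walk upwards from b, most specific first
        if candidate in seen:
            return candidate
    return None


print(lowest_common_hypernym("right_whale", "minke_whale"))  # baleen_whale
print(lowest_common_hypernym("right_whale", "orca"))         # whale
print(lowest_common_hypernym("right_whale", "novel"))        # entity
```

The further apart two concepts are, the higher up the hierarchy we have to climb before their hypernym chains meet.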

In [None]:
right_whale.lowest_common_hypernyms(minke_whale)


Indeed, right whale and minke are both baleen whales!

In [None]:
right_whale.lowest_common_hypernyms(orca)


Right whale and orca are both whales (but orca is not a baleen whale, so what they have in common is only being a (generic) whale).

In [None]:
right_whale.lowest_common_hypernyms(tortoise)


Right whale and tortoise are both vertebrates.

In [None]:
right_whale.lowest_common_hypernyms(novel)


...and a right whale and a novel - what do they have in common? Nothing... except that they are both entities (and not 'feelings' or 'shapes' - return to **unique beginners** if you have forgotten everything about them!)

**CODE IT (3):** explore the relationship between three word senses of your choice (choose senses that relate to each other in some interesting way, for example animal types from the same family and from a different one).

In [None]:
#insert your code here

"Whale" is very specific (and "baleen whale" is very very very specific!), and "vertebrate" is more general. And "entity" is very very very general. 
We can quantify this **concept of generality** by defining the depth of each synset:



In [None]:
wn.synset('baleen_whale.n.01').min_depth()


In [None]:
wn.synset('whale.n.02').min_depth()


In [None]:
wn.synset('vertebrate.n.01').min_depth()


In [None]:
wn.synset('entity.n.01').min_depth()

**path_similarity** assigns a score in the range 0-1 based on the shortest path that connects the two concepts in the hypernym hierarchy.
If no path can be found, current NLTK versions return `None` (older documentation mentions -1).
Comparing a synset with itself gives 1.
The numbers are hard to interpret on their own, but they decrease as we move away from the semantic space of sea creatures towards inanimate objects.
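
To make the idea concrete, here is a minimal sketch of a path-based score on a hand-made hierarchy. It uses the formula 1 / (shortest path length + 1), which is how NLTK documents `path_similarity`. The `PARENT` dictionary and the helper names are assumptions for illustration, so the scores will not match the ones WordNet gives.

```python
# Toy hypernym hierarchy (child -> parent). Hand-made for illustration,
# NOT the real WordNet structure.
PARENT = {
    "right_whale": "baleen_whale",
    "minke_whale": "baleen_whale",
    "baleen_whale": "whale",
    "orca": "toothed_whale",
    "toothed_whale": "whale",
    "whale": "mammal",
    "mammal": "vertebrate",
    "tortoise": "reptile",
    "reptile": "vertebrate",
    "vertebrate": "entity",
    "novel": "book",
    "book": "artifact",
    "artifact": "entity",
}


def distances_to_ancestors(node):
    """Map the node and each of its hypernyms to its distance in edges."""
    distances = {node: 0}
    steps = 0
    while node in PARENT:
        node = PARENT[node]
        steps += 1
        distances[node] = steps
    return distances


def toy_path_similarity(a, b):
    """Return 1 / (shortest path length + 1), or None if no path exists."""
    da = distances_to_ancestors(a)
    db = distances_to_ancestors(b)
    shared = set(da) & set(db)
    if not shared:
        return None
    shortest = min(da[n] + db[n] for n in shared)
    return 1 / (shortest + 1)


for other in ["right_whale", "minke_whale", "orca", "tortoise", "novel"]:
    print(other, toy_path_similarity("right_whale", other))
```

Just as with the real WordNet scores, the value is 1 for identical concepts and shrinks as the path between the two concepts gets longer.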




In [None]:
right_whale.path_similarity(minke_whale)


In [None]:
right_whale.path_similarity(orca)


In [None]:
right_whale.path_similarity(tortoise)


In [None]:
right_whale.path_similarity(novel)


**CODE IT (4)**: Use one of the predefined similarity measures to score the similarity of each of the following pairs of words. Rank the pairs in order of decreasing similarity. How close is your ranking to the order given here, an order that was established experimentally by (Miller & Charles, 1998): car-automobile, gem-jewel, journey-voyage, boy-lad, coast-shore, asylum-madhouse, magician-wizard, midday-noon, furnace-stove, food-fruit, bird-cock, bird-crane, tool-implement, brother-monk, lad-brother, crane-implement, journey-car, monk-oracle, cemetery-woodland, food-rooster, coast-hill, forest-graveyard, shore-woodland, monk-slave, coast-forest, lad-wizard, chord-smile, glass-magician, rooster-voyage, noon-string.


In [None]:
# your code

## Word sense disambiguation

Sometimes categorial ambiguity ("can" - is it a verb or a noun?) can be
resolved using the syntactic context in which the word is used ("I can do it" - "can" here is 100% a verb).



**Word sense disambiguation (WSD)** - selecting the correct sense for a word.
WSD algorithms take as input a word in context and a fixed inventory of potential word senses, and output the correct word sense in context.

How does it work in general? 
When we have a small pre-selected set of target words and an inventory of senses for each of them, we speak of a **lexical sample task**. Because both the set of words and
the set of senses are small, we can use simple supervised classification approaches.
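
As a rough sketch of what such a classifier could look like, here is a tiny bag-of-words Naive Bayes model for the single target word "bass". Naive Bayes is just one possible choice, and the training sentences, sense labels and function names are all made up for illustration.

```python
import math
from collections import Counter, defaultdict

# Hand-made training data for the target word "bass" (hypothetical sense labels).
TRAIN = [
    ("he caught a huge bass in the lake", "fish"),
    ("grilled bass with lemon for dinner", "fish"),
    ("she plays bass in a jazz band", "music"),
    ("turn up the bass on the guitar amp", "music"),
]


def train(examples):
    """Count how often each sense and each (sense, context word) pair occurs."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for sentence, sense in examples:
        sense_counts[sense] += 1
        for word in sentence.split():
            word_counts[sense][word] += 1
            vocab.add(word)
    return sense_counts, word_counts, vocab


def classify(sentence, model):
    """Pick the sense with the highest Naive Bayes log-score (add-one smoothing)."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    scores = {}
    for sense, n in sense_counts.items():
        score = math.log(n / total)  # prior probability of the sense
        denom = sum(word_counts[sense].values()) + len(vocab)
        for word in sentence.split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[sense][word] + 1) / denom)
        scores[sense] = score
    return max(scores, key=scores.get)


model = train(TRAIN)
print(classify("we went to the lake to fish for bass", model))  # fish
print(classify("the band needs a bass player", model))          # music
```

With only one target word and two senses, even four training sentences are enough to get sensible answers on these examples; a real lexical sample task would use many more labeled sentences per word.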



However, we often have to disambiguate **all words** in some text. 
In this **all-words task**, the input is an entire text and a lexicon with an inventory of senses for each entry. 
We then need to disambiguate every content word in the text. This is somewhat similar to part-of-speech tagging (though the number of tags is much bigger).


**Supervised all-words disambiguation systems** are usually trained on a **semantic
concordance** - a corpus in which each open-class word in each sentence is labeled
with its word sense from a specific dictionary or thesaurus, most often WordNet.

The **SemCor corpus** is a subset of the Brown Corpus where 
words were manually tagged with WordNet senses. There are other similar corpora too. 

Given each noun, verb, adjective, or adverb word in the hand-labeled test set (say
fruit), the SemCor-based WSD task is to choose the correct sense from the possible
senses in WordNet. For fruit this would mean choosing between the correct answer
fruit1 (the ripened reproductive body of a seed plant), and the other two senses fruit2
(yield; an amount of a product) and fruit3
(the consequence of some effort or action).


![picture](https://www.researchgate.net/profile/Devendra_Chaplot/publication/322328811/figure/fig2/AS:631613809496087@1527599875540/An-example-of-the-all-word-WSD-task-Content-words-and-their-possible-senses-are-labeled.png)
(Image source: https://www.researchgate.net/figure/An-example-of-the-all-word-WSD-task-Content-words-and-their-possible-senses-are-labeled_fig2_322328811)

A surprisingly strong baseline is simply to choose the **most frequent sense** for
each word from the senses in a labeled corpus (for WordNet, this
corresponds to the first sense). A second heuristic, called **one sense per discourse**, is based on the work of Gale, Church, and Yarowsky (1992), who noticed that a word appearing multiple times in a text or
discourse often appears with the same sense.
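
The most-frequent-sense baseline can be sketched in a few lines. The tiny "corpus" of (word, sense) pairs and the sense labels below are made up for illustration; with real data the pairs would come from a sense-tagged corpus such as SemCor.

```python
from collections import Counter, defaultdict

# Tiny hand-made sense-tagged sample: (word, sense label) pairs.
# The labels are hypothetical, not real WordNet sense keys.
TAGGED = [
    ("bank", "bank/finance"), ("bank", "bank/finance"), ("bank", "bank/river"),
    ("fruit", "fruit/plant"), ("fruit", "fruit/plant"), ("fruit", "fruit/result"),
]


def train_most_frequent_sense(pairs):
    """Return a dict mapping each word to the sense it was tagged with most often."""
    counts = defaultdict(Counter)
    for word, sense in pairs:
        counts[word][sense] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


mfs = train_most_frequent_sense(TAGGED)
print(mfs["bank"])   # bank/finance
print(mfs["fruit"])  # fruit/plant
```

Note that this baseline ignores the context completely: every occurrence of "bank" gets the same label, whatever the sentence is about.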

## Lesk Algorithm

Generating sense-labeled corpora like SemCor is difficult and expensive. An
alternative class of WSD algorithms, **knowledge-based algorithms**, relies solely on
WordNet or other similar resources and does not need labeled data.

The **Lesk algorithm** is the oldest and most powerful knowledge-based WSD
method, and is a useful baseline. It chooses the sense whose dictionary gloss or definition shares the most words with the target
word's neighborhood. It assumes that words in a given neighborhood (section of text) tend to share a common topic.
A simplified version of the Lesk algorithm compares the dictionary definition of each sense of an ambiguous word with the terms contained in its neighborhood.
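
Before using NLTK's implementation, here is a minimal sketch of the simplified idea: count the words that the context shares with each gloss and keep the sense with the largest overlap. The glosses, the stopword list and the function names are hand-written assumptions, not WordNet data or NLTK code.

```python
# Hand-written glosses for two senses of "bank" (illustrative, not WordNet entries).
GLOSSES = {
    "bank_finance": "a financial institution that accepts deposits and lends money",
    "bank_river": "sloping land beside a body of water such as a river",
}

# A tiny stopword list so that words like "the" do not count as overlap.
STOPWORDS = {"a", "an", "the", "of", "to", "and", "that", "such", "as", "i", "in", "on"}


def tokenize(text):
    """Lowercase, strip simple punctuation and drop stopwords."""
    return {w.strip(".,!?").lower() for w in text.split()} - STOPWORDS


def simplified_lesk(context_sentence, glosses):
    """Return the sense whose gloss shares the most words with the context."""
    context = tokenize(context_sentence)
    best_sense, best_overlap = None, -1
    for sense, gloss in glosses.items():
        overlap = len(context & tokenize(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


print(simplified_lesk("I went to the bank to deposit money.", GLOSSES))
print(simplified_lesk("We sat on the bank of the river and watched the water.", GLOSSES))
```

The first sentence matches the financial gloss only through "money" ("deposit" and "deposits" are different strings here), which shows how fragile plain word overlap is without stemming or lemmatization.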


In [None]:
from nltk.wsd import lesk

NLTK's `lesk` arguments:
- context_sentence (a list of words)
- ambiguous_word (the word requiring WSD)
- pos - part of speech to restrict the search to (default = None)
- synsets - possible synsets of the ambiguous word (default = None)

The POS tags used here are 'n' = noun, 'v' = verb, 'a' = adjective, 's' = satellite adjective and 'r' = adverb.

It returns a synset for the ambiguous word in its context. Sometimes it will fail...



In [None]:
lesk(['I', 'went', 'to', 'the', 'bank', 'to', 'deposit', 'money', '.'], 'bank', 'n')

In [None]:
wn.synset('savings_bank.n.02').definition()

In [None]:
lesk(['I', 'have', 'put', 'an', 'ice', 'cream', 'cone', 'in', 'the', 'fridge'], 'cone')

In [None]:
wn.synset('cone.n.04').definition()

**CODE IT (5)** Pick a word that has several meanings in a language you know well. Using the implementation of the Lesk algorithm, write example sentences that make it return each of the word's definitions.

For instance:

**terminal**

    1.  a point on an electrical device at which electric current enters or leaves.

    2.  where transport vehicles load or unload passengers or goods.
    
    3.  an input-output device providing access to a computer.


In [None]:
#insert your code here

## WSD with Contextual Embeddings

Do you remember what Word Embeddings are?

In [None]:
!pip install gensim
import gensim.downloader

# Show all available models in gensim-data

print(list(gensim.downloader.info()['models'].keys()))
# Example output:
# ['fasttext-wiki-news-subwords-300',
#  'conceptnet-numberbatch-17-06-300',
#  'word2vec-ruscorpora-300',
#  'word2vec-google-news-300',
#  'glove-wiki-gigaword-50',
#  'glove-wiki-gigaword-100',
#  'glove-wiki-gigaword-200',
#  'glove-wiki-gigaword-300',
#  'glove-twitter-25',
#  'glove-twitter-50',
#  'glove-twitter-100',
#  'glove-twitter-200',
#  '__testing_word2vec-matrix-synopsis']


# Download the "glove-twitter-25" embeddings

model = gensim.downloader.load('glove-twitter-25')


In [None]:
model['mouse'] 

A word embedding is a numerical representation of a word learnt from huge amounts of data.

Each word in a vector space language model is represented by a single vector.

**But** what about **homonyms**? 

Consider the word "mouse": 

What if we had a word embedding for each of the different meanings of **mouse**?
![](http://ai.stanford.edu/blog/assets/img/posts/2020-03-24-contextual/contextual_mouse_transparent_1.png)
![](http://ai.stanford.edu/blog/assets/img/posts/2020-03-24-contextual/contextual_mouse_transparent_2.png)

(Image source: http://ai.stanford.edu/blog/)


This is where **contextualized word embedding models** enter the NLP world.
![Elmo](https://media.giphy.com/media/yr7n0u3qzO9nG/giphy.gif)![BertUrl](https://media.giphy.com/media/umMYB9u0rpJyE/giphy.gif)

[ELMo](https://arxiv.org/pdf/1802.05365.pdf) (Embeddings from Language Models) and [BERT](https://arxiv.org/pdf/1810.04805.pdf) (Bidirectional Encoder Representations from Transformers) are two of the best-known and most widely used contextualized word embedding models.

Training these models requires huge amounts of data and processing resources (including time and hardware).



In layman's terms, contextualized word embeddings map each occurrence of a word to a vector that takes its context into account.
![Elmo-mouse](https://miro.medium.com/max/1000/1*3mxK9XhkOQstPbrRfRT2vQ.png)
(Image source: https://medium.com/@leslie_huang/automatic-extraction-of-word-senses-from-deep-contextualized-word-embeddings-2f09f16e820)
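
One simple way to use such vectors for WSD is nearest-neighbour matching: compare the vector of a new occurrence with one reference vector per sense and pick the closest. The sketch below uses made-up 3-dimensional vectors instead of real ELMo or BERT output, and all names in it are assumptions for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


# Made-up reference vectors, one per sense of "mouse". With a real model these
# could be averages of contextual vectors from sense-labeled example sentences.
SENSE_VECTORS = {
    "mouse (animal)": [0.9, 0.1, 0.0],
    "mouse (device)": [0.1, 0.9, 0.2],
}


def nearest_sense(occurrence_vector, sense_vectors):
    """Return the sense whose reference vector is most similar to the occurrence."""
    return max(sense_vectors, key=lambda s: cosine(occurrence_vector, sense_vectors[s]))


# Hypothetical contextual vector for "mouse" in "click the left mouse button".
occurrence = [0.2, 0.8, 0.1]
print(nearest_sense(occurrence, SENSE_VECTORS))  # mouse (device)
```

A static embedding such as GloVe could not support this approach, because it would give "mouse" the same vector in every sentence.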


**Homework Exercise** Do some quick research on the task of **textual entailment** in NLP and summarize it in your own words (at least 100 words).

What are textual entailment relations (entails, contradicts, non-TE)?

Using only WordNet functions such as `entailments()` or `antonyms()`, can you suggest a simple algorithm that can detect textual entailment relations?



