# CS 195: Natural Language Processing
## WordNet

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ericmanley/f23-CS195NLP/blob/main/F4_1_WordNet.ipynb)


## References

Sample usage for WordNet: https://www.nltk.org/howto/wordnet.html

WordNet documentation: https://www.nltk.org/api/nltk.corpus.reader.wordnet.html

NLTK Book Chapter 2 (see Section 5): https://www.nltk.org/book/ch02.html

Dive into WordNet with NLTK by Norbert Kozlowski: https://medium.com/@don_khozzy/dive-into-wordnet-with-nltk-b313c480e788

Getting started with nltk-wordnet in Python: https://www.section.io/engineering-education/getting-started-with-nltk-wordnet-in-python/

In [None]:
import sys
!{sys.executable} -m pip install nltk

Defaulting to user installation because normal site-packages is not writeable


In [None]:
#you shouldn't need to do this in Colab, but I had to do it on my own machine
#in order to connect to the nltk service
import nltk
import ssl

try:
    _create_unverified_https_context = ssl._create_unverified_context
except AttributeError:
    pass
else:
    ssl._create_default_https_context = _create_unverified_https_context


In [None]:
import nltk
from nltk.corpus import wordnet as wn
nltk.download('wordnet') #only need to do this once

[nltk_data] Downloading package wordnet to
[nltk_data]     /Users/000794593/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


True

## What is WordNet?

**WordNet** is a *lexical database*.

So what does that mean? Let's ask WordNet.

In [None]:
wn.synsets("lexical")

[Synset('lexical.a.01'), Synset('lexical.a.02')]

A `synset` is a synonym set: a group of words that share one meaning. The only word in each of these synsets is *lexical* itself, but WordNet lists two different *senses* of the word.

`lexical.a.01`
* `lexical` is the word
* `a` is the part of speech - in this case, adjective
* `01` is for the first sense of the word - basically like different entries in a dictionary for the same word

<div>
    <img src="https://github.com/ericmanley/f23-CS195NLP/blob/main/images/lexical_definition.png?raw=1" width=500>
</div>
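The naming pattern above can be pulled apart with plain string handling. This is a minimal sketch; `parse_synset_name` is a hypothetical helper written for illustration, not part of NLTK.

```python
def parse_synset_name(name):
    """Split a synset name such as 'lexical.a.01' into its three parts."""
    # Split from the right so a word that itself contains a period stays intact.
    word, pos, sense = name.rsplit(".", 2)
    return word, pos, int(sense)


for name in ["lexical.a.01", "lexical.a.02"]:
    word, pos, sense = parse_synset_name(name)
    print(f"{name}: word={word!r}, part of speech={pos!r}, sense number={sense}")
```
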

We can access each of these synonym sets with `synset` (as opposed to `synsets`) and then call various methods on them.

In [None]:
print( wn.synset('lexical.a.01').definition() )

of or relating to words


In [None]:
print( wn.synset('lexical.a.02').definition() )

of or relating to dictionaries


## Let's try another word

In [None]:
wn.synsets("bank")

[Synset('bank.n.01'),
 Synset('depository_financial_institution.n.01'),
 Synset('bank.n.03'),
 Synset('bank.n.04'),
 Synset('bank.n.05'),
 Synset('bank.n.06'),
 Synset('bank.n.07'),
 Synset('savings_bank.n.02'),
 Synset('bank.n.09'),
 Synset('bank.n.10'),
 Synset('bank.v.01'),
 Synset('bank.v.02'),
 Synset('bank.v.03'),
 Synset('bank.v.04'),
 Synset('bank.v.05'),
 Synset('deposit.v.02'),
 Synset('bank.v.07'),
 Synset('trust.v.01')]

Let's loop through these and print out some information about each.

In [None]:
senses = wn.synsets("bank")
for sense in senses:
    print( "----------------")
    print( sense.name() )
    print( sense.pos() )
    print( sense.definition() )
    print( sense.examples() )

----------------
bank.n.01
n
sloping land (especially the slope beside a body of water)
['they pulled the canoe up on the bank', 'he sat on the bank of the river and watched the currents']
----------------
depository_financial_institution.n.01
n
a financial institution that accepts deposits and channels the money into lending activities
['he cashed a check at the bank', 'that bank holds the mortgage on my home']
----------------
bank.n.03
n
a long ridge or pile
['a huge bank of earth']
----------------
bank.n.04
n
an arrangement of similar objects in a row or in tiers
['he operated a bank of switches']
----------------
bank.n.05
n
a supply or stock held in reserve for future use (especially in emergencies)
[]
----------------
bank.n.06
n
the funds held by a gambling house or the dealer in some gambling games
['he tried to break the bank at Monte Carlo']
----------------
bank.n.07
n
a slope in the turn of a road or track; the outside is higher than the inside in order to reduce the effe

## Group Exercise

Try some additional words. What other part-of-speech labels can you find, and what do they mean?

You may want to look here too: https://www.nltk.org/api/nltk.corpus.reader.wordnet.html

## Lemmas

In linguistics, a **lemma** is the base form of a word.

For example: *run*, *ran*, *running*, and *runs* all have the same lemma, **run**

Sometimes, you want to **lemmatize** a corpus
* change all the words into their base form
* can improve NLP tasks like text classification
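To see why lemmatizing can help, here is a toy sketch that collapses inflected forms before counting words. The lookup table `TOY_LEMMAS` and the `lemmatize` function are hypothetical and cover only a few forms of *run*; a real lemmatizer handles far more words.

```python
from collections import Counter

# Hypothetical lookup table for illustration only.
TOY_LEMMAS = {"ran": "run", "running": "run", "runs": "run"}


def lemmatize(tokens, table=TOY_LEMMAS):
    """Replace each token with its base form; unknown tokens pass through lowercased."""
    return [table.get(token.lower(), token.lower()) for token in tokens]


tokens = "She runs daily and ran yesterday while he was running".split()
before = Counter(token.lower() for token in tokens)
after = Counter(lemmatize(tokens))

print(len(before), "distinct tokens before lemmatizing")
print(len(after), "distinct tokens after lemmatizing")
print("count for 'run':", after["run"])
```

After lemmatizing, the three forms of *run* are counted as one feature, which is the kind of consolidation that can help a text classifier.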

In [None]:
run_senses = wn.synsets("ran")

# Iterate through the synsets and retrieve lemmas
for sense in run_senses:
    print( "----------------")
    print( sense.name() )
    lemmas = sense.lemmas()
    print( lemmas )
    #for lemma in lemmas:
    #    print(lemma.name())  # Print the lemma's name

----------------
run.v.01
[Lemma('run.v.01.run')]
----------------
scat.v.01
[Lemma('scat.v.01.scat'), Lemma('scat.v.01.run'), Lemma('scat.v.01.scarper'), Lemma('scat.v.01.turn_tail'), Lemma('scat.v.01.lam'), Lemma('scat.v.01.run_away'), Lemma('scat.v.01.hightail_it'), Lemma('scat.v.01.bunk'), Lemma('scat.v.01.head_for_the_hills'), Lemma('scat.v.01.take_to_the_woods'), Lemma('scat.v.01.escape'), Lemma('scat.v.01.fly_the_coop'), Lemma('scat.v.01.break_away')]
----------------
run.v.03
[Lemma('run.v.03.run'), Lemma('run.v.03.go'), Lemma('run.v.03.pass'), Lemma('run.v.03.lead'), Lemma('run.v.03.extend')]
----------------
operate.v.01
[Lemma('operate.v.01.operate'), Lemma('operate.v.01.run')]
----------------
run.v.05
[Lemma('run.v.05.run'), Lemma('run.v.05.go')]
----------------
run.v.06
[Lemma('run.v.06.run'), Lemma('run.v.06.flow'), Lemma('run.v.06.feed'), Lemma('run.v.06.course')]
----------------
function.v.01
[Lemma('function.v.01.function'), Lemma('function.v.01.work'), Lemma('funct

## Antonyms

An **antonym** is a word with an opposite meaning.

WordNet attaches antonyms to lemmas rather than to the synsets (word senses) themselves, so you look them up on a `Lemma` object.

In [None]:
wn.synset("good.a.01").lemmas()

[Lemma('good.a.01.good')]

In [None]:
wn.synset("good.a.01").antonyms()

AttributeError: 'Synset' object has no attribute 'antonyms'

In [None]:
wn.lemma('good.a.01.good').antonyms()

[Lemma('bad.a.01.bad')]
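Since antonyms are stored per lemma, one way to get "the antonyms of a synset" is to ask each of its lemmas in turn. This is a sketch; `synset_antonyms` is a hypothetical helper, and it relies only on the `lemmas()` and `antonyms()` methods used in the cells above.

```python
def synset_antonyms(synset):
    """Collect the antonyms of every lemma that belongs to a synset."""
    found = []
    for lemma in synset.lemmas():
        # Antonyms live on Lemma objects, so ask each lemma in turn.
        for antonym in lemma.antonyms():
            if antonym not in found:  # skip antonyms already collected
                found.append(antonym)
    return found
```

Assuming `wn` is imported as in the setup cell, `synset_antonyms(wn.synset("good.a.01"))` should return the same single antonym lemma as the direct lookup, because that synset has only one lemma.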

## Hypernyms and Hyponyms

**Hypernym:** a more general concept

**Hyponym:** a more specific concept

*hyper* - think "higher" like hyperactive is higher activity

*hypo* - think "lower" like when you get hypothermia from being too cold

In [None]:
print( wn.synsets("dog", pos=wn.NOUN) ) # get only the noun synsets

[Synset('dog.n.01'), Synset('frump.n.01'), Synset('dog.n.03'), Synset('cad.n.01'), Synset('frank.n.02'), Synset('pawl.n.01'), Synset('andiron.n.01')]


In [None]:
dog = wn.synset('dog.n.01')
print( dog.definition() )

a member of the genus Canis (probably descended from the common wolf) that has been domesticated by man since prehistoric times; occurs in many breeds


In [None]:
print("Dog hypernyms:", dog.hypernyms())
print("Dog hyponyms:", dog.hyponyms())

Dog hypernyms: [Synset('canine.n.02'), Synset('domestic_animal.n.01')]
Dog hyponyms: [Synset('basenji.n.01'), Synset('corgi.n.01'), Synset('cur.n.01'), Synset('dalmatian.n.02'), Synset('great_pyrenees.n.01'), Synset('griffon.n.02'), Synset('hunting_dog.n.01'), Synset('lapdog.n.01'), Synset('leonberg.n.01'), Synset('mexican_hairless.n.01'), Synset('newfoundland.n.01'), Synset('pooch.n.01'), Synset('poodle.n.01'), Synset('pug.n.01'), Synset('puppy.n.01'), Synset('spitz.n.01'), Synset('toy_dog.n.01'), Synset('working_dog.n.01')]


## Group Exercise

Write a loop to print out all the hypernym levels of a given synset. For example:

dog.n.01

canine.n.02

carnivore.n.01

...

You can just choose the first hypernym in the list of hypernyms.

## Similarity

WordNet provides several different kinds of similarity metrics to help you calculate how similar two word senses are.

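`path_similarity` tells you how close two senses are based on the shortest path between them through hypernym/hyponym relationships
* values near 0 mean the senses are only distantly related
* 1 means they're the same word sense
* `None` means no connecting path was found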
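To make the scoring concrete, here is a toy version on a hand-built taxonomy. It assumes the score is 1 / (1 + length of the shortest hypernym/hyponym path), which is my understanding of how NLTK computes `path_similarity`. The `TOY_HYPERNYM` dictionary and both functions are hypothetical and do not use WordNet data.

```python
# Hypothetical mini-taxonomy for illustration: each concept maps to one hypernym.
TOY_HYPERNYM = {
    "dog": "canine",
    "wolf": "canine",
    "canine": "carnivore",
    "cat": "feline",
    "feline": "carnivore",
    "carnivore": None,
}


def distances_to_ancestors(concept):
    """Map the concept and each of its ancestors to the number of steps up."""
    distances = {}
    steps = 0
    while concept is not None:
        distances[concept] = steps
        concept = TOY_HYPERNYM[concept]
        steps += 1
    return distances


def toy_path_similarity(a, b):
    """Return 1 / (1 + shortest path length), or None when no path exists."""
    up_a = distances_to_ancestors(a)
    up_b = distances_to_ancestors(b)
    shared = set(up_a) & set(up_b)
    if not shared:
        return None
    shortest = min(up_a[c] + up_b[c] for c in shared)
    return 1 / (1 + shortest)


print(toy_path_similarity("dog", "dog"))     # 1.0
print(toy_path_similarity("dog", "canine"))  # 0.5
print(toy_path_similarity("dog", "cat"))     # 0.2
```
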

For example, notice that `dog.n.01` and `wolf.n.01` are both hyponyms of `canine.n.02`

In [None]:
wn.synset("canine.n.02").hyponyms()

[Synset('bitch.n.04'),
 Synset('dog.n.01'),
 Synset('fox.n.01'),
 Synset('hyena.n.01'),
 Synset('jackal.n.01'),
 Synset('wild_dog.n.01'),
 Synset('wolf.n.01')]

Let's calculate some similarities

In [None]:
dog = wn.synset('dog.n.01')
wolf = wn.synset('wolf.n.01')
canine = wn.synset('canine.n.02')
parrot = wn.synset('parrot.n.01')
cheese = wn.synset('cheese.n.01')
fly_n = wn.synset('fly.n.01')
fly_v = wn.synset('fly.v.01')

print("dog-canine:", dog.path_similarity(canine))
print("dog-wolf:", dog.path_similarity(wolf))
print("dog-dog:", dog.path_similarity(dog))
print("dog-parrot:", dog.path_similarity(parrot))
print("dog-cheese:", dog.path_similarity(cheese))
print("dog-fly.n:", dog.path_similarity(fly_n))
print("dog-fly.v:", dog.path_similarity(fly_v))

## Group Exercise

Write a program that takes two words as input and displays the word definitions corresponding to the closest similarity among all of those words' senses.

## Meronyms/Holonyms and Entailment

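**Holonym:** the whole that something is a member or part of (a pack is a holonym of *dog*)

**Meronym:** a member or part of something larger (a corncob is a meronym of *corn*)

**Entailment:** an action that another action necessarily involves (eating entails chewing and swallowing)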

In [None]:
print( wn.synset('dog.n.01').member_holonyms() )
print( wn.synset('pack.n.06').member_meronyms() )

print( wn.synset("corncob.n.01").part_holonyms() )
print( wn.synset("corn.n.01").part_meronyms() )

print( wn.synset("eat.v.01").entailments() )

[Synset('canis.n.01'), Synset('pack.n.06')]
[Synset('dog.n.01'), Synset('hound.n.01')]
[Synset('corn.n.01')]
[Synset('corn.n.03'), Synset('corncob.n.01'), Synset('cornstalk.n.01'), Synset('ear.n.05')]
[Synset('chew.v.01'), Synset('swallow.v.01')]


## Applied Exploration

To get Applied Exploration credit for this workshop, complete all of the group exercises with good programming practices (include comments, well-written functions, etc.). Test your code out on several different examples and include written descriptions of the results.

## Small Project Prototype Idea

Use WordNet to make a word game that takes advantage of the relationships present in the database. For example, generate possible word sets for Connections.

<div>
    <table><tr>
        <td><img src="https://github.com/ericmanley/f23-CS195NLP/blob/main/images/connections1.png?raw=1" width=500></td>
        <td><img src="https://github.com/ericmanley/f23-CS195NLP/blob/main/images/connections2.png?raw=1" width=500></td>
    </tr></table>
</div>