WordNet is a lexical database that groups English nouns, verbs, adjectives, and adverbs into sets of synonyms (synsets) and arranges them in a hierarchy. Each synset comes with a short definition, the words that express it (its lemmas), and often example sentences. Together, these make WordNet useful for classifying words and for helping computers understand human language.

In [None]:
import nltk
nltk.download("all")
from nltk.corpus import wordnet as wn

In [3]:
# Getting all synsets of "wolf"

wn.synsets("wolf")

[Synset('wolf.n.01'),
 Synset('wolf.n.02'),
 Synset('wolf.n.03'),
 Synset('wolf.n.04'),
 Synset('beast.n.02'),
 Synset('wolf.v.01')]

In [4]:
# Obtaining the first definition for "wolf" 

wn.synset("wolf.n.01").definition()

'any of various predatory carnivorous canine mammals of North America and Eurasia that usually hunt in packs'

In [5]:
# An example sentence using the verb sense of "wolf"

wn.synset("wolf.v.01").examples()

['The teenager wolfed down the pizza']

In [6]:
# Extracting the lemmas of the verb sense of "wolf"

wn.synset("wolf.v.01").lemmas()

[Lemma('wolf.v.01.wolf'), Lemma('wolf.v.01.wolf_down')]

In [7]:
# Moving up the noun hierarchy for the word "wolf"

wolf = wn.synset("wolf.n.01")
hyp = wolf.hypernyms()[0]
top = wn.synset("entity.n.01")

while True:
  print(hyp)
  # Stop at the root, or at any synset with no further hypernyms (avoids an endless loop)
  if hyp == top or not hyp.hypernyms():
    break
  hyp = hyp.hypernyms()[0]

Synset('canine.n.02')
Synset('carnivore.n.01')
Synset('placental.n.01')
Synset('mammal.n.01')
Synset('vertebrate.n.01')
Synset('chordate.n.01')
Synset('animal.n.01')
Synset('organism.n.01')
Synset('living_thing.n.01')
Synset('whole.n.02')
Synset('object.n.01')
Synset('physical_entity.n.01')
Synset('entity.n.01')


WordNet organizes its nouns in a hypernym hierarchy. For "wolf", the chain starts at a specific concept (canine.n.02) and becomes more general at each step until it reaches entity.n.01, the single root of the noun hierarchy.
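The loop above relies on the noun chain eventually reaching entity.n.01. A slightly more defensive version simply stops at whichever synset has no hypernym and never revisits a node. The sketch below runs on a hard-coded copy of the chain printed above, so it needs no WordNet data; `FIRST_HYPERNYM` and `hypernym_path` are hypothetical names used only for illustration.

```python
# Hypernym chain copied from the notebook output above (first hypernym at each step).
CHAIN = [
    "wolf.n.01", "canine.n.02", "carnivore.n.01", "placental.n.01",
    "mammal.n.01", "vertebrate.n.01", "chordate.n.01", "animal.n.01",
    "organism.n.01", "living_thing.n.01", "whole.n.02", "object.n.01",
    "physical_entity.n.01", "entity.n.01",
]

# Map each synset name to its first hypernym; the root has no entry.
FIRST_HYPERNYM = dict(zip(CHAIN, CHAIN[1:]))


def hypernym_path(start, first_hypernym):
    """Follow first hypernyms upward until reaching a synset with no hypernym."""
    path = []
    seen = set()
    current = first_hypernym.get(start)
    while current is not None and current not in seen:
        path.append(current)
        seen.add(current)
        current = first_hypernym.get(current)
    return path


if __name__ == "__main__":
    for name in hypernym_path("wolf.n.01", FIRST_HYPERNYM):
        print(name)
```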

In [8]:
# Printing the Hypernyms, Hyponyms, Meronyms, Holonyms, and Antonyms for "wolf"

print("Hypernyms: ", wolf.hypernyms())
print("Hyponyms: ", wolf.hyponyms())
print("Meronyms: ", wolf.member_meronyms())
print("Holonyms: ", wolf.member_holonyms())
print("Antonyms: ", wolf.lemmas()[0].antonyms())

Hypernyms:  [Synset('canine.n.02')]
Hyponyms:  [Synset('coyote.n.01'), Synset('red_wolf.n.01'), Synset('timber_wolf.n.01'), Synset('white_wolf.n.01'), Synset('wolf_pup.n.01')]
Meronyms:  []
Holonyms:  [Synset('canis.n.01')]
Antonyms:  []


Switching over to the verb form of "wolf" ("wolfs", "wolfed"), which has a separate meaning: to eat hastily.

In [9]:
# Finding the synsets for "wolf", restricted to verbs

wn.synsets("wolf", pos=wn.VERB)

[Synset('wolf.v.01')]

In [10]:
# Obtaining the verb definition for "wolf" 

wn.synset("wolf.v.01").definition()

'eat hastily'

In [11]:
# An example of "wolf" in a sentence. 

wn.synset("wolf.v.01").examples()

['The teenager wolfed down the pizza']

In [12]:
# Extracting lemmas

wn.synset("wolf.v.01").lemmas()

[Lemma('wolf.v.01.wolf'), Lemma('wolf.v.01.wolf_down')]

In [14]:
# Attempting to move up the hierarchy for the verb form of "wolf"

wolfv = wn.synset("wolf.v.01")
hyp = wolfv.hypernyms()[0]

# Stop once a synset has no hypernyms instead of printing it repeatedly
while True:
  print(hyp)
  if not hyp.hypernyms():
    break
  hyp = hyp.hypernyms()[0]

Synset('eat.v.01')
Synset('consume.v.02')


Verbs are structured like nouns, with definitions, examples, and lemmas. The main difference is that verbs have no single top level in WordNet: the chain for "wolf" stops at consume.v.02, which has no hypernyms, whereas every noun chain ends at entity.n.01.
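One way to see the difference is to look for the synsets that have no hypernyms, i.e. the roots. WordNet's nouns all share the root entity.n.01, while its verbs are split across many separate roots. The sketch below builds a tiny graph from only the two chains printed in this notebook, so it cannot show every verb root; `build_graph`, `roots`, and `root_of` are hypothetical helpers, not NLTK functions.

```python
# Chains copied from the notebook output above (first hypernym at each step).
NOUN_CHAIN = [
    "wolf.n.01", "canine.n.02", "carnivore.n.01", "placental.n.01",
    "mammal.n.01", "vertebrate.n.01", "chordate.n.01", "animal.n.01",
    "organism.n.01", "living_thing.n.01", "whole.n.02", "object.n.01",
    "physical_entity.n.01", "entity.n.01",
]
VERB_CHAIN = ["wolf.v.01", "eat.v.01", "consume.v.02"]


def build_graph(*chains):
    """Map each synset name to the set of its hypernyms."""
    graph = {}
    for chain in chains:
        for child, parent in zip(chain, chain[1:]):
            graph.setdefault(child, set()).add(parent)
        graph.setdefault(chain[-1], set())  # the last element has no hypernym
    return graph


def roots(graph):
    """Synsets with no hypernyms, i.e. the top of each hierarchy."""
    return {node for node, parents in graph.items() if not parents}


def root_of(node, graph):
    """Climb until reaching a synset with no hypernyms (assumes the graph has no cycles)."""
    while graph.get(node):
        node = min(graph[node])  # deterministic pick if a synset has several hypernyms
    return node


GRAPH = build_graph(NOUN_CHAIN, VERB_CHAIN)

if __name__ == "__main__":
    print(sorted(roots(GRAPH)))
    print(root_of("wolf.n.01", GRAPH), root_of("wolf.v.01", GRAPH))
```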

In [15]:
# Using morphy to find the base form of "sillier" (as an adjective) and "silliest"

wn.morphy("sillier", wn.ADJ)

'silly'

In [16]:
wn.morphy("silliest")

'silly'
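`wn.morphy` returns a base form that exists in WordNet. Roughly, it first checks an exception list of irregular forms and otherwise strips known suffixes, keeping only candidates found in the lexicon. The sketch below is a simplified version of that idea for adjectives. The suffix rules (-er and -est, with or without a trailing e) follow WordNet's documented adjective rules. The lexicon, the exception list, and the name `simple_morphy` are tiny hypothetical stand-ins.

```python
# Hypothetical miniature data; the real exception lists and lexicon ship with WordNet.
ADJ_EXCEPTIONS = {"sillier": "silly", "silliest": "silly", "better": "good"}
ADJ_SUFFIX_RULES = [("er", ""), ("est", ""), ("er", "e"), ("est", "e")]
ADJ_LEXICON = {"silly", "good", "tall", "large"}


def simple_morphy(word, exceptions=ADJ_EXCEPTIONS, rules=ADJ_SUFFIX_RULES, lexicon=ADJ_LEXICON):
    """Return a base form found in the lexicon, or None if no rule produces one."""
    # 1. Irregular forms come straight from the exception list.
    if word in exceptions:
        return exceptions[word]
    # 2. A word that is already in the lexicon is its own base form.
    if word in lexicon:
        return word
    # 3. Otherwise try each suffix rule and keep the first candidate in the lexicon.
    for suffix, replacement in rules:
        if word.endswith(suffix):
            candidate = word[: -len(suffix)] + replacement
            if candidate in lexicon:
                return candidate
    return None


if __name__ == "__main__":
    for w in ["sillier", "silliest", "taller", "larger", "blue"]:
        print(w, "->", simple_morphy(w))
```

In this sketch, "sillier" can only be resolved through the exception list, because stripping "-er" leaves "silli", which is not in the lexicon.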

In [17]:
# I will be selecting the words wolf and fox.

wolf = wn.synset("wolf.n.01")
fox = wn.synset("fox.n.01")

print(wolf.definition())
print(fox.definition())

wolf.path_similarity(fox)

any of various predatory carnivorous canine mammals of North America and Eurasia that usually hunt in packs
alert carnivorous mammal with pointed muzzle and ears and a bushy tail; most are predators that do not hunt in packs


0.3333333333333333

In [18]:
# Wu-Palmer similarity metric

wn.wup_similarity(wolf, fox)

0.9285714285714286
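Both scores above follow from simple formulas. Path similarity is 1 / (shortest path length + 1). Wu-Palmer is 2 · depth(LCS) / (depth(s1) + depth(s2)), where LCS is the lowest common subsumer. The sketch below uses the hypernym path printed earlier and counts the root as depth 1. It assumes fox.n.01 sits directly under canine.n.02, which is consistent with the path similarity of 1/3 shown above. It also treats the hierarchy as a tree, whereas NLTK considers every hypernym path. The helper names are hypothetical.

```python
# Root-to-node path copied from the notebook output above (entity.n.01 ... canine.n.02).
SHARED_PATH = [
    "entity.n.01", "physical_entity.n.01", "object.n.01", "whole.n.02",
    "living_thing.n.01", "organism.n.01", "animal.n.01", "chordate.n.01",
    "vertebrate.n.01", "mammal.n.01", "placental.n.01", "carnivore.n.01",
    "canine.n.02",
]
# Assumption: fox.n.01 also sits directly under canine.n.02.
PATHS = {
    "wolf.n.01": SHARED_PATH + ["wolf.n.01"],
    "fox.n.01": SHARED_PATH + ["fox.n.01"],
}


def lowest_common_subsumer(path_a, path_b):
    """Deepest synset shared by two root-to-node paths, plus its depth (root = 1)."""
    depth = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        depth += 1
    if depth == 0:
        raise ValueError("paths share no common ancestor")
    return path_a[depth - 1], depth


def path_similarity(s1, s2, paths=PATHS):
    """1 / (number of edges between s1 and s2 + 1)."""
    pa, pb = paths[s1], paths[s2]
    _, lcs_depth = lowest_common_subsumer(pa, pb)
    distance = (len(pa) - lcs_depth) + (len(pb) - lcs_depth)
    return 1 / (distance + 1)


def wup_similarity(s1, s2, paths=PATHS):
    """2 * depth(LCS) / (depth(s1) + depth(s2)), with the root at depth 1."""
    pa, pb = paths[s1], paths[s2]
    _, lcs_depth = lowest_common_subsumer(pa, pb)
    return 2 * lcs_depth / (len(pa) + len(pb))


if __name__ == "__main__":
    print(path_similarity("wolf.n.01", "fox.n.01"))
    print(wup_similarity("wolf.n.01", "fox.n.01"))
```

With canine.n.02 at depth 13 and both animals at depth 14, Wu-Palmer gives 26 / 28 ≈ 0.9286, which matches the value printed above.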

In [19]:
# Lesk algorithm

from nltk.wsd import lesk
print(wn.synset("fox.n.03").definition())

sent = ["I", "saw", "a", "fox", "and", "a", "wolf", "today", "in", "the", "wilderness", "."]
print(lesk(sent, "fox", "n"))
print(lesk(sent, "fox"))

the grey or reddish-brown fur of a fox
Synset('fox.n.03')
Synset('fox.n.03')


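The Wu-Palmer similarity of about 0.93 matches intuition: wolf and fox are both directly under canine.n.02, so they share almost their entire hypernym path. The lower path similarity (0.33) only reflects the two-step distance between them. The Lesk algorithm had more trouble. For a sentence about seeing a fox and a wolf in the wilderness, it chose fox.n.03 (the fur of a fox) rather than the animal sense fox.n.01, both with and without the noun POS hint.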
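Lesk picks the sense whose dictionary definition (gloss) shares the most words with the context sentence. The sketch below is a simplified version of that overlap count. It is limited to the two fox glosses printed in this notebook, whereas the real `nltk.wsd.lesk` compares the context against every synset of the word. `shared_words` and `lesk_overlap_scores` are hypothetical names.

```python
def shared_words(context_tokens, gloss):
    """Words that appear both in the context and in the whitespace-split gloss."""
    return set(context_tokens) & set(gloss.split())


def lesk_overlap_scores(context_tokens, glosses):
    """Simplified Lesk: score each sense by the size of its gloss/context overlap."""
    return {sense: len(shared_words(context_tokens, gloss)) for sense, gloss in glosses.items()}


# Sentence and glosses copied from the cells above.
SENT = ["I", "saw", "a", "fox", "and", "a", "wolf", "today", "in", "the", "wilderness", "."]
FOX_GLOSSES = {
    "fox.n.01": "alert carnivorous mammal with pointed muzzle and ears and a bushy tail; "
                "most are predators that do not hunt in packs",
    "fox.n.03": "the grey or reddish-brown fur of a fox",
}

if __name__ == "__main__":
    for sense, gloss in FOX_GLOSSES.items():
        print(sense, sorted(shared_words(SENT, gloss)))
    print(lesk_overlap_scores(SENT, FOX_GLOSSES))
```

Both senses tie at three shared words. Every shared word is either a function word ("a", "and", "in", "the") or the target word "fox" itself, so nothing in the overlap points toward the animal sense. This is one plausible reason Lesk struggled here. Removing stop words and the target word before counting is a common refinement.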

SentiWordNet is built on top of WordNet and is aimed at opinion mining. It assigns each synset three scores (positivity, negativity, and objectivity) that sum to 1, so a synset's objectivity is 1 - (positivity + negativity).
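The printed SentiWordNet entries show only the positive and negative scores. The objectivity score follows from them because the three scores sum to 1. NLTK's `SentiSynset` exposes it directly as `obj_score()`. A minimal sketch of the arithmetic, using the scores the next cell prints for agony.n.01 (PosScore=0.0, NegScore=0.625); `obj_score` here is a hypothetical standalone function.

```python
def obj_score(pos_score, neg_score):
    """Objectivity implied by SentiWordNet's rule that pos + neg + obj = 1."""
    if not (0.0 <= pos_score <= 1.0 and 0.0 <= neg_score <= 1.0):
        raise ValueError("scores must lie in [0, 1]")
    if pos_score + neg_score > 1.0:
        raise ValueError("positive and negative scores cannot sum to more than 1")
    return 1.0 - (pos_score + neg_score)


if __name__ == "__main__":
    # agony.n.01 as printed by SentiWordNet
    print(obj_score(0.0, 0.625))  # 0.375
```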

In [20]:
# Finding the polarity score for agony and its synsets

from nltk.corpus import sentiwordnet as swn

agony = swn.senti_synset("agony.n.01")
print(agony)
print()

senti_list = list(swn.senti_synsets("agony"))
for item in senti_list:
  print(item)

<agony.n.01: PosScore=0.0 NegScore=0.625>

<agony.n.01: PosScore=0.0 NegScore=0.625>
<agony.n.02: PosScore=0.0 NegScore=0.375>


In [21]:
# Attempting to find the polarity score for each word in the sentence.

sent = "I was in agony the entire time"
tokens = sent.split()

for token in tokens:
  print(token)
  syn_list = list(swn.senti_synsets(token))

  if syn_list:
    syn = syn_list[0]
    print(syn)

I
<iodine.n.01: PosScore=0.0 NegScore=0.0>
was
<washington.n.02: PosScore=0.0 NegScore=0.0>
in
<inch.n.01: PosScore=0.0 NegScore=0.0>
agony
<agony.n.01: PosScore=0.0 NegScore=0.625>
the
entire
<stallion.n.01: PosScore=0.0 NegScore=0.0>
time
<time.n.01: PosScore=0.0 NegScore=0.0>


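SentiWordNet did not handle every word in the sentence well. The loop takes the first synset for each token without regard to part of speech or context. As a result, "I" was read as iodine, "was" as Washington, "in" as inch, and "entire" as stallion, and "the" had no synset at all. Only "agony" received a non-zero score (negative, 0.625), which does match the tone of the sentence. With part-of-speech tagging and word sense disambiguation in front of it, SentiWordNet can be useful for detecting sentiment and helping a computer understand the emotional content of a conversation.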
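One simple way to turn these per-word results into a single sentence score is to add up (positive - negative) over the tokens that have a score. The sketch below assumes that summing scheme; averaging and other aggregations are equally possible. It hard-codes the first-sense scores printed above, so it needs no NLTK data. `FIRST_SENSE_SCORES` and `sentence_polarity` are hypothetical names.

```python
# First-sense (pos, neg) scores copied from the output above; "the" had no synset.
FIRST_SENSE_SCORES = {
    "I": (0.0, 0.0),        # iodine.n.01
    "was": (0.0, 0.0),      # washington.n.02
    "in": (0.0, 0.0),       # inch.n.01
    "agony": (0.0, 0.625),  # agony.n.01
    "entire": (0.0, 0.0),   # stallion.n.01
    "time": (0.0, 0.0),     # time.n.01
}


def sentence_polarity(tokens, scores):
    """Sum (pos - neg) over tokens that have a score; tokens without one are skipped."""
    total = 0.0
    for token in tokens:
        if token in scores:
            pos, neg = scores[token]
            total += pos - neg
    return total


if __name__ == "__main__":
    tokens = "I was in agony the entire time".split()
    print(sentence_polarity(tokens, FIRST_SENSE_SCORES))  # -0.625
```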

Collocations are sequences of words that occur together more often than chance would predict and that carry a meaning beyond that of their individual words, such as "American people". Recognizing them helps a computer interpret words in their proper context.
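Finding candidate collocations starts with counting adjacent word pairs (bigrams). Below is a minimal sketch using only the standard library. The token list is made-up example text, not taken from text4, and `bigram_counts` is a hypothetical helper.

```python
from collections import Counter


def bigram_counts(tokens):
    """Count each pair of adjacent tokens."""
    return Counter(zip(tokens, tokens[1:]))


# Made-up example text for illustration only.
TOKENS = ["fellow", "citizens", "the", "American", "people", "trust",
          "the", "American", "people"]

if __name__ == "__main__":
    for pair, count in bigram_counts(TOKENS).most_common(3):
        print(pair, count)
```

Raw frequency ranks a pair like ("the", "American") as high as ("American", "people"). This is why association measures such as PMI are used to score collocation candidates.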

In [None]:
import nltk
from nltk.book import text4

text4.collocations()

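In [23]:
# Finding the pointwise mutual information (PMI) for the collocation "American people"

import math

text = ' '.join(text4.tokens)
vocab = len(set(text4))

# Raw counts; str.count matches substrings, so "American" also counts "Americans"
ap = text.count("American people")
print("count(American people) = ", ap)
a = text.count("American")
print("count(American) = ", a)
p = text.count("people")
print("count(people) = ", p)

# PMI compares probabilities, so normalize each count by vocab before taking the ratio
pmi = math.log2((ap / vocab) / ((a / vocab) * (p / vocab)))
print("pmi = ", pmi)


count(American people) =  40
count(American) =  258
count(people) =  628


Dividing the raw counts directly, without normalizing by vocab, gives log2(40 / (258 * 628)) ≈ -11.98. That value is missing a log2(vocab) term. With the counts normalized, PMI = log2(40 * vocab / (258 * 628)) = log2(vocab) - 11.98. Since text4 has several thousand distinct word types (anything above roughly 4,050 offsets the -11.98), the PMI for "American people" is positive. A positive PMI means the two words occur together more often than their individual frequencies would predict, which supports "American people" being a genuine collocation.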
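The effect of normalizing is easier to see with a small helper that takes the total count n as a parameter. Here `pmi` is a hypothetical helper, not an NLTK function. Because PMI = log2(c(x,y) * n / (c(x) * c(y))), passing n = 1 reproduces the raw-count calculation of about -11.98, and any real corpus size adds log2(n) to it.

```python
import math


def pmi(pair_count, count_x, count_y, n):
    """Pointwise mutual information log2(P(x, y) / (P(x) * P(y))), each P estimated as count / n."""
    p_xy = pair_count / n
    p_x = count_x / n
    p_y = count_y / n
    return math.log2(p_xy / (p_x * p_y))


if __name__ == "__main__":
    # Counts for "American people", "American", and "people" from the cell above.
    print(round(pmi(40, 258, 628, 1), 2))       # raw counts, no normalization: -11.98
    print(round(pmi(40, 258, 628, 10_000), 2))  # hypothetical n = 10,000: positive
```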