Meghana Kambhampati

MXK190048

CS4395

Portfolio 3: WordNet

This program demonstrates and explains some of the features of WordNet and SentiWordNet. It details some of the tools and algorithms within WordNet as well as applications for them. 


WordNet is a lexical database of English used in natural language processing. It groups words with the same meaning into sets of synonyms called synsets, gives each synset a gloss (a short definition) and usage examples, and organizes the synsets of each part of speech into hierarchies.

In [None]:
import nltk
import math
nltk.download('wordnet')
nltk.download('omw-1.4')
nltk.download('sentiwordnet')
nltk.download('nps_chat')
nltk.download('webtext')
nltk.download('treebank')
nltk.download('stopwords')
nltk.download('book')
from nltk.book import * 
from nltk.corpus import sentiwordnet as swn
from nltk.wsd import lesk
from nltk.corpus import wordnet as wn


In [None]:
wn.synsets('plant')

[Synset('plant.n.01'),
 Synset('plant.n.02'),
 Synset('plant.n.03'),
 Synset('plant.n.04'),
 Synset('plant.v.01'),
 Synset('implant.v.01'),
 Synset('establish.v.02'),
 Synset('plant.v.04'),
 Synset('plant.v.05'),
 Synset('plant.v.06')]

In [None]:
print('definition:', wn.synset('plant.n.04').definition())
print('usage:', wn.synset('plant.n.04').examples())
print('lemmas:', wn.synset('plant.n.04').lemmas())

# closure
print('\nhierarchy:')
plant = wn.synset('plant.n.04')
hyper = lambda s: s.hypernyms()
list(plant.closure(hyper))

definition: something planted secretly for discovery by another
usage: ['the police used a plant to trick the thieves', 'he claimed that the evidence against him was a plant']
lemmas: [Lemma('plant.n.04.plant')]

hierarchy:


[Synset('contrivance.n.03'),
 Synset('scheme.n.01'),
 Synset('plan_of_action.n.01'),
 Synset('plan.n.01'),
 Synset('idea.n.01'),
 Synset('content.n.05'),
 Synset('cognition.n.01'),
 Synset('psychological_feature.n.01'),
 Synset('abstraction.n.06'),
 Synset('entity.n.01')]

In WordNet, nouns are organized in a tree-like hierarchy. General concepts sit at the top, and synsets become more specific as you go down; every noun hierarchy in NLTK's WordNet ends at entity.n.01, as the closure above shows. Nouns are linked by relations such as hypernyms (more general categories of the word), hyponyms (more specific types of the word), meronyms (parts of the thing the word names), and holonyms (wholes that the thing is a part of). Antonyms (words with the opposite meaning) are linked between lemmas rather than whole synsets. Together, these relationships make up the larger WordNet hierarchy.
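To make the closure() call above more concrete, here is a minimal plain-Python sketch of a breadth-first transitive closure, similar in spirit to NLTK's Synset.closure. It assumes a hypothetical dictionary, HYPERNYMS, built from the plant.n.04 hierarchy printed above and treated as a single linear chain for this sketch; with real data you would call closure() on an NLTK synset instead.

```python
from collections import deque


def closure(start, relation):
    """Return every node reachable from `start` by repeatedly following `relation`.

    Breadth-first; `start` itself is excluded and each node appears once,
    so a cycle in the relation cannot cause an infinite loop.
    """
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in relation(node):
            if neighbour not in seen:
                seen.add(neighbour)
                order.append(neighbour)
                queue.append(neighbour)
    return order


# Hypothetical toy data: the hypernym chain printed for plant.n.04,
# assumed to be one linear chain for this sketch.
CHAIN = ['plant.n.04', 'contrivance.n.03', 'scheme.n.01', 'plan_of_action.n.01',
         'plan.n.01', 'idea.n.01', 'content.n.05', 'cognition.n.01',
         'psychological_feature.n.01', 'abstraction.n.06', 'entity.n.01']
HYPERNYMS = {child: [parent] for child, parent in zip(CHAIN, CHAIN[1:])}


def hyper(name):
    """Look up the toy hypernyms of a synset name; the root has none."""
    return HYPERNYMS.get(name, [])


print(closure('plant.n.04', hyper))
```

Because the root entity.n.01 has no entry in HYPERNYMS, the traversal stops there, which mirrors where the real hierarchy above ends.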

In [None]:
print('hypernyms:', plant.hypernyms())
print('hyponyms:', plant.hyponyms())
print('meronyms:', plant.part_meronyms(), plant.substance_meronyms())
print('holonyms:', plant.part_holonyms(), plant.substance_holonyms())
# antonymy links lemmas, not synsets, so check the lemma of plant.n.04 itself
plant_lemma = plant.lemmas()[0]
print('antonyms:', plant_lemma.antonyms())

hypernyms: [Synset('contrivance.n.03')]
hyponyms: []
meronyms: [] []
holonyms: [] []
antonyms: []


In [None]:
wn.synsets('bake')

[Synset('bake.v.01'),
 Synset('bake.v.02'),
 Synset('broil.v.02'),
 Synset('bake.v.04')]

In [None]:
print('definition:', wn.synset('bake.v.01').definition())
print('usage:', wn.synset('bake.v.01').examples())
print('lemmas:', wn.synset('bake.v.01').lemmas())

# closure
print('\nhierarchy:')
bake = wn.synset('bake.v.01')
hyper = lambda s: s.hypernyms()
list(bake.closure(hyper))

definition: cook and make edible by putting in a hot oven
usage: ['bake the potatoes']
lemmas: [Lemma('bake.v.01.bake')]

hierarchy:


[Synset('cook.v.03'), Synset('change_integrity.v.01'), Synset('change.v.02')]

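Verbs are organized into hierarchies in WordNet as well. More specific ways of performing an action (troponyms) sit near the bottom, and more general events sit near the top. Unlike nouns, verbs do not share a single root: the hierarchy for bake.v.01 above stops at change.v.02.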

In [None]:
print(wn.morphy('baking', wn.VERB))
print(wn.morphy('baked'))
print(wn.morphy('bakes'))

bake
bake
bake
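wn.morphy maps inflected forms such as 'baking' back to a base form that WordNet knows. The sketch below imitates the basic idea with suffix-substitution rules for verbs, modeled on the detachment rules WordNet uses. It assumes a hypothetical three-word lexicon, TOY_VERBS, and it ignores the exception lists for irregular forms (such as 'ran') that the real morphy also consults.

```python
# Suffix-substitution ("detachment") rules for verbs: (suffix to strip, replacement)
VERB_RULES = [('s', ''), ('ies', 'y'), ('es', 'e'), ('es', ''),
              ('ed', 'e'), ('ed', ''), ('ing', 'e'), ('ing', '')]


def toy_morphy(form, lexicon, rules=VERB_RULES):
    """Return `form` if it is already in `lexicon`, otherwise the first base
    form produced by a suffix rule that is in `lexicon`, or None."""
    if form in lexicon:
        return form
    for old, new in rules:
        if form.endswith(old):
            candidate = form[:len(form) - len(old)] + new
            if candidate in lexicon:
                return candidate
    return None


TOY_VERBS = {'bake', 'cook', 'try'}  # hypothetical mini-lexicon

for word in ['baking', 'baked', 'bakes', 'cooked', 'tries']:
    print(word, '->', toy_morphy(word, TOY_VERBS))
```

Checking each candidate against the lexicon is what stops a rule like ('ed', '') from turning 'baked' into the non-word 'bak'.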


In [None]:
print(wn.synsets('bake'))
print(wn.synsets('cook'))

bake = wn.synset('bake.v.01')
cook = wn.synset('cook.v.01')

# Wu-Palmer
print('\nWu-Palmer similarity metric:')
print(wn.wup_similarity(bake, cook))

# Lesk
print('\nLesk algorithm:')
sent = ['The', 'cook', 'bakes', 'cookies', '.']
print(sent)
print(lesk(sent, 'cook'))
print('fudge.v.01:', wn.synset('fudge.v.01').definition())
print(lesk(sent, 'cook', 'n'))
print('cook.n.02:', wn.synset('cook.n.02').definition())
print(lesk(sent, 'bakes'))
print('bake.v.01:', wn.synset('bake.v.01').definition())

[Synset('bake.v.01'), Synset('bake.v.02'), Synset('broil.v.02'), Synset('bake.v.04')]
[Synset('cook.n.01'), Synset('cook.n.02'), Synset('cook.v.01'), Synset('cook.v.02'), Synset('cook.v.03'), Synset('fudge.v.01'), Synset('cook.v.05')]

Wu-Palmer similarity metric:
0.2222222222222222

Lesk algorithm:
['The', 'cook', 'bakes', 'cookies', '.']
Synset('fudge.v.01')
fudge.v.01: tamper, with the purpose of deception
Synset('cook.n.02')
cook.n.02: English navigator who claimed the east coast of Australia for Britain and discovered several Pacific islands (1728-1779)
Synset('bake.v.01')
bake.v.01: cook and make edible by putting in a hot oven


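The Wu-Palmer score of about 0.22 suggests that 'cook' and 'bake' are less similar than I would have guessed, since their uses overlap a lot. Part of the reason may be the choice of senses: the hierarchy printed earlier shows that bake.v.01 falls under cook.v.03, not cook.v.01, which is the sense compared here. The Lesk algorithm misidentified the sense of 'cook' in the sentence, choosing fudge.v.01 ('tamper, with the purpose of deception'), and even with the part of speech restricted to nouns it chose cook.n.02, the English navigator, rather than a person who prepares food. The algorithm picked the correct sense of 'bake' without any modifications.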
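Both metrics can be illustrated with a small plain-Python sketch. Wu-Palmer similarity is commonly defined as 2 * depth(LCS) / (depth(s1) + depth(s2)), where the LCS (lowest common subsumer) is the deepest ancestor shared by both senses. Simplified Lesk chooses the sense whose gloss shares the most words with the surrounding context. All data here (TOY_PARENT, TOY_GLOSSES) is hypothetical, and NLTK's wup_similarity and lesk differ in details (for example, how depth is counted, and NLTK can add a simulated root for verb hierarchies), so these numbers are not meant to match NLTK's.

```python
def ancestors(node, parent):
    """Return [node, parent, grandparent, ...] up to the root of a single-parent tree."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def wup(a, b, parent):
    """Textbook Wu-Palmer similarity with the root at depth 1:
    2 * depth(LCS) / (depth(a) + depth(b)). Returns None if a and b share no ancestor."""
    path_a, path_b = ancestors(a, parent), ancestors(b, parent)
    on_path_b = set(path_b)
    # The first ancestor of `a` that also lies above `b` is the lowest common subsumer.
    lcs = next((n for n in path_a if n in on_path_b), None)
    if lcs is None:
        return None
    depth_lcs = len(ancestors(lcs, parent))
    return 2 * depth_lcs / (len(path_a) + len(path_b))


def simplified_lesk(context, glosses):
    """Pick the sense whose gloss shares the most (lowercased) words with the context."""
    ctx = {w.lower() for w in context}
    best, best_overlap = None, -1
    for sense, gloss in glosses.items():
        overlap = len(ctx & set(gloss.lower().split()))
        if overlap > best_overlap:
            best, best_overlap = sense, overlap
    return best


# Hypothetical toy hierarchy (child -> parent), not WordNet's actual verb tree.
TOY_PARENT = {'bake': 'heat_cook', 'fry': 'heat_cook',
              'heat_cook': 'change_integrity', 'change_integrity': 'change'}
print(wup('bake', 'fry', TOY_PARENT))   # 0.75 (LCS is heat_cook)
print(wup('bake', 'bake', TOY_PARENT))  # 1.0

# Hypothetical toy glosses for two noun senses of 'cook'.
TOY_GLOSSES = {'cook_person': 'someone who bakes and prepares food',
               'cook_navigator': 'english navigator who explored pacific islands'}
context = ['The', 'cook', 'bakes', 'cookies', '.']
print(simplified_lesk(context, TOY_GLOSSES))  # cook_person
```

The toy Lesk result hinges on one shared word, 'bakes'. Because overlap is counted on exact word forms, a gloss that says 'cooks' instead of 'cook' gets no credit, which is one way word-overlap methods go wrong.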

SentiWordNet is an extension of WordNet that assigns positivity, negativity, and objectivity scores to each synset; the three scores for a synset sum to 1. Combining the scores of the words in a sentence gives a rough estimate of its sentiment or tone. For example, SentiWordNet scores can be used to judge whether the sentence 'that movie was great!' is positive or negative.
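Because the three SentiWordNet scores for a synset sum to 1, the objectivity score is simply what remains after the positive and negative scores. NLTK's SentiSynset exposes this as obj_score(). The plain-Python sketch below recomputes it from a few (PosScore, NegScore) pairs copied from the 'fantastic' output in the next cell.

```python
def obj_score(pos, neg):
    """Objectivity is whatever is left after the positive and negative scores."""
    return 1.0 - (pos + neg)


# (PosScore, NegScore) pairs copied from the 'fantastic' output of the next cell.
FANTASTIC = {'antic.s.01': (0.375, 0.0),
             'fantastic.s.02': (0.75, 0.0),
             'fantastic.s.04': (0.0, 0.625)}

for name, (pos, neg) in FANTASTIC.items():
    print(name, 'pos', pos, 'neg', neg, 'obj', obj_score(pos, neg))
```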

In [None]:
senti_list = list(swn.senti_synsets('fantastic'))
for item in senti_list:
    print(item)


def print_first_senti_synsets(sentence):
    """Print the first SentiWordNet synset (if any) for each whitespace-separated token."""
    print('\n' + sentence)
    for token in sentence.split():
        syn_list = list(swn.senti_synsets(token))
        if syn_list:
            print(syn_list[0])


print_first_senti_synsets('This is the worst day of my life')
print_first_senti_synsets('This is the best day of my life')

<antic.s.01: PosScore=0.375 NegScore=0.0>
<fantastic.s.02: PosScore=0.75 NegScore=0.0>
<fantastic.s.03: PosScore=0.375 NegScore=0.375>
<fantastic.s.04: PosScore=0.0 NegScore=0.625>
<fantastic.s.05: PosScore=0.375 NegScore=0.375>

This is the worst day of my life
<be.v.01: PosScore=0.25 NegScore=0.125>
<worst.n.01: PosScore=0.0 NegScore=1.0>
<day.n.01: PosScore=0.0 NegScore=0.0>
<life.n.01: PosScore=0.0 NegScore=0.0>

This is the best day of my life
<be.v.01: PosScore=0.25 NegScore=0.125>
<best.n.01: PosScore=0.25 NegScore=0.0>
<day.n.01: PosScore=0.0 NegScore=0.0>
<life.n.01: PosScore=0.0 NegScore=0.0>


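The emotionally charged words carry the strongest scores: 'worst' has a negative score of 1.0. 'best' only gets a positive score of 0.25, likely because this loop uses just the first synset returned, which for 'best' is the noun sense best.n.01. More objective words such as 'day' and 'life' score 0, and 'be' has only modest scores, so they have little impact on the overall sentiment of the sentence. This information is useful when trying to determine the tone or emotion behind a piece of text.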
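To turn the per-word scores into a single judgement for a sentence, one simple rule (an assumption for illustration, not something SentiWordNet defines) is to add up PosScore minus NegScore over the tokens. The sketch below applies that rule to the first-synset scores printed above; tokens without a synset are left out.

```python
def sentence_polarity(token_scores):
    """Sum (pos - neg) over (pos, neg) pairs: > 0 suggests positive, < 0 negative."""
    return sum(pos - neg for pos, neg in token_scores)


# (PosScore, NegScore) of the first synset for each token that had one,
# copied from the output above: be, worst/best, day, life.
WORST_DAY = [(0.25, 0.125), (0.0, 1.0), (0.0, 0.0), (0.0, 0.0)]
BEST_DAY = [(0.25, 0.125), (0.25, 0.0), (0.0, 0.0), (0.0, 0.0)]

print(sentence_polarity(WORST_DAY))  # -0.875
print(sentence_polarity(BEST_DAY))   # 0.375
```

Even this crude rule separates the two sentences, but it inherits the first-synset limitation noted above, so 'best' pulls the score up much less than 'worst' pulls it down.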

A collocation is a sequence of words that appear together frequently, often combining into a meaning of their own. If one of the words is replaced, even with a synonym, the expression no longer means the same thing.

In [None]:
text4.collocations()

text = ' '.join(text4.tokens)
vocab = len(set(text4))
hg = text.count('General Government')/vocab
print("\np(General Government) = ", hg )
h = text.count('General')/vocab
print("p(General) = ", h)
g = text.count('Government')/vocab
print('p(Government) = ', g)
pmi = math.log2(hg / (h * g))
print('pmi = ', pmi)

United States; fellow citizens; years ago; four years; Federal
Government; General Government; American people; Vice President; God
bless; Chief Justice; one another; fellow Americans; Old World;
Almighty God; Fellow citizens; Chief Magistrate; every citizen; Indian
tribes; public debt; foreign nations

p(General Government) =  0.002394014962593516
p(General) =  0.002394014962593516
p(Government) =  0.03371571072319202
pmi =  4.890435179947461


The PMI of "General Government" is positive (about 4.89), which means the words "General" and "Government" appear together more often than would be expected if they occurred independently. This suggests that "General Government" is a collocation, which agrees with its appearance in the collocations list above.
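The PMI calculation above follows PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) ). A minimal sketch of that formula is below, along with a token-based variant. Two caveats about the cell above: it divides counts by len(set(text4)), the vocabulary size, so its p(...) values are not true probabilities; and text.count() counts substrings, so a word inside a longer word is counted too. The hypothetical bigram_pmi helper instead estimates probabilities from token and adjacent-pair counts (the usual maximum-likelihood approach) on a made-up toy corpus, TOY.

```python
import math
from collections import Counter


def pmi_from_probs(p_xy, p_x, p_y):
    """PMI = log2( P(x, y) / (P(x) * P(y)) ); -inf if the pair never occurs."""
    if p_xy == 0:
        return float('-inf')
    return math.log2(p_xy / (p_x * p_y))


def bigram_pmi(tokens, x, y):
    """PMI of the adjacent pair (x, y), with unigram probabilities = count / #tokens
    and the pair probability = count / #adjacent pairs."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    p_x, p_y = unigrams[x] / n, unigrams[y] / n
    p_xy = bigrams[(x, y)] / (n - 1)
    return pmi_from_probs(p_xy, p_x, p_y)


# Plugging in the probabilities printed above reproduces the notebook's value.
print(pmi_from_probs(0.002394014962593516, 0.002394014962593516,
                     0.03371571072319202))  # about 4.89

# Hypothetical toy corpus
TOY = 'the general government met and the general government voted'.split()
print(bigram_pmi(TOY, 'general', 'government'))
```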


WordNet and SentiWordNet are useful tools that can be applied in many ways. They are used in NLP to analyze words and their contexts.