# **WordNet**
WordNet is a lexical database and software system developed at Princeton University that provides a large, organized collection of words and their meanings. WordNet consists of a set of interlinked lexical synsets, which are groups of synonyms that represent a single concept. 

First, import NLTK and its WordNet corpus reader.


In [307]:
# Python imports
import nltk
from nltk.corpus import wordnet as wn

**Noun synset**

    The code below gets all the synsets of 'ball'.
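    Each synset has an ID such as 'ball.n.01'. The form of the ID is word.pos.nn, where pos is the part of speech (for example, n for noun and v for verb) and nn is the sense number.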

In [308]:
print(wn.synsets('ball'))

[Synset('ball.n.01'), Synset('musket_ball.n.01'), Synset('ball.n.03'), Synset('ball.n.04'), Synset('testis.n.01'), Synset('ball.n.06'), Synset('ball.n.07'), Synset('ball.n.08'), Synset('ball.n.09'), Synset('ball.n.10'), Synset('ball.n.11'), Synset('ball.n.12'), Synset('ball.v.01')]
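The ID format can be taken apart with plain string handling. The sketch below is only an illustration and is not part of NLTK: `parse_synset_id` is a hypothetical helper name, and it assumes every ID follows the word.pos.nn pattern described above.

```python
def parse_synset_id(synset_id):
    """Split an ID of the form 'word.pos.nn' into (word, pos, sense_number)."""
    # Split from the right so that a word containing dots stays in one piece
    word, pos, number = synset_id.rsplit('.', 2)
    return word, pos, int(number)


print(parse_synset_id('ball.n.01'))         # ('ball', 'n', 1)
print(parse_synset_id('musket_ball.n.01'))  # ('musket_ball', 'n', 1)
print(parse_synset_id('ball.v.01'))         # ('ball', 'v', 1)
```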


    Here, I have selected 'ball.n.01' as the synset to examine: its definition, usage examples, and relations to other synsets.



Definition of the first synset 'ball.n.01'

In [309]:
wn.synset('ball.n.01').definition()

'round object that is hit or thrown or kicked in games'

Usage examples of the synset 'ball.n.01'


In [310]:
wn.synset('ball.n.01').examples()

['the ball travelled 90 mph on his serve',
 'the mayor threw out the first ball',
 'the ball rolled into the corner pocket']

# WordNet Hierarchy

Synsets are organized in hierarchical relations. For example:
*  hypernym (higher) -- canine is a hypernym of dog
*  hyponym (lower) -- a dog is a hyponym of canine
*  meronym (part of) -- wheel is a meronym of car
*  holonym (whole) -- car is a holonym of wheel

This hierarchy can be traversed.

**WordNet Hierarchy (Noun)**

Traversing the hierarchy using the synset 'ball.n.01'

    Here, the hierarchy is traversed from the bottom to the top. The synset
    'ball.n.01' is at the bottom, and the code follows the first hypernym at each
    level, printing every synset until it reaches the top. The hierarchy for nouns
    has 'entity.n.01' at the top.



In [311]:
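ball = wn.synset('ball.n.01')
top = wn.synset('entity.n.01')
hyp = ball.hypernyms()[0]
while hyp:
  print(hyp)
  if hyp == top:
    break
  if hyp.hypernyms():
    # move one level up by following the first hypernym
    hyp = hyp.hypernyms()[0]
  else:
    # no hypernyms left: stop so the loop cannot run forever
    break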


Synset('game_equipment.n.01')
Synset('equipment.n.01')
Synset('instrumentality.n.03')
Synset('artifact.n.01')
Synset('whole.n.02')
Synset('object.n.01')
Synset('physical_entity.n.01')
Synset('entity.n.01')
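The same bottom-up walk can be written as a reusable function. The sketch below is a minimal version that assumes only that each node offers a `hypernyms()` method returning a list. `ToyNode` and `first_hypernym_chain` are hypothetical names, and the toy hierarchy is a shortened stand-in for the real WordNet chain.

```python
class ToyNode:
    """Hypothetical stand-in for a synset: a name plus a list of parents."""

    def __init__(self, name, parents=()):
        self.name = name
        self._parents = list(parents)

    def hypernyms(self):
        return self._parents

    def __repr__(self):
        return f"ToyNode({self.name!r})"


def first_hypernym_chain(node):
    """Follow the first hypernym at each level until no hypernym is left."""
    chain = []
    current = node
    while current.hypernyms():
        current = current.hypernyms()[0]
        chain.append(current)
    return chain


# Toy hierarchy: ball -> equipment -> object -> entity
entity = ToyNode('entity')
obj = ToyNode('object', [entity])
equipment = ToyNode('equipment', [obj])
ball = ToyNode('ball', [equipment])

print(first_hypernym_chain(ball))
```

Because the helper relies only on `hypernyms()`, it should also accept an NLTK synset such as `wn.synset('ball.n.01')`.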


**Hyponyms**

    Hyponyms are words at a lower level of the hierarchy than a more general word. In other words, a more specific word is a hyponym of the general word. For example:
    *   Baseball is a hyponym of ball, a more general word.






In [312]:
wn.synset('ball.n.01').hyponyms()

[Synset('baseball.n.02'),
 Synset('basketball.n.02'),
 Synset('billiard_ball.n.01'),
 Synset('bocce_ball.n.01'),
 Synset('bowl.n.07'),
 Synset('bowling_ball.n.01'),
 Synset('cricket_ball.n.01'),
 Synset('croquet_ball.n.01'),
 Synset('field_hockey_ball.n.01'),
 Synset('football.n.02'),
 Synset('golf_ball.n.01'),
 Synset('handball.n.01'),
 Synset('jack.n.05'),
 Synset('lacrosse_ball.n.01'),
 Synset('marble.n.02'),
 Synset('medicine_ball.n.01'),
 Synset('ninepin_ball.n.01'),
 Synset('ping-pong_ball.n.01'),
 Synset('polo_ball.n.01'),
 Synset('pool_ball.n.01'),
 Synset('punching_bag.n.02'),
 Synset('racquetball.n.01'),
 Synset('roulette_ball.n.01'),
 Synset('rugby_ball.n.01'),
 Synset('soccer_ball.n.01'),
 Synset('softball.n.01'),
 Synset('squash_ball.n.01'),
 Synset('tennis_ball.n.01'),
 Synset('volleyball.n.02'),
 Synset('wiffle.n.01')]

**Hypernyms**

    Hypernyms are at a higher level of the hierarchy. In other words, a hypernym is a more general word than its specific hyponyms. For example:
    *  Ball is a hypernym of baseball, a more specific word.

In [313]:
wn.synset('baseball.n.02').hypernyms()

[Synset('ball.n.01'), Synset('baseball_equipment.n.01')]

**Holonyms**

    Holonyms are the wholes that something is a part or a member of. For example:
    *  Body is a holonym of leg, because a leg is a part of the body
    *  Pride is a holonym of lion, because a lion is a member of a pride


In [314]:
wn.synset('leg.n.01').part_holonyms()

[Synset('body.n.01')]

In [315]:
wn.synset('lion.n.01').member_holonyms()

[Synset('panthera.n.01'), Synset('pride.n.04')]

**Meronyms**

    Meronyms are the parts or members that make up a whole. For example:
    *  Arm and leg are meronyms of body, because a human body has many parts
    *  Lion is a meronym of pride, because a pride has many lions



In [316]:
wn.synset('body.n.01').part_meronyms()

[Synset('arm.n.01'),
 Synset('articulatory_system.n.01'),
 Synset('body_substance.n.01'),
 Synset('cavity.n.04'),
 Synset('circulatory_system.n.01'),
 Synset('crotch.n.02'),
 Synset('digestive_system.n.01'),
 Synset('endocrine_system.n.01'),
 Synset('head.n.01'),
 Synset('leg.n.01'),
 Synset('lymphatic_system.n.01'),
 Synset('musculoskeletal_system.n.01'),
 Synset('neck.n.01'),
 Synset('nervous_system.n.01'),
 Synset('pressure_point.n.01'),
 Synset('respiratory_system.n.01'),
 Synset('sensory_system.n.02'),
 Synset('torso.n.01'),
 Synset('vascular_system.n.01')]

In [317]:
wn.synset('pride.n.04').member_meronyms()

[Synset('lion.n.01')]
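Holonymy and meronymy are two directions of the same relation: if body is a holonym of leg, then leg is a meronym of body. The sketch below illustrates this with a small hand-written dictionary. The data and the `invert_relation` helper are hypothetical and do not come from WordNet.

```python
from collections import defaultdict

# Hypothetical toy data: each part maps to the wholes it belongs to
part_holonyms = {
    'leg': ['body'],
    'arm': ['body'],
    'wheel': ['car'],
}


def invert_relation(relation):
    """Turn a part -> wholes mapping into a whole -> parts mapping."""
    inverse = defaultdict(list)
    for source, targets in relation.items():
        for target in targets:
            inverse[target].append(source)
    return dict(inverse)


part_meronyms = invert_relation(part_holonyms)
print(sorted(part_meronyms['body']))  # ['arm', 'leg']
print(part_meronyms['car'])           # ['wheel']
```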

**Antonyms**

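    Antonyms are words with opposite meanings. In WordNet, antonymy is defined between lemmas rather than between synsets, which is why the code goes through lemmas(). For example:
    * Good is the opposite of bad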

In [318]:
wn.synset('good.a.01').lemmas()[0].antonyms()

[Lemma('bad.a.01.bad')]

**WordNet Hierarchy (Verb)**

The code below gets all the synsets of 'eat'.


    Each synset has an ID such as 'eat.v.01'. The form of the ID is: word.pos.nn

In [319]:
# eat
print(wn.synsets('eat'))


[Synset('eat.v.01'), Synset('eat.v.02'), Synset('feed.v.06'), Synset('eat.v.04'), Synset('consume.v.05'), Synset('corrode.v.01')]


Here, I have selected 'eat.v.01' as the synset to examine: its definition,
usage examples, and hypernyms.

In [320]:
wn.synset('eat.v.01').definition()

'take in solid food'

    Usage examples of the synset 'eat.v.01'

In [321]:
wn.synset('eat.v.01').examples()

['She was eating a banana', 'What did you eat for dinner last night?']

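Traversing the hierarchy using the synset 'eat.v.01'. Unlike nouns, verbs have no single top-level synset, so the loop stops as soon as a synset has no hypernyms.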



In [322]:
eat = wn.synset('eat.v.01').hypernyms()[0]
while eat:
  print(eat)
  if eat.hypernyms():
    eat = eat.hypernyms()[0]
  else:
    break

Synset('consume.v.02')


**Morphy**

    Morphy returns the base form of an inflected word for a given part of speech. For example:
    *  ate becomes eat

In [323]:
print(wn.morphy('fast', wn.NOUN))
print(wn.morphy('fasting', wn.VERB))
print(wn.morphy('faster', wn.ADJ))

fast
fast
fast


**Similarity**

    A similarity score between two word senses can be computed from WordNet. Path similarity is based on the shortest path that connects the two senses in the hypernym hierarchy, and ranges from 0 (little similarity) to 1 (identity).

In [324]:
dog = wn.synset('dog.n.01')
cat = wn.synset('cat.n.01')
dog.path_similarity(cat)

0.2
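Path similarity is defined as 1 / (d + 1), where d is the number of edges on the shortest path between the two senses. A score of 0.2 therefore corresponds to a path of four edges. The sketch below reproduces that arithmetic on a hand-written toy graph. The graph and the helper names are hypothetical, and the graph only approximates the relevant part of WordNet.

```python
from collections import deque

# Hypothetical toy graph of is-a links, treated as undirected edges
edges = [
    ('dog', 'canine'),
    ('canine', 'carnivore'),
    ('cat', 'feline'),
    ('feline', 'carnivore'),
]


def build_graph(edge_list):
    """Build an adjacency mapping from a list of undirected edges."""
    graph = {}
    for a, b in edge_list:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path_length(graph, start, goal):
    """Breadth-first search; returns the edge count or None if unreachable."""
    if start == goal:
        return 0
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == goal:
                return dist + 1
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def toy_path_similarity(graph, a, b):
    """Compute 1 / (shortest path length + 1), or None if no path exists."""
    dist = shortest_path_length(graph, a, b)
    return None if dist is None else 1 / (dist + 1)


graph = build_graph(edges)
print(toy_path_similarity(graph, 'dog', 'cat'))  # 0.2
print(toy_path_similarity(graph, 'dog', 'dog'))  # 1.0
```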

**Wu-Palmer**

In [325]:
wn.wup_similarity(cat, dog)

0.8571428571428571
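Wu-Palmer similarity compares the depth of the two senses with the depth of their least common subsumer (the most specific ancestor they share): 2 * depth(lcs) / (depth(a) + depth(b)). The sketch below shows only the formula. The function name is hypothetical, and the depths are illustrative values that are assumed here rather than read from WordNet.

```python
def wu_palmer(depth_lcs, depth_a, depth_b):
    """Wu-Palmer score from the depths of two senses and their common ancestor."""
    return 2 * depth_lcs / (depth_a + depth_b)


# Illustrative depths (assumed): common ancestor at depth 12, both senses at depth 14
score = wu_palmer(12, 14, 14)
print(round(score, 3))  # 0.857

# A sense compared with itself is its own common ancestor, so the score is 1.0
print(wu_palmer(14, 14, 14))  # 1.0
```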

**Lesk Algorithm**

In [326]:
from nltk.wsd import lesk


    The Lesk algorithm is a technique used in natural language processing to disambiguate the meaning of words in a sentence. It picks the sense whose dictionary definition shares the most words with the surrounding context.

In [327]:
sent = ['I', 'have', 'two', 'pets', ',', 'a', 'dog', 'and', 'a', 'cat', '.']
print(lesk(sent, 'cat', 'n'))

Synset('caterpillar.n.02')


In [328]:
# look at the definitions for 'ball'
for l in wn.synsets('ball'):
    print(l, l.definition())

Synset('ball.n.01') round object that is hit or thrown or kicked in games
Synset('musket_ball.n.01') a solid projectile that is shot by a musket
Synset('ball.n.03') an object with a spherical shape
Synset('ball.n.04') the people assembled at a lavish formal dance
Synset('testis.n.01') one of the two male reproductive glands that produce spermatozoa and secrete androgens
Synset('ball.n.06') a spherical object used as a plaything
Synset('ball.n.07') United States comedienne best known as the star of a popular television program (1911-1989)
Synset('ball.n.08') a compact mass
Synset('ball.n.09') a lavish dance requiring formal attire
Synset('ball.n.10') a more or less rounded anatomical body or mass
Synset('ball.n.11') the game of baseball
Synset('ball.n.12') a pitch that is not in the strike zone
Synset('ball.v.01') form into a ball by winding or rolling
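The overlap idea behind Lesk can be sketched with three of the definitions listed above. This is a simplified illustration, not NLTK's implementation: `simplified_lesk` is a hypothetical helper, the context sentences are made up, and no stop-word filtering or tie handling beyond "first best wins" is attempted.

```python
# Three glosses copied from the definitions of 'ball' listed above
glosses = {
    'ball.n.01': 'round object that is hit or thrown or kicked in games',
    'ball.n.09': 'a lavish dance requiring formal attire',
    'ball.n.12': 'a pitch that is not in the strike zone',
}


def simplified_lesk(context_words, sense_glosses):
    """Return the sense whose gloss shares the most words with the context."""
    context = {word.lower() for word in context_words}
    best_sense, best_overlap = None, -1
    for sense, gloss in sense_glosses.items():
        overlap = len(context & set(gloss.lower().split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


dance_context = 'She wore formal attire to the lavish dance'.split()
game_context = 'He kicked the round ball in the games'.split()

print(simplified_lesk(dance_context, glosses))  # ball.n.09
print(simplified_lesk(game_context, glosses))   # ball.n.01
```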


# SentiWordNet

    SentiWordNet is a lexical resource for sentiment analysis that assigns three scores to each WordNet synset: positivity, negativity, and objectivity. The positivity and negativity scores range from 0 to 1, with 0 indicating no sentiment and 1 indicating very strong sentiment. The objectivity score indicates the degree to which the synset is neutral and also ranges from 0 to 1. The three scores sum to 1.

In [329]:
# dependencies
from nltk.corpus import sentiwordnet as swn

    Here, I have chosen 'sadness.n.02' as the synset of an emotionally charged word.

In [330]:
sadness = swn.senti_synset('sadness.n.02')
print(sadness)
print("Positive score = ", sadness.pos_score())
print("Negative score = ", sadness.neg_score())
print("Objective score = ", sadness.obj_score())

<sadness.n.02: PosScore=0.0 NegScore=0.625>
Positive score =  0.0
Negative score =  0.625
Objective score =  0.375
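The objective score is whatever remains after the positive and negative scores are subtracted from 1. The sketch below checks this against the numbers printed above. `objective_score` is a hypothetical helper, not a SentiWordNet function.

```python
def objective_score(pos_score, neg_score):
    """Objectivity is the share of the total not taken by positive or negative."""
    return 1.0 - (pos_score + neg_score)


# Scores printed above for 'sadness.n.02'
print(objective_score(0.0, 0.625))  # 0.375
```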


    Here, I have chosen the sentence "I want to like my dogs". Summing the scores of the first sense of each token, its polarity leans negative.

In [331]:
sent = 'I want to like my dogs'
neg = 0
pos = 0
tokens = sent.split()
for token in tokens:
    syn_list = list(swn.senti_synsets(token))
    if syn_list:
        syn = syn_list[0]
        neg += syn.neg_score()
        pos += syn.pos_score()
    
print("neg\t pos")
print(neg, '\t', pos)

neg	 pos
0.25 	 0.125


# Collocations

    Collocations are sequences of words that occur together more often than would be expected by chance. In the Natural Language Toolkit (NLTK), Text objects provide a collocations() method that lists them for a corpus.

In [332]:
import nltk
nltk.download('stopwords')
from nltk.book import * 
text4

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


<Text: Inaugural Address Corpus>

In [333]:
# get collocations
text4.collocations()

United States; fellow citizens; years ago; four years; Federal
Government; General Government; American people; Vice President; God
bless; Chief Justice; one another; fellow Americans; Old World;
Almighty God; Fellow citizens; Chief Magistrate; every citizen; Indian
tribes; public debt; foreign nations
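At its simplest, finding candidate collocations starts with counting adjacent word pairs. The sketch below does only that, on a made-up token list. NLTK's own method applies additional filtering and scoring that this sketch omits.

```python
from collections import Counter

# Made-up example tokens, not taken from the Inaugural Address Corpus
tokens = ('the United States and the United Nations '
          'and the United States').split()

# Pair every token with its right-hand neighbour and count the pairs
bigram_counts = Counter(zip(tokens, tokens[1:]))

for bigram, count in bigram_counts.most_common(3):
    print(' '.join(bigram), count)
```

Raw counts favour pairs that contain very frequent words such as 'the', which is one reason a scoring measure is normally applied on top of the counts.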


# Mutual Information

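    Here, I calculate mutual information for a couple of the collocations that NLTK identified. Recall that pointwise mutual information (PMI) is the log of the ratio between the joint probability of two words and the product of their individual probabilities:

PMI(x, y) = log2( P(x,y) / [P(x) * P(y)] )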

In [334]:
text = ' '.join(text4.tokens)
text[:50]

'Fellow - Citizens of the Senate and of the House o'

    Mutual information is a statistical measure used in natural language processing to identify the strength of the association between two terms in a corpus of text. 
    
    It measures the amount of information that is shared between two terms, indicating how much one term can tell us about the other.

In [335]:
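import math
# normalizer: number of distinct tokens in text6 (note that the counts below come from text4)
vocab = len(set(text6))
# estimated probability of the bigram 'Almighty God'
hg = text.count('Almighty God')/vocab
print("p(Almighty God) = ",hg )
# estimated probabilities of the individual words
h = text.count('Almighty')/vocab
print("p(Almighty) = ", h)
g = text.count('God')/vocab
print('p(God) = ', g)
# PMI: log2 of the joint probability over the product of the individual probabilities
pmi = math.log2(hg/(h * g))
print('pmi = ', pmi)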

p(Almighty God) =  0.006925207756232687
p(Almighty) =  0.012927054478301015
p(God) =  0.05170821791320406
pmi =  3.3729982791016377
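The PMI calculation can be wrapped in a small function that works from raw counts. The sketch below is a minimal version under two assumptions: probabilities are estimated by dividing each count by the total number of tokens, and the counts in the example are hypothetical rather than taken from the corpus. The name `pmi_from_counts` is also hypothetical.

```python
import math


def pmi_from_counts(count_xy, count_x, count_y, total):
    """PMI in bits: log2( P(x,y) / (P(x) * P(y)) ), estimated from counts."""
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log2(p_xy / (p_x * p_y))


# Hypothetical counts: the pair occurs 10 times, the words 20 and 40 times,
# in a text of 1000 tokens
print(round(pmi_from_counts(10, 20, 40, 1000), 3))  # 3.644
```

A positive PMI means the two words occur together more often than they would if they were independent. A value near zero means they co-occur about as often as chance predicts.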
