# WordNet
Today we will learn about WordNet and its synsets.

Let’s look at a real example, using nltk’s WordNet interface to explore the synsets
associated with the term 'fruit'. We can do this using the following code snippet:

In [3]:
from nltk.corpus import wordnet as wn
import pandas as pd

In [4]:
term = 'fruit'
synsets = wn.synsets(term)

In [5]:
# display total synsets
print('Total Synsets:', len(synsets))

Total Synsets: 5


We can see that there are a total of five synsets associated with the term 'fruit'.
What do these synsets indicate? We can dig deeper into each synset and its components
using the following code snippet:

In [6]:
for synset in synsets:
    print('Synset:', synset)
    print('Part of speech:', synset.lexname())
    print('Definition:', synset.definition())
    print('Lemmas:', synset.lemma_names())
    print('Examples:', synset.examples())
    print()

Synset: Synset('fruit.n.01')
Part of speech: noun.plant
Definition: the ripened reproductive body of a seed plant
Lemmas: ['fruit']
Examples: []

Synset: Synset('yield.n.03')
Part of speech: noun.artifact
Definition: an amount of a product
Lemmas: ['yield', 'fruit']
Examples: []

Synset: Synset('fruit.n.03')
Part of speech: noun.event
Definition: the consequence of some effort or action
Lemmas: ['fruit']
Examples: ['he lived long enough to see the fruit of his policies']

Synset: Synset('fruit.v.01')
Part of speech: verb.creation
Definition: cause to bear fruit
Lemmas: ['fruit']
Examples: []

Synset: Synset('fruit.v.02')
Part of speech: verb.creation
Definition: bear fruit
Lemmas: ['fruit']
Examples: ['the trees fruited early this year']



The preceding output shows the details of each synset associated with the term 'fruit'.
The definition gives the sense of each synset, and the lemmas list the words that share
that sense. The value printed as the part of speech comes from lexname(), which returns the
synset's lexicographer file name (for example noun.plant); its prefix shows that the synsets
include both nouns and verbs. Where available, the examples show how the term is used in actual sentences.
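
Synset names such as 'fruit.n.01' pack three pieces of information: a lemma, a one-letter part-of-speech code, and a sense number. As a minimal sketch, assuming names always follow this lemma.pos.nn pattern, the helper below splits the names printed above and counts the senses per part of speech. The name parse_synset_name is hypothetical and not part of NLTK.

```python
from collections import Counter

def parse_synset_name(name):
    """Split a synset name like 'fruit.n.01' into (lemma, pos, sense_number)."""
    # Split from the right so that a lemma containing dots stays intact
    lemma, pos, sense = name.rsplit('.', 2)
    return lemma, pos, int(sense)

# Names copied from the output above
fruit_synset_names = ['fruit.n.01', 'yield.n.03', 'fruit.n.03',
                      'fruit.v.01', 'fruit.v.02']

# Count how many of the five senses are nouns ('n') and verbs ('v')
pos_counts = Counter(parse_synset_name(name)[1] for name in fruit_synset_names)
print(pos_counts)
```

With NLTK loaded, synset.pos() returns the same one-letter code directly, so the string parsing is only needed when all you have is the name.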

# Exercise 1:
The polysemy of a word is the number of senses it has. 
Using WordNet, we can determine that the noun dog has 7 senses with: len(wn.synsets('dog', 'n')). 

Compute the average polysemy of nouns, verbs, adjectives and adverbs according to WordNet.

Tips:

Learn a bit about WordNet Data File Format:
https://wordnet.princeton.edu/documentation/wndb5wn

Access all noun synsets: wn.all_synsets('n')
For each synset, we can access its lemmas: synset.lemmas()
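
The arithmetic behind the exercise can be sketched on toy data before running it on WordNet. The sketch assumes one reading of the task: a lemma gains one sense for every synset it appears in, and the average is taken over distinct lemma names. The function name average_polysemy and the toy synsets are illustrative, not part of NLTK or WordNet.

```python
from collections import Counter

def average_polysemy(synset_lemma_lists):
    """Average number of senses per distinct lemma.

    synset_lemma_lists: iterable of lemma-name lists, one list per synset.
    """
    senses_per_lemma = Counter()
    for lemma_names in synset_lemma_lists:
        # Each synset a lemma belongs to counts as one sense of that lemma
        senses_per_lemma.update(set(lemma_names))
    return sum(senses_per_lemma.values()) / len(senses_per_lemma)

# Toy stand-in for a list of synsets (not real WordNet content)
toy_synsets = [
    ['dog', 'domestic_dog'],
    ['frump', 'dog'],
    ['cad', 'bounder', 'dog'],
    ['cat'],
]

# 'dog' has 3 senses and the other five lemmas have 1 each: 8 senses / 6 lemmas
print(average_polysemy(toy_synsets))
```

To try the same idea on WordNet, one option is to pass a generator such as (s.lemma_names() for s in wn.all_synsets('n')), repeating the call for each part of speech.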


In [7]:
wn.synsets('dog')

[Synset('dog.n.01'),
 Synset('frump.n.01'),
 Synset('dog.n.03'),
 Synset('cad.n.01'),
 Synset('frank.n.02'),
 Synset('pawl.n.01'),
 Synset('andiron.n.01'),
 Synset('chase.v.01')]

In [8]:
# Implement your solution

## Now that we understand synsets better, let’s start exploring the various semantic relationships between them.

### Entailments

In [9]:
# Entailments
for action in ['walk', 'eat', 'digest']:
    action_syn = wn.synsets(action, pos='v')[0]
    print(action_syn, '-- entails -->', action_syn.entailments())

Synset('walk.v.01') -- entails --> [Synset('step.v.01')]
Synset('eat.v.01') -- entails --> [Synset('chew.v.01'), Synset('swallow.v.01')]
Synset('digest.v.01') -- entails --> [Synset('consume.v.02')]


### Homonyms and Homographs

In [10]:
for synset in wn.synsets('bank'):
    print(synset.name(),'-',synset.definition())

bank.n.01 - sloping land (especially the slope beside a body of water)
depository_financial_institution.n.01 - a financial institution that accepts deposits and channels the money into lending activities
bank.n.03 - a long ridge or pile
bank.n.04 - an arrangement of similar objects in a row or in tiers
bank.n.05 - a supply or stock held in reserve for future use (especially in emergencies)
bank.n.06 - the funds held by a gambling house or the dealer in some gambling games
bank.n.07 - a slope in the turn of a road or track; the outside is higher than the inside in order to reduce the effects of centrifugal force
savings_bank.n.02 - a container (usually with a slot in the top) for keeping money at home
bank.n.09 - a building in which the business of banking transacted
bank.n.10 - a flight maneuver; aircraft tips laterally about its longitudinal axis (especially in turning)
bank.v.01 - tip laterally
bank.v.02 - enclose with a bank
bank.v.03 - do business with a bank or keep an account at 

### Synonyms and Antonyms

In [11]:
term = 'large'
synsets = wn.synsets(term)
# synsets[1] is the adjective sense printed below; antonyms are defined on lemmas, not synsets
large_lemma = synsets[1].lemmas()[0]
adj_large_synonym = large_lemma.synset()
adj_large_antonym = large_lemma.antonyms()[0].synset()
# print synonym and antonym
print('Synonym:', adj_large_synonym.name())
print('Definition:', adj_large_synonym.definition())
print('Antonym:', adj_large_antonym.name())
print('Definition:', adj_large_antonym.definition())

Synonym: large.a.01
Definition: above average in size or number or quantity or magnitude or extent
Antonym: small.a.01
Definition: limited or below average in number or quantity or magnitude or extent


In [12]:
term = 'rich'
synsets = wn.synsets(term)[:3]
# print synonym and antonym for different synsets
for synset in synsets:
    rich = synset.lemmas()[0]
    rich_synonym = rich.synset()
    rich_antonym = rich.antonyms()[0].synset()
    print('Synonym:', rich_synonym.name())
    print('Definition:', rich_synonym.definition())
    print('Antonym:', rich_antonym.name())
    print('Definition:', rich_antonym.definition())
    print()

Synonym: rich_people.n.01
Definition: people who have possessions and wealth (considered as a group)
Antonym: poor_people.n.01
Definition: people without possessions or wealth (considered as a group)

Synonym: rich.a.01
Definition: possessing material wealth
Antonym: poor.a.02
Definition: having little money or few possessions

Synonym: rich.a.02
Definition: having an abundant supply of desirable qualities or substances (especially natural resources)
Antonym: poor.a.04
Definition: lacking in specific resources, qualities or substances



In [13]:
# Try with different words (in the example below we are using the word happy)
syn = wn.synsets('happy')
for s in syn:
    print('Name:', s.name())
    print('Definition:', s.definition())
    print('Lemmas:')
    for lemma in s.lemmas():
        print('\tLemma:',lemma.name())
        if len(lemma.antonyms())>0:
            print('\t\tAntonyms:')    
            for ant in lemma.antonyms():
                print('\t\t',ant.name())
        else:
            print('\tLemma has no antonyms')
    print()

Name: happy.a.01
Definition: enjoying or showing or marked by joy or pleasure
Lemmas:
	Lemma: happy
		Antonyms:
		 unhappy

Name: felicitous.s.02
Definition: marked by good fortune
Lemmas:
	Lemma: felicitous
	Lemma has no antonyms
	Lemma: happy
	Lemma has no antonyms

Name: glad.s.02
Definition: eagerly disposed to act or to be of service
Lemmas:
	Lemma: glad
	Lemma has no antonyms
	Lemma: happy
	Lemma has no antonyms

Name: happy.s.04
Definition: well expressed and to the point
Lemmas:
	Lemma: happy
	Lemma has no antonyms
	Lemma: well-chosen
	Lemma has no antonyms
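
The output above shows that most lemmas have no antonyms, so indexing antonyms()[0] as in the earlier cells would raise an IndexError for them. As a hedged sketch, the helper below collects whatever antonyms exist across all lemmas of a synset. It relies only on the lemmas(), antonyms() and name() methods used in this notebook; the stub objects built with SimpleNamespace are hypothetical stand-ins so the example runs without the WordNet corpus.

```python
from types import SimpleNamespace

def antonym_names(synset):
    """Return the names of all antonyms found on any lemma of a synset-like object."""
    names = []
    for lemma in synset.lemmas():
        # antonyms() may be empty, so extend() simply adds nothing in that case
        names.extend(antonym.name() for antonym in lemma.antonyms())
    return names

def make_stub_lemma(name, antonyms=()):
    """Hypothetical stand-in that mimics the two Lemma methods used above."""
    return SimpleNamespace(name=lambda: name, antonyms=lambda: list(antonyms))

# Stubs mirroring two of the synsets printed above
unhappy = make_stub_lemma('unhappy')
happy_a_01 = SimpleNamespace(
    lemmas=lambda: [make_stub_lemma('happy', [unhappy])])
felicitous_s_02 = SimpleNamespace(
    lemmas=lambda: [make_stub_lemma('felicitous'), make_stub_lemma('happy')])

print(antonym_names(happy_a_01))
print(antonym_names(felicitous_s_02))
```

The same function should accept real NLTK synsets, since it only calls methods that already appear in the cells above.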



### Hyponyms and Hypernyms

In [14]:
term = 'tree'
synsets = wn.synsets(term)
tree = synsets[0]
# print the entity and its meaning
print('Name:', tree.name())
print('Definition:', tree.definition())

Name: tree.n.01
Definition: a tall perennial woody plant having a main trunk and branches forming a distinct elevated crown; includes both gymnosperms and angiosperms


In [15]:
hyponyms = tree.hyponyms()
print('Total Hyponyms:', len(hyponyms))
print('Sample Hyponyms')
for hyponym in hyponyms[:10]:
    print('\t', hyponym.name(), '-', hyponym.definition())

Total Hyponyms: 180
Sample Hyponyms
	 aalii.n.01 - a small Hawaiian tree with hard dark wood
	 acacia.n.01 - any of various spiny trees or shrubs of the genus Acacia
	 african_walnut.n.01 - tropical African timber tree with wood that resembles mahogany
	 albizzia.n.01 - any of numerous trees of the genus Albizia
	 alder.n.02 - north temperate shrubs or trees having toothed leaves and conelike fruit; bark is used in tanning and dyeing and the wood is rot-resistant
	 angelim.n.01 - any of several tropical American trees of the genus Andira
	 angiospermous_tree.n.01 - any tree having seeds and ovules contained in the ovary
	 anise_tree.n.01 - any of several evergreen shrubs and small trees of the genus Illicium
	 arbor.n.01 - tree (as opposed to shrub)
	 aroeira_blanca.n.01 - small resinous tree or shrub of Brazil


In [16]:
hypernyms = tree.hypernyms()
print(hypernyms)

[Synset('woody_plant.n.01')]


In [17]:
# get total hierarchy pathways for 'tree'
hypernym_paths = tree.hypernym_paths()
print('Total Hypernym paths:', len(hypernym_paths))
# print the entire hypernym hierarchy
print('Hypernym Hierarchy')
print(' -> '.join(synset.name() for synset in hypernym_paths[0]))

Total Hypernym paths: 1
Hypernym Hierarchy
entity.n.01 -> physical_entity.n.01 -> object.n.01 -> whole.n.02 -> living_thing.n.01 -> organism.n.01 -> plant.n.02 -> vascular_plant.n.01 -> woody_plant.n.01 -> tree.n.01


From the preceding output, you can see that 'entity' is the most generic concept under which 'tree' falls,
and that the path lists the hypernym (superclass) at each level of the hierarchy.
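
How such a path is assembled can be sketched with a toy taxonomy in which every concept has exactly one hypernym. The parent links below are copied from the path printed above; the single-parent assumption and the name toy_hypernym_path are simplifications for illustration. In WordNet a synset can have several hypernyms, which is why hypernym_paths() returns a list of paths.

```python
# Single-parent toy taxonomy: concept -> its hypernym
TOY_HYPERNYM = {
    'tree.n.01': 'woody_plant.n.01',
    'woody_plant.n.01': 'vascular_plant.n.01',
    'vascular_plant.n.01': 'plant.n.02',
    'plant.n.02': 'organism.n.01',
    'organism.n.01': 'living_thing.n.01',
    'living_thing.n.01': 'whole.n.02',
    'whole.n.02': 'object.n.01',
    'object.n.01': 'physical_entity.n.01',
    'physical_entity.n.01': 'entity.n.01',
}

def toy_hypernym_path(name, hypernym_of=TOY_HYPERNYM):
    """Follow hypernym links up to the root and return the path root-first."""
    path = [name]
    while path[-1] in hypernym_of:
        path.append(hypernym_of[path[-1]])
    path.reverse()
    return path

print(' -> '.join(toy_hypernym_path('tree.n.01')))
```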

### Holonyms and Meronyms

#### Holonyms: entities that contain a specific entity of interest. Holonymy is the relationship between a term denoting the whole and a term denoting a specific part of that whole

#### Meronyms: terms or entities that are a part or constituent of another term or entity

In [18]:
member_holonyms = tree.member_holonyms()
print('Total Member Holonyms:', len(member_holonyms))
print('Member Holonyms for [tree]:-')
for holonym in member_holonyms:
    print(holonym.name(), '-', holonym.definition())

Total Member Holonyms: 1
Member Holonyms for [tree]:-
forest.n.01 - the trees and other plants in a large densely wooded area


In [19]:
part_meronyms = tree.part_meronyms()
print('Total Part Meronyms:', len(part_meronyms))
print('Part Meronyms for [tree]:-')
for meronym in part_meronyms:
    print(meronym.name(), '-', meronym.definition())

Total Part Meronyms: 5
Part Meronyms for [tree]:-
burl.n.02 - a large rounded outgrowth on the trunk or branch of a tree
crown.n.07 - the upper branches and leaves of a tree or other plant
limb.n.02 - any of the main branches arising from the trunk or a bough of a tree
stump.n.01 - the base part of a tree that remains standing after the tree has been felled
trunk.n.01 - the main stem of a tree; usually covered with bark; the bole is usually the part that is commercially useful for lumber


In [20]:
# substance based meronyms for tree
substance_meronyms = tree.substance_meronyms()
print('Total Substance Meronyms:', len(substance_meronyms))
print('Substance Meronyms for [tree]:-')
for meronym in substance_meronyms:
    print(meronym.name(), '-', meronym.definition())

Total Substance Meronyms: 2
Substance Meronyms for [tree]:-
heartwood.n.01 - the older inactive central wood of a tree or woody plant; usually darker and denser than the surrounding sapwood
sapwood.n.01 - newly formed outer wood lying between the cambium and the heartwood of a tree or woody plant; usually light colored; active in water conduction


The preceding output shows meronyms that are constituent parts of a tree, such as the stump
and trunk, as well as substances that make up a tree, such as heartwood and sapwood.
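
Holonymy and meronymy are two directions of the same part-whole relation: if a trunk is a part meronym of a tree, then the tree is a part holonym of the trunk. A minimal sketch of that inversion, using only the part meronyms printed above as toy data (invert_relation is a hypothetical helper, not an NLTK function):

```python
from collections import defaultdict

# Part meronyms copied from the output above: whole -> parts
TOY_PART_MERONYMS = {
    'tree.n.01': ['burl.n.02', 'crown.n.07', 'limb.n.02',
                  'stump.n.01', 'trunk.n.01'],
}

def invert_relation(whole_to_parts):
    """Turn a whole -> parts mapping into a part -> wholes mapping."""
    part_to_wholes = defaultdict(list)
    for whole, parts in whole_to_parts.items():
        for part in parts:
            part_to_wholes[part].append(whole)
    return dict(part_to_wholes)

toy_part_holonyms = invert_relation(TOY_PART_MERONYMS)
print(toy_part_holonyms['trunk.n.01'])
```

In NLTK the corresponding lookups are synset methods such as part_holonyms() and member_meronyms(). Whether WordNet records the inverse link for every pair shown here has not been checked in this notebook.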

### Semantic Relationships and Similarity

In [21]:
tree = wn.synset('tree.n.01')
lion = wn.synset('lion.n.01')
tiger = wn.synset('tiger.n.02')
cat = wn.synset('cat.n.01')
dog = wn.synset('dog.n.01')
# create entities and extract names and definitions
entities = [tree, lion, tiger, cat, dog]
entity_names = [entity.name().split('.')[0] for entity in entities]
entity_definitions = [entity.definition() for entity in entities]
# print entities and their definitions
for entity, definition in zip(entity_names, entity_definitions):
    print(entity, '-', definition)

tree - a tall perennial woody plant having a main trunk and branches forming a distinct elevated crown; includes both gymnosperms and angiosperms
lion - large gregarious predatory feline of Africa and India having a tawny coat with a shaggy mane in the male
tiger - large feline of forests in most of Asia having a tawny coat with black stripes; endangered
cat - feline mammal usually having thick soft fur and no ability to roar: domestic cats; wildcats
dog - a member of the genus Canis (probably descended from the common wolf) that has been domesticated by man since prehistoric times; occurs in many breeds


#### Closely related entities are expected to share a very specific lowest common hypernym, whereas unrelated entities should share only a very abstract or generic one

In [22]:
common_hypernyms = []
for entity in entities:
    # get pairwise lowest common hypernyms
    common_hypernyms.append([entity.lowest_common_hypernyms(compared_entity)[0].name().split('.')[0] 
                             for compared_entity in entities])
# build pairwise lowest common hypernym matrix
common_hypernym_frame = pd.DataFrame(common_hypernyms, index=entity_names, columns=entity_names)

print(common_hypernym_frame)

           tree       lion      tiger        cat        dog
tree       tree   organism   organism   organism   organism
lion   organism       lion    big_cat     feline  carnivore
tiger  organism    big_cat      tiger     feline  carnivore
cat    organism     feline     feline        cat  carnivore
dog    organism  carnivore  carnivore  carnivore        dog


In [23]:
similarities = []
for entity in entities:
    # get pairwise similarities
    similarities.append([round(entity.path_similarity(compared_entity), 2) for compared_entity in entities])
# build pairwise similarity matrix
similarity_frame = pd.DataFrame(similarities, index=entity_names, columns=entity_names)
# print the matrix
print(similarity_frame)

       tree  lion  tiger   cat   dog
tree   1.00  0.07   0.07  0.08  0.12
lion   0.07  1.00   0.33  0.25  0.17
tiger  0.07  0.33   1.00  0.25  0.17
cat    0.08  0.25   0.25  1.00  0.20
dog    0.12  0.17   0.17  0.20  1.00
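
The scores in this matrix are consistent with path similarity being 1 / (1 + d), where d is the number of edges on the shortest hypernym path between two synsets. The sketch below reproduces the four animal rows on a simplified single-parent taxonomy. The links follow the lowest common hypernyms shown earlier, except for the 'canine' node between dog and carnivore, which does not appear in the output above and is an assumption chosen because it yields the 0.20 score for cat and dog. All function names are hypothetical.

```python
# Simplified toy taxonomy: concept -> its hypernym
PARENT = {
    'lion': 'big_cat', 'tiger': 'big_cat',
    'big_cat': 'feline', 'cat': 'feline',
    'feline': 'carnivore',
    'dog': 'canine', 'canine': 'carnivore',  # 'canine' is an assumed link
}

def ancestors(concept):
    """Return [concept, hypernym, hypernym of hypernym, ...] up to the toy root."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def lowest_common_hypernym(a, b):
    """First node on a's chain that also lies on b's chain, or None."""
    chain_b = ancestors(b)
    for node in ancestors(a):
        if node in chain_b:
            return node
    return None

def toy_path_similarity(a, b):
    """1 / (1 + number of edges between a and b through their common hypernym)."""
    common = lowest_common_hypernym(a, b)
    if common is None:
        return None
    distance = ancestors(a).index(common) + ancestors(b).index(common)
    return 1 / (1 + distance)

for pair in [('lion', 'tiger'), ('lion', 'cat'), ('cat', 'dog'), ('lion', 'dog')]:
    print(pair, lowest_common_hypernym(*pair), round(toy_path_similarity(*pair), 2))
```

Because the toy taxonomy gives every concept a single parent, the shortest path always runs through the lowest common hypernym; in the full WordNet graph, with multiple hypernyms per synset, that shortcut does not hold in general.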


In [24]:
entity.lch_similarity

<bound method Synset.lch_similarity of Synset('dog.n.01')>

# Exercise 2:

Use at least two of the predefined similarity measures to score the similarity of each of the following pairs of words. Rank the pairs in order of decreasing similarity. 

How close is your ranking to the order given here, an order that was established experimentally by (Miller & Charles, 1998): 

car-automobile, gem-jewel, journey-voyage, boy-lad, coast-shore, asylum-madhouse, magician-wizard, midday-noon, furnace-stove, food-fruit, bird-cock, bird-crane, tool-implement, brother-monk, lad-brother, crane-implement, journey-car, monk-oracle, cemetery-woodland, food-rooster, coast-hill, forest-graveyard, shore-woodland, monk-slave, coast-forest, lad-wizard, chord-smile, glass-magician, rooster-voyage, noon-string

##### Tip: To simplify your solution use the first synset of each word (e.g. wn.synsets('car')[0])

In [25]:
# Implement your solution

# Exercise 3:
Implement a function that computes the similarity of two sentences by computing the path_similarity between the terms of the two sentences.

##### Tip: To simplify your solution use the first synset of each word (e.g. wn.synsets(word)[0])

In [26]:
from nltk import word_tokenize, pos_tag

# Annotate text tokens with POS tags
def pos_tag_text(text):
    def convert_tags(pos_tag):
        if pos_tag.startswith('J'):
            return wn.ADJ
        elif pos_tag.startswith('V'):
            return wn.VERB
        elif pos_tag.startswith('N'):
            return wn.NOUN
        elif pos_tag.startswith('R'):
            return wn.ADV
        else:
            return None

    tagged_text = pos_tag(text)
    tagged_lower_text = [(word.lower(), convert_tags(pos_tag)) for word, pos_tag in tagged_text]
    return tagged_lower_text

In [None]:
def sentence_similarity(sentence1, sentence2):
    """ compute the sentence similarity using WordNet """
    # Tokenize both sentences
    sentence1 = word_tokenize(sentence1)
    sentence2 = word_tokenize(sentence2)
 
    # POS-tag each token; every entry is a one-element list [(word, WordNet POS or None)]
    synsets1 = [pos_tag_text([word]) for word in sentence1]
    synsets2 = [pos_tag_text([word]) for word in sentence2]
    
    # Filter out the Nones
    temp1 = # write your code here
    temp2 = # write your code here
    
    # Write your code here
 
    # score holds the sum of the maximum similarities between each word of sentence1 and the words of sentence2
    # count holds the number of computed similarities - a similarity of None is not counted
    score = round(score/count, 2)
    return score
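
The aggregation described in the comments above (sum of best matches divided by the number of words that could be compared) can be sketched independently of WordNet. The example below is an illustration under stated assumptions rather than the exercise solution: toy_similarity and its scores are invented stand-ins for path_similarity, and best_match_average is a hypothetical name.

```python
# Hypothetical similarity scores for a handful of word pairs (not WordNet values)
TOY_SCORES = {('cat', 'dog'): 0.2, ('cat', 'feline'): 0.5}

def toy_similarity(word1, word2):
    """Return a toy score, or None when the pair is unknown."""
    if word1 == word2:
        return 1.0
    return TOY_SCORES.get((word1, word2), TOY_SCORES.get((word2, word1)))

def best_match_average(words1, words2, similarity):
    """Average, over words1, of each word's best similarity to any word in words2."""
    score, count = 0.0, 0
    for word1 in words1:
        candidates = [similarity(word1, word2) for word2 in words2]
        # Drop pairs for which no similarity could be computed
        candidates = [value for value in candidates if value is not None]
        if candidates:
            score += max(candidates)
            count += 1
    # Guard against division by zero when nothing could be compared
    return round(score / count, 2) if count else 0.0

print(best_match_average(['cat'], ['cat', 'dog'], toy_similarity))
print(best_match_average(['cat', 'dog'], ['cat'], toy_similarity))
```

The two calls give different results because the average runs over the words of the first argument only. That asymmetry is why the test cell prints the similarity in both directions.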

In [None]:
sentences = [
    "Dogs are awesome.",
    "Some gorgeous creatures are felines.",
    "Dolphins are swimming mammals.",
    "Cats are beautiful animals.",
]
 
focus_sentence = "Cats are beautiful animals."
 
for sentence in sentences:
    print("Similarity(\"%s\", \"%s\") = %s" % (focus_sentence, sentence, 
                                               sentence_similarity(focus_sentence, sentence)))
    print("Similarity(\"%s\", \"%s\") = %s" % (sentence, focus_sentence, 
                                               sentence_similarity(sentence, focus_sentence)))
    print()
 

## References

### Text Analytics with Python: A Practical Real-World Approach to Gaining Actionable Insights from Your Data, By Dipanjan Sarkar (Chapter 7: Semantic and Sentiment Analysis)

### Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit, By Steven Bird, Ewan Klein, and Edward Loper (exercises)