## WordNet

WordNet is a semantically-oriented dictionary of English, similar to a traditional thesaurus but with a richer structure. NLTK includes the English WordNet, with 155,287 words and 117,659 synonym sets. We'll begin by looking at synonyms and how they are accessed in WordNet.



###  Senses and Synonyms

Consider the two sentences below. If we replace the word motorcar in the first sentence with automobile, giving the second, the meaning stays pretty much the same:

* Benz is credited with the invention of the motorcar.
* Benz is credited with the invention of the automobile.

Since everything else in the sentence has remained unchanged, we can conclude that the words motorcar and automobile have the same meaning, i.e. they are **synonyms**. We can explore these words with the help of WordNet:

In [1]:
from nltk.corpus import wordnet as wn
wn.synsets('motorcar')

[Synset('car.n.01')]

Thus, motorcar has just one possible meaning and it is identified as car.n.01, the first noun sense of car. The entity car.n.01 is called a **synset**, or **"synonym set"**, a collection of synonymous words (or "lemmas"):

In [2]:
wn.synset('car.n.01').lemma_names()

['car', 'auto', 'automobile', 'machine', 'motorcar']

Each word of a synset can have several meanings, e.g., car can also signify a train carriage, a gondola, or an elevator car. However, we are only interested in the single meaning that is common to all words of the above synset. Synsets also come with a prose definition and some example sentences:

In [3]:
wn.synset('car.n.01').definition()

'a motor vehicle with four wheels; usually propelled by an internal combustion engine'

In [4]:
wn.synset('car.n.01').examples()

['he needs a car to get to work']

Although definitions help humans to understand the intended meaning of a synset, the words of the synset are often more useful for our programs. To eliminate ambiguity, we will identify these words as car.n.01.automobile, car.n.01.motorcar, and so on. This pairing of a synset with a word is called a **lemma**. We can get all the lemmas for a given synset, look up a particular lemma, get the synset corresponding to a lemma, and get the "name" of a lemma:

#### What are the lemmas for the synset car.n.01?

In [5]:
wn.synset('car.n.01').lemmas() 

[Lemma('car.n.01.car'),
 Lemma('car.n.01.auto'),
 Lemma('car.n.01.automobile'),
 Lemma('car.n.01.machine'),
 Lemma('car.n.01.motorcar')]

#### Look up the lemma car.n.01.automobile

In [6]:
wn.lemma('car.n.01.automobile')

Lemma('car.n.01.automobile')

#### What synset does the lemma car.n.01.automobile belong to?

In [7]:
wn.lemma('car.n.01.automobile').synset()

Synset('car.n.01')

#### And what is its name?

In [8]:
wn.lemma('car.n.01.automobile').name()

'automobile'
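The lemma identifiers above follow a visible pattern: a synset name ending in a sense number, then a dot, then the word itself. As a minimal sketch of that naming scheme (the helper `split_lemma_id` and the regular expression are illustrative assumptions, not necessarily how NLTK parses these strings, and the second identifier assumes that 'S.U.V.' is a lemma of sport_utility.n.01), an identifier can be split at the sense number so that dots inside the word survive:

```python
import re

# Matches the sense number between two dots, e.g. ".01." in "car.n.01.automobile".
SENSE_NUMBER = re.compile(r"\.\d+\.")

def split_lemma_id(lemma_id):
    """Split 'car.n.01.automobile' into ('car.n.01', 'automobile')."""
    match = SENSE_NUMBER.search(lemma_id)
    if match is None:
        raise ValueError(f"not a lemma identifier: {lemma_id!r}")
    # match.end() points just past the dot that follows the sense number.
    synset_name = lemma_id[:match.end() - 1]
    lemma_name = lemma_id[match.end():]
    return synset_name, lemma_name

print(split_lemma_id("car.n.01.automobile"))
print(split_lemma_id("sport_utility.n.01.S.U.V."))
```

Splitting on the sense number rather than on the last dot matters for words that contain dots themselves.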

Now let's analyze the word `car`, which has multiple **senses** (i.e., meanings of the word):

In [12]:
wn.synsets('car')

[Synset('car.n.01'),
 Synset('car.n.02'),
 Synset('car.n.03'),
 Synset('car.n.04'),
 Synset('cable_car.n.01')]

#### What are the different senses of 'car', with their definitions and examples?

In [13]:
senses = [(s.lemma_names(), s.definition(), s.examples()) for s in wn.synsets('car')]
for s in senses:
    print("Lemma name:", s[0])
    print("Definition:", s[1])
    print("Examples  :", s[2])
    print("=======================")

Lemma name: ['car', 'auto', 'automobile', 'machine', 'motorcar']
Definition: a motor vehicle with four wheels; usually propelled by an internal combustion engine
Examples  : ['he needs a car to get to work']
=======================
Lemma name: ['car', 'railcar', 'railway_car', 'railroad_car']
Definition: a wheeled vehicle adapted to the rails of railroad
Examples  : ['three cars had jumped the rails']
=======================
Lemma name: ['car', 'gondola']
Definition: the compartment that is suspended from an airship and that carries personnel and the cargo and the power plant
Examples  : []
=======================
Lemma name: ['car', 'elevator_car']
Definition: where passengers ride up and down
Examples  : ['the car was on the top floor']
=======================
Lemma name: ['cable_car', 'car']
Definition: a conveyance for passengers or freight on a cable railway
Examples  : ['they took a cable car to the top of the mountain']
=======================


#### And the different meanings of 'race'?

In [11]:
wn.synsets('race')

[Synset('race.n.01'),
 Synset('race.n.02'),
 Synset('race.n.03'),
 Synset('subspecies.n.01'),
 Synset('slipstream.n.01'),
 Synset('raceway.n.01'),
 Synset('rush.v.01'),
 Synset('race.v.02'),
 Synset('race.v.03'),
 Synset('race.v.04')]
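The list above mixes noun and verb senses; the middle field of each synset name carries the part of speech. The following sketch groups the names by that field. It works on plain strings copied from the output above, so it runs without WordNet installed; `group_by_pos` is a hypothetical helper, not an NLTK function.

```python
from collections import defaultdict

# Synset names copied from the wn.synsets('race') output above.
race_synsets = [
    "race.n.01", "race.n.02", "race.n.03", "subspecies.n.01",
    "slipstream.n.01", "raceway.n.01", "rush.v.01", "race.v.02",
    "race.v.03", "race.v.04",
]

def group_by_pos(synset_names):
    """Map each part-of-speech tag to the synset names that carry it."""
    groups = defaultdict(list)
    for name in synset_names:
        # Split from the right so dots inside the head word are left alone.
        _word, pos, _sense = name.rsplit(".", 2)
        groups[pos].append(name)
    return dict(groups)

by_pos = group_by_pos(race_synsets)
for pos, names in by_pos.items():
    print(pos, len(names), names)
```

With a live WordNet, the same filtering can be requested directly by passing a part of speech as the second argument, as in `wn.synsets('race', wn.NOUN)`.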

In [12]:
# Same analysis for the word "race"; as an exercise, repeat it for the word "bank"
senses = [(s.lemma_names(), s.definition(), s.examples()) for s in wn.synsets('race')]
for s in senses:
    print("Lemma name:", s[0])
    print("Definition:", s[1])
    print("Examples  :", s[2])
    print("=======================")

Lemma name: ['race']
Definition: any competition
Examples  : ['the race for the presidency']
=======================
Lemma name: ['race']
Definition: a contest of speed
Examples  : ['the race is to the swift']
=======================
Lemma name: ['race']
Definition: people who are believed to belong to the same genetic stock
Examples  : ['some biologists doubt that there are important genetic differences between races of human beings']
=======================
Lemma name: ['subspecies', 'race']
Definition: (biology) a taxonomic group that is a division of a species; usually arises as a consequence of geographical isolation within a species
Examples  : []
=======================
Lemma name: ['slipstream', 'airstream', 'race', 'backwash', 'wash']
Definition: the flow of air that is driven backwards by an aircraft propeller
Examples  : []
=======================
Lemma name: ['raceway', 'race']
Definition: a canal for a current of water
Examples  : []
=======================
Lemma name: ['rush', 'hotfoot', 'hasten', 'hie', 'speed', 'race', 'pelt_along', 'rush_along', 'cannonball_along', 'bucket_along', 'belt_along', 'step_on_it']
...

### The WordNet Hierarchy

WordNet synsets correspond to abstract concepts, and they don't always have corresponding words in English. These concepts are linked together in a hierarchy. Some concepts are very general, such as Entity, State, Event — these are called unique beginners or root synsets. Others, such as gas guzzler and hatchback, are much more specific. A small portion of a concept hierarchy is illustrated below:

<img src="http://www.nltk.org/images/wordnet-hierarchy.png" width="50%">

### Hyponyms

WordNet makes it easy to navigate between concepts. For example, given a concept like motorcar, we can look at the concepts that are more specific: its (immediate) hyponyms.

In [16]:
motorcar = wn.synset('car.n.01')

In [17]:
types_of_motorcar = motorcar.hyponyms()
types_of_motorcar

[Synset('ambulance.n.01'),
 Synset('beach_wagon.n.01'),
 Synset('bus.n.04'),
 Synset('cab.n.03'),
 Synset('compact.n.03'),
 Synset('convertible.n.01'),
 Synset('coupe.n.01'),
 Synset('cruiser.n.01'),
 Synset('electric.n.01'),
 Synset('gas_guzzler.n.01'),
 Synset('hardtop.n.01'),
 Synset('hatchback.n.01'),
 Synset('horseless_carriage.n.01'),
 Synset('hot_rod.n.01'),
 Synset('jeep.n.01'),
 Synset('limousine.n.01'),
 Synset('loaner.n.02'),
 Synset('minicar.n.01'),
 Synset('minivan.n.01'),
 Synset('model_t.n.01'),
 Synset('pace_car.n.01'),
 Synset('racer.n.02'),
 Synset('roadster.n.01'),
 Synset('sedan.n.01'),
 Synset('sport_utility.n.01'),
 Synset('sports_car.n.01'),
 Synset('stanley_steamer.n.01'),
 Synset('stock_car.n.01'),
 Synset('subcompact.n.01'),
 Synset('touring_car.n.01'),
 Synset('used-car.n.01')]

In [18]:
types_of_motorcar[0]

Synset('ambulance.n.01')

Let's go over all the types of motorcar and, for each of the returned `synsets` (synonym sets), collect the names of its lemmas into a single sorted list.

In [19]:
print(sorted(lemma.name() 
             for synset in types_of_motorcar 
                 for lemma in synset.lemmas()))

['Model_T', 'S.U.V.', 'SUV', 'Stanley_Steamer', 'ambulance', 'beach_waggon', 'beach_wagon', 'bus', 'cab', 'compact', 'compact_car', 'convertible', 'coupe', 'cruiser', 'electric', 'electric_automobile', 'electric_car', 'estate_car', 'gas_guzzler', 'hack', 'hardtop', 'hatchback', 'heap', 'horseless_carriage', 'hot-rod', 'hot_rod', 'jalopy', 'jeep', 'landrover', 'limo', 'limousine', 'loaner', 'minicar', 'minivan', 'pace_car', 'patrol_car', 'phaeton', 'police_car', 'police_cruiser', 'prowl_car', 'race_car', 'racer', 'racing_car', 'roadster', 'runabout', 'saloon', 'secondhand_car', 'sedan', 'sport_car', 'sport_utility', 'sport_utility_vehicle', 'sports_car', 'squad_car', 'station_waggon', 'station_wagon', 'stock_car', 'subcompact', 'subcompact_car', 'taxi', 'taxicab', 'tourer', 'touring_car', 'two-seater', 'used-car', 'waggon', 'wagon']
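`hyponyms()` returns only the immediate children of a concept. To reach every more specific concept, the relation has to be followed repeatedly. Below is a minimal sketch of that idea over a toy dictionary; `TOY_HYPONYMS` is a tiny hand-picked excerpt built from names that appear in this section, not real WordNet data, and `all_hyponyms` is a hypothetical helper.

```python
# Toy excerpt of the hierarchy: each concept maps to its immediate hyponyms.
# Real WordNet synsets have many more children than shown here.
TOY_HYPONYMS = {
    "wheeled_vehicle": ["self-propelled_vehicle"],
    "self-propelled_vehicle": ["motor_vehicle"],
    "motor_vehicle": ["car"],
    "car": ["ambulance", "gas_guzzler", "hatchback"],
}

def all_hyponyms(concept, hyponyms):
    """Return every concept below `concept`, depth first."""
    found = []
    for child in hyponyms.get(concept, []):
        found.append(child)
        # Follow the relation again to pick up the child's own hyponyms.
        found.extend(all_hyponyms(child, hyponyms))
    return found

print(all_hyponyms("motor_vehicle", TOY_HYPONYMS))
print(all_hyponyms("ambulance", TOY_HYPONYMS))
```

On real synsets, NLTK's `closure` method should give a comparable traversal, for example `motorcar.closure(lambda s: s.hyponyms())`.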


### Hypernyms

We can also navigate up the hierarchy by visiting hypernyms. Some words have multiple paths, because they can be classified in more than one way. There are two paths between car.n.01 and entity.n.01 because wheeled_vehicle.n.01 can be classified as both a vehicle and a container.

In [21]:
motorcar.hypernyms()
paths = motorcar.hypernym_paths()
len(paths)

2

In [22]:
print([synset.name() for synset in paths[0]])

['entity.n.01', 'physical_entity.n.01', 'object.n.01', 'whole.n.02', 'artifact.n.01', 'instrumentality.n.03', 'container.n.01', 'wheeled_vehicle.n.01', 'self-propelled_vehicle.n.01', 'motor_vehicle.n.01', 'car.n.01']


In [23]:
print([synset.name() for synset in paths[1]])

['entity.n.01', 'physical_entity.n.01', 'object.n.01', 'whole.n.02', 'artifact.n.01', 'instrumentality.n.03', 'conveyance.n.03', 'vehicle.n.01', 'wheeled_vehicle.n.01', 'self-propelled_vehicle.n.01', 'motor_vehicle.n.01', 'car.n.01']
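To see why there are exactly two paths, it helps to rebuild them by hand. The sketch below encodes only the hypernym links visible in the two printed paths as a plain dictionary (`TOY_HYPERNYMS`, an assumption-level stand-in for WordNet) and enumerates every route to the root. The function is named after the NLTK method but is an independent toy implementation.

```python
# Each synset name maps to its immediate hypernyms (parents).
# Only the links visible in the two printed paths are included.
TOY_HYPERNYMS = {
    "car.n.01": ["motor_vehicle.n.01"],
    "motor_vehicle.n.01": ["self-propelled_vehicle.n.01"],
    "self-propelled_vehicle.n.01": ["wheeled_vehicle.n.01"],
    "wheeled_vehicle.n.01": ["container.n.01", "vehicle.n.01"],  # two parents
    "container.n.01": ["instrumentality.n.03"],
    "vehicle.n.01": ["conveyance.n.03"],
    "conveyance.n.03": ["instrumentality.n.03"],
    "instrumentality.n.03": ["artifact.n.01"],
    "artifact.n.01": ["whole.n.02"],
    "whole.n.02": ["object.n.01"],
    "object.n.01": ["physical_entity.n.01"],
    "physical_entity.n.01": ["entity.n.01"],
    "entity.n.01": [],
}

def hypernym_paths(name, hypernyms):
    """Return every path from a root down to `name`, root first."""
    parents = hypernyms.get(name, [])
    if not parents:
        return [[name]]  # a root has exactly one path: itself
    paths = []
    for parent in parents:
        for path in hypernym_paths(parent, hypernyms):
            paths.append(path + [name])
    return paths

toy_paths = hypernym_paths("car.n.01", TOY_HYPERNYMS)
print(len(toy_paths))
print([len(path) for path in toy_paths])
```

The single synset with two parents, wheeled_vehicle.n.01, is what doubles the number of paths; every synset below it inherits both routes.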


### More Lexical Relations: Meronyms, Holonyms, Antonyms, Entailment

Hypernyms and hyponyms are called lexical relations because they relate one synset to another. These two relations navigate up and down the "is-a" hierarchy. Another important way to navigate the WordNet network is from items to their components (**meronyms**) or to the things they are contained in (**holonyms**). For example, the parts of a tree are its trunk, crown, and so on; these are its part_meronyms(). The substances a tree is made of include heartwood and sapwood; these are its substance_meronyms(). A collection of trees forms a forest, which is one of the tree's member_holonyms():

In [27]:
wn.synset('tree.n.01').part_meronyms()

[Synset('burl.n.02'),
 Synset('crown.n.07'),
 Synset('limb.n.02'),
 Synset('stump.n.01'),
 Synset('trunk.n.01')]

In [28]:
wn.synset('tree.n.01').member_holonyms()

[Synset('forest.n.01')]

In [29]:
wn.synset('tree.n.01').substance_meronyms()

[Synset('heartwood.n.01'), Synset('sapwood.n.01')]

In [30]:
wn.synsets("USA")

[Synset('united_states.n.01'), Synset('united_states_army.n.01')]

In [31]:
wn.synset('united_states.n.01').part_meronyms()

[Synset('alabama.n.01'),
 Synset('alaska.n.01'),
 Synset('american_state.n.01'),
 Synset('arizona.n.01'),
 Synset('arkansas.n.01'),
 Synset('california.n.01'),
 Synset('colony.n.03'),
 Synset('colorado.n.01'),
 Synset('connecticut.n.01'),
 Synset('connecticut.n.02'),
 Synset('dakota.n.02'),
 Synset('delaware.n.04'),
 Synset('district_of_columbia.n.01'),
 Synset('east.n.03'),
 Synset('florida.n.01'),
 Synset('georgia.n.01'),
 Synset('great_lakes.n.01'),
 Synset('hawaii.n.01'),
 Synset('idaho.n.01'),
 Synset('illinois.n.01'),
 Synset('indiana.n.01'),
 Synset('iowa.n.02'),
 Synset('kansas.n.01'),
 Synset('kentucky.n.01'),
 Synset('louisiana.n.01'),
 Synset('louisiana_purchase.n.01'),
 Synset('maine.n.01'),
 Synset('maryland.n.01'),
 Synset('massachusetts.n.01'),
 Synset('michigan.n.01'),
 Synset('mid-atlantic_states.n.01'),
 Synset('midwest.n.01'),
 Synset('minnesota.n.01'),
 Synset('mississippi.n.01'),
 Synset('mississippi.n.02'),
 Synset('missouri.n.01'),
 Synset('missouri.n.02'),
 ...

To see just how intricate things can get, consider the word mint, which has several closely-related senses. We can see that mint.n.04 is part of mint.n.02 and the substance from which mint.n.05 is made.

In [32]:
for synset in wn.synsets('mint', wn.NOUN):
    print(synset.name() + ':', synset.definition())

batch.n.02: (often followed by `of') a large number or amount or extent
mint.n.02: any north temperate plant of the genus Mentha with aromatic leaves and small mauve flowers
mint.n.03: any member of the mint family of plants
mint.n.04: the leaves of a mint plant used fresh or candied
mint.n.05: a candy that is flavored with a mint oil
mint.n.06: a plant where money is coined by authority of the government


In [33]:
wn.synset('mint.n.04').part_holonyms()

[Synset('mint.n.02')]

In [34]:
wn.synset('mint.n.04').substance_holonyms()

[Synset('mint.n.05')]
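Meronymy and holonymy are two directions of the same link: if X is a part of Y, then Y is a part holonym of X. The sketch below illustrates this by inverting a small part-meronym table. The table is a toy stand-in built from synset names shown above, and it assumes that mint.n.02 lists mint.n.04 among its part meronyms, which is the mirror image of the `part_holonyms()` result; `invert_relation` is a hypothetical helper.

```python
# Toy part-meronym table: whole -> list of parts.
TOY_PART_MERONYMS = {
    "tree.n.01": ["burl.n.02", "crown.n.07", "limb.n.02",
                  "stump.n.01", "trunk.n.01"],
    "mint.n.02": ["mint.n.04"],  # assumed mirror of the part_holonyms() result
}

def invert_relation(relation):
    """Turn a whole -> parts mapping into a part -> wholes mapping."""
    inverse = {}
    for whole, parts in relation.items():
        for part in parts:
            inverse.setdefault(part, []).append(whole)
    return inverse

toy_part_holonyms = invert_relation(TOY_PART_MERONYMS)
print(toy_part_holonyms["mint.n.04"])
print(toy_part_holonyms["trunk.n.01"])
```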

### Exercise with Meronyms

* Find the meronyms of the human body
* Iterate so that you can find all the meronyms of the meronyms, and so on


In [28]:
# Get the synset for "human"

# Select the right synset, and then get the meronyms for that synset using the part_meronyms() function

# Repeat the process for the returned synset. You will need to write a function that takes as input
# a synset, gets its meronyms, and processes them again using the same function. This is a technique
# called "recursion"

In [35]:
wn.synsets("human")

[Synset('homo.n.02'),
 Synset('human.a.01'),
 Synset('human.a.02'),
 Synset('human.a.03')]

#### Solution 

In [37]:
human = wn.synset('homo.n.02')
human.part_meronyms()

[Synset('arm.n.01'),
 Synset('body_hair.n.01'),
 Synset('face.n.01'),
 Synset('foot.n.01'),
 Synset('hand.n.01'),
 Synset('human_body.n.01'),
 Synset('human_head.n.01'),
 Synset('loin.n.02'),
 Synset('mane.n.02')]

In [38]:
def find_meronyms(synset):
    """Recursively collect the part meronyms of a synset, at every depth."""
    result = []
    for part in synset.part_meronyms():
        # Record the direct part itself
        result.append(part)
        # Recurse to collect the parts of that part, and so on;
        # a synset without meronyms contributes nothing and ends the recursion
        result.extend(find_meronyms(part))
    return result

human_parts = set(find_meronyms(human))
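The recursive `find_meronyms` can return the same part more than once when it is reachable by two routes, which is why the result is wrapped in `set()`. As an alternative sketch, the traversal itself can keep track of what it has already visited, which also protects against cycles. The example uses a toy dictionary with made-up part links (`TOY_PARTS` is hypothetical, not WordNet data) so that it runs on its own; `collect_parts` is likewise a hypothetical helper.

```python
# Toy part-of links; "face" is deliberately reachable by two routes.
TOY_PARTS = {
    "human": ["arm", "face", "human_head"],
    "human_head": ["face"],
    "arm": ["hand", "elbow"],
    "hand": ["finger"],
    "face": ["chin"],
}

def collect_parts(root, parts):
    """Return every part below `root` exactly once, without recursion."""
    seen = {root}      # the root itself is never reported as a part
    order = []
    stack = [root]
    while stack:
        current = stack.pop()
        for part in parts.get(current, []):
            if part not in seen:
                seen.add(part)
                order.append(part)
                stack.append(part)
    return order

toy_human_parts = collect_parts("human", TOY_PARTS)
print(toy_human_parts)
```

For real synsets, the dictionary lookup `parts.get(current, [])` would be replaced by a call such as `current.part_meronyms()`.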

In [40]:
human_parts_lemmas = set()
for human_part in human_parts:
    for lemma in human_part.lemmas():
        human_parts_lemmas.add(lemma)
        print(lemma)

Lemma('wrist.n.01.wrist')
Lemma('wrist.n.01.carpus')
Lemma('wrist.n.01.wrist_joint')
Lemma('wrist.n.01.radiocarpal_joint')
Lemma('wrist.n.01.articulatio_radiocarpea')
Lemma('visual_purple.n.01.visual_purple')
Lemma('visual_purple.n.01.rhodopsin')
Lemma('visual_purple.n.01.retinal_purple')
Lemma('tastebud.n.01.tastebud')
Lemma('tastebud.n.01.taste_bud')
Lemma('tastebud.n.01.gustatory_organ')
Lemma('lacrimal_apparatus.n.01.lacrimal_apparatus')
Lemma('fingertip.n.01.fingertip')
Lemma('iodopsin.n.01.iodopsin')
Lemma('venae_palpebrales.n.01.venae_palpebrales')
Lemma('lingual_vein.n.01.lingual_vein')
Lemma('lingual_vein.n.01.vena_lingualis')
Lemma('dentition.n.02.dentition')
Lemma('dentition.n.02.teeth')
Lemma('iris.n.02.iris')
Lemma('ocular_muscle.n.01.ocular_muscle')
Lemma('ocular_muscle.n.01.eye_muscle')
Lemma('olecranon.n.01.olecranon')
Lemma('olecranon.n.01.olecranon_process')
Lemma('chin.n.01.chin')
Lemma('chin.n.01.mentum')
Lemma('deltoid_tuberosity.n.01.deltoid_tuberosity')
...

In [41]:
for lemma in human_parts_lemmas:
    print(lemma.name())

retina
tongue
face_fungus
choroid
internasal_suture
facial
musculus_anconeus
nervus_facialis
rima_oris
anatomy
facial_nerve
buccinator_muscle
vena_metatarsus
carpus
human_foot
anterior_naris
arteria_metatarsea
bridge
lip
arteria_lacrimalis
lid
hallux
canthus
eyelash
ulnar_vein
cone_cell
ciliary_artery
vena_lacrimalis
elbow
central_artery_of_the_retina
lens_cortex
arteria_buccalis
ocular_muscle
os_tarsi_fibulare
hand
palpebra
arteria_metacarpea
lens_capsule
carpal_bone
yellow_spot
lingual_vein
arteria_labialis
intercapitular_vein
seventh_cranial_nerve
crinion
epicanthic_fold
form
tendon_of_Achilles
lacrimal_sac
gustatory_organ
arteria_ethmoidalis
eyebrow
third_eyelid
buccal_cavity
Achilles_tendon
humerus
chin
little_toe
sutura_internasalis
turbinal
choroid_coat
oculus
taste_cell
arm
rod_cell
moustache
head_of_hair
wrist_bone
iris
olecranon_process
tear_sac
musculus_biceps_brachii
lacrimal_artery
rhodopsin
pate
musculus_sphincter_pupillae
salivary_gland
macular_area
shape
saliva
...

### Entailment

There are also relationships between verbs. For example, the act of walking involves the act of stepping, so walking **entails** stepping. Some verbs have multiple entailments:

In [42]:
wn.synset('walk.v.01').entailments()

[Synset('step.v.01')]

In [43]:
wn.synset('eat.v.01').entailments()

[Synset('chew.v.01'), Synset('swallow.v.01')]

In [44]:
wn.synset('tease.v.03').entailments()

[Synset('arouse.v.07'), Synset('disappoint.v.01')]

### Antonyms

Some lexical relationships hold between lemmas, e.g., **antonymy**:

In [45]:
wn.lemma('supply.n.02.supply').antonyms()

[Lemma('demand.n.02.demand')]

In [46]:
wn.lemma('rush.v.01.rush').antonyms()

[Lemma('linger.v.04.linger')]

In [47]:
wn.lemma('horizontal.a.01.horizontal').antonyms()

[Lemma('vertical.a.01.vertical'), Lemma('inclined.a.02.inclined')]

In [48]:
wn.lemma('staccato.r.01.staccato').antonyms()

[Lemma('legato.r.01.legato')]

You can see the (numerous!) lexical relations, and the other methods defined on a synset, using dir(), for example: dir(wn.synset('harmony.n.02')).

In [49]:
print(dir(wn.synset('harmony.n.02')))

['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__slots__', '__str__', '__subclasshook__', '__unicode__', '__weakref__', '_all_hypernyms', '_definition', '_examples', '_frame_ids', '_hypernyms', '_instance_hypernyms', '_iter_hypernym_lists', '_lemma_names', '_lemma_pointers', '_lemmas', '_lexname', '_max_depth', '_min_depth', '_name', '_needs_root', '_offset', '_pointers', '_pos', '_related', '_shortest_hypernym_paths', '_wordnet_corpus_reader', 'also_sees', 'attributes', 'causes', 'closure', 'common_hypernyms', 'definition', 'entailments', 'examples', 'frame_ids', 'hypernym_distances', 'hypernym_paths', 'hypernyms', 'hyponyms', 'instance_hypernyms', 'instance_hyponyms', 'jcn_similarity', 'lch_similarity', 'lemma_names', 'lemmas', ...

### Semantic similarity

We have seen that synsets are linked by a complex network of lexical relations. Given a particular synset, we can traverse the WordNet network to find synsets with related meanings. Knowing which words are semantically related is useful for indexing a collection of texts, so that a search for a general term like vehicle will match documents containing specific terms like limousine.

Recall that each synset has one or more hypernym paths that link it to a root hypernym such as entity.n.01. Two synsets linked to the same root may have several hypernyms in common. If two synsets share a very specific hypernym — one that is low down in the hypernym hierarchy — they must be closely related.

In [52]:
right = wn.synset('right_whale.n.01')
orca = wn.synset('orca.n.01')
minke = wn.synset('minke_whale.n.01')
tortoise = wn.synset('tortoise.n.01')
novel = wn.synset('novel.n.01')

In [53]:
right.lowest_common_hypernyms(minke)

[Synset('baleen_whale.n.01')]

In [54]:
right.lowest_common_hypernyms(orca)

[Synset('whale.n.02')]

In [55]:
right.lowest_common_hypernyms(tortoise)

[Synset('vertebrate.n.01')]

In [56]:
right.lowest_common_hypernyms(novel)

[Synset('entity.n.01')]
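The idea behind these results can be sketched on a toy tree. In the example below, `TOY_PARENT` is a heavily compressed, single-parent stand-in for the WordNet hierarchy: intermediate levels are omitted, so it reproduces which hypernym is shared but not the real depths. The function walks upward from the second concept and returns the first ancestor it shares with the first concept; both helper names are hypothetical.

```python
# Compressed toy hierarchy: child -> single parent (None marks the root).
TOY_PARENT = {
    "right_whale": "baleen_whale",
    "minke_whale": "baleen_whale",
    "baleen_whale": "whale",
    "orca": "whale",
    "whale": "vertebrate",
    "tortoise": "vertebrate",
    "vertebrate": "entity",
    "novel": "entity",
    "entity": None,
}

def ancestors(name, parent):
    """Return `name` followed by its hypernyms, ordered bottom-up."""
    chain = []
    while name is not None:
        chain.append(name)
        name = parent[name]
    return chain

def lowest_common_hypernym(a, b, parent):
    """Return the deepest concept that lies above both `a` and `b`."""
    above_a = set(ancestors(a, parent))
    # Walking upward from b, the first shared concept is the lowest one.
    for candidate in ancestors(b, parent):
        if candidate in above_a:
            return candidate
    return None

for other in ["minke_whale", "orca", "tortoise", "novel"]:
    print(other, lowest_common_hypernym("right_whale", other, TOY_PARENT))
```

Real WordNet synsets can have several parents, so NLTK returns a list of lowest common hypernyms rather than a single value.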

Of course we know that whale is very specific (and baleen whale even more so), while vertebrate is more general and entity is completely general. We can quantify this concept of generality by looking up the depth of each synset:

In [57]:
rwhale_minke = right.lowest_common_hypernyms(minke)
rwhale_minke[0].min_depth()

14

In [58]:
rwhale_orca = right.lowest_common_hypernyms(orca)
rwhale_orca[0].min_depth()

13

In [59]:
rwhale_tortoise = right.lowest_common_hypernyms(tortoise)
rwhale_tortoise[0].min_depth()

8

In [60]:
rwhale_novel = right.lowest_common_hypernyms(novel)
rwhale_novel[0].min_depth()

0

Similarity measures have been defined over the collection of WordNet synsets which incorporate the above insight. For example, path_similarity assigns a score in the range 0–1 based on the shortest path that connects the concepts in the hypernym hierarchy (in current NLTK releases, None is returned in those cases where a path cannot be found). Comparing a synset with itself will return 1. Consider the following similarity scores, relating right whale to minke whale, orca, tortoise, and novel. Although the absolute numbers don't mean much on their own, they decrease as we move away from the semantic space of sea creatures to inanimate objects.

In [61]:
print("Right whale - Minke :", right.path_similarity(minke))
print("Right whale - Orca :", right.path_similarity(orca))
print("Right whale - Tortoise :", right.path_similarity(tortoise))
print("Right whale - Novel :", right.path_similarity(novel))

Right whale - Minke : 0.25
Right whale - Orca : 0.16666666666666666
Right whale - Tortoise : 0.07692307692307693
Right whale - Novel : 0.043478260869565216
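The scores above are easier to read once they are converted back into path lengths. The sketch below assumes that the score is computed as 1 / (d + 1), where d is the number of edges on the shortest connecting path; this assumption is consistent with every value printed above and with a self-comparison scoring 1. Both function names are hypothetical helpers, not part of NLTK.

```python
def path_similarity_from_distance(distance):
    """Score for two concepts that are `distance` edges apart."""
    return 1.0 / (distance + 1)

def distance_from_path_similarity(score):
    """Invert the formula: recover the edge count from a score."""
    return round(1.0 / score - 1)

# Scores copied from the output above.
scores = {
    "minke": 0.25,
    "orca": 0.16666666666666666,
    "tortoise": 0.07692307692307693,
    "novel": 0.043478260869565216,
}

for name, score in scores.items():
    print(name, distance_from_path_similarity(score))
```

Read this way, the right whale is 3 edges from the minke whale, 5 from the orca, 12 from the tortoise, and 22 from the novel.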
