## Getting started with Wordnet
Wordnet is a lexical database created at Princeton University. Thanks to its size and features, it is one of the most useful tools you can have in your NLP arsenal.


### Wordnet Structure
You can think of Wordnet as a graph of concepts. The edges between the concepts
describe the relationship between them. A concept node is, in fact, a set of synonyms
representing the same concept, and is called a Synset. The synonyms within the
synset are represented by their Lemma. A Lemma is the base form (the dictionary
form) of the word.
Using this type of structure, we are able to build mechanisms that can reason. Let’s
take an example: I can say either “Bob drives the car” or “Bob drives the automobile”,
both being true because the nouns “car” and “automobile” are synonyms. Also, “Bob
drives the vehicle” can be true since all cars are vehicles.
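
The substitution reasoning above can be sketched without any library. The following is only a minimal illustration, assuming a tiny hand-made graph: the `CONCEPTS` dictionary, its simplified hypernym chain, and the helper names `synsets_of`, `hypernym_closure` and `can_replace` are all hypothetical and are neither Wordnet data nor part of NLTK.

```python
# Toy concept graph (hand-made and heavily simplified; not real Wordnet data).
# Each concept holds its synonym lemmas and its direct, more general concepts.
CONCEPTS = {
    "car": {"lemmas": {"car", "auto", "automobile"}, "hypernyms": {"wheeled_vehicle"}},
    "wheeled_vehicle": {"lemmas": {"wheeled_vehicle"}, "hypernyms": {"vehicle"}},
    "vehicle": {"lemmas": {"vehicle"}, "hypernyms": set()},
}


def synsets_of(word):
    """Return the ids of every concept that lists `word` among its lemmas."""
    return {cid for cid, concept in CONCEPTS.items() if word in concept["lemmas"]}


def hypernym_closure(cid):
    """Return all concepts reachable by following hypernym edges upwards."""
    seen = set()
    stack = list(CONCEPTS[cid]["hypernyms"])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(CONCEPTS[current]["hypernyms"])
    return seen


def can_replace(word, replacement):
    """True if `replacement` is a synonym of `word` or names a more general concept."""
    targets = synsets_of(replacement)
    for cid in synsets_of(word):
        if cid in targets or hypernym_closure(cid) & targets:
            return True
    return False


print(can_replace("car", "automobile"))  # synonyms share one concept
print(can_replace("car", "vehicle"))     # every car is a vehicle
print(can_replace("vehicle", "car"))     # not every vehicle is a car
```

The check is deliberately one-directional: moving up the hierarchy keeps a sentence true, while moving down does not.
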
NLTK provides one of the best interfaces for Wordnet, and it is extremely easy to use.
Before we dive into it, try this exercise: write down all the meanings
of the word “car”.

**Getting started with Wordnet**

In [2]:
from nltk.corpus import wordnet as wn
# Let's look at the various synsets for `car`
# Remember that each synset represents a separate sense of the word `car`
for car in wn.synsets('car'):
    print([l.name() for l in car.lemmas()])
    print(car.definition())
    print()

['car', 'auto', 'automobile', 'machine', 'motorcar']
a motor vehicle with four wheels; usually propelled by an internal combustion engine

['car', 'railcar', 'railway_car', 'railroad_car']
a wheeled vehicle adapted to the rails of railroad

['car', 'gondola']
the compartment that is suspended from an airship and that carries personnel and the cargo and the power plant

['car', 'elevator_car']
where passengers ride up and down

['cable_car', 'car']
a conveyance for passengers or freight on a cable railway



The most important relationships in Wordnet, other than synonymy,
are hyponymy and hypernymy. Hyponyms are more specific concepts, while
hypernyms are more general concepts.
Let’s build a concept tree. To keep it readable, we won’t use all the concepts,
because the full tree would be hard to visualize. I have chosen only 4 hyponyms of the
concept Vehicle and 5 more for its Wheeled-Vehicle hyponym:

**Wordnet concept tree**

In [8]:
import os
import matplotlib as mpl
if os.environ.get('DISPLAY','') == '':
    print('no display found. Using non-interactive Agg backend')
    mpl.use('Agg')
import matplotlib.pyplot as plt
%matplotlib inline

no display found. Using non-interactive Agg backend


This call to matplotlib.use() has no effect because the backend has already
been chosen; matplotlib.use() must be called *before* pylab, matplotlib.pyplot,
or matplotlib.backends is imported for the first time.


In [9]:
import nltk
from nltk.corpus import wordnet as wn
# Let's get the first sense of vehicle
vehicle = wn.synsets('vehicle')[0]
# Let's build a concept tree
t = nltk.Tree(vehicle.name(), children=[
    nltk.Tree(vehicle.hyponyms()[3].name(), children=[]),
    nltk.Tree(vehicle.hyponyms()[4].name(), children=[]),
    nltk.Tree(vehicle.hyponyms()[5].name(), children=[]),
    nltk.Tree(vehicle.hyponyms()[7].name(), children=[
        nltk.Tree(vehicle.hyponyms()[7].hyponyms()[1].name(), children=[]),
        nltk.Tree(vehicle.hyponyms()[7].hyponyms()[3].name(), children=[]),
        nltk.Tree(vehicle.hyponyms()[7].hyponyms()[4].name(), children=[]),
        nltk.Tree(vehicle.hyponyms()[7].hyponyms()[5].name(), children=[]),
        nltk.Tree(vehicle.hyponyms()[7].hyponyms()[6].name(), children=[]),]),])


# t.draw() opens a Tk window, so it needs a graphical display; without one it
# raises "TclError: no display name and no $DISPLAY environment variable".
t.draw()
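
Where no display is available, a plain-text rendering is one way to look at the same kind of hierarchy. The sketch below is only an illustration: `render_tree` is a hypothetical helper, and the nested tuple with its synset names is an assumed stand-in rather than the exact result of the `hyponyms()` queries above.

```python
def render_tree(node, depth=0):
    """Return the lines of an indented text rendering of a (label, children) tuple."""
    label, children = node
    lines = ["    " * depth + label]
    for child in children:
        lines.extend(render_tree(child, depth + 1))
    return lines


# Assumed example hierarchy; the labels are illustrative stand-ins.
concept_tree = (
    "vehicle.n.01",
    [
        ("sled.n.01", []),
        ("rocket.n.01", []),
        ("wheeled_vehicle.n.01", [
            ("bicycle.n.01", []),
            ("car.n.02", []),
        ]),
    ],
)

print("\n".join(render_tree(concept_tree)))
```

Each level of indentation corresponds to one hyponym step, so the root is the most general concept.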

It may not be obvious from the previous queries, but every synset has an associated
part of speech. You can see it in the node labels of the tree, for example vehicle.n.01, where
n stands for noun. We can perform generic queries without specifying the part of
speech like this:

**Querying Wordnet for synsets**

In [10]:
from nltk.corpus import wordnet as wn
print(wn.synsets('fight'))
# [
# Synset('battle.n.01'),
# Synset('fight.n.02'),
# Synset('competitiveness.n.01'),
# Synset('fight.n.04'),
# Synset('fight.n.05'),
# Synset('contend.v.06'),
# Synset('fight.v.02'),
# Synset('fight.v.03'),
# Synset('crusade.v.01')
# ]

[Synset('battle.n.01'), Synset('fight.n.02'), Synset('competitiveness.n.01'), Synset('fight.n.04'), Synset('fight.n.05'), Synset('contend.v.06'), Synset('fight.v.02'), Synset('fight.v.03'), Synset('crusade.v.01')]


We can also perform more specific queries by passing the part of speech, like this:

**Querying Wordnet for synsets and filtering by part of speech**


In [11]:
from nltk.corpus import wordnet as wn
print(wn.synsets('fight', wn.NOUN))
# [
# Synset('battle.n.01'),
# Synset('fight.n.02'),
# Synset('competitiveness.n.01'),
# Synset('fight.n.04'),
# Synset('fight.n.05')
# ]

[Synset('battle.n.01'), Synset('fight.n.02'), Synset('competitiveness.n.01'), Synset('fight.n.04'), Synset('fight.n.05')]


Moreover, we can fetch one specific synset directly by its id, like this:

**Fetching specific synset**

In [12]:
from nltk.corpus import wordnet as wn
# Synset id format = {lemma}.{part_of_speech}.{sense_number}
walk = wn.synset('walk.v.01')
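
The id format described in the comment above can be taken apart with ordinary string operations. This is a small sketch, not an NLTK feature: `SynsetId` and `parse_synset_id` are hypothetical names introduced here for illustration.

```python
from typing import NamedTuple


class SynsetId(NamedTuple):
    lemma: str
    pos: str
    sense: int


def parse_synset_id(name):
    """Split an id such as 'walk.v.01' into lemma, part of speech and sense number."""
    # Split from the right so that a lemma containing periods stays in one piece.
    lemma, pos, sense = name.rsplit(".", 2)
    return SynsetId(lemma, pos, int(sense))


print(parse_synset_id("walk.v.01"))
print(parse_synset_id("competitiveness.n.01"))
```

Note that the lemma in an id names the synset and is not necessarily the word that was queried: the search for `fight` earlier returned ids such as `battle.n.01` and `contend.v.06`.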

### Lemma Operations
Until now, we’ve been looking at relationships between Wordnet synsets, but lemmas have
some interesting properties as well. To begin with, the lemmas in a synset are sorted
by their frequency, like this:

**Lemma operations**

In [13]:
from nltk.corpus import wordnet as wn
talk = wn.synset('talk.v.01')
print({lemma.name(): lemma.count() for lemma in talk.lemmas()})
# {'talk': 108, 'speak': 53}
# Get antonyms for the adjective `beautiful`
beautiful = wn.synset('beautiful.a.01')
print(beautiful.lemmas()[0].antonyms())
# [Lemma('ugly.a.01.ugly')]
# Get the derivationally related forms of a lemma
able = wn.synset('able.a.01')
print(able.lemmas()[0].derivationally_related_forms())
# [Lemma('ability.n.02.ability'), Lemma('ability.n.01.ability')]

{'talk': 108, 'speak': 53}
[Lemma('ugly.a.01.ugly')]
[Lemma('ability.n.02.ability'), Lemma('ability.n.01.ability')]
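
Raw counts are easier to compare once they are turned into shares. The sketch below assumes only the two counts printed above and uses plain Python, so the variable names `counts` and `shares` are illustrative and no Wordnet call is involved.

```python
# Counts copied from the output above.
counts = {"talk": 108, "speak": 53}

total = sum(counts.values())
# Sort from most to least frequent and attach each lemma's share of the total.
shares = [
    (lemma, count, count / total)
    for lemma, count in sorted(counts.items(), key=lambda item: item[1], reverse=True)
]

for lemma, count, share in shares:
    print(f"{lemma}: {count} occurrences ({share:.0%} of the counted uses)")
```

Here `talk` accounts for roughly two thirds of the counted uses of this sense, which matches its position as the first lemma of the synset.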


As we have seen so far, Wordnet is great because it gives us a way of getting
synonyms, antonyms, different senses of a word, related words, how common a
word is, and so on. Let’s go even further and discover some more useful features of
Wordnet.