# Session 2, part III

- Representing words and meanings
- Language modeling

<img src="images/_99.jpg" width="100%">

Human-annotated ontologies: the case of WordNet
=======

[WordNet®](https://wordnet.princeton.edu/) is a large lexical database of English. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations.

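WordNet superficially resembles a thesaurus, in that it groups words together based on their meanings. However, WordNet differs in some key respects (see the graph below): it interlinks specific senses of words rather than bare word forms, and it labels the semantic relations among words instead of grouping them only by similarity of meaning.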

<img src="images/_11.png" width="100%">

[Fragment of WordNet Concept Hierarchy](http://nltk.sourceforge.net/doc/en/ch01.html)

The anatomy of WordNet
=====================

**Structure**

The main relation among words in WordNet is synonymy, as between the words *shut* and *close* or *car* and *automobile*. Synonyms (words that denote the same concept and are interchangeable in many contexts) are grouped into unordered sets (synsets).

Each of WordNet’s 117 000 synsets is linked to other synsets by means of a small number of 'conceptual relations.' 

Additionally, a synset contains a brief definition ('gloss') and, in most cases, one or more short sentences illustrating the use of the synset members.

Word forms with several distinct meanings are represented in as many distinct synsets.
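The structure described above (synsets of synonymous lemmas, each with a gloss and usage examples, and one synset per sense of a word form) can be sketched as a small in-memory data model. This is a toy illustration, not NLTK's API: the `Synset` class here is a hypothetical stand-in (not NLTK's class of the same name), `index_synsets` and `SYNSETS` are invented for this example, and the glosses are paraphrases rather than WordNet's actual text. In NLTK the corresponding pieces are reached through `wn.synsets()`, `Synset.lemmas()`, `Synset.definition()` and `Synset.examples()`.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Synset:
    """Toy synset: synonymous lemmas plus a gloss and optional usage examples."""
    name: str
    lemmas: frozenset
    gloss: str
    examples: tuple = ()


# Invented entries for illustration; names mimic NLTK's lemma.pos.nn style,
# but the glosses are paraphrases, not WordNet's actual text.
SYNSETS = [
    Synset("car.n.01", frozenset({"car", "automobile"}),
           "a motor vehicle with four wheels", ("she parked the car",)),
    Synset("close.v.01", frozenset({"close", "shut"}),
           "move so that an opening is obstructed", ("close the door",)),
    Synset("near.a.01", frozenset({"near", "close"}),
           "not far away in space or time"),
]


def index_synsets(synsets):
    """Map each word form to every synset (i.e. every sense) it belongs to."""
    index = defaultdict(list)
    for synset in synsets:
        for lemma in synset.lemmas:
            index[lemma].append(synset)
    return index


index = index_synsets(SYNSETS)
for word in ("car", "close"):
    print(word, "->", [s.name for s in index[word]])
# car -> ['car.n.01']
# close -> ['close.v.01', 'near.a.01']
```

In this sketch a form like `close` is just a key that points to one synset per sense; synonymy lives in the synsets, not in the words themselves.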

**Relations**

The most frequently encoded relation among synsets is the super-subordinate relation (also called hyperonymy, hyponymy or ISA relation).

It links more general synsets like {`furniture`, `piece_of_furniture`} (one synset with two member words) to increasingly specific ones like {`bed`} and {`bunkbed`}.

Thus, WordNet states that the category furniture includes bed, which in turn includes bunkbed; conversely, concepts like bed and bunkbed make up the category furniture.
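The furniture example can be sketched as a map from each concept to its direct hypernyms; inverting that map gives the hyponyms, and following hyponym links downward answers "does this category include that concept?". This is a toy sketch in plain Python, not NLTK's API: `HYPERNYMS`, `invert` and `includes` are hypothetical names, and the recursion assumes the hierarchy has no cycles, as an ISA taxonomy should.

```python
from collections import defaultdict

# Toy ISA links from the example above: concept -> its direct hypernyms.
# Lists are used because a real synset may have more than one hypernym.
HYPERNYMS = {"bunkbed": ["bed"], "bed": ["furniture"], "furniture": []}


def invert(relation):
    """Turn a hypernym map into a hyponym map (parent -> direct children)."""
    inverse = defaultdict(list)
    for child, parents in relation.items():
        for parent in parents:
            inverse[parent].append(child)
    return dict(inverse)


HYPONYMS = invert(HYPERNYMS)


def includes(category, concept):
    """True if `concept` sits anywhere below `category` (assumes no cycles)."""
    children = HYPONYMS.get(category, [])
    return concept in children or any(includes(c, concept) for c in children)


print(HYPONYMS)                          # {'bed': ['bunkbed'], 'furniture': ['bed']}
print(includes("furniture", "bunkbed"))  # True
print(includes("bed", "furniture"))      # False
```

The two directions carry the same information: hyponymy is simply hypernymy read backwards, which is why one map can be derived from the other.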

WordNet in action: synonyms
===============

```python
# Import WordNet from nltk.corpus.
# This assumes the 'wordnet' corpus has already been downloaded;
# if not, you can fetch it with:
#   import nltk
#   nltk.download('wordnet')
from nltk.corpus import wordnet as wn

# The part-of-speech tags adopted by NLTK are terse;
# map them to self-explanatory labels ('s' marks a satellite adjective)
poses = {'n': 'noun', 'v': 'verb', 's': 'adj (s)', 'a': 'adj', 'r': 'adv'}

# Print each synset of "mesmerizing" with its part of speech and lemma names
for synset in wn.synsets("mesmerizing"):
    print("{}: {}".format(poses[synset.pos()],
                          ", ".join(l.name() for l in synset.lemmas())))
```

```
verb: magnetize, mesmerize, mesmerise, magnetise, bewitch, spellbind
verb: hypnotize, hypnotise, mesmerize, mesmerise
adj (s): hypnotic, mesmeric, mesmerizing, spellbinding
```
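Note that `mesmerize` appears in two different verb synsets: WordNet keeps those senses apart, whereas a thesaurus-style lookup would merge everything into one bag of "synonyms". A small sketch of that contrast, using the three synsets printed above typed in as plain data (no NLTK call is made here; `senses`, `flat` and `senses_of` are illustrative names):

```python
# The three synsets printed above, as (part of speech, lemmas) pairs.
senses = [
    ("verb", ["magnetize", "mesmerize", "mesmerise", "magnetise", "bewitch", "spellbind"]),
    ("verb", ["hypnotize", "hypnotise", "mesmerize", "mesmerise"]),
    ("adj (s)", ["hypnotic", "mesmeric", "mesmerizing", "spellbinding"]),
]

# Thesaurus-style view: one flat, deduplicated bag of words, senses merged.
flat = sorted({lemma for _, lemmas in senses for lemma in lemmas})

# Sense-aware view: which senses (by position in the list) each lemma belongs to.
senses_of = {}
for i, (_, lemmas) in enumerate(senses):
    for lemma in lemmas:
        senses_of.setdefault(lemma, []).append(i)

print(len(flat))               # 12
print(senses_of["mesmerize"])  # [0, 1]
print(senses_of["bewitch"])    # [0]
```

In the flat view, *bewitch* and *hypnotize* look like synonyms of each other; the sense-aware view shows they only share the ambiguous form *mesmerize*.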


WordNet in action: the hyperonymy relation
===============

```python
from nltk.corpus import wordnet as wn

bear = wn.synset("bear.n.01")
# closure() applies hypernyms() transitively, walking up to the root
hyper = lambda s: s.hypernyms()
list(bear.closure(hyper))
```

```
[Synset('carnivore.n.01'),
 Synset('placental.n.01'),
 Synset('mammal.n.01'),
 Synset('vertebrate.n.01'),
 Synset('chordate.n.01'),
 Synset('animal.n.01'),
 Synset('organism.n.01'),
 Synset('living_thing.n.01'),
 Synset('whole.n.02'),
 Synset('object.n.01'),
 Synset('physical_entity.n.01'),
 Synset('entity.n.01')]
```
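Conceptually, `closure()` keeps applying the given relation and collects every synset it reaches, skipping ones already seen; the output above starts at `carnivore.n.01`, so the starting synset itself is not included. The idea can be sketched as a breadth-first traversal in plain Python. The `closure` function and the `toy_hypernyms` map below are hypothetical stand-ins, not NLTK's implementation; the map simply copies the first few links of the chain printed above.

```python
from collections import deque


def closure(start, relation):
    """Breadth-first transitive closure of `relation`, excluding `start` itself."""
    seen, result = {start}, []
    queue = deque(relation(start))
    while queue:
        node = queue.popleft()
        if node in seen:  # discard repeats and cycles
            continue
        seen.add(node)
        result.append(node)
        queue.extend(relation(node))
    return result


# First links of the bear chain shown above, as a plain dict.
toy_hypernyms = {
    "bear.n.01": ["carnivore.n.01"],
    "carnivore.n.01": ["placental.n.01"],
    "placental.n.01": ["mammal.n.01"],
    "mammal.n.01": ["vertebrate.n.01"],
}

print(closure("bear.n.01", lambda s: toy_hypernyms.get(s, [])))
# ['carnivore.n.01', 'placental.n.01', 'mammal.n.01', 'vertebrate.n.01']
```

The `seen` set matters once a synset has several hypernyms: two paths can lead to the same ancestor, and without it that ancestor would be listed twice.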

So, is WordNet enough to do NLP?
============================

**PROS**

+ it brings machine-usable word meanings (e.g., it can inform our chatbot)
+ great as a resource for research & teaching 

**CONS**

+ bottleneck: WordNet is a hand-annotated resource
+ adapting it requires human labor
  - arguably, it is impossible to keep up to date
  - at the very least, it misses new meanings of words
+ it is subjective
+ it misses nuance (Manning, 2019): e.g., 'proficient' is listed as a synonym for 'good', which is only correct in some contexts
+ it doesn't offer a continuous measure of word similarity