# Ontology and Taxonomy

## Table of Contents

* [Install WordNet](#Install-WordNet)
* [Synsets](#Synsets)
* [Lemmas](#Lemmas)
* [Lowest Common Hypernyms](#Lowest-Common-Hypernyms)

## Reference

* https://www.nltk.org/howto/wordnet.html

## Install WordNet

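If the download fails with an SSL (Secure Sockets Layer) certificate error, disable certificate verification for this session. Note that this workaround skips verification rather than performing it: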

In [30]:
import ssl

try:
    _create_unverified_https_context = ssl._create_unverified_context
except AttributeError:
    pass
else:
    ssl._create_default_https_context = _create_unverified_https_context

Install WordNet from [NLTK](https://www.nltk.org):

In [31]:
import nltk
nltk.download('wordnet')

[nltk_data] Downloading package wordnet to /Users/jdchoi/nltk_data...
[nltk_data]   Unzipping corpora/wordnet.zip.


True

## Synsets

Retrieve all synsets of a word as the list of [Synset](https://www.nltk.org/api/nltk.corpus.reader.html?highlight=synset#nltk.corpus.reader.wordnet.Synset):

In [32]:
from nltk.corpus import wordnet as wn
wn.synsets('dog')

[Synset('dog.n.01'),
 Synset('frump.n.01'),
 Synset('dog.n.03'),
 Synset('cad.n.01'),
 Synset('frank.n.02'),
 Synset('pawl.n.01'),
 Synset('andiron.n.01'),
 Synset('chase.v.01')]

You can specify the part-of-speech (tag) of the word:

* `n`: noun
* `v`: verb
* `a`: adjective
* `r`: adverb

In [49]:
dogs = wn.synsets('dog', pos='n')
dogs

[Synset('dog.n.01'),
 Synset('frump.n.01'),
 Synset('dog.n.03'),
 Synset('cad.n.01'),
 Synset('frank.n.02'),
 Synset('pawl.n.01'),
 Synset('andiron.n.01')]

Retrieve the synset directly from the sense ID:

In [46]:
dog_0 = wn.synset('dog.n.01')

# Only the last expression is displayed; the first comparison evaluates to True.
dog_0 == dogs[0]
dog_0 == dogs[1]

False

Retrieve the direct hyponyms:

In [51]:
dog_0.hyponyms()

[Synset('basenji.n.01'),
 Synset('corgi.n.01'),
 Synset('cur.n.01'),
 Synset('dalmatian.n.02'),
 Synset('great_pyrenees.n.01'),
 Synset('griffon.n.02'),
 Synset('hunting_dog.n.01'),
 Synset('lapdog.n.01'),
 Synset('leonberg.n.01'),
 Synset('mexican_hairless.n.01'),
 Synset('newfoundland.n.01'),
 Synset('pooch.n.01'),
 Synset('poodle.n.01'),
 Synset('pug.n.01'),
 Synset('puppy.n.01'),
 Synset('spitz.n.01'),
 Synset('toy_dog.n.01'),
 Synset('working_dog.n.01')]

Retrieve the direct hypernyms:

In [50]:
dog_0.hypernyms()

[Synset('canine.n.02'), Synset('domestic_animal.n.01')]

Retrieve all hypernym paths from the root to the synset, which include its indirect hypernyms:

In [94]:
dog_0.hypernym_paths()

[[Synset('entity.n.01'),
  Synset('physical_entity.n.01'),
  Synset('object.n.01'),
  Synset('whole.n.02'),
  Synset('living_thing.n.01'),
  Synset('organism.n.01'),
  Synset('animal.n.01'),
  Synset('chordate.n.01'),
  Synset('vertebrate.n.01'),
  Synset('mammal.n.01'),
  Synset('placental.n.01'),
  Synset('carnivore.n.01'),
  Synset('canine.n.02'),
  Synset('dog.n.01')],
 [Synset('entity.n.01'),
  Synset('physical_entity.n.01'),
  Synset('object.n.01'),
  Synset('whole.n.02'),
  Synset('living_thing.n.01'),
  Synset('organism.n.01'),
  Synset('animal.n.01'),
  Synset('domestic_animal.n.01'),
  Synset('dog.n.01')]]
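
To see why `hypernym_paths()` returns two lists for `dog.n.01`, the sketch below enumerates root-to-node paths over a tiny hand-written taxonomy. The `TOY_PARENTS` dictionary and the `toy_hypernym_paths` helper are illustrative stand-ins rather than NLTK objects, and the taxonomy is heavily abbreviated from the output above.

```python
from typing import Dict, List

# Toy taxonomy (an assumption for illustration): node -> direct hypernyms.
TOY_PARENTS: Dict[str, List[str]] = {
    'entity': [],
    'animal': ['entity'],
    'carnivore': ['animal'],
    'canine': ['carnivore'],
    'domestic_animal': ['animal'],
    'dog': ['canine', 'domestic_animal'],
}

def toy_hypernym_paths(node: str, parents: Dict[str, List[str]]) -> List[List[str]]:
    """Return every path from a root (a node without hypernyms) down to `node`."""
    if not parents[node]:
        return [[node]]
    paths = []
    for parent in parents[node]:
        # Extend each path that reaches the parent with the current node.
        for path in toy_hypernym_paths(parent, parents):
            paths.append(path + [node])
    return paths

for path in toy_hypernym_paths('dog', TOY_PARENTS):
    print(' -> '.join(path))
```

A node with two direct hypernyms yields at least two paths, which is the situation `dog.n.01` is in with `canine.n.02` and `domestic_animal.n.01`.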

## Lemmas

Retrieve all [Lemmas](https://www.nltk.org/api/nltk.corpus.reader.html?highlight=lemma#nltk.corpus.reader.wordnet.Lemma) (synonyms) of the Synset:

In [61]:
dog_0_lemmas = dog_0.lemmas()
dog_0_lemmas

[Lemma('dog.n.01.dog'),
 Lemma('dog.n.01.domestic_dog'),
 Lemma('dog.n.01.Canis_familiaris')]

Retrieve only the names:

In [63]:
for l in dog_0_lemmas:
    print(l.name())

dog
domestic_dog
Canis_familiaris
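
Lemma names use underscores in place of spaces, as in `domestic_dog` above. A minimal sketch for turning them into display strings follows; `readable` is a hypothetical helper, not an NLTK function, and the names are copied from the output above.

```python
def readable(lemma_name: str) -> str:
    """Replace the underscores in a lemma name with spaces for display."""
    return lemma_name.replace('_', ' ')

names = ['dog', 'domestic_dog', 'Canis_familiaris']
print([readable(name) for name in names])
```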


Retrieve the lemma directly from the sense ID:

In [71]:
dog_0_lemma = wn.lemma('dog.n.01.dog')
l = dog_0_lemma

Retrieve the frequency of the lemma:

In [73]:
for l in dog_0.lemmas():
    print(l.name(), l.count())

dog 42
domestic_dog 0
Canis_familiaris 0


### Exercise

Write a function that takes a word, an optional POS tag, and a minimum frequency count, and returns the set of all synonyms of the word whose lemma count is at least that minimum:

```python
def synonyms(word: str, pos: Optional[str]=None, count: Optional[int]=0) -> Set[str]:
    # To be filled
```

In [76]:
from typing import Set, Optional

def synonyms(word: str, pos: Optional[str]=None, count: Optional[int]=0) -> Set[str]:
    syns = set()

    for synset in wn.synsets(word, pos):
        for lemma in synset.lemmas():
            if lemma.count() >= count:
                syns.add(lemma.name())

    return syns

In [78]:
synonyms('dog', 'n')

{'Canis_familiaris',
 'andiron',
 'blackguard',
 'bounder',
 'cad',
 'click',
 'detent',
 'dog',
 'dog-iron',
 'domestic_dog',
 'firedog',
 'frank',
 'frankfurter',
 'frump',
 'heel',
 'hot_dog',
 'hotdog',
 'hound',
 'pawl',
 'weenie',
 'wiener',
 'wienerwurst'}

In [79]:
synonyms('dog', 'n', 1)

{'dog', 'hound'}

In [80]:
synonyms('dog', count=1)

{'chase', 'dog', 'go_after', 'hound', 'track', 'trail'}

You can also retrieve antonyms of the lemma:

In [86]:
buy = wn.synset('buy.v.01')
for l in buy.lemmas():
    print(l.name(), l.antonyms())

buy [Lemma('sell.v.01.sell')]
purchase []


## Lowest Common Hypernyms

NLTK already provides a method to find the lowest common hypernyms:

In [89]:
dog = wn.synset('dog.n.01')
cat = wn.synset('cat.n.01')
dog.lowest_common_hypernyms(cat)

[Synset('carnivore.n.01')]
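
The idea behind this lookup can be sketched on a toy taxonomy: collect the hypernyms shared by both nodes and keep the deepest ones. The `TOY_TAXONOMY` dictionary and the helper functions are illustrative assumptions, not NLTK's implementation, and depth is measured here as the longest distance from a root.

```python
from typing import Dict, List, Set

# Toy taxonomy (an assumption for illustration): node -> direct hypernyms.
TOY_TAXONOMY: Dict[str, List[str]] = {
    'entity': [],
    'animal': ['entity'],
    'carnivore': ['animal'],
    'canine': ['carnivore'],
    'feline': ['carnivore'],
    'dog': ['canine'],
    'cat': ['feline'],
}

def ancestors(node: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return `node` together with all of its direct and indirect hypernyms."""
    seen = {node}
    stack = [node]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def depth(node: str, parents: Dict[str, List[str]]) -> int:
    """Return the length of the longest path from a root to `node`."""
    if not parents[node]:
        return 0
    return 1 + max(depth(parent, parents) for parent in parents[node])

def toy_lowest_common_hypernyms(a: str, b: str, parents: Dict[str, List[str]]) -> List[str]:
    """Return the deepest nodes that are hypernyms of both `a` and `b`."""
    common = ancestors(a, parents) & ancestors(b, parents)
    if not common:
        return []
    deepest = max(depth(node, parents) for node in common)
    return sorted(node for node in common if depth(node, parents) == deepest)

print(toy_lowest_common_hypernyms('dog', 'cat', TOY_TAXONOMY))
```

The result is a list because several shared hypernyms can sit at the same depth.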

NLTK also provides methods to measure the similarity between two senses:

In [92]:
print(dog.path_similarity(cat))
print(dog.lch_similarity(cat))
print(dog.wup_similarity(cat))

0.2
2.0281482472922856
0.8571428571428571
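
These scores can be reproduced by hand. The sketch below assumes the commonly cited definitions: path similarity is 1 / (d + 1) for the shortest path length d, Leacock-Chodorow similarity is -log((d + 1) / (2D)) for the taxonomy depth D, and Wu-Palmer similarity is 2 * depth(lcs) / (depth(a) + depth(b)). The concrete numbers are assumptions read off or inferred from the outputs in this section, not values queried from NLTK.

```python
import math

# Assumed inputs for dog.n.01 and cat.n.01.
distance = 4         # edges on dog -> canine -> carnivore <- feline <- cat
taxonomy_depth = 19  # assumed maximum depth of the noun taxonomy
lcs_depth = 12       # nodes on the path from entity.n.01 to carnivore.n.01
dog_depth = 14       # nodes on the longest path from entity.n.01 to dog.n.01
cat_depth = 14       # assumed to match dog, since cat is also two edges below carnivore

path_sim = 1 / (distance + 1)
lch_sim = -math.log((distance + 1) / (2 * taxonomy_depth))
wup_sim = 2 * lcs_depth / (dog_depth + cat_depth)

print(path_sim)
print(round(lch_sim, 4))
print(round(wup_sim, 4))
```

Path similarity depends only on the distance between the two senses, while the Wu-Palmer score also rewards pairs whose common hypernym lies deep in the taxonomy.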


### Exercise

Write a function that takes two sense IDs, finds their lowest common hypernyms, and returns the path from the root to each lowest common hypernym:

```python
def lch_paths(sense_0: str, sense_1: str) -> List[List[Synset]]:
    # To be filled
```

In [103]:
from typing import List
from nltk.corpus.reader import Synset

def lch_paths(sense_0: str, sense_1: str) -> List[List[Synset]]:
    synset_0 = wn.synset(sense_0)
    synset_1 = wn.synset(sense_1)
    hypernym_paths_0 = synset_0.hypernym_paths()
    lch = synset_0.lowest_common_hypernyms(synset_1)
    paths = []

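    for hypernym in lch:
        for syn_list in hypernym_paths_0:
            # Index of the common hypernym on this path, or -1 if the path does not pass through it
            i = next((i for i, syn in enumerate(syn_list) if syn == hypernym), -1)
            if i >= 0:
                # Keep the prefix of the path from the root down to the common hypernym
                paths.append(syn_list[:i+1])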

    return paths

In [104]:
paths = lch_paths('dog.n.01', 'cat.n.01')
for path in paths:
    print(' -> '.join([syn.name() for syn in path]))

entity.n.01 -> physical_entity.n.01 -> object.n.01 -> whole.n.02 -> living_thing.n.01 -> organism.n.01 -> animal.n.01 -> chordate.n.01 -> vertebrate.n.01 -> mammal.n.01 -> placental.n.01 -> carnivore.n.01
