## TL;DR

- Ways computers represent words
  - WordNet: manual labeling, the traditional method
  - Vector representations
    - One-Hot Vectors
    - Word Vectors

## WordNet
- WordNet is a lexical database of semantic relations between English words, first created by the Cognitive Science Laboratory of Princeton University.
- It covers nouns, verbs, adjectives, and adverbs, but omits prepositions, determiners, and other function words.
- WordNets for other languages exist too.

### WordNet Example

Importing NLTK and downloading the WordNet corpus

In [7]:
import nltk
nltk.download('wordnet')

[nltk_data] Downloading package wordnet to
[nltk_data]     /Users/joohunhyun/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


True

In [8]:
from nltk.corpus import wordnet as wn

print('Synsets for the word "invite" in WordNet:\n\n', wn.synsets('invite'))

Synsets for the word "invite" in WordNet:

 [Synset('invite.n.01'), Synset('invite.v.01'), Synset('invite.v.02'), Synset('tempt.v.03'), Synset('invite.v.04'), Synset('invite.v.05'), Synset('invite.v.06'), Synset('invite.v.07'), Synset('receive.v.05')]


In [9]:
# We can constrain the search by specifying the part of speech
# parts of speech available: ADJ, ADV, ADJ_SAT, NOUN, VERB
# ADJ_SAT: see https://stackoverflow.com/questions/18817396/what-part-of-speech-does-s-stand-for-in-wordnet-synsets

# Way one
print(f'{"-"*20}Way one{"-"*20}')
print('Synsets for the noun "invite" in WordNet:\n\n', wn.synsets('invite', pos=wn.NOUN))

# Way two
print(f'\n\n{"-"*20}Way two{"-"*20}')
# pos: {'n':'noun', 'v':'verb', 's':'adj (s)', 'a':'adj', 'r':'adv'}
print('Synsets for the noun "invite" in WordNet:\n\n', [s for s in wn.synsets('invite') if s.pos()=='n'])


--------------------Way one--------------------
Synsets for the noun "invite" in WordNet:

 [Synset('invite.n.01')]


--------------------Way two--------------------
Synsets for the noun "invite" in WordNet:

 [Synset('invite.n.01')]


In [10]:
# check definition of a synset
print(f'{"-"*20}Definition{"-"*20}')
print('The definition for invite as a noun:\n\n', wn.synset('invite.n.01').definition())

# check the related examples
print(f'\n\n{"-"*20}Examples{"-"*20}')
print('The examples for invite as a noun:\n\n', wn.synset('invite.n.01').examples())

# check the hypernyms
print(f'\n\n{"-"*20}Hypernyms{"-"*20}')
print('The hypernyms for invite as a noun:\n\n', wn.synset('invite.n.01').hypernyms())


--------------------Definition--------------------
The definition for invite as a noun:

 a colloquial expression for invitation


--------------------Examples--------------------
The examples for invite as a noun:

 ["he didn't get no invite to the party"]


--------------------Hypernyms--------------------
The hypernyms for invite as a noun:

 [Synset('invitation.n.01')]


### Limitations
- Requires human labor
  - Impossible to keep every word up to date
- Missing **nuance**
  - "proficient" is listed as a synonym for "good", which holds only in some contexts
- Misses new words
  - badass, nifty, etc.
- Cannot compute word similarity accurately (similarity scores range from 0 to 1)

In [11]:
dog = wn.synset('dog.n.01')
cat = wn.synset('cat.n.01')
print('The path similarity between cat(noun) and dog(noun): ', dog.path_similarity(cat))

The path similarity between cat(noun) and dog(noun):  0.2
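To see where a score like 0.2 comes from: path similarity is based on the shortest path between two senses in the hypernym taxonomy, computed as 1 / (path length + 1). The sketch below reproduces that calculation in plain Python on a hand-written toy taxonomy. `TOY_HYPERNYMS` and the function names are hypothetical and only loosely modeled on WordNet; nothing here is loaded from the real database.

```python
from collections import deque

# Hand-written toy taxonomy (child -> parent). An assumption for
# illustration only; it is not loaded from WordNet.
TOY_HYPERNYMS = {
    "dog": "canine",
    "canine": "carnivore",
    "cat": "feline",
    "feline": "carnivore",
    "carnivore": "animal",
}


def build_undirected_graph(hypernyms):
    """Turn child -> parent links into an undirected adjacency map."""
    graph = {}
    for child, parent in hypernyms.items():
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set()).add(child)
    return graph


def shortest_path_length(graph, start, goal):
    """Breadth-first search; returns the number of edges, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == goal:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def toy_path_similarity(a, b, hypernyms=TOY_HYPERNYMS):
    """Similarity = 1 / (shortest path length + 1), so scores fall in (0, 1]."""
    distance = shortest_path_length(build_undirected_graph(hypernyms), a, b)
    return None if distance is None else 1 / (distance + 1)


# dog -> canine -> carnivore <- feline <- cat is 4 edges, so 1 / (4 + 1).
print(toy_path_similarity("dog", "cat"))  # 0.2
```

Because the score depends only on how many edges separate two senses, it reflects how the taxonomy happens to be drawn rather than how the words are actually used, which is one reason the measure is coarse.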


## Word Vectors