# Task 2: Semantic similarity
## WordNet similarity
Import NLTK and its WordNet interface. If the WordNet data is not installed yet, uncomment the `nltk.download('wordnet')` line to download it.

In [33]:
import nltk
# nltk.download('wordnet')
from nltk.corpus import wordnet as wn

You can get a list of all synsets containing a specific word, optionally restricted to one part of speech.
Each synset exposes its lemmas via `lemmas()`, and each lemma's word form via `name()`.

In [35]:
for s in wn.synsets('cat', pos=wn.NOUN):
    for l in s.lemmas():
        print(l.name(), end='|')
    print(' gloss:', s.definition())

cat|true_cat| gloss: feline mammal usually having thick soft fur and no ability to roar: domestic cats; wildcats
guy|cat|hombre|bozo| gloss: an informal term for a youth or man
cat| gloss: a spiteful woman gossip
kat|khat|qat|quat|cat|Arabian_tea|African_tea| gloss: the leaves of the shrub Catha edulis which are chewed like tobacco or used to make tea; has the effect of a euphoric stimulant
cat-o'-nine-tails|cat| gloss: a whip with nine knotted cords
Caterpillar|cat| gloss: a large tracked vehicle that is propelled by two endless metal belts; frequently used for moving earth in construction and farm work
big_cat|cat| gloss: any of several large cats typically able to roar and living in the wild
computerized_tomography|computed_tomography|CT|computerized_axial_tomography|computed_axial_tomography|CAT| gloss: a method of examining body organs by scanning them with X rays and using a computer to construct a series of cross-sectional scans along a single axis
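
Once you have synsets, you can compare them. NLTK synsets provide a `path_similarity()` method, which scores two synsets by the shortest path between them in the hypernym hierarchy as 1 / (1 + path length). The sketch below illustrates that formula on a tiny hand-made hierarchy, so it runs without the WordNet data. The `HYPERNYM` links and both helper functions are hypothetical and are not part of NLTK.

```python
from collections import deque

# Hypothetical toy hypernym links (child -> parent); NOT real WordNet data.
HYPERNYM = {
    "cat": "feline",
    "feline": "carnivore",
    "dog": "canine",
    "canine": "carnivore",
    "carnivore": "mammal",
    "whale": "mammal",
}


def shortest_path_length(a, b):
    """Return the number of edges on the shortest path from a to b, or None."""
    # Treat the child -> parent links as an undirected graph.
    graph = {}
    for child, parent in HYPERNYM.items():
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set()).add(child)

    # Breadth-first search finds the shortest path in an unweighted graph.
    queue, seen = deque([(a, 0)]), {a}
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def path_similarity(a, b):
    """Score 1 / (1 + shortest path length); None if a and b are unconnected."""
    dist = shortest_path_length(a, b)
    return None if dist is None else 1 / (dist + 1)


print(path_similarity("cat", "cat"))     # identical nodes, 0 edges
print(path_similarity("cat", "feline"))  # 1 edge
print(path_similarity("cat", "dog"))     # cat-feline-carnivore-canine-dog, 4 edges
```

With real data, the analogous call would be made on two synsets returned by `wn.synsets(...)`.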


## Word embeddings
Import the `fasttext` library and load the model. 

*In my experience, it's faster to manually download/unzip the model and provide a path to the local file.*

In [16]:
import fasttext
model = fasttext.load_model("cc.en.300.bin")



Import *cosine distance* from scipy. Note that it returns *1 – cos(u,v)*, since it is a distance (not similarity) measure.

In [27]:
from scipy.spatial.distance import cosine 
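
To make the distance/similarity distinction concrete, here is a minimal pure-Python sketch of both quantities. The function names `cosine_similarity` and `cosine_distance` and the toy vectors are made up for illustration. The sketch is meant to mirror the *1 – cos(u,v)* definition above, not to replace the scipy function.

```python
import math


def cosine_similarity(u, v):
    """cos(u, v): dot product divided by the product of the vector norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_distance(u, v):
    """1 - cos(u, v): 0 for vectors pointing the same way, 1 for orthogonal ones."""
    return 1.0 - cosine_similarity(u, v)


# Toy 2-d vectors, invented for this example.
u = [1.0, 0.0]
v = [0.0, 1.0]  # orthogonal to u
w = [2.0, 0.0]  # same direction as u, different length

print(cosine_similarity(u, w), cosine_distance(u, w))
print(cosine_similarity(u, v), cosine_distance(u, v))
```

Vector length does not affect the result; only the angle between the vectors matters.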

Get vectors for specific words and calculate the cosine similarity between them, i.e. 1 minus the cosine distance.

In [28]:
v1 = model.get_sentence_vector('cat')
v2 = model.get_sentence_vector('dog')

In [29]:
1 - cosine(v1, v2)

0.707861065864563

In [30]:
v1 = model.get_sentence_vector('cat')
v2 = model.get_sentence_vector('rocket')

In [31]:
1 - cosine(v1, v2)

0.1682339906692505

Query the model for nearest neighbors and analogies. The analogy call below can be read as "rome is to italy as ? is to france".

In [38]:
model.get_nearest_neighbors('dog')

[(0.8463464975357056, 'dogs'),
 (0.7873005270957947, 'puppy'),
 (0.7692237496376038, 'pup'),
 (0.7435278296470642, 'canine'),
 (0.733370840549469, 'pet'),
 (0.7326501607894897, 'doggie'),
 (0.7242385745048523, 'dog--'),
 (0.7231176495552063, 'beagle'),
 (0.7229929566383362, 'dachshund'),
 (0.7078613042831421, 'cat')]

In [48]:
model.get_analogies('rome', 'italy', 'france')

[(0.6936634182929993, 'paris'),
 (0.6143798828125, 'france.'),
 (0.6094319224357605, 'strasbourg'),
 (0.6016628742218018, 'versailles'),
 (0.5976198315620422, 'avignon'),
 (0.5809492468833923, 'paris.'),
 (0.5772124528884888, 'montpellier'),
 (0.5710934996604919, 'lyon'),
 (0.57098788022995, 'england'),
 (0.5672321915626526, 'rennes')]
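
The general idea behind both queries can be sketched with plain vector arithmetic: neighbors are the vocabulary words ranked by cosine similarity to a query vector, and an analogy builds that query vector as *a – b + c*. The sketch below is a simplified illustration under assumptions. The 3-d `VECTORS` table is invented, the helpers `nearest` and `analogy` are hypothetical, and fastText's own implementation may differ in details such as vector normalization.

```python
import math

# Hypothetical 3-d toy vectors, invented for illustration only.
VECTORS = {
    "rome":   [1.0, 1.0, 0.0],
    "italy":  [0.0, 1.0, 0.0],
    "paris":  [1.0, 0.0, 1.0],
    "france": [0.0, 0.0, 1.0],
    "lyon":   [0.4, 0.0, 1.0],
    "berlin": [1.0, 0.0, 0.0],
}


def cosine_similarity(u, v):
    """cos(u, v): dot product divided by the product of the vector norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(query, exclude=(), k=3):
    """Return the k (score, word) pairs most similar to the query vector."""
    scored = [
        (cosine_similarity(query, vec), word)
        for word, vec in VECTORS.items()
        if word not in exclude
    ]
    return sorted(scored, reverse=True)[:k]


def analogy(a, b, c, k=3):
    """Rank words by similarity to vec(a) - vec(b) + vec(c), skipping a, b, c."""
    query = [x - y + z for x, y, z in zip(VECTORS[a], VECTORS[b], VECTORS[c])]
    return nearest(query, exclude={a, b, c}, k=k)


print(nearest(VECTORS["paris"], exclude={"paris"}))
print(analogy("rome", "italy", "france"))
```

Excluding the query words keeps trivial answers out of the ranking; in this toy table the analogy query lands on the vector for `paris`.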