This notebook explores retrofitted word vectors: vectors that have been transformed to respect proximity constraints in resources like WordNet, so that words in the same synset are encouraged to have similar representations.  See [Faruqui et al. 2015](https://github.com/mfaruqui/retrofitting).
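
To make the idea concrete, here is a minimal pure-Python sketch of the kind of update retrofitting performs. It assumes the weighting described as the default in Faruqui et al. 2015, under which each word's new vector is the average of its original vector and the mean of its lexicon neighbors' current vectors. The function name `retrofit_sketch`, the helper `cosine`, and the three toy vectors are invented for illustration; this is not the code used to produce the vectors in this notebook.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def retrofit_sketch(vectors, lexicon, n_iters=10):
    """Toy retrofitting (hypothetical helper, not the authors' script).

    vectors: dict mapping word -> list of floats (original embeddings)
    lexicon: dict mapping word -> list of related words (e.g. synonyms)
    Returns a new dict; the input vectors are left unchanged.
    """
    new = {word: list(vec) for word, vec in vectors.items()}
    for _ in range(n_iters):
        for word, related in lexicon.items():
            if word not in vectors:
                continue
            # Only neighbors that actually have a vector can pull on this word.
            nbrs = [n for n in related if n in new and n != word]
            if not nbrs:
                continue
            k = len(nbrs)
            # Average of the original vector and the mean neighbor vector:
            # (sum of neighbors + k * original) / (2 * k)
            new[word] = [
                (sum(new[n][d] for n in nbrs) + k * vectors[word][d]) / (2 * k)
                for d in range(len(vectors[word]))
            ]
    return new


# Made-up 2-d vectors, chosen only to show the effect.
toy_vectors = {"hate": [1.0, 0.0], "detest": [0.0, 1.0], "love": [-1.0, 0.0]}
toy_lexicon = {"hate": ["detest"], "detest": ["hate"]}

toy_retrofitted = retrofit_sketch(toy_vectors, toy_lexicon, n_iters=10)
print(cosine(toy_vectors["hate"], toy_vectors["detest"]))
print(cosine(toy_retrofitted["hate"], toy_retrofitted["detest"]))
```

In this toy setup the two synonyms start out orthogonal and end up much closer together, while "love", which has no lexicon entry, keeps its original vector.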

In [None]:
# glove2word2vec rewrites a GloVe text file in word2vec text format
from gensim.scripts.glove2word2vec import glove2word2vec
from gensim.models import KeyedVectors

In [None]:
# Convert the original GloVe vectors to word2vec text format so gensim can load them
glove_file = "../data/glove.42B.300d.50K.txt"
original_file = "../data/glove.42B.300d.50K.w2v.txt"
_ = glove2word2vec(glove_file, original_file)

Download vectors that have already been retrofitted [here](https://drive.google.com/file/d/1sr0xEUzlLtjbrs0NY4-vek60SY7Gk9bQ/view?usp=sharing).  These vectors were produced with the code of [Faruqui et al. 2015](https://github.com/mfaruqui/retrofitting), using its WordNet synonym lexicon:

```sh
python retrofit.py -i glove.42B.300d.50K.txt -l lexicons/wordnet-synonyms.txt -n 10 -o glove.42B.300d.50K.txt.retrofit
```

In [None]:
# Convert the retrofitted vectors the same way (glove_file is reused for the input path)
glove_file = "../data/glove.42B.300d.50K.txt.retrofit"
retrofit_file = "../data/glove.42B.300d.50K.w2v.txt.retrofit"
_ = glove2word2vec(glove_file, retrofit_file)

In [None]:
original = KeyedVectors.load_word2vec_format(original_file, binary=False)

In [None]:
retrofit = KeyedVectors.load_word2vec_format(retrofit_file, binary=False)

Explore these two sets of vectors to see how the nearest neighbors of the retrofitted vectors reflect the similarity encoded in WordNet.

In [None]:
original.most_similar("hate", topn=10)

In [None]:
retrofit.most_similar("hate", topn=10)
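
When comparing the two neighbor lists by eye gets tedious, a small helper can summarize what changed. The sketch below assumes its inputs are lists of `(word, similarity)` pairs, which is the shape `most_similar` returns; the name `compare_neighbors` and the placeholder lists are hypothetical and are not real model output.

```python
def compare_neighbors(before, after):
    """Summarize how a word's nearest neighbors changed.

    before, after: lists of (word, similarity) pairs.
    Returns a dict of word lists, each kept in its original rank order.
    """
    before_words = [word for word, _ in before]
    after_words = [word for word, _ in after]
    before_set, after_set = set(before_words), set(after_words)
    return {
        "shared": [w for w in before_words if w in after_set],
        "dropped": [w for w in before_words if w not in after_set],
        "gained": [w for w in after_words if w not in before_set],
    }


# Made-up placeholder lists, only to show the shape of the result.
toy_before = [("w1", 0.9), ("w2", 0.8), ("w3", 0.7)]
toy_after = [("w2", 0.85), ("w4", 0.8), ("w1", 0.6)]

changes = compare_neighbors(toy_before, toy_after)
print(changes)
```

With the models loaded above, the equivalent call would be `compare_neighbors(original.most_similar("hate", topn=10), retrofit.most_similar("hate", topn=10))`; the words under `gained` are the ones to check against WordNet synonyms.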