## Word2vec models

EstNLTK's resources include Word2vec models, which are described here: https://github.com/estnltk/word2vec-models

Use `ResourceView` to get an overview of the models available for download:

In [1]:
from estnltk.resource_utils import ResourceView

ResourceView(name='word2vec')

| name | description | license | downloaded |
|---|---|---|---|
| word2vec_lemmas_cbow_s100_2015-06-21 | word2vec lemma-based embeddings model created by Alexander Tkachenko. More info: https://github.com/estnltk/word2vec-models (size: 174M) | CC BY-SA 4.0 | False |
| word2vec_lemmas_cbow_s200_2015-06-21 | word2vec lemma-based embeddings model created by Alexander Tkachenko. More info: https://github.com/estnltk/word2vec-models (size: 342M) | CC BY-SA 4.0 | False |
| word2vec_lemmas_sg_s100_2015-06-21 | word2vec lemma-based embeddings model created by Alexander Tkachenko. More info: https://github.com/estnltk/word2vec-models (size: 174M) | CC BY-SA 4.0 | True |
| word2vec_lemmas_sg_s200_2015-06-21 | word2vec lemma-based embeddings model created by Alexander Tkachenko. More info: https://github.com/estnltk/word2vec-models (size: 342M) | CC BY-SA 4.0 | True |
| word2vec_words_cbow_s100_2015-06-21 | word2vec word-based embeddings model created by Alexander Tkachenko. More info: https://github.com/estnltk/word2vec-models (size: 322M) | CC BY-SA 4.0 | False |
| word2vec_words_cbow_s200_2015-06-21 | word2vec word-based embeddings model created by Alexander Tkachenko. More info: https://github.com/estnltk/word2vec-models (size: 633M) | CC BY-SA 4.0 | False |
| word2vec_words_sg_s100_2015-06-21 | word2vec word-based embeddings model created by Alexander Tkachenko. More info: https://github.com/estnltk/word2vec-models (size: 322M) | CC BY-SA 4.0 | True |
| word2vec_words_sg_s200_2015-06-21 | word2vec word-based embeddings model created by Alexander Tkachenko. More info: https://github.com/estnltk/word2vec-models (size: 633M) | CC BY-SA 4.0 | True |
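
The model names above appear to follow a fixed pattern: training unit (`lemmas` or `words`), architecture (`cbow` or `sg`, i.e. skip-gram), vector size (`s100` or `s200`) and a date. The sketch below splits a name into those parts. Note that `parse_model_name` is a hypothetical helper, not part of EstNLTK, and reading `s200` as the vector size is an assumption (it does agree with the 200-column matrix reported when the `s200` model is loaded with gensim).

```python
import re

# Pattern inferred from the names listed by ResourceView; this is an
# assumption about the naming scheme, not a documented EstNLTK contract.
MODEL_NAME_RE = re.compile(
    r"^word2vec_(?P<unit>lemmas|words)_(?P<arch>cbow|sg)"
    r"_s(?P<size>\d+)_(?P<date>\d{4}-\d{2}-\d{2})$"
)

def parse_model_name(name):
    """Split a word2vec resource name into its components (hypothetical helper)."""
    match = MODEL_NAME_RE.match(name)
    if match is None:
        raise ValueError(f"Unexpected model name: {name!r}")
    return {
        "unit": match.group("unit"),            # 'lemmas' or 'words'
        "architecture": match.group("arch"),    # 'cbow' or 'sg' (skip-gram)
        "vector_size": int(match.group("size")),
        "date": match.group("date"),
    }

info = parse_model_name("word2vec_lemmas_cbow_s200_2015-06-21")
print(info)
```

A helper like this can be handy for picking a model programmatically, for example selecting all lemma-based models with 100-dimensional vectors.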


### Downloading models

In [1]:
# Download a specific word2vec model
from estnltk import download
download("word2vec_lemmas_cbow_s200_2015-06-21")

Downloading word2vec_lemmas_cbow_s200_2015-06-21: 333MB [00:03, 95.6MB/s] 


True

To download all Word2vec models, use:

```python
download("word2vec", only_latest=False)
```

Use the `get_resource_paths` function to get the path to a downloaded model:

In [2]:
from estnltk import get_resource_paths
get_resource_paths("word2vec_lemmas_cbow_s200_2015-06-21", only_latest=True)

'C:\\Programmid\\Miniconda3\\envs\\py39_devel\\lib\\site-packages\\estnltk-1.7.0-py3.9-win-amd64.egg\\estnltk\\estnltk_resources\\word2vec\\embeddings_2015-06-21\\lemmas.cbow.s200.w2v.bin'

To get the paths to all Word2vec models (returned as a list), use:

```python
get_resource_paths("word2vec", only_latest=False)
```

### Using models

You can use the models via the [gensim package](https://radimrehurek.com/gensim/), which needs to be [installed](https://radimrehurek.com/gensim/index.html#install) separately:

In [3]:
from gensim.models import KeyedVectors

model_path = get_resource_paths("word2vec_lemmas_cbow_s200_2015-06-21", only_latest=True)
model = KeyedVectors.load_word2vec_format(model_path, binary=True)

INFO:keyedvectors.py:2051: loading projection weights from C:\Programmid\Miniconda3\envs\py39_devel\lib\site-packages\estnltk-1.7.0-py3.9-win-amd64.egg\estnltk\estnltk_resources\word2vec\embeddings_2015-06-21\lemmas.cbow.s200.w2v.bin
INFO:utils.py:448: KeyedVectors lifecycle event {'msg': 'loaded (441391, 200) matrix of type float32 from C:\\Programmid\\Miniconda3\\envs\\py39_devel\\lib\\site-packages\\estnltk-1.7.0-py3.9-win-amd64.egg\\estnltk\\estnltk_resources\\word2vec\\embeddings_2015-06-21\\lemmas.cbow.s200.w2v.bin', 'binary': True, 'encoding': 'utf8', 'datetime': '2022-06-30T17:49:15.524810', 'gensim': '4.2.0', 'python': '3.9.12 (main, Apr  4 2022, 05:22:27) [MSC v.1916 64 bit (AMD64)]', 'platform': 'Windows-10-10.0.19043-SP0', 'event': 'load_word2vec_format'}
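
The `(441391, 200)` shape in the log above is the vocabulary size and vector dimensionality. As far as the word2vec file format goes, both numbers are stored as a plain-text header on the first line of the file, so they can be checked without loading the whole model. The following is a minimal sketch: `read_w2v_header` is a hypothetical helper, and the toy file is made up purely so that the example runs without downloading anything.

```python
import os
import struct
import tempfile

def read_w2v_header(path):
    """Return (vocab_size, vector_size) from the first line of a word2vec file."""
    with open(path, "rb") as f:
        header = f.readline().decode("utf8")
    vocab_size, vector_size = (int(field) for field in header.split())
    return vocab_size, vector_size

# Build a tiny stand-in file in word2vec binary layout (toy data, not a real
# model): a header line, then for each word "<word> " followed by float32 values.
with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = os.path.join(tmp_dir, "toy.w2v.bin")
    with open(toy_path, "wb") as f:
        f.write(b"2 3\n")
        for word, vector in [("kass", (0.1, 0.2, 0.3)), ("koer", (0.4, 0.5, 0.6))]:
            f.write(word.encode("utf8") + b" ")
            f.write(struct.pack("<3f", *vector))
            f.write(b"\n")
    shape = read_w2v_header(toy_path)

print(shape)  # prints (2, 3)
```

With a real download, the same function could be pointed at the path returned by `get_resource_paths`.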


In [4]:
model.most_similar('harjumaa')

[('lääne-virumaa', 0.7403180003166199),
 ('järvamaa', 0.7391915321350098),
 ('tartumaa', 0.7278605699539185),
 ('pärnumaa', 0.7234846353530884),
 ('viljandimaa', 0.7169548869132996),
 ('ida-virumaa', 0.7112703919410706),
 ('raplamaa', 0.6816533803939819),
 ('läänemaa', 0.6799672842025757),
 ('jõgevamaa', 0.6791231632232666),
 ('põlvamaa', 0.6766212582588196)]
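
The scores returned by `most_similar` are cosine similarities between the query vector and the other vectors in the model. To illustrate the idea, here is a minimal pure-Python sketch of such a ranking. The three-dimensional `toy_vectors` are invented for the example and have nothing to do with the real embeddings, and the `most_similar` function below is a simplified stand-in, not gensim's implementation.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(vectors, query, topn=10):
    """Rank all words except the query by cosine similarity to the query."""
    query_vec = vectors[query]
    scores = [
        (word, cosine_similarity(query_vec, vec))
        for word, vec in vectors.items()
        if word != query
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]

# Made-up toy vectors: the county names point in similar directions,
# while 'kass' (cat) points elsewhere.
toy_vectors = {
    "harjumaa": [0.9, 0.1, 0.0],
    "tartumaa": [0.8, 0.2, 0.1],
    "pärnumaa": [0.7, 0.3, 0.0],
    "kass": [0.0, 0.1, 0.9],
}

ranking = most_similar(toy_vectors, "harjumaa", topn=3)
for word, score in ranking:
    print(f"{word}: {score:.3f}")
```

In this toy setup the two county names rank above `kass`, mirroring how the real model places other Estonian counties closest to `harjumaa`.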