## EstNLTK's resources

As of version 1.7.0, EstNLTK contains tools for automatically downloading resources required by taggers and other tools.
Resources are usually large (model) files that are not included in the package and can be downloaded on demand.

### In a nutshell

To download a resource manually, use the function `download`:

In [1]:
from estnltk import download
# Download models for UDPipeTagger
download('udpipetagger')

Downloading resources index: 3it [00:00, 497.56it/s]
Downloading udpipe_syntax_2021-05-29: 39925it [00:05, 6955.76it/s] 


Unpacked resource into subfolder 'udpipe_syntax/models_2021-05-29/' of the resources dir.


True

The function returns `True` if the download succeeded or if the resource already exists, and `False` otherwise.
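Because `download` reports success through its return value, a setup script can check that value before continuing. The following is a minimal sketch, not part of EstNLTK: `download_all` is a hypothetical helper, and the download function is passed in as an argument so that the helper can be tried out without network access.

```python
def download_all(names, download_fn):
    """Call download_fn for every name and return the names that failed.

    download_all is a hypothetical helper; download_fn is expected to behave
    like estnltk.download, i.e. return True on success (or if the resource
    already exists) and False otherwise.
    """
    failed = []
    for name in names:
        if not download_fn(name):
            failed.append(name)
    return failed


# Intended usage (requires EstNLTK 1.7.0 or later and network access):
#
#   from estnltk import download
#   failed = download_all(['udpipetagger', 'word2vec_sg'], download)
#   if failed:
#       print('Could not download:', failed)
```

An empty list means that every requested resource is available.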

You can download a resource either by its alias (`'udpipetagger'` in the previous example) or by its specific version name (`'udpipe_syntax_2021-05-29'` in the previous example).
Resource names and aliases can be looked up in the resources index JSON file: https://github.com/estnltk/estnltk_resources.
The index file also gives other information about each resource, such as its size, URL, license, and unpacking path relative to the resources directory.

**Downloading all resources.** By default, only one resource, the latest one, is downloaded, even if several resources are available under the given alias.
However, if you set `only_latest=False`, then all resources with the given alias will be downloaded:

In [2]:
# Download all word2vec skip-gram models (alias: 'word2vec_sg')
download('word2vec_sg', only_latest=False)

Downloading word2vec_lemmas_sg_s100_2015-06-21: 165006it [00:21, 7615.01it/s] 
Downloading word2vec_lemmas_sg_s200_2015-06-21: 325253it [00:41, 7862.16it/s] 
Downloading word2vec_words_sg_s100_2015-06-21: 305214it [00:30, 10048.17it/s]
Downloading word2vec_words_sg_s200_2015-06-21: 601937it [01:11, 8434.39it/s] 


True

### Where to find downloaded resources?

EstNLTK provides the function `get_resource_paths`, which returns a list of all paths to downloaded resources associated with the given name or alias:

In [3]:
from estnltk import get_resource_paths
# Get paths to downloaded UDPipeTagger's models
get_resource_paths('udpipetagger')

['C:\\Programmid\\Miniconda3\\envs\\py37_estnltk_neural\\lib\\site-packages\\estnltk\\estnltk_resources\\udpipe_syntax\\models_2021-05-29\\']

In [4]:
# Get paths to downloaded word2vec skip-gram models
get_resource_paths('word2vec_sg')

['C:\\Programmid\\Miniconda3\\envs\\py37_estnltk_neural\\lib\\site-packages\\estnltk\\estnltk_resources\\word2vec\\embeddings_2015-06-21\\lemmas.sg.s100.w2v.bin',
 'C:\\Programmid\\Miniconda3\\envs\\py37_estnltk_neural\\lib\\site-packages\\estnltk\\estnltk_resources\\word2vec\\embeddings_2015-06-21\\lemmas.sg.s200.w2v.bin',
 'C:\\Programmid\\Miniconda3\\envs\\py37_estnltk_neural\\lib\\site-packages\\estnltk\\estnltk_resources\\word2vec\\embeddings_2015-06-21\\words.sg.s100.w2v.bin',
 'C:\\Programmid\\Miniconda3\\envs\\py37_estnltk_neural\\lib\\site-packages\\estnltk\\estnltk_resources\\word2vec\\embeddings_2015-06-21\\words.sg.s200.w2v.bin']

In [6]:
# Get paths to downloaded StanzaTagger's models
get_resource_paths('stanzatagger')

[]

The function returns an empty list if the resource has not been downloaded yet (or if there is no such resource).

If there are multiple versions of the resource, the versions are _sorted by resource dates_: the latest resources come first in the list.

You can request only a single resource (the latest resource) by setting `only_latest=True`:

In [5]:
get_resource_paths('word2vec_sg', only_latest=True)

'C:\\Programmid\\Miniconda3\\envs\\py37_estnltk_neural\\lib\\site-packages\\estnltk\\estnltk_resources\\word2vec\\embeddings_2015-06-21\\lemmas.sg.s100.w2v.bin'

Note that this returns a string instead of a list.
If the requested resource is missing, `None` is returned.
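Since the return type depends on `only_latest` (a list, or a single string or `None`), calling code may want to handle all three shapes uniformly. A minimal sketch of one way to do that follows; `as_path_list` is a hypothetical helper, not an EstNLTK function.

```python
def as_path_list(result):
    """Normalise a get_resource_paths result to a list of path strings.

    Handles the three shapes described above:
      * a list of paths (only_latest=False),
      * a single path string (only_latest=True, resource present),
      * None (only_latest=True, resource missing).
    """
    if result is None:
        return []
    if isinstance(result, str):
        return [result]
    return list(result)


# Intended usage (requires EstNLTK):
#
#   from estnltk import get_resource_paths
#   for path in as_path_list(get_resource_paths('word2vec_sg', only_latest=True)):
#       print(path)
```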

### Where is the resources directory and how to change it?

By default, EstNLTK attempts to download resources into the subdirectory `estnltk_resources` inside the installation directory of the `estnltk` package.
If that fails (e.g. due to insufficient permissions), EstNLTK creates the subdirectory `estnltk_resources` in the [user's home directory](https://docs.python.org/3/library/pathlib.html#pathlib.Path.home) and stores resources there.

If you want to use your own resources location, set the system environment variable `ESTNLTK_RESOURCES` to the full path of the new resources directory.
Note that this must be an existing directory where writing is permitted.
It is advisable to set the environment variable _before_ downloading any resources, so that all resources end up in the same location.
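The two requirements on the directory (it exists and is writable) can be checked before the variable is set. The sketch below sets the variable for the current Python process only; `use_resources_dir` is a hypothetical helper, and it is an assumption that EstNLTK picks up a value set this way, which should hold as long as the variable is set before any resources are looked up or downloaded. Setting the variable in the shell or in the system settings is the approach the text above describes.

```python
import os


def use_resources_dir(path):
    """Point ESTNLTK_RESOURCES at an existing, writable directory.

    Hypothetical helper: validates the directory and sets the variable
    for the current process (and its child processes) only.
    """
    path = os.path.abspath(path)
    if not os.path.isdir(path):
        raise NotADirectoryError(f"{path} is not an existing directory")
    if not os.access(path, os.W_OK):
        raise PermissionError(f"{path} is not writable")
    os.environ['ESTNLTK_RESOURCES'] = path
    return path
```

The helper would be called once at the top of a script, before the first call to `download` or `get_resource_paths`.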

### Removing resources

Use the function `delete_resource` to remove a downloaded resource:

In [7]:
from estnltk.resource_utils import delete_resource
delete_resource('word2vec_words_sg_s100_2015-06-21')

True

The function returns `True` if the deletion was successful.
Note that resources can be deleted only by their specific names, not by their aliases.
E.g. `delete_resource('word2vec_sg')` would not have worked in the previous example.

### Integrating automatic resource downloading (for developers)

If you are creating a tagger that needs external (downloadable) resources, you can use the function `get_resource_paths` with the autodownload option.

Namely, if you set `download_missing=True` and the requested resource has not been downloaded yet, the user is prompted for permission to download the missing resource.
If the user gives permission, the resource is downloaded automatically and its path is returned as a result:

In [8]:
from estnltk.downloader import get_resource_paths
get_resource_paths('word2vec_words_sg_s100_2015-06-21', only_latest=True, download_missing=True)

This requires downloading resource 'word2vec_words_sg_s100_2015-06-21' (size: 322M). Proceed with downloading? [Y/n] Y


Downloading word2vec_words_sg_s100_2015-06-21: 305214it [00:31, 9579.56it/s] 


'C:\\Programmid\\Miniconda3\\envs\\py37_estnltk_neural\\lib\\site-packages\\estnltk\\estnltk_resources\\word2vec\\embeddings_2015-06-21\\words.sg.s100.w2v.bin'

So, you can use `get_resource_paths` in the constructor of a tagger to get the path to a required resource and, if the resource is missing, to download it automatically (with the user's permission).
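A constructor following this pattern might look roughly like the sketch below. It is only an illustration: `EmbeddingBasedTagger` is a hypothetical plain class rather than a real EstNLTK `Tagger` subclass, the `resolve_paths` parameter exists only so that the lookup can be replaced in tests, and it is an assumption that `None` is returned when the user declines the download.

```python
class EmbeddingBasedTagger:
    """Hypothetical tagger-like class that locates its model on construction."""

    RESOURCE_NAME = 'word2vec_words_sg_s100_2015-06-21'

    def __init__(self, model_path=None, resolve_paths=None):
        if model_path is None:
            if resolve_paths is None:
                # Imported lazily, so the class can be defined without EstNLTK.
                from estnltk.downloader import get_resource_paths
                resolve_paths = get_resource_paths
            # Asks the user for permission if the resource is missing.
            model_path = resolve_paths(self.RESOURCE_NAME,
                                       only_latest=True,
                                       download_missing=True)
        if model_path is None:
            # Assumed outcome when the resource is missing and not downloaded.
            raise FileNotFoundError(
                f"Resource {self.RESOURCE_NAME!r} is not available; "
                "download it or pass model_path explicitly."
            )
        self.model_path = model_path
```

Accepting an explicit `model_path` lets users who keep models elsewhere bypass the lookup and the download prompt entirely.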