# datasets

This module provides the functions needed to download several useful datasets that we might want to use in our models.

In [None]:
from fastai.gen_doc.nbdoc import *
from fastai.datasets import * 

In [None]:
show_doc(URLs)

## <a id=URLs></a>`class` `URLs`
> `URLs`()
<a href="https://github.com/fastai/fastai/blob/master/fastai/datasets.py#L9">[source]</a>

The datasets module has a `URLs` class that holds the URLs of all the supported datasets. The supported datasets are (by attribute name): S3_NLP, S3_COCO, MNIST_SAMPLE, MNIST_TINY, IMDB_SAMPLE, ADULT_SAMPLE, ML_SAMPLE, PLANET_SAMPLE, CIFAR, PETS, MNIST. For details on each dataset, see the fast.ai datasets webpage. Datasets with SAMPLE in their name are subsets of the original datasets. For MNIST, there is also a TINY dataset, which is even smaller than MNIST_SAMPLE.

In [None]:
URLs.MNIST_SAMPLE

'http://files.fast.ai/data/examples/mnist_sample'

In [None]:
URLs.MNIST_TINY

'http://files.fast.ai/data/examples/mnist_tiny'
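As the two outputs above show, each entry is a plain string, so the available names can be listed with ordinary introspection. The following is a minimal sketch, not part of fastai: `ToyURLs` and `dataset_names` are hypothetical names introduced for illustration, and the stand-in class contains only the two URLs shown above.

```python
class ToyURLs:
    # Hypothetical stand-in for illustration; only these two values appear above.
    MNIST_SAMPLE = 'http://files.fast.ai/data/examples/mnist_sample'
    MNIST_TINY = 'http://files.fast.ai/data/examples/mnist_tiny'


def dataset_names(cls):
    """Return the upper-case string attributes of `cls`, sorted by name."""
    return sorted(name for name, value in vars(cls).items()
                  if name.isupper() and isinstance(value, str))


names = dataset_names(ToyURLs)
# Keep only the entries that follow the SAMPLE naming convention.
samples = [n for n in names if n.endswith('_SAMPLE')]
print(names)
print(samples)
```

Assuming the real `URLs` class stores its entries the same way, passing it to `dataset_names` should list the supported datasets.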

The fastai library also has a few functions that directly download the most popular datasets, including the `wikitext103` language model weights. These are: `get_adult`, `get_mnist`, `get_imdb`, `get_movie_lens`, `download_wt103_model`.

In [None]:
mnist_data = URLs.get_mnist()

The rest of the datasets need to be downloaded manually, with either `untar_data` or `download_data`. `untar_data` downloads the data file and then decompresses it, while `download_data` only downloads the compressed file and saves it in `.tgz` format.
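To make the difference between the two concrete, here is a rough sketch of the same download-then-extract pattern using only the Python standard library. It is not fastai's implementation: `sketch_download_data` and `sketch_untar_data` are hypothetical names, and the sketch assumes that the archive lives at `url + '.tgz'` and that it contains a single top-level folder named after the dataset.

```python
import tarfile
import urllib.request
from pathlib import Path


def sketch_download_data(url, fname=None):
    """Fetch `url + '.tgz'` into `fname` unless it already exists; return the archive path."""
    # Assumption: by default the archive is saved in the current directory.
    fname = Path(fname) if fname is not None else Path(url.split('/')[-1] + '.tgz')
    if not fname.exists():
        fname.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url + '.tgz', str(fname))
    return fname


def sketch_untar_data(url, fname=None, dest=None):
    """Download the archive if needed, extract it, and return the extracted folder."""
    archive = sketch_download_data(url, fname)
    # Assumption: the extracted folder is the archive path without its `.tgz` suffix.
    dest = Path(dest) if dest is not None else archive.with_suffix('')
    if not dest.exists():
        with tarfile.open(archive, 'r:gz') as tar:
            tar.extractall(dest.parent)
    return dest
```

Both helpers skip work that is already done, so calling them a second time returns the existing path without downloading or extracting again.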

In [None]:
show_doc(untar_data)

#### <a id=untar_data></a>`untar_data`
> `untar_data`(`url`:`str`, `fname`:`PathOrStr`=`None`, `dest`:`PathOrStr`=`None`)


Download `url` to `fname` if it doesn't exist, and un-tgz to folder `dest`  <a href="https://github.com/fastai/fastai/blob/master/fastai/datasets.py#L72">[source]</a>

In [None]:
untar_data(URLs.PLANET_SAMPLE)

Downloading http://files.fast.ai/data/examples/planet_sample

PosixPath('/home/chewing/fastai/fastai/../data/planet_sample')

In [None]:
show_doc(download_data)

#### <a id=download_data></a>`download_data`
> `download_data`(`url`:`str`, `fname`:`PathOrStr`=`None`)


Download `url` to destination `fname`  <a href="https://github.com/fastai/fastai/blob/master/fastai/datasets.py#L63">[source]</a>

In [None]:
download_data(URLs.PLANET_SAMPLE)

PosixPath('/home/chewing/fastai/fastai/../data/planet_sample.tgz')
