# datasets

This module provides functions for downloading several useful datasets that we might want to use in our models.

In [None]:
from fastai.gen_doc.nbdoc import *
from fastai.datasets import * 
from fastai.datasets import Config
from pathlib import Path

In [None]:
show_doc(URLs)

This contains all the datasets' URLs, and some classmethods to help use them - you don't create objects of this class. The supported datasets are (with their calling name): `S3_NLP`, `S3_COCO`, `MNIST_SAMPLE`, `MNIST_TINY`, `IMDB_SAMPLE`, `ADULT_SAMPLE`, `ML_SAMPLE`, `PLANET_SAMPLE`, `CIFAR`, `PETS`, `MNIST`. For details on the datasets, see the [fast.ai datasets webpage](http://course.fast.ai/datasets). Datasets with SAMPLE in their name are subsets of the original datasets. In the case of MNIST, there is also a TINY dataset, which is even smaller than MNIST_SAMPLE.

In [None]:
URLs.MNIST_SAMPLE

### Convenience functions

The fastai library also has a few functions that allow us to directly download the most popular datasets, and return an object to access them:

In [None]:
show_doc(URLs.get_mnist)

In [None]:
mnist_data = URLs.get_mnist()
mnist_data.train_ds

In [None]:
show_doc(URLs.get_imdb)

In [None]:
show_doc(URLs.get_movie_lens)

In [None]:
show_doc(URLs.get_adult)

### Other functions

In [None]:
show_doc(URLs.download_wt103_model)

Downloads a pre-trained ULMFiT model.

## Downloading Data

For the rest of the datasets, you will need to download them with [`untar_data`](/datasets.html#untar_data) or [`download_data`](/datasets.html#download_data). [`untar_data`](/datasets.html#untar_data) downloads the data file and then decompresses it, while [`download_data`](/datasets.html#download_data) only downloads the compressed file and saves it in `.tgz` format.

By default, data will be downloaded to the `~/.fastai/data` folder.  
Configure the default `data_path` by editing `~/.fastai/config.yml`.  
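
The difference between the two functions can be illustrated with a minimal standard-library sketch. The helper names `download_archive` and `download_and_extract` are hypothetical and are not part of fastai; the sketch also assumes the URL ends in `.tgz` and that the archive contains a single top-level folder named after the file.

```python
import tarfile
import urllib.request
from pathlib import Path


def download_archive(url: str, dest_dir: Path) -> Path:
    """Hypothetical helper: save the archive at `url` into `dest_dir`.

    Roughly the "download only" behaviour: the compressed file is kept as-is.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Use the last URL component as the local file name (assumed to end in .tgz).
    archive = dest_dir / url.rsplit("/", 1)[-1]
    if not archive.exists():  # skip the download if the file is already there
        urllib.request.urlretrieve(url, archive)
    return archive


def download_and_extract(url: str, dest_dir: Path) -> Path:
    """Hypothetical helper: download the archive, then decompress it.

    Roughly the "download and untar" behaviour: returns the extracted folder.
    """
    archive = download_archive(url, dest_dir)
    # Assumption: sample.tgz unpacks to a folder called sample.
    target = archive.with_suffix("")
    if not target.exists():  # skip extraction if the folder is already there
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(dest_dir)
    return target
```

Both helpers skip work that has already been done, so calling them repeatedly does not download or extract the data again.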

In [None]:
show_doc(untar_data)

In [None]:
untar_data(URLs.PLANET_SAMPLE)

In [None]:
show_doc(download_data)

Note: If the data file already exists in a <code>data</code> directory next to the notebook, that data file will be used instead of the one in <code>~/.fastai/data</code>. Paths are resolved by calling the function [`datapath4file`](/datasets.html#datapath4file), which first checks whether the data exists locally (`data/`) before downloading to the `~/.fastai/data` directory.

Example:

In [None]:
download_data(URLs.PLANET_SAMPLE)

In [None]:
show_doc(datapath4file)

All the downloading functions use this to decide where to put the tgz and expanded folder. If `filename` already exists in a <code>data</code> directory in the same place as the calling notebook/script, that directory is used as the parent; otherwise `~/.fastai/config.yml` is read to see what path to use, which defaults to <code>~/.fastai/data</code>. To override this default, modify the value in your `~/.fastai/config.yml`:

    data_path: ~/.fastai/data
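
The lookup order described above can be sketched as follows. This is an illustration only, not the library's implementation: `resolve_data_path` is a hypothetical name, and the configured folder is passed in as the `default_dir` argument instead of being read from `config.yml`.

```python
from pathlib import Path


def resolve_data_path(
    filename: str,
    local_dir: Path = Path("data"),
    default_dir: Path = Path.home() / ".fastai" / "data",
) -> Path:
    """Hypothetical helper: decide where `filename` should live.

    `default_dir` stands in for the `data_path` value from the config file.
    """
    local_path = local_dir / filename
    # Prefer a copy sitting in ./data next to the calling notebook/script.
    if local_path.exists():
        return local_path
    # Otherwise fall back to the configured (or default) data folder.
    return default_dir / filename
```

With this ordering, a project can ship its own copy of a dataset in `data/` without touching the shared folder in the home directory.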

In [None]:
show_doc(Config)

You probably won't need to use this yourself - it's used by [`datapath4file`](/datasets.html#datapath4file).

## Undocumented Methods - Methods moved below this line will intentionally be hidden

## New Methods - Please document or move to the undocumented section