# Widgets

In [None]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline

In [None]:
from fastai.gen_doc.nbdoc import *
from fastai.vision import *
from fastai.widgets import DatasetFormatter, ImageCleaner, ImageDownloader, download_google_images

fastai offers several widgets to support the workflow of a deep learning practitioner. The purpose of the widgets is to help you organize, clean, and prepare your data for your model. Widgets are separated by data type.

## Images

### DatasetFormatter
The [`DatasetFormatter`](/widgets.image_cleaner.html#DatasetFormatter) class prepares your image dataset for widgets by returning a formatted [`DatasetTfm`](/vision.data.html#DatasetTfm) based on the [`DatasetType`](/basic_data.html#DatasetType) specified. Use `from_toplosses` to grab the most problematic images directly from your learner. Optionally, you can restrict the formatted dataset returned to `n_imgs`.
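
To make the idea of "top losses" concrete, here is a minimal pure-Python sketch of how the most problematic samples could be picked from a list of per-sample losses. It is not fastai's implementation: the helper `top_loss_indices` and the example loss values are hypothetical, and only the `n_imgs` name is borrowed from the text above.

```python
from typing import List, Optional, Sequence


def top_loss_indices(losses: Sequence[float], n_imgs: Optional[int] = None) -> List[int]:
    """Return sample indices ordered from highest to lowest loss.

    Hypothetical helper for illustration only. If `n_imgs` is given,
    keep just that many of the worst samples.
    """
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return order if n_imgs is None else order[:n_imgs]


# Made-up per-sample losses for four images.
losses = [0.10, 2.30, 0.05, 1.70]

print(top_loss_indices(losses))            # [1, 3, 0, 2]
print(top_loss_indices(losses, n_imgs=2))  # [1, 3]
```

The images at the front of this ordering are the ones the model gets most wrong, which makes them the likeliest candidates for mislabeled or out-of-place data.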

In [None]:
path = untar_data(URLs.MNIST_SAMPLE)
data = ImageDataBunch.from_folder(path)

In [None]:
data.show_batch()

In [None]:
learn = create_cnn(data, models.resnet18, metrics=error_rate)

In [None]:
learn.fit_one_cycle(2)

In [None]:
learn.save('stage-1')

Next, we create a databunch that puts all the data in the training set with no validation set, because `DatasetFormatter` uses only the training set.

In [None]:
db = (ImageItemList.from_folder(path)
                   .no_split()
                   .label_from_folder()
                   .databunch())

In [None]:
learn = create_cnn(db, models.resnet18, metrics=[accuracy])
learn.load('stage-1');

### ImageCleaner

[`ImageCleaner`](/widgets.image_cleaner.html#ImageCleaner) is for cleaning up images that don't belong in your dataset. It renders images in a row and gives you the opportunity to delete the file from your file system. To use [`ImageCleaner`](/widgets.image_cleaner.html#ImageCleaner) we must first use `DatasetFormatter().from_toplosses` to get the suggested indices for misclassified images.

In [None]:
ds, idxs = DatasetFormatter().from_toplosses(learn)

In [None]:
ImageCleaner(ds, idxs, path)

### ImageDownloader

The [`ImageDownloader`](/widgets.image_downloader.html#ImageDownloader) widget gives you a way to quickly bootstrap your image dataset without leaving the notebook. It searches for and downloads images that match the search criteria and resolution / quality requirements, and stores them on your filesystem within the provided `path`.

Images for each search query (or label) are stored in a separate folder within `path`. For example, if you search for `tiger` with `path` set to `./data`, you'll get a folder `./data/tiger/` with the tiger images in it.
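
The one-folder-per-label layout can be sketched with the standard library alone. This is an illustration of the directory structure only, not of the widget: the helper `label_dir` is hypothetical, the folders are created empty in a temporary directory, and it assumes the folder name equals the search term, as in the `tiger` example.

```python
import tempfile
from pathlib import Path


def label_dir(path: Path, label: str) -> Path:
    """Hypothetical helper: the folder that would hold images for `label`."""
    return Path(path) / label


# Build the layout in a throwaway directory so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / 'data'
    for label in ['tiger', 'lion']:
        label_dir(root, label).mkdir(parents=True, exist_ok=True)

    # Each sub-folder name doubles as a class label.
    labels = sorted(p.name for p in root.iterdir() if p.is_dir())
    print(labels)  # ['lion', 'tiger']
```

Because each folder name is also the class name, this layout lets folder-based labeling pick up the classes without extra metadata.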

[`ImageDownloader`](/widgets.image_downloader.html#ImageDownloader) will automatically clean up and verify the downloaded images with [`verify_images()`](/vision.data.html#verify_images) after downloading them.

In [None]:
path = Path('./image_downloader_data')
ImageDownloader(path)

After populating images with [`ImageDownloader`](/widgets.image_downloader.html#ImageDownloader), you can get an [`ImageDataBunch`](/vision.data.html#ImageDataBunch) by calling `ImageDataBunch.from_folder(path, size=size)`, or by using the data block API.

In [None]:
path.ls()

In [None]:
src = (ImageItemList.from_folder(path)
       .random_split_by_pct()
       .label_from_folder()
       .transform(get_transforms(), size=224))
db  = src.databunch(bs=16)

In [None]:
learn = create_cnn(db, models.resnet34, metrics=[accuracy])

In [None]:
learn.fit_one_cycle(3)

#### Downloading more than a hundred images

To fetch more than a hundred images, [`ImageDownloader`](/widgets.image_downloader.html#ImageDownloader) uses `selenium` and `chromedriver` to scroll through the Google Images search results page and scrape image URLs. They're not required as dependencies by default. If you don't have them installed on your system, the widget will show you an error message.

To install `selenium`, just `pip install selenium` in your fastai environment.

**On a Mac**, you can install `chromedriver` with `brew cask install chromedriver` (newer Homebrew releases use `brew install --cask chromedriver` instead).

**On Ubuntu**, check the latest available Chromedriver version, then run something like:

```
wget https://chromedriver.storage.googleapis.com/2.45/chromedriver_linux64.zip
unzip chromedriver_linux64.zip
```
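
Before requesting more than a hundred images, it can help to check whether both optional dependencies are visible from your Python environment. The sketch below uses only the standard library. The helper `scraping_deps_status` is hypothetical, and it assumes `chromedriver` is expected on your `PATH`, which may not match how your setup locates it.

```python
import importlib.util
import shutil


def scraping_deps_status() -> dict:
    """Hypothetical helper: report whether selenium and chromedriver are visible.

    Assumes chromedriver is looked up on PATH.
    """
    return {
        'selenium': importlib.util.find_spec('selenium') is not None,
        'chromedriver': shutil.which('chromedriver') is not None,
    }


status = scraping_deps_status()
for name, ok in status.items():
    print(f"{name}: {'found' if ok else 'missing'}")
```

If `chromedriver` shows up as missing after you unzip it, the binary is probably not in a directory listed on your `PATH`.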

#### Downloading images in python scripts outside Jupyter notebooks

In [None]:
path = Path('image_downloader_data')

In [None]:
download_google_images(path, 'aussie shepherd', size='>1024*768', n_images=150)

In [None]:
show_doc(download_google_images)

<h4 id="download_google_images"><code>download_google_images</code><a href="https://github.com/fastai/fastai/blob/master/fastai/widgets/image_downloader.py#L93" class="source_link">[source]</a></h4>

> <code>download_google_images</code>(`path`:`PathOrStr`, `search_term`:`str`, `size`:`str`=`'>400*300'`, `n_images`:`int`=`10`, `format`:`str`=`'jpg'`, `max_workers`:`int`=`8`, `timeout`:`int`=`4`) → `FilePathList`

Search for `n_images` images on Google, matching `search_term` and `size` requirements, and download them into `path`/`search_term` directory.

Automatically [`verify_images`](/vision.data.html#verify_images) and return the image file names list.

Uses `max_workers` threads to download and verify images. 

Note that downloading fewer than 100 images doesn't require any dependencies other than fastai itself. Downloading more than a hundred images, however, [uses `selenium` and `chromedriver`](/widgets.ipynb#Downloading-more-than-a-hundred-images).

`size` can be one of:

```
'>400*300'
'>640*480'
'>800*600'
'>1024*768'
'>2MP'
'>4MP'
'>6MP'
'>8MP'
'>10MP'
'>12MP'
'>15MP'
'>20MP'
'>40MP'
'>70MP'
```
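
Since `size` is a plain string, a typo such as `'1024x768'` is easy to make. A small guard like the one below could catch it before any download starts. The names `VALID_SIZES` and `check_size` are hypothetical and not part of fastai; the values are copied from the list above.

```python
# Allowed `size` strings, copied from the list above.
VALID_SIZES = (
    '>400*300', '>640*480', '>800*600', '>1024*768',
    '>2MP', '>4MP', '>6MP', '>8MP', '>10MP',
    '>12MP', '>15MP', '>20MP', '>40MP', '>70MP',
)


def check_size(size: str) -> str:
    """Hypothetical helper: return `size` unchanged if it is listed, else raise."""
    if size not in VALID_SIZES:
        raise ValueError(f"Unsupported size {size!r}; expected one of {VALID_SIZES}")
    return size


print(check_size('>1024*768'))  # >1024*768
```

You could pass the checked value straight through, for example `size=check_size('>1024*768')`, so that an invalid string fails early with a clear message.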