# Widgets

In [None]:
from fastai import *
from fastai.vision import *
from fastai.widgets import DatasetFormatter, ImageCleaner

fastai offers several widgets to support the workflow of a deep learning practitioner. The purpose of the widgets is to help you organize, clean, and prepare your data for your model. Widgets are grouped by data type.

## Images

### DatasetFormatter
The [`DatasetFormatter`](/widgets.image_cleaner.html#DatasetFormatter) class prepares your image dataset for widgets by returning a formatted [`DatasetTfm`](/vision.data.html#DatasetTfm) based on the [`DatasetType`](/basic_data.html#DatasetType) specified. Use `from_toplosses` to grab the most problematic images directly from your learner. Optionally, you can restrict the formatted dataset returned to `n_imgs`.
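To make the idea of "most problematic images" concrete, here is a minimal, library-free sketch of selecting top-loss samples. It is not fastai's implementation: the helper names `cross_entropy` and `top_loss_indices` and the toy probabilities are assumptions made for illustration. It only shows the general idea of ranking samples by per-item loss and optionally keeping the first `n_imgs`.

```python
import math

def cross_entropy(probs, target):
    """Per-sample cross-entropy loss from predicted class probabilities."""
    # Clamp to avoid log(0) on a fully confident wrong prediction.
    return -math.log(max(probs[target], 1e-12))

def top_loss_indices(all_probs, targets, n_imgs=None):
    """Return sample indices ordered from highest to lowest loss.

    If n_imgs is given, only the n_imgs highest-loss indices are returned.
    """
    losses = [cross_entropy(p, t) for p, t in zip(all_probs, targets)]
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return order if n_imgs is None else order[:n_imgs]

# Hypothetical predictions for a two-class problem (not real model output).
all_probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.05, 0.95]]
targets = [0, 0, 1, 1]

idxs = top_loss_indices(all_probs, targets, n_imgs=2)
print(idxs)
```

Samples 1 and 2 are the ones where the model assigns low probability to the true class, so they come first. These are the kind of items a cleaning widget would show you first.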

In [None]:
path = untar_data(URLs.MNIST_SAMPLE)
data = ImageDataBunch.from_folder(path)

In [None]:
learn = create_cnn(data, models.resnet18, metrics=error_rate)

In [None]:
learn.fit_one_cycle(2)

| epoch | train_loss | valid_loss | error_rate |
|-------|------------|------------|------------|
| 1     | 0.173022   | 0.097985   | 0.036310   |
| 2     | 0.106716   | 0.073394   | 0.027969   |


In [None]:
learn.save('stage-1')

In [None]:
# We create a databunch with all the data in the training set and no validation set (DatasetFormatter uses only the training set)
np.random.seed(42)
src_del = (ImageItemList.from_folder(path)
                           .split_by_folder()
                           .label_from_folder())

items = ImageItemList([*src_del.train.x.items, *src_del.valid.x.items])
cats = CategoryList([*src_del.train.y.items, *src_del.valid.y.items], data.classes)

ll = LabelList(items, cats)

db = LabelLists(path, ll, None).databunch()

In [None]:
learn = create_cnn(data, models.resnet18, metrics=[accuracy])
learn.load('stage-1')

Learner(data=ImageDataBunch;
Train: LabelList
y: CategoryList (12396 items)
[Category 7, Category 7, Category 7, Category 7, Category 7]...
Path: /home/chewing/.fastai/data/mnist_sample
x: ImageItemList (12396 items)
[Image (3, 28, 28), Image (3, 28, 28), Image (3, 28, 28), Image (3, 28, 28), Image (3, 28, 28)]...
Path: /home/chewing/.fastai/data/mnist_sample;
Valid: LabelList
y: CategoryList (2038 items)
[Category 7, Category 7, Category 7, Category 7, Category 7]...
Path: /home/chewing/.fastai/data/mnist_sample
x: ImageItemList (2038 items)
[Image (3, 28, 28), Image (3, 28, 28), Image (3, 28, 28), Image (3, 28, 28), Image (3, 28, 28)]...
Path: /home/chewing/.fastai/data/mnist_sample;
Test: None, model=Sequential(
  (0): Sequential(
    (0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace)
    (3): MaxPool2d(kernel_size=3, stride=2, padding=1, dil

### ImageCleaner

[`ImageCleaner`](/widgets.image_cleaner.html#ImageCleaner) is for cleaning up images that don't belong in your dataset. It renders images in a row and gives you the opportunity to delete the file from your file system. To use [`ImageCleaner`](/widgets.image_cleaner.html#ImageCleaner) we must first use `DatasetFormatter().from_toplosses` to get the suggested indices for misclassified images.

In [None]:
ds, idxs = DatasetFormatter().from_toplosses(learn)

In [None]:
ImageCleaner(ds, idxs, path)


You can also use [`ImageCleaner`](/widgets.image_cleaner.html#ImageCleaner) to find duplicates in your dataset. To do so, first run `DatasetFormatter().from_similars`, then call `ImageCleaner` with `duplicates=True`.
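As a rough illustration of how similarity-based duplicate detection can work, the sketch below ranks pairs of feature vectors by cosine similarity. It rests on assumptions rather than fastai's actual code: the function names and the toy "activation" vectors are hypothetical, and cosine similarity is only one plausible choice of metric. In practice the vectors would be activations taken from a layer of the trained model.

```python
import math
from itertools import combinations

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Guard against division by zero for all-zero vectors.
    return dot / max(norm_a * norm_b, 1e-8)

def most_similar_pairs(activations, n_pairs=None):
    """Return index pairs (i, j), i < j, ordered from most to least similar."""
    scored = [
        (cosine_similarity(activations[i], activations[j]), i, j)
        for i, j in combinations(range(len(activations)), 2)
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    ranked = [(i, j) for _, i, j in scored]
    return ranked if n_pairs is None else ranked[:n_pairs]

# Hypothetical feature vectors for four images (not real activations).
activations = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.99, 0.1, 0.0],  # nearly identical to image 0
    [0.0, 0.9, 0.4],   # close to image 1
]

pairs = most_similar_pairs(activations, n_pairs=2)
print(pairs)
```

The highest-ranked pairs are the likeliest duplicates, so they are the natural ones to review first.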

In [None]:
ds, idxs = DatasetFormatter().from_similars(learn, layer_ls=[0,7,1], pool_dim=1)

Getting activations...


Computing similarities...


In [None]:
ImageCleaner(ds, idxs, path, duplicates=True)
