# widgets.image_cleaner

fastai offers several widgets to support the workflow of a deep learning practitioner. The purpose of the widgets is to help you organize, clean, and prepare your data for your model. Widgets are separated by data type.

In [None]:
from fastai.vision import *
from fastai.widgets import DatasetFormatter, ImageCleaner
from fastai.gen_doc.nbdoc import show_doc

In [None]:
%reload_ext autoreload
%autoreload 2

In [None]:
show_doc(DatasetFormatter)

<h2 id="DatasetFormatter"><code>class</code> <code>DatasetFormatter</code><a href="https://github.com/fastai/fastai/blob/master/fastai/widgets/image_cleaner.py#L14" class="source_link">[source]</a></h2>

> <code>DatasetFormatter</code>()

The [`DatasetFormatter`](/widgets.image_cleaner.html#DatasetFormatter) class prepares your image dataset for widgets by returning a formatted [`DatasetTfm`](/vision.data.html#DatasetTfm) based on the [`DatasetType`](/basic_data.html#DatasetType) specified. Use `from_toplosses` to grab the most problematic images directly from your learner. Optionally, you can restrict the formatted dataset returned to `n_imgs`.

In [None]:
path = untar_data(URLs.MNIST_SAMPLE)
data = ImageDataBunch.from_folder(path)

In [None]:
learn = create_cnn(data, models.resnet18, metrics=error_rate)

In [None]:
learn.fit_one_cycle(2)

In [None]:
learn.save('stage-1')

We create a databunch with all the data in the training set and no validation set, because `DatasetFormatter` uses only the training set.

In [None]:
db = (ImageItemList.from_folder(path)
                   .no_split()
                   .label_from_folder()
                   .databunch())

In [None]:
learn = create_cnn(db, models.resnet18, metrics=[accuracy])
learn.load('stage-1');

In [None]:
show_doc(ImageCleaner)

<h2 id="ImageCleaner"><code>class</code> <code>ImageCleaner</code><a href="https://github.com/fastai/fastai/blob/master/fastai/widgets/image_cleaner.py#L92" class="source_link">[source]</a></h2>

> <code>ImageCleaner</code>(`dataset`, `fns_idxs`, `path`, `batch_size`:`int`=`5`, `duplicates`=`False`)

Display images with their current label.  

[`ImageCleaner`](/widgets.image_cleaner.html#ImageCleaner) is for cleaning up images that don't belong in your dataset. It renders images in a row and gives you the opportunity to delete the file from your file system. To use [`ImageCleaner`](/widgets.image_cleaner.html#ImageCleaner), we must first call `DatasetFormatter().from_toplosses` to get the suggested indices for misclassified images.

In [None]:
ds, idxs = DatasetFormatter().from_toplosses(learn)

In [None]:
ImageCleaner(ds, idxs, path)

[`ImageCleaner`](/widgets.image_cleaner.html#ImageCleaner) does not change anything on disk (neither labels nor the existence of images). Instead, it creates a `cleaned.csv` file in your data path, from which you need to load your new databunch for the changes to be applied.
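Because the images stay on disk, the only record of which files were dropped is their absence from `cleaned.csv`. The sketch below compares a list of original files with the CSV to recover that set. It assumes the CSV has a header row and that file paths sit in a column called `name`; that column name and the `removed_files` helper are hypothetical, so check the header of your own `cleaned.csv` before relying on it.

```python
import csv
import io

def removed_files(all_files, cleaned_csv_text, name_col="name"):
    """Return files present in `all_files` but missing from the cleaned CSV.

    Assumption: the CSV has a header row and the relative file path of each
    kept image is stored in the column `name_col` (hypothetical default).
    """
    reader = csv.DictReader(io.StringIO(cleaned_csv_text))
    kept = {row[name_col] for row in reader}
    # Anything not listed in the CSV was dropped in the widget.
    return sorted(f for f in all_files if f not in kept)

csv_text = "name,label\ntrain/3/a.png,3\ntrain/7/b.png,7\n"
print(removed_files(["train/3/a.png", "train/7/b.png", "train/7/c.png"], csv_text))
# ['train/7/c.png']
```

With real data, `cleaned_csv_text` would come from reading `path/'cleaned.csv'`.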

In [None]:
df = pd.read_csv(path/'cleaned.csv', header='infer')

In [None]:
# We create a databunch from our csv. We include the data in the training set and we don't use a validation set (DatasetFormatter uses only the training set)
np.random.seed(42)
db = (ImageItemList.from_df(df, path)
                   .no_split()
                   .label_from_df()
                   .databunch(bs=64))

In [None]:
learn = create_cnn(db, models.resnet18, metrics=error_rate)
learn = learn.load('stage-1')

You can then use [`ImageCleaner`](/widgets.image_cleaner.html#ImageCleaner) again to find duplicates in the dataset. To do this, get the dataset and indices from `.from_similars`, then pass `duplicates=True` when calling [`ImageCleaner`](/widgets.image_cleaner.html#ImageCleaner). Note that if you are using a layer's output with dimensions [n_batches, n_features, 1, 1], you don't need any pooling (this is the case with the last layer). The suggested use of `.from_similars()` with resnets is to use the last layer and no pooling, as in the following cell.
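Why no pooling is needed for a [n, n_features, 1, 1] output: each channel already holds exactly one value, so turning the activations into one feature vector per image only means dropping the two trailing singleton axes. The plain-Python sketch below shows this on nested lists; `flatten_1x1` is a hypothetical helper for illustration, not a fastai function.

```python
def flatten_1x1(acts):
    """Turn activations shaped [n, c, 1, 1] into feature vectors shaped [n, c].

    With a 1x1 spatial grid there is nothing to pool: each channel has a
    single value, so we just drop the trailing height and width axes.
    """
    out = []
    for sample in acts:
        vec = []
        for channel in sample:
            if len(channel) != 1 or len(channel[0]) != 1:
                raise ValueError("expected a 1x1 spatial map per channel")
            vec.append(channel[0][0])
        out.append(vec)
    return out

acts = [[[[0.5]], [[1.0]]],   # image 0: 2 channels, each 1x1
        [[[2.0]], [[-1.0]]]]  # image 1
print(flatten_1x1(acts))  # [[0.5, 1.0], [2.0, -1.0]]
```

Larger spatial maps (for example 7x7) raise an error here; those are the cases where a pooling step is required.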

In [None]:
ds, idxs = DatasetFormatter().from_similars(learn, layer_ls=[0,7,1], pool=None)

Getting activations...


Computing similarities...


In [None]:
ImageCleaner(ds, idxs, path, duplicates=True)


## Methods

In [None]:
from fastai.gen_doc.nbdoc import *
from fastai.widgets.image_cleaner import * 

In [None]:
show_doc(ImageCleaner.make_horizontal_box)

<h4 id="ImageCleaner.make_horizontal_box"><code>make_horizontal_box</code><a href="https://github.com/fastai/fastai/blob/master/fastai/widgets/image_cleaner.py#L131" class="source_link">[source]</a></h4>

> <code>make_horizontal_box</code>(`children`, `layout`=`Layout()`)

Make a horizontal box with `children` and `layout`.  

In [None]:
show_doc(DatasetFormatter.from_similars)

<h4 id="DatasetFormatter.from_similars"><code>from_similars</code><a href="https://github.com/fastai/fastai/blob/master/fastai/widgets/image_cleaner.py#L34" class="source_link">[source]</a></h4>

> <code>from_similars</code>(`learn`, `layer_ls`:`list`=`[0, 7, 2]`, `kwargs`)

Gets the indices for the most similar images in the training and validation datasets.  

In [None]:
show_doc(ImageCleaner.chunks)

<h4 id="ImageCleaner.chunks"><code>chunks</code><a href="https://github.com/fastai/fastai/blob/master/fastai/widgets/image_cleaner.py#L142" class="source_link">[source]</a></h4>

> <code>chunks</code>(`l`, `n`)

Yield successive `n`-sized chunks from `l`.  
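A minimal generator matching that description, written independently as a sketch rather than copied from fastai. Chunking is what lets a flat list of indices be shown a few at a time, or grouped into pairs with `n=2` (as one might do for duplicate candidates).

```python
def chunks(l, n):
    """Yield successive n-sized slices of `l`; the last one may be shorter."""
    for i in range(0, len(l), n):
        yield l[i:i + n]

print(list(chunks([0, 1, 2, 3, 4, 5, 6], 3)))  # [[0, 1, 2], [3, 4, 5], [6]]
print(list(chunks([4, 9, 1, 7], 2)))           # [[4, 9], [1, 7]]
```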

In [None]:
show_doc(ImageCleaner.make_vertical_box)

<h4 id="ImageCleaner.make_vertical_box"><code>make_vertical_box</code><a href="https://github.com/fastai/fastai/blob/master/fastai/widgets/image_cleaner.py#L136" class="source_link">[source]</a></h4>

> <code>make_vertical_box</code>(`children`, `layout`=`Layout()`, `duplicates`=`False`)

Make a vertical box with `children` and `layout`.  

In [None]:
show_doc(DatasetFormatter.get_toplosses_idxs)

<h4 id="DatasetFormatter.get_toplosses_idxs"><code>get_toplosses_idxs</code><a href="https://github.com/fastai/fastai/blob/master/fastai/widgets/image_cleaner.py#L21" class="source_link">[source]</a></h4>

> <code>get_toplosses_idxs</code>(`learn`, `n_imgs`, `kwargs`)

Sorts `ds_type` dataset by top losses and returns dataset and sorted indices.  
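The core of "sort by top losses" is ordering dataset indices from highest to lowest loss, optionally keeping only the first `n_imgs`. The plain-Python sketch below assumes you already have one loss value per item (in fastai these come from the learner's predictions); `toplosses_idxs` is a hypothetical name, and dataset formatting is left out.

```python
def toplosses_idxs(losses, n_imgs=None):
    """Return item indices sorted from highest to lowest loss.

    If `n_imgs` is None every index is returned; otherwise only the first
    `n_imgs` (the most problematic items) are kept.
    """
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return order if n_imgs is None else order[:n_imgs]

losses = [0.1, 2.3, 0.05, 1.7]
print(toplosses_idxs(losses))     # [1, 3, 0, 2]
print(toplosses_idxs(losses, 2))  # [1, 3]
```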

In [None]:
show_doc(ImageCleaner.delete_image)

<h4 id="ImageCleaner.delete_image"><code>delete_image</code><a href="https://github.com/fastai/fastai/blob/master/fastai/widgets/image_cleaner.py#L184" class="source_link">[source]</a></h4>

> <code>delete_image</code>(`file_path`)

In [None]:
show_doc(DatasetFormatter.sort_idxs)

<h4 id="DatasetFormatter.sort_idxs"><code>sort_idxs</code><a href="https://github.com/fastai/fastai/blob/master/fastai/widgets/image_cleaner.py#L85" class="source_link">[source]</a></h4>

> <code>sort_idxs</code>(`similarities`)

Sorts `similarities` and returns the indexes in pairs ordered by highest similarity.  
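One way to turn a similarity matrix into index pairs ordered by highest similarity is sketched below. It assumes a square, symmetric matrix given as nested lists, and returns `(i, j)` tuples; the exact return shape of fastai's `sort_idxs` may differ, so treat this as an illustration of the idea only.

```python
def sort_idxs(similarities):
    """Return (i, j) index pairs ordered from most to least similar.

    Assumes `similarities` is square and symmetric, so only pairs with
    i < j are kept: this skips each image's similarity with itself and
    the mirrored (j, i) copy of every pair.
    """
    n = len(similarities)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sorted(pairs, key=lambda p: similarities[p[0]][p[1]], reverse=True)

sims = [
    [1.0, 0.2, 0.9],
    [0.2, 1.0, 0.4],
    [0.9, 0.4, 1.0],
]
print(sort_idxs(sims))  # [(0, 2), (1, 2), (0, 1)]
```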

In [None]:
show_doc(ImageCleaner.empty_batch)

<h4 id="ImageCleaner.empty_batch"><code>empty_batch</code><a href="https://github.com/fastai/fastai/blob/master/fastai/widgets/image_cleaner.py#L182" class="source_link">[source]</a></h4>

> <code>empty_batch</code>()

In [None]:
show_doc(ImageCleaner.get_widgets)

<h4 id="ImageCleaner.get_widgets"><code>get_widgets</code><a href="https://github.com/fastai/fastai/blob/master/fastai/widgets/image_cleaner.py#L190" class="source_link">[source]</a></h4>

> <code>get_widgets</code>(`duplicates`)

Create and format widget set.  

In [None]:
show_doc(DatasetFormatter.get_similars_idxs)

<h4 id="DatasetFormatter.get_similars_idxs"><code>get_similars_idxs</code><a href="https://github.com/fastai/fastai/blob/master/fastai/widgets/image_cleaner.py#L40" class="source_link">[source]</a></h4>

> <code>get_similars_idxs</code>(`learn`, `layer_ls`, `kwargs`)

Gets the indices for the most similar images in the `ds_type` dataset.  

In [None]:
show_doc(ImageCleaner.next_batch)

<h4 id="ImageCleaner.next_batch"><code>next_batch</code><a href="https://github.com/fastai/fastai/blob/master/fastai/widgets/image_cleaner.py#L166" class="source_link">[source]</a></h4>

> <code>next_batch</code>(`_`)

Handler for the 'Next Batch' button click. Deletes all flagged images and renders the next batch.  

In [None]:
show_doc(ImageCleaner.render)

<h4 id="ImageCleaner.render"><code>render</code><a href="https://github.com/fastai/fastai/blob/master/fastai/widgets/image_cleaner.py#L222" class="source_link">[source]</a></h4>

> <code>render</code>()

Re-render Jupyter cell for batch of images.  

In [None]:
show_doc(DatasetFormatter.padded_ds)

<h4 id="DatasetFormatter.padded_ds"><code>padded_ds</code><a href="https://github.com/fastai/fastai/blob/master/fastai/widgets/image_cleaner.py#L30" class="source_link">[source]</a></h4>

> <code>padded_ds</code>(`ll_input`, `size`=`(250, 300)`, `do_crop`=`False`, `padding_mode`=`'zeros'`, `kwargs`)

For a LabelList `ll_input`, resize each image to `size`. Optionally `do_crop` or pad with `padding_mode`.  

In [None]:
show_doc(ImageCleaner.on_delete)

<h4 id="ImageCleaner.on_delete"><code>on_delete</code><a href="https://github.com/fastai/fastai/blob/master/fastai/widgets/image_cleaner.py#L177" class="source_link">[source]</a></h4>

> <code>on_delete</code>(`btn`)

Toggle this image's flag between delete and keep.  
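The flag behaves like a toggle: one click marks the image for deletion, a second click marks it as kept again. The sketch below models that state in plain Python. `FlagButton` and its attributes are hypothetical stand-ins for a widget button; `"danger"` is one of the standard ipywidgets `button_style` values and is used here only to show how the visual state could follow the flag.

```python
class FlagButton:
    """Hypothetical stand-in for a delete button attached to one image file."""

    def __init__(self, file_path):
        self.file_path = file_path
        self.flagged_for_delete = False
        self.button_style = ""

    def on_delete(self):
        """Flip between 'delete' and 'keep'; a second click undoes the first."""
        self.flagged_for_delete = not self.flagged_for_delete
        # Highlight the button only while the image is flagged.
        self.button_style = "danger" if self.flagged_for_delete else ""

btn = FlagButton("train/7/c.png")
btn.on_delete()
print(btn.flagged_for_delete, btn.button_style)  # True danger
btn.on_delete()
print(btn.flagged_for_delete)                    # False
```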

In [None]:
show_doc(DatasetFormatter.largest_indices)

<h4 id="DatasetFormatter.largest_indices"><code>largest_indices</code><a href="https://github.com/fastai/fastai/blob/master/fastai/widgets/image_cleaner.py#L77" class="source_link">[source]</a></h4>

> <code>largest_indices</code>(`arr`, `n`)

Returns the indices of the `n` largest values in a numpy array `arr`.  
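For a 2-D similarity matrix, "indices of the `n` largest values" means the (row, col) positions of the top entries. The NumPy-free sketch below flattens the matrix in row-major order, picks the top `n` flat positions, and converts each back to (row, col) with `divmod`, which is what a NumPy version would typically do with `np.unravel_index`. Unlike NumPy's tuple of index arrays, this sketch returns a list of `(row, col)` tuples.

```python
def largest_indices(arr, n):
    """Return (row, col) positions of the `n` largest values in 2-D list `arr`,
    ordered from largest to smallest value."""
    n_cols = len(arr[0])
    flat = [v for row in arr for v in row]
    top = sorted(range(len(flat)), key=lambda k: flat[k], reverse=True)[:n]
    # divmod turns a flat row-major position back into (row, col).
    return [divmod(k, n_cols) for k in top]

sims = [[0.1, 0.8],
        [0.5, 0.3]]
print(largest_indices(sims, 2))  # [(0, 1), (1, 0)]
```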

In [None]:
show_doc(ImageCleaner.create_image_list)

<h4 id="ImageCleaner.create_image_list"><code>create_image_list</code><a href="https://github.com/fastai/fastai/blob/master/fastai/widgets/image_cleaner.py#L148" class="source_link">[source]</a></h4>

> <code>create_image_list</code>(`dataset`, `fns_idxs`)

Create a list of images, filenames and labels, after first removing files that are not supposed to be displayed.  
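A minimal sketch of the filtering step, under the assumption that "not supposed to be displayed" covers files that no longer exist on disk. `displayable` is a hypothetical helper: it takes the list of item paths and the suggested indices, and keeps only indices whose file is still present, preserving the suggested order.

```python
import os

def displayable(items, idxs):
    """Keep only indices whose file still exists, preserving their order."""
    return [i for i in idxs if os.path.isfile(items[i])]

if __name__ == "__main__":
    import tempfile
    d = tempfile.mkdtemp()
    present = os.path.join(d, "a.png")
    open(present, "wb").close()          # create an empty placeholder file
    items = [present, os.path.join(d, "missing.png")]
    print(displayable(items, [1, 0]))    # [0]
```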

In [None]:
show_doc(ImageCleaner.make_img_widget)

<h4 id="ImageCleaner.make_img_widget"><code>make_img_widget</code><a href="https://github.com/fastai/fastai/blob/master/fastai/widgets/image_cleaner.py#L107" class="source_link">[source]</a></h4>

> <code>make_img_widget</code>(`img`, `layout`=`Layout()`, `format`=`'jpg'`)

Returns an image widget for the specified file name `img`.  

In [None]:
show_doc(ImageCleaner.make_button_widget)

<h4 id="ImageCleaner.make_button_widget"><code>make_button_widget</code><a href="https://github.com/fastai/fastai/blob/master/fastai/widgets/image_cleaner.py#L112" class="source_link">[source]</a></h4>

> <code>make_button_widget</code>(`label`, `file_path`=`None`, `handler`=`None`, `style`=`None`, `layout`=`Layout(width='auto')`)

Return a Button widget with the specified `handler`.  

In [None]:
show_doc(ImageCleaner.batch_contains_deleted)

<h4 id="ImageCleaner.batch_contains_deleted"><code>batch_contains_deleted</code><a href="https://github.com/fastai/fastai/blob/master/fastai/widgets/image_cleaner.py#L204" class="source_link">[source]</a></h4>

> <code>batch_contains_deleted</code>()

Check if current batch contains already deleted images.  

In [None]:
show_doc(ImageCleaner.write_csv)

<h4 id="ImageCleaner.write_csv"><code>write_csv</code><a href="https://github.com/fastai/fastai/blob/master/fastai/widgets/image_cleaner.py#L210" class="source_link">[source]</a></h4>

> <code>write_csv</code>()

In [None]:
show_doc(DatasetFormatter.comb_similarity)

<h4 id="DatasetFormatter.comb_similarity"><code>comb_similarity</code><a href="https://github.com/fastai/fastai/blob/master/fastai/widgets/image_cleaner.py#L65" class="source_link">[source]</a></h4>

> <code>comb_similarity</code>(`t1`:`Tensor`, `t2`:`Tensor`, `sim_func`=`CosineSimilarity()`, `kwargs`)

Computes the similarity function `sim_func` between each embedding of `t1` and `t2` matrices.  
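With the default `sim_func`, this is a cosine-similarity matrix: entry `[i][j]` compares row `i` of `t1` with row `j` of `t2`. The plain-Python sketch below computes that matrix for lists of vectors instead of tensors. It assumes no embedding is all zeros, since it does not add the small epsilon that `torch.nn.CosineSimilarity` uses to avoid division by zero.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def comb_similarity(t1, t2):
    """Similarity of every row of `t1` with every row of `t2` (len(t1) x len(t2))."""
    return [[cosine(u, v) for v in t2] for u in t1]

emb = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
sims = comb_similarity(emb, emb)
print(round(sims[0][1], 3), round(sims[0][2], 3))  # 0.0 0.707
```

Comparing a set of embeddings with itself, as above, gives a square symmetric matrix whose diagonal is 1; that is the shape of input that later sorting steps work on.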

In [None]:
show_doc(DatasetFormatter.from_toplosses)

<h4 id="DatasetFormatter.from_toplosses"><code>from_toplosses</code><a href="https://github.com/fastai/fastai/blob/master/fastai/widgets/image_cleaner.py#L15" class="source_link">[source]</a></h4>

> <code>from_toplosses</code>(`learn`, `n_imgs`=`None`, `kwargs`)

Gets indices with top losses for both training and validation sets in `learn`.  

In [None]:
show_doc(ImageCleaner.empty)

<h4 id="ImageCleaner.empty"><code>empty</code><a href="https://github.com/fastai/fastai/blob/master/fastai/widgets/image_cleaner.py#L187" class="source_link">[source]</a></h4>

> <code>empty</code>()

In [None]:
show_doc(DatasetFormatter.get_actns)

<h4 id="DatasetFormatter.get_actns"><code>get_actns</code><a href="https://github.com/fastai/fastai/blob/master/fastai/widgets/image_cleaner.py#L51" class="source_link">[source]</a></h4>

> <code>get_actns</code>(`learn`, `hook`:[`Hook`](/callbacks.hooks.html#Hook), `dl`:[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader), `pool`=`'AdaptiveConcatPool2d'`, `pool_dim`:`int`=`4`, `kwargs`)

Gets activations at the layer specified by `hook`, applies `pool` of dim `pool_dim` and concatenates.  
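The default `pool` is concat pooling: fastai's `AdaptiveConcatPool2d` joins an adaptive max pool and an adaptive average pool along the channel axis. The sketch below shows the idea for the simplest case, pooling one `[channels, h, w]` feature map down to a single vector (per-channel maxima followed by per-channel means, length `2 * channels`). It does not model `pool_dim` or larger output sizes, and `concat_pool` is a hypothetical helper name.

```python
def concat_pool(fmap):
    """Pool a [channels, h, w] feature map (nested lists) to one vector.

    Returns per-channel max values followed by per-channel mean values,
    so the result has length 2 * channels.
    """
    maxes, means = [], []
    for channel in fmap:
        values = [v for row in channel for v in row]
        maxes.append(max(values))
        means.append(sum(values) / len(values))
    return maxes + means

fmap = [
    [[1.0, 3.0], [5.0, 7.0]],  # channel 0
    [[0.0, 0.0], [0.0, 4.0]],  # channel 1
]
print(concat_pool(fmap))  # [7.0, 4.0, 4.0, 1.0]
```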

In [None]:
show_doc(ImageCleaner.relabel)

<h4 id="ImageCleaner.relabel"><code>relabel</code><a href="https://github.com/fastai/fastai/blob/master/fastai/widgets/image_cleaner.py#L159" class="source_link">[source]</a></h4>

> <code>relabel</code>(`change`)

Relabel images by moving from parent dir with old label `class_old` to parent dir with new label `class_new`.  

## Undocumented Methods - Methods moved below this line will intentionally be hidden

## New Methods - Please document or move to the undocumented section

In [None]:
show_doc(ImageCleaner.make_dropdown_widget)