Updated usage of ImageCleaner #234

Merged · 3 commits · Mar 27, 2019
78 changes: 75 additions & 3 deletions nbs/dl1/lesson2-download.ipynb
@@ -503,7 +503,7 @@
"source": [
"# If you already cleaned your data, run this cell instead of the one before\n",
"# np.random.seed(42)\n",
"# data = ImageDataBunch.from_csv(\".\", folder=\".\", valid_pct=0.2, csv_labels='cleaned.csv',\n",
"# data = ImageDataBunch.from_csv(path, folder=\".\", valid_pct=0.2, csv_labels='cleaned.csv',\n",
"# ds_tfms=get_transforms(), size=224, num_workers=4).normalize(imagenet_stats)"
]
},
@@ -807,13 +807,78 @@
"Notice that the widget will not delete images directly from disk but it will create a new csv file `cleaned.csv` from where you can create a new ImageDataBunch with the corrected labels to continue training your model."
]
},
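{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an optional sanity check, you can peek at the file the widget writes. The cell below is just a sketch, assuming `path` points at your image folder, that pandas is available, and that you have already run `ImageCleaner` at least once; uncomment it to use it:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# import pandas as pd\n",
"\n",
"# # cleaned.csv keeps one row per retained image: its file name and its label\n",
"# df = pd.read_csv(path/'cleaned.csv')\n",
"# df.head()"
]
},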
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In order to clean the entire set of images, we need to create a new dataset without the split. The video lecture demostrated the use of the `ds_type` param which no longer has any effect. See [the thread](https://forums.fast.ai/t/duplicate-widget/30975/10) for more details."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"db = (ImageList.from_folder(path)\n",
" .no_split()\n",
" .label_from_folder()\n",
" .transform(get_transforms(), size=224)\n",
" .databunch()\n",
" )"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# If you already cleaned your data using indexes from `from_toplosses`,\n",
"# run this cell instead of the one before to proceed with removing duplicates.\n",
"# Otherwise all the results of the previous step would be overwritten by\n",
"# the new run of `ImageCleaner`.\n",
"\n",
"# db = (ImageList.from_csv(path, 'cleaned.csv', folder='.')\n",
"# .no_split()\n",
"# .label_from_df()\n",
"# .transform(get_transforms(), size=224)\n",
"# .databunch()\n",
"# )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then we create a new learner to use our new databunch with all the images."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ds, idxs = DatasetFormatter().from_toplosses(learn, ds_type=DatasetType.Valid)"
"learn_cln = cnn_learner(db, models.resnet34, metrics=error_rate)\n",
"\n",
"learn_cln.load('stage-2');"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ds, idxs = DatasetFormatter().from_toplosses(learn_cln)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Make sure you're running this notebook in Jupyter Notebook, not Jupyter Lab. That is accessible via [/tree](/tree), not [/lab](/lab). Running the `ImageCleaner` widget in Jupyter Lab is [not currently supported](https://github.com/fastai/fastai/issues/1539)."
]
},
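{
"cell_type": "markdown",
"metadata": {},
"source": [
"With the dataset and indexes returned by `from_toplosses`, you can then launch the widget. The call below is a minimal sketch assuming `path` is the same image folder used to build `db`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Opens the relabel/delete widget on the highest-loss images;\n",
"# decisions are written to cleaned.csv, files on disk are left untouched.\n",
"ImageCleaner(ds, idxs, path)"
]
},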
{
@@ -849,6 +914,13 @@
"You can also find duplicates in your dataset and delete them! To do this, you need to run `.from_similars` to get the potential duplicates' ids and then run `ImageCleaner` with `duplicates=True`. The API works in a similar way as with misclassified images: just choose the ones you want to delete and click 'Next Batch' until there are no more images left."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Make sure to recreate the databunch and `learn_cln` from the `cleaned.csv` file. Otherwise the file would be overwritten from scratch, loosing all the results from cleaning the data from toplosses."
]
},
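{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, a sketch that mirrors the commented `from_csv` cell above and the learner we created earlier; uncomment and run it once the `from_toplosses` pass is done:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Rebuild the databunch from cleaned.csv so the duplicate pass starts from\n",
"# the labels you already corrected, then reattach the trained weights.\n",
"# db = (ImageList.from_csv(path, 'cleaned.csv', folder='.')\n",
"#         .no_split()\n",
"#         .label_from_df()\n",
"#         .transform(get_transforms(), size=224)\n",
"#         .databunch()\n",
"#      )\n",
"\n",
"# learn_cln = cnn_learner(db, models.resnet34, metrics=error_rate)\n",
"# learn_cln.load('stage-2');"
]
},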
{
"cell_type": "code",
"execution_count": null,
@@ -934,7 +1006,7 @@
}
],
"source": [
"ds, idxs = DatasetFormatter().from_similars(learn, ds_type=DatasetType.Valid)"
"ds, idxs = DatasetFormatter().from_similars(learn_cln)"
]
},
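{
"cell_type": "markdown",
"metadata": {},
"source": [
"The duplicates pass then uses the same widget with `duplicates=True`; as above, this sketch assumes `path` is your image folder:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Shows groups of visually similar images; anything you delete here is\n",
"# dropped from cleaned.csv, not removed from disk.\n",
"ImageCleaner(ds, idxs, path, duplicates=True)"
]
},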
{