Updated usage of ImageCleaner #234

Merged · 3 commits · Mar 27, 2019
78 changes: 75 additions & 3 deletions nbs/dl1/lesson2-download.ipynb
@@ -503,7 +503,7 @@
"source": [
"# If you already cleaned your data, run this cell instead of the one before\n",
"# np.random.seed(42)\n",
"# data = ImageDataBunch.from_csv(\".\", folder=\".\", valid_pct=0.2, csv_labels='cleaned.csv',\n",
"# data = ImageDataBunch.from_csv(path, folder=\".\", valid_pct=0.2, csv_labels='cleaned.csv',\n",
"# ds_tfms=get_transforms(), size=224, num_workers=4).normalize(imagenet_stats)"
]
},
@@ -807,13 +807,78 @@
"Notice that the widget will not delete images directly from disk but it will create a new csv file `cleaned.csv` from where you can create a new ImageDataBunch with the corrected labels to continue training your model."
]
},
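{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an optional sanity check, you can peek at the file the widget writes. The cell below is just a sketch, assuming `path` points at your image folder, that pandas is available, and that you have already run `ImageCleaner` at least once; uncomment it to use it:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# import pandas as pd\n",
"\n",
"# # cleaned.csv keeps one row per retained image: its file name and its label\n",
"# df = pd.read_csv(path/'cleaned.csv')\n",
"# df.head()"
]
},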
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In order to clean the entire set of images, we need to create a new dataset without the split. The video lecture demostrated the use of the `ds_type` param which no longer has any effect. See [the thread](https://forums.fast.ai/t/duplicate-widget/30975/10) for more details."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"db = (ImageList.from_folder(path)\n",
" .no_split()\n",
" .label_from_folder()\n",
" .transform(get_transforms(), size=224)\n",
" .databunch()\n",
" )"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# If you already cleaned your data using indexes from `from_toplosses`,\n",
"# run this cell instead of the one before to proceed with removing duplicates.\n",
"# Otherwise all the results of the previous step would be overwritten by\n",
"# the new run of `ImageCleaner`.\n",
"\n",
"# db = (ImageList.from_csv(path, 'cleaned.csv', folder='.')\n",
"# .no_split()\n",
"# .label_from_df()\n",
"# .transform(get_transforms(), size=224)\n",
"# .databunch()\n",
"# )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then we create a new learner to use our new databunch with all the images."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ds, idxs = DatasetFormatter().from_toplosses(learn, ds_type=DatasetType.Valid)"
"learn_cln = cnn_learner(db, models.resnet34, metrics=error_rate)\n",
"\n",
"learn_cln.load('stage-2');"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ds, idxs = DatasetFormatter().from_toplosses(learn_cln)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Make sure you're running this notebook in Jupyter Notebook, not Jupyter Lab. That is accessible via [/tree](/tree), not [/lab](/lab). Running the `ImageCleaner` widget in Jupyter Lab is [not currently supported](https://github.com/fastai/fastai/issues/1539)."
]
},
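{
"cell_type": "markdown",
"metadata": {},
"source": [
"With the dataset and indexes returned by `from_toplosses`, you can then launch the widget. The call below is a minimal sketch assuming `path` is the same image folder used to build `db`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Opens the relabel/delete widget on the highest-loss images;\n",
"# decisions are written to cleaned.csv, files on disk are left untouched.\n",
"ImageCleaner(ds, idxs, path)"
]
},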
{
@@ -849,6 +914,13 @@
"You can also find duplicates in your dataset and delete them! To do this, you need to run `.from_similars` to get the potential duplicates' ids and then run `ImageCleaner` with `duplicates=True`. The API works in a similar way as with misclassified images: just choose the ones you want to delete and click 'Next Batch' until there are no more images left."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Make sure to recreate the databunch and `learn_cln` from the `cleaned.csv` file. Otherwise the file would be overwritten from scratch, loosing all the results from cleaning the data from toplosses."
]
},
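{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, a sketch that mirrors the commented `from_csv` cell above and the learner we created earlier; uncomment and run it once the `from_toplosses` pass is done:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Rebuild the databunch from cleaned.csv so the duplicate pass starts from\n",
"# the labels you already corrected, then reattach the trained weights.\n",
"# db = (ImageList.from_csv(path, 'cleaned.csv', folder='.')\n",
"#         .no_split()\n",
"#         .label_from_df()\n",
"#         .transform(get_transforms(), size=224)\n",
"#         .databunch()\n",
"#      )\n",
"\n",
"# learn_cln = cnn_learner(db, models.resnet34, metrics=error_rate)\n",
"# learn_cln.load('stage-2');"
]
},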
{
"cell_type": "code",
"execution_count": null,
@@ -934,7 +1006,7 @@
}
],
"source": [
"ds, idxs = DatasetFormatter().from_similars(learn, ds_type=DatasetType.Valid)"
"ds, idxs = DatasetFormatter().from_similars(learn_cln)"
]
},
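{
"cell_type": "markdown",
"metadata": {},
"source": [
"The duplicates pass then uses the same widget with `duplicates=True`; as above, this sketch assumes `path` is your image folder:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Shows groups of visually similar images; anything you delete here is\n",
"# dropped from cleaned.csv, not removed from disk.\n",
"ImageCleaner(ds, idxs, path, duplicates=True)"
]
},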
{