# Data preparation
**If you are just contributing to this project or reproducing results, you should never need to run this notebook! It's included just to track our data preparation.**

We need a small subset of ImageNet data to work with. To reproduce the dataset we used, you can run the following cells. We use Martins Frolovs's [ImageNet datasets downloader tool](https://github.com/mf1024/ImageNet-datasets-downloader), which must be installed before you run this notebook. The cell below may not pull precisely the same images we used. A list of image IDs is given in the Data directory as `image_ids.txt`.

In [None]:
!python ../../ImageNet-Datasets-Downloader/downloader.py \
-data_root ../Data/ -use_class_list True -class_list n02510455 \
n02483362 n02503517 n02523877 n02672831 n02850732 n02906734 \
n02951585 n03000134 n03005285 -images_per_class 1000 \
-multiprocessing_workers 4

The output of the above cell should look like this:
```
Picked the following clases:
['giant panda', 'gibbon', 'elephant', 'haddock', 'accordion', 'blender', 'broom', 'can opener', 'chainlink fence', 'chandelier']
Scraping images for class "giant panda"
Multiprocessing workers: 4

Scraping stats:
STATS For class is_flickr:
 tried 246.0 urls with 236.0 successes
95.9349593495935% success rate for is_flickr urls 
0.11591261423240275 seconds spent per is_flickr succesful image download
STATS For class not_flickr:
 tried 0.0 urls with 0.0 successes
STATS For class all:
 tried 246.0 urls with 236.0 successes
95.9349593495935% success rate for all urls 
0.11591495598776864 seconds spent per all succesful image download

Scraping stats:
STATS For class is_flickr:
 tried 496.0 urls with 470.0 successes
94.75806451612904% success rate for is_flickr urls 
0.11586135397566127 seconds spent per is_flickr succesful image download
STATS For class not_flickr:
 tried 0.0 urls with 0.0 successes
STATS For class all:
 tried 496.0 urls with 470.0 successes
94.75806451612904% success rate for all urls 
0.11586216713519806 seconds spent per all succesful image download

Scraping stats:
STATS For class is_flickr:
 tried 746.0 urls with 708.0 successes
94.90616621983914% success rate for is_flickr urls 
0.11787304568425411 seconds spent per is_flickr succesful image download
STATS For class not_flickr:
 tried 0.0 urls with 0.0 successes
STATS For class all:
 tried 746.0 urls with 708.0 successes
94.90616621983914% success rate for all urls 
0.117873992286833 seconds spent per all succesful image download

Scraping stats:
STATS For class is_flickr:
 tried 996.0 urls with 942.0 successes
94.57831325301204% success rate for is_flickr urls 
0.1186178803697007 seconds spent per is_flickr succesful image download
STATS For class not_flickr:
 tried 0.0 urls with 0.0 successes
STATS For class all:
 tried 996.0 urls with 942.0 successes
94.57831325301204% success rate for all urls 
0.11861852754200088 seconds spent per all succesful image download
Scraping images for class "gibbon"
Multiprocessing workers: 4

Scraping stats:
STATS For class is_flickr:
 tried 1246.0 urls with 1156.0 successes
92.776886035313% success rate for is_flickr urls 
0.11479351842279666 seconds spent per is_flickr succesful image download
STATS For class not_flickr:
 tried 0.0 urls with 0.0 successes
STATS For class all:
 tried 1246.0 urls with 1156.0 successes
92.776886035313% success rate for all urls 
0.11479384738268737 seconds spent per all succesful image download

Scraping stats:
STATS For class is_flickr:
 tried 1496.0 urls with 1349.0 successes
90.17379679144385% success rate for is_flickr urls 
0.10916695301403727 seconds spent per is_flickr succesful image download
STATS For class not_flickr:
 tried 0.0 urls with 0.0 successes
STATS For class all:
 tried 1496.0 urls with 1349.0 successes
90.17379679144385% success rate for all urls 
0.10916730083298737 seconds spent per all succesful image download

Scraping stats:
STATS For class is_flickr:
 tried 1746.0 urls with 1538.0 successes
88.08705612829324% success rate for is_flickr urls 
0.10593618574626114 seconds spent per is_flickr succesful image download
STATS For class not_flickr:
 tried 0.0 urls with 0.0 successes
STATS For class all:
 tried 1746.0 urls with 1538.0 successes
88.08705612829324% success rate for all urls 
0.10593652787227159 seconds spent per all succesful image download
Scraping images for class "elephant"
Multiprocessing workers: 4

Scraping stats:
STATS For class is_flickr:
 tried 1997.0 urls with 1733.0 successes
86.78017025538307% success rate for is_flickr urls 
0.11299261535032595 seconds spent per is_flickr succesful image download
STATS For class not_flickr:
 tried 0.0 urls with 0.0 successes
STATS For class all:
 tried 1997.0 urls with 1733.0 successes
86.78017025538307% success rate for all urls 
0.11299272458538182 seconds spent per all succesful image download

Scraping stats:
STATS For class is_flickr:
 tried 2246.0 urls with 1928.0 successes
85.84149599287622% success rate for is_flickr urls 
0.11400169480391063 seconds spent per is_flickr succesful image download
STATS For class not_flickr:
 tried 0.0 urls with 0.0 successes
STATS For class all:
 tried 2246.0 urls with 1928.0 successes
85.84149599287622% success rate for all urls 
0.11400192555550223 seconds spent per all succesful image download

Scraping stats:
STATS For class is_flickr:
 tried 2497.0 urls with 2129.0 successes
85.26231477773328% success rate for is_flickr urls 
0.11498618025127964 seconds spent per is_flickr succesful image download
STATS For class not_flickr:
 tried 0.0 urls with 0.0 successes
STATS For class all:
 tried 2497.0 urls with 2129.0 successes
85.26231477773328% success rate for all urls 
0.1149864125106188 seconds spent per all succesful image download
Scraping images for class "haddock"
Multiprocessing workers: 4
Scraping images for class "accordion"
Multiprocessing workers: 4

Scraping stats:
STATS For class is_flickr:
 tried 2746.0 urls with 2324.0 successes
84.63219227967953% success rate for is_flickr urls 
0.12178581286215331 seconds spent per is_flickr succesful image download
STATS For class not_flickr:
 tried 0.0 urls with 0.0 successes
STATS For class all:
 tried 2746.0 urls with 2324.0 successes
84.63219227967953% success rate for all urls 
0.1217859208891601 seconds spent per all succesful image download

Scraping stats:
STATS For class is_flickr:
 tried 2997.0 urls with 2527.0 successes
84.31765098431765% success rate for is_flickr urls 
0.12101243605574084 seconds spent per is_flickr succesful image download
STATS For class not_flickr:
 tried 0.0 urls with 0.0 successes
STATS For class all:
 tried 2997.0 urls with 2527.0 successes
84.31765098431765% success rate for all urls 
0.12101253955601042 seconds spent per all succesful image download

Scraping stats:
STATS For class is_flickr:
 tried 3246.0 urls with 2731.0 successes
84.1343191620456% success rate for is_flickr urls 
0.120314846628972 seconds spent per is_flickr succesful image download
STATS For class not_flickr:
 tried 0.0 urls with 0.0 successes
STATS For class all:
 tried 3246.0 urls with 2731.0 successes
84.1343191620456% success rate for all urls 
0.1203149902388448 seconds spent per all succesful image download
Scraping images for class "blender"
Multiprocessing workers: 4

Scraping stats:
STATS For class is_flickr:
 tried 3497.0 urls with 2944.0 successes
84.18644552473549% success rate for is_flickr urls 
0.12358420139745525 seconds spent per is_flickr succesful image download
STATS For class not_flickr:
 tried 0.0 urls with 0.0 successes
STATS For class all:
 tried 3497.0 urls with 2944.0 successes
84.18644552473549% success rate for all urls 
0.12358438134517359 seconds spent per all succesful image download

Scraping stats:
STATS For class is_flickr:
 tried 3746.0 urls with 3156.0 successes
84.24986652429259% success rate for is_flickr urls 
0.12254666171599704 seconds spent per is_flickr succesful image download
STATS For class not_flickr:
 tried 0.0 urls with 0.0 successes
STATS For class all:
 tried 3746.0 urls with 3156.0 successes
84.24986652429259% success rate for all urls 
0.12254676317231888 seconds spent per all succesful image download

Scraping stats:
STATS For class is_flickr:
 tried 3999.0 urls with 3363.0 successes
84.0960240060015% success rate for is_flickr urls 
0.12515377175974554 seconds spent per is_flickr succesful image download
STATS For class not_flickr:
 tried 0.0 urls with 0.0 successes
STATS For class all:
 tried 3999.0 urls with 3363.0 successes
84.0960240060015% success rate for all urls 
0.12515390461624784 seconds spent per all succesful image download
Scraping images for class "broom"
Multiprocessing workers: 4

Scraping stats:
STATS For class is_flickr:
 tried 4246.0 urls with 3576.0 successes
84.22044276966557% success rate for is_flickr urls 
0.12526651216833384 seconds spent per is_flickr succesful image download
STATS For class not_flickr:
 tried 0.0 urls with 0.0 successes
STATS For class all:
 tried 4246.0 urls with 3576.0 successes
84.22044276966557% success rate for all urls 
0.12526663271106062 seconds spent per all succesful image download

Scraping stats:
STATS For class is_flickr:
 tried 4497.0 urls with 3787.0 successes
84.21169668668001% success rate for is_flickr urls 
0.12498957323215869 seconds spent per is_flickr succesful image download
STATS For class not_flickr:
 tried 0.0 urls with 0.0 successes
STATS For class all:
 tried 4497.0 urls with 3787.0 successes
84.21169668668001% success rate for all urls 
0.12498965746877311 seconds spent per all succesful image download
Scraping images for class "can opener"
Multiprocessing workers: 4

Scraping stats:
STATS For class is_flickr:
 tried 4746.0 urls with 3992.0 successes
84.11293721028234% success rate for is_flickr urls 
0.13034970577589736 seconds spent per is_flickr succesful image download
STATS For class not_flickr:
 tried 0.0 urls with 0.0 successes
STATS For class all:
 tried 4746.0 urls with 3992.0 successes
84.11293721028234% success rate for all urls 
0.13034982534353146 seconds spent per all succesful image download

Scraping stats:
STATS For class is_flickr:
 tried 4997.0 urls with 4188.0 successes
83.81028617170303% success rate for is_flickr urls 
0.12969082753318997 seconds spent per is_flickr succesful image download
STATS For class not_flickr:
 tried 0.0 urls with 0.0 successes
STATS For class all:
 tried 4997.0 urls with 4188.0 successes
83.81028617170303% success rate for all urls 
0.12969091247123202 seconds spent per all succesful image download
Scraping images for class "chainlink fence"
Multiprocessing workers: 4

Scraping stats:
STATS For class is_flickr:
 tried 5246.0 urls with 4384.0 successes
83.56843309187953% success rate for is_flickr urls 
0.1322762328234032 seconds spent per is_flickr succesful image download
STATS For class not_flickr:
 tried 0.0 urls with 0.0 successes
STATS For class all:
 tried 5246.0 urls with 4384.0 successes
83.56843309187953% success rate for all urls 
0.13227634876966476 seconds spent per all succesful image download

Scraping stats:
STATS For class is_flickr:
 tried 5496.0 urls with 4593.0 successes
83.56986899563319% success rate for is_flickr urls 
0.13195338546126698 seconds spent per is_flickr succesful image download
STATS For class not_flickr:
 tried 0.0 urls with 0.0 successes
STATS For class all:
 tried 5496.0 urls with 4593.0 successes
83.56986899563319% success rate for all urls 
0.13195350132241843 seconds spent per all succesful image download

Scraping stats:
STATS For class is_flickr:
 tried 5746.0 urls with 4807.0 successes
83.65819700661329% success rate for is_flickr urls 
0.13153475660630018 seconds spent per is_flickr succesful image download
STATS For class not_flickr:
 tried 0.0 urls with 0.0 successes
STATS For class all:
 tried 5746.0 urls with 4807.0 successes
83.65819700661329% success rate for all urls 
0.13153489905234952 seconds spent per all succesful image download

Scraping stats:
STATS For class is_flickr:
 tried 5996.0 urls with 5004.0 successes
83.45563709139427% success rate for is_flickr urls 
0.13120535137556155 seconds spent per is_flickr succesful image download
STATS For class not_flickr:
 tried 0.0 urls with 0.0 successes
STATS For class all:
 tried 5996.0 urls with 5004.0 successes
83.45563709139427% success rate for all urls 
0.13120550955895136 seconds spent per all succesful image download

Scraping stats:
STATS For class is_flickr:
 tried 6246.0 urls with 5209.0 successes
83.39737431956452% success rate for is_flickr urls 
0.1309091048864056 seconds spent per is_flickr succesful image download
STATS For class not_flickr:
 tried 0.0 urls with 0.0 successes
STATS For class all:
 tried 6246.0 urls with 5209.0 successes
83.39737431956452% success rate for all urls 
0.13090923020605805 seconds spent per all succesful image download
Scraping images for class "chandelier"
Multiprocessing workers: 4

Scraping stats:
STATS For class is_flickr:
 tried 6496.0 urls with 5419.0 successes
83.42056650246306% success rate for is_flickr urls 
0.13096019697708494 seconds spent per is_flickr succesful image download
STATS For class not_flickr:
 tried 0.0 urls with 0.0 successes
STATS For class all:
 tried 6496.0 urls with 5419.0 successes
83.42056650246306% success rate for all urls 
0.13096028950232425 seconds spent per all succesful image download

Scraping stats:
STATS For class is_flickr:
 tried 6746.0 urls with 5625.0 successes
83.38274533056627% success rate for is_flickr urls 
0.1305030758327908 seconds spent per is_flickr succesful image download
STATS For class not_flickr:
 tried 0.0 urls with 0.0 successes
STATS For class all:
 tried 6746.0 urls with 5625.0 successes
83.38274533056627% success rate for all urls 
0.13050314517550998 seconds spent per all succesful image download

Scraping stats:
STATS For class is_flickr:
 tried 6998.0 urls with 5844.0 successes
83.50957416404687% success rate for is_flickr urls 
0.12992064631368752 seconds spent per is_flickr succesful image download
STATS For class not_flickr:
 tried 0.0 urls with 0.0 successes
STATS For class all:
 tried 6998.0 urls with 5844.0 successes
83.50957416404687% success rate for all urls 
0.12992071999335109 seconds spent per all succesful image download

Scraping stats:
STATS For class is_flickr:
 tried 7246.0 urls with 6059.0 successes
83.61854816450456% success rate for is_flickr urls 
0.12949373690990315 seconds spent per is_flickr succesful image download
STATS For class not_flickr:
 tried 0.0 urls with 0.0 successes
STATS For class all:
 tried 7246.0 urls with 6059.0 successes
83.61854816450456% success rate for all urls 
0.12949382064562462 seconds spent per all succesful image download
```
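The log above reports cumulative totals, so the last `STATS For class all` block gives the overall result of the run. As a rough sketch (not part of the original notebook), the following Python pulls those totals out of the captured text. The function name `final_success_rate` and the regular expression are assumptions made for illustration; they rely only on the line shapes visible in the output above.

```python
import re

# Matches lines such as " tried 246.0 urls with 236.0 successes".
STATS_LINE = re.compile(r"tried (\d+(?:\.\d+)?) urls with (\d+(?:\.\d+)?) successes")


def final_success_rate(log_text):
    """Return (tried, successes, rate_percent) from the last 'STATS For class all' block.

    Hypothetical helper: it assumes the log format shown above.
    """
    tried = successes = None
    in_all_block = False
    for line in log_text.splitlines():
        if line.startswith("STATS For class"):
            # Track whether the lines that follow belong to the "all" block.
            in_all_block = line.strip() == "STATS For class all:"
            continue
        match = STATS_LINE.search(line)
        if match and in_all_block:
            # Later blocks overwrite earlier ones, since the totals are cumulative.
            tried, successes = float(match.group(1)), float(match.group(2))
    if tried is None:
        raise ValueError("no 'STATS For class all' block found")
    rate = 100.0 * successes / tried if tried else 0.0
    return tried, successes, rate


# The final block of the output above, copied verbatim.
sample_log = """Scraping stats:
STATS For class is_flickr:
 tried 7246.0 urls with 6059.0 successes
STATS For class not_flickr:
 tried 0.0 urls with 0.0 successes
STATS For class all:
 tried 7246.0 urls with 6059.0 successes
"""

tried, successes, rate = final_success_rate(sample_log)
print(f"{successes:.0f} of {tried:.0f} URLs succeeded ({rate:.2f}%)")
```

In this run, roughly 6,000 images were downloaded in total, well short of the 1,000 per class requested, which is one reason a rerun may not reproduce exactly the same files.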

In [4]:
!ls -1 ../Data/imagenet_images/*/* > ../Data/image_ids.txt
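Because the cell above redirects into `../Data/image_ids.txt`, rerunning it replaces the tracked list with whatever is on disk at that moment. If you want to check a fresh download against the tracked list without overwriting it, something like the sketch below could be used. The helper names `list_image_paths` and `compare_ids` are hypothetical, and comparing by file name alone is an assumption about how images are identified, not something the notebook states.

```python
from pathlib import Path


def list_image_paths(image_root):
    """Return sorted paths two levels below image_root, roughly like `ls -1 root/*/*`.

    Unlike the shell glob, pathlib also matches hidden files.
    """
    return sorted(str(p) for p in Path(image_root).glob("*/*"))


def compare_ids(reference_ids, current_ids):
    """Return (missing, extra) file names relative to the reference list.

    Assumption: the file name alone identifies an image, so directory
    prefixes are ignored on both sides.
    """
    ref = {Path(line.strip()).name for line in reference_ids if line.strip()}
    cur = {Path(line.strip()).name for line in current_ids if line.strip()}
    return sorted(ref - cur), sorted(cur - ref)


# Usage with small in-memory lists; the file names here are made up.
reference = [
    "../Data/imagenet_images/gibbon/a.jpg",
    "../Data/imagenet_images/gibbon/b.jpg",
]
current = [
    "../Data/imagenet_images/gibbon/b.jpg",
    "../Data/imagenet_images/gibbon/c.jpg",
]
missing, extra = compare_ids(reference, current)
print("missing from this download:", missing)
print("not in the tracked list:", extra)
```

To run it against real data, read the tracked list with `Path("../Data/image_ids.txt").read_text().splitlines()` and pass the result of `list_image_paths("../Data/imagenet_images")` as the second argument.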