ImageNet downloader and PyTorch Dataset implementation.
Requirements: please see requirements.yml for details.
To create a new conda environment called imagenet, run the following command:
> conda env create -f requirements.yml
Download ImageNet samples by running the script download_imagenet_images.py [num_images]. The script first downloads image URLs from the ImageNet API, then randomly shuffles all the URLs, and finally downloads from these URLs until [num_images] images have been downloaded successfully.
> python download_imagenet_images.py 100
This downloads 100 images into a subdirectory named images.
This takes quite a while, so let it run overnight ;-)
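The download procedure described above (collect URLs, shuffle them, then download until [num_images] succeed) can be sketched roughly as follows. This is not the script's actual code: the names download_until and fetch_url, the injectable fetch callable, and the numbered .jpg file naming are hypothetical, and collecting the URLs from the ImageNet API is left out.

```python
import os
import random
import urllib.request
from typing import Callable, Iterable, List


def fetch_url(url: str, timeout: float = 10.0) -> bytes:
    """Download the raw bytes at `url`; raises on network or HTTP errors."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def download_until(
    urls: Iterable[str],
    num_images: int,
    out_dir: str = "images",
    fetch: Callable[[str], bytes] = fetch_url,
    seed: int = 0,
) -> List[str]:
    """Shuffle `urls` and download them until `num_images` succeed.

    Hypothetical sketch; returns the paths of the saved files.
    """
    os.makedirs(out_dir, exist_ok=True)
    shuffled = list(urls)
    random.Random(seed).shuffle(shuffled)  # random order, reproducible via seed

    saved: List[str] = []
    for url in shuffled:
        if len(saved) >= num_images:
            break  # enough successful downloads
        try:
            data = fetch(url)
        except Exception:
            # Many image URLs are dead; skip failures and try the next one.
            continue
        # Assumption: files are simply numbered and given a .jpg extension.
        path = os.path.join(out_dir, f"{len(saved):06d}.jpg")
        with open(path, "wb") as f:
            f.write(data)
        saved.append(path)
    return saved
```

Because failed URLs are skipped rather than counted, the loop may try far more than [num_images] URLs, which is one reason the download can take so long.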
To use the dataset, add the parent directory of the project to the sys.path list. For example, if you cloned this repo into /foo/bar, i.e., the repo is located at /foo/bar/torch_imagenet, then /foo/bar should be in your path:
import sys
# make the torch_imagenet package importable
sys.path.append('/foo/bar')
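If you would rather not hard-code the parent directory, it can be derived from the repository path. A small sketch, assuming the repository lives at /foo/bar/torch_imagenet as in the example above; the helper name add_repo_parent_to_path is hypothetical:

```python
import sys
from pathlib import Path


def add_repo_parent_to_path(repo_dir: str) -> str:
    """Put the parent of `repo_dir` on sys.path so `import torch_imagenet` works."""
    parent = str(Path(repo_dir).resolve().parent)
    if parent not in sys.path:
        sys.path.insert(0, parent)  # searched before installed packages
    return parent


add_repo_parent_to_path("/foo/bar/torch_imagenet")
```

Setting the PYTHONPATH environment variable to /foo/bar before starting Python achieves the same without changing any code.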
Afterwards, you can import the ImageNetDataset as follows:
from torch_imagenet import ImageNetDataset
Note that some of the downloaded images may be grayscale and thus have only one channel. Such images can cause problems when the dataset is used. To fix this, you can run
> python identify_bad_images.py
This script generates a pickle file listing the filenames of all images that do not have exactly three channels. The ImageNetDataset will automatically pick up this file the next time the dataset is used.
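The check performed by identify_bad_images.py can be approximated as below. This is a sketch under assumptions, not the script itself: the helper names, the pickle filename bad_images.pkl, and the use of Pillow's Image.getbands() to count channels are hypothetical choices. A dataset would then skip the listed files when building its index.

```python
import os
import pickle
from typing import Callable, List, Set


def pil_channel_count(path: str) -> int:
    """Number of bands in the image, e.g. 1 for grayscale ('L'), 3 for 'RGB'."""
    from PIL import Image  # imported lazily so the rest works without Pillow

    with Image.open(path) as img:
        return len(img.getbands())


def find_bad_images(
    image_dir: str,
    count_channels: Callable[[str], int] = pil_channel_count,
) -> List[str]:
    """Return filenames in `image_dir` whose images do not have exactly 3 channels."""
    bad = []
    for name in sorted(os.listdir(image_dir)):
        path = os.path.join(image_dir, name)
        try:
            if count_channels(path) != 3:
                bad.append(name)
        except Exception:
            bad.append(name)  # design choice: unreadable files are treated as bad too
    return bad


def save_bad_images(bad: List[str], pickle_path: str) -> None:
    """Store the blacklist; keep it outside image_dir so it is not listed itself."""
    with open(pickle_path, "wb") as f:
        pickle.dump(bad, f)


def load_bad_images(pickle_path: str) -> Set[str]:
    """Load the blacklist if it exists; a missing file means nothing is excluded."""
    if not os.path.exists(pickle_path):
        return set()
    with open(pickle_path, "rb") as f:
        return set(pickle.load(f))


def usable_images(image_dir: str, pickle_path: str) -> List[str]:
    """Filenames a dataset would keep after applying the blacklist."""
    bad = load_bad_images(pickle_path)
    return [n for n in sorted(os.listdir(image_dir)) if n not in bad]


# Example: save_bad_images(find_bad_images("images"), "bad_images.pkl")
```

Passing count_channels as a parameter keeps the filtering logic independent of the image library, which also makes it easy to test without real image files.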
For a demonstration of the code in action, please see this example code.