# **Downloading data and models**

## Navigation

- [**Basic imports and initialization**](#Basic-imports-and-initialization)
- [**1. ILSVRC dataset**](#1.-ILSVRC-dataset)
- [**2. CNN models**](#2.-CNN-models)

## Basic imports and initialization

$\qquad$ [[Back to top]](#Navigation) $\qquad$ [[Next part $\to$]](#1.-ILSVRC-dataset)

- [Limiting the hardware resources](#Limiting-the-hardware-resources)
- [Initializing Kaggle API](#Initializing-Kaggle-API)
- [Setting up variables](#Setting-up-variables)

### Limiting the hardware resources

$\quad$[[Back to section]](#Basic-imports-and-initialization)$\quad$[[Next subsect.$\to$]](#Initializing-Kaggle-API)

To make the modules stored in the `../src/` directory importable, we append this directory to the module search path:

In [1]:
import sys
sys.path.append('../src/')
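This workaround works because Python checks the directories in `sys.path`, in order, when it resolves an `import` statement. The self-contained illustration below uses a temporary directory in place of `../src/` and a hypothetical module `src_demo_module` in place of `hardware_setup`:

```python
import importlib
import importlib.util
import os
import sys
import tempfile

# A temporary directory stands in for ../src/ (hypothetical demo module)
src_dir = tempfile.mkdtemp()
with open(os.path.join(src_dir, 'src_demo_module.py'), 'w') as f:
    f.write("MESSAGE = 'imported from src'\n")

# Not importable yet: the directory is not on the module search path
found_before = importlib.util.find_spec('src_demo_module') is not None

# Same workaround as in the cell above: extend sys.path, then import by name
sys.path.append(src_dir)
importlib.invalidate_caches()
src_demo_module = importlib.import_module('src_demo_module')

print(found_before, src_demo_module.MESSAGE)  # False imported from src
```

A relative entry such as `'../src/'` is resolved against the current working directory, not against the notebook file, so the cell above assumes the notebook is started from its own directory.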

Next, we limit hardware usage by restricting MKL to 4 threads:

In [2]:
# set limitations on hardware
import hardware_setup
hardware_setup.mkl_set_num_threads(num_threads=4)

[mkl]: set up num_threads=4/4
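`hardware_setup` is a local module from `../src/` whose code is not shown here. Without it, a common way to cap the thread pools of numerical libraries is to set environment variables such as `OMP_NUM_THREADS` and `MKL_NUM_THREADS` before those libraries are imported. The helper `limit_threads_via_env` below is a hypothetical sketch of that approach, not a description of what `hardware_setup` actually does:

```python
import os

# Environment variables read by common threading runtimes at start-up
THREAD_ENV_VARS = ('OMP_NUM_THREADS', 'MKL_NUM_THREADS', 'OPENBLAS_NUM_THREADS')


def limit_threads_via_env(num_threads):
    """Hypothetical helper: cap library thread pools via environment variables.

    Generally only effective if called before numpy/torch (and the MKL or
    OpenMP runtimes they load) are imported in the current process.
    """
    # Never request more threads than the machine reports, and at least one
    requested = max(1, min(num_threads, os.cpu_count() or 1))
    for name in THREAD_ENV_VARS:
        os.environ[name] = str(requested)
    return requested


n = limit_threads_via_env(4)
print(f'threads limited to {n}/{os.cpu_count()}')
```

Inside an already running process, PyTorch also provides `torch.set_num_threads(n)` for its intra-op thread pool.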


### Initializing Kaggle API

[[$\leftarrow$ Prev. subsect]](#Limiting-the-hardware-resources) $\quad$[[Back to section]](#Basic-imports-and-initialization)$\quad$[[Next subsect.$\to$]](#Setting-up-variables)

In [3]:
import os

Fill in the following environment variables with your Kaggle username and API key (you need your own Kaggle account; the key is generated in the account settings and is not the account password):

In [4]:
os.environ['KAGGLE_USERNAME'] = ''
os.environ['KAGGLE_KEY'] = ''

In [5]:
from kaggle.api.kaggle_api_extended import KaggleApi

In [6]:
api = KaggleApi()
api.authenticate()
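As an alternative to environment variables, the Kaggle API reads credentials from a `kaggle.json` file, by default in `~/.kaggle/` (or in the directory given by `KAGGLE_CONFIG_DIR`). A minimal sketch that writes such a file; `write_kaggle_config` is a hypothetical helper and the credentials are placeholders:

```python
import json
import os
import stat
import tempfile


def write_kaggle_config(username, key, config_dir=None):
    """Write credentials in the kaggle.json format read by the Kaggle API."""
    config_dir = config_dir or os.path.join(os.path.expanduser('~'), '.kaggle')
    os.makedirs(config_dir, exist_ok=True)
    path = os.path.join(config_dir, 'kaggle.json')
    with open(path, 'w') as f:
        json.dump({'username': username, 'key': key}, f)
    # Readable and writable only by the owner; the Kaggle client warns otherwise
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
    return path


# Demo in a temporary directory with placeholder credentials
demo_path = write_kaggle_config('your-username', 'your-api-key',
                                config_dir=tempfile.mkdtemp())
print(demo_path)
```

Called without `config_dir`, the helper writes to `~/.kaggle/kaggle.json`, after which the two `os.environ` assignments above are no longer needed.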

### Setting up variables

[[$\leftarrow$ Prev. subsect]](#Initializing-Kaggle-API) $\quad$[[Back to section]](#Basic-imports-and-initialization)$\quad$

In [21]:
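# directory for the pretrained CNN weights (section 2)
model_dirname = '../torch-models'

# ImageNet data are stored in ../data/imagenet
data_dirname = '../data'
imagenet_dirname = 'imagenet'
imagenet_dirname_path = os.path.join(data_dirname, imagenet_dirname)

# name of the downloaded archive; the commented line is the older,
# no longer available version used in this study (see section 1.1)
#imagenet_filename = 'imagenet_object_localization_patched2019.tar.gz'
imagenet_filename = 'imagenet-object-localization-challenge.zip'

# competition identifier used by the Kaggle API
kaggle_ilsvrc_challenge_name = 'imagenet-object-localization-challenge'

# delete the archives once they have been unpacked
remove_downloaded_archives = True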

## 1. ILSVRC dataset

[[$\leftarrow$ Prev. part]](#Basic-imports-and-initialization) $\qquad$ [[Back to top]](#Navigation) $\qquad$ [[Next part $\to$]](#2.-CNN-models)

- [1.1 Download files](#1.1-Download-files)
- [1.2 Unpack downloaded files](#1.2-Unpack-downloaded-files)
- [1.3 Fix structure of the dataset's directory tree](#1.3-Fix-structure-of-the-dataset's-directory-tree)

The ILSVRC dataset is available at the following page:
- https://www.kaggle.com/c/imagenet-object-localization-challenge

### 1.1 Download files

$\quad$[[Back to section]](#1.-ILSVRC-dataset)$\quad$[[Next subsect.$\to$]](#1.2-Unpack-downloaded-files)


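In this study, we used an older version of the data that was previously provided on the same Kaggle page: `imagenet_object_localization_patched2019.tar.gz`.

This archive is no longer available, so the download procedure below was rewritten to fetch the current competition files as `imagenet-object-localization-challenge.zip`. The competition rules must be accepted on the Kaggle page before the API is allowed to download them.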

In [None]:
api.competition_download_files(
    kaggle_ilsvrc_challenge_name,
    path=imagenet_dirname_path
)
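Because the archive is very large, it can be worth skipping the download when the data are already present. A minimal sketch of such a guard; `needs_download` is a hypothetical helper that treats either the archive itself or the unpacked `ILSVRC/` directory (the top-level directory used in section 1.3) as evidence that the data exist:

```python
import os
import tempfile


def needs_download(dirname_path, archive_name, extracted_subdir='ILSVRC'):
    """Hypothetical guard: False if the archive or its unpacked tree exists."""
    archive_exists = os.path.isfile(os.path.join(dirname_path, archive_name))
    extracted_exists = os.path.isdir(os.path.join(dirname_path, extracted_subdir))
    return not (archive_exists or extracted_exists)


# Demo with a temporary directory standing in for ../data/imagenet
demo_dir = tempfile.mkdtemp()
before = needs_download(demo_dir, 'imagenet-object-localization-challenge.zip')
os.makedirs(os.path.join(demo_dir, 'ILSVRC'))
after = needs_download(demo_dir, 'imagenet-object-localization-challenge.zip')
print(before, after)  # True False

# In the notebook:
# if needs_download(imagenet_dirname_path, imagenet_filename):
#     api.competition_download_files(kaggle_ilsvrc_challenge_name,
#                                    path=imagenet_dirname_path)
```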

### 1.2 Unpack downloaded files

[[$\leftarrow$ Prev. subsect]](#1.1-Download-files) $\quad$[[Back to section]](#1.-ILSVRC-dataset)$\quad$[[Next subsect.$\to$]](#1.3-Fix-structure-of-the-dataset's-directory-tree)

In [19]:
imagenet_path = os.path.join(imagenet_dirname_path, imagenet_filename)
log_path = os.path.join(imagenet_dirname_path, 'imagenet_unzip.log')
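# unpack the archive; the list of extracted files is written to the log file
# (the commented tar command applies to the older .tar.gz version)
#! tar -zxvf {imagenet_path} -C {imagenet_dirname_path} > {log_path}
! unzip {imagenet_path} -d {imagenet_dirname_path} > {log_path}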
if remove_downloaded_archives:
    os.remove(imagenet_path)
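In the cell above, the archive is removed even if `unzip` fails, because the shell command's exit status is not checked. A pure-Python alternative based on the standard `zipfile` module deletes the archive only after `extractall` has returned; `unpack_archive` is a hypothetical helper, and the demo uses a tiny archive instead of the real one:

```python
import os
import tempfile
import zipfile


def unpack_archive(archive_path, target_dir, log_path, remove_archive=False):
    """Extract a .zip archive, log member names, optionally delete the archive.

    If extraction raises, the exception propagates and the archive is kept.
    """
    with zipfile.ZipFile(archive_path) as archive:
        names = archive.namelist()
        archive.extractall(target_dir)
    with open(log_path, 'w') as log:
        log.write('\n'.join(names) + '\n')
    if remove_archive:
        os.remove(archive_path)
    return len(names)


# Demo: build a one-file archive, then unpack and remove it
work_dir = tempfile.mkdtemp()
demo_zip = os.path.join(work_dir, 'demo.zip')
with zipfile.ZipFile(demo_zip, 'w') as z:
    z.writestr('ILSVRC/readme.txt', 'hello')
n_files = unpack_archive(demo_zip, work_dir,
                         os.path.join(work_dir, 'unzip.log'),
                         remove_archive=True)
print(n_files)  # 1
```

For the older `.tar.gz` version, the standard `tarfile` module offers the analogous `getnames()` and `extractall()` methods.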

### 1.3 Fix structure of the dataset's directory tree

[[$\leftarrow$ Prev. subsect]](#1.2-Unpack-downloaded-files) $\quad$[[Back to section]](#1.-ILSVRC-dataset)$\quad$

In [None]:
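# Hotfix from https://discuss.pytorch.org/t/issues-with-dataloader-for-imagenet-should-i-use-datasets-imagefolder-or-datasets-imagenet/115742/8
# valprep.sh moves every validation image into a sub-directory named after its
# class, the layout expected by torchvision.datasets.ImageFolder; the
# .ipynb_checkpoints folder is removed so that it is not read as an extra class.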
current_dirname = os.getcwd()
%cd '{imagenet_dirname_path}/ILSVRC/Data/CLS-LOC/val'
! wget -qO- https://raw.githubusercontent.com/soumith/imagenetloader.torch/master/valprep.sh | bash
! rm -rf .ipynb_checkpoints
%cd '{current_dirname}'
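Piping a remote script into `bash` requires network access and trusts whatever the script contains at that moment. A minimal offline sketch of the same reorganization, assuming the Kaggle archive provides `LOC_val_solution.csv` with columns `ImageId,PredictionString`, where the first token of `PredictionString` is the synset id and each image is stored as `<ImageId>.JPEG`. Both the file layout and the helper `sort_val_images` are assumptions, and the demo rows are made up:

```python
import csv
import os
import shutil
import tempfile


def sort_val_images(val_dir, solution_csv):
    """Move validation images into one sub-directory per synset (assumed CSV layout)."""
    moved = 0
    with open(solution_csv, newline='') as f:
        for row in csv.DictReader(f):
            synset = row['PredictionString'].split()[0]
            src = os.path.join(val_dir, row['ImageId'] + '.JPEG')
            if not os.path.isfile(src):
                continue  # already moved or missing
            class_dir = os.path.join(val_dir, synset)
            os.makedirs(class_dir, exist_ok=True)
            shutil.move(src, os.path.join(class_dir, row['ImageId'] + '.JPEG'))
            moved += 1
    return moved


# Demo with empty placeholder images and made-up CSV rows
demo_val = tempfile.mkdtemp()
for image_id in ('ILSVRC2012_val_00000001', 'ILSVRC2012_val_00000002'):
    open(os.path.join(demo_val, image_id + '.JPEG'), 'wb').close()
demo_csv = os.path.join(tempfile.mkdtemp(), 'LOC_val_solution.csv')
with open(demo_csv, 'w', newline='') as f:
    f.write('ImageId,PredictionString\n'
            'ILSVRC2012_val_00000001,n01751748 111 108 441 375\n'
            'ILSVRC2012_val_00000002,n09193705 31 87 500 374\n')
moved = sort_val_images(demo_val, demo_csv)
print(moved)  # 2

# In the notebook (assuming the CSV sits at the root of the unpacked archive):
# sort_val_images(os.path.join(imagenet_dirname_path, 'ILSVRC/Data/CLS-LOC/val'),
#                 os.path.join(imagenet_dirname_path, 'LOC_val_solution.csv'))
```

Images that were already moved are skipped, so running the helper twice is harmless.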

## 2. CNN models

[[$\leftarrow$ Prev. part]](#1.-ILSVRC-dataset) $\qquad$ [[Back to top]](#Navigation)

In this study, three CNN models with pretrained ImageNet weights from the PyTorch model zoo were considered:
- [AlexNet](https://pytorch.org/hub/pytorch_vision_alexnet/)
- [VGG11](https://pytorch.org/hub/pytorch_vision_vgg/)
- [ResNet18](https://pytorch.org/hub/pytorch_vision_resnet/)

In [22]:
# alexnet-owt-4df8aa71.pth
# resnet18-5c106cde.pth
# vgg11-bbd30ac9.pth

model_urls = {
    'alexnet': 'https://download.pytorch.org/models/alexnet-owt-4df8aa71.pth',
    'vgg11': 'https://download.pytorch.org/models/vgg11-bbd30ac9.pth',
    'resnet18': 'https://download.pytorch.org/models/resnet18-5c106cde.pth'
}

In [None]:
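import urllib.request

os.makedirs(model_dirname, exist_ok=True)
for model_name, model_url in model_urls.items():
    # keep the original file name, e.g. alexnet-owt-4df8aa71.pth
    current_path = os.path.join(model_dirname, model_url.split('/')[-1])
    # urlretrieve raises an exception on network or HTTP errors, so the
    # message below is printed only after a successful download
    urllib.request.urlretrieve(model_url, current_path)
    print(f'Model {model_name} was successfully downloaded to {current_path}')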
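The weight files follow the `torch.hub` naming convention `<name>-<hash>.pth`, where `<hash>` is the first eight or more hex digits of the SHA-256 hash of the file contents. A minimal sketch that checks downloaded files against that prefix, similar in spirit to the check `torch.hub.load_state_dict_from_url(..., check_hash=True)` performs; `verify_hash_prefix` is a hypothetical helper, and the demo writes a dummy payload instead of real weights:

```python
import hashlib
import os
import re
import tempfile

# Matches e.g. '-4df8aa71.pth' at the end of 'alexnet-owt-4df8aa71.pth'
HASH_PREFIX_RE = re.compile(r'-([a-f0-9]{8,})\.[^.]+$')


def verify_hash_prefix(path, chunk_size=1 << 20):
    """Return True if the file's SHA-256 starts with the prefix in its name."""
    match = HASH_PREFIX_RE.search(os.path.basename(path))
    if match is None:
        raise ValueError(f'no hash prefix in file name: {path}')
    sha256 = hashlib.sha256()
    with open(path, 'rb') as f:
        # Hash in chunks so large weight files are not read into memory at once
        for chunk in iter(lambda: f.read(chunk_size), b''):
            sha256.update(chunk)
    return sha256.hexdigest().startswith(match.group(1))


# Demo: one correctly named file and one with a prefix that cannot match
demo_dir = tempfile.mkdtemp()
payload = b'not really a state dict'
prefix = hashlib.sha256(payload).hexdigest()[:8]
wrong_prefix = ''.join('1' if c == '0' else '0' for c in prefix)
good_path = os.path.join(demo_dir, f'demo-{prefix}.pth')
bad_path = os.path.join(demo_dir, f'demo-{wrong_prefix}.pth')
for path in (good_path, bad_path):
    with open(path, 'wb') as f:
        f.write(payload)
print(verify_hash_prefix(good_path), verify_hash_prefix(bad_path))  # True False

# In the notebook:
# for model_url in model_urls.values():
#     path = os.path.join(model_dirname, model_url.split('/')[-1])
#     print(path, verify_hash_prefix(path))
```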