# Lab 3 Image Classification Labs

## Lab 3-1 Training ResNet on the CIFAR-10 Dataset Using MXNet (Single Node)

In this lab, you train a ResNet neural network on the CIFAR-10 training data to classify images into 10 known categories. The code is written with MXNet and has been modified to use an Amazon S3 bucket instead of the local file system.

> **AWS Deep Learning AMI Ubuntu Version (1.5_Jun2017)**
>
> * Upgraded to latest Ubuntu base AMI for 14.04
> * MXNet compiled with **S3 Support**
> * 7 Deep Learning Frameworks - contains the most popular Deep Learning Frameworks (MXNet, Caffe, Caffe2, TensorFlow, Theano, Torch and CNTK)


This Python source code is based on *example/image-classification/train_cifar10.py*, which is included in the MXNet package on the Deep Learning AMI. The code changes made for S3 support are listed below.

* In train_cifar10.py, the *download_cifar10()* function is modified to use the training data in an Amazon S3 bucket instead of downloading it.

```python
def download_cifar10():
    #data_dir="data"
    data_dir="s3://<S3 bucket name>/deeplearning"
    fnames = (os.path.join(data_dir, "cifar10_train.rec"),
              os.path.join(data_dir, "cifar10_val.rec"))
    #download_file('http://data.mxnet.io/data/cifar10/cifar10_val.rec', fnames[1])
    #download_file('http://data.mxnet.io/data/cifar10/cifar10_train.rec', fnames[0])
    return fnames
```

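* In common/fit.py, three lines that create a local directory for the checkpoint are commented out, because they only work with a local file system, not with an S3 path.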

```python
def _save_model(args, rank=0):
    if args.model_prefix is None:
        return None
    #dst_dir = os.path.dirname(args.model_prefix)
    #if not os.path.isdir(dst_dir):
    #    os.mkdir(dst_dir)
    return mx.callback.do_checkpoint(args.model_prefix if rank == 0 else "%s-%d" % (
        args.model_prefix, rank))
```

# What Can We Do with Images?

* Classification
* Localization
* Segmentation
* Scene classification
* [Scene parsing](http://sceneparsing.csail.mit.edu/) : to segment and parse an image into different image regions associated with semantic categories, such as sky, road, person, and bed

<img src='./computer-vision-tasks.png' width=600>

To learn more about deep learning on images, see this lecture: [CS231n: Convolutional Neural Networks for Visual Recognition](http://cs231n.stanford.edu/slides/2016/winter1516_lecture8.pdf)

# Quick Look at Image Data Sets

### CIFAR-10 (https://www.cs.toronto.edu/~kriz/cifar.html)

CIFAR-10 and CIFAR-100 are labeled subsets of the 80 million tiny images dataset. They were collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. 

The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.

The dataset is divided into five training batches and one test batch, each with 10000 images. The test batch contains exactly 1000 randomly-selected images from each class. The training batches contain the remaining images in random order, but some training batches may contain more images from one class than another. Between them, the training batches contain exactly 5000 images from each class.
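
The batch layout above describes the "CIFAR-10 python version" archive on the CIFAR-10 page (files `data_batch_1` to `data_batch_5` and `test_batch`). It does not describe the .rec files this lab reads from S3. The sketch below loads one batch, reads a pixel, and counts images per class. It assumes the archive's documented format: each batch file is a pickled dict whose `data` entry holds 10000 rows of 3072 values, ordered as 1024 red, then 1024 green, then 1024 blue, each plane row-major, and whose `labels` entry holds class indices 0-9. The function names are hypothetical.

```python
import pickle
from collections import Counter

IMAGE_SIZE = 32                     # CIFAR-10 images are 32x32 pixels
PLANE = IMAGE_SIZE * IMAGE_SIZE     # 1024 values per colour channel


def load_cifar10_batch(path):
    """Load one batch file from the CIFAR-10 "python version" archive.

    The files were pickled under Python 2, so encoding="bytes" is used and
    the dictionary keys come back as bytes (b"data", b"labels").
    """
    with open(path, "rb") as f:
        batch = pickle.load(f, encoding="bytes")
    return batch[b"data"], batch[b"labels"]


def pixel_rgb(row, i, j):
    """Return the (R, G, B) values of pixel (i, j) from one 3072-value row.

    The row stores the whole red plane first, then green, then blue,
    each plane in row-major order.
    """
    k = i * IMAGE_SIZE + j
    return row[k], row[PLANE + k], row[2 * PLANE + k]


def images_per_class(labels):
    """Count how many images of each class index (0-9) a batch contains."""
    return Counter(labels)
```

For example, `data, labels = load_cifar10_batch("cifar-10-batches-py/test_batch")` followed by `images_per_class(labels)` should report 1000 images for every class. A single training batch may be uneven, as described above.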

Here are the classes in the dataset, as well as 10 random images from each:

<img src="cifar10-classes.png" width="400">

### ImageNet (http://www.image-net.org/) and ImageNet Large Scale Visual Recognition Challenge (ILSVRC)

ImageNet is an image database organized according to the [WordNet](https://wordnet.princeton.edu/) hierarchy (currently only the nouns). WordNet is a large lexical database of English. Each node of the hierarchy is depicted by hundreds to thousands of images, and there is currently an average of over five hundred images per node.

* 14,197,122 images, 21841 synsets indexed (as of July 5)
* For example, you can search ImageNet for a synset (synonym set) such as car: http://www.image-net.org/search?q=car

#### ILSVRC Winners by Year

* 2012 : AlexNet (https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf)
* 2013 : ZFNet (https://arxiv.org/pdf/1311.2901v3.pdf)
* 2014 : GoogLeNet (http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Szegedy_Going_Deeper_With_2015_CVPR_paper.pdf)
* 2015 : Residual Neural Network (ResNet) (https://arxiv.org/pdf/1512.03385v1.pdf)
* 2016 : CUImage

> Want to learn more about DL? [The 9 Deep Learning Papers You Need To Know About](https://adeshpande3.github.io/adeshpande3.github.io/The-9-Deep-Learning-Papers-You-Need-To-Know-About.html)

#### ILSVRC 2016
 * Object localization for 1000 categories.
 * Object detection for 200 fully labeled categories.
 * Object detection from video for 30 fully labeled categories.
 * Scene classification for 365 scene categories 
 * Scene parsing for 150 stuff and discrete object categories 

### Mapillary Vistas Dataset and Large-Scale Scene Understanding (LSUN) Challenge

> [**AWS AI Blog**](https://aws.amazon.com/blogs/ai/) AWS Partners with Mapillary to Support the Large-Scale Scene Understanding Challenge at CVPR 2017
>
> https://aws.amazon.com/blogs/ai/aws-partners-with-mapillary-to-support-the-large-scale-scene-understanding-challenge-at-cvpr-2017/#more-1096

#### Mapillary (https://www.mapillary.com/dataset/vistas?lat=20&lng=0&z=1.5)

* the world’s largest and most diverse publicly available, pixel-accurately and instance-specifically annotated <font color='blue'>street-level imagery dataset</font> for empowering autonomous mobility and transport at the global scale
* 25,000 Images | 100 Categories | 60 Instance-wise Categories | 6 Continents | Variety of Weather, Season, Time of Day, Camera, and Viewpoint

<img src='https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2017/06/28/mapillary.gif' width=400>

#### LSUN Challenge (http://lsun.cs.princeton.edu/2017/)

* Focuses on major tasks in scene understanding:
  * scene classification
  * scene segmentation
  * saliency prediction
  * RGB-D detection

### COCO, Common Objects in Context (http://mscoco.org/)

A new image recognition, segmentation, and captioning dataset.

COCO 2016 Detection and Keypoint Challenges

<img src='http://mscoco.org/static/images/dataset.jpg'>

## ResNet (Residual Net) [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385)

* Residual nets with a depth of up to 152 layers, 8x deeper than VGG nets but still having lower complexity
* An ensemble of these residual nets achieves 3.57% error on the ImageNet test set
* 1st place in the ILSVRC (ImageNet Large Scale Visual Recognition Challenge) 2015 classification task

<img src="new-resnet-arch.png">
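
The core idea of the paper, in its own notation, is that a block of layers does not learn a mapping directly. Each building block instead learns a residual function $\mathcal{F}$ and adds its input back through an identity shortcut. The paper's two-layer example is shown below, where $\sigma$ denotes ReLU and biases are omitted:

$$
\mathbf{y} = \mathcal{F}(\mathbf{x}, \{W_i\}) + \mathbf{x},
\qquad
\mathcal{F} = W_2\,\sigma(W_1 \mathbf{x})
$$

When the dimensions of $\mathbf{x}$ and $\mathcal{F}$ differ, the paper uses a linear projection $W_s$ on the shortcut instead: $\mathbf{y} = \mathcal{F}(\mathbf{x}, \{W_i\}) + W_s \mathbf{x}$.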

## Overall Steps of Lab 3-1

<img src='./lab3-1-figure-1.png' width=400>

These steps assume you have cloned the CodeCommit repository to your EC2 instance (Deep Learning AMI).

**Step 1** Create an Amazon S3 bucket for the training data and model checkpoints

**Step 2** Upload data/cifar10_train.rec and data/cifar10_val.rec to the S3 bucket (use *deeplearning/* as the S3 prefix)

```
$ cd lab3/mxnet-resnet_cifar10/data
$ aws s3 cp cifar10_train.rec s3://<bucket-name>/deeplearning/cifar10_train.rec
$ aws s3 cp cifar10_val.rec s3://<bucket-name>/deeplearning/cifar10_val.rec
```
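
If you prefer to do Step 2 from Python, the sketch below uses the boto3 S3 client method `upload_file(Filename, Bucket, Key)`. It assumes boto3 is installed and your credentials are configured. The helper name `upload_cifar10_rec` is hypothetical. The S3 client is passed in as an argument so that the function can be exercised without AWS.

```python
import os


def upload_cifar10_rec(s3_client, bucket, data_dir="data", prefix="deeplearning"):
    """Upload cifar10_train.rec and cifar10_val.rec to s3://<bucket>/<prefix>/.

    s3_client is expected to provide boto3's S3 client method
    upload_file(Filename, Bucket, Key), e.g. boto3.client("s3").
    Returns the list of S3 URIs written.
    """
    uris = []
    for name in ("cifar10_train.rec", "cifar10_val.rec"):
        local_path = os.path.join(data_dir, name)
        key = prefix.rstrip("/") + "/" + name   # S3 keys always use "/"
        s3_client.upload_file(Filename=local_path, Bucket=bucket, Key=key)
        uris.append("s3://%s/%s" % (bucket, key))
    return uris
```

For example, run `upload_cifar10_rec(boto3.client("s3"), "<bucket-name>", data_dir="lab3/mxnet-resnet_cifar10/data")` from the repository root after `import boto3`.
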
**Step 3** Configure AWS credentials for Amazon S3 access

> **NOTE** MXNet does not support EC2 instance roles for S3 access. Define your AWS credentials and region as environment variables.

```
$ export AWS_ACCESS_KEY_ID=
$ export AWS_SECRET_ACCESS_KEY=
$ export AWS_REGION=
```
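
Before you launch training, you can confirm that these three variables are actually set in the environment that starts Python. This can save you a failed run. The minimal sketch below only checks that each variable is present and non-empty, not that the credentials are valid. The function name is hypothetical.

```python
import os

REQUIRED_AWS_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION")


def missing_aws_env(environ=None):
    """Return the names of required AWS variables that are unset or empty.

    Defaults to checking os.environ; a plain dict can be passed for testing.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_AWS_VARS if not env.get(name)]
```

For example, calling `missing_aws_env()` at the top of your notebook should return an empty list. Any names it returns still need to be exported.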

**Step 4** Set the S3 bucket name in train_cifar10.py.

```python
def download_cifar10():
    #data_dir="data"
    data_dir="s3://<YOUR S3 BUCKET NAME HERE>/deeplearning"
    fnames = (os.path.join(data_dir, "cifar10_train.rec"),
              os.path.join(data_dir, "cifar10_val.rec"))
    #download_file('http://data.mxnet.io/data/cifar10/cifar10_val.rec', fnames[1])
    #download_file('http://data.mxnet.io/data/cifar10/cifar10_train.rec', fnames[0])
    return fnames
```
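
If you would rather not edit the string by hand, one possible variant is to pass the bucket name in and build the two URIs from it. This is not how the lab's train_cifar10.py works. It is a sketch that assumes you would call it from your own copy of `download_cifar10()`, and the function name is hypothetical. It uses `posixpath.join` instead of `os.path.join` so the URI always uses `/` separators. It also rejects the unreplaced placeholder.

```python
import posixpath


def cifar10_s3_paths(bucket, prefix="deeplearning"):
    """Return (train_uri, val_uri) for the CIFAR-10 .rec files in S3."""
    # Catch an empty name or a placeholder such as <YOUR S3 BUCKET NAME HERE>
    if not bucket or "<" in bucket or ">" in bucket:
        raise ValueError("replace the bucket-name placeholder with a real bucket name")
    data_dir = "s3://" + posixpath.join(bucket, prefix)
    return (posixpath.join(data_dir, "cifar10_train.rec"),
            posixpath.join(data_dir, "cifar10_val.rec"))
```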

**Step 5** Run train_cifar10.py with the command below.

This loads the CIFAR-10 training and validation datasets from the S3 bucket (uploaded in Step 2) and trains a ResNet with 8 layers (--num-layers 8). At the end of each epoch, a model checkpoint is saved to the S3 location given by --model-prefix.

```
$ python ./train_cifar10.py \
    --model-prefix s3://<bucket-name>/deeplearning/trained-model/mxnet-resnet-cifar10 \
    --num-layers 8
```

### [Optional Task] Upload the trained model to S3 (if your MXNet build does not support S3)

For Lab 4, upload the trained model files (symbol and parameters) to the S3 bucket. If you passed an S3 path to --model-prefix, the files are already there, and you only need to check that they exist in the bucket.

```
$ aws s3 sync ./trained-model s3://<bucket-name>/deeplearning/trained-model
```

### [Optional Task] Change parameters and observe how training changes

* num_layers (--num-layers) : the number of layers in the ResNet
* lr (--lr) : the learning rate
* batch_size (--batch-size) : the mini-batch size
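
Not every `num_layers` value works. The ResNet paper's CIFAR-10 networks have 6n+2 weighted layers: a first convolution, three stages of n two-layer residual blocks, and a final fully connected layer. So 8 means n = 1, and the script's default of 110 means n = 18. The MXNet example symbols/resnet.py appears to apply the same rule below 164 layers, with a 9n+2 bottleneck rule from 164 layers up, and to reject other depths. Check your copy to confirm. The sketch below mirrors that assumed rule, and the function name is hypothetical.

```python
def cifar_resnet_blocks_per_stage(num_layers):
    """Return (blocks per stage, uses bottleneck blocks) for a CIFAR-style ResNet.

    Assumed rule: 6n+2 layers with basic blocks below 164 layers,
    9n+2 layers with bottleneck blocks from 164 layers up.
    """
    if num_layers >= 164 and (num_layers - 2) % 9 == 0:
        return (num_layers - 2) // 9, True
    if num_layers < 164 and (num_layers - 2) % 6 == 0:
        return (num_layers - 2) // 6, False
    raise ValueError("unsupported num_layers %d; try 6n+2 such as 8, 20, 32, 56, 110" % num_layers)


for depth in (8, 20, 110):
    print(depth, cifar_resnet_blocks_per_stage(depth))
```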

### [Optional Task] Use different EC2 instance sizes (c4.large, c4.xlarge, c4.2xlarge, c4.4xlarge, c4.8xlarge, and p2.xlarge)

Instance size | Speed (samples/sec) | Time cost per epoch (sec) | Train accuracy | Validation accuracy
:---:| :---: | :---: | :---: | :---:
c4.large | 000 | 000 | 0.00 | 0.00
c4.xlarge | 000 | 000 | 0.00 | 0.00
c4.2xlarge | 000 | 000 | 0.00 | 0.00
c4.4xlarge | 000 | 000 | 0.00 | 0.00
c4.8xlarge | 000 | 000 | 0.00 | 0.00
p2.xlarge | 000 | 000 | 0.00 | 0.00
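
To fill in the table, you can extract the numbers from the training log instead of reading them by eye. The sketch below assumes the log line formats shown in the sample output under "4. Run training": `Speed: ... samples/sec`, `Train-accuracy=`, `Time cost=`, and `Validation-accuracy=`. It reports the average speed over all logged batches of each epoch. The function name is hypothetical.

```python
import re

SPEED_RE = re.compile(r"Epoch\[(\d+)\] Batch \[\d+\]\s+Speed: ([\d.]+) samples/sec")
METRIC_RE = re.compile(r"Epoch\[(\d+)\] (Train-accuracy|Validation-accuracy|Time cost)=([\d.]+)")
METRIC_KEYS = {"Train-accuracy": "train_acc",
               "Validation-accuracy": "val_acc",
               "Time cost": "time_cost"}


def summarize_log(lines):
    """Return {epoch: {"speed", "time_cost", "train_acc", "val_acc"}} from log lines."""
    speeds = {}
    summary = {}
    for line in lines:
        m = SPEED_RE.search(line)
        if m:
            speeds.setdefault(int(m.group(1)), []).append(float(m.group(2)))
            continue
        m = METRIC_RE.search(line)
        if m:
            summary.setdefault(int(m.group(1)), {})[METRIC_KEYS[m.group(2)]] = float(m.group(3))
    # average the per-window speeds logged during each epoch
    for epoch, values in speeds.items():
        summary.setdefault(epoch, {})["speed"] = sum(values) / len(values)
    return summary
```

Python's `logging` writes to stderr by default, so one way to capture the log is to append `2>&1 | tee train.log` to the training command. You can then run `summarize_log(open("train.log"))`.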





## Source Code Explained

### 1. Import the necessary modules, including MXNet

The find_mxnet, data, and fit modules are located in the common directory, and download_file is in common/util.py. The training loop is in common/fit.py. If you are curious about how ResNet itself is implemented, take a look at symbols/resnet.py.

```python
import os
import argparse
import logging
logging.basicConfig(level=logging.DEBUG)
from common import find_mxnet, data, fit
from common.util import download_file
import mxnet as mx
```

### 2. Define the location of the training dataset

The original code downloads the training dataset from data.mxnet.io and stores it in the local file system. In this example, it is modified to use the training data in an Amazon S3 bucket.

For details, refer to *Use data from S3 for training* (http://mxnet.io/how_to/s3_integration.html).

```python
def download_cifar10():
    data_dir="s3://<bucket-name>/deeplearning"
    fnames = (os.path.join(data_dir, "cifar10_train.rec"),
              os.path.join(data_dir, "cifar10_val.rec"))
    return fnames
```

### 2. Prepare the training dataset, parse arguments for training

This ResNet implementation can be tuned through parameters such as:

* network : the name of the neural network to train. The default in this script is resnet.
* num_layers : the number of layers in the network. Keep it small for this lab: more layers can give higher accuracy, but training takes much longer.
* num_classes : the number of classification targets (CIFAR-10 has 10 classes, while CIFAR-100 has 100)
* image_shape : the input image shape as (number of channels, height in pixels, width in pixels). For CIFAR-10, this script uses (3,28,28).
* batch_size : the size of a mini-batch
* num_epochs : the number of passes over the entire training dataset
* lr : learning rate

> **NOTE** Batch, Stochastic, and Mini-Batch Gradient Descent
>
> * Batch Gradient Descent : use all n examples in each iteration for a parameter update
> * Stochastic Gradient Descent : use 1 example in each iteration for a parameter update
> * Mini-Batch Gradient Descent : use b examples in each iteration for a parameter update
>
> https://visualstudiomagazine.com/articles/2014/08/01/batch-training.aspx
>
> https://www.quora.com/What-are-the-meanings-of-batch-size-mini-batch-iterations-and-epoch-in-neural-networks
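
The three gradient descent variants differ only in how many examples feed each parameter update. The sketch below shows this in pure Python on a toy 1-D linear regression, fitting y ≈ w·x by mean squared error. The data and function names are made up for illustration and are not part of the lab code. Setting `batch_size = len(xs)` gives batch gradient descent, `batch_size = 1` gives SGD, and anything in between is mini-batch. `batches_per_epoch` shows that 50000 training images with a batch size of 128 give about 391 updates per epoch, rounding up for the final partial batch. This is consistent with the sample training log, which reports every 20 batches and stops at Batch [380].

```python
import math
import random


def batches_per_epoch(num_examples, batch_size):
    """Number of parameter updates in one epoch (the last batch may be partial)."""
    return math.ceil(num_examples / batch_size)


def minibatch_gd(xs, ys, batch_size, lr=0.05, num_epochs=200, seed=0):
    """Fit y ~ w * x by minimizing mean squared error with mini-batch gradient descent.

    batch_size == len(xs) -> batch gradient descent
    batch_size == 1       -> stochastic gradient descent
    anything in between   -> mini-batch gradient descent
    """
    rng = random.Random(seed)          # fixed seed so the shuffling is reproducible
    order = list(range(len(xs)))
    w = 0.0
    for _ in range(num_epochs):
        rng.shuffle(order)             # visit the examples in a new order every epoch
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            # d/dw of mean((w*x - y)^2) over the batch is mean(2 * (w*x - y) * x)
            grad = sum(2 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
            w -= lr * grad             # one parameter update per batch
    return w


xs = [i / 10 for i in range(1, 21)]    # toy inputs 0.1 .. 2.0
ys = [3.0 * x for x in xs]             # noiseless targets, true w = 3
for b in (len(xs), 4, 1):
    print("batch_size", b, "updates/epoch", batches_per_epoch(len(xs), b),
          "w", round(minibatch_gd(xs, ys, b), 4))
```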

```python
# locate the training and validation data in S3
(train_fname, val_fname) = download_cifar10()

# parse args
parser = argparse.ArgumentParser(description="train cifar10",
                                 formatter_class=argparse.ArgumentDefaultsHelpFormatter,
                                 conflict_handler='resolve')
fit.add_fit_args(parser)
data.add_data_args(parser)
data.add_data_aug_args(parser)
data.set_data_aug_level(parser, 2)

parser.set_defaults(
    # network
    network        = 'resnet',
    num_layers     = 110,   # ResNet depth; use a small value such as 8 for this lab
    # data
    data_train     = train_fname,
    data_val       = val_fname,
    num_classes    = 10,
    num_examples   = 50000,
    image_shape    = '3,28,28',
    pad_size       = 4,
    # train
    batch_size     = 128,
    num_epochs     = 300,
    lr             = .05,
    lr_step_epochs = '200,250',
)

# parse_known_args() returns (namespace, unrecognized_args); Jupyter passes its
# own kernel arguments, so keep the namespace and ignore the rest
args, _ = parser.parse_known_args()
print(args)
```
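
This section uses `parse_known_args()` instead of `parse_args()` because of how Jupyter starts its kernel. Inside a kernel, `sys.argv` carries the kernel's own arguments, such as `-f <connection-file>.json`, and `parse_args()` would reject them with an error. `parse_known_args()` returns a `(namespace, leftover_arguments)` tuple, so the namespace has to be unpacked before you can use `args.network`. The standalone demonstration below uses a toy parser. Its option names mirror the lab's, but it is not the lab's parser.

```python
import argparse

parser = argparse.ArgumentParser(description="toy parser")
parser.add_argument("--network", type=str)
parser.add_argument("--num-layers", type=int)
parser.set_defaults(network="resnet", num_layers=110)   # defaults, as with set_defaults above

# Simulate the extra arguments a Jupyter kernel passes in sys.argv.
result = parser.parse_known_args(["--num-layers", "8", "-f", "kernel.json"])
args, unknown = result                 # unpack (Namespace, [unrecognized args])

print(type(result).__name__)           # tuple
print(args.network, args.num_layers)   # resnet 8
print(unknown)                         # the kernel arguments argparse did not recognize
```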

### 3. Load the pre-defined ResNet module

ResNet is defined in Python in symbols/resnet.py. The symbols directory contains other neural network definitions as well: *AlexNet, GoogLeNet, Inception, LeNet, VGG, and MLP.*

The trained parameters are saved at the end of each epoch, so that predictions can be made with the trained model. We will do this in Lab 4 using AWS Lambda. The original example saves checkpoints to a local directory, but the workshop version of common/fit.py is modified to save them to an Amazon S3 bucket.

The checkpoint files are:

* **Symbol file (network design and graph)** : prefix-symbol.json
* **Parameters (or weights of the NN)** : prefix-{4-digit epoch count}.params, e.g. mxnet-resnet-cifar10-0001.params after the first epoch
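
These file names follow the MXNet checkpoint convention. The sample training log confirms it for the parameters file, which is saved as `mxnet-resnet-cifar10-0001.params` after the first epoch. The small sketch below lists the files to look for in S3 after a given epoch. The function name is hypothetical.

```python
def checkpoint_files(prefix, epoch):
    """Return (symbol_file, params_file) for a checkpoint.

    `epoch` is the 1-based number used in the file name: the checkpoint saved
    after the first epoch (logged as Epoch[0]) is <prefix>-0001.params.
    """
    return "%s-symbol.json" % prefix, "%s-%04d.params" % (prefix, epoch)


prefix = "s3://<bucket-name>/deeplearning/trained-model/mxnet-resnet-cifar10"
for name in checkpoint_files(prefix, 1):
    print(name)
```

In Lab 4, `mx.model.load_checkpoint(prefix, epoch)` takes the same prefix and epoch number and returns the symbol together with the argument and auxiliary parameters.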


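```python
# load network: symbols/<network>.py defines get_symbol()
from importlib import import_module
net = import_module('symbols.' + args.network)
sym = net.get_symbol(**vars(args))

# save a checkpoint (symbol + parameters) at the end of every epoch
model_prefix = 's3://<bucket-name>/deeplearning/trained-model/mxnet-resnet-cifar10'
checkpoint = mx.callback.do_checkpoint(model_prefix)
```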

### 4. Run training

The training data is split into mini-batches, and training passes over the whole dataset num_epochs times. Each full pass is called an epoch. At the end of every epoch, the model is evaluated on the validation dataset and the validation accuracy is reported.

> **NOTE** Two accuracy values are reported for each epoch: training accuracy and validation accuracy.
>
> * What do you expect from these two?
> * What if training accuracy gets higher while validation accuracy does not?
> * What if training accuracy saturates?
>
> In each case, what would you do?
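
One way to make these questions concrete is to compare the two accuracy curves programmatically. The sketch below is a rough heuristic. Its thresholds (`gap`, `window`, `min_gain`) are arbitrary illustrative values, not recommendations from this lab. It takes the per-epoch Train-accuracy and Validation-accuracy values from the log as two lists.

```python
def diagnose(train_acc, val_acc, gap=0.10, window=5, min_gain=0.005):
    """Flag common patterns in per-epoch accuracy lists (index = epoch).

    - "overfitting?": training accuracy exceeds validation accuracy by more
      than `gap` at the latest epoch.
    - "plateau?": training accuracy improved by less than `min_gain` over the
      last `window` epochs.
    """
    notes = []
    if train_acc and val_acc and train_acc[-1] - val_acc[-1] > gap:
        notes.append("overfitting?")
    if len(train_acc) > window and train_acc[-1] - train_acc[-1 - window] < min_gain:
        notes.append("plateau?")
    return notes
```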

```
$ python ./train_cifar10.py --model-prefix s3://<bucket-name>/deeplearning/trained-model/mxnet-resnet-cifar10 --num-layers 8
<Symbol softmax>
INFO:root:start with arguments Namespace(batch_size=128, benchmark=0, data_nthreads=4, data_train='data/cifar10_train.rec', data_val='data/cifar10_val.rec', disp_batches=20, dtype='float32', gpus=None, image_shape='3,28,28', kv_store='device', load_epoch=None, lr=0.05, lr_factor=0.1, lr_step_epochs='200,250', max_random_aspect_ratio=0, max_random_h=36, max_random_l=50, max_random_rotate_angle=0, max_random_s=50, max_random_scale=1, max_random_shear_ratio=0, min_random_scale=1, model_prefix='./trained-model/mxnet-resnet-cifar10', mom=0.9, monitor=0, network='resnet', num_classes=10, num_epochs=300, num_examples=50000, num_layers=8, optimizer='sgd', pad_size=4, random_crop=1, random_mirror=1, rgb_mean='123.68,116.779,103.939', test_io=0, top_k=0, wd=0.0001)
[14:37:14] src/io/iter_image_recordio_2.cc:135: ImageRecordIOParser2: data/cifar10_train.rec, use 1 threads for decoding..
[14:37:15] src/io/iter_image_recordio_2.cc:135: ImageRecordIOParser2: data/cifar10_val.rec, use 1 threads for decoding..
INFO:root:Epoch[0] Batch [20]	Speed: 148.28 samples/sec	accuracy=0.171875
INFO:root:Epoch[0] Batch [40]	Speed: 180.31 samples/sec	accuracy=0.225781
INFO:root:Epoch[0] Batch [60]	Speed: 208.81 samples/sec	accuracy=0.255469
INFO:root:Epoch[0] Batch [80]	Speed: 120.90 samples/sec	accuracy=0.267969
INFO:root:Epoch[0] Batch [100]	Speed: 168.35 samples/sec	accuracy=0.302734
INFO:root:Epoch[0] Batch [120]	Speed: 169.01 samples/sec	accuracy=0.311719
INFO:root:Epoch[0] Batch [140]	Speed: 229.92 samples/sec	accuracy=0.309375
INFO:root:Epoch[0] Batch [160]	Speed: 253.50 samples/sec	accuracy=0.330078
INFO:root:Epoch[0] Batch [180]	Speed: 182.88 samples/sec	accuracy=0.331641
INFO:root:Epoch[0] Batch [200]	Speed: 186.12 samples/sec	accuracy=0.348828
INFO:root:Epoch[0] Batch [220]	Speed: 134.41 samples/sec	accuracy=0.368750
INFO:root:Epoch[0] Batch [240]	Speed: 165.69 samples/sec	accuracy=0.365625
INFO:root:Epoch[0] Batch [260]	Speed: 209.33 samples/sec	accuracy=0.387500
INFO:root:Epoch[0] Batch [280]	Speed: 188.32 samples/sec	accuracy=0.381250
INFO:root:Epoch[0] Batch [300]	Speed: 192.40 samples/sec	accuracy=0.420703
INFO:root:Epoch[0] Batch [320]	Speed: 159.61 samples/sec	accuracy=0.417578
INFO:root:Epoch[0] Batch [340]	Speed: 127.66 samples/sec	accuracy=0.431250
INFO:root:Epoch[0] Batch [360]	Speed: 178.55 samples/sec	accuracy=0.446484
INFO:root:Epoch[0] Batch [380]	Speed: 205.05 samples/sec	accuracy=0.443359
INFO:root:Epoch[0] Train-accuracy=0.433594
INFO:root:Epoch[0] Time cost=288.064
INFO:root:Saved checkpoint to "./trained-model/mxnet-resnet-cifar10-0001.params"
INFO:root:Epoch[0] Validation-accuracy=0.493275
```

```python
# train
fit.fit(args, sym, data.get_rec_iter, epoch_end_callback=checkpoint)
```