# Computer Vision: Image Classification

The first step in training an image recognition model is finding images that belong to the desired class (or classes).

We will use the **ImageNet** dataset [1] for this task. It is the dataset of the annual ImageNet Large Scale Visual Recognition Challenge (ILSVRC) [7], a popular challenge that has driven many important innovations. At the time of writing, ImageNet has 14,197,122 images indexed in 21,841 synsets (see [ImageNet](http://www.image-net.org/)).

ImageNet aims to provide on average 1,000 images to illustrate each of the more than 100,000 synsets in WordNet, most of which (80,000+) are nouns. The **synsets** (synonym sets) come from WordNet, a large lexical database of English in which nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms, each expressing a distinct concept.

More information about ImageNet can be found here: http://www.image-net.org/about-overview

More information about WordNet can be found here: https://wordnet.princeton.edu/

These classified images can be obtained using Python with the following steps:


### ImageNet

This dataset is publicly available with the goal of promoting the development of computer vision methods. The subset used in the ILSVRC classification task includes about 1.2 million training images and 1,000 object classes.
Typical tasks in the challenge include:

- Image classification: predict the classes of the objects present in the image.
- Object localization: image classification plus a bounding box around one instance of each object class present.
- Object detection: image classification plus a bounding box around every object present.
- Object detection in videos.
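Results on these tasks are usually reported as top-5 error for classification, and a localization is typically counted as correct when the predicted box overlaps the ground-truth box with an intersection over union (IoU) of at least 0.5 (see [7] for the exact evaluation protocol). The following is a minimal NumPy sketch of both quantities. The function names `top_k_error` and `iou` and the `(x_min, y_min, x_max, y_max)` box convention are illustrative assumptions, not part of any official toolkit:

```python
import numpy as np


def top_k_error(scores, labels, k=5):
    """Fraction of samples whose true label is NOT among the k highest scores.

    scores: float array of shape (n_samples, n_classes)
    labels: int array of shape (n_samples,)
    """
    # Column indices of the k largest scores in each row (unordered).
    top_k = np.argpartition(scores, -k, axis=1)[:, -k:]
    hits = (top_k == labels[:, None]).any(axis=1)
    return 1.0 - hits.mean()


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


# Two samples, six classes: the first sample's true label (3) has the lowest score.
scores = np.array([[0.10, 0.50, 0.20, 0.01, 0.15, 0.04],
                   [0.90, 0.03, 0.02, 0.025, 0.015, 0.01]])
labels = np.array([3, 0])
print(top_k_error(scores, labels, k=5))  # only the second sample is a hit
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))   # overlap area 1 / union area 7
```

Degenerate boxes with zero area would make the union zero, so a production implementation should guard against that case.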

During the first five years the pace of improvement was dramatic, with great success from CNNs, and the resulting papers [2]-[6] have become must-reads.

<img src="./fig/ILSVRC_improvements.png" alt="ILSVRC improvements over the years" width="500"/>


Some of the techniques used to improve prediction accuracy, as described in the AlexNet paper [2], include:

#### Basics:
* CNNs: Convolutional Neural Networks
* ReLUs: neurons use Rectified Linear Units, f(x) = max(0, x), as their nonlinearity, because they train faster than saturating units such as tanh and do not require input normalization to prevent saturation. Local response normalization still aids generalization and reduces the final error.
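The two ingredients above can be sketched in NumPy as follows. The normalization implements the cross-channel local response normalization described in [2], using that paper's reported hyperparameters (k = 2, n = 5, α = 10⁻⁴, β = 0.75) as defaults. The `(channels, height, width)` layout and the function names are assumptions made for illustration. Framework implementations may scale α differently, so check their documentation before comparing numbers.

```python
import numpy as np


def relu(x):
    """Rectified linear unit: f(x) = max(0, x), applied elementwise."""
    return np.maximum(0.0, x)


def local_response_norm(a, n=5, k=2.0, alpha=1e-4, beta=0.75):
    """Cross-channel local response normalization.

    a: activations of shape (channels, height, width).
    Each activation is divided by (k + alpha * S) ** beta, where S is the sum of
    squared activations over up to n adjacent channels at the same position.
    """
    a = np.asarray(a, dtype=float)
    num_channels = a.shape[0]
    squared = a ** 2
    out = np.empty_like(a)
    for i in range(num_channels):
        lo = max(0, i - n // 2)
        hi = min(num_channels - 1, i + n // 2)
        denom = (k + alpha * squared[lo:hi + 1].sum(axis=0)) ** beta
        out[i] = a[i] / denom
    return out


rng = np.random.default_rng(0)
pre_activations = rng.normal(size=(8, 4, 4))  # 8 channels of 4x4 feature maps
normalized = local_response_norm(relu(pre_activations))
print(normalized.shape)  # (8, 4, 4)
```

Because ReLU does not saturate for positive inputs, the normalization here acts as a form of competition between neighboring channels rather than as protection against saturation.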

#### Reducing Overfitting:
* Data Augmentation:
    - Applying label-preserving transformations (translations, reflections, rotations, zooming...). In [2], taking random 224×224 patches of the 256×256 images together with their horizontal reflections enlarges the training set by a factor of 2048.
    - Altering the intensities of the RGB channels: PCA is performed on the set of RGB pixel values, and multiples of the principal components found are added to each image, with magnitudes proportional to the corresponding eigenvalues times a random variable drawn from a Gaussian with mean 0 and standard deviation 0.1.
* Dropout: Combining the predictions of many models is a very effective way to reduce errors, but it is too expensive for big neural networks that take days to train. Dropout sets the output of each hidden neuron to zero with probability 0.5, so the dropped-out neurons contribute neither to the forward pass nor to back-propagation. Every time an input is presented, the network therefore samples a different architecture, but all these architectures share weights. This reduces co-adaptations of neurons, since a neuron cannot rely on the presence of particular other neurons, and forces the network to learn more robust features. At test time all neurons are used, with their outputs multiplied by 0.5.
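The following minimal NumPy sketch shows the three regularizers described above, assuming images are stored as float arrays of shape `(height, width, 3)`. The crop-and-reflect function draws one of the translated and mirrored patches. With 256×256 inputs and 224×224 patches, the count used in [2] is (256 − 224)² × 2 = 32 × 32 × 2 = 2048. The PCA color shift follows the description in [2], where the principal components are computed once over the RGB pixels of the whole training set; here they are computed from whatever array is passed in. The dropout function scales outputs by 1 − p at test time, as in [2], whereas many modern frameworks use "inverted" dropout, which rescales by 1/(1 − p) during training instead. All function names are hypothetical.

```python
import numpy as np


def random_crop_flip(img, size=224, rng=None):
    """Random size x size patch of an (H, W, 3) image, mirrored with probability 0.5."""
    if rng is None:
        rng = np.random.default_rng()
    h, w = img.shape[:2]
    top = rng.integers(0, h - size + 1)   # random vertical offset
    left = rng.integers(0, w - size + 1)  # random horizontal offset
    patch = img[top:top + size, left:left + size]
    if rng.random() < 0.5:
        patch = patch[:, ::-1]            # horizontal reflection
    return patch


def fit_rgb_pca(images):
    """Eigenvalues and eigenvectors (as columns) of the 3x3 RGB covariance of all pixels."""
    pixels = images.reshape(-1, 3)
    return np.linalg.eigh(np.cov(pixels, rowvar=False))


def pca_color_shift(img, eigvals, eigvecs, sigma=0.1, rng=None):
    """Add sum_i alpha_i * lambda_i * p_i to every pixel, with alpha_i ~ N(0, sigma^2)."""
    if rng is None:
        rng = np.random.default_rng()
    alphas = rng.normal(0.0, sigma, size=3)  # one draw per image
    shift = eigvecs @ (alphas * eigvals)     # RGB offset, shape (3,)
    return img + shift                       # broadcasts over (H, W, 3)


def dropout(h, p=0.5, train=True, rng=None):
    """Zero each unit with probability p during training; scale by (1 - p) at test time."""
    if not train:
        return h * (1.0 - p)
    if rng is None:
        rng = np.random.default_rng()
    keep = rng.random(h.shape) >= p
    return h * keep


rng = np.random.default_rng(0)
batch = rng.random((4, 256, 256, 3))         # stand-in for training images
eigvals, eigvecs = fit_rgb_pca(batch)
patch = random_crop_flip(batch[0], rng=rng)
augmented = pca_color_shift(patch, eigvals, eigvecs, rng=rng)
hidden = dropout(np.ones(10), rng=rng)
print(patch.shape, augmented.shape, hidden)
```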

#### Details:
- Training used stochastic gradient descent with batches of 128 examples, momentum of 0.9 and weight decay of 0.0005. The weight decay was not merely a regularizer: it also reduced the model's training error.
- Weights were initialized from a zero-mean Gaussian distribution with standard deviation 0.01.
- The learning rate was initialized at 0.01 and divided by 10 whenever the validation error stopped improving; it was reduced three times prior to termination.
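The following minimal NumPy sketch shows one such update and the learning-rate schedule, with the hyperparameters listed above as defaults. The update follows the momentum-plus-weight-decay form written in [2]. The plateau test in `maybe_decay` is a hypothetical stand-in, because [2] describes the decision as a heuristic rather than as a fixed rule.

```python
import numpy as np


def sgd_momentum_step(w, v, grad, lr, momentum=0.9, weight_decay=5e-4):
    """One SGD step with momentum and weight decay:

        v <- momentum * v - weight_decay * lr * w - lr * grad
        w <- w + v

    grad is the loss gradient averaged over a mini-batch (128 examples in [2]).
    Returns the updated (w, v) without modifying the inputs.
    """
    v = momentum * v - weight_decay * lr * w - lr * grad
    return w + v, v


def maybe_decay(lr, val_errors, factor=10.0):
    """Hypothetical plateau rule: divide lr by `factor` if the latest validation
    error is no better than the best error seen before it."""
    if len(val_errors) >= 2 and val_errors[-1] >= min(val_errors[:-1]):
        return lr / factor
    return lr


rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.01, size=10)  # zero-mean Gaussian init, std 0.01
v = np.zeros_like(w)
grad = np.ones_like(w)              # placeholder gradient for illustration
w_new, v_new = sgd_momentum_step(w, v, grad, lr=0.01)

lr = 0.01
lr = maybe_decay(lr, [0.42, 0.40, 0.41])  # error went up, so lr is divided by 10
```

Applying the decay three times, as described above, takes the learning rate from 0.01 down to 0.00001.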

```python
%matplotlib inline
%load_ext watermark
%watermark -v -m -p numpy,pandas,sklearn,cv2 -g

import os
import sys
import pickle
import argparse
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import watermark
```

```text
CPython 3.7.3
IPython 7.8.0

numpy 1.17.2
pandas 0.25.1
sklearn 0.21.3
cv2 3.4.7

compiler   : Clang 4.0.1 (tags/RELEASE_401/final)
system     : Darwin
release    : 19.0.0
machine    : x86_64
processor  : i386
CPU cores  : 16
interpreter: 64bit
Git hash   : ba32e6742de096f4d2f8f37f15230974390412b7
```


[1] [ImageNet: A Large-Scale Hierarchical Image Database](https://ieeexplore.ieee.org/document/5206848), 2009.

[2] [ImageNet Classification with Deep Convolutional Neural Networks](https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks), 2012.

[3] [Visualizing and Understanding Convolutional Networks](https://arxiv.org/abs/1311.2901), 2013.

[4] [Going Deeper with Convolutions](https://arxiv.org/abs/1409.4842), 2014.

[5] [Very Deep Convolutional Networks for Large-Scale Image Recognition](https://arxiv.org/abs/1409.1556), 2015.

[6] [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385), 2015.

[7] [ImageNet Large Scale Visual Recognition Challenge](https://link.springer.com/article/10.1007/s11263-015-0816-y), 2015.

[8] [Image Classification transfer learning with Inception v3](https://codelabs.developers.google.com/codelabs/cpb102-txf-learning/index.html#0)

[9] [Advanced Guide to Inception v3 on Cloud TPU](https://cloud.google.com/tpu/docs/inception-v3-advanced)