# Image Classification

## Intro to Image Classification

**Responsible Person: Akshi**

**Estimated Duration: 15 mins**

### What is Image Classification?

Image classification is the process of determining what is shown in an image.

[Silicon Valley - Hot Dog Not Hot Dog](https://www.youtube.com/watch?v=ACmydtFDTGs)

We can use deep learning to do this for us. When classifying images using deep learning, we use a convolutional neural network (CNN). CNNs are specifically designed to process images. For this session, we will steer clear of the theory behind CNNs and focus on the practical side.

![https://www.google.ca/url?sa=i&rct=j&q=&esrc=s&source=images&cd=&cad=rja&uact=8&ved=2ahUKEwi5gPbtgofdAhUVIDQIHaWvCvgQjRx6BAgBEAU&url=https%3A%2F%2Fgithub.com%2Ftavgreen%2Flanduse_classification&psig=AOvVaw0a9VXQ-t1Hm1QcJnpfawaa&ust=1535245687702429](../additional/img/number_3_cnn.png)

## How do CNNs learn to classify?

First we need to decide what we want to teach our model.

Do we want our model to correctly identify:

* Cats and dogs?

* Different types of cats?

* Different types of dogs?

* Different types of flowers?

* Everything?

CNNs work in a way that is loosely similar to the human brain: their design was inspired by the visual cortex. When we, as humans, are exposed to something new, it takes time for us to learn what it is.

### Can you identify this berry?

![](../additional/img/Wild_red_baneberry_1.jpg)

If our brain hasn't been exposed to something, classification becomes a guessing game. This applies to deep learning as well.

We need to teach our model what different berries look like.

In general, we need to teach our model the differences between the classes.

After training, when the model is faced with a new image that it hasn't seen before, it needs to decide for itself what is most likely shown in the image.

![](../additional/img/cat_dog.png)

![](../additional/img/cat_dog_2.jpg)

## Model Training / Retraining

**Responsible Person: Xinbin**

**Estimated Duration: 15 mins**

### Pretrained architecture & Transfer Learning

**Architecture used**: [MobileNet](https://ai.googleblog.com/2017/06/mobilenets-open-source-models-for.html) is a small, efficient convolutional neural network designed to accommodate the restricted resources of an on-device or embedded application.

The MobileNet is configurable in two ways:

- Input image resolution: 128, 160, 192, or 224px. Unsurprisingly, feeding in a higher resolution image takes more processing time, but results in better classification accuracy.
- The relative size of the model as a fraction of the largest MobileNet: 1.0, 0.75, 0.50, or 0.25.

We will use an input resolution of 224 and a relative size of 0.5 for the first run.

In [4]:
%%bash
IMAGE_SIZE=224
MODEL_SIZE=0.5
ARCHITECTURE="mobilenet_${MODEL_SIZE}_${IMAGE_SIZE}"

# print the architecture name
echo "$ARCHITECTURE"

mobilenet_0.5_224
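
The same name can be built in Python with a check against the options listed above. This is only a sketch: `architecture_name` is a hypothetical helper, not part of the retrain script, and it does not check which spelling of the size (`0.5` or `0.50`) the script itself expects.

```python
# Values listed in the text above; the name format mirrors the bash cell.
ALLOWED_IMAGE_SIZES = (128, 160, 192, 224)
ALLOWED_MODEL_SIZES = (1.0, 0.75, 0.5, 0.25)


def architecture_name(model_size: str, image_size: int) -> str:
    """Build a name such as 'mobilenet_0.5_224' (hypothetical helper)."""
    if image_size not in ALLOWED_IMAGE_SIZES:
        raise ValueError(f"unsupported image size: {image_size}")
    if float(model_size) not in ALLOWED_MODEL_SIZES:
        raise ValueError(f"unsupported model size: {model_size}")
    # Keep the size exactly as typed so "0.5" and "0.50" stay distinguishable.
    return f"mobilenet_{model_size}_{image_size}"


print(architecture_name("0.5", 224))  # mobilenet_0.5_224
```

Failing early on an unsupported value is cheaper than discovering the typo after the training run has started.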


### Retraining script
The retrain script is from the [TensorFlow Hub repo](https://github.com/tensorflow/hub/blob/master/examples/image_retraining/retrain.py), and we have included it in the workshop repo.

Before running the script, there are a few arguments worth mentioning:

- **bottleneck_dir** : path where bottleneck layer values are cached as files
- **how_many_training_steps** : number of training steps to run before ending
- **model_dir** : path to save the trained model information, e.g. the graph and parameters
- **summaries_dir** : path to save summary logs for TensorBoard
- **output_graph** : path to save the trained graph
- **output_labels** : path to save the trained graph's labels
- **architecture** : the model architecture to use
- **image_dir** : path to the labeled images for training

You can retrieve the whole list of arguments using the following command.

```bash
python -m scripts.retrain -h
```

Let's run the training with the following commands:

```bash
python -m scripts.retrain \
  --bottleneck_dir=tf_files/bottlenecks \
  --how_many_training_steps=500 \
  --model_dir=tf_files/models/ \
  --summaries_dir=tf_files/training_summaries/"${ARCHITECTURE}" \
  --output_graph=tf_files/retrained_graph.pb \
  --output_labels=tf_files/retrained_labels.txt \
  --architecture="${ARCHITECTURE}" \
  --image_dir=tf_files/data/train
```

### How does it work? 
The above script downloads the pre-trained model, adds a new final layer, and trains that layer on the cat/dog photos we provided. It contains two main phases:
1. Calculates and caches the bottleneck values for each image
2. Actual training of the final layer which makes the classification

The technique that makes this training possible is **transfer learning**.
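
To make the second phase concrete, here is a minimal pure-Python sketch of training a single softmax layer on fixed feature vectors. It is not the retrain script's code: the two-value "bottleneck" vectors, the function names, and the learning rate are all made up for illustration, and real bottleneck vectors are much longer.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weights, biases, x):
    """Class probabilities for one feature vector."""
    scores = [sum(w * v for w, v in zip(row, x)) + b
              for row, b in zip(weights, biases)]
    return softmax(scores)


def train_final_layer(features, labels, num_classes, steps=200, learning_rate=0.5):
    """Fit one softmax layer with stochastic gradient descent."""
    dim = len(features[0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    biases = [0.0] * num_classes
    for _ in range(steps):
        for x, y in zip(features, labels):
            probs = predict(weights, biases, x)
            for c in range(num_classes):
                # Gradient of the cross entropy with respect to the score of class c.
                error = probs[c] - (1.0 if c == y else 0.0)
                for j in range(dim):
                    weights[c][j] -= learning_rate * error * x[j]
                biases[c] -= learning_rate * error
    return weights, biases


# Hypothetical cached bottleneck values for four images (0 = cat, 1 = dog).
features = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8]]
labels = [0, 0, 1, 1]
weights, biases = train_final_layer(features, labels, num_classes=2)
cat_probs = predict(weights, biases, [0.95, 0.15])
dog_probs = predict(weights, biases, [0.15, 0.9])
print(cat_probs, dog_probs)
```

Only `weights` and `biases` change during training. The feature vectors are treated as fixed inputs, which is why the pre-trained part of the network never needs to be retrained.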

### Transfer Learning
**Transfer learning** is a machine learning method where a model developed for a related task is reused as the starting point for a new model. It has the following benefits:

- It uses the power of a pre-trained model to extract features from images
- It is faster, because only the new final layer is trained
- It needs less data and fewer resources (Google: 1000x computing power to replace an ML expert)

The image below summarizes the process. (Image retrieved from a talk at [Google Cloud Next '17](https://www.youtube.com/watch?v=EnFyneRScQ8&feature=youtu.be&t=4m17s) by *Yufeng Guo*)
![](../additional/img/retrain.png)

### Bottlenecks 
A **bottleneck** is an informal term we often use for the layer just before the final output layer that actually does the classification (TensorFlow Hub calls this an "image feature vector"). This penultimate layer has been trained to output a set of values that's good enough for the classifier to use to distinguish between all the classes it's been asked to recognize.

Because every image is reused multiple times during training and calculating each bottleneck takes a significant amount of time, it speeds things up to cache these bottleneck values on disk so they don't have to be repeatedly recalculated. The command you ran saves these files to the `tf_files/bottlenecks` directory. If you rerun the script, they'll be reused, so you don't have to wait for this part again.
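
The caching idea can be sketched in a few lines of Python. Everything here is an assumption made for illustration: `fake_bottleneck` stands in for the expensive pass through the pre-trained network, and the JSON file layout is not necessarily the format the retrain script uses.

```python
import json
import os
import tempfile


def fake_bottleneck(image_path):
    """Stand-in for the expensive forward pass (hypothetical, not the real model)."""
    return [float(len(image_path)), float(sum(map(ord, image_path)) % 97)]


def get_or_create_bottleneck(image_path, cache_dir, compute=fake_bottleneck):
    """Return (values, cache_hit); compute and store the values on a miss."""
    os.makedirs(cache_dir, exist_ok=True)
    # Flatten the path so cat/1.jpg and dog/1.jpg get different cache files.
    cache_file = os.path.join(cache_dir, image_path.replace("/", "_") + ".json")
    if os.path.exists(cache_file):
        with open(cache_file) as handle:
            return json.load(handle), True
    values = compute(image_path)
    with open(cache_file, "w") as handle:
        json.dump(values, handle)
    return values, False


cache_dir = tempfile.mkdtemp()
first, first_hit = get_or_create_bottleneck("cat/1.jpg", cache_dir)
second, second_hit = get_or_create_bottleneck("cat/1.jpg", cache_dir)
print(first_hit, second_hit)  # False True
```

The second call reads the stored values instead of recomputing them, which is the same reason a rerun of the training script skips the bottleneck phase.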

### Actual training
You'll see a series of step outputs, each one showing training accuracy, validation accuracy, and the cross entropy.
- **training accuracy** : the percentage of images in the current training batch that were labeled with the correct class.
- **validation accuracy** : the accuracy on a randomly selected group of images that are not used for training.
    - **Overfitting** : the model may fit noise in the training data, so we use **validation accuracy** to measure the true performance. If the training accuracy is high but the validation accuracy remains low, the network is overfitting: it is memorizing the training images rather than learning features that generalize.
- **cross entropy** : a loss function that gives a glimpse into how well the learning process is progressing. It should trend downward as training progresses.
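
As a rough illustration of how the metrics above are computed, the sketch below calculates accuracy and mean cross entropy for a tiny batch. The probabilities and labels are invented for the example and the function names are not taken from the retrain script.

```python
import math


def accuracy(probabilities, labels):
    """Fraction of examples whose highest-probability class is the true label."""
    correct = sum(1 for probs, label in zip(probabilities, labels)
                  if probs.index(max(probs)) == label)
    return correct / len(labels)


def cross_entropy(probabilities, labels):
    """Mean negative log of the probability assigned to the true class."""
    eps = 1e-12  # guards against log(0)
    losses = [-math.log(max(probs[label], eps))
              for probs, label in zip(probabilities, labels)]
    return sum(losses) / len(losses)


# Hypothetical batch of three images with classes 0 = cat, 1 = dog.
batch_probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
batch_labels = [0, 1, 1]
print(accuracy(batch_probs, batch_labels))
print(cross_entropy(batch_probs, batch_labels))
```

The third image is misclassified, so the accuracy is two out of three. Its low probability for the true class (0.4) also contributes the largest share of the cross entropy, which is why the loss falls as the model becomes more confident in the correct classes.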

## TensorBoard

**Responsible Person: Johannes**

**Estimated Duration: 10 mins**



In [1]:
# %%bash

# python scripts/retrain.py --image_dir ../tf_files/data/train \
#     --tfhub_module https://tfhub.dev/google/imagenet/mobilenet_v1_075_160/feature_vector/1 \
#     --how_many_training_steps 1000 \
#     --train_batch_size 25 \
#     --summaries_dir tmp/retrain_logs \
#     --output_graph tmp/output_graph.pb \
#     --output_labels tmp/output_labels.txt

# %%capture capt

# %run -i scripts/retrain.py --image_dir ../additional/data/train \
#     --tfhub_module https://tfhub.dev/google/imagenet/mobilenet_v1_075_160/feature_vector/1 \
#     --how_many_training_steps 1000 \
#     --train_batch_size 25 \
#     --summaries_dir tmp/retrain_logs \
#     --output_graph tmp/output_graph.pb \
#     --output_labels tmp/output_labels.txt


## What is TensorBoard?

TensorBoard is a suite of visualization tools. The goal of TensorBoard is to remove some of the complexity and confusion behind deep learning. TensorBoard can be used to:

* visualize your TensorFlow graph
* plot quantitative metrics about training and validation of your model

## How do I access TensorBoard?

Run the following command in bash:

`tensorboard --logdir tmp/retrain_logs`

Alternatively, run the code cell below:

In [None]:
! tensorboard --logdir tmp/retrain_logs

  from ._conv import register_converters as _register_converters
2018-09-10 16:08:35.654920: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-09-10 16:08:35.748898: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-09-10 16:08:35.749550: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties: 
name: GeForce GTX 1050 major: 6 minor: 1 memoryClockRate(GHz): 1.493
pciBusID: 0000:01:00.0
totalMemory: 3.95GiB freeMemory: 3.30GiB
2018-09-10 16:08:35.749582: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-09-10 16:08:36.074260: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-09-1

Check out the last line. It should say something like:

`TensorBoard 1.8.0 at http://jharmse:6006 (Press CTRL+C to quit)`

This means that TensorBoard is available at `http://jharmse:6006`

Type your equivalent of `jharmse:6006` into your browser (Chrome, Firefox, etc.)

You should see TensorBoard.

**Remember: Once you are done exploring TensorBoard, go to the place where you launched TensorBoard and press `CTRL+C` to quit TensorBoard. Otherwise it will keep on running in the background.**

## A few interesting TensorBoard features

### Scalars

Here you can visualize any values you chose to record during model training, for example:

* model accuracy across iterations
* the cross entropy (a loss that reflects how certain the model is of the correct class) across iterations

You can compare visualizations for different runs, such as training and validation. This can help you gain a deeper understanding of the model's performance. For example, if a model's training accuracy is very high towards the end of the iterations but the validation accuracy is low, the model has started memorizing the training data instead of learning features that generalize to new images.
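
The same comparison can be done on the recorded numbers. The sketch below flags the steps where training accuracy runs well ahead of validation accuracy. The accuracy values and the 0.15 threshold are invented for illustration and are not output from the workshop run.

```python
def overfitting_steps(train_accuracy, validation_accuracy, max_gap=0.15):
    """Return the step indices where training beats validation by more than max_gap."""
    return [step
            for step, (train, valid) in enumerate(zip(train_accuracy, validation_accuracy))
            if train - valid > max_gap]


# Hypothetical accuracy curves sampled at four points during training.
train_curve = [0.60, 0.80, 0.95, 0.99]
validation_curve = [0.58, 0.75, 0.78, 0.76]
print(overfitting_steps(train_curve, validation_curve))  # [2, 3]
```

A widening gap in the later steps is the numeric counterpart of the two curves drifting apart in the Scalars tab.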


In [9]:
%run -i scripts/label_image.py --image ../tf_files/data/test/4.jpg \
    --graph tmp/output_graph.pb \
    --labels tmp/output_labels.txt \
    --input_height 160 \
    --input_width 160 \
    --input_layer Placeholder \
    --output_layer final_result

dog 0.9991627
cat 0.0008373484
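
Output in this `label score` form is easy to post-process. The sketch below parses it and reports the top class, treating a low top score as ambiguous. The helper names and the 0.8 cut-off are assumptions chosen for illustration, not part of `label_image.py`.

```python
def parse_predictions(output_text):
    """Parse 'label score' lines into (label, score) pairs, highest score first."""
    predictions = []
    for line in output_text.strip().splitlines():
        label, score = line.rsplit(maxsplit=1)
        predictions.append((label, float(score)))
    return sorted(predictions, key=lambda pair: pair[1], reverse=True)


def describe(predictions, min_confidence=0.8):
    """Summarize the top prediction, flagging low-confidence results."""
    label, score = predictions[0]
    if score < min_confidence:
        return f"ambiguous (best guess: {label}, {score:.1%})"
    return f"{label} ({score:.1%})"


output = "dog 0.9991627\ncat 0.0008373484"
predictions = parse_predictions(output)
print(describe(predictions))  # dog (99.9%)
```

The scores across all classes sum to roughly 1, so a result such as 0.55 versus 0.45 means the model is close to guessing even though it still names a winner.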


## Predictions using Trained Model

**Responsible Person: Akshi**

**Estimated Duration: 15 mins**

#### Notes

* Use ~3 image examples (2 good, 1 ambiguous)

* Talk about interpreting the class probabilities.

## Hyperparameter Tuning (Optional)

**Responsible Person: Xinbin**

**Estimated Duration: TBD**

## Using Different Image Dataset (Optional)

**Responsible Person: Akshi**

**Estimated Duration: TBD**

## Conclusion

**Responsible Person: Johannes**

**Estimated Duration: 15 mins**

#### Notes

* Talk about coding challenge

* Give a taster for the theory to be covered in the next meetup