# Image Classification

## Intro to Image Classification

### What is Image Classification?

Image classification is the process of determining what is shown in an image.

[Silicon Valley - Hot Dog Not Hot Dog](https://www.youtube.com/watch?v=pqTntG1RXSY)

We can use deep learning to do this for us. When classifying images using deep learning, we use a convolutional neural network (CNN), a type of network designed specifically to process images. For this session, we will steer clear of the theory behind CNNs and focus on the practical side.

![](../additional/img/cnn1.png)

## How do CNNs learn to classify?

First we need to decide what we want to teach our model.

Do we want our model to correctly identify:

* Cats and dogs?

* Different types of cats?

* Different types of dogs?

* Different types of flowers?

* Everything?

CNNs are loosely inspired by the way the visual cortex of the human brain works. If we, as humans, are exposed to something new, it takes time for us to learn what it is.

### Can you identify this berry?

![](../additional/img/Wild_red_baneberry_1.jpg)

If our brain hasn't been exposed to something, classification becomes a guessing game. This applies to deep learning as well.

We need to teach our model what different berries look like.




We need to train our model to tell the different classes apart.

After training, when the model is faced with a new image that it hasn't seen before, it needs to decide for itself what is most likely shown in the image.

![](../additional/img/cat_dog.png)

![](../additional/img/cat_dog_2.jpg)

## Model Training / Retraining

### Pretrained architecture & Transfer Learning

**Architecture used**: [MobileNet](https://ai.googleblog.com/2017/06/mobilenets-open-source-models-for.html) is a small, efficient convolutional neural network designed to accommodate the restricted resources of on-device or embedded applications.

MobileNet is configurable in two ways:

- Input image resolution: 128, 160, 192, or 224px. Unsurprisingly, feeding in a higher resolution image takes more processing time, but results in better classification accuracy.
- The relative size of the model as a fraction of the largest MobileNet: 1.0, 0.75, 0.50, or 0.25.

We will use an input resolution of 160px and a relative size of 0.75 for the first run:
```
mobilenet_v1_075_160
```
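As a rough illustration, the two settings can be combined into a TensorFlow Hub module URL. The helper below is hypothetical (it is not part of the workshop code), and the naming pattern is inferred from the module URL used later in this workshop; it is an assumption that the other configurations follow the same pattern.

```python
# Hypothetical helper: builds a TensorFlow Hub module URL for a MobileNet V1
# feature vector from the two configuration values described above.
VALID_RESOLUTIONS = (128, 160, 192, 224)
VALID_WIDTHS = (1.0, 0.75, 0.50, 0.25)


def mobilenet_module_url(width, resolution):
    """Return the feature-vector module URL for the given configuration."""
    if width not in VALID_WIDTHS:
        raise ValueError(f"width must be one of {VALID_WIDTHS}")
    if resolution not in VALID_RESOLUTIONS:
        raise ValueError(f"resolution must be one of {VALID_RESOLUTIONS}")
    # The relative size is encoded as a three-digit percentage: 0.75 -> "075".
    width_tag = f"{round(width * 100):03d}"
    name = f"mobilenet_v1_{width_tag}_{resolution}"
    # Assumed URL pattern, copied from the module used in this workshop.
    return f"https://tfhub.dev/google/imagenet/{name}/feature_vector/1"


print(mobilenet_module_url(0.75, 160))
```

For 0.75 and 160 this reproduces the `--tfhub_module` value passed to the retraining script in this workshop.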

### Retraining script
The retrain script is from the [TensorFlow Hub repo](https://github.com/tensorflow/hub/blob/master/examples/image_retraining/retrain.py), and we have included it in the workshop repo.

Before running the script, there are a few arguments worth mentioning:

- **bottleneck_dir** : Path to cache bottleneck layer values as files
- **how_many_training_steps** : How many training steps to run before ending
- **summaries_dir** : Where to save training summary logs for TensorBoard
- **output_graph** : Where to save the trained graph
- **output_labels** : Where to save the trained graph's labels
- **tfhub_module** : *URL* of the model architecture to use from TensorFlow Hub
- **image_dir** : Path to the labeled images for training

You can retrieve the full list of arguments using the following command:

```bash
python src/retrain.py -h
```

Let's run the training with the following commands:

```bash
python src/retrain.py \
    --bottleneck_dir tf_files/bottlenecks \
    --image_dir tf_files/data/train \
    --tfhub_module https://tfhub.dev/google/imagenet/mobilenet_v1_075_160/feature_vector/1 \
    --how_many_training_steps 500 \
    --train_batch_size 25 \
    --summaries_dir tf_files/retrain_logs \
    --output_graph tf_files/output_graph.pb \
    --output_labels tf_files/output_labels.txt
```

### How does it work? 
The above script downloads the pre-trained model, adds a new final layer, and trains that layer on the cat/dog photos we provided. It has two main phases:
1. Calculating and caching the bottleneck values for each image
2. Training the final layer, which makes the classification

The technique that makes this training possible is **Transfer Learning**.

### Transfer Learning
**Transfer learning** is a machine learning method where a model developed for a related task is reused as the starting point for a new model. It has the following benefits:

- Uses the power of a pre-trained model to extract features from images
- Faster...
- Less data & fewer resources (Google: 1000x computing power to replace ML expert)

The image below summarizes the process. (Image Source: [Google Cloud Next '17](https://www.youtube.com/watch?v=EnFyneRScQ8&feature=youtu.be&t=4m17s) by *Yufeng Guo*)

![](../additional/img/retrain.png)
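To make the idea concrete, here is a minimal pure-Python sketch of transfer learning: a frozen feature extractor whose output is never updated, plus a new final softmax layer that is the only part being trained. Everything in it is a toy assumption for illustration (the `frozen_features` stand-in, the two-number "images", the learning rate); it is not how `retrain.py` is implemented.

```python
import math
import random

random.seed(0)


def frozen_features(image):
    """Stand-in for the frozen pre-trained network (MobileNet in the real script)."""
    brightness, texture = image
    return [brightness, texture, 1.0]  # last entry acts as a bias input


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(weights, x):
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def train_final_layer(features, labels, num_classes, steps=200, learning_rate=0.5):
    """Train only the new final layer with plain gradient descent."""
    dim = len(features[0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    for _ in range(steps):
        for x, y in zip(features, labels):
            probs = softmax(class_scores(weights, x))
            for c in range(num_classes):
                # Gradient of the cross entropy with respect to the class score.
                error = probs[c] - (1.0 if c == y else 0.0)
                for i in range(dim):
                    weights[c][i] -= learning_rate * error * x[i]
    return weights


def predict(weights, x):
    scores = class_scores(weights, x)
    return scores.index(max(scores))


# Toy "images": class 0 clusters near (0.2, 0.2), class 1 near (0.8, 0.8).
images = [(random.gauss(0.2, 0.05), random.gauss(0.2, 0.05)) for _ in range(20)]
images += [(random.gauss(0.8, 0.05), random.gauss(0.8, 0.05)) for _ in range(20)]
labels = [0] * 20 + [1] * 20

features = [frozen_features(img) for img in images]  # computed once, never updated
weights = train_final_layer(features, labels, num_classes=2)
accuracy = sum(predict(weights, f) == y for f, y in zip(features, labels)) / len(labels)
print(f"training accuracy: {accuracy:.2f}")
```

Because the feature extractor is fixed, its outputs only need to be computed once per image, which is what makes caching them worthwhile.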

### Bottlenecks 
A **bottleneck** is an informal term we often use for the layer just before the final output layer that actually does the classification (TensorFlow Hub calls this an "image feature vector"). This penultimate layer has been trained to output a set of values that's good enough for the classifier to use to distinguish between all the classes it's been asked to recognize.

Because every image is reused multiple times during training and calculating each bottleneck takes a significant amount of time, it speeds things up to cache these bottleneck values on disk so they don't have to be repeatedly recalculated. The command you ran saves these files to the `tf_files/bottlenecks` directory. If you rerun the script, they'll be reused, so you don't have to wait for this part again.
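The caching idea can be sketched in a few lines of Python. This is only an illustration: `compute_bottleneck` is a hypothetical stand-in for running an image through the network, and the JSON file format is an assumption made for the example, not necessarily what `retrain.py` writes.

```python
import json
import tempfile
from pathlib import Path

calls = {"count": 0}  # counts how often the expensive step actually runs


def compute_bottleneck(image_name):
    """Hypothetical stand-in for running an image through the frozen network."""
    calls["count"] += 1
    return [float(len(image_name)), 0.5, 0.25]


def get_or_create_bottleneck(cache_dir, image_name):
    """Return cached bottleneck values, computing and saving them if missing."""
    cache_file = Path(cache_dir) / f"{image_name}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text())
    values = compute_bottleneck(image_name)
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(values))
    return values


with tempfile.TemporaryDirectory() as cache_dir:
    first = get_or_create_bottleneck(cache_dir, "cat_001.jpg")
    second = get_or_create_bottleneck(cache_dir, "cat_001.jpg")  # served from disk
    print(first == second, calls["count"])
```

The second lookup reads the values back from disk, so the expensive computation runs only once per image.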

### Actual training
You'll see a series of step outputs, each one showing the training accuracy, validation accuracy, and cross entropy.
- **training accuracy** : the percentage of images in the current training batch that were labeled with the correct class.
- **validation accuracy** : the accuracy on a randomly selected group of images that are not used for training.
    - **Overfitting** : the model may fit noise in the training data, so we use **validation accuracy** to measure the true performance. If the training accuracy is high but the validation accuracy remains low, the network is overfitting and memorizing noise.
- **cross entropy** : a loss function that gives a glimpse into how well the learning process is progressing. It should keep going down.
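For intuition, both numbers can be computed by hand from predicted class probabilities. The sketch below uses made-up predictions for four images; the function names are illustrative and not taken from the retraining script.

```python
import math


def accuracy(predicted_probs, true_labels):
    """Fraction of examples whose highest-probability class is the true class."""
    correct = sum(
        probs.index(max(probs)) == label
        for probs, label in zip(predicted_probs, true_labels)
    )
    return correct / len(true_labels)


def cross_entropy(predicted_probs, true_labels):
    """Mean negative log-probability assigned to the true class."""
    losses = [
        -math.log(probs[label])
        for probs, label in zip(predicted_probs, true_labels)
    ]
    return sum(losses) / len(losses)


# Made-up predictions for four images and two classes (0 and 1).
probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]]
labels = [0, 1, 1, 1]

print(f"accuracy: {accuracy(probs, labels):.2f}")
print(f"cross entropy: {cross_entropy(probs, labels):.3f}")
```

Note that the third image is misclassified (probability 0.4 for the true class), which lowers the accuracy and contributes the largest term to the cross entropy.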

## TensorBoard

### What is TensorBoard?

TensorBoard is a suite of visualization tools.

The goal of TensorBoard is to remove some of the complexity and confusion behind deep learning through visualization.

TensorBoard can be used to:

* visualize quantitative metrics, such as model accuracy, across the training process
* visualize how parameters change during the training process
* visualize your TensorFlow graph

### How do I access TensorBoard?

TensorBoard runs in the browser.

You can launch TensorBoard by running the following command:

```bash
tensorboard --logdir tf_files/retrain_logs
```

For Mac users like me who cannot access port 6006:

```bash
tensorboard --logdir=tf_files/retrain_logs/ --host localhost --port 8088
```

Check out the last line. It should say something like:

```bash 
TensorBoard 1.10.0 at http://localhost:8088 (Press CTRL+C to quit)
```

This means that TensorBoard is available at `http://localhost:8088`.

Type your equivalent of `localhost:8088` into your browser (Chrome, Firefox, etc.).

You should now see TensorBoard in your browser!

**Mac users: certain versions of Firefox may not work. Try Chrome or update Firefox to the latest version.**

**Remember: Once you are done exploring TensorBoard, go to the terminal where you launched it and press `CTRL+C` to quit. Otherwise it will keep running in the background.**

### Some Interesting TensorBoard Features

### Scalars

Here you can visualize any values you chose to record during model training, such as:

* model accuracy across iterations
* the cross entropy (the loss, which reflects how closely the predicted probabilities match the true labels) across iterations

Values can be recorded separately for different processes, such as training and validation. Visualizing these can help you gain a deeper understanding of the model's performance.

For example, if a model's training accuracy is very high towards the end of the iterations but the validation accuracy is low, the model has started memorizing the training data instead of learning general features of the training images.

### Graphs

Here you can visually inspect your neural network. For the purposes of this workshop we aren't going to go into the details of this.

### Histograms

On the histogram tab you can visualize model parameters or value distributions across time.

This is really cool, since you can see how the parameters change over time to best fit the data the model is training on.

In our example, where we have two classes, we can see how our distributions become bimodal (two peaks) across iterations. These two peaks are assisting in identifying whether an image is a cat or dog. Each peak relates to one of our two classes.

## Predictions using Trained Model

Now that we have a model that is fairly good at knowing what cats and dogs look like, let's give it some "never seen before" images to classify!

![](../tf_files/data/test/4.jpg)

To run the prediction in the terminal, run the following:

```bash
python src/label_image.py --image tf_files/data/test/4.jpg \
    --graph tf_files/output_graph.pb \
    --labels tf_files/output_labels.txt \
    --input_height 160 \
    --input_width 160 \
    --input_layer Placeholder \
    --output_layer final_result
```

or the below code chunk if you are doing this in Jupyter.

```python
%run -i label_image.py --image ../tf_files/data/test/4.jpg \
    --graph ../tf_files/output_graph.pb \
    --labels ../tf_files/output_labels.txt \
    --input_height 160 \
    --input_width 160 \
    --input_layer Placeholder \
    --output_layer final_result
```

```
chenuen 0.76495105
koei 0.23504888
```


![](../tf_files/data/test/5.jpg)

To run the prediction in the terminal, run the following:

```bash
python src/label_image.py --image tf_files/data/test/5.jpg \
    --graph tf_files/output_graph.pb \
    --labels tf_files/output_labels.txt \
    --input_height 160 \
    --input_width 160 \
    --input_layer Placeholder \
    --output_layer final_result
```

or the below code chunk if you are doing this in Jupyter.

```python
%run -i label_image.py --image ../tf_files/data/test/5.jpg \
    --graph ../tf_files/output_graph.pb \
    --labels ../tf_files/output_labels.txt \
    --input_height 160 \
    --input_width 160 \
    --input_layer Placeholder \
    --output_layer final_result
```

```
koei 0.9974957
chenuen 0.0025042275
```


Now I will use the [deepart.io](https://deepart.io/hire/) artistic blend to create an image that contains both styles.

![](../tf_files/data/test/blend.jpg)

To run the prediction in the terminal, run the following:

```bash
python src/label_image.py --image tf_files/data/test/blend.jpg \
    --graph tf_files/output_graph.pb \
    --labels tf_files/output_labels.txt \
    --input_height 160 \
    --input_width 160 \
    --input_layer Placeholder \
    --output_layer final_result
```

or the below code chunk if you are doing this in Jupyter.

```python
%run -i label_image.py --image ../tf_files/data/test/blend.jpg \
    --graph ../tf_files/output_graph.pb \
    --labels ../tf_files/output_labels.txt \
    --input_height 160 \
    --input_width 160 \
    --input_layer Placeholder \
    --output_layer final_result
```

```
koei 0.96212685
chenuen 0.03787317
```
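
If you want to use these scores in your own code, the printed `label score` lines are easy to parse. The snippet below is a small sketch with a hypothetical `parse_scores` helper; it assumes the output format shown above (one label and one score per line) and uses the scores from the blended image.

```python
def parse_scores(output_text):
    """Parse 'label score' lines into a dict, skipping anything else."""
    scores = {}
    for line in output_text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        label, value = parts
        try:
            scores[label] = float(value)
        except ValueError:
            continue  # not a score line, e.g. a warning message
    return scores


# Output copied from the blended-image prediction above.
output = """koei 0.96212685
chenuen 0.03787317"""

scores = parse_scores(output)
best = max(scores, key=scores.get)  # label with the highest score
print(best, scores[best])
```

The two scores sum to roughly 1 because the final layer outputs a probability for each class.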


## Hyperparameter Tuning 

There are other hyperparameters that may affect the model's performance.

Again, you can list all available arguments with:
```bash
python src/retrain.py -h
```

Some of them are not really hyperparameters, e.g. `output_labels`, `summaries_dir` ....

Some are:
- `learning_rate`: magnitude of the updates to the final layer during training (default: 0.01)
- `how_many_training_steps`: how many training steps to run before training stops
- `train_batch_size`: how many images are used for each training step
- ......

Try adjusting some of these hyperparameters to improve the validation accuracy!
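
One way to organize such experiments is to generate one retraining command per hyperparameter combination. The sketch below only builds and prints the commands; the chosen values and the per-run naming scheme (`run_name`) are hypothetical, while the flags are the ones used earlier in this workshop.

```python
import itertools
import shlex

learning_rates = [0.01, 0.001]   # example values to try
training_steps = [500, 1000]     # example values to try

commands = []
for lr, steps in itertools.product(learning_rates, training_steps):
    run_name = f"lr{lr}_steps{steps}"  # hypothetical naming scheme for each run
    args = [
        "python", "src/retrain.py",
        "--bottleneck_dir", "tf_files/bottlenecks",
        "--image_dir", "tf_files/data/train",
        "--tfhub_module",
        "https://tfhub.dev/google/imagenet/mobilenet_v1_075_160/feature_vector/1",
        "--learning_rate", str(lr),
        "--how_many_training_steps", str(steps),
        "--train_batch_size", "25",
        # Separate log folder per run so the runs can be compared in TensorBoard.
        "--summaries_dir", f"tf_files/retrain_logs/{run_name}",
        "--output_graph", f"tf_files/output_graph_{run_name}.pb",
        "--output_labels", "tf_files/output_labels.txt",
    ]
    commands.append(" ".join(shlex.quote(a) for a in args))

for command in commands:
    print(command)
```

Since the bottleneck values depend only on the images and the model architecture, all runs can share the same `--bottleneck_dir`.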

## Using Different Image Dataset

* **Remember:** Our cat-dog model is a model that we trained using our own training data! With the application of transfer learning, we can build very good models on any type of image! Think of all the possibilities!

* If you build your own model, please note the file structure of the training image directory. Each subfolder represents a class that the model needs to learn, and the subfolder name is used as the class label. This layout is required by the retraining script we are using.
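
Before training on your own data, it can help to check the folder layout. The following sketch builds a tiny example directory and counts the images per class subfolder; the class names (`cat`, `dog`), the `count_images_per_class` helper, and the list of accepted file extensions are assumptions for illustration.

```python
import tempfile
from pathlib import Path

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}  # assumed set of accepted extensions


def count_images_per_class(image_dir):
    """Map each class subfolder name to the number of image files it contains."""
    counts = {}
    for class_dir in sorted(Path(image_dir).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1 for f in class_dir.iterdir()
                if f.suffix.lower() in IMAGE_EXTENSIONS
            )
    return counts


# Build a tiny example layout: train/cat/*.jpg and train/dog/*.jpg.
with tempfile.TemporaryDirectory() as root:
    for class_name, n_images in [("cat", 3), ("dog", 2)]:
        class_dir = Path(root) / "train" / class_name
        class_dir.mkdir(parents=True)
        for i in range(n_images):
            (class_dir / f"{i}.jpg").touch()
    counts = count_images_per_class(Path(root) / "train")
    print(counts)
```

In the workshop layout, you would point the function at `tf_files/data/train` instead of the temporary directory.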

## Conclusion

Image processing is fun!