# Introduction


[TensorFlow](https://www.tensorflow.org/) is an open source library for numerical computation, specializing in machine learning applications.
This part of the workshop uses transfer learning to classify images of plankton.

### What you will build

In this codelab, you will learn how to run TensorFlow on a single machine, and will train a simple classifier to classify images of plankton.


![](./assets/60.image.jpg)

![jelly](./assets/results_jelly.png)

We will be using transfer learning, which means we start with a model that has already been trained on another problem and then retrain it on a similar one. Deep learning from scratch can take days, but transfer learning can be done in short order.

We are going to use a model trained on the [ImageNet](http://image-net.org/) Large Visual Recognition Challenge [dataset](http://www.image-net.org/challenges/LSVRC/2012/). These models can differentiate between 1,000 different classes, like Dalmatian or dishwasher. You will have a choice of model architectures, so you can determine the right tradeoff between speed, size and accuracy for your problem.

We will use this same model, but retrain it to tell apart a small number of classes based on our own examples.


### What you'll learn

- How to use Python and TensorFlow to train an image classifier
- How to classify images with your trained classifier

### What you need

- Basic knowledge of Python
- A basic understanding of Linux commands

## Plankton

Plankton are critically important to our ecosystem, accounting for more than half the primary productivity on earth and nearly half the total carbon fixed in the global carbon cycle. They form the foundation of aquatic food webs including those of large, important fisheries. Loss of plankton populations could result in ecological upheaval as well as negative societal impacts, particularly in indigenous cultures and the developing world. Plankton’s global significance makes their population levels an ideal measure of the health of the world’s oceans and ecosystems.

![plankton](https://storage.googleapis.com/kaggle-competitions/kaggle/3978/media/Plankton-Diagram3.png)

Traditional methods for measuring and monitoring plankton populations are time consuming and cannot scale to the granularity or scope necessary for large-scale studies. Improved approaches are needed. One such approach is through the use of an underwater imagery sensor. This towed, underwater camera system captures microscopic, high-resolution images over large study areas. The images can then be analyzed to assess species populations and distributions.

Manual analysis of the imagery is infeasible – it would take a year or more to manually analyze the imagery volume captured in a single day. Automated image classification using machine learning tools is an alternative to the manual approach. Analytics will allow analysis at speeds and scales previously thought impossible. The automated system will have broad applications for assessment of ocean and ecosystem health.

## Setup and installation 

### Install TensorFlow

Before we can begin the tutorial, you need to [install TensorFlow](https://www.tensorflow.org/versions/r1.7/install/) version 1.7.

In [None]:
# Install TensorFlow 1.7 (any 1.7.x patch release)
!pip install --upgrade "tensorflow==1.7.*"

### Clone the git repository

All the code used in this codelab is contained in this git repository. Clone the repository and cd into it. This is where we will be working.


In [None]:
!git clone https://github.com/aymen-mouelhi/cml-plankton-classifier

In [None]:
%cd cml-plankton-classifier

In [None]:
!ls

## Download the training images

Before you start any training, you'll need a set of images to teach the model about the new classes you want to recognize. Download the photos (791 MB) by invoking the following two commands:

In [None]:
# Download the training images, then unpack the zip archive
!curl -O https://cmlplankton.s3.amazonaws.com/images.zip
!unzip -q images.zip

You should now have a copy of the plankton photos. Confirm the contents of your working directory by issuing the following command:

In [None]:
!ls images

You should now have something like this: 

![](./assets/folder.png)
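
Before training, it can help to check how many images each class folder holds. The sketch below assumes the layout shown above, with one subfolder per class under `images`. The helper name `count_images_per_class` and the list of file extensions are illustrative choices, not part of the codelab repository.

```python
import os


def count_images_per_class(image_dir, extensions=(".jpg", ".jpeg", ".png")):
    """Return {class_name: number_of_image_files} for each subfolder of image_dir."""
    counts = {}
    for entry in sorted(os.listdir(image_dir)):
        class_dir = os.path.join(image_dir, entry)
        if not os.path.isdir(class_dir):
            continue  # skip stray files that sit next to the class folders
        counts[entry] = sum(
            1 for name in os.listdir(class_dir)
            if name.lower().endswith(extensions)
        )
    return counts


# Only runs when the images folder from the previous step is present.
if os.path.isdir("images"):
    for label, n in count_images_per_class("images").items():
        print(f"{label}: {n}")
```

Classes with very few images are worth noting, since the retrained layer will see them less often during training.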

<span style="color:blue">Note: You will need to open a new terminal window and copy the instructions below.</span>

##  (Re)training the network

### Configure your MobileNet

In this exercise, we will retrain a [MobileNet](https://research.googleblog.com/2017/06/mobilenets-open-source-models-for.html). MobileNet is a small, efficient convolutional neural network. "Convolutional" just means that the same calculations are performed at each location in the image.

The MobileNet is configurable in two ways:

- Input image resolution: 128, 160, 192, or 224px. Unsurprisingly, feeding in a higher-resolution image takes more processing time but results in better classification accuracy.
- The relative size of the model as a fraction of the largest MobileNet: 1.0, 0.75, 0.50, or 0.25.

We will use an input resolution of 224px and a relative size of 0.50 for this codelab.

With the recommended settings, it typically takes only a couple of minutes to retrain on a laptop. You will pass the settings inside Linux shell variables. Set those variables in your shell:

In [None]:
IMAGE_SIZE=224

In [None]:
ARCHITECTURE="mobilenet_0.50_${IMAGE_SIZE}"
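
If you prefer to build the architecture name in Python rather than in shell variables, a minimal sketch could look like the following. The function name `mobilenet_architecture` is hypothetical. The allowed values are the ones listed in this section, and the output format mirrors the `ARCHITECTURE` variable above.

```python
# Allowed settings, as listed in this section.
IMAGE_SIZES = (128, 160, 192, 224)
WIDTH_MULTIPLIERS = ("1.0", "0.75", "0.50", "0.25")


def mobilenet_architecture(width="0.50", image_size=224):
    """Build an architecture string such as 'mobilenet_0.50_224'."""
    if width not in WIDTH_MULTIPLIERS:
        raise ValueError(f"width must be one of {WIDTH_MULTIPLIERS}, got {width!r}")
    if image_size not in IMAGE_SIZES:
        raise ValueError(f"image_size must be one of {IMAGE_SIZES}, got {image_size!r}")
    return f"mobilenet_{width}_{image_size}"


print(mobilenet_architecture())
```

Validating the two settings up front catches a typo before the retraining script starts.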

### More about MobileNet performance (optional)

The graph below shows the first-choice (top-1) accuracy of these configurations (y-axis) versus the number of calculations required (x-axis) and the size of the model (circle area).

16 points are shown for MobileNet. For each of the 4 model sizes (circle area in the figure) there is one point for each image resolution setting. The 128px image size models are represented by the lower-left point in each set, while the 224px models are in the upper right.

Other notable architectures are also included for reference. "GoogleNet" in this figure is the architecture also known as "Inception V1".

![models](https://codelabs.developers.google.com/codelabs/tensorflow-for-poets/img/70170cbb89d318b1.png)

<span style="color:blue">Note: You will need to open a new terminal window and copy the instructions below.</span>

### Start TensorBoard

Before starting the training, launch TensorBoard in the background. TensorBoard is a monitoring and inspection tool included with TensorFlow. You will use it to monitor the training progress.

In [None]:
!tensorboard --logdir tf_files/training_summaries &

#### Note:
This command will fail with the following error if you already have a tensorboard process running:

<span style="color:red">ERROR:tensorflow:TensorBoard attempted to bind to port 6006, but it was already in use</span>

You can kill all existing TensorBoard instances with:


In [None]:
# Uncomment to kill all existing TensorBoard instances
# !pkill -f "tensorboard"

#### Investigate the retraining script

The retrain script is from the TensorFlow Hub repo, but it is not installed as part of the pip package, so for simplicity I've included it in the codelab repository. You can run the script using the python command. Take a minute to skim its "help".

In [None]:
!python -m scripts.retrain -h

#### Run the training

As noted in the introduction, ImageNet models are networks with millions of parameters that can differentiate a large number of classes. We're only training the final layer of that network, so training will end in a reasonable amount of time.

Start your retraining with one big command (note the --summaries_dir option, which sends training progress reports to the directory that TensorBoard is monitoring):

In [None]:
%cd cml-plankton-classifier

In [None]:
!pygmentize scripts/retrain.py

In [None]:
!python -m scripts.retrain \
  --bottleneck_dir=tf_files/bottlenecks \
  --how_many_training_steps=500 \
  --model_dir=tf_files/models/ \
  --summaries_dir=tf_files/training_summaries/"${ARCHITECTURE}" \
  --output_graph=tf_files/retrained_graph.pb \
  --output_labels=tf_files/retrained_labels.txt \
  --architecture="${ARCHITECTURE}" \
  --image_dir=images

Note that this step will take a while.

This script downloads the pre-trained model, adds a new final layer, and trains that layer on the plankton photos you've downloaded.

ImageNet does not include any of these plankton species we're training on here. However, the kinds of information that make it possible for ImageNet to differentiate among 1,000 classes are also useful for distinguishing other objects. By using this pre-trained network, we are using that information as input to the final classification layer that distinguishes our plankton classes.

#### Bottleneck

A bottleneck is an informal term Google often uses for the layer just before the final output layer that actually does the classification. "Bottleneck" is not used to imply that the layer is slowing down the network. We use the term bottleneck because near the output, the representation is much more compact than in the main body of the network.

Every image is reused multiple times during training. Calculating the layers behind the bottleneck for each image takes a significant amount of time. Since these lower layers of the network are not being modified, their outputs can be cached and reused.
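
The caching idea can be sketched in a few lines of Python. This is an illustration only and not the retraining script's actual implementation. The function name `cached_bottleneck`, the JSON file format, and the `compute_fn` callback (standing in for the expensive forward pass through the frozen layers) are all assumptions.

```python
import json
import os


def cached_bottleneck(image_path, cache_dir, compute_fn):
    """Return the bottleneck vector for image_path, computing it at most once."""
    os.makedirs(cache_dir, exist_ok=True)
    # One cache file per image; path separators are flattened into the file name.
    cache_name = image_path.replace("/", "_").replace("\\", "_") + ".json"
    cache_path = os.path.join(cache_dir, cache_name)
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            return json.load(f)  # cache hit: skip the expensive computation
    values = compute_fn(image_path)  # cache miss: run the frozen layers once
    with open(cache_path, "w") as f:
        json.dump(values, f)
    return values
```

Because the frozen layers always produce the same output for the same image, every later training step that draws this image can read the stored vector instead of recomputing it.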

#### Optional: I'm NOT in a hurry!
The first retraining command iterates only 500 times. You can very likely get improved results (i.e. higher accuracy) by training for longer. To get this improvement, remove the parameter --how_many_training_steps to use the default 4,000 iterations.

In [None]:
!python -m scripts.retrain \
  --bottleneck_dir=tf_files/bottlenecks \
  --model_dir=tf_files/models/"${ARCHITECTURE}" \
  --summaries_dir=tf_files/training_summaries/"${ARCHITECTURE}" \
  --output_graph=tf_files/retrained_graph.pb \
  --output_labels=tf_files/retrained_labels.txt \
  --architecture="${ARCHITECTURE}" \
  --image_dir=images

### Training And TensorBoard

Once the script finishes generating all the bottleneck files, the actual training of the final layer of the network begins.

By default, this script runs 4,000 training steps. Each step chooses 10 images at random from the training set, finds their bottlenecks from the cache, and feeds them into the final layer to get predictions. Those predictions are then compared against the actual labels, and the results of this comparison are used to update the final layer's weights through a backpropagation process.
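
To make the training step concrete, here is a minimal pure-Python sketch of one update for a softmax final layer. It is not the code in the retraining script. The toy two-dimensional "bottlenecks", the learning rate of 0.1, and the names `softmax` and `train_step` are assumptions chosen for illustration.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_step(weights, biases, bottlenecks, labels, learning_rate=0.1):
    """One gradient-descent step for a softmax layer; returns the mean cross entropy.

    weights[c][j] is the weight from feature j to class c. Both weights and
    biases are updated in place.
    """
    n = len(bottlenecks)
    num_classes = len(biases)
    grad_w = [[0.0] * len(weights[0]) for _ in range(num_classes)]
    grad_b = [0.0] * num_classes
    loss = 0.0
    for x, y in zip(bottlenecks, labels):
        logits = [sum(w * xi for w, xi in zip(weights[c], x)) + biases[c]
                  for c in range(num_classes)]
        probs = softmax(logits)
        loss -= math.log(probs[y])
        for c in range(num_classes):
            # Gradient of the cross entropy w.r.t. logit c: p_c - 1[c == y]
            delta = probs[c] - (1.0 if c == y else 0.0)
            grad_b[c] += delta
            for j, xi in enumerate(x):
                grad_w[c][j] += delta * xi
    for c in range(num_classes):
        biases[c] -= learning_rate * grad_b[c] / n
        for j in range(len(weights[c])):
            weights[c][j] -= learning_rate * grad_w[c][j] / n
    return loss / n


random.seed(0)
# Toy "cached bottlenecks": class 0 clusters near (1, 0), class 1 near (0, 1).
data = [([1.0 + random.gauss(0, 0.1), random.gauss(0, 0.1)], 0) for _ in range(50)]
data += [([random.gauss(0, 0.1), 1.0 + random.gauss(0, 0.1)], 1) for _ in range(50)]

weights = [[0.0, 0.0], [0.0, 0.0]]
biases = [0.0, 0.0]
for step in range(200):
    batch = random.sample(data, 10)  # 10 random training examples per step
    xs, ys = zip(*batch)
    loss = train_step(weights, biases, xs, ys)
print(f"cross entropy on the last batch: {loss:.3f}")
```

Only the final layer's weights and biases change here. The inputs play the role of cached bottleneck values, which stay fixed throughout training.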

As it trains, you'll see a series of step outputs, each one showing training accuracy, validation accuracy, and the cross entropy:

- The training accuracy shows the percentage of the images used in the current training batch that were labeled with the correct class.
- The validation accuracy is the percentage of correctly labeled images in a randomly selected group of images from a different set.
- Cross entropy is a loss function that gives a glimpse into how well the learning process is progressing. (Lower numbers are better.)
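
The metrics in the list above can be written out as a short sketch. The function names and the three example predictions are made up for illustration. Accuracy is computed the same way whether the inputs come from a training batch or from the validation set.

```python
import math


def accuracy(probabilities, labels):
    """Fraction of examples whose highest-probability class matches the label."""
    correct = sum(
        1 for probs, y in zip(probabilities, labels)
        if max(range(len(probs)), key=probs.__getitem__) == y
    )
    return correct / len(labels)


def cross_entropy(probabilities, labels):
    """Mean negative log-probability assigned to the correct class."""
    total = sum(math.log(probs[y]) for probs, y in zip(probabilities, labels))
    return -total / len(labels)


# Three examples, two classes; the third prediction is wrong.
example_probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
example_labels = [0, 1, 1]
print(f"accuracy: {accuracy(example_probs, example_labels):.3f}")
print(f"cross entropy: {cross_entropy(example_probs, example_labels):.3f}")
```

Cross entropy penalizes confident mistakes more heavily than hesitant ones, which is why it can keep improving even when accuracy has stopped changing.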


The figures below show an example of the progress of the model's accuracy and cross entropy as it trains. If your model has finished generating the bottleneck files, you can check its progress by [opening TensorBoard](http://0.0.0.0:6006/) and clicking a figure's name to display it. Ignore any warnings that TensorBoard prints to your command line.

The first figure shows accuracy (y-axis) as a function of training progress (x-axis):

![graph](./assets/train.png)

Two lines are shown. The orange line shows the accuracy of the model on the training data, while the blue line shows the accuracy on the validation set (which was not used for training). The validation accuracy is a much better measure of the true performance of the network. If the training accuracy continues to rise while the validation accuracy decreases, then the model is said to be "overfitting". Overfitting is when the model begins to memorize the training set instead of understanding general patterns in the data.
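
As a rough illustration of that rule of thumb, the sketch below flags a run in which training accuracy rose while validation accuracy fell over the last few logged points. The function name `looks_overfit`, the window of 3 points, and the accuracy values are all invented for this example. Real curves are noisy, so treat this as a heuristic only.

```python
def looks_overfit(train_acc, val_acc, window=3):
    """True if training accuracy rose while validation accuracy fell over the window."""
    if len(train_acc) <= window or len(val_acc) <= window:
        return False  # not enough logged points to compare
    train_rose = train_acc[-1] > train_acc[-1 - window]
    val_fell = val_acc[-1] < val_acc[-1 - window]
    return train_rose and val_fell


# Invented accuracy curves, one value per logged step.
train_curve = [0.60, 0.70, 0.80, 0.90, 0.95]
val_curve = [0.60, 0.70, 0.75, 0.72, 0.68]
print(looks_overfit(train_curve, val_curve))
```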

As the process continues, you should see the reported accuracy improve. After all the training steps are complete, the script runs a final test accuracy evaluation on a set of images that are kept separate from the training and validation pictures. This test evaluation provides the best estimate of how the trained model will perform on the classification task.

You should see an accuracy value of between 85% and 99%, though the exact value will vary from run to run since there's randomness in the training process. (If you are only training on two classes, you should expect higher accuracy.) This value indicates the percentage of the images in the test set that are given the correct label after the model is fully trained.

### Using the Retrained Model

The retraining script writes data to the following two files:

- ***tf_files/retrained_graph.pb***, which contains a version of the selected network with a final layer retrained on your categories.
- ***tf_files/retrained_labels.txt***, which is a text file containing labels.

#### Classifying an image

The codelab repo also contains a copy of TensorFlow's label_image.py example, which you can use to test your network. Take a minute to read the help for this script:

In [None]:
!python -m scripts.label_image -h

In [None]:
!pygmentize scripts/label_image.py

Now, let's run the script on this image of a Calanoida:

![calanoida](./cml-plankton-classifier/images/Calanoida/10.dx5duqtumaam4og.jpg)

In [None]:
!python -m scripts.label_image \
    --graph=tf_files/retrained_graph.pb  \
    --image=images/Calanoida/10.dx5duqtumaam4og.jpg

Each execution will print a list of plankton labels, in most cases with the correct plankton on top (though each retrained model may be slightly different).

You might get results like this for a Calanoida photo:

![results](./assets/results_Calanoida.png)

This indicates a high confidence (~95%) that the image is a Calanoida, and low confidence for any other label.
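
The ranking step behind such a result list can be sketched as follows. This is not the code of label_image.py. The helper name `top_k` and the scores are made up, and only "Calanoida" is a class name taken from this codelab.

```python
def top_k(labels, scores, k=5):
    """Return the k (label, score) pairs with the highest scores, best first."""
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up scores; "class_b" and "class_c" are placeholder labels.
example_labels = ["Calanoida", "class_b", "class_c"]
example_scores = [0.95, 0.03, 0.02]
for label, score in top_k(example_labels, example_scores, k=2):
    print(f"{label}: {score:.2f}")
```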

You can use label_image.py to classify any image file you choose, either from your downloaded collection, or new ones. You just have to change the --image file name argument to the script.

![another](https://upload.wikimedia.org/wikipedia/commons/thumb/f/f0/Cyanea_kils.jpg/180px-Cyanea_kils.jpg)

In [None]:
!python -m scripts.label_image \
    --graph=tf_files/retrained_graph.pb  \
    --image=../assets/180px-Cyanea_kils.jpg 

### Deploy our Classification Model

####  Flask

Flask is a micro web framework written in Python. It is classified as a microframework because it does not require particular tools or libraries. It has no database abstraction layer, form validation, or any other components where pre-existing third-party libraries provide common functions. However, Flask supports extensions that can add application features as if they were implemented in Flask itself. Extensions exist for object-relational mappers, form validation, upload handling, various open authentication technologies and several common framework related tools. Extensions are updated far more regularly than the core Flask program.

Applications that use the Flask framework include Pinterest, LinkedIn, and the community web page for Flask itself.

In [None]:
#### Clone the GitHub repository
!git clone https://github.com/aymen-mouelhi/cml-plankton-flask.git

## TODO: Display the section that does classify

In [None]:
#### Copy your model files
cp tf_files/models/mobilenet_0.50_224 ./cml-plankton-flask/models/mobilenet

In [None]:
cp tf_files/retrained_graph.pb ./cml-plankton-flask/models/mobilenet

In [None]:
cp tf_files/retrained_labels.txt ./cml-plankton-flask/models/mobilenet

In [None]:
%cd cml-plankton-flask

In [None]:
!pygmentize app.py

In [None]:
# start web server
!python app.py

### Testing

To test our chatbot, we need to deploy our Flask application on a web server that is accessible from recast.ai.
This can be done by deploying the application to Cloud Foundry or to AWS. However, as this requires a complicated setup, we can use ngrok instead, for testing purposes only. Ngrok allows external access to a local HTTP(S) port and provides an external URL that can be entered in recast.ai.


#### Download ngrok
You can get the latest version of ngrok from [ngrok.com](https://ngrok.com/).

In [None]:
#### Deploy using ngrok
!ngrok http 5000

## Chatbot

In this part we will start creating our chatbot using SAP Conversational AI. 


### Chatbot Settings

We need to tell our chatbot where to get the answers for the classified plankton. Head to the settings section and enter the HTTPS link of your web server.


![configuration](./assets/configuration.png)

### Intents Creation

Our bot will need to have an intent for classifying plankton images. 
Let's start by creating the intent named "classify_intent"

![intent](./assets/intent.png)

Once we have created the intent, the next step is to define what we should do once we detect it. This is done through the "Build" tab.
- Create a new skill "handle_classify_intent" and provide a description
- Define when the skill should be triggered
- As we will be expecting a URL from the user, we need to define a requirement parameter:
    - #url as image
- Now the interesting part: let's define the action. In our case, the chatbot should send an HTTP call to our deployed Python application. For this workshop, we won't need authentication, so just:
    - Select the type of the HTTP request: POST
    - Type /classify as the endpoint
    
![skill](./assets/skill.png)
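
To try the endpoint by hand before wiring up the chatbot, you could build a similar POST request yourself. The sketch below only constructs the request and does not send it. The JSON body `{"url": ...}` is an assumption, because the payload that the chatbot sends and that app.py expects is not shown in this codelab. The base URL and image URL are placeholders.

```python
import json
import urllib.request


def build_classify_request(base_url, image_url):
    """Build (but do not send) a JSON POST request for the /classify endpoint."""
    # Hypothetical payload shape; adjust it to whatever app.py actually reads.
    payload = json.dumps({"url": image_url}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/classify",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_classify_request(
    "https://example.ngrok.io",  # placeholder for your ngrok URL
    "https://example.com/plankton.jpg",  # placeholder image URL
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```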

## Conclusion

![whale food](./assets/whale_food.png)

That's it! Through this workshop, you retrained a deep learning model to classify plankton, and then you used it to answer users' questions.
The same technique can be used to classify flowers and create a flower classification chatbot :)