# Interactive Machine Learning - Exercise 03

In this exercise we will learn about cooperative machine learning.
Our goal is to build a very basic cooperative machine learning user interface and use it to extend our Pokedex model from the last exercise.

The steps you are going to cover are as follows:
* Pretrain our Pokedex model with the original data
* Manually label a small bit of new data
* Train our model on the new data
* Use the model in a cooperative workflow to annotate the rest of the dataset

Please read each exercise carefully before you start coding! You will find a number in the comments before each step of coding you will do. Please refer to these numbers if you have any questions.

## 0. Import the libraries
As always, we are providing a list of useful packages in the import section below.
Keep in mind that you can import additional libraries at any time and that you do not need to use all the imports if you know another solution for a given task.

```python
import ipywidgets as widgets
import os
import numpy as np
import glob
import random
from IPython.display import Image
from ipywidgets import interact_manual
from tensorflow import keras
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.layers import Flatten, Dense
from tensorflow.keras import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing import image
from tensorflow.keras.utils import to_categorical
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from shutil import copyfile
```

## 1. Pretrain the model
In this part we are going to pretrain our model on the pokemon images you already know.
To this end we will use the same VGG16 model as last week with the following training procedure:

Preprocessing:
* Image size (224, 224)
* VGG16 standard preprocessing from the Keras framework

Data split:
* Use 90% of the data to train and 10% to validate your results

Training 1:
* Initialize the model with the imagenet weights
* Freeze all convolution layers
* Train the model using the following settings:
 * 5 Epochs
 * Adam Optimizer with default Parameters
 * categorical cross entropy loss
 * Batchsize 32

Training 2:
*  Unfreeze the last two convolutional Blocks
*  Continue training with the following settings:
 * 10 Epochs
 * Adam Optimizer with a learning rate of 0.0001
 * Batchsize of 32

A convolutional block in the VGG16 architecture consists of 2 to 3 convolutional layers and one pooling layer.
You can access a model's layers directly via `model.layers`.
Read up on how to freeze layers [here](https://keras.io/guides/transfer_learning/), in case you did not use this technique in the last exercise.
Your model should achieve a validation accuracy of close to 100%.

```python
# 1. Load data for pretraining and apply preprocessing

# 2. Split data into training and validation partition

# 3. Define network

# 4. Freeze weights and perform training step 1

# 5. Unfreeze weights and perform training step 2
```
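
The two training steps differ mainly in which layers are trainable. The following is a minimal sketch of that switch, not a reference solution. It assumes the layer names that Keras assigns in VGG16 (`block1_conv1` through `block5_pool`). The helper `set_trainable_by_prefix` is a hypothetical name, and the `SimpleNamespace` objects are stand-ins for real layers so that the snippet runs without TensorFlow.

```python
from types import SimpleNamespace


def set_trainable_by_prefix(layers, trainable_prefixes):
    """Make a layer trainable only if its name starts with one of the prefixes.

    Returns the names of the layers that are trainable afterwards.
    """
    prefixes = tuple(trainable_prefixes)
    for layer in layers:
        # str.startswith with an empty tuple is always False, so an empty
        # prefix list freezes every layer that is passed in.
        layer.trainable = layer.name.startswith(prefixes)
    return [layer.name for layer in layers if layer.trainable]


# Stand-ins for a few VGG16 layers (names assumed to match Keras' VGG16).
demo_layers = [
    SimpleNamespace(name=name, trainable=True)
    for name in ["block3_conv3", "block3_pool", "block4_conv1",
                 "block4_pool", "block5_conv3", "block5_pool"]
]

# Training 1: freeze all convolutional blocks.
stage1_trainable = set_trainable_by_prefix(demo_layers, [])

# Training 2: unfreeze the last two convolutional blocks.
stage2_trainable = set_trainable_by_prefix(demo_layers, ["block4_", "block5_"])
print(stage2_trainable)
```

With a real model you would pass the layers of the VGG16 base instead of `demo_layers`. Keep in mind that Keras only picks up changes to `trainable` when the model is compiled, so call `model.compile(...)` again before the second training step.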

## 2. Train the model on new data
Now that we have our initial model we are going to extend it with some more pokemon.
[Here](https://megastore.uni-augsburg.de/get/OxpI3M_JyU/) you will find roughly 6000 images of the following Pokemon:
* Blastoise
* Charizard
* Charmeleon
* Ivysaur
* Venusaur
* Wartortle

Unfortunately the images are not labeled yet. To speed things up a bit, we are only going to label a small part of the data ourselves and then build a model to help us do the rest.
(Actually this will probably not be faster, but more fun anyway :) ).
In your project directory you will find a 'data_labeled' folder, which we will use to store the labeled data.
This time we will use the folder structure to create our labels and train / validation partitions.
Inside the folder you will therefore find a 'train' and a 'val' folder, each of them containing one subfolder per class.

In the following step you should first manually pick at least 5 examples per class and copy them from the 'data' folder to the train partition of the 'data_labeled' folder.
To take full advantage of the way the data is now structured, we will use Keras data generators in combination with the `flow_from_directory` method to dynamically read the input data and feed it to our model.
You can find an example of such data generators [here](https://keras.io/api/preprocessing/image/#flowfromdirectory-method).

Specifically, we are going to write a function `train_loop()` which creates two data generators (one for training and one for validation) and trains a model for the new images on features extracted from our current Pokedex model.
To this end you can simply rebuild the structure of the original model, but replace the number of output classes.
To load the weights you can then use the following code snippet:
`model.layers[-1]._name = 'new_output'`<br>
`model.load_weights(weight_path, by_name=True)`<br>

Freeze all layers except the dense layers. We only need to train those, and this speeds up the training process a bit.

```python
# 6. Copy at least 5 images per class from the data folder to the correct partition in the data_labeled folder

# 7. Write a function train_loop()

def train_loop():

    # 8. Build model

    # 9. Load weights

    # 10. Build data generators

    # 11. Fit the model to the data for a few epochs
    pass  # placeholder so the skeleton is valid Python; replace it with your code

# 12. Call train loop
```
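
Before training it can help to check that every class folder really contains the minimum of 5 manually labeled images. The following is a small sketch of such a check under a few assumptions: the helper names `count_images_per_class` and `classes_below_minimum` are hypothetical, the list of file extensions is a guess about the dataset, and the demo runs on a temporary folder that only mimics the 'data_labeled/train' layout.

```python
import os
import tempfile


def count_images_per_class(partition_dir, extensions=(".jpg", ".jpeg", ".png")):
    """Return {class_name: number_of_image_files} for one partition folder."""
    counts = {}
    for class_name in sorted(os.listdir(partition_dir)):
        class_dir = os.path.join(partition_dir, class_name)
        if not os.path.isdir(class_dir):
            continue  # ignore stray files next to the class folders
        counts[class_name] = sum(
            1 for file_name in os.listdir(class_dir)
            if file_name.lower().endswith(extensions)
        )
    return counts


def classes_below_minimum(counts, minimum=5):
    """Return the class names that have fewer than `minimum` images."""
    return sorted(name for name, n in counts.items() if n < minimum)


# Demo on a throwaway folder that mimics data_labeled/train.
with tempfile.TemporaryDirectory() as demo_root:
    demo_train_dir = os.path.join(demo_root, "data_labeled", "train")
    for class_name, n_images in [("Blastoise", 5), ("Ivysaur", 2)]:
        os.makedirs(os.path.join(demo_train_dir, class_name))
        for i in range(n_images):
            # Empty placeholder files are enough for counting.
            open(os.path.join(demo_train_dir, class_name, f"{i}.png"), "w").close()
    demo_counts = count_images_per_class(demo_train_dir)
    demo_missing = classes_below_minimum(demo_counts)

print(demo_counts, demo_missing)
```

In the notebook you would point `count_images_per_class` at your real train folder and only call `train_loop()` once `classes_below_minimum` returns an empty list.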

## 3. Interactive UI

In this part of the exercise we are going to put our pretrained model to good use by employing it in a cooperative workflow.
To this end we are going to build a minimal cooperative machine learning user interface in this Python notebook.
Our user interface will consist of the following components:

* (optional) A progress bar to keep the motivation up
* A slider to set a high confidence threshold
* A slider to set the mid confidence threshold
* Some radio buttons to choose the label
* A button to save the annotation and label and show the next image
* A button to retrain our model
* A button to use our model to predict our dataset

The final UI should look like this:

![img](https://hcm-lab.de/cloud/index.php/s/ak3txGXepnt9NxS/preview)

The 'retrain' button should call the `train_loop()` function from before to retrain the model on all labeled data.
The 'predict' button should create a list of predictions for all unlabeled images.
All predictions that are above the high confidence threshold, set by the respective slider, should be automatically accepted as the correct label, and the images should be copied to the respective folders in the training data folder.
Additionally, you should implement a garbage label to delete unfitting images.
Potential reasons to consider an image as garbage are that no Pokemon is visible, too many Pokemon are visible, none of the Pokemon we want to train on are visible, the image file is broken, etc.
When you press the 'next' button, the current image should be copied to the right folder in the training dataset, depending on the current value of the radio buttons.
Afterwards the next image should be chosen from all predicted images whose confidence is greater than or equal to the value set by the mid_threshold slider.
The current value of the radio buttons should then be set to the prediction for this image.
Optionally, you can also implement a progress bar to track the progress of your annotations.
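
The threshold logic described above can be sketched independently of the UI. The function below is only an illustration under stated assumptions: `split_by_confidence` is a hypothetical helper, the input is assumed to be the softmax output of the model with one row per unlabeled image, and the confidence of a prediction is taken to be its highest class probability.

```python
import numpy as np


def split_by_confidence(probabilities, high_threshold, mid_threshold):
    """Split predictions into auto-accepted ones and ones to review manually.

    probabilities: array of shape (n_images, n_classes), e.g. softmax output.
    Returns (predicted_class, confidence, auto_accept_idx, review_idx).
    """
    probabilities = np.asarray(probabilities, dtype=float)
    predicted_class = probabilities.argmax(axis=1)
    confidence = probabilities.max(axis=1)

    # Above the high threshold: accept the predicted label automatically.
    auto_accept_idx = np.flatnonzero(confidence > high_threshold)
    # At least the mid threshold, but not auto-accepted: show to the annotator.
    review_idx = np.flatnonzero(
        (confidence >= mid_threshold) & (confidence <= high_threshold)
    )
    return predicted_class, confidence, auto_accept_idx, review_idx


# Made-up probabilities for four images and three classes.
demo_probs = [
    [0.97, 0.02, 0.01],
    [0.10, 0.85, 0.05],
    [0.40, 0.35, 0.25],
    [0.05, 0.05, 0.90],
]
demo_labels, demo_conf, demo_accepted, demo_review = split_by_confidence(
    demo_probs, high_threshold=0.95, mid_threshold=0.8
)
print(demo_accepted, demo_review)
```

Images that fall below the mid threshold end up in neither index list. They stay unlabeled until a retrained model is more confident about them or the thresholds are lowered.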

You can use the ipywidgets library to create the UI.
You can find an IPython tutorial [here](https://towardsdatascience.com/interactive-controls-for-jupyter-notebooks-f5c94829aee6) and the API documentation [here](https://towardsdatascience.com/interactive-controls-for-jupyter-notebooks-f5c94829aee6).
Note that PyCharm might not play well with the widgets in all scenarios. It's best to view them in the browser by visiting http://localhost:8888 after you have started your notebook.

```python
# 13. Build UI
```
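
The file handling behind the 'next' button can be kept separate from the widgets, which makes it easier to try out. The sketch below is one possible way to do this, not the intended solution: `store_annotation` and the label name `"garbage"` are hypothetical, and for simplicity a garbage image is only skipped here instead of being deleted from the source folder.

```python
import os
import tempfile
from shutil import copyfile

GARBAGE_LABEL = "garbage"  # hypothetical name for the garbage option


def store_annotation(image_path, label, train_dir):
    """Copy an annotated image into the folder of its class.

    Returns the destination path, or None if the image was marked as garbage.
    """
    if label == GARBAGE_LABEL:
        return None  # garbage images are not added to the training data
    class_dir = os.path.join(train_dir, label)
    os.makedirs(class_dir, exist_ok=True)
    destination = os.path.join(class_dir, os.path.basename(image_path))
    copyfile(image_path, destination)
    return destination


# Demo with a temporary folder and an empty placeholder "image".
with tempfile.TemporaryDirectory() as demo_root:
    demo_image = os.path.join(demo_root, "unlabeled_0.png")
    open(demo_image, "w").close()
    demo_train_dir = os.path.join(demo_root, "data_labeled", "train")

    kept_path = store_annotation(demo_image, "Charizard", demo_train_dir)
    kept_exists = os.path.isfile(kept_path)
    kept_class = os.path.basename(os.path.dirname(kept_path))
    garbage_result = store_annotation(demo_image, GARBAGE_LABEL, demo_train_dir)

print(kept_class, kept_exists, garbage_result)
```

In the UI, the callback of the 'next' button could call such a function with the current value of the radio buttons and then load the next image to review.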

## 4. repeat(annotate, train, predict)
After you are done creating the UI, we are now going to label the whole dataset together with our model.
To this end use your model to predict and improve iteratively in the following manner:

Set the high confidence slider to a value greater than or equal to 0.95 and the mid confidence slider to at least 0.8.

Repeat 3 times:

* Call automatic prediction
* Manually check the images that were above the high confidence threshold by looking at the content of the respective folders. Make corrections if necessary.
* Annotate the remaining images that were above the mid confidence threshold
* Retrain your model

Do you notice any change in the number of images you have to annotate each time?

Repeat until all data is annotated:

* Call automatic prediction
* Annotate the remaining images that were above the mid confidence threshold
* Retrain your model
* Adjust both confidence thresholds based on how much you trust your model

Describe your subjective impression of the annotation process. Did you have the feeling that the cooperative workflow was helpful?