# Purpose
Trains an image classification neural network (MobileNetV2) on the classes found in the training image folder.

# Usage
1. Install python3-dev and scipy with `sudo apt install python3-dev python3-scipy`.
2. Install all dependencies with `pip install -r requirements.txt`. _(Using a virtual environment is recommended.)_
3. Create an account on [Weights & Biases](https://www.wandb.com/) and log in from the terminal with `wandb login`.
4. _(Optional)_ To enable GPU usage, install the [NVIDIA packages](https://www.tensorflow.org/install/gpu#software_requirements).
5. Execute all the ***notebook cells*** of `train.ipynb`.

# Example of Directory Structure
```
.
├── ...
├── train.ipynb
├── requirements.txt
├── images_formatted
│   ├── class_A
│   │   ├── _image0.jpg
│   │   ├── _image1.jpg
│   │   └── _image2.jpg
│   └── class_B
│       ├── _image3.jpg
│       ├── _image4.jpg
│       └── _image5.jpg
├── model
│   ├── history.csv
│   ├── model.h5
│   └── model.tflite
└── wandb
    └── [Stores the data of all runs...]
```
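
Before training, it can help to confirm that the image folder matches this layout. The sketch below is only an illustration and is not part of the notebook: `count_images_per_class` is a hypothetical helper, and the list of image extensions is an assumption.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to the formats actually in use.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def count_images_per_class(data_folder):
    """Return {class_name: image_count} for each sub-folder of data_folder."""
    counts = {}
    for class_dir in sorted(Path(data_folder).iterdir()):
        if not class_dir.is_dir():
            continue  # skip stray files next to the class folders
        counts[class_dir.name] = sum(
            1
            for path in class_dir.iterdir()
            if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
        )
    return counts
```

For the example tree above, `count_images_per_class("images_formatted")` would return `{"class_A": 3, "class_B": 3}`.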


## Imports and Setup

In [1]:
import os
import pandas as pd
import random

import tensorflow as tf
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.models import Model
from tensorflow.keras.applications import MobileNetV2

import wandb
from wandb.keras import WandbCallback

# Allows dynamic GPU memory allocation, instead of reserving the whole memory up front.
# Must be added when using an RTX-series GPU with the TF-jupyter-gpu Docker image.
from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession
config = ConfigProto()
config.gpu_options.allow_growth = True
session = InteractiveSession(config=config)

## Definitions and Parameters

The following variables must be defined:
* `TRAIN_DATA_FOLDER`: Folder containing the images used for training. It is the output folder of `format_images.ipynb`.

Training parameters:
* `NUMBER_OF_CLASSES`: Number of classes that the trained model should recognize.
* `IMAGE_SHAPE`: A tuple of (height, width, channels) describing the formatted images. Keras expects the height first, which only matters for non-square images.
* `BATCH_SIZE`: Size of each training batch. [Recommended value](https://arxiv.org/abs/1206.5533): `32`.
* `MAX_EPOCHS`: The maximum number of training epochs. [Recommended value](https://keras.io/api/callbacks/early_stopping/): `10000` _(The training should be stopped by ***EarlyStopping*** before reaching `MAX_EPOCHS`.)_

In [2]:
TRAIN_DATA_FOLDER = "images_formatted"

NUMBER_OF_CLASSES = 2
IMAGE_SHAPE = (128, 128, 1)
BATCH_SIZE = 32
MAX_EPOCHS = 10000
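
The generator calls later in the notebook derive `target_size` and `color_mode` from `IMAGE_SHAPE`. A minimal sketch of that derivation follows; `generator_options` is a hypothetical helper name and is not part of the notebook.

```python
def generator_options(image_shape):
    """Split an image shape into the options passed to flow_from_directory."""
    # Keras documents target_size as (height, width).
    height, width, channels = image_shape
    return {
        "target_size": (height, width),
        # Same rule as the notebook: one channel means grayscale, otherwise RGB.
        "color_mode": "grayscale" if channels == 1 else "rgb",
    }


print(generator_options((128, 128, 1)))
# {'target_size': (128, 128), 'color_mode': 'grayscale'}
```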

## Dataset

The dataset is split into two groups:
* Training set (90%)
* Validation set (10%)

At each epoch, the images receive random shear distortion, zoom, rotation, shifting, mirroring and/or color inversion, in order to diversify the dataset. Because both subsets are drawn from the same `ImageDataGenerator`, these random transformations are applied to the validation images as well.


In [4]:


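def image_inverter(image):
    # Has a 50% chance to invert the image, in order to diversify the dataset.
    # Keras applies preprocessing_function before rescale, so pixel values are
    # still in the [0, 255] range here and the inversion is 255 - image.
    return 255 - image if random.choice((True, False)) else image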

imageDataGenerator = ImageDataGenerator(
    rescale=1. / 255,
    shear_range=0.2,
    zoom_range=0.2,
    rotation_range=180,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True,
    vertical_flip=True,
    validation_split=0.1,
    preprocessing_function=image_inverter,
)

train_generator = imageDataGenerator.flow_from_directory(
    TRAIN_DATA_FOLDER,
    target_size=IMAGE_SHAPE[:2],
    batch_size=BATCH_SIZE,
    color_mode='grayscale' if IMAGE_SHAPE[2] == 1 else 'rgb',
    class_mode='categorical',
    subset='training',
)

validation_generator = imageDataGenerator.flow_from_directory(
    TRAIN_DATA_FOLDER,
    target_size=IMAGE_SHAPE[:2],
    batch_size=BATCH_SIZE,
    color_mode='grayscale' if IMAGE_SHAPE[2] == 1 else 'rgb',
    class_mode='categorical',
    subset='validation',
)

Found 356 images belonging to 2 classes.
Found 39 images belonging to 2 classes.
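
With `steps_per_epoch=None` and `validation_steps=None` in the training call, Keras goes through each generator once per epoch, so the number of batches follows from the image counts reported above. A quick check of that arithmetic (the helper name `batches_per_epoch` is illustrative only):

```python
import math


def batches_per_epoch(num_images, batch_size):
    """Number of batches needed to see every image once; the last one may be partial."""
    return math.ceil(num_images / batch_size)


print(batches_per_epoch(356, 32))  # training batches per epoch
print(batches_per_epoch(39, 32))   # validation batches per epoch
```

This prints `12` and `2`: eleven full training batches plus one partial batch, and one full validation batch plus one partial batch.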


## Training

The model uses the MobileNetV2 architecture with randomly initialized weights (`weights=None`), so it is trained from scratch.

The Adam (*Adaptive Moment Estimation*) optimization algorithm is used to iteratively update the network weights during training, as it is [a commonly recommended choice](https://arxiv.org/abs/1609.04747) for image classification neural networks.
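
To make the update rule concrete, here is a minimal sketch of a single Adam step for one scalar weight, written in plain Python. It follows the textbook formulation; the function name `adam_step` is hypothetical, the default hyperparameters are assumed to mirror the Keras defaults, and the Keras implementation may differ in numerical details.

```python
import math


def adam_step(weight, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """Apply one Adam update to a scalar weight and return (weight, m, v)."""
    m = beta1 * m + (1 - beta1) * grad        # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction (t starts at 1)
    v_hat = v / (1 - beta2 ** t)
    weight -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return weight, m, v


# Minimize f(w) = w**2, whose gradient is 2 * w, starting from w = 1.0.
w, m, v = 1.0, 0.0, 0.0
for t in range(1, 101):
    w, m, v = adam_step(w, 2 * w, m, v, t)
print(w)  # slightly below the starting value of 1.0
```

Each step moves the weight by roughly the learning rate, regardless of the raw gradient magnitude, which is one reason Adam tends to need little tuning.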

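An early stopping callback (`EarlyStopping`) ends the training process when the monitored loss has not improved for a ***patience*** of 300 epochs. Note that the code below monitors the training loss (`monitor='loss'`); set `monitor='val_loss'` to monitor the validation loss instead.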
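
The patience rule can be sketched in a few lines of plain Python. This illustrates the idea and is not the Keras implementation; `stopping_epoch` is a hypothetical helper, and a small patience is used so the example stays readable.

```python
def stopping_epoch(losses, patience):
    """Return the 1-based epoch at which training would stop, or None if it never does."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


# The best loss (0.8) is reached at epoch 2; three epochs later training stops.
print(stopping_epoch([1.0, 0.8, 0.9, 0.85, 0.95], patience=3))  # 5
```

With a patience of 300, training continues for 300 epochs past the best loss seen so far before it is stopped.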

The ***loss*** is calculated by using the ***Categorical Cross-Entropy*** method.
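
As a rough illustration of what this loss measures, the sketch below computes the categorical cross-entropy for a single sample in plain Python. The function name and the clipping constant `eps` are assumptions made for this example; Keras additionally averages the value over the batch.

```python
import math


def categorical_cross_entropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy between a one-hot target and predicted class probabilities."""
    return -sum(
        t * math.log(min(max(p, eps), 1.0 - eps))  # clip to avoid log(0)
        for t, p in zip(y_true, y_pred)
    )


# Two classes; the true class is the first one.
print(categorical_cross_entropy([1, 0], [0.9, 0.1]))  # confident and correct: low loss
print(categorical_cross_entropy([1, 0], [0.1, 0.9]))  # confident and wrong: high loss
```

Only the probability assigned to the true class contributes, so the loss falls toward zero as that probability approaches 1.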

In [5]:
wandb.init(
    project="physiotherapy-platform",
    config={"MobileNetv2_alpha": 0.3}
)

model = MobileNetV2(
    input_shape=IMAGE_SHAPE,
    alpha=wandb.config.MobileNetv2_alpha,
    classes=NUMBER_OF_CLASSES,
    weights=None
)

optimizer = Adam()
earlystop = EarlyStopping(monitor='loss', patience=300, verbose=2, mode='min')

model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])

history = model.fit(train_generator,
                    validation_data=validation_generator,
                    steps_per_epoch=None,
                    validation_steps=None,
                    epochs=MAX_EPOCHS,
                    callbacks=[earlystop, WandbCallback()])


[34m[1mwandb[0m: Currently logged in as: [33mkaiquesacchi[0m (use `wandb login --relogin` to force relogin)


Epoch 2181/10000
...
Epoch 2238/10000
Epoch 2239/100

Saves the training history and the model, converts the model to TF-Lite, and stores copies of the model files in the W&B run directory.

In [6]:
os.makedirs('model', exist_ok=True)

# Save the per-epoch training history (loss and accuracy) as CSV.
df = pd.DataFrame.from_dict(history.history)
df.to_csv('model/history.csv', encoding='utf-8', index=False)

# Save the trained model in HDF5 and SavedModel formats.
model.save('model/model.h5')
model.save('model/saved')

# Convert the Keras model to TF Lite.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# Save the TF Lite model.
with tf.io.gfile.GFile('model/model.tflite', 'wb') as f:
    f.write(tflite_model)

# Store copies of the model files in the W&B run directory.
with tf.io.gfile.GFile(os.path.join(wandb.run.dir, 'model.tflite'), 'wb') as f:
    f.write(tflite_model)

model.save(os.path.join(wandb.run.dir, "model.h5"))
model.save(os.path.join(wandb.run.dir, "SavedModel"))

INFO:tensorflow:Assets written to: model/saved/assets
INFO:tensorflow:Assets written to: /tmp/tmpichc3vug/assets
INFO:tensorflow:Assets written to: /tf/wandb/run-20210107_232438-1dfwiqm1/files/SavedModel/assets
INFO:tensorflow:Assets written to: /tf/wandb/run-20210107_232438-1dfwiqm1/files/SavedModel/assets
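
Once `model/history.csv` exists, it can be inspected without loading TensorFlow. The sketch below looks up the epoch with the lowest value of a metric; `best_epoch` is a hypothetical helper, and the `val_loss` column name is an assumption based on the names Keras normally gives to validation metrics.

```python
import csv


def best_epoch(history_csv, metric="val_loss"):
    """Return (epoch, value) for the row with the lowest value of `metric`."""
    with open(history_csv, newline="", encoding="utf-8") as f:
        values = [float(row[metric]) for row in csv.DictReader(f)]
    # The CSV is written without an index column, so row i holds epoch i + 1.
    index = min(range(len(values)), key=values.__getitem__)
    return index + 1, values[index]
```

For example, `best_epoch("model/history.csv")` returns the epoch number and validation loss of the best epoch, and `best_epoch("model/history.csv", metric="loss")` does the same for the training loss.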
