# Evaluate models for splitting the problem into 3 different problems.

## Disclaimer. A beginner here!
As a beginner, I'm not even sure I have understood the problem correctly, but as I understand it, splitting it into three separate problems could make things easier. The masks seem to differ somewhat depending on the type of cells in the image, so the idea is to identify the cell type first and then apply one of three type-specific models to find the masks.

If I'm right, what I'm trying to do here is pick the best and fastest model to classify each image by cell type, and then apply the matching second-step model to extract the masks. To do that, I wanted to know which model fits best and what the minimum model size (parameters/speed) is that can solve this first problem.
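
To make the two-step idea concrete, here is a minimal sketch of the control flow. The function `segment_with_routing` and the dummy callables are hypothetical names I made up for illustration; they stand in for a real classifier and three real segmentation models, which this notebook does not build.

```python
def segment_with_routing(image, classify, segmenters):
    """Classify the image first, then run the segmenter registered for that class."""
    cell_type = classify(image)
    if cell_type not in segmenters:
        raise KeyError(f"No segmentation model registered for class {cell_type!r}")
    return cell_type, segmenters[cell_type](image)


# Stand-in callables, only to show the control flow (these are not real models).
def dummy_classifier(image):
    return 2


dummy_segmenters = {
    1: lambda image: "masks from model 1",
    2: lambda image: "masks from model 2",
    3: lambda image: "masks from model 3",
}

cell_type, masks = segment_with_routing("some image", dummy_classifier, dummy_segmenters)
print(cell_type, masks)
```

The rest of this notebook is only about choosing the `classify` part.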

**This notebook may take a totally wrong approach or contain errors. Please feel free to correct me on anything that can be improved, or on the whole point!**

## What you'll find in this notebook

This notebook takes all the training images, then trains (for 3 epochs) and evaluates up to 26 different pre-trained models to find which one best fits the first step of my final pipeline: splitting the test images so that they can be predicted by three different models.

## Starting from the end... references.
As a beginner, I'm taking advantage of the knowledge of others who have far more experience. Here, I'm just compiling and joining the work of others.

### The dataset used:

I'm using the great COCO-ready annotations created by [@Slawek Biel](https://www.kaggle.com/slawekbiel) and his great series of notebooks, which I'm following to understand how the whole thing works and as my guideline for entering this competition:
- [Positive score with Detectron 1/3 - input data](https://www.kaggle.com/slawekbiel/positive-score-with-detectron-1-3-input-data/)
- [Positive score with Detectron 2/3 - Training](https://www.kaggle.com/slawekbiel/positive-score-with-detectron-2-3-training/)
- [Positive score with Detectron 3/3 - Inference](https://www.kaggle.com/slawekbiel/positive-score-with-detectron-3-3-inference)

The dataset I'm using is the one [@Slawek Biel](https://www.kaggle.com/slawekbiel) created with his first notebook: ["Sartorius - Cell Instance Segmentation" COCO](https://www.kaggle.com/slawekbiel/sartorius-cell-instance-segmentation-coco)

### Evaluation models

I took (copied) the idea of evaluating multiple models from a great blog post by [Mario Stephan Leo](https://towardsdatascience.com/how-to-choose-the-best-keras-pre-trained-model-for-image-classification-b850ca4428d4) at Medium.
You can visit [his Github here](https://github.com/stephenleo/keras-model-selection).
For further details on how the model evaluation works, take a look at his blog post.


## An image classification problem

To split the problem into 3 different problems, we need to classify the images into categories. This is an image classification problem: for every image, we need to determine which category of cells it contains.

### Input images

For simplicity, I selected the pre-trained models that work with input shape `(224, 224, 3)`. Instead of resizing the competition images `(704, 520)`, I decided to split each image into chunks of the required size. This gives 6x as many images for training and 6 chunks per image at prediction time; the prediction scores of the chunks are then summed to obtain the final prediction for the whole image. There are probably better ways to combine predictions than just summing them.
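
The chunking can be sketched with plain NumPy slicing. This is a simplified stand-in for the `convert_images` function used below, not a replacement for it: `split_into_chunks` is a name I made up, and the dummy array assumes images that are 520 pixels high and 704 wide. With a 3 x 2 grid of 224-pixel tiles starting at the top-left corner, only a 672 x 448 region is covered, so a strip of 32 pixels on the right and 72 pixels at the bottom is never seen by the classifier.

```python
import numpy as np


def split_into_chunks(image, chunk=224, cols=3, rows=2):
    """Cut rows x cols non-overlapping square tiles, starting at the top-left corner."""
    height, width = image.shape[:2]
    if rows * chunk > height or cols * chunk > width:
        raise ValueError("image is too small for the requested grid")
    tiles = []
    for col in range(cols):        # same order as the notebook: column by column...
        for row in range(rows):    # ...and top to bottom inside each column
            top, left = row * chunk, col * chunk
            tiles.append(image[top:top + chunk, left:left + chunk])
    return np.stack(tiles)


# Placeholder image (all zeros) with the assumed competition size.
dummy = np.zeros((520, 704, 3), dtype=np.uint8)
chunks = split_into_chunks(dummy)
print(chunks.shape)  # (6, 224, 224, 3)
```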


#### Setting up paths

In [None]:
ANNOTATIONS_DIR = '../input/sartorius-cell-instance-segmentation-coco/'
IMG_BASE_DIR = "../input/sartorius-cell-instance-segmentation"

#### Preparing the dataset: getting 6 images of shape (224, 224, 3) from every image


In [None]:
from tqdm.auto import tqdm
import json
import numpy as np
from scipy.fftpack import dct, idct
from matplotlib.pyplot import figure
import matplotlib.pylab as plt
import cv2

def get_class(item, data):
    image_id = item['file_name'].split('/')[1].split('.')[0]
    for annot in data['annotations']:
        if annot['image_id']==image_id: return annot['category_id']
    return None

def convert_images(img_path, item, category, images, classes, plot=False):
    
    np.set_printoptions(threshold=np.inf)
    image = cv2.imread(img_path + '/' + item['file_name'])
    srcImageWidth, srcImageHeight = image.shape[1], image.shape[0]

    im_width = 224
    im_height = 224
    
    # Extracting 6 images per original image and its classes
    for left in range(3):
        for top in range(2):
            images.append(image[int(top*im_height):int(top*im_height)+im_height, left*im_width:(left*im_width)+im_width, :]/255)
            classes.append(np.asarray(category))

            if plot:
                figure(figsize=(20, 20), dpi=80)
                plt.gray()
                plt.subplot(121), plt.imshow(images[-1]), plt.axis('off'), plt.title('extracted chunk', size=20)
                plt.subplot(122), plt.imshow(image), plt.axis('off'), plt.title('original image', size=20)
                plt.show()

    return images, classes

def convert_dataset(json_filename):
    print(f"Converting {json_filename}")

    # Opening JSON file
    with open(ANNOTATIONS_DIR + '/' + json_filename, encoding='utf-8') as json_file:
        data = json.load(json_file)

    images = []
    classes = []
    for item in tqdm(data['images']):
        category = get_class(item, data)
        if category is not None:
            images, classes = convert_images(IMG_BASE_DIR, item, category, images, classes, plot=False) 
        #break

    return np.asarray(images), np.asarray(classes)

# Extracting data for training and testing
X_train, y_train = convert_dataset('annotations_train.json')
X_test, y_test = convert_dataset('annotations_val.json')

In [None]:
# Shuffling the training set
from sklearn.utils import shuffle
X_train, y_train = shuffle(X_train, y_train, random_state=111)
X_train.shape, y_train.shape

#### Setting up the model evaluator

In [None]:
# Imports
import tensorflow as tf
import tensorflow_datasets as tfds

import pandas as pd
import matplotlib.pyplot as plt
import inspect
from tqdm import tqdm

# Set batch size for training and validation
batch_size = 32

In [None]:
# Creating a list of pre-trained available models in keras
model_dictionary = {m[0]:m[1] for m in inspect.getmembers(tf.keras.applications, inspect.isfunction)}

In [None]:
# Setting up some parameters
num_train = X_train.shape[0]
num_validation = X_test.shape[0]
num_classes = 3
num_iterations = int(num_train/batch_size)

def get_joined_y(y_hot):
    #TODO: Probably can be joined with one single line
    y_joined = []
    for i in range(0,y_hot.shape[0],6):
        y_joined.append( np.argmax(np.sum(y_hot[i:i+6],axis=0)) )

    return np.asarray(y_joined)

# One-hot encoding of the labels (category ids are 1-based, hence the -1)
y_train_hot = np.zeros((y_train.shape[0], num_classes))
for i in range(y_train.shape[0]):
    y_train_hot[i,:] = tf.one_hot(y_train[i]-1, depth=num_classes)

y_test_hot = np.zeros((y_test.shape[0], num_classes))
for i in range(y_test.shape[0]):
    y_test_hot[i,:] = tf.one_hot(y_test[i]-1, depth=num_classes)   
    
# Check one-hot encoding correctness
for i in range(y_train.shape[0]):
    #print(y_test[i], y_test_hot[i])
    assert(y_train_hot[i][y_train[i]-1]==1)
    
for i in range(y_test.shape[0]):
    #print(y_test[i], y_test_hot[i])
    assert(y_test_hot[i][y_test[i]-1]==1)
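
About the `TODO` in `get_joined_y`: the per-image join, and the one-hot encoding too, can probably be written without Python loops. The sketch below is my own NumPy-only variant with made-up function names. It assumes the 6 chunks of each image sit next to each other in the array (true for the validation set here, which is never shuffled) and that the number of rows is a multiple of 6.

```python
import numpy as np


def one_hot_from_category_ids(category_ids, num_classes=3):
    """One-hot encode 1-based category ids with a single indexing operation."""
    category_ids = np.asarray(category_ids)
    return np.eye(num_classes)[category_ids - 1]


def join_chunk_scores(scores, chunks_per_image=6):
    """Sum the scores of consecutive chunks and return the winning class index per image."""
    scores = np.asarray(scores)
    num_classes = scores.shape[1]
    per_image = scores.reshape(-1, chunks_per_image, num_classes).sum(axis=1)
    return per_image.argmax(axis=1)


# Two fake images: six chunks of category 1 followed by six chunks of category 3.
labels = [1] * 6 + [3] * 6
hot = one_hot_from_category_ids(labels)
print(join_chunk_scores(hot))  # [0 2]
```

Unlike the loop version, `reshape` raises an error when the number of rows is not a multiple of 6, which may actually be a useful sanity check.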

#### Evaluating on each available model

In [None]:
# Loop over each model available in Keras
import gc
y_join = get_joined_y(y_test_hot)

model_benchmarks = {'model_name': [], 'num_model_params': [], 'validation_accuracy': [], 'join_6_img_val_acc': []}
for model_name, model in tqdm(model_dictionary.items()):
    print(f"Evaluating on {model_name}")
    # Special handling for "NASNetLarge" since it requires input images with size (331,331)
    if 'NASNetLarge' in model_name:
        continue
        #input_shape=(331,331,3)
        #train_processed = train_processed_331
        #validation_processed = validation_processed_331
    #else:
    #    input_shape=(224,224,3)
    #    train_processed = train_processed_224
    #    validation_processed = validation_processed_224

    input_shape=(224,224,3)
    # load the pre-trained model with global average pooling as the last layer and freeze the model weights
    pre_trained_model = model(include_top=False, pooling='avg', input_shape=input_shape)
    pre_trained_model.trainable = False

    # custom modifications on top of pre-trained model
    clf_model = tf.keras.models.Sequential()
    clf_model.add(pre_trained_model)
    clf_model.add(tf.keras.layers.Dense(num_classes, activation='softmax'))
    clf_model.compile(loss='categorical_crossentropy', metrics=['accuracy'])
    history = clf_model.fit(X_train, y_train_hot, epochs=3, validation_data=(X_test, y_test_hot), steps_per_epoch=num_iterations)

    predictions = clf_model.predict(X_test)
    joined_predictions = get_joined_y(predictions)
    
    total_acc = joined_predictions[joined_predictions==y_join].shape[0] / joined_predictions.shape[0]
    print(f"Total Accuracy: {total_acc}")

    # Calculate all relevant metrics
    model_benchmarks['model_name'].append(model_name)
    model_benchmarks['num_model_params'].append(pre_trained_model.count_params())
    model_benchmarks['validation_accuracy'].append(history.history['val_accuracy'][-1])
    model_benchmarks['join_6_img_val_acc'].append(total_acc)

    del pre_trained_model, clf_model
    gc.collect()

## Evaluating the results

In [None]:
# Convert Results to DataFrame for easy viewing
benchmark_df = pd.DataFrame(model_benchmarks)
benchmark_df.sort_values('num_model_params', inplace=True) # sort in ascending order of num_model_params column
benchmark_df.to_csv('benchmark_df.csv', index=False) # write results to csv file
print("Benchmark sorted by number of parameters")
benchmark_df

#### Plotting the results

In [None]:
# Loop over each row and plot num_model_params vs the per-image validation accuracy (6 chunks joined)
markers=[".",",","o","v","^","<",">","1","2","3","4","8","s","p","P","*","h","H","+","x","X","D","d","|","_",4,5,6,7,8,9,10,11]
plt.figure(figsize=(7,5))
for row in benchmark_df.itertuples():
    plt.scatter(row.num_model_params, row.join_6_img_val_acc, label=row.model_name, marker=markers[row.Index], s=150, linewidths=2)
plt.xscale('log')
plt.xlabel('Number of Parameters in Model')
plt.ylabel('Validation Accuracy after 3 Epochs')
plt.title('Accuracy vs Model Size')
plt.legend(bbox_to_anchor=(1, 1), loc='upper left'); # Move legend out of the plot

In [None]:
print("Benchmark sorted by score")
benchmark_df.sort_values('join_6_img_val_acc')
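
Since the goal is the smallest model that is still good enough, one possible way to read the table is to keep every model within a small tolerance of the best score and then take the one with the fewest parameters. The helper below is a hypothetical sketch, and the rows are made-up numbers purely for illustration (they are not results from this notebook). With the real table, the input could be `benchmark_df.to_dict('records')`.

```python
def smallest_model_within(records, tolerance=0.01,
                          score_key='join_6_img_val_acc',
                          size_key='num_model_params'):
    """Return the smallest model whose score is within `tolerance` of the best score."""
    best = max(r[score_key] for r in records)
    good_enough = [r for r in records if r[score_key] >= best - tolerance]
    return min(good_enough, key=lambda r: r[size_key])


# Made-up rows, only to show how the helper behaves.
example = [
    {'model_name': 'small', 'num_model_params': 2_000_000, 'join_6_img_val_acc': 0.97},
    {'model_name': 'medium', 'num_model_params': 20_000_000, 'join_6_img_val_acc': 0.975},
    {'model_name': 'large', 'num_model_params': 80_000_000, 'join_6_img_val_acc': 0.90},
]
choice = smallest_model_within(example)
print(choice['model_name'])  # small
```

The tolerance of 0.01 is an arbitrary choice; after only 3 epochs the scores are noisy, so a wider tolerance may be more honest.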

## Conclusion

Reading the results, I have three main conclusions:
- `MobileNetV2` seems a good candidate, and `ResNet50V2`, with about 10x the parameters of `MobileNetV2`, can be a good candidate as well. Maybe ensembling the top models would be a good approach.
- These results were obtained with no data augmentation beyond extracting 6 images per sample, and with only 3 epochs of training. The scores can probably be improved.
- I haven't tested training on whole images without splitting them into 6. Maybe that gives similar or better results.
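
The ensembling idea from the first point could look roughly like the sketch below: average the class probabilities of several models and take the top class. This is untested on the competition data. `ensemble_predict` is a name I made up, and the two small probability arrays are placeholders for what `clf_model.predict(X_test)` would return for each of the top models. The evaluation loop above deletes every model after scoring it, so those predictions would have to be stored first.

```python
import numpy as np


def ensemble_predict(prob_list):
    """Average class probabilities from several models and pick the top class per sample."""
    # Resulting shape: (number of models, number of samples, number of classes)
    stacked = np.stack([np.asarray(p, dtype=float) for p in prob_list])
    return stacked.mean(axis=0).argmax(axis=1)


# Placeholder probabilities for two samples from two imaginary models.
probs_a = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
probs_b = np.array([[0.5, 0.4, 0.1], [0.1, 0.2, 0.7]])
print(ensemble_predict([probs_a, probs_b]))  # [0 2]
```

In the second sample the two models disagree, and the more confident one wins the average. The averaged chunk probabilities could still be joined per image by summing over the 6 chunks, as before.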

Happy to know what you think!!