## Deciding on a Model Using Manual Analysis with Gradio

This notebook documents some of the steps taken to choose the final model for deployment. 

For this project, we experimented with four different models to see which performed best on our dataset. Our initial literature search identified four models that are popular for transfer learning:

1. Densenet
2. Resnet
3. Vgg16
4. Inception

After conducting extensive runs to choose the [best image transformations](https://github.com/UBC-MDS/capstone-gdrl-lipo/blob/master/notebooks/manual-albumentation.ipynb) and tuning the hyperparameters of the individual [models](https://github.com/UBC-MDS/capstone-gdrl-lipo/tree/master/notebooks), we used the optimized models to manually analyze images and compare the models. We built a [local, internal decision-making app using Gradio](https://github.com/UBC-MDS/capstone-gdrl-lipo/tree/master/notebooks/gradio_demo.ipynb) to analyze specific test cases.
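
As a rough illustration of how such a side-by-side comparison app can be wired up, here is a minimal sketch. It is not the code from the linked notebook: the helper names `compare_models` and `build_demo` are hypothetical, and each model is assumed to be wrapped in a function that takes an image and returns the probability of the positive class.

```python
from typing import Callable, Dict

# Hypothetical type: a wrapper around one tuned model that maps an image
# to the predicted probability of the positive class.
PositiveProbFn = Callable[[object], float]


def compare_models(image, models: Dict[str, PositiveProbFn]) -> Dict[str, Dict[str, float]]:
    """Return {model_name: {"positive": p, "negative": 1 - p}} for one image."""
    results = {}
    for name, predict_positive in models.items():
        p = float(predict_positive(image))
        results[name] = {"positive": p, "negative": 1.0 - p}
    return results


def build_demo(models: Dict[str, PositiveProbFn]):
    """Build a Gradio interface with one label output per model.

    Assumes at least two models, so that the prediction function returns
    one value per output component.
    """
    import gradio as gr  # imported lazily so compare_models works without Gradio

    names = list(models)

    def predict(image):
        results = compare_models(image, models)
        # One {label: confidence} dict per gr.Label output, in a fixed order.
        return tuple(results[name] for name in names)

    return gr.Interface(
        fn=predict,
        inputs=gr.Image(type="pil"),
        outputs=[gr.Label(label=name) for name in names],
    )
```

With real model wrappers in place, `build_demo(models).launch()` would start the app locally, showing each model's positive and negative confidence for an uploaded image.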

## Reviewing Specific Images

Below are some screenshots from the Gradio app showing negative and positive images that the models have never seen. Six negative images and five positive images were chosen for manual review, to see how the models perform on examples that are visually hard for the human eye to identify and label correctly. All models were able to catch negative examples relatively well. Densenet stood out: compared to the rest of the models, it captured 4 of the 6 images with very high confidence.

### Negative Image Example

We chose a difficult negative example featuring a circular, ball-like shape that to the eye appears to be lipohypertrophy but is not. Although all models predict negative, Densenet is the most confident in its prediction.

![true_neg_densenet_right](../image/true_neg_densenet_right.png)

### Positive Image Example

Identifying positives was hard for all models, and the figure below shows an example where all models struggled. This makes sense: we don't have a very large dataset (~300 total images with a 62:38 split for negative:positive), and it's hard to tell visually whether lipohypertrophy is present or not. However, we noticed that even when Densenet is wrong, it is less confident in its prediction. This is ideal, as our capstone partner has identified that the model should be less confident in its prediction when it's wrong.

![true_pos_all_wrong](../image/true_pos_all_wrong.png)
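
The "less confident when wrong" behaviour can also be summarized as a number rather than read off screenshots. The sketch below is one possible way to do that, not something computed in this notebook: the function names are hypothetical, predictions are assumed to be probabilities of the positive class, and a 0.5 decision threshold is assumed.

```python
from statistics import mean
from typing import List, Optional, Tuple

# Each record is (true_label, predicted_probability_of_positive),
# where the label is 1 for positive and 0 for negative.
Record = Tuple[int, float]


def confidence(prob_positive: float) -> float:
    """Confidence in the predicted class, assuming a 0.5 threshold."""
    return max(prob_positive, 1.0 - prob_positive)


def mean_confidence_when_wrong(records: List[Record]) -> Optional[float]:
    """Average confidence over misclassified examples, or None if there are none."""
    wrong = [
        confidence(p)
        for label, p in records
        if (p >= 0.5) != bool(label)  # predicted class differs from true class
    ]
    return mean(wrong) if wrong else None
```

Given one list of records per model on the same held-out images, the model with the lower value is the one that hedges more when it makes a mistake, which is the property our capstone partner asked for.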

## Conclusion and Next Steps

From this manual visualization exercise, we were able to narrow down our model choice to Densenet. It has the highest recall and accuracy of the four models, and even when it is wrong, it is less confident in its prediction. Lastly, given the resource limitations on deploying this application, DenseNet also produces the smallest app. The next step was therefore to optimize the Densenet model to further improve its scores. Two steps taken were:

1. Increase the `pos_weight` argument of the loss function so that there is a greater loss on positive examples
2. Experiment with the dropout rate in the model architecture
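
To make the first step concrete, here is a small, dependency-free sketch of how a positive-class weight changes binary cross-entropy. It assumes PyTorch-style semantics, where `pos_weight` is a parameter of `torch.nn.BCEWithLogitsLoss` that multiplies the loss term for positive targets. The function name and the example weight are illustrative assumptions, not the values used in our training runs.

```python
import math


def weighted_bce_with_logits(logit: float, target: int, pos_weight: float = 1.0) -> float:
    """Binary cross-entropy on a raw logit, with positive targets scaled by pos_weight."""
    # Numerically stable log(sigmoid(logit)).
    if logit >= 0:
        log_sig = -math.log1p(math.exp(-logit))
    else:
        log_sig = logit - math.log1p(math.exp(logit))
    # log(1 - sigmoid(logit)) equals log(sigmoid(logit)) - logit.
    log_one_minus_sig = log_sig - logit
    return -(pos_weight * target * log_sig + (1 - target) * log_one_minus_sig)


# Illustrative starting point only: the negative-to-positive ratio of a 62:38 split.
example_pos_weight = 62 / 38

# A missed positive (negative logit, target 1) costs more as pos_weight grows,
# while the loss on negative examples is unchanged.
unweighted = weighted_bce_with_logits(-1.0, 1)
weighted = weighted_bce_with_logits(-1.0, 1, pos_weight=example_pos_weight)
```

Raising `pos_weight` above 1 pushes the model to trade some precision for recall, since errors on positive images are penalized more heavily than errors on negative ones.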