# RC Case Study

## Problem Statement

Build a deep learning pipeline consisting of one or more models to identify the type of food and drink present in the test images. Each food item must be classified into one of the following three categories (labelled training data is provided):

 - Mexican food 

 - Fast Food 

 - Steak 

 Similarly, each drink must be classified into one of the following three categories (labelled training data is provided):

 - Beer 

 - Soda 

 - Wine 

Based on the food-drink associations observed in the test images, present your insights about the consumption patterns of these items.

## Submission Instructions

 - Submit two folders: one containing the code and the other containing the results.
 - In the code folder, submit all your code along with a README or execution instructions. You may also submit a Jupyter notebook (or equivalent). Please comment your code for readability.
 - In the results folder, submit a final document reporting the accuracy score and ROC curve of the final model, as well as the insights about food-drink associations that you draw from the test images.
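
The following plain-Python sketch shows how the required accuracy score and ROC curve could be computed. It assumes the final model outputs a probability for each class. With three classes per model, each class gets its own one-vs-rest curve: that class is marked positive (1) and the other two negative (0). The function names are hypothetical. In practice, `accuracy_score`, `roc_curve` and `roc_auc_score` in scikit-learn's `sklearn.metrics` provide this functionality.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_points(labels, scores):
    """Return (FPR, TPR) points for a binary (one-vs-rest) problem.

    labels: 1 if the image belongs to the positive class, else 0
    scores: the model's predicted probability for the positive class
    Assumes both positive and negative examples are present.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    # Sweep the decision threshold from the highest score down
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples with tied scores cross the threshold together
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve via the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

For the `beer` curve, for example, `labels` would be `[1 if y == "beer" else 0 for y in y_true]` and `scores` the model's beer probability for each image.
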

## Evaluation Criteria/Questionnaire

 - F1 Score
```
Ans: F1 Score = 2 * (precision * recall) / (precision + recall)
This case study does not focus on the F1 score of the trained model; any decent accuracy (>85%) with the right approach is acceptable.
```
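
The formula above uses precision = TP / (TP + FP) and recall = TP / (TP + FN). Each model has three classes, so one common choice is to compute F1 per class and average the results (macro-F1), as in the sketch below. This averaging is an assumption; the base solution does not specify it. scikit-learn's `f1_score(y_true, y_pred, average="macro")` computes the same quantity.

```python
def f1_per_class(y_true, y_pred, label):
    """F1 score for one class, treating `label` as positive and all others as negative."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * (precision * recall) / (precision + recall)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1_per_class(y_true, y_pred, c) for c in labels) / len(labels)
```
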
 - How have the layers been built? What went into designing them?
 - Explain the architecture (VGG, ResNet, or other pretrained architectures).
 - Explain why a particular architecture was chosen.
 ```
 Subjective answer. Ask about the different architectures they tried and their reasons for selecting or rejecting each one.
 Typically, transfer learning on a pretrained architecture (VGG11, ResNet, etc.) will not work well here because the training dataset is quite small. Our base model uses three convolutional layers followed by one fully connected layer.
 ```
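
The main arithmetic in building the conv layers is tracking the spatial size. This ensures the fully connected layer (`nn.Linear` in PyTorch) receives the right `in_features`. Each layer's output length is floor((W + 2P - K) / S) + 1. The sketch below assumes 224×224 inputs, 3×3 convolutions with padding 1, and a 2×2 max-pool after each conv. The channel widths (16, 32, 64) are hypothetical, because the base notebooks' exact settings are not listed here.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length along one spatial dimension of a conv or pooling layer:
    floor((size + 2 * padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def flattened_features(input_size, channels, kernel=3, padding=1, pool=2):
    """Number of features entering the FC layer after len(channels) conv + max-pool stages.

    Hypothetical layout: each stage is a kernel x kernel conv (stride 1) followed by a
    pool x pool max-pool with stride `pool`. Assumes square inputs.
    """
    size = input_size
    for _ in channels:
        size = conv_out(size, kernel, stride=1, padding=padding)  # convolution
        size = conv_out(size, pool, stride=pool)                  # max-pooling
    # The last conv layer's channel count times the remaining spatial area
    return channels[-1] * size * size
```

With `channels=(16, 32, 64)` and 224×224 inputs, this gives 64 × 28 × 28 = 50,176. That value would be the `in_features` of a single `nn.Linear(50176, 3)` layer mapping to the three classes.
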
 - How much tuning was done?
 ```
 Ask about any changes they made to the learning rate, momentum, loss function, optimizer, etc. to tune the model, and the reasoning behind each change.
 ```
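
A small grid search makes the tuning discussion concrete. The sketch below assumes a hypothetical `train_and_evaluate(lr, momentum)` callable that trains the model and returns validation accuracy, for example using `torch.optim.SGD(model.parameters(), lr=lr, momentum=momentum)`. The grid values are illustrative only.

```python
from itertools import product


def grid_search(train_and_evaluate, learning_rates, momenta):
    """Try every (lr, momentum) combination; keep the one with the best validation score."""
    best_score, best_params = float("-inf"), None
    for lr, momentum in product(learning_rates, momenta):
        score = train_and_evaluate(lr=lr, momentum=momentum)
        if score > best_score:
            best_score, best_params = score, {"lr": lr, "momentum": momentum}
    return best_params, best_score
```

A call such as `grid_search(train_and_evaluate, [0.1, 0.01, 0.001], [0.0, 0.9])` trains six models and returns the best combination. The same loop extends to other choices, such as the optimizer or loss function, by adding more lists to `product`.
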
 - Choice of libraries
 ```
 Typically PyTorch or TensorFlow.
 Our base solution is built with PyTorch.
 ```
 - How was noise treated?
 - Were any filters applied?
 - Was there any distortion?
 ```
 Subjective answer. Our base solution does not use any of the above methods.
 ```
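
The base solution applies no noise treatment, but a candidate might mention median filtering. This replaces each pixel with the median of its neighbourhood and removes isolated salt-and-pepper outliers. Below is a minimal plain-Python sketch for a grayscale image stored as a list of rows; border pixels are left unchanged for simplicity. With Pillow, `image.filter(ImageFilter.MedianFilter(size=3))` does the same job.

```python
def median_filter_3x3(image):
    """Apply a 3x3 median filter to a 2-D grayscale image (list of rows).

    Returns a new image; border pixels are copied unchanged.
    """
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Collect the 9 neighbourhood values from the original image
            window = sorted(image[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            out[y][x] = window[4]  # median of 9 sorted values
    return out
```
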
 - Explain the basic data preprocessing/transformation performed.
 ```
 Multiple techniques are possible, including cropping, data augmentation, normalization, and resizing.
 The base solution uses: 1) resizing, 2) center cropping, 3) conversion to a tensor, 4) normalization.
 ```
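
The four steps can be illustrated by the arithmetic they perform on image size and pixel values. This plain-Python sketch roughly mirrors four torchvision transforms:

- `transforms.Resize`: given a single int, it matches the shorter side to that size.
- `transforms.CenterCrop`.
- `transforms.ToTensor`: scales 0-255 to 0-1 for 8-bit images.
- `transforms.Normalize`: computes `(x - mean) / std` per channel.

The target sizes 256 and 224 and the mean/std values are assumptions, not the base solution's actual settings.

```python
def resize_shorter_side(width, height, target):
    """New (width, height) with the shorter side scaled to `target`, keeping aspect ratio."""
    if width <= height:
        return target, int(target * height / width)
    return int(target * width / height), target


def center_crop_box(width, height, size):
    """(left, top, right, bottom) of a size x size crop centred in the image."""
    left = (width - size) // 2
    top = (height - size) // 2
    return left, top, left + size, top + size


def to_normalized(pixel, mean, std):
    """Scale an 8-bit RGB pixel to [0, 1], then apply (x - mean) / std per channel."""
    return tuple((p / 255.0 - m) / s for p, m, s in zip(pixel, mean, std))
```

For a 640×480 photo, resizing the shorter side to 256 gives 341×256. A centred 224×224 crop then keeps the box `(58, 16, 282, 240)`. In a notebook, these transforms would typically be chained with `transforms.Compose([...])`.
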
 
 - Explain your solution pipeline.
 ```
 Base pipeline: 
 Train two separate models with the same architecture (3 conv layers + 1 FC layer), one on the drink dataset and one on the food dataset, so that each model independently detects the drink or food item in an image.
 
 Once both models are trained, form the associations as follows:
 Pass each test image to the drink model to detect the drink item, then pass the same image to the food model to detect the food item. This gives one food-drink pair per image.
 
 Repeat this for each of the 30 test images to obtain 30 food-drink pairs, and draw your insights from the frequency of each pair.
 
 Sample Result:
 
 ```
 ![Sample result](attachment:image.png)
 
 ```
 From the sample results, the most frequent pairings suggest that:
 beer goes well with fast food
 soda goes well with steak
 wine goes well with fast food
 ```
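
The association step described above reduces to counting pairs. The sketch below assumes hypothetical `predict_drink(image)` and `predict_food(image)` wrappers around the two trained models, each returning a class name. `summarize` returns the count of every (drink, food) pair and, for each drink, the food it co-occurs with most often.

```python
from collections import Counter, defaultdict


def food_drink_pairs(images, predict_drink, predict_food):
    """Run the drink model, then the food model, on each image: one (drink, food) pair per image."""
    return [(predict_drink(img), predict_food(img)) for img in images]


def summarize(pairs):
    """Pair frequencies, plus the most frequent food for each drink."""
    counts = Counter(pairs)
    by_drink = defaultdict(Counter)
    for drink, food in pairs:
        by_drink[drink][food] += 1
    top_food = {drink: foods.most_common(1)[0][0] for drink, foods in by_drink.items()}
    return counts, top_food
```
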
 ---

### Base solution links

##### Drink model
[Jupyter Notebook](https://colab.research.google.com/drive/1Qhrt7e_NQfo71u7cq7OzCHzO0pTwWs0I?usp=sharing)

##### Food model
[Jupyter Notebook](https://colab.research.google.com/drive/18TJY2R9AJtKxjSkKk9iBEb_xntkBRixo?usp=sharing)

##### Final association script
[Jupyter Notebook](https://colab.research.google.com/drive/15kZtaKY7TOrRbp4rWsU97bohANtNvTbW?usp=sharing)