# Report

In [2]:
from dataset import *

## 1 Data Preprocessing

In [3]:
!python3 dataset.py

loading train...
loading val...
loading test...
Train:	 300
Val:	 150
Test:	 100
Mean: [124.495 118.847  95.293]
Std:  [62.754 59.597 62.425]


### (a)
i). <br />
Mean: [124.495 118.847  95.293]<br />
Std:  [62.754 59.597 62.425]<br /><br />
ii). <br />
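The model should be fit using only properties of the training data, because the training set is the only data the model is allowed to learn from. The validation and testing sets stand in for unseen data, so we simply apply the fitted model, including its normalization constants, to them. If we computed the mean and standard deviation from validation or testing images, information from held-out data would leak into preprocessing, and the measured validation and testing performance could be overly optimistic.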
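
The reasoning above can be illustrated with a minimal sketch. It assumes the images are stored as a NumPy array of shape `(N, H, W, C)`. The function names `fit_channel_stats` and `standardize` are hypothetical and are not taken from `dataset.py`, and the random arrays only stand in for real images.

```python
import numpy as np

def fit_channel_stats(train_images):
    """Per-channel mean and std over every pixel of the training images."""
    mean = train_images.mean(axis=(0, 1, 2))
    std = train_images.std(axis=(0, 1, 2))
    return mean, std

def standardize(images, mean, std):
    """Apply statistics that were fit on the training split only."""
    return (images - mean) / std

# Placeholder data (assumption): random pixels instead of the real dog images.
rng = np.random.default_rng(0)
train = rng.uniform(0, 255, size=(6, 4, 4, 3))
val = rng.uniform(0, 255, size=(3, 4, 4, 3))

mean, std = fit_channel_stats(train)       # statistics come from train only
train_std = standardize(train, mean, std)
val_std = standardize(val, mean, std)      # val reuses the train statistics
```

After this step the training split has zero mean and unit standard deviation per channel, while the validation split is only approximately standardized, which is expected.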

### (b)
![](Data_Preprocessing.png)

## 2 Convolutional Neural Network

### (a)
There are 32*2 = 64 parameters to be learned; they are the weights of the fully connected layer.<br />

### (f)
i). <br />
One reason is that the model is overfitting the training data, so it does not perform as well on the validation dataset. Another possible reason is that the training, validation, and testing datasets are not distributed similarly, so what the model learns from the training data does not transfer well to the validation data. <br /><br />

ii). <br />
The model stopped training at epoch 11. With a patience of 10, the model would have to wait until epoch 16 to stop. Based on the training graphs, patience = 5 is the better choice here: the two sets of curves are nearly identical, but patience = 5 reaches the same performance in fewer epochs. A higher patience might be better when the loss has a local minimum; with a high patience, we would not stop at the local minimum and could potentially get a better model.
![](cnn_training_plot_5.png)
![](cnn_training_plot_10.png)
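
The effect of the patience setting can be sketched as follows. This is a minimal illustration, not the project's training loop: the function name `early_stopping_epoch` and the loss values are hypothetical, and it assumes training stops once the validation loss has failed to improve for `patience` consecutive epochs. Under that convention, a best epoch of 6 gives a stop at epoch 11 for patience = 5 and at epoch 16 for patience = 10, which matches the numbers reported above.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the 1-indexed epoch at which training stops.

    Training stops after `patience` consecutive epochs without a new
    best validation loss. If that never happens, the last epoch is returned.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses)

# Made-up validation losses (assumption) with a small dip late in training.
losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.74, 0.73, 0.69, 0.70]
stop_p5 = early_stopping_epoch(losses, patience=5)
stop_p10 = early_stopping_epoch(losses, patience=10)
```

In this toy sequence the lower patience stops before the late improvement at epoch 9, which is the trade-off described above.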

iii). <br />
The new size is 64x2x2 = 256. <br />

|                	| Epoch 	| Training AUROC 	| Validation AUROC 	|
|---------------	|-------	|----------------	|------------------	|
| 8 filters     	| 15    	| 0.9991         	| 0.942            	|
| 64 filters    	| 4     	| 0.9947         	| 0.9271           	|

The performance actually decreases as we increase the number of filters. This could be because the larger architecture makes the model overfit: we have more neurons than we need to correctly classify an image. It could also be that the added inputs introduce unnecessary noise to the fully connected layer, which can disturb the overall accuracy of the model. <br />

### (g)
i). <br />
|          	| Training 	| Validation 	| Testing 	|
|----------	|----------	|------------	|---------	|
| Accuracy 	| 0.9933   	| 0.88       	| 0.62    	|
| AUROC    	| 0.9991   	| 0.942      	| 0.6996  	|

ii). <br />
No, the validation performance is relatively close to training performance. <br />

iii). <br />
The testing performance is much worse than the validation performance in both accuracy and AUROC. This could be because we are not splitting our data properly: the validation data may contain many more Golden Retriever labels, while the testing data contains many more Collie labels. In that case, the model would perform poorly on the testing data.
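
One way to check the class-imbalance hypothesis above is to compare label proportions across splits. The sketch below is only illustrative: the helper `class_fractions` and the label lists are hypothetical, and the real labels would have to be read from the project's own metadata.

```python
from collections import Counter

def class_fractions(labels):
    """Fraction of samples per class in one split."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: counts[cls] / total for cls in sorted(counts)}

# Made-up label lists (assumption), skewed in opposite directions.
splits = {
    "val": ["golden_retriever"] * 7 + ["collie"] * 3,
    "test": ["golden_retriever"] * 3 + ["collie"] * 7,
}

fractions = {name: class_fractions(labels) for name, labels in splits.items()}
for name, frac in fractions.items():
    print(name, frac)
```

If the real splits showed proportions this different, the gap between validation and testing performance would be consistent with the explanation given above.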

## 3 Visualizing what the CNN has learned

### (a) <br /> 
$$
L^1 = \begin{bmatrix} 3/16 & 0 & 3/8 & 5/8 \\ 0 & 0 & 3/16 & 0 \\ 0 & 3/16 & 0 & 1/4 \\ 7/16 & 5/8 & 1/2 & 1/2 \end{bmatrix}
$$

### (b) <br /> 
The CNN appears to be using green pixels to distinguish between Golden Retrievers and Collies.
### (c) <br />
Yes, it confirms the hypothesis. The model seems to rely heavily on green pixels to distinguish between Golden Retrievers and Collies. However, we as humans know that this is not the distinctive feature between the two types of dogs. This shows that the model has a strong bias toward the color green. It further implies that green is over-represented in the training data, so the model is biased and cannot perform as well on the testing data.


## 4 Transfer Learning & Data Augmentation

### 4.1 Transfer Learning

#### (c)
![](source_training_plot.png)
Epoch 10 has the lowest validation loss with a value of 1.8664. <br />

#### (d)
![](conf_matrix.png) <br />
The classifier is most accurate when predicting Samoyed and least accurate when predicting Siberian Husky. This might be because the training data is mislabeled, with most Siberian Husky images carrying the Samoyed label. The classifier would then treat most Siberian Huskies as Samoyeds, which counts as correct under those labels. On the other hand, the classifier does not have enough information to recognize Siberian Husky, because the training dataset contains few samples labeled as such. This results in very low accuracy for Siberian Husky. <br />

#### (f)
|                                                                  	|        	|  AUROC 	|        	|
|:----------------------------------------------------------------:	|:------:	|:------:	|:------:	|
|                                                                  	|  TRAIN 	|   VAL  	|  TEST  	|
|            Freeze all CONV layers (Fine-tune FC layer)            | 0.9015 | 0.8983 | 0.8272 |
| Freeze first two CONV layers (Fine-tune last CONV and FC layers)  | 0.9756 | 0.9033 | 0.8072 |
| Freeze first CONV layer (Fine-tune last two CONV and FC layers)   | 0.9925 | 0.9327 | 0.7784 |
|              Freeze no layers (Fine-tune all layers)             	| 0.9905 	| 0.9262 	| 0.7576 	|
|    No pretraining or Transfer Learning (Section 2 performance)   	| 0.9991 	|  0.942 	| 0.6996 	|

Transfer learning helps significantly, and the source task we used is very helpful: every transfer learning configuration improves test AUROC over the Section 2 model (0.6996), with the largest gain when all convolutional layers are frozen (0.8272). Freezing all convolutional layers takes many more epochs than freezing only a subset of them. A likely reason is that, with every convolutional layer frozen, the features cannot adapt to the target task and only the fully connected layer is updated, so many epochs pass before the classifier reaches good performance. <br />

### 4.2 Data Augmentation

|                                         	|        	|  AUROC 	|        	|
|:---------------------------------------:	|:------:	|:------:	|:------:	|
|                                         	|  TRAIN 	|   VAL  	|  TEST  	|
|         Rotation (Keep original)        	|        	|        	|        	|
|        Grayscale (keep original)        	| 0.9835 	| 0.9332 	|  0.694 	|
|       Grayscale (discard original)      	| 0.3163 	| 0.3369 	|  0.364 	|
| No augmentation (section 2 performance) 	| 0.9991 	|  0.942 	| 0.6996 	|
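
The difference between the "keep original" and "discard original" grayscale rows can be sketched as below. This is an assumption-laden illustration rather than the project's augmentation script: the report does not show how grayscale conversion was done, so the channel averaging here, along with the names `to_grayscale` and `augment`, is hypothetical.

```python
import numpy as np

def to_grayscale(image):
    """Average the color channels and replicate the result to keep shape (H, W, 3)."""
    gray = image.mean(axis=2, keepdims=True)
    return np.repeat(gray, 3, axis=2)

def augment(images, keep_original=True):
    """Return grayscale copies, optionally alongside the original images."""
    augmented = [to_grayscale(img) for img in images]
    return (list(images) + augmented) if keep_original else augmented

# Placeholder data (assumption): random pixels instead of the real dog images.
rng = np.random.default_rng(0)
images = [rng.uniform(0, 255, size=(4, 4, 3)) for _ in range(5)]

kept = augment(images, keep_original=True)        # originals + grayscale copies
discarded = augment(images, keep_original=False)  # grayscale copies only
```

With `keep_original=False` the training set contains no color information at all, whereas `keep_original=True` doubles the set while retaining the color images.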