Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\rightarrow$Run All).

Make sure you fill in any place that says `YOUR CODE HERE` or "YOUR ANSWER HERE", as well as your name and collaborators below:

In [1]:
NAME = ""
COLLABORATORS = ""

---

# PA5

## Intro to Convolutional Neural Nets with Keras

Contest on Cat/Dog classification. 

Due Friday, 7/30/2021, 5pm.

# Contest

Besides the ~20k train/test images that we load below, there are ~5,000 more unseen/hidden cat/dog images in a private folder. 

Train a CNN model of your own design, tune it, (cross-validate it if you want!) until you are satisfied with its performance.

I will run your saved models on this competition dataset and let you know how well your model fares.

For this part: You must **create your own model** such as the VGG example or a smaller one of your own design, like you did in 'Lab Classification with Keras'! 


### Important: Your model has to be named `comp_model`

``` python
comp_model = myNN(X_train)
```

If your model name is incorrect, my call will fail and so may you! 

## What can I use?
- Your code can use feature engineering, normalization or other common tricks
- You can use Dense, Conv2D or similar layers in Keras 
- You can use dropout, normalization, etc.
- You can (and should!) plot your history and observe your model's behaviour. 
- You can choose your favorite 'optimizer' or 'loss' functions.
- Your `metrics` argument has to be: `metrics = ['accuracy','mae']`
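
As one illustration of the normalization trick mentioned in the list above, here is a minimal sketch that rescales pixel intensities to the range [0, 1]. It assumes the images hold raw 0-255 intensities, which this notebook does not state, and the helper name `scale_pixels` is hypothetical. The demo array is synthetic and only mimics the shape of `X_train`.

```python
import numpy as np

def scale_pixels(images):
    """Return a float32 copy of `images` scaled to the range [0, 1]."""
    images = np.asarray(images, dtype=np.float32)
    # Assumption: values above 1 mean raw 0-255 intensities.
    if images.max() > 1.0:
        images = images / 255.0
    return images

# Synthetic stand-in for X_train: 4 grayscale images of 50x50 pixels.
demo = np.random.default_rng(0).integers(0, 256, size=(4, 50, 50, 1))
demo_scaled = scale_pixels(demo)
print(demo_scaled.shape, demo_scaled.dtype)
```

Note that the evaluation cells call `comp_model.evaluate` directly on the loaded arrays, so any scaling done outside the model would not be applied to the hidden data. If you rescale, doing it inside the model is one way to keep training and evaluation consistent.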

## Comments
- **Results**: Validation alone is not enough for this assignment, as some parts are manually graded. For me to be able to view your results, your notebook must run. One way to test this is to click `Kernel`->`Restart & Run All` and verify that all cells execute without errors.
- **Runtime**: Each notebook has a runtime limit. Your submitted notebook should execute in under a minute. Thus, if you are using iterative search algorithms, such as scikit-learn's `GridSearchCV`, comment them out before submitting your notebook. For instance, you can hard-code the best values returned by such a search and comment out the search itself to save execution time.
- **Randomness**: If you are using an algorithm that depends on random state, make sure it is reproducible. For instance, set the random seed so that your model behaves similarly when I execute it. You could also [save your model and load it](https://www.tensorflow.org/guide/keras/save_and_serialize) to get around this.
- The above also means that you can comment out your training code to save execution time and avoid the randomness issue. Make sure to load your model as `comp_model`.
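
The seeding advice above could look roughly like the following sketch. The helper name `set_seeds` and the seed value are hypothetical, and the TensorFlow call assumes a TensorFlow 2.x installation; if TensorFlow cannot be imported, only the Python and NumPy generators are seeded.

```python
import random
import numpy as np

SEED = 42  # arbitrary value chosen for illustration

def set_seeds(seed=SEED):
    """Seed the Python, NumPy and (if installed) TensorFlow generators."""
    random.seed(seed)
    np.random.seed(seed)
    try:
        import tensorflow as tf
        tf.random.set_seed(seed)  # assumption: TensorFlow 2.x API
    except ImportError:
        pass  # TensorFlow not available; Python/NumPy are still seeded

# Re-seeding before each draw should give identical random numbers.
set_seeds()
first = np.random.rand(3)
set_seeds()
second = np.random.rand(3)
print(np.allclose(first, second))
```

Seeding makes runs more repeatable but may not make GPU training bit-identical, which is one reason saving and loading the trained model is the safer route.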

## TL;DR

- Name your trained Keras model as `comp_model` 
- Your `metrics` argument has to be: `metrics = ['accuracy','mae']`


In [None]:
# Load the train/test data:
import numpy as np
path = '/home/memo/public/eaix/'

X_train = np.load(path+'X_train.npy')
X_test = np.load(path+'X_test.npy')
y_train = np.load(path+'y_train.npy')
y_test = np.load(path+'y_test.npy')
print(X_train.shape, X_test.shape) #should be (14967, 50, 50, 1) (4990, 50, 50, 1)

(14967, 50, 50, 1) (4990, 50, 50, 1)


In [None]:
from keras import models
from keras import layers

from keras.callbacks import ReduceLROnPlateau
from keras.models import Sequential
from keras.layers import Dense, Conv2D, Flatten, MaxPooling2D, Dropout
from keras.utils import to_categorical
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

Using TensorFlow backend.


In [None]:
def myNN(train_data):
    # Initiate the model. Note: train_data is not used; the input shape is fixed at (50, 50, 1).
    model = models.Sequential()
    
    model.add(layers.Conv2D(64, (3,3), activation='relu', input_shape=((50, 50, 1)), name='in'))
    model.add(layers.Conv2D(128, (3,3), activation='relu', name='hidden1'))
    model.add(layers.Conv2D(256, (3,3), activation='relu', name='hidden2'))
    model.add(MaxPooling2D((2, 2)))
    
    model.add(layers.Conv2D(64, (3,3), activation='relu', name='hidden3'))
    model.add(layers.Conv2D(128, (3,3), activation='relu', name='hidden4'))
    model.add(layers.Conv2D(256, (3,3), activation='relu', name='hidden5'))
    model.add(MaxPooling2D((2, 2)))
    
    model.add(layers.Conv2D(64, (3,3), activation='relu', name='hidden6'))
    model.add(layers.Conv2D(128, (3,3), activation='relu', name='hidden7'))
    model.add(layers.Conv2D(256, (3,3), activation='relu', name='hidden8'))
    model.add(MaxPooling2D((2, 2)))
    
    model.add(Flatten())
    
    model.add(Dropout(0.2))
    
    model.add(Dense(256, activation='relu'))
    
    model.add(Dense(1, activation='sigmoid'))
    
    
    model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy', 'mae'])
    return model
    

In [None]:
comp_model = myNN(X_train)

comp_model.summary()

ne = 24

reduce_lr = ReduceLROnPlateau( monitor="val_loss", factor=0.5, patience=1, verbose=1)

history = comp_model.fit(X_train, y_train, validation_split=0.3, epochs = ne, batch_size = 100, verbose = 1, callbacks=[reduce_lr])

Model: "sequential_12"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
in (Conv2D)                  (None, 48, 48, 64)        640       
_________________________________________________________________
hidden1 (Conv2D)             (None, 46, 46, 128)       73856     
_________________________________________________________________
hidden2 (Conv2D)             (None, 44, 44, 256)       295168    
_________________________________________________________________
max_pooling2d_34 (MaxPooling (None, 22, 22, 256)       0         
_________________________________________________________________
hidden3 (Conv2D)             (None, 20, 20, 64)        147520    
_________________________________________________________________
hidden4 (Conv2D)             (None, 18, 18, 128)       73856     
_________________________________________________________________
hidden5 (Conv2D)             (None, 16, 16, 256)     
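
The output shapes and parameter counts listed in the summary can be checked by hand. The sketch below assumes the Keras defaults for `Conv2D` (stride 1, `padding='valid'`) and for `MaxPooling2D((2, 2))` (spatial size halved, rounded down). The helper names `conv_out` and `conv_params` are hypothetical and not part of Keras.

```python
def conv_out(size, kernel=3):
    """Spatial size after a 'valid' convolution with stride 1."""
    return size - kernel + 1

def conv_params(in_ch, out_ch, kernel=3):
    """Kernel weights plus one bias per output channel."""
    return (kernel * kernel * in_ch + 1) * out_ch

size, channels = 50, 1           # input images are 50x50x1
filters_per_block = [64, 128, 256]
rows = []                        # (layer index, spatial size, channels, params)
for block in range(3):
    for filters in filters_per_block:
        params = conv_params(channels, filters)
        size, channels = conv_out(size), filters
        rows.append((len(rows), size, channels, params))
    size //= 2                   # 2x2 max pooling after each block

for index, s, c, p in rows:
    print(f"conv {index}: ({s}, {s}, {c})  params={p}")
flattened = size * size * channels
print("features after Flatten:", flattened)
```

Under these assumptions the first rows reproduce the summary (for example 640 parameters for the first layer and 147520 for `hidden3`), and the feature map shrinks to 1x1x256 after the third pooling layer, so `Flatten` passes 256 values to the dense layer.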

In [None]:
# Plot history: Sample code
# The x-axis is derived from the length of the history,
# so it stays correct if you change the number of epochs (ne).

# history.history is a dictionary keyed by metric name.
hd = history.history

acc_tr = hd['accuracy']
acc_va = hd['val_accuracy']
epochs = range(1, len(acc_tr) + 1)  # one point per trained epoch

import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

plt.plot(epochs, acc_tr, '-.o', label='Training Acc')
plt.plot(epochs, acc_va, 'r', label='Validation Acc')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
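
Besides plotting, it can help to read the best validation epoch straight from the history dictionary. The following is a minimal sketch; the helper name `best_epoch` is hypothetical, and the numbers in `demo_history` are made up for illustration and are not results from this notebook.

```python
def best_epoch(history_dict, key="val_accuracy"):
    """Return (1-based epoch, value) where `key` is highest."""
    values = history_dict[key]
    best_index = max(range(len(values)), key=values.__getitem__)
    return best_index + 1, values[best_index]

# Made-up history with the same keys that history.history uses above.
demo_history = {
    "accuracy": [0.60, 0.70, 0.78, 0.85, 0.91],
    "val_accuracy": [0.58, 0.69, 0.74, 0.73, 0.72],
}
epoch, value = best_epoch(demo_history)
print(f"best validation accuracy {value:.2f} at epoch {epoch}")
```

A training accuracy that keeps rising while validation accuracy stalls or drops, as in the made-up numbers, is the usual sign of overfitting and a hint to train for fewer epochs or add regularization.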

## Wrap it up

Once you are happy with your model, you can comment out your training code (`model.fit()`) and save your model:


How to save/load your model: https://www.tensorflow.org/guide/keras/save_and_serialize

Pseudocode to save, load and test a Keras NN named 'mymodel':
``` python
from keras.models import load_model

model.save('mymodel')  # save your trained model
comp_model = load_model('mymodel')  # load it under the required name
# Comment out model.fit, history and model.save() once loading works.
# Evaluate the loaded model's performance on test data.
# evaluate() returns the loss first, then the metrics in compile order.
nnloss, nnacc, nnmae = comp_model.evaluate(X_test, y_test, verbose = 1)
print('*** Test *** ')
print('NN Test MAE: ', nnmae)
print('NN Test ACC: ', nnacc)
```

In [None]:
#from keras.models import load_model
#comp_model.save('comp_model') #save your model
#comp_model = load_model('comp_model') #load it as desired name
#Comment out: model.fit, history and model.save() if load works.
#Evaluate the loaded model's performance on test data (first value is the loss):
nnloss, nnacc, nnmae = comp_model.evaluate(X_test, y_test, verbose = 1)
print('*** Test *** ')
print('NN Test MAE: ', nnmae)
print('NN Test ACC: ', nnacc)

## Performance on hidden data

You shouldn't be able to run the following cells, as these files are hidden. But I'll run them to evaluate your model's performance on this data.

Your model must be saved as `comp_model` for this to work.

In [None]:
#read hidden data
#Note this data is NOT visible to you. 
X_val = np.load('/home/memo/private/X_val.npy')
y_val = np.load('/home/memo/private/y_val.npy')


In [None]:
#Competition data. You can't run this.
nnloss, nnacc, nnmae = comp_model.evaluate(X_val, y_val, verbose = 1)
#final score on hidden dataset:
print("Competition accuracy is %.2f" % (nnacc*100))