# Problem description

In the last assignment, we developed fully connected neural networks to classify an image dataset. In this assignment, we will attempt to recognize ships in satellite photos in a different way.

As in any other domain, specific knowledge of the problem area will make you a better analyst.
For this assignment, we will ignore domain-specific information and just try to use a labelled training set (photo plus a binary indicator for whether a ship is present/absent in the photo), assuming that the labels are perfect.

n.b., it appears that a photo is labelled as having a ship present only if the ship is in the center of the photo.  Perhaps this prevents us from double-counting.


## Goal:

There are two notebook files in this assignment:
- **`Ships_in_satellite_images_P2.ipynb`**: The first and only notebook you need to work on. Train your models and save them
- **`Model_test.ipynb`**: Used to test your results. Submit this notebook after you complete `Ships_in_satellite_images_P2.ipynb`

**Before you start working on this assignment, please check if your kernel is Python 3.7 (Right top of the page). If it is not Python 3.7, please go to `Kernel->Change kernel->Python 3.7` on the top**

In this `Ships_in_satellite_images_P2.ipynb` notebook, you will need to create CNN models in Keras to classify satellite photos.
- The features are images: 3-dimensional arrays of pixels
  - 2 spatial dimensions
  - 1 dimension with 3 features for different parts of the color spectrum: Red, Green, Blue
- The labels are either 1 (ship is present) or 0 (ship is not present)
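
To make the layout described above concrete, here is a minimal sketch that builds a stand-in batch of images. The random pixel values and the names `images` and `labels_demo` are hypothetical; the 80x80x3 size is taken from the input shape used later in this notebook.

```python
import numpy as np

# Hypothetical stand-in for the real dataset: 4 random 80x80 RGB images.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(4, 80, 80, 3), dtype=np.uint8)
labels_demo = np.array([1, 0, 0, 1])  # 1 = ship present, 0 = ship not present

# Axis 0 indexes the samples, axes 1 and 2 are the spatial dimensions,
# and axis 3 holds the three color features.
n_samples, dim1, dim2, n_channels = images.shape
print("Samples:", n_samples, "| spatial:", (dim1, dim2), "| channels:", n_channels)

# A single pixel of a single image holds three values: Red, Green, Blue.
red, green, blue = images[0, 10, 20]
print("Pixel (10, 20) of image 0:", red, green, blue)
```
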

## Learning objectives
- Learn how to construct Neural Networks using Keras Sequential model
- Appreciate how layer choices impact number of weights

# Import modules

In [None]:
## Standard imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import sklearn

import os
import math

%matplotlib inline

## Import tensorflow
import tensorflow as tf
from tensorflow.keras.utils import plot_model

print("Running TensorFlow version ",tf.__version__)

# Parse tensorflow version
import re

# Use a raw string so that "\." is passed to the regex engine unchanged
version_match = re.match(r"([0-9]+)\.([0-9]+)", tf.__version__)
tf_major, tf_minor = int(version_match.group(1)) , int(version_match.group(2))
print("Version {v:d}, minor {m:d}".format(v=tf_major, m=tf_minor) )

# API for students

We have defined some utility routines in a file `nn_helper.py`. There is a class named `Helper` in it.  

This will simplify problem solving.

More importantly, it adds structure to your submission so that it may be easily graded.

`helper = nn_helper.Helper()`

- getData: Get a collection of labelled images, used as follows

  >`data, labels = helper.getData(DATA_DIR, json_file)`
- scaleData: scale your input data

  >`X, y = helper.scaleData(data, labels)`
- showData: Visualize labelled images, used as follows

  >`helper.showData(data, labels)`
- plot training results: Visualize training accuracy, loss and validation accuracy, loss

  >`helper.plotTrain(history, modelName)`, where history is the result of model training
- save model: save a model in `./models` directory

  >`helper.saveModel(model, modelName)`
- save history: save a model history in `./models` directory

  >`helper.saveHistory(history, modelName)`

In [None]:
## Load nn_helper module

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

# Reload all modules imported with %aimport
%load_ext autoreload
%autoreload 1

import nn_helper
%aimport nn_helper

helper = nn_helper.Helper()

# Create the dataset

In [None]:
# Data directory
DATA_DIR = "./Data"
json_file =  "shipsnet.json"

# Get the data
data, labels = helper.getData(DATA_DIR, json_file)
n_samples, width, height, channel = data.shape
print("Data shape: ", data.shape)
print("Labels shape: ", labels.shape)
print("Label values: ", np.unique(labels))

In [None]:
# Shuffle the data
data, labels = sklearn.utils.shuffle(data, labels, random_state=42)

## Examine the image/label pairs
We have loaded and shuffled our dataset; now we will take a look at the image/label pairs. You can also explore the dataset in your own way.

In [None]:
# Inspect some data (images)
num_each_label = 10

for lab in np.unique(labels):
    # Fetch images with different labels
    X_lab, y_lab = data[ labels == lab ], labels[ labels == lab]
    fig = helper.showData( X_lab[:num_each_label], [ str(label) for label in y_lab[:num_each_label] ])
    fig.suptitle("Label: "+  str(lab), fontsize=14)
    fig.show()
    print("\n\n")

# Make sure the features are in the range [0,1]  

Just as we did in DL 1, we need to scale the data first. The feature values in our image data are between 0 and 255, so we divide them by 255 to bring them into the range 0 to 1. In addition, we usually use one-hot encoding for the labels.
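
The two steps just described can be sketched in plain NumPy. This is only an illustration under the assumption that `helper.scaleData` does something equivalent; its actual implementation is not shown here, and the function name `scale_and_encode` is hypothetical.

```python
import numpy as np

def scale_and_encode(data, labels, num_classes=2):
    """Scale pixel values to [0, 1] and one-hot encode integer labels."""
    X = data.astype("float32") / 255.0
    # Row i of the identity matrix is the one-hot vector for class i.
    y = np.eye(num_classes, dtype="float32")[labels.astype(int)]
    return X, y

# One tiny 1x1 RGB "image" with label 1 (ship present).
raw = np.array([[[[0, 128, 255]]]], dtype=np.uint8)
X_demo, y_demo = scale_and_encode(raw, np.array([1]))

print("Scaled range:", X_demo.min(), "to", X_demo.max())
print("One-hot label:", y_demo[0])
```

One-hot labels with two columns pair naturally with a two-unit `softmax` head and the "categorical_crossentropy" loss requested later in this notebook.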

In [None]:
# Scale the data
# Assign values for X, y
#  X: the array of features
#  y: the array of labels
# The length of X and y should be identical and equal to the length of data.
X, y = np.array([]), np.array([])
X, y = helper.scaleData(data, labels)

print('X shape: ', str(X.shape))
print('y.shape: ', str(y.shape))
print(y[0])

# Split data into training data and testing data
To train and evaluate a model, we need to split the original dataset into 2 parts: in-sample and out-of-sample. We train the model on the in-sample dataset, then evaluate the training result on the out-of-sample dataset.

**DO NOT** shuffle the data until after we have performed the split into train/test sets
- We want everyone to have the **identical** test set for grading
- Do not change this cell
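
Why does a fixed seed give everyone the same test set? The sketch below illustrates the idea with a plain NumPy permutation. It is not scikit-learn's algorithm and will not reproduce the indices chosen by `train_test_split`; the function `seeded_split` is hypothetical and only shows that the same seed always yields the same partition.

```python
import numpy as np

def seeded_split(n_samples, test_fraction, seed):
    """Return (train_indices, test_indices) from a seeded random permutation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_fraction))
    return order[n_test:], order[:n_test]

train_a, test_a = seeded_split(100, 0.10, seed=42)
train_b, test_b = seeded_split(100, 0.10, seed=42)

# Same seed, same split: both calls select identical test samples.
print("Identical test sets:", np.array_equal(test_a, test_b))
print("Train size:", len(train_a), "| Test size:", len(test_a))
```
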

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.10, random_state=42)

# Save X_train, X_test, y_train, y_test for further testing
if not os.path.exists('./data'):
    os.mkdir('./data')
np.savez_compressed('./data/train_test_data.npz', X_train=X_train, X_test=X_test, y_train=y_train, y_test=y_test)

# Create a simple CNN model 

**Question:** Build a CNN model with:
- a single Convolutional Layer providing 32 features (you can set your own kernel size); full padding (`padding="same"` in Keras, so the output keeps the spatial size of the input)
- feeding a Classification layer  

Please name your Convolutional layer "CNN_1" and your Dense layer (head layer) "dense_head"

**Hints:**
- The input shape is the shape of a single image sample, which is 3-dimensional. There is no need to flatten the input data first
- After the Convolutional layer, flatten its output to 1 dimension so that it can be fed to the `Dense` layer
- Since the labels take 2 values, you can use either a `sigmoid` or a `softmax` function in your classifier
- You may want to use a `Dropout` layer to reduce overfitting
- A `MaxPooling2D` or `AveragePooling2D` layer is also useful: it shrinks the feature maps, which reduces the number of parameters in the layers that follow and can help to prevent overfitting
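
On the hint about `sigmoid` versus `softmax`: for two classes, the two are mathematically equivalent, since the softmax probability of class 1 equals the sigmoid of the difference between the two logits. The NumPy sketch below checks this numerically; the logit values are arbitrary examples.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Arbitrary example logits for two samples: columns are class 0 and class 1.
logits = np.array([[0.3, 1.7],
                   [2.0, -1.0]])

p_class1_softmax = softmax(logits)[:, 1]
p_class1_sigmoid = sigmoid(logits[:, 1] - logits[:, 0])

print("softmax P(ship):", p_class1_softmax)
print("sigmoid P(ship):", p_class1_sigmoid)
print("Equal:", np.allclose(p_class1_softmax, p_class1_sigmoid))
```

In practice the choice affects the rest of the setup: a two-unit `softmax` head expects one-hot labels, whereas a one-unit `sigmoid` head expects a single 0/1 label per sample.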

In [None]:
# Get the number of unique labels
num_cases = np.unique(labels).shape[0]

# Set model0 equal to a Keras Sequential model
model0 = None

### BEGIN SOLUTION
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Dropout
from tensorflow.keras.layers import Conv2D, MaxPooling2D

model0 = Sequential()
model0.add(Conv2D(32, (3, 3), padding="same", input_shape=X.shape[1:], activation='relu', name='CNN_1'))
model0.add(Flatten())
model0.add(Dropout(0.2))
model0.add(Dense(num_cases, activation='softmax', name='dense_head'))
### END SOLUTION

model0.summary()

In [None]:
# Plot your model
plot_model(model0)

## Train model

**Question:** Now that you have built your first model, you will compile and train it. The requirements are as follows:
- Split your dataset `X_train` into 80% training data and 20% validation data. Set the `random_state` to 42. You can use `train_test_split()`
- Loss function: "categorical_crossentropy"; Metric: "accuracy"
- Number of training epochs: 10
- Save your training results in a variable named `history`
- Plot your training results using the API `plotTrain()`
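
For intuition about the loss named above, here is a minimal NumPy sketch of categorical cross-entropy for one-hot labels. It is an illustration of the formula, not the Keras implementation; the clipping constant `eps` and the example probabilities are assumptions chosen for the demo.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum_k y_true[k] * log(y_pred[k])."""
    # Clip to avoid log(0); eps is an arbitrary small constant.
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    per_sample = -np.sum(y_true * np.log(y_pred), axis=1)
    return float(np.mean(per_sample))

# Two samples: the first is a ship (class 1), the second is not (class 0).
y_true_demo = np.array([[0.0, 1.0],
                        [1.0, 0.0]])
# Example predicted probabilities (each row sums to 1).
y_pred_demo = np.array([[0.2, 0.8],
                        [0.6, 0.4]])

loss_demo = categorical_crossentropy(y_true_demo, y_pred_demo)
print("Loss:", loss_demo)

# A perfect prediction drives the loss towards 0.
print("Loss for perfect prediction:", categorical_crossentropy(y_true_demo, y_true_demo))
```

Only the probability assigned to the correct class contributes to each sample's loss, so confident wrong predictions are penalized heavily.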


In [None]:
model_name0 = "CNN + Head"

### BEGIN SOLUTION
X_train_, X_val_, y_train_, y_val_ = train_test_split(X_train, y_train, test_size=0.2, random_state=42)
model0.compile(loss='categorical_crossentropy', metrics=['accuracy'])
history = model0.fit(X_train_, y_train_, epochs=10, validation_data=(X_val_, y_val_))
fig, axs = helper.plotTrain(history, model_name0)
### END SOLUTION

**Expected outputs (there may be some differences because we only have one Convolutional layer and the model structure may be a little different):**  
<table> 
    <tr> 
        <td>  
            Training accuracy
        </td>
        <td>
         0.9760
        </td>
    </tr>
    <tr> 
        <td>
            Validation accuracy
        </td>
        <td>
         0.9333
        </td>
    </tr>

</table>

The graphs of your loss and accuracy curves:
<img src='./images/CNN_model_loss_accuracy.png' style='width:600px;height:300px;'>

We can see that the CNN model is very powerful even though it has only 1 Convolutional layer. It performs much better than the fully connected models. The training accuracy curve keeps increasing, while the validation accuracy curve increases at first and then fluctuates around a fixed level. This suggests that our CNN model learns at first but later stops improving on unseen data.
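
One way to quantify "fluctuating around some level" is to find the epoch with the best validation accuracy and count how many epochs have passed since. The sketch below does this on made-up numbers that are for illustration only and are not results from this assignment. With a real run you would pass the validation accuracy list from `history.history` instead (in TensorFlow 2.x the key is typically `val_accuracy`, but check your version).

```python
def best_epoch_and_stale_count(values):
    """Return (index of best value, number of epochs since that best)."""
    best_value = float("-inf")
    best_epoch = -1
    for epoch, value in enumerate(values):
        # Strict comparison: a tie does not count as an improvement.
        if value > best_value:
            best_value, best_epoch = value, epoch
    return best_epoch, len(values) - 1 - best_epoch

# Made-up validation accuracies for 10 epochs (illustrative only).
val_accuracy_demo = [0.80, 0.88, 0.91, 0.93, 0.92, 0.93, 0.92, 0.93, 0.93, 0.92]

best_epoch, stale = best_epoch_and_stale_count(val_accuracy_demo)
print("Best epoch (0-based):", best_epoch)
print("Epochs without improvement since then:", stale)
```

A long run of epochs without improvement is the usual signal that further training mostly fits the training set.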

## How many weights in the model ?

**Question:** Calculate the number of parameters in your model.  

**Hint:** You can use model's method `count_params()`
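
It is a useful exercise to check the result of `count_params()` by hand. The sketch below does the arithmetic under stated assumptions: a 3x3 kernel, an 80x80x3 input, "same" padding, no pooling layer, and a 2-unit head. If your model uses a different kernel size or adds pooling, the numbers will differ. `Dropout` and `Flatten` layers have no weights.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Each filter has kernel_h * kernel_w * in_channels weights plus 1 bias."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(n_inputs, n_units):
    """Each unit has one weight per input plus 1 bias."""
    return (n_inputs + 1) * n_units

# Assumed architecture: Conv2D(32, (3, 3), padding="same") on 80x80x3 images.
conv_weights = conv2d_params(3, 3, 3, 32)

# "same" padding keeps the 80x80 spatial size, so Flatten yields 80*80*32 values.
flattened_size = 80 * 80 * 32
head_weights = dense_params(flattened_size, 2)

total_weights = conv_weights + head_weights
print("Conv layer:", conv_weights)      # 896
print("Dense head:", head_weights)      # 409602
print("Total:", total_weights)          # 410498
```

Note that almost all of the weights sit in the Dense head, not in the Convolutional layer: the convolution shares its small kernel across every spatial position.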

In [None]:
# Set num_parameters0 equal to the number of weights in the model
num_parameters0 = None

### BEGIN SOLUTION
num_parameters0 = model0.count_params()
### END SOLUTION

print("Parameters number in model0: ", num_parameters0)

## Evaluate the model

**Question:** Now that the model is trained, evaluate it on the test dataset. Please store the model score in a variable named `score0`.   

**Hint:** The method we should use is `evaluate()`. 
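
`evaluate()` returns the loss followed by the metrics requested at compile time, which is why the print statement below reads the loss from index 0 and the accuracy from index 1. As a rough sketch of what the accuracy metric measures for one-hot labels, the NumPy function below compares the most probable predicted class against the true class; the example probabilities are invented for the demo.

```python
import numpy as np

def accuracy_from_probs(y_true_onehot, y_prob):
    """Fraction of samples whose most probable class matches the true class."""
    true_class = np.argmax(y_true_onehot, axis=1)
    predicted_class = np.argmax(y_prob, axis=1)
    return float(np.mean(true_class == predicted_class))

# Four samples with one-hot labels (column 1 = ship present).
y_true_demo = np.array([[1, 0], [0, 1], [0, 1], [1, 0]])
# Invented predicted probabilities; the third sample is misclassified.
y_prob_demo = np.array([[0.9, 0.1], [0.3, 0.7], [0.6, 0.4], [0.8, 0.2]])

acc_demo = accuracy_from_probs(y_true_demo, y_prob_demo)
print("Accuracy:", acc_demo)
```
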

In [None]:
score0 = []

### BEGIN SOLUTION
score0 = model0.evaluate(X_test, y_test, verbose=0)
### END SOLUTION

print("{n:s}: Test loss: {l:3.2f} / Test accuracy: {a:3.2f}".format(n=model_name0, l=score0[0], a=score0[1]))

Your test accuracy should be around 0.9

## Save the trained model0 and history for submission

In [None]:
helper.saveModel(model0, model_name0)
helper.saveModelNonPortable(model0, model_name0)
helper.saveHistory(history, model_name0)

In [None]:
## Restore the model (make sure that it works)
model_loaded = helper.loadModel(model_name0)
score_loaded = model_loaded.evaluate(X_test, y_test, verbose=0)

# Compare with a tolerance: exact equality on floats is fragile
assert np.allclose(score_loaded, score0)

# Create a model with 4 Convolutional layers

**Question:** This time, we will add more Convolutional layers to the original model0. Your model should have 
- **4** Convolutional layers. The first two Convolutional layers should have 32 features (you can set your own kernel size). Please name these 2 Convolutional layers "CNN_1" and "CNN_2"; full padding (`padding="same"` in Keras)
- The last two Convolutional layers should have 64 features (you can set your own kernel size). Please name these 2 Convolutional layers "CNN_3" and "CNN_4"; full padding (`padding="same"` in Keras)
- ReLU activation functions after your Convolutional layers
- a MaxPooling layer after every two Convolutional layers (after CNN_2 and CNN_4) to reduce each spatial dimension by a factor of 2

Please also name your head layer "dense_head". 

**Hints:**
- Remember to flatten the outputs of your Convolutional layers before feeding them to dense layers
- You may want to use a `Dropout` layer to reduce overfitting
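
To see what "reduce each spatial dimension by a factor of 2" means, here is a small NumPy sketch of 2x2 max pooling with stride 2 on a single feature map. It illustrates the operation only and is not how Keras implements `MaxPooling2D`; the helper name `max_pool_2x2` is hypothetical.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on an array of shape (height, width, channels)."""
    h, w, c = feature_map.shape
    if h % 2 or w % 2:
        raise ValueError("height and width must be even for this simple sketch")
    # Split each spatial axis into (blocks, 2) and take the max inside each 2x2 block.
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2, c)
    return blocks.max(axis=(1, 3))

# A 4x4 single-channel map holding the values 0..15 row by row.
fm = np.arange(16, dtype=float).reshape(4, 4, 1)
pooled = max_pool_2x2(fm)
print(pooled[:, :, 0])   # [[ 5.  7.] [13. 15.]]

# An 80x80 map with 32 channels shrinks to 40x40 with 32 channels.
print(max_pool_2x2(np.zeros((80, 80, 32))).shape)
```

Pooling leaves the number of channels unchanged and has no trainable weights; it only shrinks the spatial dimensions.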

In [None]:
# Set model1 equal to a Keras Sequential model
model1 = None

### BEGIN SOLUTION
model1 = Sequential()
model1.add(Conv2D(32, (3, 3), padding="same", input_shape=(80, 80, 3), activation='relu', name='CNN_1'))
model1.add(Conv2D(32, (3, 3), padding="same", activation='relu', name='CNN_2'))
model1.add(MaxPooling2D(pool_size=(2, 2)))
model1.add(Dropout(0.2))

model1.add(Conv2D(64, (3, 3), padding="same", activation='relu', name='CNN_3'))
model1.add(Conv2D(64, (3, 3), padding="same", activation='relu', name='CNN_4'))
model1.add(MaxPooling2D(pool_size=(2, 2)))
model1.add(Dropout(0.2))

model1.add(Flatten())
model1.add(Dense(num_cases, activation='softmax', name='dense_head'))
### END SOLUTION

model1.summary()

In [None]:
# Plot your model
plot_model(model1)

## Train model

**Question:** Now that you have built your new model1, you will compile and train it. The requirements are as follows:
- Split your dataset `X_train` into 80% training data and 20% validation data. Set the `random_state` to 42. You can use `train_test_split()`
- Loss function: cross entropy; Metric: accuracy
- Number of training epochs: 10
- Save your training results in a variable named `history1`
- Plot your training results using the API `plotTrain()`


In [None]:
# Train the model using the API
model_name1 = "4CNNs + Head"

### BEGIN SOLUTION
model1.compile(loss='categorical_crossentropy', metrics=['accuracy'])
history1 = model1.fit(X_train_, y_train_, epochs=10, validation_data=(X_val_, y_val_))
fig, axs = helper.plotTrain(history1, model_name1)
### END SOLUTION

**Expected outputs (there may be some differences):**  
<table> 
    <tr> 
        <td>  
            Training accuracy
        </td>
        <td>
         0.9903
        </td>
    </tr>
    <tr> 
        <td>
            Validation accuracy
        </td>
        <td>
         0.9750
        </td>
    </tr>

</table>

The graphs of your loss and accuracy curves:
<img src='./images/4CNNs_model_loss_accuracy.png' style='width:600px;height:300px;'>

We can see that the new model performs better than the previous one. Both the training accuracy and the validation accuracy are higher than before. What's more, the validation accuracy curve is still trending upward, which means our new model is still learning!

## How many weights in this model ?

**Question:** Calculate the number of parameters in your new model.  
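
As a hand check of the count, the sketch below walks through the four Convolutional layers and the head. It assumes 3x3 kernels, an 80x80x3 input, "same" padding, a 2x2 pooling layer after CNN_2 and CNN_4, and a 2-unit head; a different kernel size will give different numbers. Pooling, `Dropout` and `Flatten` layers add no weights.

```python
def conv2d_params(kernel, in_channels, filters):
    """Weights of a square-kernel Conv2D layer: (k*k*in_channels + 1 bias) per filter."""
    return (kernel * kernel * in_channels + 1) * filters

side, channels = 80, 3  # assumed input: 80x80 pixels, 3 color channels

# (layer name, number of filters, is it followed by 2x2 pooling?)
conv_layers = [("CNN_1", 32, False), ("CNN_2", 32, True),
               ("CNN_3", 64, False), ("CNN_4", 64, True)]

counts = {}
for name, filters, pool_after in conv_layers:
    counts[name] = conv2d_params(3, channels, filters)
    channels = filters      # this layer's filters become the next layer's input channels
    if pool_after:
        side //= 2          # pooling halves each spatial dimension

# Head: one weight per flattened input plus a bias, for each of the 2 output units.
counts["dense_head"] = (side * side * channels + 1) * 2

for name, n in counts.items():
    print(name, n)
print("Total:", sum(counts.values()))   # 116770
```

Under these assumptions the head sees a 20x20x64 map (25,600 values). Without any pooling, a head fed directly from an 80x80x32 map would need (80*80*32 + 1) * 2 = 409,602 weights on its own, so adding layers together with pooling can actually lower the total number of weights.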

In [None]:
# Set num_parameters1 equal to the number of weights in the model
num_parameters1 = None

### BEGIN SOLUTION
num_parameters1 = model1.count_params()
### END SOLUTION

print('Parameters number in model1:', num_parameters1)

## Evaluate the model

**Question:** Now that the new model is trained, evaluate it on the test dataset. Please store the model score in a variable named `score1`.   

In [None]:
score1 = []

### BEGIN SOLUTION
score1 = model1.evaluate(X_test, y_test, verbose=0)
### END SOLUTION

print("{n:s}: Test loss: {l:3.2f} / Test accuracy: {a:3.2f}".format(n=model_name1, l=score1[0], a=score1[1]))

Your test accuracy should be higher than before

## Save your trained model1 and history1

In [None]:
helper.saveModel(model1, model_name1)
helper.saveModelNonPortable(model1, model_name1)
helper.saveHistory(history1, model_name1)

## Your own model (Optional)
Now you can build your own model using what you have learned from the course. The things you can try are:
- Add `Dropout()` layer and change the parameter 
- Add `BatchNormalization()` layer
- Add pooling layer, `MaxPooling2D` or `AveragePooling2D`
- Change the activation function
- Change the kernel size in Convolutional layers
- Change the number of features of Convolutional layers
- ...

Try to see how your model will change!

## Now Submit your assignment!
Please click on the blue button <span style="color: blue;"> **Submit** </span> in this notebook. 