# Problem description

In the last assignment, we created fully connected neural networks to
solve the task of classifying whether a ship is present in a satellite photo.

This assignment will address the same task, but using Convolutional Neural Network layers.


## Goal: 

In this notebook, you will need to create a model in `TensorFlow/Keras` to classify satellite photos.
- The features are images: a 3-dimensional collection of pixels
  - 2 spatial dimensions
  - 1 dimension with 3 features for different parts of the color spectrum: Red, Green, Blue
- The labels are either 1 (ship is present) or 0 (ship is not present)

There are two notebook files in this assignment:
- The one you are viewing now: First and only notebook you need to work on. 
    - Train your models here
    - There are cells that will save your models to a file
- **`Model_test.ipynb`**:
    - This notebook will retrieve the saved model and test your results.

In this `Ships_in_satellite_images_P2.ipynb` notebook, you will need to create CNN models in Keras to classify satellite photos.

## Learning objectives
- Learn how to construct Neural Networks in a Keras Sequential model that uses Convolutional layer types.
- Appreciate how layer choices impact number of weights

# Import modules

In [None]:
## Standard imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import sklearn

import os
import math

%matplotlib inline

## Import tensorflow
import tensorflow as tf
from tensorflow.keras.utils import plot_model

print("Running TensorFlow version ",tf.__version__)

# Parse tensorflow version
import re

# Use a raw string so that "\." is not treated as an invalid string escape
version_match = re.match(r"([0-9]+)\.([0-9]+)", tf.__version__)
tf_major, tf_minor = int(version_match.group(1)) , int(version_match.group(2))
print("Version {v:d}, minor {m:d}".format(v=tf_major, m=tf_minor) )

# API for students

We have defined some utility routines in a file `helper.py`. There is a class named `Helper` in it.  

This will simplify problem solving.

More importantly, it adds structure to your submission so that it may be easily graded.

Create an instance with `helper = helper.Helper()`. Its methods include:

- getData: Get a collection of labelled images, used as follows

  >`data, labels = helper.getData()`
- scaleData: scale your input data

  >`X, y = helper.scaleData(data, labels)`
- showData: Visualize labelled images, used as follows

  >`helper.showData(data, labels)`
- plotTrain: Visualize the loss and accuracy on both the training and validation examples

  >`helper.plotTrain(history, modelName)`, where history is the result of model training
- saveModel: save a model in the `./models` directory

  >`helper.saveModel(model, modelName)`
- saveHistory: save a model history in the `./models` directory

  >`helper.saveHistory(history, modelName)`

In [None]:
## Load helper module

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

# Reload all modules imported with %aimport
%reload_ext autoreload
%autoreload 1

import helper
%aimport helper

helper = helper.Helper()

# Create the dataset

In [None]:
# Get the data
data, labels = helper.getData()
n_samples, width, height, channel = data.shape
print("Data shape: ", data.shape)
print("Labels shape: ", labels.shape)
print("Label values: ", np.unique(labels))

We will shuffle the examples before doing anything else.

This is usually a good idea:
- Many datasets are naturally arranged in a *non-random* order, e.g., examples with the same label grouped together
- You want to make sure that, when you split the examples into training and test examples, each split has a similar distribution of examples
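
To make the second point concrete, here is a minimal, dependency-free sketch. The toy label list and the `split_tail` function are hypothetical and are not part of `helper.py`. The sketch only illustrates what can happen when labels are grouped and the test set is taken from the end of the data without shuffling first.

```python
import random
from collections import Counter

# Hypothetical toy labels, grouped by class: 15 "no ship" followed by 5 "ship"
toy_labels = [0] * 15 + [1] * 5

def split_tail(items, test_fraction):
    """Hold out the last `test_fraction` of the items as a test set."""
    n_test = round(len(items) * test_fraction)
    return items[:-n_test], items[-n_test:]

# Without shuffling, the held-out tail contains only label 1
_, test_unshuffled = split_tail(toy_labels, 0.25)
print("Unshuffled test labels:", Counter(test_unshuffled))

# After shuffling (fixed seed for repeatability), the tail is a random sample
shuffled = toy_labels[:]
random.Random(42).shuffle(shuffled)
_, test_shuffled = split_tail(shuffled, 0.25)
print("Shuffled test labels:  ", Counter(test_shuffled))
```

In the unshuffled case the test set never sees a "no ship" example, so its accuracy would say nothing about that class.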

In [None]:
# Shuffle the data
data, labels = sklearn.utils.shuffle(data, labels, random_state=42)

## Have a look at the data

We will not go through all the steps in the Recipe, nor go into depth.

But here's a peek:

In [None]:
# Visualize the data samples
helper.showData(data[:25], labels[:25])

## Have a look at the data: Examine the image/label pairs

Rather than viewing the examples in random order, let's group them by label.

Perhaps we will learn something about the characteristics of images that contain ships.

We have loaded and shuffled our dataset; now we will take a look at image/label pairs.

Feel free to explore the data using your own ideas and techniques.

In [None]:
# Inspect some data (images)
num_each_label = 10

for lab in np.unique(labels):
    # Fetch images with different labels
    X_lab, y_lab = data[ labels == lab ], labels[ labels == lab]
    fig = helper.showData( X_lab[:num_each_label], [ str(label) for label in y_lab[:num_each_label] ])
    fig.suptitle("Label: "+  str(lab), fontsize=14)
    fig.show()
    print("\n\n")

# Make sure the features are in the range [0,1]  

Just as in our prior assignment, we need to
- Scale the image data so that pixel values are in the range between 0 and 1 (raw 8-bit pixel values lie between 0 and 255)
- Here we don't use one-hot encoding for the labels

Hopefully you have done this on your own in the prior assignment.
We will do it for you below.
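
As an illustration only: one common way to bring 8-bit pixel values into the range [0, 1] is to divide by 255. This sketch assumes that is roughly what `helper.scaleData` does for the features; the actual implementation lives in `helper.py` and may differ. The `toy_images` array is made up for the example.

```python
import numpy as np

# Hypothetical batch of 2 tiny RGB images (4x4 pixels) with 8-bit values
rng = np.random.default_rng(0)
toy_images = rng.integers(0, 256, size=(2, 4, 4, 3), dtype=np.uint8)

# Cast to float32 (a common dtype for network inputs),
# then divide by the largest possible 8-bit value
toy_scaled = toy_images.astype("float32") / 255.0

print("Before:", toy_images.dtype, toy_images.min(), toy_images.max())
print("After: ", toy_scaled.dtype, toy_scaled.min(), toy_scaled.max())
```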

In [None]:
# Scale the data
# Assign values for X, y
#  X: the array of features
#  y: the array of labels
# The length of X and y should be identical and equal to the length of data.
X, y = np.array([]), np.array([])
one_hot = False
X, y = helper.scaleData(data, labels, one_hot)

print('X shape: ', str(X.shape))
print('y.shape: ', str(y.shape))
print(y[0])

# Split data into training data and testing data

To train and evaluate a model, we need to split the original dataset into
a training subset (in-sample) and a test subset (out of sample).

We will do this for you in the cell below.


In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.10, random_state=42)

# Save X_train, X_test, y_train, y_test for further testing
if not os.path.exists('./data'):
    os.mkdir('./data')
np.savez_compressed('./data/train_test_data.npz', X_train=X_train, X_test=X_test, y_train=y_train, y_test=y_test)

# Create a simple CNN model 

**Question:** 

Create a Keras Sequential model:
- Set variable `model0` to be a Keras `Sequential` model object that implements your model.
- With a single Convolutional Layer providing 32 features
    - You may choose your own kernel size
    - Use "same" padding (`padding="same"`), so that the output keeps the spatial dimensions of the input
    - Name your Convolutional layer "CNN_1"
- Feeding a head layer implementing Classification  
    - Name your Dense layer (head layer) "dense_head"

**Hints:**
- The `input_shape` argument of the first layer should be the shape of a single example, which should be 3-dimensional. We don't need to flatten the data before feeding the Convolutional layer.
- Activation function for the head layer: Since this is a classification problem
    - Use  `sigmoid` if your target's final dimension equals 1
    - Use  `softmax` if your target's final dimension is greater than 1
- What is the shape of the output of the Convolutional layer? What should be the shape of the input to the Classification head?
    - You may want to flatten the output of the Convolutional layer before feeding the Classification head.
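
The shape question in the last hint can be worked out by hand. The sketch below is a rough aid, not part of the assignment API: `conv2d_output_shape` is a hypothetical function, and it assumes a stride of 1, a square 3x3 kernel, and 80x80 input images. With "same" padding the spatial size is unchanged; with "valid" padding (no padding) each spatial dimension shrinks by `kernel - 1`. In both cases the number of output channels equals the number of filters.

```python
def conv2d_output_shape(height, width, kernel, filters, padding="same"):
    """Output shape (height, width, channels) of a stride-1 2D convolution."""
    if padding == "same":
        out_h, out_w = height, width
    elif padding == "valid":
        out_h, out_w = height - kernel + 1, width - kernel + 1
    else:
        raise ValueError("padding must be 'same' or 'valid'")
    return out_h, out_w, filters

same_shape = conv2d_output_shape(80, 80, kernel=3, filters=32, padding="same")
valid_shape = conv2d_output_shape(80, 80, kernel=3, filters=32, padding="valid")

# A Dense head expects a vector per example, hence the Flatten layer
flattened = same_shape[0] * same_shape[1] * same_shape[2]
print(same_shape, valid_shape, flattened)
```

You can check the result against the output shapes printed by `model0.summary()`.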

In [None]:
# Get the number of unique labels
num_cases = np.unique(y).shape[0]

# Set the activation function
if y.ndim == 2 and num_cases >= 2:
    activation = 'softmax'
else:
    activation = 'sigmoid'
    num_cases = 1

# Set model0 equal to a Keras Sequential model
model0 = None

### BEGIN SOLUTION
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Dropout
from tensorflow.keras.layers import Conv2D, MaxPooling2D

model0 = Sequential()
model0.add(Conv2D(32, (3, 3), padding="same", input_shape=X.shape[1:], activation='relu', name='CNN_1'))
model0.add(Flatten())
model0.add(Dropout(0.2))
model0.add(Dense(num_cases, activation=activation, name='dense_head'))
### END SOLUTION

model0.summary()

In [None]:
# Plot your model
plot_model(model0)

## Train model


**Question:** Now that you have built your first model, you will compile and train it. The requirements are as follows:
- Name your model "CNN + Head" and store it in variable `model_name0`
- Split the **training** examples `X_train, y_train` again!
    - 80% will be used for training the model
    - 20% will be used as validation (out of sample) examples
    - Use `train_test_split()` from `sklearn` to perform this split
        -  Set the `random_state` parameter of `train_test_split()` to be 42

- Loss function: 
    - `binary_crossentropy` if your target is one-dimensional
    - `categorical_crossentropy` if your target is One Hot Encoded
- Metric: "accuracy"
- Use exactly 10  epochs for training
- Save your training results in a variable named `history`
- Plot your training results using the `plotTrain` method described in the Student API above.
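
Because the test examples were already held out earlier in the notebook (10% of the data), splitting the remaining training examples 80/20 leaves three disjoint subsets. A small sketch of the resulting proportions of the full dataset; the variable names are only illustrative:

```python
test_fraction = 0.10        # first split, done earlier in the notebook
validation_fraction = 0.20  # second split, applied to the remaining training examples

remaining = 1.0 - test_fraction
final_train = remaining * (1.0 - validation_fraction)
final_validation = remaining * validation_fraction

print("train: {:.0%}, validation: {:.0%}, test: {:.0%}".format(
    final_train, final_validation, test_fraction))
```

So the model is fit on roughly 72% of the original examples, monitored on 18%, and finally evaluated on the untouched 10%.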



In [None]:
# Set the name of model0
model_name0 = "CNN + Head"

### BEGIN SOLUTION
X_train_, X_val_, y_train_, y_val_ = train_test_split(X_train, y_train, test_size=0.2, random_state=42)
model0.compile(loss='binary_crossentropy', metrics=['accuracy'])
history = model0.fit(X_train_, y_train_, epochs=10, validation_data=(X_val_, y_val_))
fig, axs = helper.plotTrain(history, model_name0)
### END SOLUTION

The graphs of your loss and accuracy curves:
<img src='./images/CNN_model_loss_accuracy.png' style='width:600px;height:300px;'>

We can see that, compared to the prior assignment's model that used *only* a Classification head,
adding a CNN layer really seems to help to reduce loss and increase accuracy.

## How many weights are in the model?

**Question:** 

Calculate the number of parameters in your model
- Set `num_parameters0` equal to the number of weights in the model

**Hint:** You can use the model's `count_params()` method.
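
It can be instructive to predict the number before asking Keras for it. The sketch below counts weights by hand under stated assumptions: 80x80x3 input images, a single 3x3 convolution with 32 filters and "same" padding, then Flatten and a one-unit Dense head. The function names are hypothetical. If you chose a different kernel size, your count will differ.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Each filter has kernel_h * kernel_w * in_channels weights plus one bias."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(n_inputs, units):
    """Each unit has one weight per input plus one bias."""
    return (n_inputs + 1) * units

# Assumed setup: 80x80x3 input, 3x3 Conv2D with 32 filters, "same" padding,
# then Flatten and a single-unit Dense head.
# Flatten and Dropout layers add no weights.
conv = conv2d_params(3, 3, in_channels=3, filters=32)
head = dense_params(80 * 80 * 32, units=1)
print("Conv2D:", conv, " Dense head:", head, " Total:", conv + head)
```

Note that almost all of the weights sit in the head, because the flattened convolution output is very large.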

In [None]:
# Set num_parameters0 equal to the number of weights in the model
num_parameters0 = None

### BEGIN SOLUTION
num_parameters0 = model0.count_params()
### END SOLUTION

print("Parameters number in model0: ", num_parameters0)

## Evaluate the model


**Question:**

We have trained our model. We now need to  evaluate the model using the test dataset created in an earlier cell.

Please store the model score in a variable named `score0`.   

**Hint:** The model object has a method  `evaluate`.  Use that to compute the score.
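
For a model compiled with a loss and the "accuracy" metric, the score is a pair: loss first, accuracy second. To build intuition for those two numbers, here is a hand-rolled sketch for the binary case. The labels and probabilities are invented, and the functions are simplified stand-ins rather than the Keras implementations.

```python
import math

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped away from 0 and 1."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)

def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the label."""
    hits = sum(int(p > threshold) == t for t, p in zip(y_true, y_prob))
    return hits / len(y_true)

# Hypothetical labels and sigmoid outputs for four images
toy_true = [1, 0, 1, 0]
toy_prob = [0.9, 0.2, 0.4, 0.1]

toy_score = [binary_crossentropy(toy_true, toy_prob),
             binary_accuracy(toy_true, toy_prob)]
print("loss: {:.2f} / accuracy: {:.2f}".format(*toy_score))
```

The third image is misclassified (probability 0.4 for a true label of 1), which lowers the accuracy to 0.75 and contributes the largest term to the loss.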

In [None]:
score0 = []

### BEGIN SOLUTION
score0 = model0.evaluate(X_test, y_test, verbose=0)
### END SOLUTION

print("{n:s}: Test loss: {l:3.2f} / Test accuracy: {a:3.2f}".format(n=model_name0, l=score0[0], a=score0[1]))

## Save the trained model0 and history for submission

In [None]:
helper.saveModel(model0, model_name0)
helper.saveHistory(history, model_name0)

In [None]:
## Restore the model (make sure that it works)
model_loaded = helper.loadModel(model_name0)
score_loaded = model_loaded.evaluate(X_test, y_test, verbose=0)

# Compare with a tolerance rather than requiring exact floating-point equality
assert np.allclose(score_loaded, score0)

# Create a model with 4 Convolutional layers

**Question:** 

We will now create a model with more Convolutional layers.
- Set variable `model1` to be a Keras `Sequential` model object that implements your model.
- Use **4** Convolutional layers.
    - You may choose your own kernel size
    - Use "same" padding (`padding="same"`), so that the output keeps the spatial dimensions of the input
    - ReLU activation functions for the Convolutional layers

    - The first two Convolutional layers should have 32 features each.
        - Please name these layers "CNN_1" and "CNN_2"
    - The last two Convolutional layers should have 64 features each.
        - Please name these layers "CNN_3" and "CNN_4"
- Insert a `MaxPooling` layer after every two Convolutional layers (e.g., after CNN_2 and CNN_4)
    - to reduce each spatial dimension by a factor of 2  

- Please name your head layer "dense_head". 

**Hints:**
- Don't forget to flatten the output of the layer feeding the Classification head 
- A Dropout layer may be helpful to prevent overfitting.
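
Before building the model, it may help to trace how the shape of one example changes from layer to layer. The sketch below is a hypothetical helper, not Keras code. It assumes 80x80x3 inputs, "same" padding with stride 1 for every convolution, and non-overlapping 2x2 pooling.

```python
def trace_shapes(input_shape, layers):
    """Follow (height, width, channels) through 'same'-padded convs and pools."""
    h, w, c = input_shape
    shapes = [("input", (h, w, c))]
    for kind, arg in layers:
        if kind == "conv":      # "same" padding, stride 1: only channels change
            c = arg
        elif kind == "pool":    # non-overlapping pooling window of size `arg`
            h, w = h // arg, w // arg
        shapes.append((kind, (h, w, c)))
    return shapes

# Assumed architecture: 80x80x3 input, layers as specified in the question
layers = [("conv", 32), ("conv", 32), ("pool", 2),
          ("conv", 64), ("conv", 64), ("pool", 2)]
shapes = trace_shapes((80, 80, 3), layers)
for name, shape in shapes:
    print(name, shape)

final_h, final_w, final_c = shapes[-1][1]
print("Flattened size:", final_h * final_w * final_c)
```

Each pooling layer halves both spatial dimensions, so the head receives far fewer inputs than it would if it were fed directly by an 80x80 feature map.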


In [None]:
# Set model1 equal to a Keras Sequential model
model1 = None

### BEGIN SOLUTION
model1 = Sequential()
model1.add(Conv2D(32, (3, 3), padding="same", input_shape=(80, 80, 3), activation='relu', name='CNN_1'))
model1.add(Conv2D(32, (3, 3), padding="same", activation='relu', name='CNN_2'))
model1.add(MaxPooling2D(pool_size=(2, 2)))
model1.add(Dropout(0.2))

model1.add(Conv2D(64, (3, 3), padding="same", activation='relu', name='CNN_3'))
model1.add(Conv2D(64, (3, 3), padding="same", activation='relu', name='CNN_4'))
model1.add(MaxPooling2D(pool_size=(2, 2)))
model1.add(Dropout(0.2))

model1.add(Flatten())
model1.add(Dense(num_cases, activation=activation, name='dense_head'))
### END SOLUTION

model1.summary()

In [None]:
# Plot your model
plot_model(model1)

## Train model
**Question:**

Train your new model following the same instructions as given for training the first model. **Except**
- Save your training results in a variable named `history1`
- Name your new model "4CNNs + Head" and store it in variable `model_name1`


In [None]:
# Train the model using the API
model_name1 = "4CNNs + Head"

### BEGIN SOLUTION
model1.compile(loss='binary_crossentropy', metrics=['accuracy'])
history1 = model1.fit(X_train_, y_train_, epochs=10, validation_data=(X_val_, y_val_))
fig, axs = helper.plotTrain(history1, model_name1)
### END SOLUTION

The graphs of your loss and accuracy curves:
<img src='./images/4CNNs_model_loss_accuracy.png' style='width:600px;height:300px;'>

Hopefully, your new model will have improved Loss and Accuracy metrics compared to your first model.

## How many weights are in this model?

**Question:** 

Calculate the number of parameters in your new model.  
- Set `num_parameters1` equal to the number of weights in the model
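
As a cross-check for `count_params()`, the per-layer counts can be worked out by hand. This sketch assumes 80x80x3 inputs, 3x3 kernels, "same" padding, and 2x2 pooling after CNN_2 and CNN_4; `conv2d_params` is a hypothetical helper. Different kernel sizes will give different numbers.

```python
def conv2d_params(kernel, in_channels, filters):
    """Weights plus biases of a Conv2D layer with a square kernel."""
    return (kernel * kernel * in_channels + 1) * filters

# Assumptions: 80x80x3 input, 3x3 kernels, "same" padding,
# 2x2 pooling after CNN_2 and CNN_4
counts = {
    "CNN_1": conv2d_params(3, 3, 32),
    "CNN_2": conv2d_params(3, 32, 32),
    "CNN_3": conv2d_params(3, 32, 64),
    "CNN_4": conv2d_params(3, 64, 64),
    # Two 2x2 poolings shrink 80x80 to 20x20;
    # the head has one weight per input plus a bias
    "dense_head": 20 * 20 * 64 + 1,
}
for name, n in counts.items():
    print("{:<10s} {:>7d}".format(name, n))
print("Total     ", sum(counts.values()))
```

Under the same assumptions, a single 3x3 convolution with 32 filters feeding the head directly would need 896 + 204,801 = 205,697 weights. The deeper model has fewer weights overall because pooling shrinks the input to the Dense head.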

In [None]:
# Set num_parameters1 equal to the number of weights in the model
num_parameters1 = None

### BEGIN SOLUTION
num_parameters1 = model1.count_params()
### END SOLUTION

print('Parameters number in model1:', num_parameters1)

## Evaluate the model

Evaluate your new model following the same instructions as given for evaluating the first model.
- **Except**: store the model score in a variable named `score1`.  

In [None]:
score1 = []

### BEGIN SOLUTION
score1 = model1.evaluate(X_test, y_test, verbose=0)
### END SOLUTION

print("{n:s}: Test loss: {l:3.2f} / Test accuracy: {a:3.2f}".format(n=model_name1, l=score1[0], a=score1[1]))

Your test accuracy should be higher than before.

## Save your trained model1 and history1

In [None]:
helper.saveModel(model1, model_name1)
helper.saveHistory(history1, model_name1)

## Your own model (Optional)

Now you can build your own model using what you have learned from the course. Some ideas to try:
- Change the kernel size in Convolutional layers
- Change the number of features of Convolutional layers
- Experiment with different pooling layers: `MaxPooling2D` and `AveragePooling2D`
- Change the activation function

Observe the effect of each change on the Loss and Accuracy.


## Now Submit your assignment!
Please click on the blue button <span style="color: blue;"> **Submit** </span> in this notebook. 