![Intel Logo](./images/logo.png)

# Lecture 02. ANN Overview

### Contents :
   1. Lecture 02 Overview <br>
      1.1. Importing libraries <br>
   2. Keras <br>
   3. The Iris Dataset <br>
   4. Preparing, Compiling and Training the Model <br>
   5. Validating and Testing the Model: Checking for Overfitting<br>
   6. Dealing with Overfitting: Dropout, Batch Normalization

# 1. Lecture 02 Overview

This lecture aims to explore preparing, compiling, training and improving the model using Keras.<br>
The intended learning outcomes are:<br>

*   Understanding and applying the Keras API
*   Understanding overfitting and methods to overcome it

## 1.1 Importing libraries

First, import the needed libraries.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf 
import pandas as pd
import seaborn as sns
from skimage import io

from tensorflow.keras import utils as np_utils  # keras.utils.np_utils is not available in recent Keras releases
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Dropout, BatchNormalization
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split

# 2. Keras

Keras is an Artificial Neural Network (ANN) library written in Python. Its intuitive API lets you create a variety of deep learning models in just a few lines of code. A model is built by combining independent modules: neural network layers, loss functions, activation functions, and optimization algorithms, all of which are core concepts in deep learning.

Let's apply what we have learned so far using the Keras API and the Iris Dataset. To perform the classification task covered previously, we will use the small but reliable Iris Dataset, which is widely used in classification examples.

# 3. The Iris Dataset

Iris flowers are divided into several varieties according to the shape or length of the calyx or petal. In the Iris Dataset, the flowers are divided into three types, each described by the widths and lengths of its sepals (the parts that make up the calyx) and its petals.

<p align="center">
  <img src='./images/iris.png' width=700/>
</p>

After executing the cell below, the displayed figure shows how the three types of flowers are distributed across each pair of attributes (sepal width/length, petal width/length).

In [None]:
## Loading the Iris Dataset
df = pd.read_csv('./iris.csv', names = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"])

## Checking the dataframe 
print(df.head(10))

## Iris Data visualization (identifying attribute-variety distribution)
sns.pairplot(df, hue='species');
plt.show()

# 4. Preparing, Compiling and Training the Model  

We now want to design a model that properly distinguishes the flowers and then evaluate its accuracy. Since the task is to choose one of three varieties, it is a multi-class classification problem, so we set the number of nodes in the output layer to 3.
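
To make the role of the three output nodes concrete, here is a minimal NumPy sketch of what a softmax output and the categorical cross-entropy loss compute. It is an illustration only, not Keras' internal implementation; the scores and targets are hypothetical values, and the `eps` clipping constant is an assumption chosen for numerical safety.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 along the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)  # avoids overflow in exp
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean cross-entropy between one-hot targets and predicted probabilities."""
    y_prob = np.clip(y_prob, eps, 1.0)  # guard against log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_prob), axis=-1)))

# Hypothetical raw scores for two flowers over the three classes.
logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])
# One-hot targets: the first flower is class 0, the second is class 2.
y_true = np.array([[1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0]])

probs = softmax(logits)
loss = categorical_crossentropy(y_true, probs)
print(probs.argmax(axis=1))  # predicted class index per flower
print(loss)
```

Each row of `probs` holds one probability per class, which is why the output layer needs exactly as many nodes as there are classes.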

The ratio of the training set and the testing set is set to 80:20.
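
As a rough check of how many samples each stage sees, the sketch below works through the split arithmetic. It assumes the standard 150-sample Iris Dataset and combines the 80:20 split with the `validation_split=0.25` used in the training call of this section. The helper `split_sizes` is hypothetical, and it ignores the exact rounding rules of scikit-learn and Keras, which only matter when the fractions do not divide evenly.

```python
def split_sizes(n_samples, test_size=0.20, validation_split=0.25):
    """Return (train, validation, test) counts for a two-stage split."""
    n_test = round(n_samples * test_size)    # held out by train_test_split
    n_fit = n_samples - n_test               # samples passed to model.fit
    n_val = round(n_fit * validation_split)  # held out again inside model.fit
    n_train = n_fit - n_val                  # samples that actually update the weights
    return n_train, n_val, n_test

print(split_sizes(150))  # (90, 30, 30)
```

Under these assumptions only 90 flowers drive the weight updates, which is one reason a large network can overfit this dataset quickly.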

In [2]:
## Converting the data into numeric values
dataset = df.values
X = dataset[:,0:4].astype(float)
Y = dataset[:,4]
char_to_num = LabelEncoder()
char_to_num.fit(Y)
Y = char_to_num.transform(Y)
Y = np_utils.to_categorical(Y)

## Splitting the dataset into training(80%) and testing(20%) sets
X_train, X_test, Y_train, Y_test = train_test_split(
  X,
  Y,
  test_size=0.20
)

## Building the model
model = Sequential([
    Dense(64, input_shape=(4,), activation="relu"),
    ##  QUIZ: Set the number of hidden layer nodes to 128 and use the relu activation function
    
    Dense(128, activation='relu'),
    Dense(64, activation='relu'),
    Dense(64, activation='relu'),
    ##  QUIZ: Set the number of output layer nodes to 3 and use the softmax activation function
    Dense(3, activation='softmax')
])

model.summary()

## Compiling the model: categorical cross-entropy is a loss function suited to multi-class classification.
model.compile(loss='categorical_crossentropy',
            optimizer='adam',
            metrics=['accuracy'])

## Training the model
history = model.fit(X_train, Y_train, epochs=200, validation_split=0.25, batch_size=10, verbose=2)

## Print the output
# print("\n Accuracy: %.5f" % (model.evaluate(X_train, Y_train)[1]))
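
The `accuracy` metric requested in `model.compile` can be understood with a small NumPy sketch: for one-hot labels, a prediction counts as correct when the most probable class matches the labelled class. The function name and the sample arrays below are hypothetical; in this notebook the natural inputs would be the held-out `Y_test` and the model's predicted probabilities for `X_test`.

```python
import numpy as np

def categorical_accuracy(y_true_onehot, y_prob):
    """Fraction of rows where the most probable class equals the labelled class."""
    true_cls = np.argmax(y_true_onehot, axis=1)
    pred_cls = np.argmax(y_prob, axis=1)
    return float(np.mean(true_cls == pred_cls))

# Hypothetical one-hot labels and predicted probabilities for four flowers.
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1],
                   [0, 1, 0]])
y_prob = np.array([[0.8, 0.1, 0.1],
                   [0.2, 0.5, 0.3],
                   [0.3, 0.4, 0.3],   # misclassified: true class is 2
                   [0.1, 0.7, 0.2]])

acc = categorical_accuracy(y_true, y_prob)
print(acc)  # 0.75
```

Measuring this on data the model never saw during training gives a more honest estimate than the training accuracy.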

# 5. Validating and Testing the Model: Checking for Overfitting

Overfitting refers to the phenomenon in which the model shows high accuracy on the training data but does not perform well on new data. It tends to appear when the model is too complex for the amount of available data, for example when it has too many layers or parameters. Refer to the figure below.

In [None]:
## Checking for overfitting
train_metrics = history.history['loss']
val_metrics = history.history['val_loss']
epochs = range(1, len(train_metrics) + 1)
plt.plot(epochs, train_metrics)
plt.plot(epochs, val_metrics)
plt.title('Training and validation loss')
plt.xlabel("Epochs")
plt.ylabel('loss')
plt.legend(['train_loss', 'val_loss'])
plt.show()

Referring to the figure above, the validation loss is higher than the training loss, and while the training loss keeps decreasing, the validation loss increases, which is a signal of overfitting.
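
The same visual check can be approximated numerically. The sketch below is a simple heuristic, not a standard Keras utility: it finds the epoch with the lowest validation loss and flags overfitting when, after that epoch, the training loss kept falling while the validation loss ended higher. The function names are hypothetical and the loss curves are synthetic, made up purely for illustration.

```python
import numpy as np

def best_epoch(val_loss):
    """1-based epoch at which the validation loss is lowest."""
    return int(np.argmin(val_loss)) + 1

def looks_overfit(train_loss, val_loss):
    """Heuristic: training loss keeps falling after the best validation epoch,
    while validation loss finishes above its minimum."""
    train_loss = np.asarray(train_loss, dtype=float)
    val_loss = np.asarray(val_loss, dtype=float)
    best = int(np.argmin(val_loss))
    if best == len(val_loss) - 1:
        return False  # validation loss was still improving at the last epoch
    return bool(train_loss[-1] < train_loss[best] and val_loss[-1] > val_loss[best])

# Synthetic curves: training loss always falls, validation loss turns upward after epoch 10.
epochs = np.arange(1, 51)
train = 1.0 / epochs
val = 0.3 + (epochs - 10) ** 2 / 1000.0

print(best_epoch(val))            # 10
print(looks_overfit(train, val))  # True
```

With the notebook's own run, `history.history['loss']` and `history.history['val_loss']` could be passed in place of the synthetic arrays.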

# 6. Dealing with Overfitting: Dropout, Batch Normalization

We will apply Dropout and Batch Normalization to overcome overfitting. Dropout randomly drops some of the nodes of a hidden layer during training. For example, Dropout(0.1) drops 10% of the nodes in the respective layer at each training step. Batch normalization, on the other hand, normalizes the activations of a layer over each mini-batch so that their distribution stays stable. Even when the mean and variance of the data differ from batch to batch, this mitigates the vanishing/exploding gradient problem to some extent.
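
To illustrate what dropping nodes means in practice, here is a minimal NumPy sketch of inverted dropout, the variant in which the surviving activations are scaled up by `1 / (1 - rate)` during training so that their expected value is unchanged. The function name and the all-ones input are assumptions for illustration; this is not the Keras layer itself.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Zero a random fraction `rate` of the activations and rescale the rest."""
    if not training or rate == 0.0:
        return activations  # dropout is inactive at inference time
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob  # True where a node survives
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
x = np.ones((1000, 64))  # hypothetical activations of a 64-node layer
dropped = dropout(x, rate=0.5, rng=rng)

print(np.mean(dropped == 0))  # roughly half of the activations are zeroed
print(dropped.mean())         # mean stays close to 1 thanks to the rescaling
```

Because a different random mask is drawn at every step, the network cannot rely on any single node, which is the intuition behind dropout's regularizing effect.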

In [None]:
model = Sequential([
    Dense(64, input_shape=(4,), activation="relu"),
    ##  QUIZ: Drop 50% of the nodes in the respective layers

    Dense(128, activation='relu'),
    Dropout(0.5),
    Dense(128, activation='relu'),
    Dropout(0.5),
    Dense(64, activation='relu'),
    Dropout(0.5),
    Dense(64, activation='relu'),
    Dropout(0.5),
    Dense(3, activation='softmax')
])

model.summary()

## Compiling the model
model.compile(loss='categorical_crossentropy',
            optimizer='adam',
            metrics=['accuracy'])

## Training the model
history = model.fit(X_train, Y_train, epochs=200, validation_split=0.25, batch_size=10, verbose=2)

In [None]:
train_metrics = history.history['loss']
val_metrics = history.history['val_loss']
epochs = range(1, len(train_metrics) + 1)
plt.plot(epochs, train_metrics)
plt.plot(epochs, val_metrics)
plt.title('Training and validation loss')
plt.xlabel("Epochs")
plt.ylabel('loss')
plt.legend(['train_loss', 'val_loss'])
plt.show()

Compared with the figure in section 5, the training loss and the validation loss are now similar and both decrease, indicating that overfitting can be mitigated to some extent through dropout.

Next, we want to overcome overfitting through Batch Normalization.
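
Before building the model, a minimal NumPy sketch of the training-time batch normalization computation may help. Each feature is standardized using the mean and variance of the current mini-batch, then rescaled by a learnable `gamma` and shifted by a learnable `beta`. The function name, the random input, and the `eps` value are illustrative assumptions, and the moving averages that a real layer uses at inference time are left out.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-3):
    """Normalize each feature over the batch axis, then scale and shift."""
    mean = x.mean(axis=0)                    # per-feature batch mean
    var = x.var(axis=0)                      # per-feature batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)  # roughly zero mean, unit variance
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
# Hypothetical mini-batch: 32 samples, 4 features, far from zero mean and unit variance.
x = rng.normal(loc=5.0, scale=3.0, size=(32, 4))
out = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))

print(out.mean(axis=0))  # close to 0 for every feature
print(out.std(axis=0))   # close to 1 for every feature
```

Keeping the inputs of each layer in a consistent range is what lets gradients flow more reliably through the stacked `Dense` layers.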

In [None]:
model = Sequential([
    Dense(64, input_shape=(4,), activation="relu"),
    BatchNormalization(),
    Dense(128, activation='relu'),
    BatchNormalization(),
    Dense(128, activation='relu'),
    BatchNormalization(),
    Dense(64, activation='relu'),
    BatchNormalization(),
    Dense(64, activation='relu'),
    BatchNormalization(),
    Dense(3, activation='softmax')
])

model.summary()

## Compiling the model
model.compile(loss='categorical_crossentropy',
            optimizer='adam',
            metrics=['accuracy'])

## Training the model
history = model.fit(X_train, Y_train, epochs=200, validation_split=0.25, batch_size=10, verbose=2)

In [None]:
train_metrics = history.history['loss']
val_metrics = history.history['val_loss']
epochs = range(1, len(train_metrics) + 1)
plt.plot(epochs, train_metrics)
plt.plot(epochs, val_metrics)
plt.title('Training and validation loss')
plt.xlabel("Epochs")
plt.ylabel('loss')
plt.legend(['train_loss', 'val_loss'])
plt.show()