<a href="https://colab.research.google.com/github/poiuyytee/amulya/blob/main/Neural_nets_with_Keras_notes.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Week 5.1 - Neural Nets with Keras


# Training Neural Networks with Keras

<br />

## Tools in the Deep Learning Ecosystem

<img src="img/deep_learning_ecosystem.jpeg" />

<br />

* Hardware (GPU vs CPU): at the core hardware level, CPUs and GPUs execute the instructions.

    <br />

    * A CPU executes long, sequential streams of instructions quickly but can run only a small number of them in parallel
    
    <br />
    
    * A GPU can execute hundreds of small instructions in parallel
    
<img src = 'img/cpu_vs_gpu.png' />

* Deep learning involves many linear algebra computations that can run in parallel, so GPUs are often much faster than CPUs for these workloads

<br />

* Libraries such as BLAS and CUDA provide routines that are optimised for a specific processor's instruction set, which accelerates common computations

    <br />
    
    * Basic Linear Algebra Subprograms (BLAS) is a specification of low-level routines for common linear algebra operations. R, NumPy and MATLAB use BLAS implementations to accelerate linear algebra. Ex: Intel's Math Kernel Library (MKL) includes an implementation of BLAS
    
    <br />
    
    * CUDA is a platform created by NVIDIA that helps programmers write software to perform general-purpose computing tasks on GPUs. Most deep learning libraries use CUDA
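
As a rough illustration of what an optimised linear algebra routine replaces, the sketch below multiplies two small matrices with a naive Python triple loop and with NumPy's `@` operator, which typically dispatches to whichever BLAS implementation NumPy was built against. The function name `naive_matmul` and the matrix sizes are illustrative assumptions, not part of the original notes.

```python
import numpy as np

def naive_matmul(a, b):
    """Multiply two 2-D arrays with explicit Python loops (no BLAS)."""
    n, k = a.shape
    k2, m = b.shape
    assert k == k2, "inner dimensions must match"
    out = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            for p in range(k):
                out[i, j] += a[i, p] * b[p, j]
    return out

rng = np.random.default_rng(0)
a = rng.random((8, 5))
b = rng.random((5, 3))

slow = naive_matmul(a, b)   # one scalar multiply-add at a time
fast = a @ b                # vectorized matrix product
print(np.allclose(slow, fast))
```

Both versions compute the same product. The difference is only in how the work is scheduled on the hardware, which is what matters for large matrices.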

<br />

* Libraries that support automatic differentiation, such as Theano, TensorFlow, CNTK and PyTorch, were developed to help compute gradients through stacked layers.

* These libraries work at the level of individual operations (dot product, etc.), so the code is low level and slow to write when quickly prototyping new networks
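
To give a feel for what these libraries automate, here is a minimal sketch of a gradient computed by hand with the chain rule for a single sigmoid neuron, checked against a finite-difference estimate. The squared-error loss, the function names and the numeric values are assumptions chosen for illustration. A real library derives such gradients automatically for every operation in the network.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, b, x, y):
    """Squared error of a single sigmoid neuron on one sample."""
    return (sigmoid(w * x + b) - y) ** 2

def grad_w(w, b, x, y):
    """dLoss/dw via the chain rule: dL/dp * dp/dz * dz/dw."""
    p = sigmoid(w * x + b)
    return 2.0 * (p - y) * p * (1.0 - p) * x

w, b, x, y = 0.5, -0.2, 1.5, 1.0   # illustrative values
eps = 1e-6

# Central finite difference as an independent check of the hand-derived gradient
numeric = (loss(w + eps, b, x, y) - loss(w - eps, b, x, y)) / (2 * eps)
analytic = grad_w(w, b, x, y)
print(numeric, analytic)
```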

<br />

<img src='img/deep_learning_frameworks.png' width='550px'/>

<br />

* That is where Keras comes into the picture. It is a high-level library with the abstraction at the __layer__ level. It was built as an abstraction over Theano, and support for TensorFlow and CNTK was added later

* So you can write code in Keras that can run on any of these three deep learning library __backends__

## The Building Blocks of Neural Networks

### Layers, Data and Learning Representations

<br />

* __Layers__: logically grouped operations in a neural network. The parameters of the operations in a layer are learned so that the layer generates the features that best predict the target

<img src='img/nn_layers.jpeg' width='400px'/>


<br />

* The network needs to have __input data__ and corresponding __targets (y)__

<br />

* In traditional machine learning we see that changing the representation of the data (kernel trick, etc.) helps ease the process of learning from data

<br />

* The __activation function__ adds non-linearity and, in combination with the weights (parameters) of a layer, lets the network learn a better representation of the data at each layer

<br />

### So basically, each layer takes data as input and spits out transformed data as output, simple as that. Now, let's dive into the details

<img src='img/learning_representations_mlp.jpg' />

<br />

* The goal of training a neural network is to find these ideal representations of the data, which we get by "learning" the right weights

<img src='img/learning_weights.jpg' />

<br />

* The loss function, which defines the feedback signal used for learning, helps gauge __how different the predicted targets are from the true targets__

<img src='img/loss_function.jpg' />

<br />

* The optimizer, based on the feedback signal from the loss function, changes the parameters / weights of the network so that the predictions get as close to the targets as possible (minimizing the loss function)

<br />

<img src='img/building_blocks_of_neural_networks.jpg' />

## The Keras Interface

<br />

* There are two major ways to define and run neural networks using the Keras API
    
    <br />
    
    * Sequential API
    
    <br />
    
    * Functional API

<br />

### Keras API

<img src="img/keras_interface.jpg" width='550px'/>

<br />

<img src='img/keras_sequential_api.jpg' />

<br />

### The Functional API: Chaining the layers / callable layers

* The functional API allows us to build complex graph networks. We can keep chaining the layers as functions, and finally the `Model(inputs, outputs)` class connects all the various inputs and outputs

<br />

<img src="img/functional_api_bimodal_network.png" width='450px'/>



## Layers in Keras

* There are different categories of layers in Keras, the most commonly used ones are "Dense Layers", "Convolution Layers", "Recurrent Layers", etc.

In [None]:
from keras.layers import Dense

* Let's break down what a layer built with the `Dense` class from Keras computes when it uses a sigmoid activation

<br />

* The OUTPUT of a Dense layer with a sigmoid activation is given by `sigmoid(dot(INPUT, W) + b)`. Before training starts, W (the weights) is randomly "initialized"

<br />

* Every input to the Dense layer (also known as fully connected) is connected to every unit in the hidden layer, as shown in the figure below

<br />

<img src ='img/fc_dense_layers_keras.jpg' />

<br />

* We will see many more categories and types of __Layers__ in keras, the __"Dense"__ layer is just one such class
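
The computation of a sigmoid Dense layer described above can be sketched in NumPy as follows. This is an illustration rather than the Keras implementation. The layer sizes, the uniform initialization range and the helper name `dense_forward` are assumptions. The weight matrix has shape `(n_inputs, n_units)`, so the product is written as `inputs @ W`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dense_forward(inputs, W, b):
    """Forward pass of a fully connected layer: sigmoid(inputs @ W + b)."""
    return sigmoid(inputs @ W + b)

rng = np.random.default_rng(42)
n_inputs, n_units = 3, 2                                  # illustrative sizes
W = rng.uniform(-0.05, 0.05, size=(n_inputs, n_units))    # random initial weights
b = np.zeros(n_units)                                     # biases start at zero

batch = np.array([[1.0, 2.0, 3.0],
                  [0.0, 0.0, 0.0]])                       # two samples, three features each
out = dense_forward(batch, W, b)
print(out.shape)   # (2, 2): one row per sample, one column per unit
```

Every input feature contributes to every unit through one entry of `W`, which is what "fully connected" means.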

## Method 1: The Keras Sequential API: Stacking the layers together

<br />

* The sequential api allows us to __quickly stack layers__ and build networks


* The keras sequential api enables us to build common yet complex neural network architectures flexibly

<br />

* Objects of the Keras Sequential class can have multiple neural network layers stacked on top of one another

<br />

<img src='img/keras_sequential_api.jpg' />



* You can create a Keras sequential model by passing in a list of layers to the Sequential object

In [None]:
#example 1
from keras.models import Sequential
from keras.layers import Dense

model1 = Sequential([
    Dense(2, input_shape=(2,), activation = 'sigmoid'),
    Dense(1, activation = 'sigmoid')
])

In [None]:
model1.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 2)                 6         
                                                                 
 dense_1 (Dense)             (None, 1)                 3         
                                                                 
Total params: 9
Trainable params: 9
Non-trainable params: 0
_________________________________________________________________


<img src='img/Dense_prams1.jpeg' width='550px' />
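
The `Param #` column in the summary can be reproduced by hand: a Dense layer has one weight per (input, unit) pair plus one bias per unit. The helper names below are hypothetical and only serve this sketch.

```python
def dense_param_count(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

def total_params(layer_sizes):
    """Sum the parameters of consecutive Dense layers, e.g. [2, 2, 1]."""
    return sum(dense_param_count(n_in, n_out)
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# model1: 2 inputs -> Dense(2) -> Dense(1)
print(dense_param_count(2, 2), dense_param_count(2, 1))   # 6 3
print(total_params([2, 2, 1]))                            # 9
```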

* But adding more layers in a single list soon becomes unwieldy, so the `add` method of the Sequential class can be used to add layers to the neural network one at a time

In [None]:
#example 2
from keras.models import Sequential
from keras.layers import Dense

model2 = Sequential([
    Dense(3, input_shape=(3,), activation = 'sigmoid'),
    Dense(3, activation = 'sigmoid'),
    Dense(2, activation = 'sigmoid')
])

In [None]:
model2.summary()

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_5 (Dense)             (None, 3)                 12        
                                                                 
 dense_6 (Dense)             (None, 3)                 12        
                                                                 
 dense_7 (Dense)             (None, 2)                 8         
                                                                 
Total params: 32
Trainable params: 32
Non-trainable params: 0
_________________________________________________________________


<img src='img/Dense_prams2.jpeg' width='550px' />

## Using the add() method

In [None]:
#example 1
neural_network = Sequential()

neural_network.add(Dense(2, input_dim=2, activation = 'sigmoid',
                         kernel_initializer = 'zeros', bias_initializer = 'zeros'))


neural_network.add(Dense(1, activation = 'sigmoid', kernel_initializer = 'zeros',
                         bias_initializer = 'zeros'))

In [None]:
neural_network.summary()

Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_8 (Dense)             (None, 2)                 6         
                                                                 
 dense_9 (Dense)             (None, 1)                 3         
                                                                 
Total params: 9
Trainable params: 9
Non-trainable params: 0
_________________________________________________________________


In [None]:
#example 2
neural_network = Sequential()

neural_network.add(Dense(3, input_dim=3, activation = 'sigmoid',name = 'layer_1',
                         kernel_initializer = 'zeros', bias_initializer = 'zeros'))

neural_network.add(Dense(3, activation = 'sigmoid',name = 'layer_2', kernel_initializer = 'zeros',
                         bias_initializer = 'zeros'))

neural_network.add(Dense(2, activation = 'sigmoid', name = 'layer_3',kernel_initializer = 'zeros',
                         bias_initializer = 'zeros'))

In [None]:
neural_network.summary()

Model: "sequential_9"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 layer_1 (Dense)             (None, 3)                 12        
                                                                 
 layer_2 (Dense)             (None, 3)                 12        
                                                                 
 layer_3 (Dense)             (None, 2)                 8         
                                                                 
Total params: 32
Trainable params: 32
Non-trainable params: 0
_________________________________________________________________


In [None]:
#example 3
nn = Sequential()

nn.add(Dense(1, input_dim=1, activation = 'sigmoid',name = 'layer_1',
                         kernel_initializer = 'zeros', bias_initializer = 'zeros'))

nn.add(Dense(1, activation = 'sigmoid', name = 'layer_2',kernel_initializer = 'zeros',
                         bias_initializer = 'zeros'))

In [None]:
nn.summary()

Model: "sequential_14"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 layer_1 (Dense)             (None, 1)                 2         
                                                                 
 layer_2 (Dense)             (None, 1)                 2         
                                                                 
Total params: 4
Trainable params: 4
Non-trainable params: 0
_________________________________________________________________


In [None]:
print("Weights and biases of the layers before training the model: \n")
for layer in nn.layers:
  print(layer.name)
  print("Weights")
  print("Shape: ",layer.get_weights()[0].shape,'\n',layer.get_weights()[0])
  print("Bias")
  print("Shape: ",layer.get_weights()[1].shape,'\n',layer.get_weights()[1],'\n')

Weights and biases of the layers before training the model: 

layer_1
Weights
Shape:  (1, 1) 
 [[0.]]
Bias
Shape:  (1,) 
 [0.] 

layer_2
Weights
Shape:  (1, 1) 
 [[0.]]
Bias
Shape:  (1,) 
 [0.] 



In [None]:
import numpy as np

In [None]:
weights = [np.random.rand(*w.shape) for w in nn.get_weights()]
weights

[array([[0.1744399]]),
 array([0.24087493]),
 array([[0.56080447]]),
 array([0.88717866])]

In [None]:
nn.set_weights(weights)

In [None]:
# Rebinding the name `weights` does not change the model; nn keeps the values set above
weights=[[[1]],[[1]],[[1]],[[1]]]

In [None]:
print("weights: ",nn.get_weights())

weights:  [array([[0.1744399]], dtype=float32), array([0.24087493], dtype=float32), array([[0.5608045]], dtype=float32), array([0.88717866], dtype=float32)]
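
To see how these four numbers are used, here is a plain-Python sketch of the forward pass of the 1 -> 1 -> 1 sigmoid network. It assumes the list returned by `get_weights()` is ordered as kernel and bias of `layer_1`, followed by kernel and bias of `layer_2`. The input value is illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Values copied from the get_weights() output above, assumed to be in the order
# [layer_1 kernel, layer_1 bias, layer_2 kernel, layer_2 bias]
w1, b1, w2, b2 = 0.1744399, 0.24087493, 0.5608045, 0.88717866

def forward(x):
    """Forward pass of the 1 -> 1 -> 1 sigmoid network."""
    hidden = sigmoid(w1 * x + b1)      # output of layer_1
    return sigmoid(w2 * hidden + b2)   # output of layer_2

x = 1.0  # illustrative input value
print(forward(x))
```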


In [None]:
print(nn.non_trainable_weights)

[]


In [None]:
print(nn.trainable_weights)

[<tf.Variable 'dense_23/kernel:0' shape=(3, 3) dtype=float32, numpy=
array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]], dtype=float32)>, <tf.Variable 'dense_23/bias:0' shape=(3,) dtype=float32, numpy=array([0., 0., 0.], dtype=float32)>, <tf.Variable 'dense_24/kernel:0' shape=(3, 3) dtype=float32, numpy=
array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]], dtype=float32)>, <tf.Variable 'dense_24/bias:0' shape=(3,) dtype=float32, numpy=array([0., 0., 0.], dtype=float32)>, <tf.Variable 'dense_25/kernel:0' shape=(3, 2) dtype=float32, numpy=
array([[0., 0.],
       [0., 0.],
       [0., 0.]], dtype=float32)>, <tf.Variable 'dense_25/bias:0' shape=(2,) dtype=float32, numpy=array([0., 0.], dtype=float32)>]


In [None]:
#example 4
nn = Sequential()

nn.add(Dense(1, input_dim=1, activation = 'sigmoid',name = 'layer_1',
                         kernel_initializer = 'zeros', bias_initializer = 'zeros'))

nn.add(Dense(1, activation = 'sigmoid', name = 'layer_2',kernel_initializer = 'zeros',
                         bias_initializer = 'zeros'))

In [None]:
nn.summary()

Model: "sequential_14"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 layer_1 (Dense)             (None, 1)                 2         
                                                                 
 layer_2 (Dense)             (None, 1)                 2         
                                                                 
Total params: 4
Trainable params: 4
Non-trainable params: 0
_________________________________________________________________


In [None]:
nn.trainable_weights

[<tf.Variable 'layer_1/kernel:0' shape=(1, 1) dtype=float32, numpy=array([[0.1744399]], dtype=float32)>,
 <tf.Variable 'layer_1/bias:0' shape=(1,) dtype=float32, numpy=array([0.24087493], dtype=float32)>,
 <tf.Variable 'layer_2/kernel:0' shape=(1, 1) dtype=float32, numpy=array([[0.5608045]], dtype=float32)>,
 <tf.Variable 'layer_2/bias:0' shape=(1,) dtype=float32, numpy=array([0.88717866], dtype=float32)>]


## Method 2: Building the same network using the Keras Functional API

<br />

* Essential keywords of the functional API: `Tensor, Callable Layers, Input, Model`

<br />

* The core data structure for the functional API is the `tensor`

<br />

* Once you have a tensor, the rest are just layers, similar to the sequential API. But in the functional API the layers are `callable`: we call them with round parentheses, just like a function, for example `function_name()`

* The output of calling any `Layer` object in `Keras` is a `tensor`.

We pass in a tensor to a layer, as follows

```
output_tensor = Layer(layer_params)(tensor)
```
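
The "configure first, call later" pattern is ordinary Python: any object whose class defines `__call__` can be called like a function. The toy class below is a hypothetical stand-in that only mimics the calling convention. It is not how Keras layers are implemented, and it works on plain lists rather than tensors.

```python
class ToyDense:
    """Hypothetical stand-in for a layer: configured first, called later."""

    def __init__(self, scale, name):
        self.scale = scale          # plays the role of layer_params
        self.name = name

    def __call__(self, tensor):
        # A real layer would apply weights and an activation; this toy
        # version just scales each value so the chaining is easy to follow.
        return [self.scale * value for value in tensor]

input_tensor = [1.0, 2.0, 3.0]
hidden = ToyDense(2.0, name="first")(input_tensor)   # construct, then call
output = ToyDense(0.5, name="second")(hidden)        # chain on the previous output
print(output)   # [1.0, 2.0, 3.0]
```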

<br />

* To pass the input data to the first layer, we have to create an input tensor, which we can do by importing the `Input` class from the `keras.layers` submodule, as follows

```
from keras.layers import Input, Dense

input_tensor = Input(shape = (num_independent_vars, ))

first_layer_output = Dense(layer_params)(input_tensor)

output = Dense(final_layer_params)(first_layer_output)
```

* Finally, we need to import the `Model` class from the `keras.models` submodule and mention the inputs and the outputs of the model, after which `Keras` very gracefully builds a model connecting the inputs to the outputs

```
from keras.models import Model

model = Model(inputs = input_tensor, outputs = output)
```

In [None]:
# Example 1
from keras.layers import Input, Dense
from keras.models import Model

# define input
input_tensor = Input(shape = (2,))

# define layers
output1 = Dense(2, activation = 'sigmoid')(input_tensor)
output2 = Dense(1, activation = 'sigmoid')(output1)

model_functional = Model(inputs = input_tensor, outputs = output2)

In [None]:
model_functional.summary()

Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, 2)]               0         
                                                                 
 dense_16 (Dense)            (None, 2)                 6         
                                                                 
 dense_17 (Dense)            (None, 1)                 3         
                                                                 
Total params: 9
Trainable params: 9
Non-trainable params: 0
_________________________________________________________________


In [None]:
# Example 2
from keras.layers import Input, Dense
from keras.models import Model

input_tensor = Input(shape = (3,))

output1 = Dense(3, activation = 'sigmoid')(input_tensor)
output2 = Dense(3, activation = 'sigmoid')(output1)
output3 = Dense(2, activation = 'sigmoid')(output2)

model_functional = Model(inputs = input_tensor, outputs = output3)

In [None]:
model_functional.summary()

Model: "model_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_2 (InputLayer)        [(None, 3)]               0         
                                                                 
 dense_18 (Dense)            (None, 3)                 12        
                                                                 
 dense_19 (Dense)            (None, 3)                 12        
                                                                 
 dense_20 (Dense)            (None, 2)                 8         
                                                                 
Total params: 32
Trainable params: 32
Non-trainable params: 0
_________________________________________________________________


* The model summary looks the same, except for the inclusion of the Input Layer which creates the input tensor

## Defining, Compiling and Running a Neural Network in Keras

<br />

### The process of learning in a Neural Network

<br />

### Loss Score

<br />

* The loss score is a feedback signal that says how far the output of your network is from the ground truth

<br />

<img src='img/loss_function.jpg' />

<br />

* Two such loss scores that we use quite frequently are :
    
    1) Binary Cross-Entropy: For Two-Class Classification problems
    
    2) Mean Squared Error: For Regression problems
    
<br />

* We have already come across mean squared error before, so let's dive deeper into Binary Cross Entropy

<br />

$$\begin{eqnarray}
  C = -\frac{1}{n} \sum_x \left[y \ln p + (1-y ) \ln (1-p) \right]
\end{eqnarray}$$

<br />

* In the above equation, p is the output of the network, n is the total number of samples in the training data, the sum is over all training inputs, x, and y is the corresponding desired output

<br />

* Below, we see how the value of the cross entropy (sometimes referred to as the log loss) changes with the predicted probability. The loss drops sharply as the probability predicted for the true class increases and grows very large for confident wrong predictions. This strong penalty usually helps a network with a sigmoid output converge faster than it would with a traditional mean squared error

<br />

<img src='img/binary_cross_entropy.png' width='300px'/>
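
The formula above can be sketched in a few lines of plain Python. The helper name and the clipping constant `eps` are assumptions added for numerical safety and are not part of the equation.

```python
import math

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean of -[y ln p + (1 - y) ln(1 - p)] over all samples."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)   # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

# One positive sample, predicted with high and with low probability
confident_right = binary_cross_entropy([1], [0.9])
confident_wrong = binary_cross_entropy([1], [0.1])
print(confident_right, confident_wrong)
```

The second value is roughly twenty times larger than the first, which is the steep penalty for confident mistakes visible in the curve.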

## Optimizers

<br />

* An optimizer is an algorithm that uses the feedback signal from the loss function to update the weights so that the output of the network gets closer to the ground truth. The first optimizer that we use is Stochastic Gradient Descent (SGD). We will come across many more optimizers later

<br />

* We can import Classes from the optimizers module of keras and customize the specific optimizer to our liking

<br />

<img src='img/building_blocks_of_neural_networks.jpg' />

* Please refer to this beautiful article written by Sebastian Ruder on the various iterations of the SGD optimizer http://ruder.io/optimizing-gradient-descent/

<br />

* Keras has the implementations of the following optimizers:  
    * SGD
    * RMSprop
    * Adagrad
    * Adadelta
    * Adam
    * Adamax
    * Nadam

In [None]:
from keras.optimizers import SGD

customized_optimizer = SGD(learning_rate = 0.0001)

* We already know how the learning rate can affect convergence, and the graph below provides a decent intuition. Having the flexibility to change the learning rate is therefore very important

![](img/learning_rate.jpg)
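
The effect of the learning rate can be reproduced on a toy problem. The sketch below runs plain (non-stochastic) gradient descent on the one-dimensional function f(w) = (w - 3)^2. The function, the starting point and the three learning rates are assumptions chosen for illustration.

```python
def gradient_descent(learning_rate, steps=20, w=0.0):
    """Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3)."""
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)
        w = w - learning_rate * grad     # the plain gradient descent update
    return w

for lr in (0.01, 0.1, 1.1):              # illustrative learning rates
    print(lr, gradient_descent(lr))
```

After 20 steps the smallest rate is still far from the minimum at w = 3, the middle rate is close to it, and the largest rate overshoots on every step and moves further away.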

## Compiling the neural network (loss function + optimizer)

In [None]:
neural_network = Sequential()

neural_network.add(Dense(32, input_dim=784, activation = 'sigmoid',
                         kernel_initializer = 'RandomUniform', bias_initializer = 'zeros'))

neural_network.add(Dense(10, activation = 'sigmoid', kernel_initializer = 'zeros',
                         bias_initializer = 'RandomUniform'))

# MNIST labels are the integers 0-9, so the output layer needs 10 units with softmax
neural_network.add(Dense(10, activation = 'softmax', kernel_initializer = 'zeros',
                         bias_initializer = 'RandomUniform'))

neural_network.summary()

In [None]:
# neural_network.compile(loss = 'sparse_categorical_crossentropy', optimizer = 'sgd', metrics = ['accuracy'])

In [None]:
from keras.optimizers import SGD

customized_optimizer = SGD(learning_rate = 0.0001)

In [None]:
# Sparse categorical cross-entropy is the multi-class counterpart of binary cross-entropy
neural_network.compile(loss = 'sparse_categorical_crossentropy', optimizer = customized_optimizer, metrics = ['accuracy'])

* As we can see from the compile step above we need to specify the loss function, optimization algorithm and we can also mention the metrics that we want to monitor while training the neural network

<br />

* Also, please __note__ that the optimizer argument can either be a string or a customizable object from the optimizers module in Keras

## So are we done? What about learning the weights?

<br />

* Training the network in Keras is also very simple, we call the `.fit()` method and pass in the arguments

<br />

* Two important terms for training neural networks are `epochs` and `batch_size`

### Download the MNIST dataset using the Keras helper function

In [None]:
from keras.datasets import mnist

(X_train, y_train), (X_test, y_test) = mnist.load_data()

In [None]:
# X_train is a NumPy array (it has no .head method), so slice the first 3 images
X_train[:3]

### The dataset consists of images represented as

![](img/mnist_representation.png)

In [None]:
print(X_train.shape)

In [None]:
## Reshape the images from 28 * 28 to 784

X_train = X_train.reshape(-1, 28*28)
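
The reshape step can be checked on a small stand-in array. The sketch below uses 5 fake "images" filled with consecutive integers instead of the real MNIST data, which is an assumption made so that the example runs without downloading anything.

```python
import numpy as np

# Stand-in for MNIST: 5 fake "images" of 28 x 28 pixels (illustrative data)
images = np.arange(5 * 28 * 28).reshape(5, 28, 28)

flat = images.reshape(-1, 28 * 28)   # -1 lets NumPy infer the number of samples
print(flat.shape)                    # (5, 784)

# Pixel (row r, column c) of image i ends up at column r * 28 + c
i, r, c = 3, 10, 7
print(flat[i, r * 28 + c] == images[i, r, c])   # True
```

Flattening keeps every pixel value and only changes how the values are indexed, so each image becomes one row of 784 features.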

In [None]:
X_train[2349]

In [None]:
y_train[2349]

In [None]:
X_train.shape

In [None]:
print(X_test.shape)

In [None]:
## Reshape the images from 28 * 28 to 784

X_test = X_test.reshape(-1, 28*28)

In [None]:
X_test[0]

In [None]:
training_history = neural_network.fit(X_train, y_train, epochs=100, batch_size=1024, validation_split=0.2)

## Plotting the training process of the neural network

<br />

* Once we have written the code for plotting the loss and accuracy during training, we will wrap it in a function so that we can quickly visualize the training metrics of our network.

<br />

In [None]:
from matplotlib import pyplot as plt

plt.plot(training_history.history['accuracy'])
plt.plot(training_history.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'val'], loc='upper left')
plt.show()

In [None]:
print(training_history.history.keys())

In [None]:
plt.plot(training_history.history['loss'])
plt.plot(training_history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'val'], loc='upper left')
plt.show()

In [None]:
# Function to plot metrics after training

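def plot_training_metrics(history_object, metric = 'loss', val = True):
    # Plot the chosen metric for every epoch of training
    plt.plot(history_object.history[metric])
    legend = ['train']

    # Optionally overlay the same metric computed on the validation split
    if val:
        plt.plot(history_object.history['val_' + metric])
        legend.append('val')

    # Label the plot with the metric that was actually drawn
    plt.title('model ' + metric)
    plt.ylabel(metric)
    plt.xlabel('epoch')
    plt.legend(legend, loc='upper left')
    plt.show()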

In [None]:
plot_training_metrics(training_history)

In [None]:
plot_training_metrics(training_history, metric = 'accuracy')

* An __Epoch__ is when an ENTIRE dataset is passed forward and backward through the neural network only once.

<br />

* Since the entire dataset is usually too large to fit in memory at once, we divide the data into batches and compute the gradient on one batch for each forward and backward pass

<br />

* __Batch size__ is the number of samples that are propagated through the network in one forward and backward pass.

<br />

<img src='img/learning_rate_epochs_rel.png' width='400px'/>
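
The relationship between epochs, batch size and the number of weight updates is simple arithmetic. The sketch below uses the settings from the `fit()` call in this notebook (`epochs=100`, `batch_size=1024`, `validation_split=0.2`) together with the 60,000 images in the MNIST training set. The helper name is hypothetical.

```python
import math

def updates_per_epoch(n_samples, batch_size):
    """Number of batches (weight updates) needed to see every sample once."""
    return math.ceil(n_samples / batch_size)

n_total = 60000                     # images in the MNIST training set
n_val = n_total * 20 // 100         # held out by validation_split = 0.2
n_train = n_total - n_val

steps = updates_per_epoch(n_train, batch_size=1024)
print(n_train, steps, steps * 100)  # samples, updates per epoch, updates over 100 epochs
```

The last batch of each epoch is smaller than 1024 because 48,000 is not a multiple of the batch size.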

## Neural Network Architectures for Basic ML Tasks

<br />

<img src='img/nn_for_basic_ml_tasks.jpg' width='800px'/>

## Exercise & Experiment

<br />

* Use different activation functions, report difference in metrics while training

__Different experiments to run: 'sigmoid', 'tanh', 'relu'__

<br />

* Use different weight initialization functions, report difference in metrics while training

__Different experiments to run: 'zeros', 'RandomUniform', 'RandomNormal', 'glorot_normal', 'glorot_uniform', 'he_normal', 'lecun_normal', 'he_uniform'__

<br />

* Use different number of layers, number of nodes and report difference in metrics while training

__Different experiments to run: more nodes or more layers?__


<br />

* Use BatchNormalization and report the difference in metrics while training. To add batchnorm after a layer, do as follows

```

from keras.layers import BatchNormalization

model.add(Dense(layer_params))

model.add(BatchNormalization())

model.add(Dense(layer_params))

```

<br />

* Use different Dropout rates, report difference in metrics while training

```
from keras.layers import Dropout

model.add(Dense(layer_params))

model.add(Dropout(rate = 0.4))

model.add(Dense(layer_params))

```
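
To build intuition for what the `rate` argument controls, here is a NumPy sketch of the commonly described "inverted dropout" scheme: during training, roughly a fraction `rate` of the activations is set to zero and the remaining ones are scaled up by `1 / (1 - rate)` so that the expected value stays the same. The function name and the array of ones are assumptions for illustration. This is not the Keras source code.

```python
import numpy as np

def inverted_dropout(activations, rate, rng):
    """Zero out roughly `rate` of the values and rescale the survivors."""
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob   # True where a unit is kept
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
activations = np.ones((1000, 100))          # illustrative layer output
dropped = inverted_dropout(activations, rate=0.4, rng=rng)

print((dropped == 0).mean())   # fraction of zeroed units, close to 0.4
print(dropped.mean())          # close to 1.0, the mean before dropout
```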

<br />

* Use different learning rates and optimizers, report difference in metrics while training

<br />

# There are no secrets to success. It is the result of preparation, hard work, and learning from failure.