# The Art of Training Neural Networks with Keras

### Where Neural Networks and Deep Learning Fit

* Deep learning, which is built on neural networks, is a subfield of machine learning, and machine learning is in turn a subfield of artificial intelligence.

![](img/deep_learning.jpg)

* In machine learning, rules are learned from data rather than written by hand, and the same holds for deep learning.

![](img/classical_vs_ml.jpg)

## Learning representations from data

* In traditional machine learning, changing the representation of the data can make the process of learning from that data much easier.

![](img/learning_representations.jpg)

# Training Neural Networks in Keras

* We will walk through the fundamentals of neural networks and Keras together.

## Layers as a fundamental building block of neural networks

* Layers act as data filters or distillers: each layer distills its input into a representation that carries the information needed to produce the output.

![](img/learning_representations_mlp.jpg)

* If you remember the MNIST dataset of handwritten digits, the four layers above have "learned" the following representations of the data:

![](img/learning_representations_mlp_02.jpg)

### So basically, each layer takes data as input and spits out data as output, simple as that. Now, let's dive into the details

### The goal of training a neural network is to find good representations of the data, which we get by "learning" the right weights

![](img/learning_weights.jpg)

* Before we look at how the right weights are learned, we have to understand what a layer does.

### Layers in Keras

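* There are different categories of layers in Keras; the most commonly used ones are "Core Layers", "Convolution Layers" and "Recurrent Layers", among others.

```python
# Dense is a core layer; older Keras 2 releases also exposed it as keras.layers.core.Dense
from keras.layers import Dense

# A fully connected layer with 32 units and a sigmoid activation
layer1 = Dense(units=32, activation='sigmoid')
```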

### Let's break down the layer that we just built using the Dense class from Keras' core layers

### The OUTPUT of this layer is sigmoid ( dot ( W, INPUT ) + b ), where W is the weight matrix and b the bias vector; before the first round of training, the weights W are randomly "initialized"

### A Dense (also known as fully connected) layer connects every input to every unit of the layer, as shown in the figure below

![](img/fc_dense_layers_keras.jpg)
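To make the formula concrete, here is a minimal NumPy sketch of the forward pass of a sigmoid Dense layer. It is an illustration, not Keras' own implementation: the helper name `dense_forward`, the random normal weights (a stand-in for Keras' initializer) and the batch of 4 random inputs are assumptions. Keras stores the weights as a "kernel" of shape (inputs, units) and computes `dot(input, kernel) + bias`, which is the same map as `dot(W, INPUT) + b` with W transposed; entry `W[i, j]` is the connection from input `i` to unit `j`.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function, squashing values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def dense_forward(x, W, b):
    """Forward pass of a fully connected layer with sigmoid activation.

    x: (batch, n_inputs), W: (n_inputs, n_units), b: (n_units,)
    Every input feature i reaches every unit j through the weight W[i, j].
    """
    return sigmoid(x @ W + b)

rng = np.random.default_rng(0)
n_inputs, n_units = 784, 32   # same sizes as the first layer in this notebook

# Hypothetical random initial weights and zero biases
W = rng.normal(scale=0.05, size=(n_inputs, n_units))
b = np.zeros(n_units)

x = rng.random((4, n_inputs))   # a hypothetical batch of 4 flattened 28x28 images
out = dense_forward(x, W, b)
print(out.shape)                # (4, 32): one activation per unit for each sample
```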

### We will see many more categories and types of layer objects in Keras; the "Dense" core layer is just one such class

## Putting the layers together: The Keras Sequential API

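* The Keras Sequential API lets us build the most common kind of neural network architecture: a plain stack of layers, where each layer has exactly one input and one output

* An object of the Keras Sequential class holds multiple neural network layers stacked on top of one another, so the output of each layer becomes the input of the next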


![](img/keras_sequential_api.jpg)
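Conceptually, a Sequential model is function composition: the output of each layer is fed into the next one. The NumPy sketch below mimics the 784 → 32 → 10 → 1 sigmoid stack used in this notebook. The helpers `make_dense` and `sequential` are hypothetical names for illustration, not Keras functions, and the weights are random rather than trained.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def make_dense(n_in, n_out, rng):
    """Return a sigmoid Dense-style layer as a function that owns its weights."""
    W = rng.normal(scale=0.05, size=(n_in, n_out))
    b = np.zeros(n_out)
    return lambda x: sigmoid(x @ W + b)

def sequential(layers):
    """Chain layers so that each one's output becomes the next one's input."""
    def forward(x):
        for layer in layers:
            x = layer(x)
        return x
    return forward

rng = np.random.default_rng(42)
model = sequential([
    make_dense(784, 32, rng),
    make_dense(32, 10, rng),
    make_dense(10, 1, rng),
])

x = rng.random((5, 784))   # hypothetical batch of 5 inputs
y = model(x)
print(y.shape)             # (5, 1): one sigmoid output per sample
```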

## You can create a Keras Sequential model by passing a list of layers to the Sequential constructor

```python
from keras.models import Sequential
from keras.layers import Dense

# 784 inputs -> 32 units -> 10 units -> 1 output unit
model = Sequential([
    Dense(32, input_shape=(784,), activation='sigmoid'),
    Dense(10, activation='sigmoid'),
    Dense(1, activation='sigmoid')
])
```


## Passing one long list soon becomes unwieldy as we add more layers, so the "add" method on a Sequential object lets us append layers to the network one at a time

```python
neural_network = Sequential()

# input_dim=784 is equivalent to input_shape=(784,) for the first layer
neural_network.add(Dense(32, input_dim=784, activation='sigmoid'))
neural_network.add(Dense(10, activation='sigmoid'))
neural_network.add(Dense(1, activation='sigmoid'))
```

* The summary method on the neural network object prints each layer's output shape and number of parameters, along with the total parameter count

```python
neural_network.summary()
```

```
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_6 (Dense)              (None, 32)                25120     
_________________________________________________________________
dense_7 (Dense)              (None, 10)                330       
_________________________________________________________________
dense_8 (Dense)              (None, 1)                 11        
Total params: 25,461
Trainable params: 25,461
Non-trainable params: 0
_________________________________________________________________
```
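The "Param #" column follows directly from the Dense formula: a layer with `n_in` inputs and `n_units` units has an `n_in × n_units` weight matrix plus `n_units` biases. A small helper (the name `dense_params` is hypothetical) reproduces the numbers in the summary above:

```python
def dense_params(n_in, n_units):
    """Trainable parameters of a Dense layer: the weight matrix plus one bias per unit."""
    return n_in * n_units + n_units

# Layer sizes of neural_network: 784 inputs -> 32 -> 10 -> 1
sizes = [784, 32, 10, 1]
per_layer = [dense_params(n_in, n_out) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
print(per_layer)        # [25120, 330, 11]
print(sum(per_layer))   # 25461
```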


### That's all! Keras is that simple: initialize an object of the Sequential class and use the add method on that object to add layers to the network sequentially. But wait, what about the loss function? What about the optimizer?

# Defining, Compiling and Running a Neural Network in Keras

## The process of learning in a Neural Network

## Loss Score

* The loss score is a feedback signal that measures how far the output of your network is from the ground truth (the desired output)

![](img/loss_function.jpg)

* Two loss functions that we use quite frequently are:

    1. Binary cross-entropy: for two-class classification problems

    2. Mean squared error: for regression problems

* We have already come across mean squared error, so let's dive deeper into binary cross-entropy:

$$\begin{eqnarray} 
  C = -\frac{1}{n} \sum_x \left[y \ln p + (1-y ) \ln (1-p) \right]
\end{eqnarray}$$

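### In the equation above, p is the output of the network (the predicted probability of the positive class), y is the corresponding desired output (0 or 1), n is the total number of training examples, and the sum runs over all training inputs x

* Below, we see how the value of the cross-entropy (sometimes referred to as the log loss) changes with the predicted probability. For a positive example (y = 1) the loss reduces to -ln p, which approaches 0 as p approaches 1 and grows without bound as p approaches 0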

![](img/binary_cross_entropy.png)
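The formula translates almost line for line into NumPy. This is a minimal sketch, not Keras' `binary_crossentropy`; clipping p to `[eps, 1 - eps]` (with `eps` an assumed small constant) is a common safeguard so that `ln 0` never occurs. Confident correct predictions give a small loss, while confident wrong ones are penalized heavily.

```python
import numpy as np

def binary_cross_entropy(y, p, eps=1e-7):
    """Mean binary cross-entropy C over n examples.

    y: desired outputs (0 or 1); p: network outputs (predicted probabilities).
    p is clipped away from 0 and 1 so that the logarithms stay finite.
    """
    y = np.asarray(y, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

y_true = [1, 0, 1, 0]
confident_right = [0.9, 0.1, 0.8, 0.2]
confident_wrong = [0.1, 0.9, 0.2, 0.8]

loss_right = binary_cross_entropy(y_true, confident_right)
loss_wrong = binary_cross_entropy(y_true, confident_wrong)
print(loss_right)   # ≈ 0.164
print(loss_wrong)   # ≈ 1.956
```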

## Optimizers

* An optimizer is an algorithm that uses this feedback signal to actually update the weights so that the output of the network gets closer to the ground truth. The first optimizer that we use is Stochastic Gradient Descent (SGD); we will gradually come across many more optimizers

![](img/optimizer.jpg)
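To see the loss and the optimizer working together, here is a minimal NumPy sketch of gradient descent on a single sigmoid unit trained with binary cross-entropy. It is plain full-batch gradient descent on a made-up four-point dataset, not Keras' SGD implementation; it relies on the standard result that, for a sigmoid output with cross-entropy loss, the gradient of C with respect to the pre-activation z = xW + b is (p - y) / n. The helper names `bce` and `gradient_step` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(y, p, eps=1e-7):
    """Mean binary cross-entropy, with p clipped away from 0 and 1."""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def gradient_step(W, b, X, y, lr):
    """One update: move W and b a small step against the gradient of the loss."""
    p = sigmoid(X @ W + b)        # forward pass: predicted probabilities
    grad_z = (p - y) / len(y)     # dC/dz for a sigmoid output with cross-entropy
    grad_W = X.T @ grad_z         # chain rule back to the weights
    grad_b = grad_z.sum()         # ... and to the bias
    return W - lr * grad_W, b - lr * grad_b

# Hypothetical toy data: one feature, label 1 when the feature is large
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

W, b = np.zeros(1), 0.0
loss_before = bce(y, sigmoid(X @ W + b))   # ln 2 ≈ 0.693, since every p starts at 0.5
for _ in range(1000):
    W, b = gradient_step(W, b, X, y, lr=0.5)
loss_after = bce(y, sigmoid(X @ W + b))
print(loss_before, loss_after)             # the loss drops as the weights are updated
```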

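### We can import classes from the optimizers module of Keras and configure a specific optimizer to our liking

```python
from keras.optimizers import SGD

# Plain SGD with a small, custom learning rate
# (older Keras versions called this argument lr; current versions use learning_rate)
customized_optimizer = SGD(learning_rate=0.0001)
```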

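* We already know that the learning rate affects convergence: too small a rate makes training crawl, while too large a rate can overshoot the minimum and even diverge. The graph below provides a decent intuition, which is why having the flexibility to change the learning rate is so important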

![](img/learning_rate.jpg)
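The effect of the learning rate is easy to reproduce on the simplest possible loss, f(w) = w², whose gradient is 2w. Each gradient-descent step multiplies w by (1 - 2·lr), so a tiny rate barely moves, a moderate rate converges, and a rate above 1 overshoots by more on every step and diverges. The loss function, the starting point and the rates 0.1 and 1.1 are illustrative assumptions; 0.0001 matches `customized_optimizer` above.

```python
def descend(lr, w=1.0, steps=50):
    """Run gradient descent on f(w) = w**2 (gradient 2*w) and return the final w."""
    for _ in range(steps):
        w = w - lr * 2 * w
    return w

for lr in (0.0001, 0.1, 1.1):
    print(lr, descend(lr))
```

After 50 steps, w is still roughly 0.99 with the tiny rate, has shrunk to roughly 1.4e-05 with the moderate rate, and has blown up to roughly 9.1e+03 with the too-large rate.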

## Compiling the neural network (loss function + optimizer)

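```python
# Option 1: pass the optimizer by name (SGD with its default settings)
neural_network.compile(loss='binary_crossentropy', optimizer='sgd', metrics=['accuracy'])

# Option 2: pass the customized optimizer object created above
# (compiling again replaces the previous configuration)
neural_network.compile(loss='binary_crossentropy', optimizer=customized_optimizer, metrics=['accuracy'])
```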

* As we can see from the compile step above, we need to specify the loss function and the optimization algorithm, and we can also list the metrics that we want to monitor while training the neural network

* Also, please __note__ that the optimizer argument can be either a string (which uses that optimizer's default settings) or a customized object from the optimizers module in Keras

### So are we done? What about learning the weights?

* Training the network in Keras is also very simple: we call the "fit" method and pass in the training data along with arguments such as the number of epochs and the batch size

* Two important terms for training neural networks are epochs and batch size

* An __Epoch__ is one complete pass of the ENTIRE training dataset forward and backward through the neural network.

* Since the full dataset is usually too large to process at once (and often too large to fit in memory), we divide the data into batches and compute the gradient on one batch per forward and backward pass, updating the weights after each batch

* __Batch size__ is the number of samples propagated through the network in one forward and backward pass.
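Here is a minimal sketch of how one epoch is split into batches, assuming the data is shuffled at the start of the epoch (Keras' "fit" also shuffles by default). The helper name `iterate_minibatches` and the 10-sample dataset are hypothetical. With n samples and a batch size of B, one epoch takes ceil(n / B) weight updates, and the last batch is smaller whenever B does not divide n.

```python
import math
import numpy as np

def iterate_minibatches(X, y, batch_size, rng):
    """Yield (X_batch, y_batch) pairs that cover every sample exactly once (one epoch)."""
    indices = rng.permutation(len(X))          # shuffle once per epoch
    for start in range(0, len(X), batch_size):
        batch = indices[start:start + batch_size]
        yield X[batch], y[batch]

rng = np.random.default_rng(0)
X = np.arange(10).reshape(10, 1)   # hypothetical 10-sample dataset
y = np.arange(10)

batches = list(iterate_minibatches(X, y, batch_size=4, rng=rng))
print([len(yb) for _, yb in batches])   # [4, 4, 2]

# For the 60,000 MNIST training images with a batch size of 32:
print(math.ceil(60000 / 32))            # 1875 weight updates per epoch
```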

![](img/learning_rate_epochs_rel.png)

## Understanding Tensors, the oil for the deep learning engine

![](img/4d_array.jpg)