# This notebook is based on the book:
# Deep Learning with Keras
## Implement neural networks with Keras on Theano and TensorFlow

### Authors:  Antonio Gulli, Sujit Pal

### The first example of Keras code
The initial building block of Keras is a model, and the simplest model is called sequential. A sequential Keras model is a linear pipeline (a stack) of neural network layers. This code fragment defines a single layer with **12 artificial neurons**, and it expects **8 input variables (also known as features)**. In this example, **random_uniform** is used to initialize the weights of the layer with small, uniformly distributed random values.

In [6]:
from keras.models import Sequential
from keras.layers import Dense, Activation
model = Sequential()
# random_uniform: weights are initialized to small uniformly random values in (-0.05, 0.05).
# Other weight initializers are listed at https://keras.io/initializers/
model.add(Dense(12, input_dim=8, kernel_initializer='random_uniform')) 


**Note:** The _net is dense_, meaning that each neuron in a layer is connected to all neurons
located in the previous layer and to all the neurons in the following layer.
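
To make the dense connection concrete, here is a minimal NumPy sketch of what a layer like `Dense(12, input_dim=8)` computes in its forward pass when no activation is applied: each of the 12 outputs is a weighted sum of all 8 inputs plus a bias. The names `dense_forward`, `W`, and `b` are illustrative and not part of the Keras API; the weight range simply mirrors the `random_uniform` range mentioned above.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

n_inputs, n_units = 8, 12

# One weight per (input, neuron) pair: this full matrix is what "dense" means.
W = rng.uniform(-0.05, 0.05, size=(n_inputs, n_units))
b = np.zeros(n_units)  # biases start at zero in this sketch

def dense_forward(x, W, b):
    """Affine map of a dense layer without activation: x @ W + b."""
    return x @ W + b

x = rng.random((4, n_inputs))  # a batch of 4 examples with 8 features each
y = dense_forward(x, W, b)
print(y.shape)                 # (4, 12)
```

The weight matrix has 8 × 12 = 96 entries, one for every connection between an input and a neuron, which is why no input is left unconnected.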

**Sigmoid function:** $\sigma(x) = \frac{1}{1+e^{-x}}$. A neuron uses it to compute the nonlinear function $\sigma(z)$, where $z = wx + b$. A neuron with sigmoid activation behaves similarly to a perceptron, but its output changes gradually rather than jumping between 0 and 1.
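
The difference between the hard perceptron threshold and the gradual sigmoid can be seen in a small sketch using only the standard library. The weight `w`, the bias `b`, and the helper name `step` are hypothetical values chosen for illustration.

```python
import math

def sigmoid(z):
    """Logistic sigmoid: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def step(z):
    """Perceptron-style hard threshold, shown only for comparison."""
    return 1.0 if z > 0 else 0.0

w, b = 2.0, -1.0  # hypothetical weight and bias of a single neuron

for x in (0.0, 0.4, 0.5, 0.6, 1.0):
    z = w * x + b
    print(f"x={x:.1f}  z={z:+.1f}  step={step(z):.0f}  sigmoid={sigmoid(z):.3f}")
```

Around `z = 0` the step output flips from 0 to 1, while the sigmoid output moves smoothly through 0.5, so a small change in the weights produces a small change in the output.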

**Sigmoid** and **ReLU** are generally called **activation functions** in neural network jargon. They are basic building blocks for developing a learning algorithm that adapts little by little, progressively reducing the mistakes made by our nets.

The full list of activation functions is available at https://keras.io/activations/.

**One-hot encoding (OHE)**

In many applications, it is convenient to transform categorical (non-numerical) features into numerical
variables. For instance, the categorical feature digit with the value d in [0-9] can be encoded into a
binary vector with 10 positions, which is 0 everywhere except at the d-th position, where a 1 is
present.
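
As a rough illustration of the encoding just described, the following NumPy sketch turns digit labels into 10-position binary vectors. The helper `one_hot` is a hypothetical name written for this example; Keras ships its own utility for the same job (`to_categorical`), which is what one would normally use in practice.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) array with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    # For row i, set the column given by labels[i] to 1.
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

digits = [3, 0, 9]
encoded = one_hot(digits, num_classes=10)
print(encoded[0])  # row for digit 3: the only 1 sits at index 3
```

Taking `argmax` along each row recovers the original digit, so no information is lost by the encoding.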

## A real example — recognizing handwritten digits

In this section, we will build a network that can recognize handwritten numbers. To achieve this
goal, we use MNIST (for more information, refer to http://yann.lecun.com/exdb/mnist/), a database of
handwritten digits made up of a training set of 60,000 examples and a test set of 10,000 examples.
The training examples are annotated by humans with the correct answer. For instance, if the
handwritten digit is the number three, then three is simply the label associated with that example.

In [7]:
from __future__ import print_function
import numpy as np
from keras.datasets import mnist                       #import dataset
from keras.models import Sequential
from keras.layers.core import Dense, Activation
from keras.optimizers import SGD
from keras.utils import np_utils
np.random.seed(1671)                                   # for reproducibility

In [8]:
# network and training
NB_EPOCH = 200
BATCH_SIZE = 128
VERBOSE = 1
NB_CLASSES = 10                        # number of outputs = number of digits
OPTIMIZER = SGD()                      # SGD optimizer, explained later in this chapter
N_HIDDEN = 128
VALIDATION_SPLIT = 0.2                 # fraction of TRAIN reserved for VALIDATION: 0.8 for training, 0.2 for validation

Terms like **epochs, batch size, and iterations** matter when the dataset is too big to pass to the computer all at once, which is almost always the case in machine learning. To overcome this problem, we divide the data into smaller pieces, feed them to the network one by one, and update the weights of the neural network at the end of every step to fit the data given.

**Epochs**: One epoch is when the ENTIRE dataset is passed forward and backward through the neural network exactly ONCE.

As the number of **epochs increases**, the weights in the neural network are updated more times, and the fitted curve goes from underfitting to optimal to overfitting.

**Batch Size**: The total number of training examples present in a single batch. Because you can’t pass the entire dataset into the neural net at once, you divide it into a number of batches (sets, or parts). This is similar to dividing a long article into sections such as Introduction, Gradient descent, Epoch, Batch size, and Iterations, which makes the whole article easier to read and understand.

**Iterations**: The number of batches needed to complete one epoch. Let’s say we have 2000 training examples. If we divide the dataset into batches of 500, it will take 4 iterations to complete 1 epoch.
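
The same arithmetic can be applied to this notebook's settings. The sketch below assumes that the 60,000 MNIST training images are split according to `VALIDATION_SPLIT = 0.2` and batched with `BATCH_SIZE = 128`; the helper name `iterations_per_epoch` is hypothetical.

```python
import math

def iterations_per_epoch(n_examples, batch_size):
    """Batches needed to see every example once (the last batch may be smaller)."""
    return math.ceil(n_examples / batch_size)

# The example from the text: 2000 examples in batches of 500.
print(iterations_per_epoch(2000, 500))     # 4

# This notebook's settings, assuming 20% of the training set is held out.
n_total = 60000
n_validation = n_total * 20 // 100         # 12000 examples for validation
n_train = n_total - n_validation           # 48000 examples used for weight updates
print(iterations_per_epoch(n_train, 128))  # 375
```

With `NB_EPOCH = 200`, this would amount to 375 × 200 = 75,000 weight updates over the whole training run under the same assumptions.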