# Keras
## What is it?
Some of the best-known frameworks for deep-learning experiments are Theano, TensorFlow and Torch. Their coding complexity makes them hard for newcomers to approach.
Each of them requires you to build the entire network yourself, including how and where every weight is computed. This gives you a lot of control over the network, but it makes debugging harder and quick prototyping very difficult.


Keras simplifies this by running on Theano or TensorFlow as a backend while exposing a user-friendly API built from a set of common building blocks.


## Let's understand how Keras works
We shall build a simple neural network with one hidden layer that learns XOR from the following inputs:

In [1]:
import numpy as np
#CREATE XOR DATA
X = np.asarray([[0,0],[0,1],[1,0],[1,1]])
y = np.asarray([0, 1, 1, 0])
print(X,'\n', y)
print(X.shape, y.shape)

[[0 0]
 [0 1]
 [1 0]
 [1 1]] 
 [0 1 1 0]
(4, 2) (4,)


### First we import a set of dependencies.
* Most neural nets are a linear stack of layers, so we import ```Sequential```, a model object that stores these layers in order and automatically handles forward and backward propagation.
* ```Dense``` is a fully connected layer: every input unit is connected to every output unit.
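To make the two bullets concrete, here is a minimal NumPy sketch (not Keras code) of the forward pass only: a dense layer computes `activation(x @ W + b)`, and a sequential stack feeds each layer's output into the next. The helpers `dense` and `sequential_forward` are hypothetical names, the weights are random, and backpropagation, which Keras derives for you, is left out.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dense(x, W, b, activation):
    """What a fully connected layer computes: activation(x @ W + b)."""
    return activation(x @ W + b)

def sequential_forward(x, stack):
    """Apply (W, b, activation) layers in order, like a Sequential model."""
    for W, b, activation in stack:
        x = dense(x, W, b, activation)
    return x

rng = np.random.default_rng(0)
stack = [
    (rng.normal(size=(2, 20)), np.zeros(20), sigmoid),  # hidden layer: 2 -> 20
    (rng.normal(size=(20, 1)), np.zeros(1), sigmoid),   # output layer: 20 -> 1
]
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
out = sequential_forward(X, stack)
print(out.shape)  # one probability per input row
```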

In [2]:
#LOAD REQUIRED MODULES
from keras.models import Sequential
from keras.layers import Dense

Using TensorFlow backend.


### A net is constructed in the following way
Try drawing the graph of this network: 2 inputs, a hidden layer of 20 sigmoid units, and a single sigmoid output.

In [3]:
model = Sequential()
model.add(Dense(20, input_dim = 2, activation = 'sigmoid'))
model.add(Dense(1, activation = 'sigmoid'))
model.compile(loss = 'binary_crossentropy', optimizer='adam', metrics= ['accuracy'])
## For more than two classes (as in the MNIST activity), use a softmax output layer and
## model.compile(loss='categorical_crossentropy', ...)
## categorical_crossentropy extends binary_crossentropy to more than two classes
%time model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_1 (Dense)              (None, 20)                60        
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 21        
Total params: 81
Trainable params: 81
Non-trainable params: 0
_________________________________________________________________
Wall time: 0 ns
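The parameter counts in the summary follow from simple arithmetic: a dense layer with `n` inputs and `u` units has an `n * u` weight matrix plus `u` biases. A small sketch of that calculation (the helper name `dense_param_count` is hypothetical):

```python
def dense_param_count(inputs, units, use_bias=True):
    """Weights (inputs * units) plus one bias per unit, as in a Dense layer."""
    return inputs * units + (units if use_bias else 0)

layer_shapes = [(2, 20), (20, 1)]  # (inputs, units) for the two Dense layers above
counts = [dense_param_count(n, u) for n, u in layer_shapes]
print(counts, sum(counts))
```

So 2 * 20 + 20 = 60 for the hidden layer and 20 * 1 + 1 = 21 for the output layer, 81 in total.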


In [4]:
model.fit(X, y, epochs=5000, verbose=0)

<keras.callbacks.History at 0x28de9179048>
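`model.fit` repeatedly runs a forward pass, computes the loss, backpropagates the gradients and updates the weights. The NumPy sketch below spells out that loop for the same 2-20-1 sigmoid network. It is an illustration, not what Keras runs: it swaps Adam for plain full-batch gradient descent, and the learning rate, epoch count, initialization and the helper name `train_xor` are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_xor(epochs=10000, hidden=20, lr=1.0, seed=0):
    """Full-batch gradient descent on a 2-hidden-1 sigmoid net with binary cross-entropy."""
    rng = np.random.default_rng(seed)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)
    W1 = rng.normal(0.0, 1.0, size=(2, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 1.0, size=(hidden, 1))
    b2 = np.zeros(1)
    losses = []
    for _ in range(epochs):
        # Forward pass
        h = sigmoid(X @ W1 + b1)                 # hidden activations, shape (4, hidden)
        p = sigmoid(h @ W2 + b2)                 # output probabilities, shape (4, 1)
        p_safe = np.clip(p, 1e-7, 1 - 1e-7)      # avoid log(0)
        losses.append(-np.mean(y * np.log(p_safe) + (1 - y) * np.log(1 - p_safe)))
        # Backward pass: for a sigmoid output with mean binary cross-entropy,
        # the gradient w.r.t. the output pre-activation is (p - y) / n
        d_out = (p - y) / len(X)
        dW2 = h.T @ d_out
        db2 = d_out.sum(axis=0)
        d_h = (d_out @ W2.T) * h * (1 - h)       # chain rule through the hidden sigmoid
        dW1 = X.T @ d_h
        db1 = d_h.sum(axis=0)
        # Plain gradient-descent update (the Keras model above uses Adam instead)
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    preds = (p > 0.5).astype(int).ravel()        # classes from the last forward pass
    return preds, losses

preds, losses = train_xor()
print(preds, losses[0], losses[-1])
```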

In [5]:
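test_data = np.asarray([[0, 1], [1, 1]])
# GIVES CLASSES: threshold the sigmoid output at 0.5 (newer Keras versions removed predict_classes)
predictions = (model.predict(test_data, verbose=0) > 0.5).astype('int32')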
print(predictions)

prediction_probabilities = model.predict(X) #GIVES PROBABILITIES
print(X,'\n'*2,prediction_probabilities)

print(model.evaluate(X, y))

[[1]
 [0]]
[[0 0]
 [0 1]
 [1 0]
 [1 1]] 

 [[ 0.00618563]
 [ 0.99274921]
 [ 0.99490428]
 [ 0.00604742]]
[0.0061641419306397438, 1.0]
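To see where the two numbers printed by `model.evaluate` come from, here is a NumPy sketch that recomputes them from the probabilities shown above: binary cross-entropy averages -log(p) over positive examples and -log(1 - p) over negative ones, and accuracy thresholds the probabilities at 0.5. It also checks that categorical cross-entropy extends the binary version: with two one-hot classes the two losses coincide. The clipping constant `eps` is an assumption (Keras applies a similar small clip, but the exact value may vary by version); it has no effect on these probabilities.

```python
import numpy as np

# Labels and the probabilities printed by model.predict(X) above
y = np.array([0, 1, 1, 0], dtype=float)
p = np.array([0.00618563, 0.99274921, 0.99490428, 0.00604742])

def binary_crossentropy(y, p, eps=1e-7):
    p = np.clip(p, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def categorical_crossentropy(Y, P, eps=1e-7):
    # Y: one-hot labels (n, k), P: predicted class probabilities (n, k)
    P = np.clip(P, eps, 1 - eps)
    return -np.mean(np.sum(Y * np.log(P), axis=1))

bce = binary_crossentropy(y, p)
# Two-class one-hot view: columns are [P(class 0), P(class 1)]
cce = categorical_crossentropy(np.stack([1 - y, y], axis=1), np.stack([1 - p, p], axis=1))
accuracy = np.mean((p > 0.5).astype(int) == y)
print(bce, cce, accuracy)
```

Both losses come out at about 0.00616 and the accuracy at 1.0, matching the evaluate output.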


## Optional
There is a lot more you can configure under the hood. Compared with the first model, the cell below changes the following:
* **kernel_initializer** controls how the initial random weights are drawn; here both layers use ```RandomUniform```.
* the activations differ between layers: ```relu``` in the hidden layer and ```sigmoid``` in the output layer.
* we pass our own optimizer, ```opt```, which is Adam with a learning rate of 0.1.
* we use a **learning rate reducer**, ```lrr```, which multiplies the learning rate by 0.9 (from ```x``` to ```9*x/10```, but never below ```min_lr=1e-3```) whenever the loss fails to improve by at least 0.001 (```min_delta```, called ```epsilon``` in older Keras versions) for 10 epochs in a row.
* we use an **early stopper**, ```early_stop```, which ends training before the 100 epochs are up if the loss fails to improve by at least 0.01 for 10 epochs in a row.
* we **plot** the loss and accuracy at every epoch using the ```history``` object returned by ```fit```.
* with the right combination of hyperparameters, we have cut training from 5000 epochs to at most 100 and the hidden layer from 20 neurons to 10. Try other combinations to make the model even simpler (for example, fit the model with fewer than 10 neurons in the first layer).
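The two callbacks are easier to picture when replayed on a made-up loss curve. Below is a simplified pure-Python sketch of the logic described in the bullets, assuming each callback tracks the best loss so far and counts epochs without sufficient improvement. `simulate_callbacks` is a hypothetical helper, not part of Keras, and Keras' real callbacks support options (cooldown, mode, restoring weights) that are not modeled here.

```python
def simulate_callbacks(losses, lr=0.1, factor=0.9, min_lr=1e-3,
                       lr_min_delta=1e-3, stop_min_delta=1e-2, patience=10):
    """Replay per-epoch losses; return (last epoch run, learning rate after each epoch)."""
    lr_best, lr_wait = float("inf"), 0
    stop_best, stop_wait = float("inf"), 0
    lr_history = []
    for epoch, loss in enumerate(losses):
        # Learning-rate reducer: count epochs without an improvement of at least lr_min_delta
        if loss < lr_best - lr_min_delta:
            lr_best, lr_wait = loss, 0
        else:
            lr_wait += 1
            if lr_wait >= patience and lr > min_lr:
                lr = max(lr * factor, min_lr)  # x -> 9*x/10, floored at min_lr
                lr_wait = 0
        lr_history.append(lr)
        # Early stopper: same counting with a larger threshold, but it stops training
        if loss < stop_best - stop_min_delta:
            stop_best, stop_wait = loss, 0
        else:
            stop_wait += 1
            if stop_wait >= patience:
                return epoch, lr_history
    return len(losses) - 1, lr_history

# Made-up losses: fast progress for 5 epochs, then a flat plateau
losses = [1.0, 0.5, 0.25, 0.2, 0.185] + [0.185] * 20
last_epoch, lr_history = simulate_callbacks(losses)
print(last_epoch, lr_history[-2:])
```

Here training stops at epoch 14 (counting from 0), ten epochs into the plateau, and the learning rate has been cut once, from 0.1 to 0.09.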



In [None]:
from keras.optimizers import Adam
opt = Adam(learning_rate=0.1)  # older Keras versions name this argument `lr`

from keras.callbacks import ReduceLROnPlateau, EarlyStopping
# min_delta was called `epsilon` in older Keras versions
lrr = ReduceLROnPlateau(monitor='loss', min_lr=1e-3, factor=0.9, min_delta=0.001, patience=10, verbose=1)
early_stop = EarlyStopping(monitor='loss', min_delta=1e-2, patience=10, verbose=1)
model = Sequential()
model.add(Dense(10, input_dim = 2, kernel_initializer='RandomUniform', activation = 'relu'))
model.add(Dense(1, kernel_initializer='RandomUniform', activation = 'sigmoid'))
model.compile(loss='binary_crossentropy', 
              optimizer=opt, 
              metrics=['accuracy'])

history = model.fit(X, y, epochs=100, verbose=0, callbacks=[lrr, early_stop])
model.evaluate(X, y)

In [None]:
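from matplotlib import pyplot as plt
%matplotlib inline

# The accuracy key is 'acc' in older Keras versions and 'accuracy' in newer ones
acc_key = 'acc' if 'acc' in history.history else 'accuracy'
plt.plot(history.history[acc_key], label='accuracy')
plt.plot(history.history['loss'], label='loss')
plt.xlabel('epoch')
plt.legend()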

There are a lot more initializers:
* Zeros
* Ones
* Constant
* RandomNormal
* RandomUniform
* TruncatedNormal
* VarianceScaling
* Orthogonal
* Identity
* lecun_uniform
* glorot_normal
* glorot_uniform
* he_normal
* he_uniform
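The initializers mainly differ in how they choose the range or spread of the starting weights. The NumPy sketch below compares three uniform ones for the 2-input, 10-unit hidden layer used above, based on their standard formulas: `RandomUniform` uses a fixed range (±0.05 by default in Keras 2), `glorot_uniform` uses a limit of sqrt(6 / (fan_in + fan_out)), and `he_uniform` uses sqrt(6 / fan_in). These functions are illustrations, not the Keras classes themselves.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_uniform(shape, minval=-0.05, maxval=0.05):
    # Fixed range, independent of the layer size
    return rng.uniform(minval, maxval, size=shape)

def glorot_uniform(fan_in, fan_out):
    # Range shrinks as the layer gets wider on either side
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def he_uniform(fan_in, fan_out):
    # Range depends on fan-in only; commonly paired with relu
    limit = np.sqrt(6.0 / fan_in)
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

for name, W in [("RandomUniform", random_uniform((2, 10))),
                ("glorot_uniform", glorot_uniform(2, 10)),
                ("he_uniform", he_uniform(2, 10))]:
    print(name, W.shape, round(float(np.abs(W).max()), 3))
```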

There are a lot more loss functions:
* mean_squared_error
* mean_absolute_error
* mean_absolute_percentage_error
* mean_squared_logarithmic_error
* squared_hinge
* hinge
* logcosh
* categorical_crossentropy
* sparse_categorical_crossentropy
* binary_crossentropy
* kullback_leibler_divergence
* poisson
* cosine_proximity
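A few of these losses are short formulas. The sketch below implements four of them in NumPy from their standard definitions, for a single vector of predictions; Keras' own versions average over the last axis and may clip or convert labels (for example, `hinge` expects labels of -1 and +1, and some versions convert 0/1 labels automatically), so treat these as illustrations.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

def mean_absolute_error(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def logcosh(y_true, y_pred):
    # Close to squared error / 2 for small errors, to absolute error for large ones
    return np.mean(np.log(np.cosh(y_pred - y_true)))

def hinge(y_true, y_pred):
    # Expects labels in {-1, +1}
    return np.mean(np.maximum(1.0 - y_true * y_pred, 0.0))

y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([0.9, 0.2, 0.6])
print(mean_squared_error(y_true, y_pred), mean_absolute_error(y_true, y_pred))
print(logcosh(y_true, y_pred), hinge(2 * y_true - 1, y_pred))  # hinge on -1/+1 labels
```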

There are a lot more optimizers:
* SGD
* RMSprop
* Adagrad
* Adadelta
* Adam
* Adamax
* Nadam
* TFOptimizer
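The optimizers differ in how they turn gradients into weight updates. Below is a minimal sketch, not Keras' implementation, of the update rules of plain `SGD` and `Adam` minimizing the one-dimensional function f(x) = (x - 3)^2. The learning rates, step counts and `eps` are assumptions chosen for illustration; Adam's betas use the commonly cited defaults of 0.9 and 0.999.

```python
import numpy as np

def grad(x):
    # Gradient of f(x) = (x - 3)^2
    return 2.0 * (x - 3.0)

def sgd(x, lr=0.1, steps=200):
    for _ in range(steps):
        x -= lr * grad(x)  # step against the gradient
    return x

def adam(x, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g       # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g   # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)          # bias correction for the zero start
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return x

print(sgd(0.0), adam(0.0))  # both should end close to the minimum at x = 3
```

SGD scales every step by the raw gradient, while Adam divides by a running estimate of the gradient's magnitude, so its early steps have roughly the size of the learning rate regardless of how steep the function is.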


The CS231n notes at http://cs231n.github.io/neural-networks-3/ are a great place to learn how these optimizers work.