# Keras

## 1 Keras Steps

### 1.1 Theoretical Steps

1. Prepare datasets<br>
a. Parse data<br>
b. Index variables<br>
c. Split train and test<br>
2. Add layers<br>
a. Layer type and number<br>
b. Layer parameters<br>
3. Compile and fit the model<br>
a. Training data<br>
b. Optimizer<br>
c. Loss function and Metrics<br>
4. Evaluate and predict<br>
a. Test data<br>
b. Metrics<br>
c. Predict new data<br>

### 1.2 Steps with Code

<b>1. Sequential model (a linear stack of layers)</b><br>

In [None]:
from keras.models import Sequential
model = Sequential()

In [None]:
from keras.layers import Dense, Activation

model.add(Dense(units=64, input_dim=100))  # the first layer must know the input size
model.add(Activation('relu'))
model.add(Dense(units=10))
model.add(Activation('softmax'))           # 10 class probabilities per sample
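
To make the stack concrete, here is a minimal NumPy sketch of the forward pass it computes (`Dense(64)`, then ReLU, then `Dense(10)`, then softmax) on a batch of 100-dimensional inputs. The random weights and the helper names `dense`, `relu` and `softmax` are illustrative assumptions, not Keras internals. Only the weight shapes follow from the layer definitions above.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, W, b):
    """Fully connected layer: x @ W + b (what Dense computes before its activation)."""
    return x @ W + b

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    # Subtract the row max for numerical stability before exponentiating.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Hypothetical weights with the shapes Keras would create:
# Dense(64, input_dim=100) -> W1: (100, 64), b1: (64,)
# Dense(10)                -> W2: (64, 10),  b2: (10,)
W1, b1 = rng.normal(scale=0.1, size=(100, 64)), np.zeros(64)
W2, b2 = rng.normal(scale=0.1, size=(64, 10)), np.zeros(10)

x = rng.normal(size=(32, 100))  # a batch of 32 samples
probs = softmax(dense(relu(dense(x, W1, b1)), W2, b2))
print(probs.shape)  # (32, 10)
```

Each row of `probs` is a probability distribution over the 10 classes, which is what `categorical_crossentropy` expects as `y_pred`.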

<b>2. Compile model</b><br>

In [None]:
model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])

We can also configure the optimizer further by passing an optimizer object instead of a string:<br>

In [None]:
import keras

# Older Keras versions name the learning-rate argument `lr`
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.SGD(learning_rate=0.01, momentum=0.9))

<b>3. Training</b>

In [None]:
model.fit(x_train, y_train, epochs=5, batch_size=32)

We can also feed data manually, one batch at a time (one gradient update per call):

In [None]:
model.train_on_batch(x_batch, y_batch)
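
`train_on_batch` performs one gradient update on whatever batch it receives, so the batching itself is up to us. Below is a minimal sketch of a hypothetical `iterate_minibatches` helper (not part of Keras) that shuffles the indices once per epoch and slices the data into batches.

```python
import numpy as np

def iterate_minibatches(x, y, batch_size, shuffle=True, seed=0):
    """Yield (x_batch, y_batch) pairs that cover the data exactly once (one epoch)."""
    idx = np.arange(len(x))
    if shuffle:
        np.random.default_rng(seed).shuffle(idx)
    for start in range(0, len(x), batch_size):
        batch = idx[start:start + batch_size]
        yield x[batch], y[batch]

# Hypothetical usage with a compiled Keras model:
# for x_batch, y_batch in iterate_minibatches(x_train, y_train, 32):
#     model.train_on_batch(x_batch, y_batch)

x_toy = np.arange(10).reshape(10, 1).astype(float)
y_toy = np.arange(10)
sizes = [len(xb) for xb, yb in iterate_minibatches(x_toy, y_toy, batch_size=4)]
print(sizes)  # [4, 4, 2]
```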

<b>4. Evaluation</b>

In [None]:
loss_and_metrics = model.evaluate(x_test, y_test, batch_size=128)

<b>5. Prediction</b>

In [None]:
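probs = model.predict(x_test, batch_size=128)  # class probabilities, shape (n_samples, 10)
classes = probs.argmax(axis=-1)                # index of the most likely class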

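<b>Models :</b> Sequential(), Model() (functional API)<br>
<b>Layers :</b> Dense, Dropout, Activation, convolutional layers, pooling, recurrent layers (e.g. LSTM)<br>
<b>Compile :</b> loss, optimizer, metrics<br>
<b>Fit :</b> training data (X, Y), validation/test data, learning rate (set on the optimizer), epochs, convergence criteria (e.g. early stopping), batch size<br>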

### 1.2.1 Losses

binary_crossentropy<br>
categorical_crossentropy<br>
mean_squared_error<br>
sparse_categorical_crossentropy<br>
mean_absolute_error<br>
mean_absolute_percentage_error<br>
mean_squared_logarithmic_error<br>
squared_hinge<br>
hinge<br>
categorical_hinge<br>
logcosh<br>
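
The most common of these losses reduce to short formulas. The NumPy sketch below mirrors them on a toy batch; the clipping constant `EPS` and the plain mean over samples are simplifying assumptions, so Keras's own reductions may differ in detail. Note that `sparse_categorical_crossentropy` is the same quantity as `categorical_crossentropy`, only with integer labels instead of one-hot vectors.

```python
import numpy as np

EPS = 1e-7  # clip probabilities away from 0 and 1 to avoid log(0)

def binary_crossentropy(y_true, y_pred):
    p = np.clip(y_pred, EPS, 1 - EPS)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def categorical_crossentropy(y_true_onehot, y_pred):
    p = np.clip(y_pred, EPS, 1 - EPS)
    return -np.mean(np.sum(y_true_onehot * np.log(p), axis=1))

def sparse_categorical_crossentropy(y_true_int, y_pred):
    # Pick the predicted probability of the true class for each row.
    p = np.clip(y_pred, EPS, 1 - EPS)
    return -np.mean(np.log(p[np.arange(len(y_true_int)), y_true_int]))

def mean_squared_error(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

y_pred = np.array([[0.1, 0.8, 0.1],
                   [0.7, 0.2, 0.1]])
onehot = np.array([[0, 1, 0],
                   [1, 0, 0]])
labels = np.array([1, 0])  # same targets as `onehot`, as class indices

print(categorical_crossentropy(onehot, y_pred))
print(sparse_categorical_crossentropy(labels, y_pred))
```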

### 1.2.2 Optimizers

SGD<br>
RMSprop<br>
Nadam<br>
Adadelta<br>
Adam<br>

In [None]:
import keras

# Older Keras versions named the first argument `lr` and also accepted `decay=`
keras.optimizers.SGD(learning_rate=0.01, momentum=0.0, nesterov=False)
keras.optimizers.RMSprop(learning_rate=0.001, rho=0.9, epsilon=1e-08)
keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08)
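
The optimizers differ only in how they turn a gradient into a weight update. Below is a sketch of the textbook update rules for SGD with momentum and for Adam, applied to the toy loss f(w) = (w - 3)^2. The toy loss, the step counts and the `sgd_momentum`/`adam` function names are assumptions for illustration, and Keras's implementations may differ in small details such as where `epsilon` enters.

```python
import numpy as np

def grad(w):
    # Gradient of the toy loss f(w) = (w - 3)^2, which is minimised at w = 3.
    return 2.0 * (w - 3.0)

def sgd_momentum(w, lr=0.1, momentum=0.9, steps=300):
    v = 0.0
    for _ in range(steps):
        v = momentum * v - lr * grad(w)  # velocity: decayed sum of past steps
        w = w + v
    return w

def adam(w, lr=0.05, beta_1=0.9, beta_2=0.999, epsilon=1e-7, steps=2000):
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta_1 * m + (1 - beta_1) * g      # running mean of gradients
        v = beta_2 * v + (1 - beta_2) * g * g  # running mean of squared gradients
        m_hat = m / (1 - beta_1 ** t)          # bias correction for the zero start
        v_hat = v / (1 - beta_2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + epsilon)
    return w

print(sgd_momentum(0.0), adam(0.0))  # both close to 3.0
```

Adam divides each step by a running estimate of the gradient's magnitude, so its step size is roughly `lr` regardless of how steep the loss is; plain SGD steps scale with the gradient itself.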

### 1.2.3 Metrics

1. Accuracy<br>
a. binary_accuracy<br>
b. categorical_accuracy<br>
c. sparse_categorical_accuracy<br>
d. top_k_categorical_accuracy<br>
e. sparse_top_k_categorical_accuracy<br>
2. Precision<br>
3. Recall<br>
4. F1 Score<br>
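
The accuracy variants differ mainly in the label format they expect. Here is a NumPy sketch on toy predictions; the arrays and function names are illustrative, not Keras's code.

```python
import numpy as np

y_pred = np.array([[0.1, 0.6, 0.3],
                   [0.5, 0.3, 0.2],
                   [0.2, 0.3, 0.5]])
y_true_int = np.array([1, 1, 2])        # integer labels (sparse form)
y_true_onehot = np.eye(3)[y_true_int]   # one-hot labels (categorical form)

def categorical_accuracy(y_onehot, y_pred):
    return np.mean(y_onehot.argmax(axis=1) == y_pred.argmax(axis=1))

def sparse_categorical_accuracy(y_int, y_pred):
    return np.mean(y_int == y_pred.argmax(axis=1))

def top_k_categorical_accuracy(y_onehot, y_pred, k=2):
    top_k = np.argsort(y_pred, axis=1)[:, -k:]  # indices of the k largest scores per row
    true = y_onehot.argmax(axis=1)
    return np.mean((top_k == true[:, None]).any(axis=1))

def binary_accuracy(y_true, y_prob, threshold=0.5):
    return np.mean(y_true == (y_prob > threshold))

print(categorical_accuracy(y_true_onehot, y_pred))        # 2 of 3 rows correct
print(top_k_categorical_accuracy(y_true_onehot, y_pred))  # true class always in the top 2
```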

In [None]:
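from keras import metrics

# Precision and recall are metric objects; 'f1score' is not a built-in string name.
# F1 can be derived from precision and recall (recent Keras versions also provide metrics.F1Score).
model.compile(loss='categorical_crossentropy', optimizer='adadelta',
              metrics=['accuracy', metrics.Precision(), metrics.Recall()])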

In [None]:
# Custom metric
import keras.backend as K

def mean_pred(y_true, y_pred):
    return K.mean(y_pred)

model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy', mean_pred])

### 1.2.4 Model save, load or delete

In [None]:
from keras.models import load_model
model.save('my_model.h5')
del model

model = load_model('my_model.h5')

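Note: we can also save/load only a model's architecture (`model.to_json()` / `keras.models.model_from_json()`) or only its weights (`model.save_weights()` / `model.load_weights()`).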

### 1.2.5 Model Visualization

In [None]:
from keras.utils import plot_model
plot_model(model, to_file='model.png')  # requires the pydot and graphviz packages

### 1.2.6 Callbacks

* Callbacks let us call functions during training
* They can be invoked at different points of training (start/end of a batch or an epoch)
* Built-in callbacks include early stopping (`EarlyStopping`) and saving weights after each epoch (`ModelCheckpoint`)
* Easy to build and use: pass them to the training function via `fit(..., callbacks=[...])`
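
Early stopping is the most common built-in callback; in Keras it is typically passed as `fit(..., callbacks=[keras.callbacks.EarlyStopping(monitor='val_loss', patience=2)])`. The logic it applies can be sketched in plain Python. `early_stopping_epoch` is a hypothetical, simplified helper that ignores options such as restoring the best weights.

```python
def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return the epoch index at which training would stop, or None if it never triggers.

    Simplified version of 'monitor val_loss, lower is better'.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best, wait = loss, 0  # improvement: remember it and reset the counter
        else:
            wait += 1             # no improvement this epoch
            if wait >= patience:
                return epoch
    return None

history = [0.9, 0.7, 0.6, 0.61, 0.65, 0.5]
print(early_stopping_epoch(history, patience=2))  # 4
print(early_stopping_epoch(history, patience=3))  # None (epoch 5 improves in time)
```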

--------------------------------------------------------------------------------------------------------------------------------

# Keras Case Study (Customer Churn Prediction) Notes

<b>1. Import all libraries</b>

<b>2. Load csv dataset</b>

<b>3. Use pandas profiling</b>

a. This gives a detailed data audit report, including:<br>
b. Dataset information<br>
c. Types of variables<br>
d. Warnings for zeros, distinct values and missing values<br>
e. Variable-wise distributions<br>
f. Statistics, histograms, common values and extreme values of each variable<br>
g. Different correlation metrics<br>
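
If the profiling package is not available, a few pandas calls give a rough version of the same audit. Below is a minimal sketch on a toy DataFrame; the column values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy data, made up for illustration
df = pd.DataFrame({
    "Age": [42, 41, np.nan, 39],
    "Balance": [0.0, 500.0, 1200.0, 0.0],
    "Geography": ["France", "Spain", "France", None],
})

# One row per variable: type, missing values, zeros and distinct values
audit = pd.DataFrame({
    "dtype": df.dtypes.astype(str),
    "missing": df.isna().sum(),
    "zeros": (df == 0).sum(),
    "distinct": df.nunique(),
})
print(audit)
print(df.describe())                      # count, mean, std, min, quartiles, max of numeric columns
print(df.select_dtypes("number").corr())  # Pearson correlation between numeric columns
```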

<b>4. Drop unwanted variables</b> 

a. Identify the variables that are useful for predicting the output<br>
b. Drop unwanted variables (e.g. row numbers, IDs and names, which carry no predictive information):<br>
dataset.drop(['RowNumber', 'CustomerId', 'Surname'], axis=1, inplace=True)<br>

<b>5. Get dummies for Categorical variables</b>

a. Get dummies for the categorical variables (`drop_first=True` drops one level per variable to avoid redundant columns):<br>
dataset_new = pd.get_dummies(dataset, columns=['Geography', 'Gender'], drop_first=True)<br>
b. Check the dimensions of the new dataset:<br>
dataset_new.shape<br>
dataset_new.columns<br>
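
To see what `get_dummies` with `drop_first=True` produces, here is a toy example (the three rows are made up). Each categorical column with k levels becomes k - 1 indicator columns, and the alphabetically first level (France, Female) is the one dropped.

```python
import pandas as pd

# Toy rows, made up for illustration
toy = pd.DataFrame({
    "CreditScore": [600, 650, 700],
    "Geography": ["France", "Spain", "Germany"],
    "Gender": ["Female", "Female", "Male"],
})
encoded = pd.get_dummies(toy, columns=["Geography", "Gender"], drop_first=True)
print(list(encoded.columns))
# ['CreditScore', 'Geography_Germany', 'Geography_Spain', 'Gender_Male']
```

A France/Female row is then encoded as all zeros in the indicator columns, so no information is lost by dropping the first level.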

<b>6. Separate X and Y variables</b>

In [None]:
X = dataset_new[dataset_new.columns.difference(['Exited'])]
y = dataset_new['Exited']

a. Cross-check the X and y variables (shapes and column names)<br>

<b>7. Split into Train and Test</b>

In [None]:
from sklearn.model_selection import train_test_split
train_x, test_x, train_y, test_y = train_test_split(X, y, test_size=0.3, random_state=123)

a. Check the dimensions of train_x, test_x, train_y and test_y after the split<br>

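<b>8. Scaling variables (part of feature engineering)</b>

a. Feature scaling helps gradient-based training converge faster<br>
b. Fit the scaler on the training data only, then apply the same transformation to the test data<br>

In [None]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
train_x = sc.fit_transform(train_x)  # learn each column's mean/std from the training set
test_x = sc.transform(test_x)        # reuse the training statistics; never refit on test data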
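
Standardization computes z = (x - mean) / std per column. The NumPy sketch below uses toy arrays, made up for illustration, to show why the test set must be transformed with the training statistics rather than refit: the test row is expressed on the scale learned from the training data.

```python
import numpy as np

train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
test = np.array([[4.0, 40.0]])

mean = train.mean(axis=0)  # learned from the training data only
std = train.std(axis=0)    # population std (ddof=0), as StandardScaler uses

train_scaled = (train - mean) / std
test_scaled = (test - mean) / std  # same mean/std as training; no refit
print(test_scaled)  # both columns are about 2.449 (= sqrt(6))
```

Refitting on a single test row would instead give it a standard deviation of zero, which shows why refitting on test data is wrong.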

<b>9. Import Keras modules and build the model</b>

In [None]:
from keras.models import Sequential
from keras.layers import Dense

In [None]:
# Network = [11, 6, 6, 1]: 11 inputs, two hidden layers of 6 units, 1 output
model = Sequential()
model.add(Dense(units=6, kernel_initializer='random_uniform', activation='relu', input_dim=11))  # kernel_initializer: how the weights (betas) are initialized
model.add(Dense(units=6, kernel_initializer='random_uniform', activation='relu'))  # input size inferred from the previous layer
model.add(Dense(units=1, kernel_initializer='random_uniform', activation='sigmoid'))  # sigmoid for binary output (softmax for multi-class)
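
Each `Dense` layer has one weight per input-output pair plus one bias per unit, so the parameter count of the [11, 6, 6, 1] network can be checked by hand; `model.summary()` should report the same total for this architecture. `dense_params` is an illustrative helper.

```python
def dense_params(n_in, n_out):
    # weight matrix (n_in x n_out) plus one bias per output unit
    return n_in * n_out + n_out

layers = [11, 6, 6, 1]  # input features, hidden, hidden, output
per_layer = [dense_params(a, b) for a, b in zip(layers[:-1], layers[1:])]
print(per_layer, sum(per_layer))  # [72, 42, 7] 121
```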

In [None]:
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

In [None]:
model.fit(train_x, train_y, batch_size=10, epochs=10)
# batch_size=10: with 700 training observations, the weights (betas) are updated 700/10 = 70 times per epoch
# epochs=10: number of full passes over the training data (called nb_epoch in Keras 1)
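
The arithmetic in the comment generalizes with a ceiling, since the last batch of an epoch may be smaller than `batch_size`. A small sketch (the sample counts are illustrative):

```python
import math

def updates_per_epoch(n_samples, batch_size):
    # One weight update per batch; the last, possibly smaller, batch still counts.
    return math.ceil(n_samples / batch_size)

print(updates_per_epoch(700, 10))       # 70
print(updates_per_epoch(705, 10))       # 71 (the last batch has only 5 rows)
print(10 * updates_per_epoch(700, 10))  # 700 updates over 10 epochs
```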

In [None]:
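y_pred = model.predict(test_x).ravel()     # predicted churn probabilities, one per customer
y_pred_class = (y_pred > 0.5).astype(int)  # threshold at 0.5 to get 0/1 labels

In [None]:
import sklearn.metrics as metrics
metrics.roc_auc_score(test_y, y_pred)  # AUC uses the probabilities directly

In [None]:
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(test_y, y_pred_class)  # needs class labels, not probabilities

In [None]:
print(metrics.classification_report(test_y, y_pred_class))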
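
The classification report's precision, recall and F1 come straight from the confusion matrix counts. Below is a NumPy sketch on six made-up predictions, using a 0.5 threshold on the probabilities. `binary_report` is a hypothetical helper, not an sklearn function.

```python
import numpy as np

def binary_report(y_true, y_prob, threshold=0.5):
    y_true = np.asarray(y_true).ravel()
    y_hat = (np.asarray(y_prob).ravel() > threshold).astype(int)
    tp = int(np.sum((y_true == 1) & (y_hat == 1)))
    tn = int(np.sum((y_true == 0) & (y_hat == 0)))
    fp = int(np.sum((y_true == 0) & (y_hat == 1)))
    fn = int(np.sum((y_true == 1) & (y_hat == 0)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Same layout as sklearn's confusion_matrix: rows = actual, columns = predicted
    cm = np.array([[tn, fp], [fn, tp]])
    return cm, precision, recall, f1

y_true = [1, 0, 1, 1, 0, 0]
y_prob = [0.9, 0.6, 0.4, 0.8, 0.2, 0.1]
cm, p, r, f1 = binary_report(y_true, y_prob)
print(cm)
print(p, r, f1)  # each is 2/3 here
```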

In [None]:
model.save('model.h5')

In [None]:
from keras.models import load_model
model = load_model('model.h5')

In [None]:
# If we receive a model from elsewhere, summary() shows its architecture (layers, output shapes, parameter counts)
model.summary()

--------------------------------------------------------------------------------------------------------------------------------

# Tensorflow

* TensorFlow is an open-source software library for numerical computation using dataflow graphs.
* Nodes in the graph represent mathematical operations, while graph edges represent the multidimensional data arrays (tensors) that flow between them.
* It is used from Python, most commonly as a neural network framework.
* It runs highly optimized C++ code for the actual calculations.
* Higher-level APIs are available on top of TensorFlow, e.g. skflow, which provides a Scikit-learn-style API.

<b>Fundamental tensorflow workflow</b>

When working with TensorFlow (1.x graph mode), our code is effectively divided into two parts:<br>
1. Define a graph that contains our model.<br>
2. Run the graph. Two special cases are:<br>
a. Train the model<br>
b. Test/predict using the model<br>

<b>Example :</b><br>
<b>Step 1 : Define a computational graph</b>

In [None]:
# The graph/session API used here is TensorFlow 1.x; under TensorFlow 2.x it is available via tf.compat.v1
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

# tf.placeholder creates an 'input' node. We must give it a value when we run our model.
# These can be data we want to learn from or values of hyper-parameters for our model.
a = tf.placeholder(tf.int32, name='input_a')
b = tf.placeholder(tf.int32, name='input_b')

# tf.add creates an addition node
c = tf.add(a, b, name='add')

# tf.multiply creates a multiplication node
d = tf.multiply(a, b, name='multiply')

# tf.subtract creates a subtraction node
e = tf.subtract(c, d, name='subtract')

# add up results of previous nodes
out = tf.add(e, b, name='output')

In [None]:
out  # shows only the symbolic Tensor (name, dtype, shape); no value exists until the graph is run in a session

<b>Step 2 : Run the graph</b>

In [None]:
# start the session
sess = tf.Session()

# Create a 'feed_dict' dictionary to define input values
# Keys of the dictionary are handles to our placeholders
# Values of the dictionary are the values we would like to feed in
feed_dict = {a: 4, b: 3}

# Execute the graph using 'Session.run()', which takes two parameters:
# - 'fetches': lists which nodes we'd like to receive as output
# - 'feed_dict': feeds in key-value pairs as inputs to various nodes
# In this case, we pass in the Tensor 'out' as our value for 'fetches',
# which causes the value of 'out' to be computed and returned
result = sess.run(out, feed_dict=feed_dict)

# print the value of out, i.e. ((a + b) - (a * b)) + b
print("(({0}+{1}) - ({0}*{1})) + {1} = {2}".format(feed_dict[a], feed_dict[b], result))

# close the session
sess.close()
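
The define-then-run split can be imitated in plain Python, which shows why `out` has no value until it is run: nodes only describe operations, and evaluation walks back from the fetched node to the fed placeholders. The toy sketch below reproduces the same graph; the dict-based `graph` and the `run` function are illustrative, not TensorFlow's implementation.

```python
# Each node is (operation, input_node_names); placeholders are None until fed.
graph = {
    "input_a": None,
    "input_b": None,
    "add": (lambda x, y: x + y, ["input_a", "input_b"]),
    "multiply": (lambda x, y: x * y, ["input_a", "input_b"]),
    "subtract": (lambda x, y: x - y, ["add", "multiply"]),
    "output": (lambda x, y: x + y, ["subtract", "input_b"]),
}

def run(graph, fetch, feed_dict):
    """Evaluate node `fetch`, recursively computing only the nodes it depends on."""
    if fetch in feed_dict:
        return feed_dict[fetch]
    node = graph[fetch]
    if node is None:
        raise ValueError(f"placeholder {fetch!r} must be fed a value")
    op, inputs = node
    return op(*(run(graph, name, feed_dict) for name in inputs))

print(run(graph, "output", {"input_a": 4, "input_b": 3}))  # (7 - 12) + 3 = -2
```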

# Linear Regression using Tensorflow

In [None]:
import os
import time
import numpy as np
import matplotlib.pyplot as plt
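import tensorflow.compat.v1 as tf  # TF 1.x graph API (placeholders, sessions), also usable under TF 2.x
tf.disable_v2_behavior()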
import pandas as pd

In [None]:
data = pd.read_csv('Reg_data.csv')
data.head()
# country | GDP | Life expectancy
# drop country

In [None]:
data1 = data.drop(['Country'], axis=1).to_numpy()  # as_matrix() was removed in pandas 1.0

In [None]:
n_samples = len(data1)

In [None]:
data1.shape

In [None]:
X = tf.placeholder(tf.float32, name='X')
Y = tf.placeholder(tf.float32, name='Y')
w = tf.get_variable('weights', initializer=tf.constant(0.0))
b = tf.get_variable('bias', initializer=tf.constant(0.0))

In [None]:
X

In [None]:
Y

In [None]:
w

In [None]:
b

In [None]:
Y_predicted = w * X + b  # alternate to using tf.add()/tf.multiply()
loss = tf.square(Y - Y_predicted, name='loss')
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.0003).minimize(loss)

In [None]:
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # train model
    for i in range(500):  # for every epoch
        total_loss = 0
        for x, y in data1:  # one gradient step per observation (stochastic gradient descent)
            _, mloss = sess.run([optimizer, loss], feed_dict={X:x, Y:y})
            total_loss += mloss
            
        print('Epoch {0}: {1}'.format(i, total_loss/n_samples))
    w_out, b_out = sess.run([w, b])

In [None]:
w_out

In [None]:
b_out

In [None]:
data1[:, 0] * w_out + b_out   # predicted values (column 0 was fed as X during training)

In [None]:
data1[:, 1]   # actual values (column 1 was fed as Y during training)
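
To sanity-check the TensorFlow loop, the same per-observation gradient descent can be written in NumPy and compared with the closed-form least-squares fit from `np.polyfit`. The sketch uses hypothetical noise-free data following y = 2x + 1 rather than `Reg_data.csv`, and a larger learning rate than 0.0003 because the toy x values are small.

```python
import numpy as np

# Hypothetical noise-free data following y = 2x + 1 (not the Reg_data.csv values)
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x + 1.0

w, b, lr = 0.0, 0.0, 0.05
for epoch in range(500):
    for xi, yi in zip(x, y):     # one update per observation, as in the TF loop
        err = yi - (w * xi + b)  # residual for this observation
        # gradients of err**2: d/dw = -2*xi*err, d/db = -2*err
        w += lr * 2 * xi * err
        b += lr * 2 * err

slope, intercept = np.polyfit(x, y, 1)  # closed-form least squares for comparison
print(w, b, slope, intercept)           # all close to 2.0 and 1.0
```

If the gradient-descent estimates and the closed-form fit disagree on real data, the usual suspects are too few epochs or a learning rate that is too small or too large for the scale of X.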