# Keras
Keras (since Tensorflow 2.0) is an integrated high-level API for developing Neural Networks using the tensorflow backend. This means that with Keras you can define neural networks that are trained and run against tensorflow and can even leverage GPU support. To implement these networks Keras implements a Functional and Sequential modeling design

-----

## Sequential
The sequential design follow a lot of the paradigms we have seen in the GWU NN library. Essentially the output of one layer is fed into the following layer, and the model can be finalized through the **compile** method.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
model.add(Dense(x, input_dim))
model.add(Dense(1))
model.compile(loss='loss_function')
```

----

## Functional
The functional design is more flexible in the sequential, as multiple workflows (inputs and outputs) can be defined from the same model, as well as having diverging and converging pathways. These models still need to be compiled, but need to be defined beforehand. Given time we may dive deeper into these types of models.

```python
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Input

inputs_a = Input(input_a_dim)
ha_1 = Dense(xa)(inputs_a)

inputs_b = Input(input_b_dim)
hb_1 = Dense(xb)(inputs_b)

con = concatenate(inputs=[h_a1, h_b1])
hcon_1 = Dense(xh)(con)

output = Dense(1, activation="activation")(hcon_1)

model = Model(inputs=[input_a_dim, input_b_dim], output = output)
model.compile(loss='loss_function')
```

## Our First Keras Model
For our first keras model we'll reimplement one of our earlier models using the Keras API.

In [4]:
import os
import datetime
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn import preprocessing
from sklearn.model_selection import train_test_split

y_col = 'Survived'
x_cols = ['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare', 'Embarked']
df = pd.read_csv('titanic_data.csv')
y = np.array(df[y_col]).reshape(-1, 1)
orig_X = df[x_cols]

# Lets standardize our features
scaler = preprocessing.StandardScaler()
stand_X = scaler.fit_transform(orig_X)
X = stand_X

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)


## Defining our Model
To create our sequential model we'll need to import a couple of things from Keras:
 - keras.models.Sequential - Keras's import for creating Sequential models
 - keras.layers.Dense - This is how we'll create a fully connected layer in Keras

In [53]:
import tensorflow as tf
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

seq_run_num = 1
func_run_num = 1

For our actual model, we'll more or less replicate the logic we had previoulsy used within the GWU NN library. The only real difference is that our **loss** function will be *binary_crossentropy* instead of log_loss and we'll need to define an **optimizer**. Optimizers are algorithms for enhancing the way that *Gradient Descent* works.

In [6]:
def get_model(input_dim, run=seq_run_num):
    """Defines a binary classification model for our titanic dataset.
    Args:
        input_dim (tuple): Size of the input data
        
    Returns:
        Keras.Sequential: A Keras model for performing binary_classification
    """
    model = Sequential()
    model.add(Dense(28, input_dim=input_dim, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    dtime = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    logdir = os.path.join("logs", f"seq_run#{run}-{dtime}")
    run+=1
    tensorboard_callback = tf.keras.callbacks.TensorBoard(logdir, histogram_freq=1)
    
    model.compile(optimizer='sgd', loss='binary_crossentropy', metrics=['accuracy'])
    return model, tensorboard_callback

With our model defined, we simply need to fit the model to our training data. To do this we need to define the following:
 - X - Our inputs
 - y - Our outputs
 - epochs - How many times we'll loop through our training dataset
 - batch_size - How many records are used during one batch/gradient descent

We can also define a verbosity or **verbose** parameter that defines what kind of output the training will produce:
 - 0 = Shows nothing
 - 1 = (Default) Shows how each epoch progresses during training
 - 2 = Truncated version of 1
 - 3 = Shows how many epochs have been completed

In [None]:
model, tensorboard_callback = get_model(X_train.shape[1])

history = model.fit(X_train, y_train, epochs=30, batch_size=25, validation_data=(X_test, y_test), callbacks=[tensorboard_callback])

Using the complied loss function and metrics we can also **evaluate(X, y)** our model.

In [8]:
model.evaluate(X_test, y_test)



[0.4889432191848755, 0.7881355881690979]

## In Class
Implement the same model, but use the functional API and use the **adam** optimizer. Compare the results of this implementation to that of the sequential model using the graph of the training history.

In [56]:
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Input
def func_model(run=seq_run_num):
    inputs = Input(X_train.shape[1])
    layer_1 = Dense(28,activation='relu')(inputs)


    output = Dense(1, activation="sigmoid")(layer_1)

    model = Model(inputs, output)
    model.compile(loss='loss_function')
    
    dtime = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    logdir = os.path.join("logs", f"seq_run#{run}-{dtime}")
    run+=1
    tensorboard_callback = tf.keras.callbacks.TensorBoard(logdir, histogram_freq=1)
    
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model, tensorboard_callback

In [58]:
m, callback = func_model()
history = m.fit(X_train, y_train, epochs=30, batch_size=25, validation_data=(X_test, y_test), callbacks=[callback])

2021-10-26 21:16:38.560119: I tensorflow/core/profiler/lib/profiler_session.cc:136] Profiler session initializing.
2021-10-26 21:16:38.560138: I tensorflow/core/profiler/lib/profiler_session.cc:155] Profiler session started.
2021-10-26 21:16:38.560165: I tensorflow/core/profiler/lib/profiler_session.cc:172] Profiler session tear down.


Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
 1/20 [>.............................] - ETA: 0s - loss: 0.6058 - accuracy: 0.8400

2021-10-26 21:16:38.786100: I tensorflow/core/profiler/lib/profiler_session.cc:136] Profiler session initializing.
2021-10-26 21:16:38.786117: I tensorflow/core/profiler/lib/profiler_session.cc:155] Profiler session started.
2021-10-26 21:16:38.795844: I tensorflow/core/profiler/lib/profiler_session.cc:71] Profiler session collecting data.
2021-10-26 21:16:38.796989: I tensorflow/core/profiler/lib/profiler_session.cc:172] Profiler session tear down.
2021-10-26 21:16:38.798085: I tensorflow/core/profiler/rpc/client/save_profile.cc:137] Creating directory: logs/seq_run#1-20211026-211638/train/plugins/profile/2021_10_26_21_16_38
2021-10-26 21:16:38.798575: I tensorflow/core/profiler/rpc/client/save_profile.cc:143] Dumped gzipped tool data for trace.json.gz to logs/seq_run#1-20211026-211638/train/plugins/profile/2021_10_26_21_16_38/Marshalls-MacBook-Pro.local.trace.json.gz
2021-10-26 21:16:38.800478: I tensorflow/core/profiler/rpc/client/save_profile.cc:137] Creating directory: logs/seq_ru

Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30


In [None]:
m.fit(X_train, y_train, epochs=30, batch_size=25, validation_data=(X_test, y_test))

## TensorFlow Model
We'll notice that this isn't too far off from the keras implementation. However, it's worth reviewing as we'll be continuing forward relying more heavily on TensorFlow directly rather than Keras.

Material sourced from - [aymericdamien github](https://github.com/aymericdamien/TensorFlow-Examples/blob/master/tensorflow_v2/notebooks/3_NeuralNetworks/neural_network_raw.ipynb)

In [39]:
# MNIST dataset parameters.
num_classes = 1 # total classes (0-9 digits).
num_features = 7 # data features (img shape: 28*28).

# Training parameters.
learning_rate = 0.0001
training_steps = 3000
batch_size = 25
display_step = 100

# Network parameters.
n_hidden_1 = 28 # 1st layer number of neurons.

### Training Data
When using a manual model we may need further tweaking of our data to conform to our needs. Below we're transforming the data into the proper typing, but we're also setting up a `tf.data.Dataset`.

The `train_data.repeat().shuffle(500).batch(batch_size).prefetch(1)` line defines a data generator that is constantly repeating a shuffled representation of the data and batching it for the training/testing cycles.

In [40]:
tf_X_train = np.array(X_train, dtype='float32')
train_data = tf.data.Dataset.from_tensor_slices((tf_X_train, y_train))
train_data = train_data.repeat().shuffle(500).batch(batch_size).prefetch(1)

## Manually Defining Structure
If we aren't leveraging the Keras APIs, we'll be required to more explicitly list out the different weight structures and layers that we intend to use.

In [41]:
# Store layers weight & bias

# A random value generator to initialize weights.
random_normal = tf.initializers.RandomNormal()

weights = {
    'h1': tf.Variable(random_normal([num_features, n_hidden_1])),
    'out': tf.Variable(random_normal([n_hidden_1, num_classes])),
    #'out': tf.Variable(random_normal([n_hidden_2, num_classes]))
}
biases = {
    'b1': tf.Variable(tf.zeros([n_hidden_1])),
    'out': tf.Variable(tf.zeros([num_classes])),
    #'out': tf.Variable(tf.zeros([num_classes]))
}

### Model Flexibility

However, with this comes a lot of flexibility in the layer by layer operations.

Below we can see a more manual definition of the typical structure we've been creating through numpy alone. The benefit to implementing our networks through this approach vs tools like numpy are:
 - It provides easy access to an existing library of helper functions
   - Notice that I can hook into `tf.nn.relu` and `tf.keras.losses.BinaryCrossentropy`
 - Tensorflow is able to hook directly into CPU and GPU-based acceleration libraries like CBLAS and CUDA
 - Automated calculations of gradients
   - Notice the use of `tf.GradientTape()`

In [46]:
def neural_net(x):
    # Hidden fully connected layer with 128 neurons.
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    # Apply sigmoid to layer_1 output for non-linearity.
    layer_1 = tf.nn.relu(layer_1)

#     # Hidden fully connected layer with 256 neurons.
#     layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
#     # Apply sigmoid to layer_2 output for non-linearity.
#     layer_2 = tf.nn.relu(layer_2)

    # Output fully connected layer with a neuron for each class.
    out_layer = tf.matmul(layer_1, weights['out']) + biases['out']
    # Apply softmax to normalize the logits to a probability distribution.
    return tf.nn.sigmoid(out_layer)

In [None]:
# Create model.
def neural_net(x):
    # Hidden fully connected layer with 128 neurons.
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    # Apply sigmoid to layer_1 output for non-linearity.
    layer_1 = tf.nn.relu(layer_1)
    
#     # Hidden fully connected layer with 256 neurons.
#     layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
#     # Apply sigmoid to layer_2 output for non-linearity.
#     layer_2 = tf.nn.relu(layer_2)
    
    # Output fully connected layer with a neuron for each class.
    out_layer = tf.matmul(layer_1, weights['out']) + biases['out']
    # Apply softmax to normalize the logits to a probability distribution.
    return tf.nn.sigmoid(out_layer)

### Custom Loss/Metric Functions

Further extending what was done to create our network, we'll need to do some similar things when defining a purely custom model. Here we create the **cross_entropy** and **accuracy** functions that take our inputs/outputs to define our loss functions and scoring metrics similar to what we've used in the past.

In [42]:
# Cross-Entropy loss function.
def cross_entropy(y_pred, y_true):
    # Clip prediction values to avoid log(0) error.
    
#     # Compute cross-entropy.
#     y_pred = tf.cast(y_pred, tf.float32)
#     y_true = tf.cast(y_true, tf.float32)
#     #return tf.reduce_mean(-tf.math.log(y_pred)*y_true + -tf.math.log(1-y_pred)*(1-y_true))
#     return tf.compat.v1.losses.sigmoid_cross_entropy(y_true, y_pred)
    y_true_tf = tf.cast(tf.reshape(y_true, (-1, 1)), dtype=tf.float32)
    logits_tf = tf.cast(tf.reshape(y_pred, (-1, 1)), dtype=tf.float32)
    bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
    return bce(y_true_tf, logits_tf)

# Accuracy metric.
def accuracy(y_pred, y_true):
    y_pred = tf.math.round(y_pred)
    # Predicted class is the index of highest score in prediction vector (i.e. argmax).
    correct_prediction = tf.equal(tf.cast(y_pred, tf.int64), tf.cast(y_true, tf.int64))
    return tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

# Stochastic gradient descent optimizer.
optimizer = tf.keras.optimizers.Adam(learning_rate)

### Defining our Backward Prop
While many things need to be manually defined, we can use `tf.GradientTape()` to track trainable variables and determine their gradient in regards to certain calculations/outputs. This means we don't need to manually define our backwards propogation and simply rely on the `GradientTape().gradient(dy, dyx)` functionality.
 - Note: This can be changed for higher order gradients

In [43]:
# Optimization process. 
def run_optimization(x, y):
    # Wrap computation inside a GradientTape for automatic differentiation.
    with tf.GradientTape() as g:
        pred = neural_net(x)
        loss = cross_entropy(pred, y)
        
    # Variables to update, i.e. trainable variables.
    trainable_variables = list(weights.values()) + list(biases.values())

    # Compute gradients - d_loss/d_trainable_variables
    gradients = g.gradient(loss, trainable_variables)
    
    # Update W and b following gradients.
    optimizer.apply_gradients(zip(gradients, trainable_variables))

### Training our Model
Just as we need to manually define our network, we also may need to define our training cycle. Here we run through the following:
 1. Getting a batch from our dataset for ***training_steps*** number of times
 2. We update the weights/run the optimziation for these inputs
 3. We make a prediction, calculate loss, and determine the accuracy
 4. repeat

In [47]:
# Run training for the given number of steps.
for step, (batch_x, batch_y) in enumerate(train_data.take(training_steps), 1):
    # Run the optimization to update W and b values.
    run_optimization(batch_x, batch_y)
    
    if step % display_step == 0:
        pred = neural_net(batch_x)
        loss = cross_entropy(pred, batch_y)
        acc = accuracy(pred, batch_y)
        print("step: %i, loss: %f, accuracy: %f" % (step, loss, acc))

step: 100, loss: 0.771598, accuracy: 0.680000
step: 200, loss: 0.689500, accuracy: 0.720000
step: 300, loss: 0.803874, accuracy: 0.800000
step: 400, loss: 0.815437, accuracy: 0.920000
step: 500, loss: 0.809721, accuracy: 0.840000
step: 600, loss: 0.732772, accuracy: 0.640000
step: 700, loss: 0.795862, accuracy: 0.760000
step: 800, loss: 0.756262, accuracy: 0.760000
step: 900, loss: 0.715056, accuracy: 0.680000
step: 1000, loss: 0.723025, accuracy: 0.800000
step: 1100, loss: 0.768764, accuracy: 0.800000
step: 1200, loss: 0.742401, accuracy: 0.800000
step: 1300, loss: 0.765656, accuracy: 0.680000
step: 1400, loss: 0.660559, accuracy: 0.920000
step: 1500, loss: 0.721527, accuracy: 0.880000
step: 1600, loss: 0.690078, accuracy: 0.680000
step: 1700, loss: 0.609088, accuracy: 0.760000
step: 1800, loss: 0.717613, accuracy: 0.720000
step: 1900, loss: 0.666805, accuracy: 0.840000
step: 2000, loss: 0.629846, accuracy: 0.600000
step: 2100, loss: 0.651369, accuracy: 0.840000
step: 2200, loss: 0.63

## In Class
Use the model above to assess the accuracy of the model on the X_test/y_test holdout set.

In [60]:
# Space for work
X_test = np.array(X_test, dtype='float32')
test = tf.data.Dataset.from_tensor_slices((X_test,y_test))
p = neural_net(X_test)
l = cross_entropy(p, y_test)
a = accuracy(p, y_test)
print(l,a)

tf.Tensor(0.6418753, shape=(), dtype=float32) tf.Tensor(0.7881356, shape=(), dtype=float32)
