# Hello again

## Let's do some ML

This notebook assumes your data is already loaded in the form of **X**, **y**.

**X** holds the feature data for every example, training and test combined.

**y** holds the labels for every example, training and test combined.

**If you've already split your data, make sure you use the unsplit version.** We'll be using stratified k-fold cross-validation, which creates the train/test splits for us.
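
If k-fold cross-validation is new to you, here is a minimal sketch of the idea using only NumPy. The helper `kfold_indices` is a hypothetical name made up for this illustration, and it does plain (unstratified) k-fold; the `StratifiedKFold` class imported in this notebook additionally keeps the class proportions similar in every fold.

```python
import numpy as np

def kfold_indices(n_samples, n_splits, seed=0):
    """Yield (train_idx, test_idx) pairs for plain, unstratified k-fold."""
    rng = np.random.RandomState(seed)
    shuffled = rng.permutation(n_samples)        # shuffle the example indices once
    folds = np.array_split(shuffled, n_splits)   # cut them into n_splits chunks
    for i in range(n_splits):
        test_idx = folds[i]                      # one chunk is held out for testing
        train_idx = np.concatenate([folds[j] for j in range(n_splits) if j != i])
        yield train_idx, test_idx

# Toy run: 10 examples, 5 folds (sizes made up for illustration)
splits = list(kfold_indices(n_samples=10, n_splits=5))
for train_idx, test_idx in splits:
    print("train:", np.sort(train_idx), "test:", np.sort(test_idx))
```

Every example lands in a test fold exactly once, which is why the data should not be split beforehand.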

### Some imports as usual

In [None]:
import tensorflow as tf
import numpy as np

from tensorflow import keras  # Public Keras API; the private `_impl` path is not stable across TensorFlow versions
import pandas as pd
import time
%matplotlib inline
import matplotlib.pyplot as plt
from os.path import join
from sklearn.model_selection import StratifiedKFold
# Add any imports you may need to load your dataset

## Let's load our data (you should not need the code below; just load your own X, y)

In [None]:
data = pd.read_csv('titanic_data.csv')\
    .dropna()\
    .drop(columns=['Ticket', 'PassengerId', 'Name', 'Cabin', 'Embarked'])
data['Sex'] = data['Sex'].apply({'female':0, 'male': 1}.get)
data['Fare'] = (data['Fare'] - data['Fare'].min()) / ( data['Fare'].max() - data['Fare'].min())
data['Age'] = (data['Age'] - data['Age'].min()) / ( data['Age'].max() - data['Age'].min())

X = np.array(data.drop("Survived", axis=1)) # Drop 'Survived', which is a column (axis 1) from our original data frame
y = np.array(data["Survived"]) 
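
The `Fare` and `Age` columns above are rescaled with min-max normalization, (value - min) / (max - min), which maps each column to the range [0, 1]. Here is a small sketch of the same formula on made-up numbers. `min_max_scale` is a hypothetical helper, and the guard against a constant column is an addition that the cell above does not have.

```python
import numpy as np

def min_max_scale(values):
    """Rescale a 1-D array to the [0, 1] range."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        # A constant column would divide by zero; map it to all zeros instead.
        return np.zeros_like(values)
    return (values - values.min()) / span

ages = np.array([2.0, 20.0, 38.0, 74.0])  # made-up ages, not taken from the dataset
scaled_ages = min_max_scale(ages)
print(scaled_ages)  # the values 0.0, 0.25, 0.5 and 1.0
```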

## Let's set up some hyperparameters

Hyperparameters are values that you may tweak without changing the structure of your model.


In [None]:
test_size = 0.2 # Fraction of the data to hold out for testing (not used below: each of the 10 folds holds out 1/10)
num_epochs = 50 # Number of epochs (Times to go through the whole dataset)
batch_size = 10 # The batch size defines how many examples to process before updating the weights
learning_rate = 0.01 # The learning rate of the optimization function
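
To see how `batch_size` and `num_epochs` interact, here is a rough count of the weight updates a training run performs. It assumes each epoch is batched on its own, with a smaller final batch; an input pipeline that lets batches cross epoch boundaries would give a slightly different number. `count_weight_updates` is a hypothetical helper and the example count of 165 is made up.

```python
import math

def count_weight_updates(n_examples, batch_size, num_epochs):
    """Number of optimizer steps if every batch triggers one weight update."""
    steps_per_epoch = math.ceil(n_examples / batch_size)  # last batch may be partial
    return steps_per_epoch * num_epochs

updates = count_weight_updates(n_examples=165, batch_size=10, num_epochs=50)
print(updates)  # 17 steps per epoch * 50 epochs = 850
```

A smaller batch size means more, noisier updates per epoch; a larger one means fewer, smoother updates.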

## Now let's define the functions needed by the TensorFlow Estimator API

### Starting with the input function

In [None]:
# Returns an input function for the model; the estimator calls it to get batches of features and labels
def get_input_fn(x, y, model, batch_size=1, num_epochs=1, shuffle=False):
    return tf.estimator.inputs.numpy_input_fn(
        x={ model.input_names[0] : x},
        y=y,
        batch_size=batch_size,
        num_epochs=num_epochs,
        shuffle=shuffle)
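
To get a feel for what an input function feeds the model, here is a rough NumPy sketch of batching over several epochs. It is only an illustration under simplifying assumptions, not how `numpy_input_fn` works internally, and `iterate_batches` is a hypothetical name.

```python
import numpy as np

def iterate_batches(x, y, batch_size=1, num_epochs=1, shuffle=False, seed=0):
    """Yield (features, labels) batches, going through the data num_epochs times."""
    rng = np.random.RandomState(seed)
    n = len(x)
    for _ in range(num_epochs):
        # Optionally visit the examples in a new random order every epoch
        order = rng.permutation(n) if shuffle else np.arange(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            yield x[idx], y[idx]

x_demo = np.arange(10).reshape(5, 2)  # 5 made-up examples with 2 features each
y_demo = np.arange(5)
batches = list(iterate_batches(x_demo, y_demo, batch_size=2, num_epochs=2))
print(len(batches))  # 3 batches per epoch * 2 epochs = 6
```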

### We convert the Keras model to a TensorFlow Estimator

#### model + loss + optimizer = model ready to be trained

In [None]:
def get_estimator(options):
    model = options.get('model')
    loss = options.get('loss', 'mse')
    optimizer = options.get('optimizer', 'rmsprop')
    model.compile(loss=loss, optimizer=optimizer)
    return tf.keras.estimator.model_to_estimator(keras_model=model, model_dir=join('models', 'model-{}'.format(time.time())))


## Let's create our model

### What is Keras

Keras is a higher-level API for building machine learning models. It has a TensorFlow implementation, which is what we'll be using.

Keras model definitions are often more concise than the equivalent code written against TensorFlow's own high-level API.

See [Keras Layers](https://keras.io/layers/core/) for more layers

In this function, you can build your graph, whether it's a deep or shallow neural net, or a simple linear model as shown here.

In [None]:
def CreateModel(input_shape):
    inputs = keras.layers.Input(shape=input_shape)
    output = keras.layers.Dense(units=1, activation='linear')(inputs)
    return keras.models.Model(inputs=inputs, outputs=output)
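
With `units=1` and a linear activation, the `Dense` layer above computes a weighted sum of the features plus a bias. A minimal NumPy sketch of that forward pass, with made-up weights (`dense_linear_forward` is a hypothetical helper; Keras initializes and learns the real weights itself):

```python
import numpy as np

def dense_linear_forward(x, weights, bias):
    """Forward pass of a Dense layer with a linear activation: x @ W + b."""
    return x @ weights + bias

x_demo = np.array([[1.0, 2.0, 3.0],
                   [0.0, 1.0, 0.0]])        # 2 examples, 3 features
weights = np.array([[0.5], [-1.0], [2.0]])  # shape (n_features, units=1)
bias = np.array([0.1])

predictions = dense_linear_forward(x_demo, weights, bias)
print(predictions.shape)  # (2, 1): one output per example
```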


## And let's define the options to pass to our training function

In [None]:
options = {
    'model': CreateModel(X.shape[1:]), # Pass only the shape of the features ( pass timesteps if applicable too )
    'batch_size': batch_size,
    'num_epochs': num_epochs,
    'loss': 'mse', # See https://keras.io/losses/ for a list of loss functions
    'optimizer': keras.optimizers.SGD(lr=learning_rate) # See https://keras.io/optimizers/ for a list of optimizers
}
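
The `'mse'` loss and `SGD` optimizer chosen above can be pictured with a small NumPy sketch: compute the mean squared error of a linear model, then move the weights a small step against the gradient. `mse` and `sgd_step` are hypothetical helpers written for this illustration; Keras derives the gradients automatically and applies them batch by batch.

```python
import numpy as np

def mse(predictions, targets):
    """Mean squared error between predictions and targets."""
    return np.mean((predictions - targets) ** 2)

def sgd_step(x, y, weights, bias, learning_rate):
    """One gradient-descent update of a linear model under the MSE loss."""
    error = x @ weights + bias - y               # shape: (n_examples,)
    grad_weights = 2.0 * (x.T @ error) / len(y)  # d(MSE)/d(weights)
    grad_bias = 2.0 * error.mean()               # d(MSE)/d(bias)
    return weights - learning_rate * grad_weights, bias - learning_rate * grad_bias

x_demo = np.array([[1.0], [2.0], [3.0]])  # made-up data following y = 2x
y_demo = np.array([2.0, 4.0, 6.0])
weights, bias = np.zeros(1), 0.0

loss_before = mse(x_demo @ weights + bias, y_demo)
weights, bias = sgd_step(x_demo, y_demo, weights, bias, learning_rate=0.01)
loss_after = mse(x_demo @ weights + bias, y_demo)
print(loss_before, loss_after)  # the loss should go down after the step
```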

## Setting up and training/testing with kfold

In [None]:
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=10)
# Collect the evaluation loss from each fold
cv_scores = []
# Open a TensorFlow session for the duration of the cross-validation loop
with tf.Session() as sess:
    for train_index, test_index in kfold.split(X, y):
        estimator = get_estimator(options)
        train_input_fn = get_input_fn(X[train_index], 
                                      y[train_index], 
                                      model=options['model'], 
                                      batch_size=options['batch_size'],
                                      num_epochs=options['num_epochs'],  # Without this, training runs for the default of 1 epoch
                                      shuffle=True)
        print("Training....")
        estimator.train(input_fn=train_input_fn)
        print("Evaluating...")
        test_input_fn = get_input_fn(X[test_index], y[test_index], model=options['model'])
        score = estimator.evaluate(input_fn=test_input_fn)
        print(score)
        cv_scores.append(score['loss'])
    
    # Report the mean loss across folds and its standard deviation (an absolute value, not a percentage)
    print("Average loss: {:.4f} (+/- {:.4f})".format(np.mean(cv_scores), np.std(cv_scores)))
