# Hello again

## Let's do some ML

We're assuming you have your data ready, in the form of **X** and **y**.

**X** is all the feature data, training and test combined.

**y** is all the labels, training and test combined.

**If you've already split your data, make sure you use an unsplit version.** We'll be using stratified K-fold cross-validation, which does the splitting for us.
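
To see why the unsplit data is what you want here, the sketch below runs scikit-learn's `StratifiedKFold` on a small made-up dataset. The `toy_X` and `toy_y` names and values are illustrative assumptions, not the Titanic data. Each fold produces its own train/test indices, so every example ends up in a test set exactly once.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical toy data: 8 examples, 2 features, balanced binary labels.
toy_X = np.arange(16).reshape(8, 2)
toy_y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

kfold = StratifiedKFold(n_splits=2, shuffle=True, random_state=10)

folds = []
for train_index, test_index in kfold.split(toy_X, toy_y):
    # Stratification keeps the class ratio the same in every fold.
    folds.append((train_index, test_index))
    print("train:", train_index, "test:", test_index,
          "test labels:", toy_y[test_index])
```

If you passed in data that was already split, the folds would only ever see part of your examples.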

### Some imports as usual

In [6]:
import tensorflow as tf
import numpy as np
from tensorflow import keras  # Public alias; the private _impl path is not stable across versions
import pandas as pd
import time
%matplotlib inline
import matplotlib.pyplot as plt
from os.path import join
from sklearn.model_selection import StratifiedKFold


In [7]:
# Add any imports you may need to load your dataset

## Let's load our data (you shouldn't need the cell below; just load your own X, y)

In [8]:
data = pd.read_csv('titanic_data.csv')\
    .dropna()\
    .drop(columns=['Ticket', 'PassengerId', 'Name', 'Cabin', 'Embarked'])
data['Sex'] = data['Sex'].map({'female': 0, 'male': 1})  # Encode sex as 0/1
# Min-max scale Fare and Age into the range [0, 1]
data['Fare'] = (data['Fare'] - data['Fare'].min()) / (data['Fare'].max() - data['Fare'].min())
data['Age'] = (data['Age'] - data['Age'].min()) / (data['Age'].max() - data['Age'].min())

data = data.reset_index(drop=True)
X = data.drop(columns="Survived")  # Features: everything except the 'Survived' label column
y = data["Survived"]
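
The `Fare` and `Age` lines above apply min-max scaling, x' = (x - min) / (max - min), which maps each column onto the range [0, 1]. Here is a minimal sketch on made-up values; the `min_max_scale` helper is hypothetical and introduced only for illustration.

```python
import pandas as pd

def min_max_scale(series):
    """Rescale a numeric Series linearly so its min maps to 0.0 and its max to 1.0."""
    lo, hi = series.min(), series.max()
    return (series - lo) / (hi - lo)

# Made-up fares, not taken from the Titanic file.
fares = pd.Series([10.0, 20.0, 30.0, 50.0])
scaled = min_max_scale(fares)
print(scaled.tolist())  # [0.0, 0.25, 0.5, 1.0]
```

If a column is constant, max - min is zero and the result is NaN, so real data may need a guard for that case.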

In [9]:
# Load your X, y here


## Let's get our hands dirty with TensorFlow

We'll be using TensorFlow's high-level Estimator API, which provides the following benefits:

- You can run Estimators-based models on a local host or on a distributed multi-server environment without changing your model. Furthermore, you can run Estimators-based models on CPUs, GPUs, or TPUs without recoding your model.
- Estimators simplify sharing implementations between model developers.
- You can develop a state-of-the-art model with high-level, intuitive code. In short, it is generally much easier to create models with Estimators than with the low-level TensorFlow APIs.
- Estimators are themselves built on tf.layers, which simplifies customization.
- Estimators build the graph for you. In other words, you don't have to build the graph.
- Estimators provide a safe distributed training loop that controls how and when to:
	- build the graph
	- initialize variables
	- start queues
	- handle exceptions
	- create checkpoint files and recover from failures
	- save summaries for TensorBoard

## What do we need to do to use an estimator?

- [ ] **One or more dataset import functions:** You write functions that return your X and y
- [ ] **Define a feature column:** Define the names and types of features
- [ ] **An estimator:** We'll be looking at LinearRegressor and LinearClassifier

## Dataset import functions
These are functions that the estimator calls to get its data. This separation makes it easy to swap data sources.

You can also build a custom pipeline to do all the preprocessing that we've done so far. [More details](https://www.tensorflow.org/programmers_guide/datasets)

In [10]:
## We use the helper function numpy_input_fn, which allows a good level of customization
batch_size = 10 # How many examples go into each batch
num_epochs = 10 # How many times we loop over the data
shuffle = True
## These parameters create a function that feeds the estimator the following:
## batches of 10 examples per iteration (one model weight update each)
## For example, with 890 total examples: 890 / 10 == 89 iterations per epoch
## 10 epochs * 89 iterations per epoch == 890 iterations in total

## The function should return a features dictionary and labels
def get_input_fn(input_x, input_y, num_epochs=1, batch_size=1, shuffle=False):
    # Transform our dataframe into a features dictionary
    x_dict = input_x.to_dict(orient='list')
    for input_x_item in x_dict:
        x_dict[input_x_item] = np.array(x_dict[input_x_item])
        
    # Call the numpy_input_fn helper to create a function that returns dataset in batches    
    return tf.estimator.inputs.numpy_input_fn(
            x=x_dict,
            y=input_y,
            batch_size=batch_size,
            num_epochs=num_epochs,
            shuffle=shuffle)
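
To make the batch and epoch arithmetic concrete, here is a rough NumPy-only sketch of what an input function conceptually does with the features dictionary. It is not how `numpy_input_fn` is implemented; `iterate_batches` and the toy frame are hypothetical names used only for illustration.

```python
import numpy as np
import pandas as pd

def iterate_batches(features, labels, batch_size=1, num_epochs=1, shuffle=False, seed=0):
    """Yield (features_dict, labels) batches, looping over the data num_epochs times."""
    n = len(labels)
    rng = np.random.RandomState(seed)
    for _ in range(num_epochs):
        order = rng.permutation(n) if shuffle else np.arange(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            yield {name: values[idx] for name, values in features.items()}, labels[idx]

# Made-up data: 20 examples with two numeric features.
toy = pd.DataFrame({"Age": np.linspace(0.0, 1.0, 20), "Fare": np.linspace(1.0, 0.0, 20)})
toy_labels = np.arange(20) % 2

# Same conversion as get_input_fn: DataFrame -> dict of NumPy arrays keyed by column name.
toy_features = {name: np.array(col) for name, col in toy.to_dict(orient="list").items()}

batches = list(iterate_batches(toy_features, toy_labels, batch_size=5, num_epochs=3, shuffle=True))
print(len(batches))  # 20 / 5 = 4 batches per epoch, times 3 epochs = 12
```

Each yielded batch has the same shape as what the estimator expects from its input function: a dictionary of feature arrays plus the matching labels.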

In [11]:
def my_input_fn():
    # Define your function here; it should return a features dictionary and labels
    pass

## That's data, now what

- [x] **One or more dataset import functions:**
- [ ] **Define a feature column:**
- [ ] **An estimator:**

## Defining feature columns

Each tf.feature_column identifies a feature name, its type, and any input pre-processing.

For example:
- **numeric_column:** Represents real valued or numerical features.
- **categorical_column_with_hash_bucket:** Represents sparse features where ids are set by hashing.
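
As a rough illustration of the hashing idea behind `categorical_column_with_hash_bucket`, the sketch below maps strings to a fixed number of bucket ids. It uses `zlib.crc32` purely for demonstration, which is an assumption; TensorFlow uses its own hash function, so the actual ids will differ. `hash_bucket` is a hypothetical helper.

```python
import zlib

def hash_bucket(value, num_buckets):
    """Map a string to an integer id in [0, num_buckets) via a stable hash."""
    return zlib.crc32(value.encode("utf-8")) % num_buckets

# Made-up category values.
ports = ["Southampton", "Cherbourg", "Queenstown", "Southampton"]
ids = [hash_bucket(p, num_buckets=5) for p in ports]
print(ids)
```

Different strings can collide in the same bucket, which is the trade-off for not needing a vocabulary up front.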

We've already transformed all our data to numbers, so every column will be numeric.

[more on feature_columns](https://www.tensorflow.org/api_docs/python/tf/feature_column)

In [12]:
feature_columns = []
for col in X.columns:
    feature_columns.append(tf.feature_column.numeric_column(col))
                                              

In [13]:
# Define your feature_columns here 

## 2 down, 1 to go

- [x] **One or more dataset import functions:**
- [x] **Define a feature column:**
- [ ] **An estimator:**



## On to the estimator

Estimators wrap the logic needed to run the experiment: the dataset, the features, and the model.

[See more on estimators](https://www.tensorflow.org/api_docs/python/tf/estimator)


Let's start with a simple pre-built estimator. Since I'm predicting a class for my data, I'll use a [LinearClassifier](https://www.tensorflow.org/api_docs/python/tf/estimator/LinearClassifier).
Depending on your problem type, you may want to use a [LinearRegressor](https://www.tensorflow.org/api_docs/python/tf/estimator/LinearRegressor) instead.
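
Conceptually, a linear classifier for a binary label computes a weighted sum of the features and squashes it through a sigmoid to get a probability. The NumPy sketch below shows only that forward computation, with made-up weights; it is not TensorFlow's implementation, and `predict_proba` is a hypothetical helper.

```python
import numpy as np

def predict_proba(features, weights, bias):
    """Return P(label == 1) for each row: sigmoid(features @ weights + bias)."""
    logits = features @ weights + bias
    return 1.0 / (1.0 + np.exp(-logits))

# Made-up inputs: 3 examples, 2 features each.
features = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
weights = np.array([2.0, -2.0])  # Illustrative values, not learned
bias = 0.0

probs = predict_proba(features, weights, bias)
classes = (probs >= 0.5).astype(int)  # Threshold at 0.5 to get a class
print(probs, classes)
```

A linear regressor, by contrast, skips the sigmoid and returns the weighted sum directly.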



In [14]:
## Let's define it as a function, so we can reuse it cleanly when needed 
## Modify this to return the appropriate estimator for your problem type

## You can also pass model_dir to choose where checkpoints and TensorBoard summaries are written;
## by default a temporary directory is used
def get_estimator():
    return tf.estimator.LinearClassifier(
        feature_columns=feature_columns,
    )

## Great! Now we have everything we need to start training

- [x] **One or more dataset import functions:**
- [x] **Define a feature column:**
- [x] **An estimator:**



In [24]:
# Let's define our experiment function to tie things together

def run_experiment(X_train, y_train, X_test, y_test):
    print("X_train: {}, y_train:{}, X_test: {}, y_test: {}".format(len(X_train), len(y_train), len(X_test), len(y_test)))
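    estimator = get_estimator()  # Fresh estimator for every call
    # Train on shuffled batches of 10 examples for 10 epochs
    train_input_fn = get_input_fn(X_train, y_train, batch_size=10, num_epochs=10, shuffle=True)
    estimator.train(input_fn=train_input_fn)
    # Evaluate on the whole test set in a single batch
    test_input_fn = get_input_fn(X_test, y_test, batch_size=len(X_test))
    return estimator.evaluate(input_fn=test_input_fn)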
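
The dictionary returned by `evaluate` is where the next cell reads its `precision` value from. As a reminder of what that number means, precision is TP / (TP + FP): of everything predicted positive, the fraction that really was positive. Here is a small NumPy sketch with made-up predictions; `precision_score_np` is a hypothetical helper, not the Estimator's metric code.

```python
import numpy as np

def precision_score_np(y_true, y_pred):
    """Precision = true positives / all predicted positives."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    true_pos = np.sum((y_pred == 1) & (y_true == 1))
    pred_pos = np.sum(y_pred == 1)
    return true_pos / pred_pos if pred_pos else 0.0

# Made-up labels and predictions for two folds.
fold_scores = [
    precision_score_np([1, 0, 1, 1], [1, 1, 1, 0]),  # 2 TP out of 3 predicted positives
    precision_score_np([0, 1, 0, 1], [0, 1, 0, 1]),  # 2 TP out of 2 predicted positives
]
print(np.mean(fold_scores), np.std(fold_scores))
```

Averaging the per-fold scores and reporting their standard deviation gives a sense of how stable the model is across folds.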


In [27]:
# Now we just need to run our experiment one or more times ( If we're doing KFold for instance )
kfold = StratifiedKFold(n_splits=2, shuffle=True, random_state=10)
cv_precisions = []
with tf.Session() as sess:
    for train_index, test_index in kfold.split(np.array(X), np.array(y)):
        score = run_experiment(X.iloc[train_index], y.iloc[train_index].values, X.iloc[test_index], y.iloc[test_index].values)
        cv_precisions.append(score['precision'])
        
print("------------------------------------")
print("Avg precision: {} (+/- {:.2f})".format(np.mean(cv_precisions), np.std(cv_precisions)))
        

X_train: 91, y_train:91, X_test: 92, y_test: 92
INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_task_type': 'worker', '_train_distribute': None, '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x11879c490>, '_evaluation_master': '', '_save_checkpoints_steps': None, '_keep_checkpoint_every_n_hours': 10000, '_service': None, '_num_ps_replicas': 0, '_tf_random_seed': None, '_master': '', '_num_worker_replicas': 1, '_task_id': 0, '_log_step_count_steps': 100, '_model_dir': '/var/folders/2c/_zmrrhgj4k5dn9j66nmjcs1m0000zz/T/tmpqrFXpg', '_global_id_in_cluster': 0, '_save_summary_steps': 100}
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow: