# Hello again

## Let's do some ML

We're assuming you have the data ready, the data should be in the form of **X**, **y**

**X** are all the feature data , training and test

**y** are all the labels, training and test

**If you've already split your data, make sure you use a version without splitting** We'll be using KFold Cross validation 

### Some imports as usual

In [105]:
import tensorflow as tf
import numpy as np
from tensorflow.python.keras._impl import keras
import pandas as pd
import time
%matplotlib inline
import matplotlib.pyplot as plt
from os.path import join
from sklearn.model_selection import StratifiedKFold


In [106]:
# Add any imports you may need to load your dataset

## Let's load our data  ( You should not need to do the below, and just load your X, y )

In [108]:
data = pd.read_csv('titanic_data.csv')\
    .dropna()\
    .drop(columns=['Ticket', 'PassengerId', 'Name', 'Cabin', 'Embarked'])
data['Sex'] = data['Sex'].apply({'female':0, 'male': 1}.get)
data['Fare'] = (data['Fare'] - data['Fare'].min()) / ( data['Fare'].max() - data['Fare'].min())
data['Age'] = (data['Age'] - data['Age'].min()) / ( data['Age'].max() - data['Age'].min())

data = data.reset_index(drop=True)
X = data.drop(columns="Survived")# Drop 'Survived', which is a column (axis 1) from our original data frame
y = data["Survived"]

In [110]:
# Load your X, y here


## Let's get our hands dirty with Tensorflow

We'll be using the high level API within Tensorflow as they provide the following benefits:

- You can run Estimators-based models on a local host or on a distributed multi-server environment without changing your model. Furthermore, you can run Estimators-based models on CPUs, GPUs, or TPUs without recoding your model.
- Estimators simplify sharing implementations between model developers.
- You can develop a state of the art model with high-level intuitive code, In short, it is generally much easier to create models with Estimators than with the low-level TensorFlow APIs.
- Estimators are themselves built on tf.layers, which simplifies customization.
- Estimators build the graph for you. In other words, you don't have to build the graph.
- Estimators provide a safe distributed training loop that controls how and when to:
	- build the graph
	- initialize variables
	- start queues
	- handle exceptions
	- create checkpoint files and recover from failures
	- save summaries for TensorBoard

## What do we need to do to use an estimator?

- [ ] **One or more dataset import functions:** You can write the functions to return your X, y, 
- [ ] **Define a feature column:** Define the names and types of features
- [ ] **An estimator:** We'll be looking at LinearRegressor, and LinearClassifier [See more on estimators](https://www.tensorflow.org/api_docs/python/tf/estimator)

## Dataset import functions
They are essentially functions that the estimator will call to get its data, this separation allows a very easy swap of data sources.

You can also build a custom pipeline to do all the preprocessing that we've done so far. [more details](https://www.tensorflow.org/programmers_guide/datasets)

In [111]:
## We use the helper function numpy_input_fn it allows a good level of customization
batch_size = 10 # How many batches do we split our data
num_epochs = 10 # How many times do we loop over the data
shuffle = True
## These parameters will create a function that feeds the estimator the following:
## batches of 10 examples every iteration (model weights update)
## 890 ( total examples ) / 10 == 89 iterations
## 10 epochs * 89 iterations = 890 iterations 

## 'inputs' is the name of the input tensor will define in a bit
def get_input_fn(input_x, input_y, num_epochs=1, batch_size=1, shuffle=False):
    x_dict = input_x.to_dict(orient='list')
    for input_x_item in x_dict:
        x_dict[input_x_item] = np.array(x_dict[input_x_item])
    return tf.estimator.inputs.numpy_input_fn(
            x=x_dict,
            y=input_y,
            batch_size=batch_size,
            num_epochs=num_epochs,
            shuffle=shuffle)

In [112]:
def my_input_fn():
    # Define your function here, the function should return 
    pass

## That's data, now what

- [x] **One or more dataset import functions:**
- [ ] **Define a feature column:**
- [ ] **An estimator:**

## Defining feature columns

Each tf.feature_column identifies a feature name, its type, and any input pre-processing 

For example:
- **numeric_column:** Represents real valued or numerical features.
- **categorical_column_with_hash_bucket:** Represents sparse feature where ids are set by hashing.

We've already transformed all our data to numbers, so, it will all be numerical

[more on feature_columns](https://www.tensorflow.org/api_docs/python/tf/feature_column)

In [113]:
feature_columns = []
for col in X.columns:
    feature_columns.append(tf.feature_column.numeric_column(col))
                                              

In [114]:
# Define your feature_columns here 

## 2 down, 1 to go

- [x] **One or more dataset import functions:**
- [x] **Define a feature column:**
- [ ] **An estimator:**



## On to the estimator,

Let's start with a simple pre-built estimators, since I'm predicting a class for my data, I'll use a [LinearClassifier](https://www.tensorflow.org/api_docs/python/tf/estimator/LinearClassifier),
However, depending on your problem type, you may want to use a [LinearRegressor](https://www.tensorflow.org/api_docs/python/tf/estimator/LinearRegressor)



In [115]:
## Let's define it as a function, so we can reuse it cleanly when needed 
## Modify this to return the appropriate estimator for your problem type
def get_estimator():
    return tf.estimator.LinearClassifier(
    feature_columns=feature_columns,
    )

## Great! now we have all the things we need to start training

- [x] **One or more dataset import functions:**
- [x] **Define a feature column:**
- [x] **An estimator:**



In [116]:
# Let's define our experiment function to tie things together

def run_experiment(X_train, y_train, X_test, y_test):
    print("X_train: {}, y_train:{}, X_test: {}, y_test: {}".format(len(X_train), len(y_train), len(X_test), len(y_test)))
    estimator = get_estimator()
    train_input_fn = get_input_fn(X_train, y_train, batch_size=10, num_epochs=10, shuffle=True)
    estimator.train(input_fn=train_input_fn)
    test_input_fn = get_input_fn(X_test, y_test, batch_size=len(X_test))
    estimator.test(input_fn=test_input_fn)


In [117]:
# Now we just need to run our experiment one or more times ( If we're doing KFold for instance )
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=10)
with tf.Session() as sess:
    for train_index, test_index in kfold.split(np.array(X), np.array(y)):
        run_experiment(X.iloc[train_index], y.iloc[train_index], X.iloc[test_index], y.iloc[test_index])
        
        

X_train: 164, y_train:164, X_test: 19, y_test: 19
INFO:tensorflow:Using default config.




INFO:tensorflow:Using config: {'_model_dir': 'C:\\Users\\alija\\AppData\\Local\\Temp\\tmpir9ahl6l', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x0000024C32640588>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}


INFO:tensorflow:Calling model_fn.


INFO:tensorflow:Done calling model_fn.


INFO:tensorflow:Create CheckpointSaverHook.


INFO:tensorflow:Graph was finalized.




INFO:tensorflow:Running local_init_op.


INFO:tensorflow:Done running local_init_op.


Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.

See the documentation here:
https://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike
  return self.loc[key]


InvalidArgumentError: assertion failed: [Label IDs must >= 0] [Condition x >= 0 did not hold element-wise:] [x (linear/head/ToFloat:0) = ] [[1][0][1]...]
	 [[Node: linear/head/assert_range/assert_non_negative/assert_less_equal/Assert/AssertGuard/Assert = Assert[T=[DT_STRING, DT_STRING, DT_STRING, DT_FLOAT], summarize=3, _device="/job:localhost/replica:0/task:0/device:CPU:0"](linear/head/assert_range/assert_non_negative/assert_less_equal/Assert/AssertGuard/Assert/Switch, linear/head/assert_range/assert_non_negative/assert_less_equal/Assert/AssertGuard/Assert/data_0, linear/head/assert_range/assert_non_negative/assert_less_equal/Assert/AssertGuard/Assert/data_1, linear/head/assert_range/assert_non_negative/assert_less_equal/Assert/AssertGuard/Assert/data_2, linear/head/assert_range/assert_non_negative/assert_less_equal/Assert/AssertGuard/Assert/Switch_1)]]

Caused by op 'linear/head/assert_range/assert_non_negative/assert_less_equal/Assert/AssertGuard/Assert', defined at:
  File "D:\Python\Python36\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "D:\Python\Python36\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "D:\Work\tuts\ml_crashcourse\venv\lib\site-packages\ipykernel_launcher.py", line 16, in <module>
    app.launch_new_instance()
  File "D:\Work\tuts\ml_crashcourse\venv\lib\site-packages\traitlets\config\application.py", line 658, in launch_instance
    app.start()
  File "D:\Work\tuts\ml_crashcourse\venv\lib\site-packages\ipykernel\kernelapp.py", line 486, in start
    self.io_loop.start()
  File "D:\Work\tuts\ml_crashcourse\venv\lib\site-packages\tornado\platform\asyncio.py", line 127, in start
    self.asyncio_loop.run_forever()
  File "D:\Python\Python36\lib\asyncio\base_events.py", line 422, in run_forever
    self._run_once()
  File "D:\Python\Python36\lib\asyncio\base_events.py", line 1432, in _run_once
    handle._run()
  File "D:\Python\Python36\lib\asyncio\events.py", line 145, in _run
    self._callback(*self._args)
  File "D:\Work\tuts\ml_crashcourse\venv\lib\site-packages\tornado\platform\asyncio.py", line 117, in _handle_events
    handler_func(fileobj, events)
  File "D:\Work\tuts\ml_crashcourse\venv\lib\site-packages\tornado\stack_context.py", line 276, in null_wrapper
    return fn(*args, **kwargs)
  File "D:\Work\tuts\ml_crashcourse\venv\lib\site-packages\zmq\eventloop\zmqstream.py", line 450, in _handle_events
    self._handle_recv()
  File "D:\Work\tuts\ml_crashcourse\venv\lib\site-packages\zmq\eventloop\zmqstream.py", line 480, in _handle_recv
    self._run_callback(callback, msg)
  File "D:\Work\tuts\ml_crashcourse\venv\lib\site-packages\zmq\eventloop\zmqstream.py", line 432, in _run_callback
    callback(*args, **kwargs)
  File "D:\Work\tuts\ml_crashcourse\venv\lib\site-packages\tornado\stack_context.py", line 276, in null_wrapper
    return fn(*args, **kwargs)
  File "D:\Work\tuts\ml_crashcourse\venv\lib\site-packages\ipykernel\kernelbase.py", line 283, in dispatcher
    return self.dispatch_shell(stream, msg)
  File "D:\Work\tuts\ml_crashcourse\venv\lib\site-packages\ipykernel\kernelbase.py", line 233, in dispatch_shell
    handler(stream, idents, msg)
  File "D:\Work\tuts\ml_crashcourse\venv\lib\site-packages\ipykernel\kernelbase.py", line 399, in execute_request
    user_expressions, allow_stdin)
  File "D:\Work\tuts\ml_crashcourse\venv\lib\site-packages\ipykernel\ipkernel.py", line 208, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "D:\Work\tuts\ml_crashcourse\venv\lib\site-packages\ipykernel\zmqshell.py", line 537, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "D:\Work\tuts\ml_crashcourse\venv\lib\site-packages\IPython\core\interactiveshell.py", line 2662, in run_cell
    raw_cell, store_history, silent, shell_futures)
  File "D:\Work\tuts\ml_crashcourse\venv\lib\site-packages\IPython\core\interactiveshell.py", line 2785, in _run_cell
    interactivity=interactivity, compiler=compiler, result=result)
  File "D:\Work\tuts\ml_crashcourse\venv\lib\site-packages\IPython\core\interactiveshell.py", line 2903, in run_ast_nodes
    if self.run_code(code, result):
  File "D:\Work\tuts\ml_crashcourse\venv\lib\site-packages\IPython\core\interactiveshell.py", line 2963, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-117-b870c1e3bdee>", line 5, in <module>
    run_experiment(X.iloc[train_index], y.iloc[train_index], X.iloc[test_index], y.iloc[test_index])
  File "<ipython-input-116-3dcdc76465ae>", line 7, in run_experiment
    estimator.train(input_fn=train_input_fn)
  File "D:\Work\tuts\ml_crashcourse\venv\lib\site-packages\tensorflow\python\estimator\estimator.py", line 363, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "D:\Work\tuts\ml_crashcourse\venv\lib\site-packages\tensorflow\python\estimator\estimator.py", line 843, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "D:\Work\tuts\ml_crashcourse\venv\lib\site-packages\tensorflow\python\estimator\estimator.py", line 856, in _train_model_default
    features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
  File "D:\Work\tuts\ml_crashcourse\venv\lib\site-packages\tensorflow\python\estimator\estimator.py", line 831, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "D:\Work\tuts\ml_crashcourse\venv\lib\site-packages\tensorflow\python\estimator\canned\linear.py", line 311, in _model_fn
    config=config)
  File "D:\Work\tuts\ml_crashcourse\venv\lib\site-packages\tensorflow\python\estimator\canned\linear.py", line 164, in _linear_model_fn
    logits=logits)
  File "D:\Work\tuts\ml_crashcourse\venv\lib\site-packages\tensorflow\python\estimator\canned\head.py", line 1135, in create_estimator_spec
    features=features, mode=mode, logits=logits, labels=labels))
  File "D:\Work\tuts\ml_crashcourse\venv\lib\site-packages\tensorflow\python\estimator\canned\head.py", line 1042, in create_loss
    labels = _assert_range(labels, 2)
  File "D:\Work\tuts\ml_crashcourse\venv\lib\site-packages\tensorflow\python\estimator\canned\head.py", line 1455, in _assert_range
    labels, message=message or 'Label IDs must >= 0')
  File "D:\Work\tuts\ml_crashcourse\venv\lib\site-packages\tensorflow\python\ops\check_ops.py", line 263, in assert_non_negative
    return assert_less_equal(zero, x, data=data, summarize=summarize)
  File "D:\Work\tuts\ml_crashcourse\venv\lib\site-packages\tensorflow\python\ops\check_ops.py", line 630, in assert_less_equal
    return control_flow_ops.Assert(condition, data, summarize=summarize)
  File "D:\Work\tuts\ml_crashcourse\venv\lib\site-packages\tensorflow\python\util\tf_should_use.py", line 118, in wrapped
    return _add_should_use_warning(fn(*args, **kwargs))
  File "D:\Work\tuts\ml_crashcourse\venv\lib\site-packages\tensorflow\python\ops\control_flow_ops.py", line 180, in Assert
    guarded_assert = cond(condition, no_op, true_assert, name="AssertGuard")
  File "D:\Work\tuts\ml_crashcourse\venv\lib\site-packages\tensorflow\python\util\deprecation.py", line 432, in new_func
    return func(*args, **kwargs)
  File "D:\Work\tuts\ml_crashcourse\venv\lib\site-packages\tensorflow\python\ops\control_flow_ops.py", line 2072, in cond
    orig_res_f, res_f = context_f.BuildCondBranch(false_fn)
  File "D:\Work\tuts\ml_crashcourse\venv\lib\site-packages\tensorflow\python\ops\control_flow_ops.py", line 1913, in BuildCondBranch
    original_result = fn()
  File "D:\Work\tuts\ml_crashcourse\venv\lib\site-packages\tensorflow\python\ops\control_flow_ops.py", line 178, in true_assert
    condition, data, summarize, name="Assert")
  File "D:\Work\tuts\ml_crashcourse\venv\lib\site-packages\tensorflow\python\ops\gen_logging_ops.py", line 51, in _assert
    name=name)
  File "D:\Work\tuts\ml_crashcourse\venv\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "D:\Work\tuts\ml_crashcourse\venv\lib\site-packages\tensorflow\python\framework\ops.py", line 3392, in create_op
    op_def=op_def)
  File "D:\Work\tuts\ml_crashcourse\venv\lib\site-packages\tensorflow\python\framework\ops.py", line 1718, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): assertion failed: [Label IDs must >= 0] [Condition x >= 0 did not hold element-wise:] [x (linear/head/ToFloat:0) = ] [[1][0][1]...]
	 [[Node: linear/head/assert_range/assert_non_negative/assert_less_equal/Assert/AssertGuard/Assert = Assert[T=[DT_STRING, DT_STRING, DT_STRING, DT_FLOAT], summarize=3, _device="/job:localhost/replica:0/task:0/device:CPU:0"](linear/head/assert_range/assert_non_negative/assert_less_equal/Assert/AssertGuard/Assert/Switch, linear/head/assert_range/assert_non_negative/assert_less_equal/Assert/AssertGuard/Assert/data_0, linear/head/assert_range/assert_non_negative/assert_less_equal/Assert/AssertGuard/Assert/data_1, linear/head/assert_range/assert_non_negative/assert_less_equal/Assert/AssertGuard/Assert/data_2, linear/head/assert_range/assert_non_negative/assert_less_equal/Assert/AssertGuard/Assert/Switch_1)]]
