Table of Contents
===

<a href="#about">About this notebook</a> <br />
<a href="#setup">Setting things up</a>

Local Experience
1. <a href="#local_preprocessing">Local preprocessing starting from csv files</a>
1. <a href="#local_training">Local training</a>
1. <a href="#local_prediction">Local prediction</a>
1. <a href="#local_batch_prediction">Local batch prediction</a>

<a name="about"></a>
About this notebook
======

This notebook uses the datalab structured data package to build and run a TensorFlow classification model locally. It uses the classic <a href="https://en.wikipedia.org/wiki/Iris_flower_data_set">Iris flower data set</a>.

The notebooks that follow give examples of running preprocessing, training, and prediction using the Google Cloud Machine Learning Engine services. Note that the cloud versions of preprocessing, training, and prediction take longer to run than the local versions. The performance advantage of the cloud applies to very large data sets; you will not see it with this sample because the data is small and the run time is dominated by setup overhead.

<a name="setup"></a>
Setting things up
=====

In [1]:
import datalab_structured_data as sd

Let's look at the versions of structured_data and TF we have. Make sure TF is 1.0.0 and SD is 0.0.1, matching the output below.

In [57]:
import tensorflow as tf
from tensorflow.python.lib.io import file_io
import datalab.ml as ml
import os
print('tf ' + str(tf.__version__))
print('sd ' + str(sd.__version__))

tf 1.0.0
sd 0.0.1


This notebook will write files during preprocessing, training, and prediction. Set LOCAL_ROOT to the root folder you wish to use.

In [58]:
LOCAL_ROOT = './iris_notebook_workspace'
if not file_io.file_exists(LOCAL_ROOT):
  file_io.recursive_create_dir(LOCAL_ROOT)

The iris data set is small, so the data is embedded in this notebook. The following cells write the iris data set into three files: training, eval, and prediction. Note that the prediction data set does not have target values.

In [59]:
%writefile {LOCAL_ROOT}/train.csv
Iris-setosa,4,4.6,3.1,1.5,0.2
Iris-setosa,20,5.1,3.8,1.5,0.3
Iris-setosa,43,4.4,3.2,1.3,0.2
Iris-versicolor,88,6.3,2.3,4.4,1.3
Iris-versicolor,76,6.6,3,4.4,1.4
Iris-versicolor,63,6,2.2,4,1
Iris-setosa,47,5.1,3.8,1.6,0.2
Iris-virginica,146,6.7,3,5.2,2.3
Iris-versicolor,53,6.9,3.1,4.9,1.5
Iris-versicolor,71,5.9,3.2,4.8,1.8
Iris-virginica,144,6.8,3.2,5.9,2.3
Iris-virginica,124,6.3,2.7,4.9,1.8
Iris-virginica,122,5.6,2.8,4.9,2
Iris-setosa,17,5.4,3.9,1.3,0.4
Iris-setosa,7,4.6,3.4,1.4,0.3
Iris-versicolor,87,6.7,3.1,4.7,1.5
Iris-virginica,131,7.4,2.8,6.1,1.9
Iris-setosa,2,4.9,3,1.4,0.2
Iris-virginica,147,6.3,2.5,5,1.9
Iris-setosa,29,5.2,3.4,1.4,0.2
Iris-versicolor,91,5.5,2.6,4.4,1.2
Iris-virginica,110,7.2,3.6,6.1,2.5
Iris-virginica,121,6.9,3.2,5.7,2.3
Iris-setosa,45,5.1,3.8,1.9,0.4
Iris-setosa,10,4.9,3.1,1.5,0.1
Iris-setosa,36,5,3.2,1.2,0.2
Iris-virginica,112,6.4,2.7,5.3,1.9
Iris-setosa,46,4.8,3,1.4,0.3
Iris-virginica,132,7.9,3.8,6.4,2
Iris-versicolor,77,6.8,2.8,4.8,1.4
Iris-setosa,6,5.4,3.9,1.7,0.4
Iris-versicolor,90,5.5,2.5,4,1.3
Iris-virginica,137,6.3,3.4,5.6,2.4
Iris-setosa,31,4.8,3.1,1.6,0.2
Iris-virginica,120,6,2.2,5,1.5
Iris-virginica,138,6.4,3.1,5.5,1.8
Iris-setosa,24,5.1,3.3,1.7,0.5
Iris-versicolor,96,5.7,3,4.2,1.2
Iris-versicolor,68,5.8,2.7,4.1,1
Iris-virginica,150,5.9,3,5.1,1.8
Iris-setosa,26,5,3,1.6,0.2
Iris-versicolor,98,6.2,2.9,4.3,1.3
Iris-versicolor,80,5.7,2.6,3.5,1
Iris-versicolor,72,6.1,2.8,4,1.3
Iris-versicolor,75,6.4,2.9,4.3,1.3
Iris-setosa,38,4.9,3.1,1.5,0.1
Iris-setosa,35,4.9,3.1,1.5,0.1
Iris-versicolor,89,5.6,3,4.1,1.3
Iris-versicolor,84,6,2.7,5.1,1.6
Iris-versicolor,51,7,3.2,4.7,1.4
Iris-virginica,116,6.4,3.2,5.3,2.3
Iris-versicolor,54,5.5,2.3,4,1.3
Iris-virginica,130,7.2,3,5.8,1.6
Iris-virginica,115,5.8,2.8,5.1,2.4
Iris-setosa,32,5.4,3.4,1.5,0.4
Iris-virginica,104,6.3,2.9,5.6,1.8
Iris-versicolor,64,6.1,2.9,4.7,1.4
Iris-setosa,18,5.1,3.5,1.4,0.3
Iris-versicolor,66,6.7,3.1,4.4,1.4
Iris-setosa,15,5.8,4,1.2,0.2
Iris-versicolor,52,6.4,3.2,4.5,1.5
Iris-virginica,103,7.1,3,5.9,2.1
Iris-setosa,9,4.4,2.9,1.4,0.2
Iris-versicolor,83,5.8,2.7,3.9,1.2
Iris-virginica,135,6.1,2.6,5.6,1.4
Iris-virginica,139,6,3,4.8,1.8
Iris-versicolor,85,5.4,3,4.5,1.5
Iris-virginica,106,7.6,3,6.6,2.1
Iris-setosa,27,5,3.4,1.6,0.4
Iris-virginica,140,6.9,3.1,5.4,2.1
Iris-versicolor,67,5.6,3,4.5,1.5
Iris-setosa,12,4.8,3.4,1.6,0.2
Iris-versicolor,56,5.7,2.8,4.5,1.3
Iris-virginica,113,6.8,3,5.5,2.1
Iris-versicolor,62,5.9,3,4.2,1.5
Iris-virginica,145,6.7,3.3,5.7,2.5
Iris-virginica,111,6.5,3.2,5.1,2
Iris-virginica,141,6.7,3.1,5.6,2.4
Iris-setosa,34,5.5,4.2,1.4,0.2
Iris-versicolor,81,5.5,2.4,3.8,1.1
Iris-setosa,8,5,3.4,1.5,0.2
Iris-virginica,129,6.4,2.8,5.6,2.1
Iris-versicolor,57,6.3,3.3,4.7,1.6
Iris-virginica,128,6.1,3,4.9,1.8
Iris-virginica,119,7.7,2.6,6.9,2.3
Iris-virginica,126,7.2,3.2,6,1.8
Iris-versicolor,58,4.9,2.4,3.3,1
Iris-virginica,117,6.5,3,5.5,1.8
Iris-virginica,127,6.2,2.8,4.8,1.8
Iris-setosa,16,5.7,4.4,1.5,0.4
Iris-setosa,3,4.7,3.2,1.3,0.2
Iris-virginica,108,7.3,2.9,6.3,1.8
Iris-virginica,118,7.7,3.8,6.7,2.2
Iris-setosa,42,4.5,2.3,1.3,0.3
Iris-virginica,142,6.9,3.1,5.1,2.3
Iris-setosa,14,4.3,3,1.1,0.1
Iris-virginica,134,6.3,2.8,5.1,1.5
Iris-versicolor,94,5,2.3,3.3,1
Iris-setosa,19,5.7,3.8,1.7,0.3
Iris-virginica,133,6.4,2.8,5.6,2.2
Iris-virginica,114,5.7,2.5,5,2
Iris-versicolor,86,6,3.4,4.5,1.6
Iris-versicolor,93,5.8,2.6,4,1.2
Iris-versicolor,92,6.1,3,4.6,1.4
Iris-virginica,109,6.7,2.5,5.8,1.8
Iris-virginica,102,5.8,2.7,5.1,1.9
Iris-setosa,41,5,3.5,1.3,0.3
Iris-versicolor,60,5.2,2.7,3.9,1.4
Iris-virginica,105,6.5,3,5.8,2.2
Iris-versicolor,65,5.6,2.9,3.6,1.3
Iris-setosa,28,5.2,3.5,1.5,0.2
Iris-versicolor,82,5.5,2.4,3.7,1
Iris-setosa,25,4.8,3.4,1.9,0.2
Iris-versicolor,79,6,2.9,4.5,1.5
Iris-setosa,1,5.1,3.5,1.4,0.2
Iris-versicolor,61,5,2,3.5,1
Iris-virginica,149,6.2,3.4,5.4,2.3
Iris-setosa,48,4.6,3.2,1.4,0.2
Iris-setosa,22,5.1,3.7,1.5,0.4
Iris-setosa,30,4.7,3.2,1.6,0.2

Overwriting ./iris_notebook_workspace/train.csv


In [60]:
%writefile {LOCAL_ROOT}/eval.csv
Iris-virginica,107,4.9,2.5,4.5,1.7
Iris-versicolor,100,5.7,2.8,4.1,1.3
Iris-versicolor,99,5.1,2.5,3,1.1
Iris-setosa,13,4.8,3,1.4,0.1
Iris-versicolor,70,5.6,2.5,3.9,1.1
Iris-setosa,11,5.4,3.7,1.5,0.2
Iris-setosa,37,5.5,3.5,1.3,0.2
Iris-versicolor,69,6.2,2.2,4.5,1.5
Iris-setosa,40,5.1,3.4,1.5,0.2
Iris-virginica,101,6.3,3.3,6,2.5
Iris-setosa,39,4.4,3,1.3,0.2
Iris-versicolor,74,6.1,2.8,4.7,1.2
Iris-versicolor,97,5.7,2.9,4.2,1.3
Iris-setosa,50,5,3.3,1.4,0.2
Iris-versicolor,95,5.6,2.7,4.2,1.3
Iris-setosa,44,5,3.5,1.6,0.6
Iris-virginica,123,7.7,2.8,6.7,2
Iris-setosa,23,4.6,3.6,1,0.2
Iris-versicolor,59,6.6,2.9,4.6,1.3
Iris-virginica,148,6.5,3,5.2,2
Iris-versicolor,55,6.5,2.8,4.6,1.5
Iris-setosa,49,5.3,3.7,1.5,0.2
Iris-versicolor,78,6.7,3,5,1.7
Iris-versicolor,73,6.3,2.5,4.9,1.5
Iris-virginica,136,7.7,3,6.1,2.3
Iris-setosa,33,5.2,4.1,1.5,0.1
Iris-virginica,125,6.7,3.3,5.7,2.1
Iris-virginica,143,5.8,2.7,5.1,1.9
Iris-setosa,21,5.4,3.4,1.7,0.2
Iris-setosa,5,5,3.6,1.4,0.2

Overwriting ./iris_notebook_workspace/eval.csv


In [61]:
%writefile {LOCAL_ROOT}/predict.csv
107,4.9,2.5,4.5,1.7
100,5.7,2.8,4.1,1.3
99,5.1,2.5,3,1.1
13,4.8,3,1.4,0.1
70,5.6,2.5,3.9,1.1
11,5.4,3.7,1.5,0.2
37,5.5,3.5,1.3,0.2
69,6.2,2.2,4.5,1.5
40,5.1,3.4,1.5,0.2
101,6.3,3.3,6,2.5
39,4.4,3,1.3,0.2
74,6.1,2.8,4.7,1.2
97,5.7,2.9,4.2,1.3
50,5,3.3,1.4,0.2
95,5.6,2.7,4.2,1.3
44,5,3.5,1.6,0.6
123,7.7,2.8,6.7,2
23,4.6,3.6,1,0.2
59,6.6,2.9,4.6,1.3
148,6.5,3,5.2,2
55,6.5,2.8,4.6,1.5
49,5.3,3.7,1.5,0.2
78,6.7,3,5,1.7
73,6.3,2.5,4.9,1.5
136,7.7,3,6.1,2.3
33,5.2,4.1,1.5,0.1
125,6.7,3.3,5.7,2.1
143,5.8,2.7,5.1,1.9
21,5.4,3.4,1.7,0.2
5,5,3.6,1.4,0.2

Overwriting ./iris_notebook_workspace/predict.csv
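
The prediction file above contains the same rows as the eval file with the first (target) column removed. As a minimal sketch, assuming the target is always the first column as in this notebook, the same rows could be derived with the standard csv module. The helper name `strip_target` is hypothetical and is not part of the structured data package.

```python
import csv
import io


def strip_target(csv_text, target_index=0):
    """Return csv_text with the column at target_index removed from every row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator='\n')
    for row in csv.reader(io.StringIO(csv_text)):
        if row:  # skip blank lines
            writer.writerow(row[:target_index] + row[target_index + 1:])
    return out.getvalue()


# Two rows copied from eval.csv above.
eval_rows = ('Iris-virginica,107,4.9,2.5,4.5,1.7\n'
             'Iris-versicolor,100,5.7,2.8,4.1,1.3\n')
predict_rows = strip_target(eval_rows)
print(predict_rows)
```

The two printed rows should match the first two rows of predict.csv.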


<a name="local_preprocessing"></a>
Local preprocessing starting from csv files
=====

A schema file describes each column of the csv files. The train, eval, and prediction csv files are assumed to share the same schema, except that the prediction file lacks the target column. The schema file must be a valid BigQuery table schema file, which allows BigQuery to be used later in cloud preprocessing. Only three BigQuery types are supported: STRING (for categorical columns), and INTEGER and FLOAT (for numerical columns).

In [62]:
%writefile {LOCAL_ROOT}/schema.json
[
    {
        "mode": "NULLABLE",
        "name": "flower",
        "type": "STRING"
    },
    {
        "mode": "REQUIRED",
        "name": "key",
        "type": "INTEGER"
    },
    {
        "mode": "NULLABLE",
        "name": "sepal_length",
        "type": "FLOAT"
    },
    {
        "mode": "NULLABLE",
        "name": "sepal_width",
        "type": "FLOAT"
    },
    {
        "mode": "NULLABLE",
        "name": "petal_length",
        "type": "FLOAT"
    },
    {
        "mode": "NULLABLE",
        "name": "petal_width",
        "type": "FLOAT"
    }   
]

Overwriting ./iris_notebook_workspace/schema.json
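
Because only three column types are supported, it can be useful to check a schema before running preprocessing. The following is a rough sketch of such a check, not something the structured data package provides: the helper `unsupported_columns` and the `created` TIMESTAMP column are hypothetical and exist only to show a failing case.

```python
import json

# The three types the text above names as supported.
SUPPORTED_TYPES = {'STRING', 'INTEGER', 'FLOAT'}


def unsupported_columns(schema):
    """Return the names of columns whose type is not a supported type."""
    return [col['name'] for col in schema
            if col['type'].upper() not in SUPPORTED_TYPES]


# A shortened schema; "created" is a made-up column used to trigger the check.
schema = json.loads('''
[
  {"mode": "NULLABLE", "name": "flower", "type": "STRING"},
  {"mode": "REQUIRED", "name": "key", "type": "INTEGER"},
  {"mode": "NULLABLE", "name": "sepal_length", "type": "FLOAT"},
  {"mode": "NULLABLE", "name": "created", "type": "TIMESTAMP"}
]
''')
print(unsupported_columns(schema))  # ['created']
```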


In [63]:
!rm -fr {LOCAL_ROOT}/preprocess

In [64]:
train_csv = ml.CsvDataSet(
  file_pattern=os.path.join(LOCAL_ROOT, 'train.csv'),
  schema_file=os.path.join(LOCAL_ROOT, 'schema.json'))
eval_csv = ml.CsvDataSet(
  file_pattern=os.path.join(LOCAL_ROOT, 'eval.csv'),
  schema_file=os.path.join(LOCAL_ROOT, 'schema.json'))

In [65]:
sd.local_preprocess(
  dataset=train_csv,
  output_dir=os.path.join(LOCAL_ROOT, 'preprocess'),
)

Starting local preprocessing.
Local preprocessing done.


The output of preprocessing is a numerical_analysis file that contains statistics for the numerical columns, a vocab file for each categorical column, and a copy of the schema. The files produced by preprocessing are consumed in training, and you should not have to worry about them. Just for fun, let's look at them.

In [66]:
!ls  {LOCAL_ROOT}/preprocess

numerical_analysis.json  schema.json  vocab_flower.csv


In [67]:
!cat {LOCAL_ROOT}/preprocess/numerical_analysis.json

{
  "sepal_width": {
    "max": 4.4,
    "mean": 3.050833333333332,
    "min": 2.0
  },
  "petal_width": {
    "max": 2.5,
    "mean": 1.2324999999999995,
    "min": 0.1
  },
  "sepal_length": {
    "max": 7.9,
    "mean": 5.867500000000002,
    "min": 4.3
  },
  "key": {
    "max": 150.0,
    "mean": 76.73333333333333,
    "min": 1.0
  },
  "petal_length": {
    "max": 6.9,
    "mean": 3.830833333333335,
    "min": 1.1
  }
}

In [68]:
!cat {LOCAL_ROOT}/preprocess/schema.json

[{"type": "STRING", "mode": "NULLABLE", "name": "flower"}, {"type": "INTEGER", "mode": "REQUIRED", "name": "key"}, {"type": "FLOAT", "mode": "NULLABLE", "name": "sepal_length"}, {"type": "FLOAT", "mode": "NULLABLE", "name": "sepal_width"}, {"type": "FLOAT", "mode": "NULLABLE", "name": "petal_length"}, {"type": "FLOAT", "mode": "NULLABLE", "name": "petal_width"}]

In [69]:
!cat {LOCAL_ROOT}/preprocess/vocab_flower.csv

Iris-virginica
Iris-setosa
Iris-versicolor
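
To make the contents of these files concrete, here is a minimal sketch of how the same kind of analysis (min, max, and mean per numerical column, plus the distinct values of each categorical column) could be computed with the standard library. This is an illustration only and is not the package's implementation; the function name `analyze` is hypothetical, and the vocabulary is sorted here for determinism, which the real vocab file does not appear to be.

```python
import csv
import io


def analyze(csv_text, column_names, numerical, categorical):
    """Compute min/max/mean for numerical columns and a vocabulary for categorical ones."""
    rows = [dict(zip(column_names, r))
            for r in csv.reader(io.StringIO(csv_text)) if r]
    stats = {}
    for name in numerical:
        values = [float(row[name]) for row in rows]
        stats[name] = {
            'min': min(values),
            'max': max(values),
            'mean': sum(values) / len(values),
        }
    vocab = {name: sorted({row[name] for row in rows}) for name in categorical}
    return stats, vocab


# Three rows copied from train.csv above.
sample = ('Iris-setosa,4,4.6,3.1,1.5,0.2\n'
          'Iris-versicolor,88,6.3,2.3,4.4,1.3\n'
          'Iris-virginica,146,6.7,3,5.2,2.3\n')
columns = ['flower', 'key', 'sepal_length', 'sepal_width',
           'petal_length', 'petal_width']
stats, vocab = analyze(sample, columns,
                       numerical=['sepal_length', 'petal_width'],
                       categorical=['flower'])
print(stats)
print(vocab)
```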

<a name="local_training"></a>
Local training
===========

The files in the output folder of preprocessing are consumed by the trainer. Training requires a transform config file that describes which transforms to apply to the data. The key and target transforms are the only required transforms; a default transform is applied to every other column that is not listed in the config file.

In [70]:
%writefile {LOCAL_ROOT}/transforms.json
{
  "sepal_length": {"transform": "scale"},
  "sepal_width": {"transform": "scale"},
  "key": {"transform": "key"},
  "flower": {"transform": "target"}
 }

Overwriting ./iris_notebook_workspace/transforms.json
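
The notebook does not spell out what the "scale" transform computes. As a sketch only, assuming it is a linear min-max rescaling of a numerical column onto the interval [-value, value] using the min and max from numerical_analysis.json, the arithmetic would look like this. The function name `scale` and its `value` parameter are assumptions made for illustration, not confirmed package behavior.

```python
def scale(x, col_min, col_max, value=1.0):
    """Linearly map x from [col_min, col_max] onto [-value, value]."""
    if col_max == col_min:
        # A constant column carries no information; map it to the midpoint.
        return 0.0
    return ((x - col_min) / (col_max - col_min) * 2.0 - 1.0) * value


# min and max of sepal_length, as reported in numerical_analysis.json above.
print(scale(4.3, 4.3, 7.9))  # -1.0
print(scale(7.9, 4.3, 7.9))  # 1.0
print(scale(6.1, 4.3, 7.9))  # the midpoint of the range, close to 0
```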


In [71]:
!rm -fr {LOCAL_ROOT}/training

In [72]:
sd.local_train(
  train_dataset=train_csv,
  eval_dataset=eval_csv,
  transforms=os.path.join(LOCAL_ROOT, 'transforms.json'),
  preprocess_output_dir=os.path.join(LOCAL_ROOT, 'preprocess'),
  output_dir=os.path.join(LOCAL_ROOT, 'training'),
  model_type='linear_classification',
  top_n=3,
  max_steps=1000,
)

Starting local training.
INFO:tensorflow:Using config: {'_save_checkpoints_secs': 600, '_num_ps_replicas': 0, '_keep_checkpoint_max': 5, '_tf_random_seed': None, '_task_type': None, '_environment': 'local', '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fa5c4708810>, '_tf_config': gpu_options {
  per_process_gpu_memory_fraction: 1
}
, '_task_id': 0, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_evaluation_master': '', '_keep_checkpoint_every_n_hours': 10000, '_master': ''}

Instructions for updating:
Monitors are deprecated. Please use tf.train.SessionRunHook.

Instructions for updating:
Please switch to tf.summary.scalar. Note that tf.summary.scalar uses the node name instead of the tag. This means that TensorFlow will automatically de-duplicate summary names based on the scope they are created in. Also, passing a tensor or list of tags to a scalar summary op is no longer supported.

INFO:tensorflow:Create CheckpointSaverHook.

INFO:tensorflow:Saving checkpoints for 1 into ./iris_notebook_workspace/training/train/model.ckpt.

INFO:tensorflow:loss = 1.09861, step = 1

Instructions for updating:
Please switch to tf.summary.scalar. Note that tf.summary.scalar uses the node name instead of the tag. This means that TensorFlow will automatically de-duplicate summary names based on the scope they are created in. Also, passing a tensor or list of tags to a scalar summary op is no longer supported.

INFO:tensorflow:Starting evaluation at 2017-02-24-17:39:30

INFO:tensorflow:Finished evaluation at 2017-02-24-17:39:30

INFO:tensorflow:Saving dict for global step 1: accuracy = 0.0, auc = 0.0, global_step = 1, loss = 0.0

INFO:tensorflow:Validation (step 100): loss = 0.0, auc = 0.0, global_step = 1, accuracy = 0.0

INFO:tensorflow:global_step/sec: 94.0808

INFO:tensorflow:loss = 0.555178, step = 101

INFO:tensorflow:global_step/sec: 331.693

INFO:tensorflow:loss = 0.390543, step = 201

INFO:tensorflow:global_step/sec: 318.484

INFO:tensorflow:loss = 0.329307, step = 301

INFO:tensorflow:global_step/sec: 326.093

INFO:tensorflow:loss = 0.290879, step = 401

INFO:tensorflow:global_step/sec: 343.256

INFO:tensorflow:loss = 0.329495, step = 501

INFO:tensorflow:global_step/sec: 317.692

INFO:tensorflow:loss = 0.23167, step = 601

INFO:tensorflow:global_step/sec: 313.219

INFO:tensorflow:loss = 0.24501, step = 701

INFO:tensorflow:global_step/sec: 324.624

INFO:tensorflow:loss = 0.230814, step = 801

INFO:tensorflow:global_step/sec: 318.551

INFO:tensorflow:loss = 0.196865, step = 901

INFO:tensorflow:Saving checkpoints for 1000 into ./iris_notebook_workspace/training/train/model.ckpt.

INFO:tensorflow:Loss for final step: 0.23336.

Instructions for updating:
Please switch to tf.summary.scalar. Note that tf.summary.scalar uses the node name instead of the tag. This means that TensorFlow will automatically de-duplicate summary names based on the scope they are created in. Also, passing a tensor or list of tags to a scalar summary op is no longer supported.

INFO:tensorflow:Starting evaluation at 2017-02-24-17:39:34

INFO:tensorflow:Finished evaluation at 2017-02-24-17:39:34

INFO:tensorflow:Saving dict for global step 1000: accuracy = 0.0, auc = 0.0, global_step = 1000, loss = 0.0

INFO:tensorflow:Assets added to graph.

INFO:tensorflow:No assets to write.

INFO:tensorflow:SavedModel written to: ./iris_notebook_workspace/training/train/export/intermediate_evaluation_models/1487957975041/saved_model.pb

INFO:tensorflow:Assets added to graph.

INFO:tensorflow:No assets to write.

INFO:tensorflow:SavedModel written to: ./iris_notebook_workspace/training/train/export/intermediate_prediction_models/1487957975432/saved_model.pb


Local training done.
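
The first logged loss, 1.09861 at step 1, is what one would expect from an untrained three-class classifier. If the model assigns equal probability to each of the three flower classes, the cross-entropy loss is ln(3). This is an interpretation of the log, assuming the reported loss is a mean softmax cross-entropy; the arithmetic is sketched below.

```python
import math

num_classes = 3  # setosa, versicolor, virginica
uniform_probability = 1.0 / num_classes

# Cross-entropy of the true class under a uniform prediction.
initial_loss = -math.log(uniform_probability)
print(round(initial_loss, 5))  # 1.09861
```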


In [73]:
!ls {LOCAL_ROOT}/training/

evaluation_model  model  train


<a name="local_prediction"></a>
Local prediction
================

Local predict uses the model produced by training. The input data can be a list of csv strings or a Pandas DataFrame, and its schema must match the data set used for training, except that the target column is omitted. For example, if the training data set had the columns "id,target,value1,value2", the prediction data must have the form "id,value1,value2".

In [74]:
sd.local_predict(
  training_ouput_dir=os.path.join(LOCAL_ROOT, 'training'),
  data=['101,6.3,3.3,6,2.5',
        '107,4.9,2.5,4.5,1.7',
        '100,5.7,2.8,4.1,1.3']
)

Starting local prediction.
Local prediction done.


Unnamed: 0,key_from_input,top_1_label,top_1_score,top_2_label,top_2_score,top_3_label,top_3_score
0,101,Iris-virginica,0.952918,Iris-versicolor,0.047047,Iris-setosa,3.5e-05
1,107,Iris-versicolor,0.708517,Iris-virginica,0.27715,Iris-setosa,0.014333
2,100,Iris-versicolor,0.811353,Iris-virginica,0.174092,Iris-setosa,0.014555


In [75]:
import pandas as pd
sd.local_predict(
  training_ouput_dir=os.path.join(LOCAL_ROOT, 'training'),
  data=pd.DataFrame(
    [[101,6.3,3.3,6,2.5],
     [107,4.9,2.5,4.5,1.7],
     [100,5.7,2.8,4.1,1.3]])
)

Starting local prediction.
Local prediction done.


Unnamed: 0,key_from_input,top_1_label,top_1_score,top_2_label,top_2_score,top_3_label,top_3_score
0,101,Iris-virginica,0.952918,Iris-versicolor,0.047047,Iris-setosa,3.5e-05
1,107,Iris-versicolor,0.708517,Iris-virginica,0.27715,Iris-setosa,0.014333
2,100,Iris-versicolor,0.811353,Iris-virginica,0.174092,Iris-setosa,0.014555
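
The top_n argument passed to training controls how many label/score pairs appear in each prediction row. As a rough sketch of how such columns could be produced from per-class scores (an illustration, not the package's code), the labels are ranked by score and the best n are flattened into top_k_label and top_k_score fields. The helper name `top_n_columns` is hypothetical.

```python
def top_n_columns(probabilities, n):
    """Rank labels by score and flatten the n best into top_k_label/top_k_score fields."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    result = {}
    for rank, (label, score) in enumerate(ranked[:n], start=1):
        result['top_%d_label' % rank] = label
        result['top_%d_score' % rank] = score
    return result


# Scores for key 101, copied from the output above.
scores = {
    'Iris-setosa': 3.5e-05,
    'Iris-versicolor': 0.047047,
    'Iris-virginica': 0.952918,
}
top_two = top_n_columns(scores, n=2)
print(top_two)
```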


<a name="local_batch_prediction"></a>
Local batch prediction
============

Local batch prediction runs prediction on batched input data. This is ideal if the input data set is very large or you have limited main memory. However, for very large data sets, it is better to run batch prediction using the Google Cloud Machine Learning Engine services. Two output formats are supported, csv and json, and the output may be sharded. Batch prediction can also run in evaluation mode, which performs prediction on data that contains the target column. As with local_predict, the input data must match the schema used for training.

In [76]:
!rm -fr {LOCAL_ROOT}/predict_out

In [77]:
sd.local_batch_predict(
  training_ouput_dir=os.path.join(LOCAL_ROOT, 'training'),
  prediction_input_file=os.path.join(LOCAL_ROOT, 'eval.csv'),
  output_dir=os.path.join(LOCAL_ROOT, 'predict_out'),
  output_format='csv',
  mode='evaluation'
)

Starting local batch prediction.
Local batch prediction done.


In [78]:
!ls {LOCAL_ROOT}/predict_out

csv_header.json  errors-00000-of-00001.txt  predictions-00000-of-00001.csv
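
The file names above follow a `predictions-XXXXX-of-YYYYY` shard pattern. With only one shard there is nothing to merge, but if the output were split across several files, a sketch like the following could read them back in order. The helper `read_shards` and the two-shard demo files are hypothetical, created in a temporary folder purely for illustration.

```python
import glob
import os
import tempfile


def read_shards(output_dir, pattern='predictions-*'):
    """Concatenate the lines of every shard matching pattern, in shard order."""
    lines = []
    # Zero-padded shard numbers sort correctly as plain strings.
    for path in sorted(glob.glob(os.path.join(output_dir, pattern))):
        with open(path) as f:
            lines.extend(line.rstrip('\n') for line in f)
    return lines


# Demonstrate on a throwaway folder holding two made-up shards.
demo_dir = tempfile.mkdtemp()
for name, row in [('predictions-00000-of-00002.csv', '107,Iris-virginica'),
                  ('predictions-00001-of-00002.csv', '100,Iris-versicolor')]:
    with open(os.path.join(demo_dir, name), 'w') as f:
        f.write(row + '\n')

merged = read_shards(demo_dir)
print(merged)
```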


In [79]:
!cat {LOCAL_ROOT}/predict_out/csv_header.json

[
  {
    "type": "INTEGER", 
    "mode": "NULLABLE", 
    "name": "key_from_input"
  }, 
  {
    "type": "STRING", 
    "mode": "NULLABLE", 
    "name": "target_from_input"
  }, 
  {
    "type": "STRING", 
    "mode": "NULLABLE", 
    "name": "top_1_label"
  }, 
  {
    "type": "FLOAT", 
    "mode": "NULLABLE", 
    "name": "top_1_score"
  }, 
  {
    "type": "STRING", 
    "mode": "NULLABLE", 
    "name": "top_2_label"
  }, 
  {
    "type": "FLOAT", 
    "mode": "NULLABLE", 
    "name": "top_2_score"
  }, 
  {
    "type": "STRING", 
    "mode": "NULLABLE", 
    "name": "top_3_label"
  }, 
  {
    "type": "FLOAT", 
    "mode": "NULLABLE", 
    "name": "top_3_score"
  }
]


In [80]:
!cat {LOCAL_ROOT}/predict_out/errors*

In [81]:
!head {LOCAL_ROOT}/predict_out/predictions-00000*

107,Iris-virginica,Iris-versicolor,0.708516895771,Iris-virginica,0.277149826288,Iris-setosa,0.0143332714215
100,Iris-versicolor,Iris-versicolor,0.811352550983,Iris-virginica,0.174092456698,Iris-setosa,0.0145549569279
99,Iris-versicolor,Iris-versicolor,0.82814437151,Iris-setosa,0.109754964709,Iris-virginica,0.0621007122099
13,Iris-setosa,Iris-setosa,0.953728973866,Iris-versicolor,0.0462122149765,Iris-virginica,5.88150469412e-05
70,Iris-versicolor,Iris-versicolor,0.889414310455,Iris-virginica,0.0985938981175,Iris-setosa,0.0119918379933
11,Iris-setosa,Iris-setosa,0.973035275936,Iris-versicolor,0.0268962308764,Iris-virginica,6.85172853991e-05
37,Iris-setosa,Iris-setosa,0.951622664928,Iris-versicolor,0.0482123084366,Iris-virginica,0.000165048244526
69,Iris-versicolor,Iris-virginica,0.52519184351,Iris-versicolor,0.474595069885,Iris-setosa,0.000213060498936
40,Iris-setosa,Iris-setosa,0.966367900372,Iris-versicolor,0.0335618816316,Iris-virginica,7.01119861333e-05
101,Iris-virginica,Ir
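
In evaluation mode each row carries both the true label (target_from_input) and the predicted label (top_1_label), so accuracy can be computed directly from the output. A minimal sketch, using the column order from csv_header.json and three complete rows copied from the output above (so the figure it prints describes only those three rows, not the whole eval set):

```python
import csv
import io

# Column order as listed in csv_header.json above.
header = ['key_from_input', 'target_from_input',
          'top_1_label', 'top_1_score',
          'top_2_label', 'top_2_score',
          'top_3_label', 'top_3_score']

# Three complete rows copied from the predictions file shown above.
predictions = (
    '107,Iris-virginica,Iris-versicolor,0.708516895771,Iris-virginica,0.277149826288,Iris-setosa,0.0143332714215\n'
    '100,Iris-versicolor,Iris-versicolor,0.811352550983,Iris-virginica,0.174092456698,Iris-setosa,0.0145549569279\n'
    '13,Iris-setosa,Iris-setosa,0.953728973866,Iris-versicolor,0.0462122149765,Iris-virginica,5.88150469412e-05\n'
)

rows = list(csv.DictReader(io.StringIO(predictions), fieldnames=header))
correct = sum(row['target_from_input'] == row['top_1_label'] for row in rows)
accuracy = float(correct) / len(rows)
print('%d of %d correct' % (correct, len(rows)))  # 2 of 3 correct
```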

In [82]:
!rm -fr {LOCAL_ROOT}/predict_out

In [83]:
sd.local_batch_predict(
  training_ouput_dir=os.path.join(LOCAL_ROOT, 'training'),
  prediction_input_file=os.path.join(LOCAL_ROOT, 'predict.csv'),
  output_dir=os.path.join(LOCAL_ROOT, 'predict_out'),
  output_format='json',
  mode='prediction'
)

Starting local batch prediction.
Local batch prediction done.


In [84]:
!ls {LOCAL_ROOT}/predict_out

errors-00000-of-00001.txt  predictions-00000-of-00001.json


In [85]:
!head {LOCAL_ROOT}/predict_out/predictions*

{"top_2_label": "Iris-virginica","top_3_score": 0.0143332714214921,"top_1_label": "Iris-versicolor","top_2_score": 0.27714982628822327,"top_3_label": "Iris-setosa","top_1_score": 0.7085168957710266,"key_from_input": 107}
{"top_2_label": "Iris-virginica","top_3_score": 0.014554956927895546,"top_1_label": "Iris-versicolor","top_2_score": 0.17409245669841766,"top_3_label": "Iris-setosa","top_1_score": 0.811352550983429,"key_from_input": 100}
{"top_2_label": "Iris-setosa","top_3_score": 0.06210071220993996,"top_1_label": "Iris-versicolor","top_2_score": 0.10975496470928192,"top_3_label": "Iris-virginica","top_1_score": 0.828144371509552,"key_from_input": 99}
{"top_2_label": "Iris-versicolor","top_3_score": 5.881504694116302e-05,"top_1_label": "Iris-setosa","top_2_score": 0.04621221497654915,"top_3_label": "Iris-virginica","top_1_score": 0.953728973865509,"key_from_input": 13}
{"top_2_label": "Iris-virginica","top_3_score": 0.011991837993264198,"top_1_label": "Iris-versicolor","top_2_sc
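
The json output holds one JSON object per line, so it can be parsed line by line with the standard json module. A small sketch, using the first complete line of the output above; the helper name `parse_json_lines` is hypothetical.

```python
import json


def parse_json_lines(text):
    """Parse newline-delimited JSON, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# First complete line copied from the predictions file shown above.
text = ('{"top_2_label": "Iris-virginica","top_3_score": 0.0143332714214921,'
        '"top_1_label": "Iris-versicolor","top_2_score": 0.27714982628822327,'
        '"top_3_label": "Iris-setosa","top_1_score": 0.7085168957710266,'
        '"key_from_input": 107}\n')

records = parse_json_lines(text)
print('%d %s' % (records[0]['key_from_input'], records[0]['top_1_label']))  # 107 Iris-versicolor
```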

Clean up
===

Because everything was written to LOCAL_ROOT, cleaning up only requires removing that folder. To delete the files, uncomment and run the next cell.

In [86]:
#!rm -fr {LOCAL_ROOT}