<a name="about"></a>
About this notebook
======

This notebook assumes you have run the local Iris notebook and have not deleted the LOCAL_ROOT folder. In this notebook, we will run batch prediction on a pre-trained TensorFlow model using Google Cloud Machine Learning Engine services. This notebook does not assume that the notebook "4. Iris Classification Cloud Prediction" was executed.

<a name="setup"></a>
Setting things up
=====

In [1]:
import datalab_structured_data as sd

Let's look at the versions of structured_data and TF we have. Make sure TF is 1.0.0 and SD is 0.0.1.

In [2]:
import os

import tensorflow as tf
from tensorflow.python.lib.io import file_io

print('tf ' + str(tf.__version__))
print('sd ' + str(sd.__version__))

tf 1.0.0
sd 0.0.1


This notebook writes files during prediction. Set the root folders you wish to use in the next cell.

In [7]:
LOCAL_ROOT = './iris_notebook_workspace' # This should be the same as what was used in the local iris notebook
CLOUD_ROOT = 'gs://' + datalab_project_id() + 'iris-classification-datalab'

if not file_io.file_exists(LOCAL_ROOT):
  raise ValueError('LOCAL_ROOT not found. Did you run the local notebook?')
!gsutil mb {CLOUD_ROOT}

Creating gs://cloud-ml-deviris-classification-datalab/...
ServiceException: 409 Bucket cloud-ml-deviris-classification-datalab already exists.


First, let us copy the CSV files and the output of training to GCS.

In [3]:
!gsutil -m cp {os.path.join(LOCAL_ROOT, '*_data.csv')} {CLOUD_ROOT}
!gsutil -m cp -r {os.path.join(LOCAL_ROOT, 'training')} {CLOUD_ROOT}

/bin/sh: 1: Syntax error: "(" unexpected
/bin/sh: 1: Syntax error: "(" unexpected


In [4]:
!gsutil ls {CLOUD_ROOT}/training

CommandException: "ls" command does not support "file://" URLs. Did you mean to use a gs:// URL?


<a name="local_preprocessing"></a>
ML Engine Batch Prediction
=====

Batch prediction has two modes. In 'evaluation' mode, the input data must match the training schema exactly, so the target column must be present. In 'prediction' mode, the input data files must match the training schema except that the target column is absent. Note that batch prediction can be slow on small datasets because a Dataflow job takes a while to start.
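
The difference between the two modes can be sketched as a simple field-count check on one CSV line. This is only an illustration and not part of structured_data: the column names, the sample rows, and the helpers `expected_columns` and `row_fits_mode` are all hypothetical, and the real column list comes from the schema used in training.

```python
import csv
import io

# Hypothetical column names; the real ones come from the training schema.
SCHEMA_COLUMNS = ['key', 'species', 'sepal_length', 'sepal_width',
                  'petal_length', 'petal_width']
TARGET_COLUMN = 'species'


def expected_columns(schema_columns, target_column, mode):
    """Return the columns a CSV row should carry for the given mode."""
    if mode == 'evaluation':
        # Evaluation data matches the training schema, target included.
        return list(schema_columns)
    if mode == 'prediction':
        # Prediction data has every schema column except the target.
        return [c for c in schema_columns if c != target_column]
    raise ValueError("mode must be 'evaluation' or 'prediction'")


def row_fits_mode(csv_line, schema_columns, target_column, mode):
    """Check only the field count of one CSV line against the mode."""
    fields = next(csv.reader(io.StringIO(csv_line)))
    wanted = expected_columns(schema_columns, target_column, mode)
    return len(fields) == len(wanted)


# Made-up sample rows, with and without the target column.
eval_line = '39,Iris-setosa,5.1,3.4,1.5,0.2'
predict_line = '39,5.1,3.4,1.5,0.2'

print(row_fits_mode(eval_line, SCHEMA_COLUMNS, TARGET_COLUMN, 'evaluation'))
print(row_fits_mode(predict_line, SCHEMA_COLUMNS, TARGET_COLUMN, 'prediction'))
```

A check like this only counts fields; it does not validate types or values, so a file that passes can still be rejected by the prediction job.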

In [5]:
!gsutil -m rm -r {CLOUD_ROOT}/batch_prediction

CommandException: "rm" command does not support "file://" URLs. Did you mean to use a gs:// URL?


In [8]:
sd.cloud_batch_predict(
  training_ouput_dir=os.path.join(CLOUD_ROOT, 'training'),
  prediction_input_file=os.path.join(CLOUD_ROOT, 'eval.csv'),
  output_dir=str(os.path.join(CLOUD_ROOT, 'batch_prediction')),
  mode='evaluation',
  output_format='json'
)


Building package and uploading to gs://cloud-ml-deviris-classification-datalab/batch_prediction/staging/sd.tar.gz
Starting cloud batch prediction.
Dataflow Job submitted, see Job structured-data-batch-prediction-20170224184750 at https://console.developers.google.com/dataflow?project=cloud-ml-dev




See above link for job status.


When prediction is done, {CLOUD_ROOT}/batch_prediction should contain the prediction files and an errors file, which should be empty.

In [10]:
!gsutil ls  {CLOUD_ROOT}/batch_prediction

gs://cloud-ml-deviris-classification-datalab/batch_prediction/errors-00000-of-00001.txt
gs://cloud-ml-deviris-classification-datalab/batch_prediction/predictions-00000-of-00003.json
gs://cloud-ml-deviris-classification-datalab/batch_prediction/predictions-00001-of-00003.json
gs://cloud-ml-deviris-classification-datalab/batch_prediction/predictions-00002-of-00003.json
gs://cloud-ml-deviris-classification-datalab/batch_prediction/staging/
gs://cloud-ml-deviris-classification-datalab/batch_prediction/tmp/
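
The prediction files are sharded, and each name carries its index and the shard total (`predictions-00000-of-00003.json`). As a rough sketch, a listing like the one above could be checked for missing shards as follows. The helper `missing_shards` is hypothetical, and the five-digit naming pattern is inferred from this listing only.

```python
import re

# Pattern inferred from the listing above: predictions-<index>-of-<total>.json
SHARD_PATTERN = re.compile(r'predictions-(\d{5})-of-(\d{5})\.json$')


def missing_shards(paths):
    """Return the sorted shard indices that are absent from the paths."""
    seen = set()
    total = None
    for path in paths:
        match = SHARD_PATTERN.search(path)
        if match:
            seen.add(int(match.group(1)))
            total = int(match.group(2))
    if total is None:
        raise ValueError('no prediction shards found')
    return sorted(set(range(total)) - seen)


# Paths copied from the gsutil listing above.
root = 'gs://cloud-ml-deviris-classification-datalab/batch_prediction/'
listing = [
    root + 'errors-00000-of-00001.txt',
    root + 'predictions-00000-of-00003.json',
    root + 'predictions-00001-of-00003.json',
    root + 'predictions-00002-of-00003.json',
    root + 'staging/',
    root + 'tmp/',
]

print(missing_shards(listing))
```

For the listing shown, the result is an empty list, meaning all three shards are present.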


In [12]:
!gsutil cat gs://cloud-ml-deviris-classification-datalab/batch_prediction/errors-00000-of-00001.txt

In [13]:
!gsutil cat gs://cloud-ml-deviris-classification-datalab/batch_prediction/predictions-00000-of-00003.json

{"top_2_label": "Iris-versicolor","top_3_score": 2.0761335690622218e-05,"top_1_label": "Iris-setosa","top_2_score": 0.017889974638819695,"top_3_label": "Iris-virginica","target_from_input": "Iris-setosa","top_1_score": 0.9820892810821533,"key_from_input": 39}
{"top_2_label": "Iris-virginica","top_3_score": 0.002998097101226449,"top_1_label": "Iris-versicolor","top_2_score": 0.17065127193927765,"top_3_label": "Iris-setosa","target_from_input": "Iris-versicolor","top_1_score": 0.8263506889343262,"key_from_input": 74}
{"top_2_label": "Iris-virginica","top_3_score": 0.017729466781020164,"top_1_label": "Iris-versicolor","top_2_score": 0.161696195602417,"top_3_label": "Iris-setosa","target_from_input": "Iris-versicolor","top_1_score": 0.8205742835998535,"key_from_input": 97}
{"top_2_label": "Iris-versicolor","top_3_score": 6.431270594475791e-05,"top_1_label": "Iris-setosa","top_2_score": 0.0320427305996418,"top_3_label": "Iris-virginica","target_from_input": "Iris-setosa","top_1_score": 0
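
Because the job ran in 'evaluation' mode, each record carries both `top_1_label` and `target_from_input`, so accuracy can be computed directly from the output. The following is a minimal sketch, assuming each line of a prediction file is one complete JSON object. The helper `top1_accuracy` is hypothetical, and the sample records keep only the fields needed here, taken from the first three complete lines of the output above.

```python
import json


def top1_accuracy(json_lines):
    """Fraction of records whose top prediction equals the input target."""
    correct = 0
    total = 0
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        total += 1
        if record['top_1_label'] == record['target_from_input']:
            correct += 1
    # float() keeps the division exact under Python 2 as well.
    return float(correct) / total if total else 0.0


# Abbreviated copies of the first three records shown above.
sample_lines = [
    '{"top_1_label": "Iris-setosa", "target_from_input": "Iris-setosa", "key_from_input": 39}',
    '{"top_1_label": "Iris-versicolor", "target_from_input": "Iris-versicolor", "key_from_input": 74}',
    '{"top_1_label": "Iris-versicolor", "target_from_input": "Iris-versicolor", "key_from_input": 97}',
]

print(top1_accuracy(sample_lines))
```

To score a whole run, the same function could be applied to the lines of every prediction shard after copying the files locally. A truncated line, such as the last one displayed above, would make `json.loads` raise an error.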

Cleaning things up
=====

If you want to delete the files you made on GCS, uncomment and run the next cell.

In [None]:
#!gsutil rm -fr {CLOUD_ROOT}