# About this notebook

This notebook assumes you have run the local Iris classification notebook ("1 Local End to End") and have not deleted the LOCAL_ROOT folder. In this notebook, we will evaluate the trained model using ML Engine batch prediction.

# Setting things up

In [2]:
import mltoolbox.classification.dnn as sd

In [3]:
import os
import tensorflow as tf
from tensorflow.python.lib.io import file_io

This notebook writes files during prediction. Specify the root folders you want to use.

In [4]:
LOCAL_ROOT = './iris_notebook_workspace' # This should be the same as what was used in the local notebook
CLOUD_ROOT = 'gs://' + datalab_project_id() + '-iris-classification-datalab' # Feel free to change this line.

# No need to edit anything else in this cell.
LOCAL_TRAINING_DIR = os.path.join(LOCAL_ROOT, 'training')
CLOUD_TRAINING_DIR = os.path.join(CLOUD_ROOT, 'training')

LOCAL_EVAL_FILE = os.path.join(LOCAL_ROOT, 'eval.csv')
CLOUD_EVAL_FILE = os.path.join(CLOUD_ROOT, 'eval.csv')

CLOUD_BATCH_PREDICTION_DIR = os.path.join(CLOUD_ROOT, 'batch_prediction')
if not file_io.file_exists(LOCAL_ROOT):
  raise ValueError('LOCAL_ROOT not found. Did you run the local notebook?')
  
!gsutil mb {CLOUD_ROOT}

Creating gs://cloud-ml-dev-iris-classification-datalab/...
ServiceException: 409 Bucket cloud-ml-dev-iris-classification-datalab already exists.
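
The path constants in the cell above are built with `os.path.join`, which produces forward slashes on the Linux hosts where this notebook normally runs but backslashes on Windows. As a minimal sketch, assuming you want GCS paths that come out the same on every platform, `posixpath.join` can be used instead. The helper name `gcs_join` and the bucket name are hypothetical and not part of the notebook.

```python
import posixpath


def gcs_join(root, *parts):
    """Join GCS path components with forward slashes on every platform."""
    return posixpath.join(root, *parts)


# Hypothetical bucket name, used only for illustration.
EXAMPLE_CLOUD_ROOT = 'gs://my-project-iris-classification-datalab'

print(gcs_join(EXAMPLE_CLOUD_ROOT, 'training'))
print(gcs_join(EXAMPLE_CLOUD_ROOT, 'batch_prediction'))
```

On Linux the result is identical to what `os.path.join` returns, so this only matters if the cell is run elsewhere.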


First, copy the evaluation CSV file and the training output to GCS.

In [5]:
!gsutil -m cp {LOCAL_EVAL_FILE} {CLOUD_EVAL_FILE}
!gsutil -m cp -r {LOCAL_TRAINING_DIR} {CLOUD_TRAINING_DIR}

Copying file://./iris_notebook_workspace/eval.csv [Content-Type=text/csv]...
/ [1/1 files][  973.0 B/  973.0 B] 100% Done                                    
Operation completed over 1 objects/973.0 B.                                      
Copying file://./iris_notebook_workspace/training/features_file.json [Content-Type=application/json]...
Copying file://./iris_notebook_workspace/training/model/saved_model.pb [Content-Type=application/octet-stream]...
Copying file://./iris_notebook_workspace/training/model/variables/variables.index [Content-Type=application/octet-stream]...
Copying file://./iris_notebook_workspace/training/model/variables/variables.data-00000-of-00001 [Content-Type=application/octet-stream]...
Copying file://./iris_notebook_workspace/training/model/assets.extra/schema.json [Content-Type=application/json]...
Copying file://./iris_notebook_workspace/training/model/assets.extra/vocab_flower.csv [Content-Type=text/csv]...
Copying file://./iris_notebook_workspace/training

In [6]:
!gsutil ls {CLOUD_TRAINING_DIR}

gs://cloud-ml-dev-iris-classification-datalab/training/features_file.json
gs://cloud-ml-dev-iris-classification-datalab/training/evaluation_model/
gs://cloud-ml-dev-iris-classification-datalab/training/model/
gs://cloud-ml-dev-iris-classification-datalab/training/train/


# ML Engine Batch Prediction

Batch prediction has two modes. In 'evaluation' mode, the input data must match the training schema exactly, so the target column must be present. In 'prediction' mode, the input data must match the training schema except that the target column is omitted. Note that batch prediction can be slow on small datasets because a Dataflow job takes a while to start.
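
To illustrate the difference between the two input layouts, here is a minimal sketch that turns evaluation-style CSV rows into prediction-style rows by dropping the target column. It assumes Python 3. The helper `drop_target_column`, the column order (key, target, then four measurements), and the measurement values are all made up for illustration; check the schema file written during training for the real layout.

```python
import csv
import io


def drop_target_column(csv_text, target_index):
    """Return csv_text with the column at target_index removed from every row."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator='\n')
    for row in reader:
        if not row:
            continue  # skip blank lines
        writer.writerow(row[:target_index] + row[target_index + 1:])
    return out.getvalue()


# Hypothetical rows: key, target, then four measurements.
eval_rows = (
    '39,Iris-setosa,4.4,3.0,1.3,0.2\n'
    '74,Iris-versicolor,6.1,2.8,4.7,1.2\n'
)

print(drop_target_column(eval_rows, target_index=1))
```

Rows in the first form suit `mode='evaluation'`; rows in the second form suit `mode='prediction'`.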

In [7]:
!gsutil -m rm -r {CLOUD_BATCH_PREDICTION_DIR}

CommandException: 1 files/objects could not be removed.


In [8]:
sd.batch_predict(
  cloud=True,
  training_dir=CLOUD_TRAINING_DIR,
  prediction_input_file=CLOUD_EVAL_FILE,
  output_dir=CLOUD_BATCH_PREDICTION_DIR,
  mode='evaluation',
  output_format='json'
)

Building package and uploading to gs://cloud-ml-dev-iris-classification-datalab/batch_prediction/staging/trainer.tar.gz
Dataflow Job submitted, see Job mltoolbox-batch-prediction-20170306205753 at https://console.developers.google.com/dataflow?project=cloud-ml-dev




When prediction is done, {CLOUD_ROOT}/batch_prediction should contain the prediction files and an errors file, which should be empty.

In [9]:
!gsutil ls  {CLOUD_BATCH_PREDICTION_DIR}

gs://cloud-ml-dev-iris-classification-datalab/batch_prediction/errors-00000-of-00001.txt
gs://cloud-ml-dev-iris-classification-datalab/batch_prediction/predictions-00000-of-00003.json
gs://cloud-ml-dev-iris-classification-datalab/batch_prediction/predictions-00001-of-00003.json
gs://cloud-ml-dev-iris-classification-datalab/batch_prediction/predictions-00002-of-00003.json
gs://cloud-ml-dev-iris-classification-datalab/batch_prediction/staging/
gs://cloud-ml-dev-iris-classification-datalab/batch_prediction/tmp/


In [10]:
!gsutil cat {CLOUD_BATCH_PREDICTION_DIR}/errors*

In [11]:
!gsutil cat {CLOUD_BATCH_PREDICTION_DIR}/predictions-00000-of*

{"top_2_label": "Iris-versicolor","top_3_score": 5.717890616484234e-17,"top_1_label": "Iris-setosa","top_2_score": 5.546812644752208e-06,"top_3_label": "Iris-virginica","target_from_input": "Iris-setosa","top_1_score": 0.9999943971633911,"key": 39}
{"top_2_label": "Iris-virginica","top_3_score": 0.00014417635975405574,"top_1_label": "Iris-versicolor","top_2_score": 0.0019425081554800272,"top_3_label": "Iris-setosa","target_from_input": "Iris-versicolor","top_1_score": 0.9979133009910583,"key": 74}
{"top_2_label": "Iris-setosa","top_3_score": 0.00026613849331624806,"top_1_label": "Iris-versicolor","top_2_score": 0.00030752422753721476,"top_3_label": "Iris-virginica","target_from_input": "Iris-versicolor","top_1_score": 0.9994263648986816,"key": 97}
{"top_2_label": "Iris-versicolor","top_3_score": 3.456151843404172e-17,"top_1_label": "Iris-setosa","top_2_score": 1.6586045603617094e-05,"top_3_label": "Iris-virginica","target_from_input": "Iris-setosa","top_1_score": 0.9999834299087524,
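
Because evaluation mode writes both `top_1_label` and `target_from_input` into each record, accuracy can be computed directly from the prediction files. The following is a minimal sketch, assuming the files have been read into a list of JSON lines. The helper `evaluation_accuracy` is hypothetical, and the sample records are abbreviated versions of the first two records shown above.

```python
import json


def evaluation_accuracy(lines):
    """Return the fraction of records whose top_1_label equals target_from_input."""
    total = 0
    correct = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        total += 1
        if record['top_1_label'] == record['target_from_input']:
            correct += 1
    return float(correct) / total if total else 0.0


# Abbreviated records taken from the output above.
sample = [
    '{"top_1_label": "Iris-setosa", "target_from_input": "Iris-setosa", "key": 39}',
    '{"top_1_label": "Iris-versicolor", "target_from_input": "Iris-versicolor", "key": 74}',
]

print(evaluation_accuracy(sample))
```

To run this on the real output, copy the prediction files locally (for example with `gsutil cp`) and pass their lines to the function.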

# Cleaning things up

If you want to delete the files you created on GCS, uncomment and run the next cell.

In [12]:
#!gsutil rm -fr {CLOUD_ROOT}