# About this notebook

This notebook assumes you have run the local Census Regression notebook and have not deleted the LOCAL_ROOT folder. In this notebook, we will evaluate the trained model by running batch prediction on ML Engine.

# Setting things up

In [1]:
import mltoolbox.regression.dnn as sd

In [2]:
import os
import tensorflow as tf
from tensorflow.python.lib.io import file_io

This notebook writes files to GCS during prediction. Set the local and cloud root folders you wish to use in the next cell.

In [3]:
LOCAL_ROOT = './census_regression_workspace' # This should be the same as what was used in the local census notebook
CLOUD_ROOT = 'gs://' + datalab_project_id() + '-census-regression-datalab'

# No need to edit anything else in this cell.
LOCAL_TRAINING_DIR = os.path.join(LOCAL_ROOT, 'training')
CLOUD_TRAINING_DIR = os.path.join(CLOUD_ROOT, 'training')

LOCAL_EVAL_FILE = os.path.join(LOCAL_ROOT, 'eval.csv')
CLOUD_EVAL_FILE = os.path.join(CLOUD_ROOT, 'eval.csv')

CLOUD_BATCH_PREDICTION_DIR = os.path.join(CLOUD_ROOT, 'batch_prediction')
if not file_io.file_exists(LOCAL_ROOT):
  raise ValueError('LOCAL_ROOT not found. Did you run the local notebook?')
  
!gsutil mb {CLOUD_ROOT}

Creating gs://cloud-ml-dev-census-regression-datalab/...
ServiceException: 409 Bucket cloud-ml-dev-census-regression-datalab already exists.


First, copy the evaluation CSV file and the output of training to GCS.

In [4]:
!gsutil -m cp {LOCAL_EVAL_FILE} {CLOUD_EVAL_FILE}
!gsutil -m cp -r {LOCAL_TRAINING_DIR} {CLOUD_TRAINING_DIR}

Copying file://./census_regression_workspace/eval.csv [Content-Type=text/csv]...
/ [1/1 files][ 18.8 KiB/ 18.8 KiB] 100% Done                                    
Operation completed over 1 objects/18.8 KiB.                                     
Copying file://./census_regression_workspace/training/model/variables/variables.index [Content-Type=application/octet-stream]...
Copying file://./census_regression_workspace/training/model/variables/variables.data-00000-of-00001 [Content-Type=application/octet-stream]...
Copying file://./census_regression_workspace/training/model/saved_model.pb [Content-Type=application/octet-stream]...
Copying file://./census_regression_workspace/training/model/assets.extra/schema.json [Content-Type=application/json]...
Copying file://./census_regression_workspace/training/features_file.json [Content-Type=application/json]...
Copying file://./census_regression_workspace/training/model/assets.extra/transforms.json [Content-Type=application/json]...
Copying file:/

In [5]:
!gsutil ls {CLOUD_TRAINING_DIR}

gs://cloud-ml-dev-census-regression-datalab/training/features_file.json
gs://cloud-ml-dev-census-regression-datalab/training/evaluation_model/
gs://cloud-ml-dev-census-regression-datalab/training/model/
gs://cloud-ml-dev-census-regression-datalab/training/train/
gs://cloud-ml-dev-census-regression-datalab/training/training/


<a name="local_preprocessing"></a>
ML Engine Batch Prediction
=====

Batch prediction has two modes. In 'evaluation' mode, the input data must match the training schema exactly, so the target column must be present. In 'prediction' mode, the input data must match the training schema except that the target column is absent. Note that batch prediction can be slow on small datasets because a Dataflow job takes a while to start.
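To make the difference between the two modes concrete, the sketch below derives a 'prediction' mode input from 'evaluation' mode rows by removing the target column. It uses only the Python standard library. The function name `drop_target_column`, the sample values, and the column position are all hypothetical; the real position of the target column depends on the schema used in the local notebook.

```python
import csv
import io


def drop_target_column(rows, target_index):
    """Yield each row with the column at target_index removed."""
    for row in rows:
        yield row[:target_index] + row[target_index + 1:]


# Tiny in-memory example; the values and column layout are made up.
eval_csv = "1,50000.0,Private\n2,32000.0,State-gov\n"
reader = csv.reader(io.StringIO(eval_csv))

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerows(drop_target_column(reader, target_index=1))
prediction_csv = out.getvalue()
```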

In [6]:
!gsutil -m rm -r {CLOUD_BATCH_PREDICTION_DIR}

Removing gs://cloud-ml-dev-census-regression-datalab/batch_prediction/errors-00000-of-00001.txt#1488561009220515...
Removing gs://cloud-ml-dev-census-regression-datalab/batch_prediction/predictions-00000-of-00003.json#1488561020448145...
Removing gs://cloud-ml-dev-census-regression-datalab/batch_prediction/predictions-00001-of-00003.json#1488561020451165...
Removing gs://cloud-ml-dev-census-regression-datalab/batch_prediction/predictions-00002-of-00003.json#1488561020428507...
Removing gs://cloud-ml-dev-census-regression-datalab/batch_prediction/staging/sd.tar.gz#1488560762893455...
Removing gs://cloud-ml-dev-census-regression-datalab/batch_prediction/tmp/staging/structured-data-batch-prediction-20170303170603.1488560763.854491/dataflow_python_sdk.tar#1488560771093169...
Removing gs://cloud-ml-dev-census-regression-datalab/batch_prediction/tmp/staging/structured-data-batch-prediction-20170303170603.1488560763.854491/sd.tar.gz#1488560766969465...
Removing gs://cloud-ml-dev-census-regres

In [7]:
job = sd.batch_predict(
  cloud=True,
  training_output_dir=CLOUD_TRAINING_DIR,
  prediction_input_file=CLOUD_EVAL_FILE,
  output_dir=CLOUD_BATCH_PREDICTION_DIR,
  mode='evaluation',
  output_format='json'
)
job.wait()


Building package and uploading to gs://cloud-ml-dev-census-regression-datalab/batch_prediction/staging/sd.tar.gz
Dataflow Job submitted, see Job structured-data-batch-prediction-20170303194911 at https://console.developers.google.com/dataflow?project=cloud-ml-dev




Job structured-data-batch-prediction-20170303194911 completed

When prediction is done, {CLOUD_ROOT}/batch_prediction should contain the prediction files and an errors file, which should be empty.

In [8]:
!gsutil ls  {CLOUD_BATCH_PREDICTION_DIR}

gs://cloud-ml-dev-census-regression-datalab/batch_prediction/errors-00000-of-00001.txt
gs://cloud-ml-dev-census-regression-datalab/batch_prediction/predictions-00000-of-00003.json
gs://cloud-ml-dev-census-regression-datalab/batch_prediction/predictions-00001-of-00003.json
gs://cloud-ml-dev-census-regression-datalab/batch_prediction/predictions-00002-of-00003.json
gs://cloud-ml-dev-census-regression-datalab/batch_prediction/staging/
gs://cloud-ml-dev-census-regression-datalab/batch_prediction/tmp/
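Because the job ran in 'evaluation' mode, the input included the target column, so an error metric can in principle be computed from the output. The following is a minimal sketch, assuming the JSON output holds one JSON object per line and that each record carries fields named `target` and `predicted`. Both field names are assumptions and should be checked against a line of the actual output. The helper `rmse_from_json_lines` is hypothetical and not part of mltoolbox.

```python
import json
import math


def rmse_from_json_lines(lines, target_key="target", predicted_key="predicted"):
    """Compute root mean squared error from newline-delimited JSON records."""
    squared_errors = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        error = float(record[predicted_key]) - float(record[target_key])
        squared_errors.append(error * error)
    if not squared_errors:
        raise ValueError("no prediction records found")
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up records for illustration only; field names are assumed.
sample_lines = [
    '{"key": 1, "target": 10.0, "predicted": 13.0}',
    '{"key": 2, "target": 20.0, "predicted": 16.0}',
]
sample_rmse = rmse_from_json_lines(sample_lines)
```

To apply this to the real output, the prediction shards could first be copied to a local folder with `gsutil cp` and their lines passed to the function.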


Cleaning things up
=====

If you want to delete the files you created on GCS, uncomment and run the next cell. This removes the bucket and everything in it.

In [9]:
#!gsutil rm -fr {CLOUD_ROOT}