# About this notebook


This notebook assumes you have run the local Iris classification notebook ("1 Local End to End") and have not deleted the LOCAL_ROOT folder. In this notebook, we will use the ML Engine to train a model.

<a name="setup"></a>
Setting things up
=====

In [2]:
import mltoolbox.classification.dnn as sd

No handlers could be found for logger "oauth2client.contrib.multistore_file"


Next, import the remaining libraries. Make sure that TensorFlow (TF) and the structured data package (SD, imported above as `sd`) are both at version 1.0.0.

In [3]:
import os
import tensorflow as tf
from tensorflow.python.lib.io import file_io
import datalab.ml as ml
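
The imports above do not print any version numbers, so here is a minimal sketch of an explicit check. The helper name `require_version` is hypothetical and not part of any library. `tf.__version__` is a real TensorFlow attribute; whether the SD package exposes a comparable attribute is not confirmed by this notebook, so only the TensorFlow call is suggested in the comment.

```python
def require_version(name, actual, expected="1.0.0"):
    """Raise if a package version differs from the one this notebook expects."""
    if actual != expected:
        raise RuntimeError(
            "%s version is %s, but this notebook expects %s"
            % (name, actual, expected))
    return actual


# In the notebook, after the imports above, this could be called as:
#   require_version("tensorflow", tf.__version__)
```

An exact string comparison is deliberately strict, because the text asks for version 1.0.0 specifically.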

This notebook writes files during training. Set the local and cloud root folders you wish to use.

In [4]:
LOCAL_ROOT = './iris_notebook_workspace' # This should be the same as what was used in the local notebook
CLOUD_ROOT = 'gs://' + datalab_project_id() + '-iris-classification-datalab' # Feel free to change this line.

# No need to edit anything else in this cell.
LOCAL_PREPROCESSING_DIR = os.path.join(LOCAL_ROOT, 'preprocessing')
CLOUD_PREPROCESSING_DIR = os.path.join(CLOUD_ROOT, 'preprocessing')

CLOUD_TRAINING_DIR = os.path.join(CLOUD_ROOT, 'cloud_training')

LOCAL_TRAIN_FILE = os.path.join(LOCAL_ROOT, 'train.csv')
CLOUD_TRAIN_FILE = os.path.join(CLOUD_ROOT, 'train.csv')

LOCAL_EVAL_FILE = os.path.join(LOCAL_ROOT, 'eval.csv')
CLOUD_EVAL_FILE = os.path.join(CLOUD_ROOT, 'eval.csv')

LOCAL_SCHEMA_FILE = os.path.join(LOCAL_ROOT, 'schema.json')
CLOUD_SCHEMA_FILE = os.path.join(CLOUD_ROOT, 'schema.json')

LOCAL_FEATURES_FILE = os.path.join(LOCAL_ROOT, 'features.json')
CLOUD_FEATURES_FILE = os.path.join(CLOUD_ROOT, 'features.json')

if not file_io.file_exists(LOCAL_ROOT):
  raise ValueError('LOCAL_ROOT not found. Did you run the local notebook?')
  
!gsutil mb {CLOUD_ROOT}

Creating gs://cloud-ml-dev-iris-classification-datalab/...
ServiceException: 409 Bucket cloud-ml-dev-iris-classification-datalab already exists.
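
The cell above builds its `gs://` paths with `os.path.join`, which uses the host's path separator. That yields valid GCS paths on Linux or macOS but would insert backslashes on Windows. A minimal sketch of a host-independent alternative follows. The helper name `gcs_join` and the bucket name `example-bucket` are hypothetical.

```python
import posixpath


def gcs_join(root, *parts):
    """Join GCS path components with forward slashes on any host OS."""
    return posixpath.join(root, *parts)


# Hypothetical bucket name, used only for illustration.
print(gcs_join("gs://example-bucket", "preprocessing"))
# prints gs://example-bucket/preprocessing
```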


First, copy the CSV files, the schema and features files, and the preprocessing output to GCS.

In [5]:
!gsutil -m cp {LOCAL_TRAIN_FILE} {CLOUD_TRAIN_FILE}
!gsutil -m cp {LOCAL_EVAL_FILE} {CLOUD_EVAL_FILE}
!gsutil -m cp {LOCAL_FEATURES_FILE} {CLOUD_FEATURES_FILE}
!gsutil -m cp {LOCAL_SCHEMA_FILE} {CLOUD_SCHEMA_FILE}
!gsutil -m cp -r {LOCAL_PREPROCESSING_DIR} {CLOUD_PREPROCESSING_DIR}

Copying file://./iris_notebook_workspace/train.csv [Content-Type=text/csv]...
/ [1/1 files][  3.8 KiB/  3.8 KiB] 100% Done                                    
Operation completed over 1 objects/3.8 KiB.                                      
Copying file://./iris_notebook_workspace/eval.csv [Content-Type=text/csv]...
/ [1/1 files][  973.0 B/  973.0 B] 100% Done                                    
Operation completed over 1 objects/973.0 B.                                      
Copying file://./iris_notebook_workspace/features.json [Content-Type=application/json]...
/ [1/1 files][  188.0 B/  188.0 B] 100% Done                                    
Operation completed over 1 objects/188.0 B.                                      
Copying file://./iris_notebook_workspace/schema.json [Content-Type=application/json]...
/ [1/1 files][  341.0 B/  341.0 B] 100% Done                                    
Operation completed over 1 objects/341.0 B.                                      
Copying file://
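
The five copy commands above follow one pattern: each local file or folder is copied to the same relative name under the cloud root. As a sketch of that pattern outside of notebook `!` syntax, the code below builds the equivalent `gsutil` argument lists. The helper names `build_copy_commands` and `run_commands` and the bucket name `example-bucket` are hypothetical. Only the `gsutil` options already used in this notebook (`-m`, `cp`, `-r`) appear.

```python
import subprocess


def build_copy_commands(local_root, cloud_root, files, directories=()):
    """Return one gsutil argument list per file or directory to upload."""
    local_root = local_root.rstrip("/")
    cloud_root = cloud_root.rstrip("/")
    commands = []
    for name in files:
        commands.append(["gsutil", "-m", "cp",
                         local_root + "/" + name, cloud_root + "/" + name])
    for name in directories:
        # -r copies the directory tree recursively.
        commands.append(["gsutil", "-m", "cp", "-r",
                         local_root + "/" + name, cloud_root + "/" + name])
    return commands


def run_commands(commands):
    """Run each command in order, stopping at the first failure."""
    for command in commands:
        subprocess.check_call(command)


commands = build_copy_commands(
    "./iris_notebook_workspace",
    "gs://example-bucket",  # hypothetical bucket name
    files=["train.csv", "eval.csv", "features.json", "schema.json"],
    directories=["preprocessing"],
)
for command in commands:
    print(" ".join(command))
```

Calling `run_commands(commands)` would perform the upload. It is left out here so that the sketch has no side effects.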

In [6]:
!gsutil ls {CLOUD_ROOT}

gs://cloud-ml-dev-iris-classification-datalab/eval.csv
gs://cloud-ml-dev-iris-classification-datalab/features.json
gs://cloud-ml-dev-iris-classification-datalab/schema.json
gs://cloud-ml-dev-iris-classification-datalab/train.csv
gs://cloud-ml-dev-iris-classification-datalab/batch_prediction/
gs://cloud-ml-dev-iris-classification-datalab/cloud_preprocessing/
gs://cloud-ml-dev-iris-classification-datalab/cloud_training/
gs://cloud-ml-dev-iris-classification-datalab/preprocessing/
gs://cloud-ml-dev-iris-classification-datalab/training/


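# Training using the ML Engine

First, remove the output of any previous cloud training run so that the new job writes to an empty folder.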

In [7]:
!gsutil -m rm -r {CLOUD_TRAINING_DIR}

Removing gs://cloud-ml-dev-iris-classification-datalab/cloud_training/staging/sd.tar.gz#1488501008972373...
/ [1/1 objects] 100% Done                                                       
Operation completed over 1 objects.                                              


In [8]:
train_csv = ml.CsvDataSet(
  file_pattern=CLOUD_TRAIN_FILE,
  schema_file=CLOUD_SCHEMA_FILE)
eval_csv = ml.CsvDataSet(
  file_pattern=CLOUD_EVAL_FILE,
  schema_file=CLOUD_SCHEMA_FILE)

In [9]:
ctc = ml.CloudTrainingConfig(
  region='us-central1',
  scale_tier='STANDARD_1'  # See https://cloud.google.com/ml/reference/rest/v1beta1/projects.jobs#ScaleTier
  )

In [10]:
job = sd.train(
  train_dataset=train_csv,
  eval_dataset=eval_csv,
  features=CLOUD_FEATURES_FILE,
  analysis_output_dir=CLOUD_PREPROCESSING_DIR,
  output_dir=CLOUD_TRAINING_DIR,
  max_steps=2000,
  layer_sizes=[5, 3, 2],
  cloud=ctc,
)
job.wait()

/usr/local/lib/python2.7/dist-packages/mltoolbox/_structured_data/_package.pyc
proot /usr/local/lib/python2.7/dist-packages
setup_path /usr/local/lib/python2.7/dist-packages/mltoolbox/_structured_data/master_setup.py
Building package and uploading to gs://cloud-ml-dev-iris-classification-datalab/cloud_training/staging/sd.tar.gz
Job request send. View status of job at
https://console.developers.google.com/ml/jobs?project=cloud-ml-dev


Job structured_data_train_170303_004534 completed

When training is done, {CLOUD_TRAINING_DIR} should contain the folders train, model, evaluation_model, etc.

In [13]:
!gsutil ls {CLOUD_TRAINING_DIR}

gs://cloud-ml-dev-iris-classification-datalab/cloud_training/evaluation_model/
gs://cloud-ml-dev-iris-classification-datalab/cloud_training/model/
gs://cloud-ml-dev-iris-classification-datalab/cloud_training/staging/
gs://cloud-ml-dev-iris-classification-datalab/cloud_training/train/
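
The listing above can also be checked programmatically instead of by eye. The sketch below takes the lines of a `gsutil ls` listing and reports which of the expected folders (`train`, `model`, `evaluation_model`) are missing. The helper name `missing_outputs` is hypothetical. In a notebook, the listing could be captured with IPython's `listing = !gsutil ls {CLOUD_TRAINING_DIR}` syntax. Here the lines from the output above are pasted in directly.

```python
def missing_outputs(listing, training_dir,
                    expected=("train", "model", "evaluation_model")):
    """Return the expected subfolders that are absent from a gsutil listing."""
    prefix = training_dir.rstrip("/") + "/"
    found = set()
    for line in listing:
        line = line.strip()
        if line.startswith(prefix):
            # Keep only the first path component below the training folder.
            found.add(line[len(prefix):].strip("/").split("/")[0])
    return [name for name in expected if name not in found]


training_dir = "gs://cloud-ml-dev-iris-classification-datalab/cloud_training"
listing = [
    training_dir + "/evaluation_model/",
    training_dir + "/model/",
    training_dir + "/staging/",
    training_dir + "/train/",
]
print(missing_outputs(listing, training_dir))
# prints []
```

An empty list means every expected folder is present. A non-empty list names the folders to investigate in the job logs.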


# Cleaning things up

If you want to delete the files you made on GCS, uncomment and run the next cell.

In [14]:
#!gsutil rm -fr {CLOUD_ROOT}