# About this notebook

This notebook assumes you have run the local Census Regression notebook and have not deleted the LOCAL_ROOT folder. In this notebook, we will train a TensorFlow model using the Google Cloud Machine Learning Engine training service.

# Setting things up

In [1]:
import mltoolbox.regression.dnn as sd

In [2]:
import os
import tensorflow as tf
from tensorflow.python.lib.io import file_io
import datalab.ml as ml

This notebook writes files during training. Specify the local and cloud root folders you wish to use.

In [3]:
LOCAL_ROOT = './census_regression_workspace' # This should be the same as what was used in the local notebook
CLOUD_ROOT = 'gs://' + datalab_project_id() + '-census-regression-datalab'

# No need to edit anything else in this cell.
LOCAL_PREPROCESSING_DIR = os.path.join(LOCAL_ROOT, 'preprocessing')
CLOUD_PREPROCESSING_DIR = os.path.join(CLOUD_ROOT, 'preprocessing')

CLOUD_TRAINING_DIR = os.path.join(CLOUD_ROOT, 'cloud_training')

LOCAL_TRAIN_FILE = os.path.join(LOCAL_ROOT, 'train.csv')
CLOUD_TRAIN_FILE = os.path.join(CLOUD_ROOT, 'train.csv')

LOCAL_EVAL_FILE = os.path.join(LOCAL_ROOT, 'eval.csv')
CLOUD_EVAL_FILE = os.path.join(CLOUD_ROOT, 'eval.csv')

LOCAL_SCHEMA_FILE = os.path.join(LOCAL_ROOT, 'schema.json')
CLOUD_SCHEMA_FILE = os.path.join(CLOUD_ROOT, 'schema.json')

LOCAL_FEATURES_FILE = os.path.join(LOCAL_ROOT, 'features.json')
CLOUD_FEATURES_FILE = os.path.join(CLOUD_ROOT, 'features.json')

if not file_io.file_exists(LOCAL_ROOT):
  raise ValueError('LOCAL_ROOT not found. Did you run the local notebook?')
  
!gsutil mb {CLOUD_ROOT}

Creating gs://cloud-ml-dev-census-regression-datalab/...
ServiceException: 409 Bucket cloud-ml-dev-census-regression-datalab already exists.
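
The cell above builds `gs://` paths with `os.path.join`, which produces `/`-separated paths on the Linux hosts where Datalab normally runs. As a minimal sketch, assuming you might reuse this code on a platform with a different path separator, `posixpath.join` always joins with `/`. The helper name `gcs_join` and the bucket name below are hypothetical and not part of the original notebook.

```python
import posixpath


def gcs_join(root, *parts):
    """Join GCS path components with '/' regardless of the host OS."""
    return posixpath.join(root, *parts)


# Hypothetical bucket name, for illustration only.
example_root = 'gs://my-project-census-regression-datalab'
example_train_file = gcs_join(example_root, 'train.csv')
example_preprocessing_dir = gcs_join(example_root, 'preprocessing')
print(example_train_file)
print(example_preprocessing_dir)
```

If you only ever run the notebook inside Datalab, the original `os.path.join` calls behave the same way and need no change.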


First, copy the CSV files, the schema and features files, and the output of preprocessing to GCS.

In [4]:
!gsutil -m cp {LOCAL_TRAIN_FILE} {CLOUD_TRAIN_FILE}
!gsutil -m cp {LOCAL_EVAL_FILE} {CLOUD_EVAL_FILE}
!gsutil -m cp {LOCAL_FEATURES_FILE} {CLOUD_FEATURES_FILE}
!gsutil -m cp {LOCAL_SCHEMA_FILE} {CLOUD_SCHEMA_FILE}
!gsutil -m cp -r {LOCAL_PREPROCESSING_DIR} {CLOUD_PREPROCESSING_DIR}

Copying file://./census_regression_workspace/train.csv [Content-Type=text/csv]...
/ [1/1 files][162.9 KiB/162.9 KiB] 100% Done                                    
Operation completed over 1 objects/162.9 KiB.                                    
Copying file://./census_regression_workspace/eval.csv [Content-Type=text/csv]...
/ [1/1 files][ 18.8 KiB/ 18.8 KiB] 100% Done                                    
Operation completed over 1 objects/18.8 KiB.                                     
Copying file://./census_regression_workspace/features.json [Content-Type=application/json]...
/ [1/1 files][  996.0 B/  996.0 B] 100% Done                                    
Operation completed over 1 objects/996.0 B.                                      
Copying file://./census_regression_workspace/schema.json [Content-Type=application/json]...
/ [1/1 files][  998.0 B/  998.0 B] 100% Done                                    
Operation completed over 1 objects/998.0 B.                                      
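
Before starting a training job, it can help to confirm that every input file actually exists at its destination. The following is a minimal sketch, not part of the original notebook: `find_missing` is a hypothetical helper that takes the existence check as a parameter, so it can be tried locally with any function. In the notebook you would presumably pass `file_io.file_exists`, assuming your TensorFlow build can read `gs://` paths.

```python
import os


def find_missing(paths, exists=os.path.exists):
    """Return the paths for which the `exists` check reports False."""
    return [path for path in paths if not exists(path)]


# Stand-alone demonstration with a fake existence check (hypothetical paths).
uploaded = {'gs://example-bucket/train.csv', 'gs://example-bucket/eval.csv'}
wanted = ['gs://example-bucket/train.csv',
          'gs://example-bucket/eval.csv',
          'gs://example-bucket/schema.json']
missing = find_missing(wanted, exists=lambda path: path in uploaded)
print(missing)

# In the notebook (assumption: file_io handles gs:// paths in your setup):
#   missing = find_missing(
#       [CLOUD_TRAIN_FILE, CLOUD_EVAL_FILE,
#        CLOUD_SCHEMA_FILE, CLOUD_FEATURES_FILE],
#       exists=file_io.file_exists)
#   if missing:
#       raise ValueError('Not found on GCS: ' + ', '.join(missing))
```

Passing the check as an argument keeps the helper independent of TensorFlow, so the same function works for local and cloud paths.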

# Training using the ML Engine

In [5]:
# Remove the output of any previous cloud training run.
!gsutil -m rm -r {CLOUD_TRAINING_DIR}

Removing gs://cloud-ml-dev-census-regression-datalab/cloud_training/evaluation_model/#1488560966515337...
Removing gs://cloud-ml-dev-census-regression-datalab/cloud_training/evaluation_model/assets.extra/#1488560968322702...
Removing gs://cloud-ml-dev-census-regression-datalab/cloud_training/evaluation_model/assets.extra/schema.json#1488560969263660...
Removing gs://cloud-ml-dev-census-regression-datalab/cloud_training/evaluation_model/assets.extra/transforms.json#1488560970156747...
Removing gs://cloud-ml-dev-census-regression-datalab/cloud_training/evaluation_model/saved_model.pb#1488560967430957...
Removing gs://cloud-ml-dev-census-regression-datalab/cloud_training/evaluation_model/variables/#1488560971530326...
Removing gs://cloud-ml-dev-census-regression-datalab/cloud_training/evaluation_model/variables/variables.data-00000-of-00001#1488560972354468...
Removing gs://cloud-ml-dev-census-regression-datalab/cloud_training/evaluation_model/variables/variables.index#1488560973040578...

In [6]:
train_csv = ml.CsvDataSet(
  file_pattern=CLOUD_TRAIN_FILE,
  schema_file=CLOUD_SCHEMA_FILE)
eval_csv = ml.CsvDataSet(
  file_pattern=CLOUD_EVAL_FILE,
  schema_file=CLOUD_SCHEMA_FILE)

In [7]:
ctc = ml.CloudTrainingConfig(
  region='us-central1',
  scale_tier='STANDARD_1'  # See https://cloud.google.com/ml/reference/rest/v1beta1/projects.jobs#ScaleTier
  )

In [9]:
job = sd.train(
  cloud=ctc,
  train_dataset=train_csv,
  eval_dataset=eval_csv,
  features=CLOUD_FEATURES_FILE,
  analysis_output_dir=CLOUD_PREPROCESSING_DIR,
  output_dir=CLOUD_TRAINING_DIR,
  max_steps=2000,
  layer_sizes=[5, 5, 5],
)
job.wait()

Building package and uploading to gs://cloud-ml-dev-census-regression-datalab/cloud_training/staging/sd.tar.gz
Job request send. View status of job at
https://console.developers.google.com/ml/jobs?project=cloud-ml-dev


Job structured_data_train_170303_194419 completed

When training is done, CLOUD_TRAINING_DIR should contain the folders train, model, evaluation_model, etc.

In [10]:
!gsutil ls  {CLOUD_TRAINING_DIR}

gs://cloud-ml-dev-census-regression-datalab/cloud_training/evaluation_model/
gs://cloud-ml-dev-census-regression-datalab/cloud_training/model/
gs://cloud-ml-dev-census-regression-datalab/cloud_training/staging/
gs://cloud-ml-dev-census-regression-datalab/cloud_training/train/
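
The listing can also be checked programmatically rather than by eye. This is a minimal sketch under stated assumptions: `missing_subdirs` and `EXPECTED_SUBDIRS` are hypothetical names, and the expected folders are only the three named in the text above (train, model, evaluation_model). The sample listing is copied from the output of the previous cell.

```python
import posixpath

# Folder names the text says should appear after training (assumption:
# only these three are required; the job may write others as well).
EXPECTED_SUBDIRS = ('train', 'model', 'evaluation_model')


def missing_subdirs(listing, expected=EXPECTED_SUBDIRS):
    """Return expected folder names absent from a `gsutil ls`-style listing."""
    present = {posixpath.basename(line.strip().rstrip('/'))
               for line in listing if line.strip()}
    return [name for name in expected if name not in present]


listing = [
    'gs://cloud-ml-dev-census-regression-datalab/cloud_training/evaluation_model/',
    'gs://cloud-ml-dev-census-regression-datalab/cloud_training/model/',
    'gs://cloud-ml-dev-census-regression-datalab/cloud_training/staging/',
    'gs://cloud-ml-dev-census-regression-datalab/cloud_training/train/',
]
print(missing_subdirs(listing))  # an empty list means nothing is missing
```

In an IPython-based notebook, `listing = !gsutil ls {CLOUD_TRAINING_DIR}` should capture the command output as a list of lines that can be passed to the helper directly.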


# Cleaning things up

If you want to delete the files you made on GCS, uncomment and run the next cell.

In [None]:
#!gsutil rm -fr {CLOUD_ROOT}