# Challenge Exercise, now with hyperparameter tuning via Cloud ML Engine 

Create a neural network that can predict the volume of a cylinder given the radius of its base (r) and its height (h). Assume that the radius and height of the cylinder are both in the range 0.5 to 2.0. Unlike in the challenge exercise for b_estimator.ipynb, assume that your measurements of r, h and V are all rounded to the nearest 0.1. Simulate the necessary training dataset. This time, you will need a lot more data to get a good predictor.

Now modify the "noise" so that, instead of just rounding the value, there is up to a 10% error (uniformly distributed) in each measurement, followed by rounding.
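
The contents of `generate_cylinders.py` are not shown in this notebook, so the following is only a minimal, dependency-free sketch of how such measurements could be simulated. The function name `simulate_cylinders` and its parameters are hypothetical, and the sketch assumes the error is relative and applied independently to r, h and V before rounding.

```python
import math
import random

def simulate_cylinders(size, max_error=0.10, seed=None):
    """Hypothetical generator: noisy, rounded measurements of r, h and V."""
    rng = random.Random(seed)

    def measure(value):
        # Uniform relative error of up to +/- max_error, then round to the nearest 0.1
        return round(value * rng.uniform(1.0 - max_error, 1.0 + max_error), 1)

    rows = []
    for _ in range(size):
        # True dimensions, drawn uniformly from the stated range 0.5 to 2.0
        radius = rng.uniform(0.5, 2.0)
        height = rng.uniform(0.5, 2.0)
        volume = math.pi * radius ** 2 * height
        rows.append({
            'radius': measure(radius),
            'height': measure(height),
            'volume': measure(volume)})
    return rows

rows = simulate_cylinders(8000, seed=0)
```

Setting `max_error=0.0` reduces this to the first variant of the exercise, where the only noise is the rounding itself.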

## Set up environment

In [17]:
PROJECT = 'cylinders'
BUCKET = 'cylinders'
REGION = 'europe-west1'

In [18]:
# for bash
import os
os.environ['PROJECT'] = PROJECT
os.environ['BUCKET'] = BUCKET
os.environ['REGION'] = REGION
os.environ['TFVERSION'] = '1.8'  # Tensorflow version

In [19]:
%%bash
gcloud config set project $PROJECT
gcloud config set compute/region $REGION

Updated property [core/project].
Updated property [compute/region].


## Generate cylinders and upload to bucket

In [77]:
%%bash
# Generate cylinders locally (used in local test) and upload to bucket (used in hyperparameter tuning job)
python generate_cylinders.py --filename "cylinders_train.csv" --size 8000
python generate_cylinders.py --filename "cylinders_eval.csv" --size 1000
python generate_cylinders.py --filename "cylinders_test.csv" --size 1000
gsutil cp "cylinders_*.csv" "gs://$BUCKET/"
gsutil cp "generate_cylinders.py" "gs://$BUCKET/"

saved 8000 cylinders to cylinders_train.csv
saved 1000 cylinders to cylinders_eval.csv
saved 1000 cylinders to cylinders_test.csv


Copying file://cylinders_test.csv [Content-Type=text/csv]...
Copying file://cylinders_train.csv [Content-Type=text/csv]...
Copying file://cylinders_eval.csv [Content-Type=text/csv]...
/ [3 files][119.6 KiB/119.6 KiB]
Operation completed over 3 objects/119.6 KiB.
Copying file://generate_cylinders.py [Content-Type=text/x-python]...
/ [1 files][  2.8 KiB/  2.8 KiB]
Operation compl

## Create command-line program

To submit a job to Cloud ML Engine, we need to package the training code as a Python module that can be run from the command line. Let's convert our cylinder example to fit that paradigm, using the Estimators API.

In [21]:
%%bash
rm -rf cylinder_prediction_module
mkdir cylinder_prediction_module
mkdir cylinder_prediction_module/trainer
touch cylinder_prediction_module/trainer/__init__.py

### task.py

In [22]:
%%writefile cylinder_prediction_module/trainer/task.py
import argparse
import os
import json
import shutil

from . import model
    
if __name__ == '__main__' and 'get_ipython' not in dir():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--learning_rate',
        type = float, 
        default = 0.01
    )
    parser.add_argument(
        '--batch_size',
        type = int, 
        default = 30
    )
    parser.add_argument(
        '--output_dir',
        help = 'GCS location to write checkpoints and export models.',
        required = True
    )
    parser.add_argument(
        '--job-dir',
        help = 'this model ignores this field, but it is required by gcloud',
        default = 'junk'
    )
    args = parser.parse_args()
    arguments = args.__dict__

    # Unused args provided by service
    arguments.pop('job_dir', None)
    arguments.pop('job-dir', None)

    # Append trial_id to path if we are doing hptuning
    # This code can be removed if you are not using hyperparameter tuning
    arguments['output_dir'] = os.path.join(
        arguments['output_dir'],
        json.loads(
            os.environ.get('TF_CONFIG', '{}')
        ).get('task', {}).get('trial', '')
    )

    # Run the training
    shutil.rmtree(arguments['output_dir'], ignore_errors=True) # start fresh each time

    # Pass the command line arguments to our model's train_and_evaluate function
    model.train_and_evaluate(arguments)

Writing cylinder_prediction_module/trainer/task.py
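
To see what the trial-suffix logic in `task.py` does in isolation, here is a small sketch of the same lookup. The helper name `trial_output_dir` and the example `TF_CONFIG` strings are hypothetical; the sketch assumes, as the code above does, that the trial identifier is found under `task.trial` during a hyperparameter tuning job.

```python
import json
import os

def trial_output_dir(output_dir, tf_config=None):
    """Append the tuning trial id (if any) to output_dir."""
    if tf_config is None:
        # Fall back to the environment variable, defaulting to an empty JSON object
        tf_config = os.environ.get('TF_CONFIG', '{}')
    trial = json.loads(tf_config).get('task', {}).get('trial', '')
    # Joining with an empty string only adds a trailing separator
    return os.path.join(output_dir, trial)

# Local run: no trial key, so the directory is effectively unchanged
local = trial_output_dir('cylinders_trained', '{"task": {}}')

# Tuning run (made-up trial id): each trial writes to its own subdirectory
tuned = trial_output_dir('gs://cylinders/cylinders_trained',
                         '{"task": {"trial": "3"}}')
```

Giving each trial its own subdirectory keeps the checkpoints of different trials from overwriting one another.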


### model.py

In [71]:
%%writefile cylinder_prediction_module/trainer/model.py
from io import BytesIO
import numpy as np
import pandas as pd
import tensorflow as tf

tf.logging.set_verbosity(tf.logging.INFO)

# Generate cylinder data in memory with generate_cylinders.py
# (the CSVs uploaded to the bucket are not read here)
import generate_cylinders
traindf = generate_cylinders.generate_cylinder_df(8000)
evaldf = generate_cylinders.generate_cylinder_df(2000)

# Train and eval input functions.
# Use the df argument rather than the module-level DataFrames,
# and pass only the feature columns as x so the label is not fed in as a feature.
def train_input_fn(df, batch_size):
    return tf.estimator.inputs.pandas_input_fn(
        x = df[['radius', 'height']],
        y = df['volume'],
        num_epochs = None,
        batch_size = batch_size,
        shuffle = True)

def eval_input_fn(df, batch_size):
    return tf.estimator.inputs.pandas_input_fn(
        x = df[['radius', 'height']],
        y = df['volume'],
        num_epochs = 1,
        batch_size = batch_size,
        shuffle = False)

# Define feature columns
features = [
    tf.feature_column.numeric_column(key='radius', dtype=tf.float64),
    tf.feature_column.numeric_column(key='height', dtype=tf.float64)]

def train_and_evaluate(args):
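    # Compute the number of training steps: (steps per epoch) / learning_rate.
    # Thus, if learning_rate = 0.01, we train for about a hundred epochs.
    # max_steps counts whole steps, so truncate to an integer.
    num_steps = int((len(traindf) / args['batch_size']) / args['learning_rate'])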

    # Create custom optimizer
    myopt = tf.train.FtrlOptimizer(learning_rate=args['learning_rate'])

    # Create rest of the estimator as usual
    estimator = tf.estimator.LinearRegressor(
        model_dir = args['output_dir'], 
        feature_columns = features, 
        optimizer = myopt)
    
    # Add RMSE evaluation metric
    def rmse(labels, predictions):
        pred_values = tf.cast(predictions['predictions'], tf.float64)
        return {'rmse': tf.metrics.root_mean_squared_error(labels, pred_values)}

    estimator = tf.contrib.estimator.add_metrics(estimator, rmse)

    train_spec = tf.estimator.TrainSpec(
        input_fn = train_input_fn(df = traindf, batch_size = args['batch_size']),
        max_steps = num_steps)

    eval_spec = tf.estimator.EvalSpec(
        input_fn = eval_input_fn(df = evaldf, batch_size = len(evaldf)),
        steps = None)

    tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)

Overwriting cylinder_prediction_module/trainer/model.py
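
The step budget in `train_and_evaluate` ties the number of epochs to the learning rate: steps per epoch divided by the learning rate. The arithmetic can be checked with a short sketch; the helper names below are hypothetical, and the result is truncated to an integer because `max_steps` counts whole steps.

```python
def compute_max_steps(num_examples, batch_size, learning_rate):
    """Steps per epoch divided by the learning rate, as a whole number of steps."""
    steps_per_epoch = num_examples / batch_size
    return int(steps_per_epoch / learning_rate)

def epochs_covered(num_examples, batch_size, max_steps):
    """How many passes over the data a given number of steps amounts to."""
    return max_steps * batch_size / num_examples

# 8000 training examples, the default batch size of 30 and learning rate of 0.01
steps = compute_max_steps(8000, 30, 0.01)
epochs = epochs_covered(8000, 30, steps)   # roughly 100 epochs
```

A larger learning rate therefore gets a proportionally smaller step budget, so every trial in the tuning job trains for a different number of epochs.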


## Monitoring with TensorBoard
Use "refresh" in TensorBoard during training to see progress.

In [72]:
from google.datalab.ml import TensorBoard
OUTDIR = './cylinders_trained'
TensorBoard().start(OUTDIR)

7645

### Train the model locally to see if everything works

In [73]:
%%bash
rm -rf cylinders_trained
export PYTHONPATH=${PYTHONPATH}:${PWD}/cylinder_prediction_module
gcloud ml-engine local train \
    --module-name=trainer.task \
    --job-dir=cylinders_trained \
    --package-path=$(pwd)/cylinder_prediction_module/trainer \
    -- \
    --batch_size=30 \
    --learning_rate=0.02 \
    --output_dir=cylinders_trained

  from ._conv import register_converters as _register_converters
INFO:tensorflow:TF_CONFIG environment variable: {'cluster': {}, 'environment': 'cloud', 'job': {'args': ['--batch_size=30', '--learning_rate=0.02', '--output_dir=cylinders_trained', '--job-dir', 'cylinders_trained'], 'job_name': 'trainer.task'}, 'task': {}}
INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_save_checkpoints_steps': None, '_num_ps_replicas': 0, '_tf_random_seed': None, '_global_id_in_cluster': 0, '_session_config': None, '_keep_checkpoint_max': 5, '_save_summary_steps': 100, '_train_distribute': None, '_is_chief': True, '_evaluation_master': '', '_task_type': 'worker', '_num_worker_replicas': 1, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f54f5e7bda0>, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_master': '', '_service': None, '_task_id': 0, '_model_dir': 'cylinders_trained/', '_save_checkpoints_secs': 600}
INFO:tensorflow

### Shut TensorBoard down

In [74]:
pids_df = TensorBoard.list()
if not pids_df.empty:
    for pid in pids_df['pid']:
        TensorBoard().stop(pid)
        print('Stopped TensorBoard with pid {}'.format(pid))

Stopped TensorBoard with pid 7608
Stopped TensorBoard with pid 7645


## Create hyperparam.yaml

In [55]:
%%writefile hyperparam.yaml
trainingInput:
    hyperparameters:
        goal: MINIMIZE
        maxTrials: 16
        maxParallelTrials: 2
        hyperparameterMetricTag: rmse
        params:
        - parameterName: batch_size
          type: INTEGER
          minValue: 8
          maxValue: 64
          scaleType: UNIT_LINEAR_SCALE
        - parameterName: learning_rate
          type: DOUBLE
          minValue: 0.01
          maxValue: 0.1
          scaleType: UNIT_LOG_SCALE

Writing hyperparam.yaml
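
The two `scaleType` values control how each parameter range is mapped onto the unit interval before the service searches it. The sketch below only illustrates that mapping as described for these scale types; the function names are hypothetical, and it makes no claim about how the tuning service implements its search.

```python
import math

def unit_linear(value, min_value, max_value):
    """Map [min_value, max_value] onto [0, 1] linearly (UNIT_LINEAR_SCALE)."""
    return (value - min_value) / (max_value - min_value)

def unit_log(value, min_value, max_value):
    """Map a strictly positive range onto [0, 1] logarithmically (UNIT_LOG_SCALE)."""
    return ((math.log(value) - math.log(min_value))
            / (math.log(max_value) - math.log(min_value)))

# batch_size: the arithmetic midpoint of 8..64 sits in the middle of the unit interval
mid_batch = unit_linear(36, 8, 64)

# learning_rate: the geometric midpoint of 0.01..0.1 sits in the middle instead
mid_lr = unit_log(math.sqrt(0.01 * 0.1), 0.01, 0.1)
```

On a log scale, the interval from 0.01 to about 0.032 gets as much of the search space as the interval from about 0.032 to 0.1, which suits parameters such as the learning rate whose useful values span an order of magnitude.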


## Submit the hyperparameter tuning job to Cloud ML Engine

In [78]:
%%bash
OUTDIR=gs://${BUCKET}/cylinders_trained   # CHANGE bucket name appropriately
gsutil rm -rf $OUTDIR
export PYTHONPATH=${PYTHONPATH}:${PWD}/cylinder_prediction_module
gcloud ml-engine jobs submit training cylinders_$(date -u +%y%m%d_%H%M%S) \
    --config=hyperparam.yaml \
    --module-name=trainer.task \
    --package-path=$(pwd)/cylinder_prediction_module/trainer \
    --job-dir=$OUTDIR \
    --runtime-version=$TFVERSION \
    -- \
    --output_dir=$OUTDIR

jobId: cylinders_190314_141023
state: QUEUED


Removing gs://cylinders/cylinders_trained/packages/7ecd38714b6a9e928ec1068e385f7e14c7de6db48d86fa031a728edc445797d0/trainer-0.0.0.tar.gz#1552572021789591...
/ [1 objects]                                                                   
Operation completed over 1 objects.                                              
Job [cylinders_190314_141023] submitted successfully.
Your job is still active. You may view the status of your job with the command

  $ gcloud ml-engine jobs describe cylinders_190314_141023

or continue streaming the logs with the command

  $ gcloud ml-engine jobs stream-logs cylinders_190314_141023


### Monitor the job

In [79]:
!gcloud ml-engine jobs describe cylinders_190314_141023  # Change jobId to what the previous cell printed

createTime: '2019-03-14T14:10:26Z'
etag: YgrWrjIYK_8=
jobId: cylinders_190314_141023
startTime: '2019-03-14T14:10:28Z'
state: RUNNING
trainingInput:
  args:
  - --output_dir=gs://cylinders/cylinders_trained
  hyperparameters:
    goal: MINIMIZE
    hyperparameterMetricTag: rmse
    maxParallelTrials: 2
    maxTrials: 16
    params:
    - maxValue: 64.0
      minValue: 8.0
      parameterName: batch_size
      scaleType: UNIT_LINEAR_SCALE
      type: INTEGER
    - maxValue: 0.1
      minValue: 0.01
      parameterName: learning_rate
      scaleType: UNIT_LOG_SCALE
      type: DOUBLE
  jobDir: gs://cylinders/cylinders_trained
  packageUris:
  - gs://cylinders/cylinders_trained/packages/a7c1b89ca7bf4f1154a0bf147a640e93d0cb6dfcb6ce32daf10858b63358073b/trainer-0.0.0.tar.gz
  pythonModule: trainer.task
  region: europe-west1
  runtimeVersion: '1.8'
trainingOutput:
  isHyperparameterTuningJob: true

View job in the Cloud Console at:
https://console.cloud.google.com/ml/jobs/cylinders_190314_14