# Submitting Training Jobs to Google Cloud ML Engine and Serving the Model for Predictions

This notebook walks step by step through creating, training and deploying a custom estimator model on [Google Cloud ML Engine](https://cloud.google.com/ml-engine/).

# Submitting Training Jobs
## Part 0 --> Getting the data
The dataset used here is Kaggle's House Prices: Advanced Regression Techniques dataset. You can download it from [here](https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data).

## Have a look at the data and the preprocessing steps
The data file is currently on my local disk, so I'll use it to demonstrate the steps. The same steps are reused later when creating the Python package.

In [46]:
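import pandas as pd
from sklearn.preprocessing import StandardScaler, PowerTransformer
path = 'kaggle_housing_prices.csv'

# Load the data, drop the Id column, fill missing values and one-hot encode categoricals.
df_train = pd.read_csv(path)
df_train = df_train[df_train.columns.difference(['Id'])]
df_train = df_train.fillna(0)
df_train = pd.get_dummies(df_train)

# Split into the feature matrix and the target column.
X = df_train[df_train.columns.difference(['SalePrice'])].values
y = df_train[['SalePrice']].values

# Note: pt_X is created here but not applied below.
pt_X = PowerTransformer(method='yeo-johnson', standardize=False)
sc_y = StandardScaler()
sc_X = StandardScaler()
# Standardize the target and the features to zero mean and unit variance.
y = sc_y.fit_transform(y)
X = sc_X.fit_transform(X)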

print(X[:5], X.shape)
print()
print(y[:5], y.shape)

[[-0.79343379  1.16185159 -0.11633929 ...  1.05099379  0.87866809
   0.13877749]
 [ 0.25714043 -0.79516323 -0.11633929 ...  0.15673371 -0.42957697
  -0.61443862]
 [-0.62782603  1.18935062 -0.11633929 ...  0.9847523   0.83021457
   0.13877749]
 [-0.52173356  0.93727612 -0.11633929 ... -1.86363165 -0.72029809
  -1.36765473]
 [-0.04561126  1.61787729 -0.11633929 ...  0.95163156  0.73330753
   0.13877749]] (1460, 304)

[[ 0.34727322]
 [ 0.00728832]
 [ 0.53615372]
 [-0.51528106]
 [ 0.8698426 ]] (1460, 1)
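Because `pd.get_dummies` creates one column per category it sees, a new row scored later has to be expanded to exactly the same 304 columns, in the same order, before `sc_X.transform` can be applied to it. A minimal sketch, assuming a hypothetical helper `align_to_training_columns` and a toy DataFrame standing in for the Kaggle data:

```python
import pandas as pd


def align_to_training_columns(df_new, train_columns):
    """Hypothetical helper: one-hot encode df_new and match the training column layout."""
    encoded = pd.get_dummies(df_new.fillna(0))
    # Add dummy columns missing from the new data (filled with 0) and drop unseen ones.
    return encoded.reindex(columns=train_columns, fill_value=0)


# Toy stand-in for the training data: one numeric and one categorical column.
train = pd.DataFrame({'LotArea': [8450, 9600, 11250], 'Street': ['Pave', 'Grvl', 'Pave']})
train_encoded = pd.get_dummies(train)
# columns.difference() in the cell above also returns the labels in sorted order.
train_columns = train_encoded.columns.sort_values()

new_row = pd.DataFrame({'LotArea': [10000], 'Street': ['Grvl']})
aligned = align_to_training_columns(new_row, train_columns)
print(list(aligned.columns))
```

The `reindex` call is what guarantees the layout: a category that never appeared in training is dropped, and one that is absent from the new row becomes a zero column.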




## Part 1 --> Creating The Python Package
The file structure of the Python package looks like this:

In [2]:
!tree trainer/

trainer/
├── __init__.py
├── model.py
└── task.py

0 directories, 3 files


1. `model.py` contains the functions defining the model architecture, the input function and the serving input function. (Mandatory)
2. `task.py` is where you define the hyperparameters, estimator specs, data preprocessing steps and all other functions required to create the dataset. Keeping this logic separate from `model.py` is optional, but it keeps the code clean and manageable. (Optional)
3. `__init__.py` marks `trainer/` as a Python package; it can be empty.

## model.py
This file needs to define at least three functions:
1. `model_fn()` or `keras_estimator()`. This example uses `keras_estimator()`, which returns an Estimator instance built from the compiled Keras model.
2. `input_fn()`, which feeds input data to the model. Here it returns a dataset iterator.
3. `serving_input_fn()`, which defines the features passed to the model at inference time, for example by returning a `tf.estimator.export.TensorServingInputReceiver` object.
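In TensorFlow, an input function like item 2 would normally build a `tf.data.Dataset` that shuffles, batches and repeats the arrays. The framework-free sketch below shows the same logic with NumPy only, to make the behaviour concrete; `batch_iterator` and its parameters are hypothetical names, not part of the tutorial's `model.py`:

```python
import numpy as np


def batch_iterator(X, y, batch_size=32, num_epochs=1, shuffle=True, seed=0):
    """Yield (features, labels) mini-batches: reshuffle every epoch, then batch."""
    rng = np.random.RandomState(seed)
    n = X.shape[0]
    for _ in range(num_epochs):
        # A new row order at the start of each epoch.
        order = rng.permutation(n) if shuffle else np.arange(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            # Index features and labels with the same rows so pairs stay aligned.
            yield X[idx], y[idx]


X_demo = np.arange(20, dtype=np.float32).reshape(10, 2)
y_demo = np.arange(10, dtype=np.float32).reshape(10, 1)
batches = list(batch_iterator(X_demo, y_demo, batch_size=4, num_epochs=2))
print(len(batches), [xb.shape[0] for xb, _ in batches])
```

With 10 rows and a batch size of 4, each epoch yields batches of 4, 4 and 2 rows, so two epochs give six batches.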

In [3]:
%cat trainer/model.py

import tensorflow as tf
from tensorflow.keras import models
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam
tf.logging.set_verbosity(tf.logging.INFO)

def keras_estimator(model_dir, config, params):
    # Fully connected regression network; the input width comes from params.
    model = models.Sequential()
    model.add(Dense(units=480, kernel_initializer='random_uniform', activation='relu',
                    input_shape=(params['num_features'],)))
    model.add(Dense(units=480, kernel_initializer='random_uniform', activation='relu'))
    model.add(Dense(units=10, kernel_initializer='random_uniform', activation='relu'))
    # Single-unit output. Note: 'elu' is bounded below by -1, so the model cannot
    # predict a standardized price lower than -1.
    model.add(Dense(units=1, kernel_initializer='random_uniform', activation='elu'))
    optimizer = Adam(lr=0.0015, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False)
    # Note: Keras' MSLE clips y_true and y_pred to a small epsilon before taking log(x + 1),
    # so all negative (below-mean) standardized values are treated as the same value.
    model.compile(optimizer=optimizer, loss='mean_squared_logarithmic_error', metrics=['mse'])
    # Wrap the compiled Keras model in a tf.estimator.Estimator.
    return tf.keras.estimator.model_to_estimator(keras_model=model, model_dir=model_di
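Two details of this model help explain numbers that appear later in the notebook. The sketch below re-implements, in plain Python, the ELU activation (Keras default `alpha=1.0`) and the formula Keras uses for `mean_squared_logarithmic_error` (both inputs clipped to `epsilon=1e-7` before `log(x + 1)`). It illustrates the math only and is not TensorFlow code:

```python
import math

EPSILON = 1e-7  # Keras' default K.epsilon()


def elu(x, alpha=1.0):
    """ELU activation: x for x > 0, alpha * (exp(x) - 1) otherwise."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def msle(y_true, y_pred):
    """Mean squared logarithmic error with Keras-style clipping at EPSILON."""
    terms = [
        (math.log(max(p, EPSILON) + 1.0) - math.log(max(t, EPSILON) + 1.0)) ** 2
        for t, p in zip(y_true, y_pred)
    ]
    return sum(terms) / len(terms)


# ELU output approaches, but never goes below, -alpha.
print(elu(-2.0), elu(-20.0))
# Negative targets and negative predictions both clip to EPSILON,
# so predicting -1.0 for a standardized target of -0.5 costs nothing.
print(msle([-0.5], [-1.0]))
```

In float32, `elu(-20.0)` rounds to exactly `-1.0`, which is consistent with the `[-1.0]` prediction returned by the deployed model later on.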

## task.py
1. Reads and parses the model parameters, such as the location of the training data and of the output model, the number of hidden layers, the batch size, etc.
2. Loads the data from the specified location and applies the preprocessing logic.
3. Calls the model training logic in model.py with those parameters.
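Step 1 might look like the following minimal sketch. The `--train-file` and `--job-dir` flags match the arguments shown in the training logs; `--batch-size`, `--num-epochs`, `--test-split` and their defaults are assumptions for illustration:

```python
import argparse


def get_args(argv=None):
    """Parse trainer arguments (flags other than --train-file/--job-dir are hypothetical)."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--train-file', required=True,
                        help='Local path or gs:// URI of the training CSV')
    parser.add_argument('--job-dir', required=True,
                        help='Directory (local or gs://) for checkpoints and exports')
    parser.add_argument('--batch-size', type=int, default=32)
    parser.add_argument('--num-epochs', type=int, default=10)
    parser.add_argument('--test-split', type=float, default=0.2)
    return parser.parse_args(argv)


# The same arguments gcloud passed to trainer.task in the local run.
args = get_args(['--train-file=kaggle_housing_prices.csv',
                 '--job-dir=house_pricing_training_job/'])
print(args.train_file, args.job_dir, args.batch_size)
```

`argparse` turns dashes into underscores, so `--train-file` is read back as `args.train_file`.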

In [4]:
%cat trainer/task.py

import argparse
from . import model
import numpy as np
import tensorflow as tf
import os
import subprocess
from sklearn.preprocessing import PowerTransformer, StandardScaler
import pandas as pd


WORKING_DIR = os.getcwd()
TEMP_DIR = 'tmp/'
DATA_FILE_NAME = 'kaggle_housing_prices.csv'


def download_files_from_gcs(source, destination):
    """Copy a file from GCS into the working directory with gsutil; return the local path(s)."""
    local_file_names = [destination]
    gcs_input_paths = [source]

    raw_local_files_data_paths = [os.path.join(WORKING_DIR, local_file_name) for local_file_name in local_file_names]
    for i, gcs_input_path in enumerate(gcs_input_paths):
        if gcs_input_path:
            # Requires the gsutil command-line tool on the machine running the trainer.
            subprocess.check_call(['gsutil', 'cp', gcs_input_path, raw_local_files_data_paths[i]])

    return raw_local_files_data_paths
    

def load_data(path='kaggle_housing_prices.csv', test_split=0.2, seed=113):
    assert 0 <= test_split < 1
    if not path:
        raise ValueError('No dataset file defined')

    if path.startswith('gs://'):
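`load_data` takes `test_split` and `seed` arguments, but the rest of its body is cut off in the output above. One way such a reproducible split could be done, sketched with NumPy (`split_train_test` is a hypothetical helper, not the author's code):

```python
import numpy as np


def split_train_test(X, y, test_split=0.2, seed=113):
    """Shuffle rows reproducibly and hold out a `test_split` fraction for evaluation."""
    assert 0 <= test_split < 1
    rng = np.random.RandomState(seed)
    order = rng.permutation(len(X))
    n_train = int(len(X) * (1 - test_split))
    train_idx, test_idx = order[:n_train], order[n_train:]
    return (X[train_idx], y[train_idx]), (X[test_idx], y[test_idx])


X_demo = np.arange(30).reshape(10, 3)
y_demo = np.arange(10).reshape(10, 1)
(X_tr, y_tr), (X_te, y_te) = split_train_test(X_demo, y_demo)
print(X_tr.shape, X_te.shape)
```

Using a fixed `seed` means the local run and the cloud run evaluate on the same held-out rows.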

## setup.py
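There is one more file you need if your trainer depends on Python packages that are not already installed in the Cloud ML Engine runtime, or needs a different version of one: `setup.py`, placed next to the `trainer/` directory. In our case we need a specific version of scikit-learn (`PowerTransformer`, imported in task.py, was introduced in scikit-learn 0.20), so we declare it in `setup.py`. A `setup.py` configuration file looks like this: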

In [24]:
%cat setup.py

from setuptools import find_packages
from setuptools import setup

REQUIRED_PACKAGES = [
  'scikit-learn==0.20.1',
]

setup(
    name='trainer',
    version='0.1',
    install_requires=REQUIRED_PACKAGES,
    packages=find_packages(),
    include_package_data=True,
    requires=[]
)


## Part 2 --> Training
You have two options for training the model: train it locally, or submit a training job to Cloud ML Engine. I'll demonstrate both.

### 1. Training locally

In [14]:
# Define environment variables for the local training job
%env JOB_DIR house_pricing_training_job/
# Remove the output of any previous run
!rm -rf $JOB_DIR
%env TRAIN_FILE kaggle_housing_prices.csv

env: JOB_DIR=house_pricing_training_job/
env: TRAIN_FILE=kaggle_housing_prices.csv


In [15]:
# Start the local training job
!gcloud ml-engine local train --module-name=trainer.task \
 --package-path=trainer \
 --train-file=$TRAIN_FILE \
 --job-dir=$JOB_DIR

INFO:tensorflow:TF_CONFIG environment variable: {'environment': 'cloud', 'cluster': {}, 'job': {'args': ['--train-file=kaggle_housing_prices.csv', '--job-dir=house_pricing_training_job/'], 'job_name': 'trainer.task'}, 'task': {}}
INFO:tensorflow:Using the Keras model provided.
2018-12-27 15:12:17.641337: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-12-27 15:12:17.641633: I tensorflow/core/common_runtime/process_util.cc:69] Creating new thread pool with default inter op setting: 4. Tune using inter_op_parallelism_threads for best performance.
INFO:tensorflow:Using config: {'_model_dir': 'house_pricing_training_job/', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 500, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_che

INFO:tensorflow:SavedModel written to: house_pricing_training_job/export/exporter/temp-b'1545903759'/saved_model.pb
INFO:tensorflow:global_step/sec: 26.2791
INFO:tensorflow:loss = 0.00061364105, step = 501 (3.806 sec)
INFO:tensorflow:global_step/sec: 44.0313
INFO:tensorflow:loss = 0.00054410036, step = 601 (2.271 sec)
INFO:tensorflow:global_step/sec: 45.518
INFO:tensorflow:loss = 0.00043586182, step = 701 (2.197 sec)
INFO:tensorflow:global_step/sec: 46.0551
INFO:tensorflow:loss = 0.00010561759, step = 801 (2.171 sec)
INFO:tensorflow:global_step/sec: 46.5945
INFO:tensorflow:loss = 0.00031923706, step = 901 (2.146 sec)
INFO:tensorflow:Saving checkpoints for 1000 into house_pricing_training_job/model.ckpt.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2018-12-27-09:42:51
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from house_pricing_training_job/model.ckpt-1000
INFO:tensorflow:Running local_ini

INFO:tensorflow:Saving checkpoints for 3000 into house_pricing_training_job/model.ckpt.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2018-12-27-09:43:41
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from house_pricing_training_job/model.ckpt-3000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Finished evaluation at 2018-12-27-09:43:41
INFO:tensorflow:Saving dict for global step 3000: global_step = 3000, loss = 0.06754769, mse = 1.0110337
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 3000: house_pricing_training_job/model.ckpt-3000
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Signatures INCLUDED in export for Classify: None
INFO:tensorflow:Signatures INCLUDED in export for Regress: None
INFO:tensorflow:Signatures INCLUDED in export for Predict: ['serving_default']
INFO:tensorflow:

INFO:tensorflow:Restoring parameters from house_pricing_training_job/model.ckpt-4563
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:No assets to write.
INFO:tensorflow:SavedModel written to: house_pricing_training_job/export/exporter/temp-b'1545903862'/saved_model.pb
INFO:tensorflow:Loss for final step: 9.165338e-05.
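The first line of the log prints the `TF_CONFIG` environment variable, a JSON document describing the cluster, the current task and the job. gcloud sets it for local runs, as shown, and Cloud ML Engine sets it on each training replica. A minimal sketch of reading it from trainer code; the `'master'` fallback for an empty `task` entry (as in the local log) is an assumption:

```python
import json
import os


def get_task_info(environ=None):
    """Return (task_type, task_index) from TF_CONFIG; default to a single 'master' task."""
    environ = os.environ if environ is None else environ
    tf_config = json.loads(environ.get('TF_CONFIG', '{}'))
    task = tf_config.get('task') or {}
    return task.get('type', 'master'), task.get('index', 0)


# The local run above printed a TF_CONFIG whose 'task' entry is empty.
local_env = {'TF_CONFIG': json.dumps({'environment': 'cloud', 'cluster': {}, 'task': {}})}
print(get_task_info(local_env))

# A hypothetical worker replica in a distributed job.
worker_env = {'TF_CONFIG': json.dumps({'cluster': {'master': ['127.0.0.1:2222']},
                                       'task': {'type': 'worker', 'index': 1}})}
print(get_task_info(worker_env))
```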


While the job runs, it writes its output to the `house_pricing_training_job` directory, which we defined earlier as `JOB_DIR`. The directory contains the model checkpoints, the graph definition and, under `JOB_DIR/export/exporter/<timestamp>/`, the exported TensorFlow SavedModel.
To check the model's signature definition, navigate to that directory and run `saved_model_cli`, which is installed along with the TensorFlow pip package. This [blog post](https://medium.com/@yuu.ishikawa/how-to-show-signatures-of-tensorflow-saved-model-5ac56cf1960f) shows how to use it to display a SavedModel's signatures.

The cell below does exactly that.

In [25]:
# Note: %cd changes the notebook's working directory for later cells; run %cd - to return.
%cd house_pricing_training_job/export/exporter/1545903862/
!saved_model_cli show --dir ./ --all

/Users/monarkunadkat/Desktop/Searce/cloudml-samples/boston/tensorflow/keras/house_pricing_training_job/export/exporter/1545903862

MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['input'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 304)
        name: Placeholder:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['dense_3'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 1)
        name: dense_3/Elu:0
  Method name is: tensorflow/serving/predict
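The exporter writes each SavedModel into a directory named after its Unix timestamp (using `temp-...` directories while writing, as the training log shows), which is why the cell above hard-codes `1545903862`. A small stdlib helper can pick the newest complete export instead; `latest_export_dir` is a hypothetical name:

```python
import os
import tempfile


def latest_export_dir(export_base):
    """Return the path of the newest timestamp-named subdirectory of export_base."""
    candidates = [
        name for name in os.listdir(export_base)
        if name.isdigit() and os.path.isdir(os.path.join(export_base, name))
    ]
    if not candidates:
        raise FileNotFoundError('No SavedModel export found in %s' % export_base)
    # Compare numerically so the most recent timestamp wins.
    return os.path.join(export_base, max(candidates, key=int))


# Demo with a throwaway tree that mimics JOB_DIR/export/exporter/.
with tempfile.TemporaryDirectory() as base:
    for name in ['1545903759', '1545903862', "temp-b'1545903900'"]:
        os.makedirs(os.path.join(base, name))
    newest = os.path.basename(latest_export_dir(base))
print(newest)
```

`os.listdir` only works on local paths; for a `gs://` export directory, something like `gsutil ls` or TensorFlow's `tf.gfile.ListDirectory` would be needed instead.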


### 2. Training on Google Cloud ML-Engine

You can also submit the training job to Cloud ML Engine. To do that, first upload the training file to a Google Cloud Storage (GCS) bucket:

In [16]:
!gsutil cp kaggle_housing_prices.csv gs://cloud-ml-data-storage-bucket

Copying file://kaggle_housing_prices.csv [Content-Type=text/csv]...
\ [1 files][449.9 KiB/449.9 KiB]                                                
Operation completed over 1 objects/449.9 KiB.                                    


In [21]:
# Define environment variables for the cloud training job
%env JOB_NAME housing_job_3
%env GCS_JOB_DIR gs://cloud-ml-job-bucket
%env TRAIN_FILE gs://cloud-ml-data-storage-bucket/kaggle_housing_prices.csv
%env REGION us-central1

env: JOB_NAME=housing_job_3
env: GCS_JOB_DIR=gs://cloud-ml-job-bucket
env: TRAIN_FILE=gs://cloud-ml-data-storage-bucket/kaggle_housing_prices.csv
env: REGION=us-central1
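Cloud ML Engine job names must be unique within a project, which is why this run is called `housing_job_3`. A job name may contain only letters, digits and underscores, and must start with a letter. A minimal sketch that appends a UTC timestamp instead of a manually incremented counter (`make_job_name` is hypothetical):

```python
import re
from datetime import datetime, timezone

JOB_NAME_PATTERN = re.compile(r'^[A-Za-z][A-Za-z0-9_]*$')


def make_job_name(prefix='housing_job', now=None):
    """Build a name like housing_job_20181227_100506 from a prefix and a UTC time."""
    now = now or datetime.now(timezone.utc)
    name = '%s_%s' % (prefix, now.strftime('%Y%m%d_%H%M%S'))
    if not JOB_NAME_PATTERN.match(name):
        raise ValueError('Invalid job name: %s' % name)
    return name


print(make_job_name(now=datetime(2018, 12, 27, 10, 5, 6, tzinfo=timezone.utc)))
```

In the notebook, assigning `os.environ['JOB_NAME'] = make_job_name()` before the submit cell has the same effect as the `%env JOB_NAME` line.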


As soon as you run the command below, a training job is submitted to Cloud ML Engine. Because of `--stream-logs`, the logs are printed in the notebook; you can also view them in the Google Cloud Console.

In [22]:
# Submit the training job to Google Cloud ML Engine. Arguments after the bare `--` are passed to trainer.task.
!gcloud ml-engine jobs submit training $JOB_NAME \
 --stream-logs \
 --runtime-version 1.10 \
 --job-dir=$GCS_JOB_DIR \
 --package-path=trainer \
 --module-name trainer.task \
 --region $REGION -- \
 --train-file=$TRAIN_FILE

Job [housing_job_3] submitted successfully.
INFO	2018-12-27 15:35:06 +0530	service		Validating job requirements...
INFO	2018-12-27 15:35:06 +0530	service		Job creation request has been successfully validated.
INFO	2018-12-27 15:35:07 +0530	service		Job housing_job_3 is queued.
INFO	2018-12-27 15:35:07 +0530	service		Waiting for job to be provisioned.
INFO	2018-12-27 15:35:13 +0530	service		Waiting for training program to start.
INFO	2018-12-27 15:36:05 +0530	master-replica-0		Running task with arguments: --cluster={"master": ["127.0.0.1:2222"]} --task={"type": "master", "index": 0} --job={  "package_uris": ["gs://cloud-ml-job-bucket/packages/d1c80a5dcb380377a509f9f22b9343fb946a7860a34a4ab508c59738a85d2dec/trainer-0.1.tar.gz"],  "python_module": "trainer.task",  "args": ["--train-file\u003dgs://cloud-ml-data-storage-bucket/kaggle_housing_prices.csv"],  "region": "us-central1",  "runtime_version": "1.10",  "job_dir": "gs://cloud-ml-job-bucket/",  "run_on_raw_vm": true}
INFO	2018-12-27 15

INFO	2018-12-27 15:37:01 +0530	master-replica-0		Finished evaluation at 2018-12-27-10:07:01
INFO	2018-12-27 15:37:01 +0530	master-replica-0		Saving dict for global step 500: global_step = 500, loss = 0.1984755, mse = 1.9340057
INFO	2018-12-27 15:37:02 +0530	master-replica-0		Saving 'checkpoint_path' summary for global step 500: gs://cloud-ml-job-bucket/model.ckpt-500
INFO	2018-12-27 15:37:04 +0530	master-replica-0		Calling model_fn.
INFO	2018-12-27 15:37:04 +0530	master-replica-0		Done calling model_fn.
INFO	2018-12-27 15:37:04 +0530	master-replica-0		Signatures INCLUDED in export for Eval: None
INFO	2018-12-27 15:37:04 +0530	master-replica-0		Signatures INCLUDED in export for Classify: None
INFO	2018-12-27 15:37:04 +0530	master-replica-0		Signatures INCLUDED in export for Regress: None
INFO	2018-12-27 15:37:04 +0530	master-replica-0		Signatures INCLUDED in export for Predict: ['serving_default']
INFO	2018-12-27 15:37:04 +0530	master-replica-0		Signatures INCLUDED in export for Train: 

INFO	2018-12-27 15:37:53 +0530	master-replica-0		Calling model_fn.
INFO	2018-12-27 15:37:53 +0530	master-replica-0		Done calling model_fn.
INFO	2018-12-27 15:37:53 +0530	master-replica-0		Starting evaluation at 2018-12-27-10:07:53
INFO	2018-12-27 15:37:54 +0530	master-replica-0		Graph was finalized.
INFO	2018-12-27 15:37:54 +0530	master-replica-0		Restoring parameters from gs://cloud-ml-job-bucket/model.ckpt-2000
INFO	2018-12-27 15:37:54 +0530	master-replica-0		Running local_init_op.
INFO	2018-12-27 15:37:54 +0530	master-replica-0		Done running local_init_op.
INFO	2018-12-27 15:37:54 +0530	master-replica-0		Finished evaluation at 2018-12-27-10:07:54
INFO	2018-12-27 15:37:54 +0530	master-replica-0		Saving dict for global step 2000: global_step = 2000, loss = 0.1984755, mse = 1.9340057
INFO	2018-12-27 15:37:55 +0530	master-replica-0		Saving 'checkpoint_path' summary for global step 2000: gs://cloud-ml-job-bucket/model.ckpt-2000
INFO	2018-12-27 15:37:55 +0530	master-replica-0		Calling mod

INFO	2018-12-27 15:38:37 +0530	master-replica-0		global_step/sec: 88.0816
INFO	2018-12-27 15:38:37 +0530	master-replica-0		loss = 0.25695705, step = 3301 (1.135 sec)
INFO	2018-12-27 15:38:38 +0530	master-replica-0		global_step/sec: 77.434
INFO	2018-12-27 15:38:38 +0530	master-replica-0		loss = 0.29215527, step = 3401 (1.292 sec)
INFO	2018-12-27 15:38:40 +0530	master-replica-0		Saving checkpoints for 3500 into gs://cloud-ml-job-bucket/model.ckpt.
INFO	2018-12-27 15:38:44 +0530	master-replica-0		Calling model_fn.
INFO	2018-12-27 15:38:44 +0530	master-replica-0		Done calling model_fn.
INFO	2018-12-27 15:38:44 +0530	master-replica-0		Starting evaluation at 2018-12-27-10:08:44
INFO	2018-12-27 15:38:44 +0530	master-replica-0		Graph was finalized.
INFO	2018-12-27 15:38:44 +0530	master-replica-0		Restoring parameters from gs://cloud-ml-job-bucket/model.ckpt-3500
INFO	2018-12-27 15:38:45 +0530	master-replica-0		Running local_init_op.
INFO	2018-12-27 15:38:45 +0530	master-replica-0		Done running

INFO	2018-12-27 15:39:30 +0530	master-replica-0		Running local_init_op.
INFO	2018-12-27 15:39:30 +0530	master-replica-0		Done running local_init_op.
INFO	2018-12-27 15:39:30 +0530	master-replica-0		Finished evaluation at 2018-12-27-10:09:30
INFO	2018-12-27 15:39:30 +0530	master-replica-0		Saving dict for global step 4562: global_step = 4562, loss = 0.1984755, mse = 1.9340057
INFO	2018-12-27 15:39:30 +0530	master-replica-0		Saving 'checkpoint_path' summary for global step 4562: gs://cloud-ml-job-bucket/model.ckpt-4562
INFO	2018-12-27 15:39:31 +0530	master-replica-0		Calling model_fn.
INFO	2018-12-27 15:39:31 +0530	master-replica-0		Done calling model_fn.
INFO	2018-12-27 15:39:31 +0530	master-replica-0		Signatures INCLUDED in export for Eval: None
INFO	2018-12-27 15:39:31 +0530	master-replica-0		Signatures INCLUDED in export for Classify: None
INFO	2018-12-27 15:39:31 +0530	master-replica-0		Signatures INCLUDED in export for Regress: None
INFO	2018-12-27 15:39:31 +0530	master-replica-0		

After the job finishes, the job directory in the bucket (`GCS_JOB_DIR`, here `gs://cloud-ml-job-bucket`) contains the same files and folders that were created during the local run: checkpoints such as `model.ckpt-4562` and the exported SavedModel under `export/exporter/`.

## Part 3 --> Deploying the trained model

### Step 1 --> Creating the model

In [28]:
%env MODEL_NAME kaggle_housing_price_prediction
# MODEL_PATH is the timestamped SavedModel directory exported by the cloud training job.
%env MODEL_PATH gs://cloud-ml-job-bucket/export/exporter/1545905371

# Create the model resource, which will hold one or more versions.
!gcloud ml-engine models create $MODEL_NAME

env: MODEL_NAME=kaggle_housing_price_prediction
env: MODEL_PATH=gs://cloud-ml-job-bucket/export/exporter/1545905371
Created ml engine model [projects/searce-sandbox/models/kaggle_housing_price_prediction].


### Step 2 --> Creating a version of the model you just created

In [29]:
!gcloud ml-engine versions create "version_1" --model $MODEL_NAME --origin $MODEL_PATH

Creating version (this might take a few minutes)......done.


## Part 4 --> Serving the model for predictions
`gcloud ml-engine predict --json-instances` expects a file with one JSON instance per line, so let's create that first.

In [58]:
# Take a sample row and save it to a JSON file (one instance per line).
import json

with open('test_data.json', 'w') as outfile:
    json.dump(X[35].tolist(), outfile)
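Because `--json-instances` reads one JSON instance per line, several rows can be scored in one request by writing one list per line. A sketch using plain lists (in the notebook these would be rows of `X`, e.g. `X[35].tolist()`); `write_json_instances` is a hypothetical helper and `test_batch.json` an arbitrary file name:

```python
import json
import os
import tempfile


def write_json_instances(rows, path):
    """Write each row as one JSON array on its own line (newline-delimited JSON)."""
    with open(path, 'w') as outfile:
        for row in rows:
            outfile.write(json.dumps([float(v) for v in row]) + '\n')


rows = [[0.25, -1.5, 3.0], [1.0, 0.0, -0.75]]
path = os.path.join(tempfile.gettempdir(), 'test_batch.json')
write_json_instances(rows, path)
with open(path) as infile:
    print(infile.read().splitlines())
```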

In [56]:
# Request an online prediction from the model version we just deployed.
!gcloud ml-engine predict --model kaggle_housing_price_prediction --version version_1 --json-instances test_data.json

DENSE_3
[-1.0]
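The same prediction can also be requested from Python through the Cloud ML Engine REST API (`projects.predict`), whose request body has the form `{"instances": [...]}`. A sketch assuming the `google-api-python-client` package and application-default credentials are available; `PROJECT_ID` is a placeholder, and only the request-building part runs without network access:

```python
def build_predict_request(project, model, version, instances):
    """Return the resource name and JSON body for an online prediction request."""
    name = 'projects/{}/models/{}'.format(project, model)
    if version is not None:
        name += '/versions/{}'.format(version)
    return name, {'instances': instances}


def predict(project, model, version, instances):
    """Send the request with google-api-python-client (needs credentials and network)."""
    from googleapiclient import discovery  # imported lazily so the builder works without it
    service = discovery.build('ml', 'v1')
    name, body = build_predict_request(project, model, version, instances)
    response = service.projects().predict(name=name, body=body).execute()
    if 'error' in response:
        raise RuntimeError(response['error'])
    return response['predictions']


name, body = build_predict_request('PROJECT_ID', 'kaggle_housing_price_prediction',
                                   'version_1', [[0.1, -0.2, 0.3]])
print(name)
```

Omitting the version (passing `None`) targets the model's default version.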


In [59]:
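# Perform the inverse transform to get the actual price: the model predicts a
# standardized value, so undo the StandardScaler that was applied to y.
# This step could also be built into the exported model (for example by undoing the
# scaling inside the model function) so that the deployed model returns prices directly.
# Note: -1.0 is the lower bound of the 'elu' output layer, so this prediction may be saturated.
# Newer scikit-learn versions require a 2-D input here, e.g. [[-1.0]].
sc_y.inverse_transform([-1.0])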

array([101505.90400434])
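`StandardScaler` stores each column's mean and population standard deviation (ddof=0), so `inverse_transform` is simply `z * std + mean`. A prediction of `-1.0` therefore corresponds to one standard deviation below the mean sale price of the training data. The NumPy sketch below reproduces this arithmetic on made-up prices:

```python
import numpy as np

prices = np.array([[100000.0], [150000.0], [200000.0], [250000.0]])

# What StandardScaler.fit computes for a single column.
mean = prices.mean(axis=0)
std = prices.std(axis=0)  # ddof=0, i.e. population standard deviation

z = (prices - mean) / std    # transform
restored = z * std + mean    # inverse_transform
one_below = np.array([[-1.0]]) * std + mean  # a standardized prediction of -1.0
print(mean, std, one_below)
```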

### This is how you can use Google Cloud ML-Engine to create, deploy and serve your own model on your own dataset.