<h1> Scaling up ML using Cloud ML Engine </h1>

In this notebook, you will take a previously developed TensorFlow model that predicts taxi fares and package it so that it can run on Cloud ML Engine (Cloud MLE). For now, the model is trained on a small dataset. The model is still simplistic, so its accuracy is not great. However, this notebook illustrates *how* to package a TensorFlow model to run on Cloud MLE.

Later in the course, you will look at ways to make a more effective machine learning model.



---
Before you start, **make sure that you are logged in with your student account**. Otherwise you may incur Google Cloud charges for using this notebook. 

---

In [0]:
#@markdown Copy-paste your GCP Project ID in the following field:

PROJECT = "" #@param {type: "string"}

#@markdown **OPTIONAL:** Modify the following to use a different Google Cloud region for your Cloud MLE jobs.
REGION = "us-central1" #@param {type: "string"}

#@markdown Next, use Shift-Enter to run this cell and complete authentication.

try:
  from google.colab import auth
  auth.authenticate_user()
  print("AUTHENTICATED")
except Exception as e:  # e.g. not running in Colab, or sign-in did not complete
  print("FAILED to authenticate:", e)
  
BUCKET = PROJECT  

#@markdown Remember to uncheck "Reset all runtimes before running"

#@markdown As you know, resetting the runtime will delete any files you may have on your notebook file system.
#@markdown ![](https://i.imgur.com/9dgw0h0.png)

# Copy taxi-*.csv files from github if they are missing from the runtime.
!wget --quiet -nc https://github.com/osipov/training-data-analyst/raw/master/bootcamps/serverless_ml/taxi-11k-datasets.zip
!unzip -q -n taxi-11k-datasets.zip 

<h2> Environment variables for project and bucket </h2>

Note that:
<ol>
<li> Your project ID is the *unique* string that identifies your project (not the project name). You can find it on the GCP Console dashboard's Home page. </li>
<li> Cloud training often involves saving and restoring model files. If you don't have a bucket already, I suggest that you create one from the GCP console (because it will dynamically check whether the bucket name you want is available). A common pattern is to prefix the bucket name with the project ID, so that it is unique. Also, for cost reasons, you might want to use a single-region bucket. </li>
</ol>
<b>Change the cell below</b> to reflect your Project ID and bucket name.


In [0]:
# for bash
import os
os.environ['PROJECT'] = PROJECT
os.environ['BUCKET'] = BUCKET
os.environ['REGION'] = REGION
os.environ['TF_VERSION'] = '1.12'  # Latest TensorFlow version supported by Cloud MLE
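
The bucket-naming pattern described above (project ID as a prefix) can be sketched in Python. `make_bucket_name` and its `suffix` parameter are hypothetical helpers, not part of this notebook, and the regular expression covers only the basic Cloud Storage naming rules (3 to 63 characters; lowercase letters, digits, dashes, underscores and dots; starting and ending with a letter or digit). Treat it as a rough pre-check, not a guarantee that the name is available.

```python
import re

# Basic bucket-name rules only; the service applies further restrictions.
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]$")


def make_bucket_name(project_id, suffix=""):
    """Build a bucket name prefixed by the project ID (hypothetical helper)."""
    name = project_id if not suffix else "{}-{}".format(project_id, suffix)
    if not _BUCKET_RE.match(name):
        raise ValueError("not a valid bucket name: {!r}".format(name))
    return name
```

Because an empty string fails the check, calling `make_bucket_name(PROJECT)` would also catch the case where the `PROJECT` field was left blank.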

In [0]:
%%bash
gcloud config set project $PROJECT
gcloud config set compute/region $REGION

Allow the Cloud ML Engine service account to read/write to the bucket containing training data.

In [0]:
%%bash
PROJECT_ID=$PROJECT
AUTH_TOKEN=$(gcloud auth print-access-token)
SVC_ACCOUNT=$(curl -X GET -H "Content-Type: application/json" \
    -H "Authorization: Bearer $AUTH_TOKEN" \
    https://ml.googleapis.com/v1/projects/${PROJECT_ID}:getConfig \
    | python -c "import json; import sys; response = json.load(sys.stdin); \
    print(response['serviceAccount'])")

echo "Authorizing the Cloud ML Service $SVC_ACCOUNT to access files in $BUCKET"
gsutil -m defacl ch -u $SVC_ACCOUNT:R gs://$BUCKET

echo "NOTE: the following CommandException (No URLs matched if bucket is empty) can be ignored"
gsutil -q -m acl ch -u $SVC_ACCOUNT:R -r gs://$BUCKET  # error message (if bucket is empty) can be ignored

gsutil -m acl ch -u $SVC_ACCOUNT:W gs://$BUCKET
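
The inline `python -c` step above pulls the `serviceAccount` field out of the JSON returned by `getConfig`. As a minimal sketch, the same extraction can be written as a function that fails with a readable message when the field is absent (for example, when the API returns an error object instead). The sample payload is made up for illustration and is not a real service account.

```python
import json


def extract_service_account(response_text):
    """Return the 'serviceAccount' field from a getConfig JSON response."""
    response = json.loads(response_text)
    try:
        return response["serviceAccount"]
    except KeyError:
        raise ValueError("no serviceAccount in response: {}".format(response))


# Made-up payload for illustration only; a real response comes from the API.
sample = '{"serviceAccount": "service-123@example.iam.gserviceaccount.com"}'
print(extract_service_account(sample))
```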

<h2> Packaging up the code </h2>

Take your code and put it into a standard Python package structure. <a href="taxifare/trainer/model.py">model.py</a> and <a href="taxifare/trainer/task.py">task.py</a> contain the TensorFlow code from earlier (explore the <a href="taxifare/trainer/">directory structure</a>).

In [0]:
%%bash
rm -rf taxifare
mkdir -p taxifare/trainer

for file in taxifare/setup.py \
            taxifare/trainer/__init__.py \
            taxifare/trainer/model.py \
            taxifare/trainer/task.py
do
  wget --quiet -nc \
  https://github.com/osipov/training-data-analyst/raw/master/bootcamps/serverless_ml/cloudmle/$file \
  -O $file
done

find taxifare
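
After the download, it can help to confirm that the package layout is complete before submitting a job. The sketch below checks for the four files fetched by the cell above. `missing_package_files` is a hypothetical helper, and it assumes that a failed download may leave an empty file behind, so empty files other than `__init__.py` are reported too.

```python
import os

# Files the cell above downloads, relative to the package root.
EXPECTED_FILES = [
    "setup.py",
    os.path.join("trainer", "__init__.py"),
    os.path.join("trainer", "model.py"),
    os.path.join("trainer", "task.py"),
]


def missing_package_files(root="taxifare"):
    """Return the expected files that are absent (or unexpectedly empty)."""
    missing = []
    for rel in EXPECTED_FILES:
        path = os.path.join(root, rel)
        if not os.path.isfile(path):
            missing.append(rel)
        elif os.path.getsize(path) == 0 and not rel.endswith("__init__.py"):
            # An empty __init__.py is normal; other empty files are suspect.
            missing.append(rel)
    return missing
```

An empty list from `missing_package_files()` suggests the layout matches what the training job expects.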

<h2> Find absolute paths to your data </h2>

In [0]:
%%bash
rm -rf $PWD/taxi_trained

head -1 $PWD/taxi-train.csv
head -1 $PWD/taxi-valid.csv
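
The same check can be done from Python. This sketch resolves each file to an absolute path and reads its first line, mirroring `head -1`. `first_line` is a hypothetical helper, and the file names are the ones used in the cell above.

```python
import os


def first_line(path):
    """Return (absolute path, first line without the trailing newline)."""
    abs_path = os.path.abspath(path)
    with open(abs_path) as f:
        return abs_path, f.readline().rstrip("\n")


for name in ("taxi-train.csv", "taxi-valid.csv"):
    if os.path.exists(name):  # skip quietly if the data was not downloaded
        print(first_line(name))
```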

<h2> Submit training job using gcloud </h2>

First copy the training data to the cloud.  Then, launch a training job.

After you submit the job, go to the cloud console (http://console.cloud.google.com) and select <b>Machine Learning | Jobs</b> to monitor progress.  

<b>Note:</b> Don't be concerned if the notebook stalls (with a blue progress bar) or returns with an error about being unable to refresh auth tokens. This is a long-lived Cloud job and work is going on in the cloud.  Use the Cloud Console link (above) to monitor the job.

In [0]:
%%bash
echo $BUCKET
gsutil -m rm -rf gs://${BUCKET}/taxifare/11k/
gsutil -m cp ${PWD}/*.csv gs://${BUCKET}/taxifare/11k/

In [0]:
%%bash
OUTDIR=gs://${BUCKET}/taxifare/11k/taxi_trained

JOBNAME=mle_train_$(date -u +%y%m%d_%H%M%S)

echo $OUTDIR $REGION $JOBNAME

gsutil -m rm -rf $OUTDIR
gcloud ml-engine jobs submit training $JOBNAME \
   --region=$REGION \
   --module-name=trainer.task \
   --package-path=${PWD}/taxifare/trainer \
   --job-dir=$OUTDIR \
   --staging-bucket=gs://$BUCKET \
   --scale-tier=BASIC \
   --runtime-version=${TF_VERSION} \
   -- \
   --train_data_paths="gs://${BUCKET}/taxifare/11k/taxi-train*" \
   --eval_data_paths="gs://${BUCKET}/taxifare/11k/taxi-valid*"  \
   --output_dir=$OUTDIR \
   --train_steps=10000
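
Everything after the bare `--` in the command above is forwarded to the training module (`trainer.task`). The actual `task.py` is not shown in this notebook, so the following is only a sketch of how such a module might read those flags with `argparse`. The default for `--train_steps` is an assumption. `--job-dir` is accepted because the service also passes it to the program, and `parse_known_args` is used so that any other unexpected flags are tolerated.

```python
import argparse


def parse_args(argv=None):
    """Parse the user arguments that gcloud forwards after the bare `--`."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--train_data_paths", required=True)
    parser.add_argument("--eval_data_paths", required=True)
    parser.add_argument("--output_dir", required=True)
    # Default value is an assumption made for this sketch.
    parser.add_argument("--train_steps", type=int, default=1000)
    parser.add_argument("--job-dir", default=None)
    args, _unknown = parser.parse_known_args(argv)
    return args
```

For example, `parse_args(["--train_data_paths=a", "--eval_data_paths=b", "--output_dir=c"])` returns a namespace whose `train_steps` falls back to the assumed default.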

In [0]:
OUTDIR="gs://{}/taxifare/11k/taxi_trained".format(BUCKET)
#@markdown Run this cell to start TensorBoard.

#@markdown Once TensorBoard comes up, click on the vertical ellipsis in the upper right of this cell, and choose view output full screen.
%load_ext tensorboard
%tensorboard --logdir $OUTDIR

Don't worry if you see a message about files or objects that could not be removed. The message appears because the `gsutil rm` command tries to remove the output directory for trained model checkpoint files, which may not exist yet.

To monitor the progress of the job from the GCP user interface, navigate to the [Jobs](https://console.cloud.google.com/mlengine/jobs) page of the Cloud ML Engine service. Use the "View Logs" link to see the details. In the upcoming lab, you will also monitor training details using TensorBoard.
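
The job can also be inspected from the command line. As a rough sketch, the helpers below rebuild a job name in the same `mle_train_<UTC timestamp>` format used at submission and assemble the `gcloud ml-engine jobs describe` and `stream-logs` invocations as argument lists. Both function names are hypothetical. In practice you would pass the exact job name printed when the job was submitted, because a freshly generated timestamp will not match it.

```python
from datetime import datetime, timezone


def make_job_name(now=None):
    """Mirror the bash pattern mle_train_$(date -u +%y%m%d_%H%M%S)."""
    now = now or datetime.now(timezone.utc)
    return "mle_train_" + now.strftime("%y%m%d_%H%M%S")


def monitoring_commands(job_name):
    """Return argument lists for checking job status and streaming logs."""
    return [
        ["gcloud", "ml-engine", "jobs", "describe", job_name],
        ["gcloud", "ml-engine", "jobs", "stream-logs", job_name],
    ]
```

Each list can be handed to `subprocess.run` in an environment where `gcloud` is installed and authenticated.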

Copyright 2019 Counter Factual .AI LLC. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License