# Cloud Machine Learning Engine

In [None]:
import os
# replace with the name of your own bucket
BUCKET = 'telemar-formazione-master-day6'

os.environ['BUCKET'] = BUCKET
os.environ['REGION'] = 'europe-west1'

## Authorize CMLE
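Cloud Machine Learning Engine (CMLE) needs access to the training and test CSV files stored in your bucket. The cell below looks up the CMLE service account for your project and grants it read and write permissions on the bucket.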
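
The service-account lookup pipes the JSON returned by `getConfig` into a Python one-liner that reads the `serviceAccount` field. As a minimal sketch of that parsing step on its own, the snippet below uses a made-up response body: `SAMPLE_RESPONSE` and the helper name `extract_service_account` are illustrative assumptions, and the e-mail address is a placeholder, not a real account.

```python
import json

# Hypothetical response body; the real one is returned by the getConfig call.
SAMPLE_RESPONSE = '{"serviceAccount": "cmle-sa@example-project.iam.gserviceaccount.com"}'


def extract_service_account(response_text):
    """Return the 'serviceAccount' field of a getConfig-style JSON response."""
    response = json.loads(response_text)
    return response['serviceAccount']


print(extract_service_account(SAMPLE_RESPONSE))
```

Using `print(...)` with a single argument keeps the snippet valid under both Python 2 and Python 3.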

In [None]:
%%bash
PROJECT_ID=$(gcloud config get-value core/project)

AUTH_TOKEN=$(gcloud auth print-access-token)
SVC_ACCOUNT=$(curl -X GET -H "Content-Type: application/json" \
    -H "Authorization: Bearer $AUTH_TOKEN" \
    https://ml.googleapis.com/v1/projects/${PROJECT_ID}:getConfig \
    | python -c "import json; import sys; response = json.load(sys.stdin); \
    print(response['serviceAccount'])")

echo "Authorizing the Cloud ML Service account $SVC_ACCOUNT to access files in $BUCKET"
gsutil -m defacl ch -u $SVC_ACCOUNT:R gs://$BUCKET
gsutil -m acl ch -u $SVC_ACCOUNT:R -r gs://$BUCKET  # error message (if bucket is empty) can be ignored
gsutil -m acl ch -u $SVC_ACCOUNT:W gs://$BUCKET

## Run a full training session from Datalab
The code below launches a training job on Cloud ML Engine.
Note that:
- the Python package lives in the `flights` sub-folder: you can browse its source code using Datalab;
- the `JOBNAME` environment variable is built from the current UTC date and time: find the line of code that does this;
- `OUTPUT_DIR` points to a specific folder in your bucket: it will contain the training output.
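
To make the job-name convention concrete, here is a small Python sketch that mirrors the shell expression `flights_$(date -u +%y%m%d_%H%M%S)`. The function name `make_job_name` and its `seconds` parameter are assumptions introduced only for illustration; the notebook itself builds the name in bash.

```python
import time


def make_job_name(prefix='flights', seconds=None):
    """Build a job name such as flights_YYMMDD_HHMMSS from a UTC timestamp.

    `seconds` is seconds since the epoch; None means "now".
    """
    stamp = time.strftime('%y%m%d_%H%M%S', time.gmtime(seconds))
    return '{}_{}'.format(prefix, stamp)


print(make_job_name())
```

Because the timestamp has one-second resolution, two jobs submitted within the same second would get the same name, which is unlikely in interactive use.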


In [None]:
%%bash

OUTPUT_DIR=gs://${BUCKET}/flights/chapter9/output
DATA_DIR=gs://${BUCKET}/flights/chapter8/output
ORIGIN_FILE=gs://${BUCKET}/flights/chapter8/keys/origin.txt
DEST_FILE=gs://${BUCKET}/flights/chapter8/keys/dest.txt
JOBNAME=flights_$(date -u +%y%m%d_%H%M%S)

PATTERN="Flights*"

echo "Launching training job ... trained model will be in $OUTPUT_DIR"

gsutil -m rm -rf $OUTPUT_DIR
gcloud ml-engine jobs submit training $JOBNAME \
  --region=$REGION \
  --module-name=trainer.task \
  --package-path=$(pwd)/flights/trainer \
  --job-dir=$OUTPUT_DIR \
  --staging-bucket=gs://$BUCKET \
  --runtime-version="1.6" \
  --scale-tier=STANDARD_1 \
  -- \
   --output_dir=$OUTPUT_DIR \
   --traindata $DATA_DIR/train$PATTERN \
   --evaldata $DATA_DIR/test$PATTERN   \
   --origin_file $ORIGIN_FILE          \
   --dest_file $DEST_FILE              \
   --num_training_epochs=5

## Check CMLE job status and logs
Browse to [https://console.cloud.google.com/mlengine](https://console.cloud.google.com/mlengine) and select your job to see its status and logs.
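
As an alternative to the web console, the job can also be inspected from the command line. The sketch below only assembles the `gcloud ml-engine jobs describe` and `gcloud ml-engine jobs stream-logs` commands as argument lists. The helper `job_commands` and the job name passed to it are hypothetical, and actually running the commands assumes `gcloud` is installed and authenticated.

```python
def job_commands(job_name):
    """Return (describe, stream_logs) gcloud argument lists for a CMLE job."""
    describe = ['gcloud', 'ml-engine', 'jobs', 'describe', job_name]
    stream_logs = ['gcloud', 'ml-engine', 'jobs', 'stream-logs', job_name]
    return describe, stream_logs


# Hypothetical job name, in the same format as JOBNAME.
describe_cmd, logs_cmd = job_commands('flights_180101_120000')
print(' '.join(describe_cmd))
print(' '.join(logs_cmd))
```

Each list can be passed to `subprocess.check_call` in an environment where `gcloud` is available.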

## Launch TensorBoard: visualize graph and metrics
TensorBoard reads the training output written to `OUTPUT_DIR`, so we can track how accuracy (on the test set) and the other logged metrics evolve during training.

In [None]:
from google.datalab.ml import TensorBoard
TensorBoard().start('gs://'+BUCKET+'/flights/chapter9/output')
TensorBoard().list()

In [None]:
# to stop TensorBoard
for pid in TensorBoard.list()['pid']:
    TensorBoard().stop(pid)
    print('Stopped TensorBoard with pid {}'.format(pid))