In [1]:
from datetime import datetime
import os

PROJECT = "qwiklabs-gcp-00-dd86874625fb"  
BUCKET = "qwiklabs-gcp-00-dd86874625fb"  
REGION = "us-central1" 
MODEL_TYPE = "dnn_dropout"  # "linear", "dnn_dropout", "cnn", or "dnn"

# Do not change these
os.environ["PROJECT"] = PROJECT
os.environ["BUCKET"] = BUCKET
os.environ["REGION"] = REGION
os.environ["MODEL_TYPE"] = MODEL_TYPE
os.environ["TFVERSION"] = "2.1"  
os.environ["IMAGE_URI"] = os.path.join("gcr.io", PROJECT, "mnistmodel")

In [2]:
%%bash
gcloud config set project $PROJECT
gcloud config set compute/region $REGION

Updated property [core/project].
Updated property [compute/region].


## Building a dynamic model

The boilerplate structure for this module has already been set up in the folder `mnistmodel`. The module lives in the sub-folder, `trainer`, and is designated as a python package with the empty `__init__.py` (`mnistmodel/trainer/__init__.py`) file. It still needs the model and a trainer to run it, so let's make them.

Let's start with the trainer file first. This file parses command line arguments to feed into the model.

In [3]:
%%writefile mnistmodel/trainer/task.py
import argparse
import json
import os
import sys

from . import model


def _parse_arguments(argv):
    """Parses command-line arguments."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--model_type',
        help='Which model type to use',
        type=str, default='linear')
    parser.add_argument(
        '--epochs',
        help='The number of epochs to train',
        type=int, default=10)
    parser.add_argument(
        '--steps_per_epoch',
        help='The number of steps per epoch to train',
        type=int, default=100)
    parser.add_argument(
        '--job-dir',
        help='Directory where to save the given model',
        type=str, default='mnistmodel/')
    return parser.parse_known_args(argv)


def main():
    """Parses command line arguments and kicks off model training."""
    args = _parse_arguments(sys.argv[1:])[0]

    # Configure path for hyperparameter tuning.
    trial_id = json.loads(
        os.environ.get('TF_CONFIG', '{}')).get('task', {}).get('trial', '')
    output_path = args.job_dir if not trial_id else args.job_dir + '/'

    model_layers = model.get_layers(args.model_type)
    image_model = model.build_model(model_layers, args.job_dir)
    model_history = model.train_and_evaluate(
        image_model, args.epochs, args.steps_per_epoch, args.job_dir)


if __name__ == '__main__':
    main()


Overwriting mnistmodel/trainer/task.py


Next, let's group non-model functions into a util file to keep the model file simple. We'll copy over the `scale` and `load_dataset` functions from the previous lab.

In [4]:
%%writefile mnistmodel/trainer/util.py
import tensorflow as tf


def scale(image, label):
    """Scales images from a 0-255 int range to a 0-1 float range"""
    image = tf.cast(image, tf.float32)
    image /= 255
    image = tf.expand_dims(image, -1)
    return image, label


def load_dataset(
        data, training=True, buffer_size=5000, batch_size=100, nclasses=10):
    """Loads MNIST dataset into a tf.data.Dataset"""
    (x_train, y_train), (x_test, y_test) = data
    x = x_train if training else x_test
    y = y_train if training else y_test
    # One-hot encode the classes
    y = tf.keras.utils.to_categorical(y, nclasses)
    dataset = tf.data.Dataset.from_tensor_slices((x, y))
    dataset = dataset.map(scale).batch(batch_size)
    if training:
        dataset = dataset.shuffle(buffer_size).repeat()
    return dataset


Overwriting mnistmodel/trainer/util.py


Finally, let's code the models! The [tf.keras API](https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/keras) accepts an array of [layers](https://www.tensorflow.org/api_docs/python/tf/keras/layers) into a [model object](https://www.tensorflow.org/api_docs/python/tf/keras/Model), so we can create a dictionary of layers based on the different model types we want to use. The below file has two functions: `get_layers` and `create_and_train_model`. We will build the structure of our model in `get_layers`. Last but not least, we'll copy over the training code from the previous lab into `train_and_evaluate`.

These models progressively build on each other. Look at the imported `tensorflow.keras.layers` modules and the default values for the variables defined in `get_layers` for guidance.

In [5]:
%%writefile mnistmodel/trainer/model.py
import os
import shutil

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.callbacks import TensorBoard
from tensorflow.keras.layers import (
    Conv2D, Dense, Dropout, Flatten, MaxPooling2D, Softmax)

from . import util


# Image Variables
WIDTH = 28
HEIGHT = 28


def get_layers(
        model_type,
        nclasses=10,
        hidden_layer_1_neurons=400,
        hidden_layer_2_neurons=100,
        dropout_rate=0.25,
        num_filters_1=64,
        kernel_size_1=3,
        pooling_size_1=2,
        num_filters_2=32,
        kernel_size_2=3,
        pooling_size_2=2):
    """Constructs layers for a keras model based on a dict of model types."""
    model_layers = {
        'linear': [
            Flatten(),
            Dense(nclasses),
            Softmax()
        ],
        'dnn': [
            Flatten(),
            Dense(hidden_layer_1_neurons, activation='relu'),
            Dense(hidden_layer_2_neurons, activation='relu'),
            Dense(nclasses),
            Softmax()
        ],
        'dnn_dropout': [
            Flatten(),
            Dense(hidden_layer_1_neurons, activation='relu'),
            Dense(hidden_layer_2_neurons, activation='relu'),
            Dropout(dropout_rate),
            Dense(nclasses),
            Softmax()
        ],
        'dnn': [
            Conv2D(num_filters_1, kernel_size=kernel_size_1,
                   activation='relu', input_shape=(WIDTH, HEIGHT, 1)),
            MaxPooling2D(pooling_size_1),
            Conv2D(num_filters_2, kernel_size=kernel_size_2,
                   activation='relu'),
            MaxPooling2D(pooling_size_2),
            Flatten(),
            Dense(hidden_layer_1_neurons, activation='relu'),
            Dense(hidden_layer_2_neurons, activation='relu'),
            Dropout(dropout_rate),
            Dense(nclasses),
            Softmax()
        ]
    }
    return model_layers[model_type]


def build_model(layers, output_dir):
    """Compiles keras model for image classification."""
    model = Sequential(layers)
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model


def train_and_evaluate(model, num_epochs, steps_per_epoch, output_dir):
    """Compiles keras model and loads data into it for training."""
    mnist = tf.keras.datasets.mnist.load_data()
    train_data = util.load_dataset(mnist)
    validation_data = util.load_dataset(mnist, training=False)

    callbacks = []
    if output_dir:
        tensorboard_callback = TensorBoard(log_dir=output_dir)
        callbacks = [tensorboard_callback]

    history = model.fit(
        train_data,
        validation_data=validation_data,
        epochs=num_epochs,
        steps_per_epoch=steps_per_epoch,
        verbose=2,
        callbacks=callbacks)

    if output_dir:
        export_path = os.path.join(output_dir, 'keras_export')
        model.save(export_path, save_format='tf')

    return history


Overwriting mnistmodel/trainer/model.py


## Local Training

Now that we know that our models are working as expected, let's run it on the [Google Cloud AI Platform](https://cloud.google.com/ml-engine/docs/). We can run it as a python module locally first using the command line.

The below cell transfers some of our variables to the command line as well as create a job directory including a timestamp.

You can change the model_type to try out different models.

In [6]:
current_time = datetime.now().strftime("%y%m%d_%H%M%S")
model_type = 'dnn'

os.environ["MODEL_TYPE"] = model_type
os.environ["JOB_DIR"] = "gs://{}/mnist_{}_{}/".format(
    BUCKET, model_type, current_time)
os.environ["JOB_NAME"] = "mnist_{}_{}".format(
    model_type, current_time)

The cell below runs the local version of the code. The epochs and steps_per_epoch flag can be changed to run for longer or shorther, as defined in our `mnistmodel/trainer/task.py` file.

In [7]:
%%bash
python3 -m mnistmodel.trainer.task \
    --job-dir=$JOB_DIR \
    --epochs=5 \
    --steps_per_epoch=50 \
    --model_type=$MODEL_TYPE

Train for 50 steps, validate for 100 steps
Epoch 1/5
50/50 - 18s - loss: 1.0516 - accuracy: 0.6702 - val_loss: 0.2909 - val_accuracy: 0.9196
Epoch 2/5
50/50 - 8s - loss: 0.3123 - accuracy: 0.9050 - val_loss: 0.1788 - val_accuracy: 0.9454
Epoch 3/5
50/50 - 8s - loss: 0.2139 - accuracy: 0.9378 - val_loss: 0.1532 - val_accuracy: 0.9524
Epoch 4/5
50/50 - 7s - loss: 0.1651 - accuracy: 0.9460 - val_loss: 0.1110 - val_accuracy: 0.9657
Epoch 5/5
50/50 - 7s - loss: 0.1329 - accuracy: 0.9608 - val_loss: 0.0925 - val_accuracy: 0.9698


2020-08-10 06:39:19.478492: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2200000000 Hz
2020-08-10 06:39:19.478907: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x558c6119e650 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-08-10 06:39:19.478935: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-08-10 06:39:19.479041: I tensorflow/core/common_runtime/process_util.cc:147] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
2020-08-10 06:39:29.115263: I tensorflow/core/profiler/lib/profiler_session.cc:225] Profiler session started.
2020-08-10 06:40:10.557412: W tensorflow/python/util/util.cc:319] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
Instructions for updating:
If using Keras pass *_constraint arguments t

## Training on the cloud

Since we're using an unreleased version of TensorFlow on AI Platform, we can instead use a [Deep Learning Container](https://cloud.google.com/ai-platform/deep-learning-containers/docs/overview) in order to take advantage of libraries and applications not normally packaged with AI Platform. Below is a simple [Dockerlife](https://docs.docker.com/engine/reference/builder/) which copies our code to be used in a TF2 environment.

In [8]:
%%writefile mnistmodel/Dockerfile
FROM gcr.io/deeplearning-platform-release/tf2-cpu
COPY mnistmodel/trainer /mnistmodel/trainer
ENTRYPOINT ["python3", "-m", "mnistmodel.trainer.task"]

Overwriting mnistmodel/Dockerfile


The below command builds the image and ships it off to Google Cloud so it can be used for AI Platform. When built, it will show up [here](http://console.cloud.google.com/gcr) with the name `mnistmodel`. ([Click here](https://console.cloud.google.com/cloud-build) to enable Cloud Build)

In [9]:
!docker build -f mnistmodel/Dockerfile -t $IMAGE_URI ./

Sending build context to Docker daemon  2.046MB
Step 1/3 : FROM gcr.io/deeplearning-platform-release/tf2-cpu
 ---> 81f04d2f5f5c
Step 2/3 : COPY mnistmodel/trainer /mnistmodel/trainer
 ---> 6c3c92c0ea40
Step 3/3 : ENTRYPOINT ["python3", "-m", "mnistmodel.trainer.task"]
 ---> Running in c3238b673dd7
Removing intermediate container c3238b673dd7
 ---> e1de2f8b1b70
Successfully built e1de2f8b1b70
Successfully tagged gcr.io/qwiklabs-gcp-00-dd86874625fb/mnistmodel:latest


In [10]:
!docker push $IMAGE_URI

The push refers to repository [gcr.io/qwiklabs-gcp-00-dd86874625fb/mnistmodel]

[1Ba1960148: Preparing 
[1Becc34b2f: Preparing 
[1B0ba548c6: Preparing 
[1Be58dce6f: Preparing 
[1B59643149: Preparing 
[1Bf1bc8335: Preparing 
[1B7fce4bef: Preparing 
[1B25416dd1: Preparing 
[1B9bef6670: Preparing 
[1B397a1d54: Preparing 
[1B5579b7a8: Preparing 
[1B6c16a28a: Preparing 
[1Bf6c43556: Preparing 
[1B17461bc6: Preparing 
[1B474ff053: Preparing 
[1B7a04bf9f: Preparing 
[1B00d84994: Preparing 
[1B52ea2c16: Preparing 
[1Bc9e5703f: Preparing 
[20B1960148: Pushed lready exists 3kB[17A[2K[15A[2K[13A[2K[11A[2K[10A[2K[7A[2K[2A[2K[1A[2K[20A[2Klatest: digest: sha256:7a218f7c23431264a0a73e58d4e263bc201f2f5ae8d158fab4c0eb7ee22bed26 size: 4501


Finally, we can kickoff the [AI Platform training job](https://cloud.google.com/sdk/gcloud/reference/ai-platform/jobs/submit/training). We can pass in our docker image using the `master-image-uri` flag.

In [11]:
%%bash
echo $JOB_DIR $REGION $JOB_NAME
gcloud ai-platform jobs submit training $JOB_NAME \
    --staging-bucket=gs://$BUCKET \
    --region=$REGION \
    --master-image-uri=$IMAGE_URI \
    --scale-tier=BASIC_GPU \
    --job-dir=$JOB_DIR \
    -- \
    --model_type=$MODEL_TYPE

gs://qwiklabs-gcp-00-dd86874625fb/mnist_dnn_200810_063916/ us-central1 mnist_dnn_200810_063916
jobId: mnist_dnn_200810_063916
state: QUEUED


Job [mnist_dnn_200810_063916] submitted successfully.
Your job is still active. You may view the status of your job with the command

  $ gcloud ai-platform jobs describe mnist_dnn_200810_063916

or continue streaming the logs with the command

  $ gcloud ai-platform jobs stream-logs mnist_dnn_200810_063916


## Deploying and predicting with model

Once you have a model you're proud of, let's deploy it! All we need to do is give AI Platform the location of the model. Below uses the keras export path of the previous job, but `${JOB_DIR}keras_export/` can always be changed to a different path.

In [None]:
%%bash
MODEL_NAME="mnist"
MODEL_VERSION=${MODEL_TYPE}
MODEL_LOCATION=${JOB_DIR}keras_export/
echo "Deleting and deploying $MODEL_NAME $MODEL_VERSION from $MODEL_LOCATION ... this will take a few minutes"
yes | gcloud ai-platform versions delete ${MODEL_VERSION} --model ${MODEL_NAME}
yes | gcloud ai-platform models delete ${MODEL_NAME}
gcloud ai-platform models create ${MODEL_NAME} --regions $REGION
gcloud ai-platform versions create ${MODEL_VERSION} \
    --model ${MODEL_NAME} \
    --origin ${MODEL_LOCATION} \
    --framework tensorflow \
    --runtime-version=2.1