# The MNIST Classifier locally and on GCP

The aim of this tutorial is to set up a Python package for an MNIST classifier that can be trained and deployed on Google Cloud Platform (GCP). We close the tutorial by requesting predictions from our deployed model on GCP.

## 1) Writing a Python package containing our code

We start by creating some Python variables. By convention, we use lowercase names for Python variables and uppercase names for Shell variables.

In [None]:
import os

package_path = 'trainer'
module_name = 'task'
model_dir = 'mnist_classifier'

init_path = os.path.join(package_path,'__init__.py')  # path to the __init__.py file 
module_path = os.path.join(package_path, module_name+'.py') # path to the task.py file
model_path = os.path.join(package_path, 'model.py') # path to the model.py file

os.environ['PACKAGE_PATH'] = package_path

### Creating a Python package

A Python package is a directory that contains a (typically empty) `__init__.py` file along with some Python files containing the actual code. First, we create the directory that will become our Python package.

In [None]:
import shutil

if os.path.exists(package_path):
    print('The path already exists. Shall we delete its content? (y/n)')
    answer = input()
    if answer == 'y' or answer == 'Y':
        shutil.rmtree(package_path, ignore_errors = True)
        os.mkdir(package_path)
else:        
    os.mkdir(package_path)
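
To illustrate what such a package looks like once it is filled, here is a minimal sketch that builds a throwaway package in a temporary directory and imports a module from it. The names `demo_trainer` and `ANSWER` are made up for this illustration and are not part of the tutorial's package.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package in a temporary directory (hypothetical names).
with tempfile.TemporaryDirectory() as tmp:
    pkg_dir = Path(tmp) / 'demo_trainer'
    pkg_dir.mkdir()
    (pkg_dir / '__init__.py').write_text('')            # empty marker file
    (pkg_dir / 'model.py').write_text('ANSWER = 42\n')   # a module with actual code

    sys.path.insert(0, tmp)          # make the parent directory importable
    importlib.invalidate_caches()    # the directory was created after start-up
    try:
        module = importlib.import_module('demo_trainer.model')
        answer = module.ANSWER
    finally:
        sys.path.remove(tmp)

print(answer)  # prints 42
```

The dotted name `demo_trainer.model` mirrors the directory layout: package directory first, module file second.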

We create an empty `__init__.py` file inside our package directory to mark this directory as a Python package.

In [None]:
%%writefile $init_path   



We create a `model.py` file containing the model code.

In [None]:
%%writefile $model_path   

import tensorflow as tf

AUTOTUNE = tf.data.experimental.AUTOTUNE

def input_fn(features, labels, shuffle = True, epochs = None, batch_size = 128):
    
    if labels is None:
        inputs = features
    else:
        inputs = (features, labels)
        
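    # Cache the raw samples first so that shuffling yields a new order in every epoch
    dataset = tf.data.Dataset.from_tensor_slices(inputs).cache()
    if shuffle:
        dataset = dataset.shuffle(10000)
    dataset = dataset.batch(batch_size).repeat(epochs).prefetch(AUTOTUNE)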
    
    return dataset


def build_model(input_shape, learning_rate):
    
    model = tf.keras.Sequential([
        tf.keras.layers.Reshape(target_shape=[28,28,1], input_shape=input_shape),
        tf.keras.layers.Conv2D(filters=8, kernel_size=(3,3), activation='relu'),
        tf.keras.layers.MaxPooling2D(pool_size=(2,2)),
        tf.keras.layers.Conv2D(filters=16, kernel_size=(3,3), activation='relu'),
        tf.keras.layers.MaxPooling2D(pool_size=(2,2)),
        tf.keras.layers.Conv2D(filters=32, kernel_size=(3,3), activation='relu'),
        tf.keras.layers.MaxPooling2D(pool_size=(2,2)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dropout(rate=0.2),
        tf.keras.layers.Dense(units=128, activation='relu'),
        tf.keras.layers.Dense(units=10, activation='softmax')
        ])
    
    optimizer = tf.keras.optimizers.Adam(learning_rate = learning_rate)
    loss = tf.keras.losses.SparseCategoricalCrossentropy()
    model.compile(loss = loss, optimizer = optimizer, metrics=['accuracy'])
    
    return model
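
To see how the layer stack shrinks the image, the following sketch redoes the shape arithmetic in plain Python. It assumes the Keras defaults used above: convolutions without padding and with stride 1, and max pooling with a stride equal to the pool size. The helper names `conv_out` and `pool_out` are introduced only for this illustration.

```python
def conv_out(size, kernel=3):
    """Output size of an unpadded ('valid') convolution with stride 1."""
    return size - kernel + 1


def pool_out(size, pool=2):
    """Output size of max pooling with stride equal to the pool size."""
    return size // pool


size, channels = 28, 1
params = 0
for filters in (8, 16, 32):
    params += (3 * 3 * channels + 1) * filters   # kernel weights plus one bias per filter
    size = pool_out(conv_out(size))
    channels = filters
    print(f'after conv+pool with {filters} filters: {size}x{size}x{channels}')

flat = size * size * channels    # number of values entering the dense layers
params += (flat + 1) * 128       # Dense(units=128)
params += (128 + 1) * 10         # Dense(units=10)
print(flat, params)  # prints 32 11402
```

The spatial size goes 28 → 13 → 5 → 1, so the `Flatten` layer hands only 32 values to the dense layers, which keeps the model small.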


We create a `task.py` file containing the code to be executed.

In [None]:
%%writefile $module_path   

import tensorflow as tf
import os
from argparse import ArgumentParser
# Importing job specific libraries
from .model import input_fn, build_model
   


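def str_to_bool(value):
    # argparse passes the raw string; bool('False') would evaluate to True
    return str(value).lower() in ('true', '1', 'yes', 'y')


def get_arguments():
    
    parser = ArgumentParser()
    
    parser.add_argument('--gpu_memory_growth',
                        type = str_to_bool, 
                        default = False,
                        help = 'whether or not gpu memory growth will be enabled')
    
    parser.add_argument( '--shuffle',
                        type = str_to_bool, 
                        default = True,
                        help = 'whether to shuffle the input samples or not')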
    
    parser.add_argument('--epochs',
                        type = int,
                        default = 100,
                        help = 'the maximal number of epochs during training')
    
    parser.add_argument('--batch_size',
                        type = int,
                        default = 128,
                        help = 'how many samples should be processed during one training step')
    
    parser.add_argument('--learning_rate',
                        type = float, 
                        default = 0.01,
                        help = 'the initial learning rate of the Adam optimizer')
    
    parser.add_argument('--model_dir',
                        type = str, 
                        required = True,
                        help = 'where to store the exported model')
    
    parser.add_argument('--log_dir',
                        type = str,
                        required = True,
                        help = 'where to store logs needed for Tensorboard')
    
    parser.add_argument('--ckpt_dir',
                        type = str,
                        required = True,
                        help = 'where to store the checkpoints')

    parser.add_argument('--verbose',
                        type = int,
                        default = 0,
                        choices = [0,1,2],
                        help = 'verbosity mode passed to model.fit')    
    
    parser.add_argument('--verbosity',
                        type = str,
                        choices=['DEBUG', 'ERROR', 'FATAL', 'INFO', 'WARN'],
                        default='INFO',
                        help = 'logging level of TensorFlow')
    
    args, _ =  parser.parse_known_args()
    return args
    
    

def train_and_evaluate(args):
    
    # Enable memory growth if training is done locally
    if args.gpu_memory_growth:
        for gpu in tf.config.list_physical_devices('GPU'):
            tf.config.experimental.set_memory_growth(gpu, True)
            
    
    # Loading the data
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    
    # Create a training dataset
    train_dataset = input_fn(x_train, y_train, shuffle=args.shuffle, epochs=args.epochs, batch_size=args.batch_size)
    
    # Create a validation dataset
    valid_dataset = input_fn(x_test, y_test, shuffle=False, epochs=1, batch_size=args.batch_size)
    
    
    # Building the model
    input_shape = x_train[0].shape
    model = build_model(input_shape = input_shape, learning_rate = args.learning_rate)
    
    
    # Defining some callbacks
    reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(factor = 0.1, 
                                                     monitor = 'val_accuracy', 
                                                     patience = 10)
    tensorboard = tf.keras.callbacks.TensorBoard(log_dir = args.log_dir)
    checkpoints = tf.keras.callbacks.ModelCheckpoint(filepath = args.ckpt_dir, 
                                                     monitor = 'val_accuracy', 
                                                     save_best_only = True,
                                                     save_weights_only = True)
    early_stopping = tf.keras.callbacks.EarlyStopping(monitor = 'val_accuracy',
                                                      min_delta = 0.005,
                                                      patience = 20)
    callbacks = [reduce_lr, tensorboard, checkpoints, early_stopping]
    
    
    # Starting the training
    steps_per_epoch = x_train.shape[0] // args.batch_size
    history = model.fit(train_dataset, 
                        validation_data = valid_dataset, 
                        steps_per_epoch = steps_per_epoch,
                        epochs = args.epochs,
                        callbacks = callbacks,
                        verbose = args.verbose)
    
    # Loading the best weights into the model
    model.load_weights(args.ckpt_dir)
    
    # Saving the model
    model.save(args.model_dir)
    print(f'Keras model exported to {args.model_dir}')
    
    
if __name__ == '__main__':
    args = get_arguments()
    tf.compat.v1.logging.set_verbosity(args.verbosity)
    train_and_evaluate(args)
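
Two details of the argument parsing deserve a closer look: boolean flags arrive as text, and `parse_known_args` tolerates flags the parser does not declare (such as the `--job-dir` flag that `gcloud` is understood to forward to the program). The sketch below shows both effects in isolation. The flag names `--naive` and `--safe` and the helper `str_to_bool` are defined locally for illustration only.

```python
from argparse import ArgumentParser


def str_to_bool(value):
    """Illustrative converter: map common textual spellings to a boolean."""
    return str(value).lower() in ('true', '1', 'yes', 'y')


parser = ArgumentParser()
parser.add_argument('--naive', type=bool, default=False)
parser.add_argument('--safe', type=str_to_bool, default=False)

# '--job-dir' is not declared above, so it ends up in the list of unknown arguments.
args, unknown = parser.parse_known_args(
    ['--naive', 'False', '--safe', 'False', '--job-dir', 'some_dir'])

print(args.naive)   # True, because bool('False') is True for any non-empty string
print(args.safe)    # False
print(unknown)      # ['--job-dir', 'some_dir']
```

With `parse_args` instead of `parse_known_args`, the undeclared `--job-dir` flag would abort the program with an error.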

## 2) Training models

### Training a model locally

Before we can train our model with the `gcloud` command line tool, we need to export some variables to make them available in the Shell. As before, we use lowercase names for Python variables and uppercase names for Shell variables.

In [None]:
os.environ['MODULE_NAME'] = package_path + '.' + module_name  # Note the 'dot' between the package name and the module name
os.environ['MODEL_DIR'] = model_dir
os.environ['LOG_DIR'] = 'logs'
os.environ['CKPT_DIR'] = 'checkpoints/weights'
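
Setting `os.environ` works here because a child process inherits the environment of the Python process that starts it, which is how the Shell cells get to see these values. A minimal sketch of this mechanism, using a made-up variable name `DEMO_LOG_DIR` and a child Python process in place of a Shell:

```python
import os
import subprocess
import sys

os.environ['DEMO_LOG_DIR'] = 'logs'   # hypothetical variable name for this sketch

# The child process reads the variable from its inherited environment.
child_code = "import os; print(os.environ['DEMO_LOG_DIR'])"
result = subprocess.run([sys.executable, '-c', child_code],
                        capture_output=True, text=True, check=True)
child_value = result.stdout.strip()
print(child_value)  # prints logs
```

Note that the inheritance is one-way: variables changed inside the child process do not flow back into the notebook.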

Let's train our model locally using the `gcloud` command line tool.

In [None]:
%%bash 

gcloud ai-platform local train \
    --package-path $PACKAGE_PATH \
    --module-name $MODULE_NAME \
    --job-dir $MODEL_DIR \
    -- \
    --gpu_memory_growth True \
    --model_dir $MODEL_DIR \
    --log_dir $MODEL_DIR/$LOG_DIR \
    --ckpt_dir $MODEL_DIR/$CKPT_DIR \
    --verbose 2 
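
The dotted module name matters because `task.py` uses a relative import (`from .model import ...`), which only resolves when the file is run as part of its package. The following sketch assumes that the local trainer launches the module in a way comparable to `python -m`. It rebuilds a tiny package under the made-up name `demo_trainer` and runs it both ways.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / 'demo_trainer'          # hypothetical package name
    pkg.mkdir()
    (pkg / '__init__.py').write_text('')
    (pkg / 'model.py').write_text('def greet():\n    return "hello from model"\n')
    (pkg / 'task.py').write_text('from .model import greet\nprint(greet())\n')

    env = {**os.environ, 'PYTHONPATH': tmp}   # make the package findable

    # Run as a module: the package context makes the relative import resolvable.
    as_module = subprocess.run([sys.executable, '-m', 'demo_trainer.task'],
                               cwd=tmp, env=env, capture_output=True, text=True)
    # Run as a plain script: there is no parent package, so the import fails.
    as_script = subprocess.run([sys.executable, str(pkg / 'task.py')],
                               cwd=tmp, env=env, capture_output=True, text=True)

print(as_module.stdout.strip())    # prints hello from model
print(as_script.returncode != 0)   # prints True
```

This is why we pass `trainer.task` rather than a file path as the module name.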

### Training a model on GCP

First of all, we define some variables and export them. The `GOOGLE_APPLICATION_CREDENTIALS` variable is used by the Google APIs. 

In [None]:
# Some project variables
credentials = 'path/to/service-account-key.json'  # placeholder: path to your service account key file
project_id = 'your-project-id'  # placeholder: your project id
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = credentials
os.environ['PROJECT_ID'] = project_id

# Some bucket variables
bucket_name = 'gs://' + project_id + '-2nd-bucket'
region = 'europe-west2'
os.environ['BUCKET_NAME'] = bucket_name
os.environ['BUCKET_REGION'] = region

# Some job variables
os.environ['JOB_NAME'] = model_dir + '_v2'
os.environ['JOB_REGION'] = region   # Running the job in the bucket's region keeps it close to its data
os.environ['JOB_DIR'] = bucket_name + '/' + model_dir

If not already done, we create a bucket on Google Cloud Storage to hold our files.

In [None]:
#%%bash

#gsutil mb -l $BUCKET_REGION -p $PROJECT_ID $BUCKET_NAME

It's time to send a training job to GCP.

In [None]:
%%bash

gcloud ai-platform jobs submit training $JOB_NAME \
    --package-path $PACKAGE_PATH \
    --module-name $MODULE_NAME \
    --region $JOB_REGION \
    --python-version 3.7 \
    --runtime-version 2.1 \
    --job-dir $JOB_DIR \
    -- \
    --model_dir $JOB_DIR \
    --log_dir $JOB_DIR/$LOG_DIR \
    --ckpt_dir $JOB_DIR/$CKPT_DIR \
    --verbose 0
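
A fixed job name can only be used once, since, to our knowledge, AI Platform rejects a job name that already exists in the project. One way around this is to append a timestamp. The sketch below is only an illustration: `make_job_name` is a hypothetical helper, and the naming rule it checks (letters, digits and underscores, starting with a letter) is an assumption that should be verified against the current `gcloud` documentation.

```python
import re
from datetime import datetime, timezone


def make_job_name(prefix, now=None):
    """Hypothetical helper: append a UTC timestamp to get a fresh job name."""
    now = now or datetime.now(timezone.utc)
    name = f'{prefix}_{now:%Y%m%d_%H%M%S}'
    # Assumed naming rule: letters, digits and underscores, starting with a letter.
    if not re.fullmatch(r'[A-Za-z][A-Za-z0-9_]*', name):
        raise ValueError(f'invalid job name: {name}')
    return name


# A fixed timestamp is used here to make the result reproducible.
job_name = make_job_name('mnist_classifier', datetime(2020, 5, 17, 9, 30, 0))
print(job_name)  # prints mnist_classifier_20200517_093000
```

In the notebook, the result could be exported with `os.environ['JOB_NAME'] = make_job_name(model_dir)` before submitting the job.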
 

## 3) Set up a model on GCP

We set up some variables. As before, lowercase names are used for Python variables and uppercase names for Shell variables.

In [None]:
model_name = 'mnist_classifier'
model_version = 'v0001'
model_region = 'europe-west1'  # 'europe-west2' is not available

os.environ['MODEL_NAME'] = model_name
os.environ['VERSION'] = model_version
os.environ['MODEL_REGION'] = model_region

This command creates a model resource, which is only a container for model versions. The trained model itself is deployed in the next step.

In [None]:
%%bash

gcloud ai-platform models create $MODEL_NAME \
    --regions $MODEL_REGION

We create a model version. This is the step in which the trained model is actually deployed, using the files exported to `$JOB_DIR`.

In [None]:
%%bash

gcloud ai-platform versions create $VERSION \
    --model $MODEL_NAME \
    --runtime-version 2.1 \
    --python-version 3.7 \
    --framework tensorflow \
    --origin $JOB_DIR
    

## 4) Prediction with models on GCP

Let us load some images.

In [None]:
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

### Prediction with gcloud

We'll send a JSON file using the command line tool `gcloud`. Each line in the JSON file is one individual sample. The result will be shown on the screen.

In [None]:
import json

with open('prediction_input.json', 'w') as json_file:
    for image in x_test[:3]:
        json.dump(image.tolist(), json_file)
        json_file.write('\n')
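
The file written this way is newline-delimited JSON: every line is a complete JSON document describing one instance. The following sketch checks that property on stand-in data (three all-zero 28x28 images instead of real samples from `x_test`) in a temporary location.

```python
import json
import os
import tempfile

# Stand-in for three 28x28 images; real pixel values would come from x_test.
images = [[[0] * 28 for _ in range(28)] for _ in range(3)]

path = os.path.join(tempfile.mkdtemp(), 'prediction_input.json')
with open(path, 'w') as json_file:
    for image in images:
        json_file.write(json.dumps(image) + '\n')   # one instance per line

# Read the file back: every line must parse as a JSON document of its own.
with open(path) as json_file:
    instances = [json.loads(line) for line in json_file]

print(len(instances), len(instances[0]), len(instances[0][0]))  # prints 3 28 28
```

Because each line is parsed separately, the file as a whole is not a single valid JSON document, which is expected for this input format.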


In [None]:
%%bash 

gcloud ai-platform predict \
    --model $MODEL_NAME \
    --version $VERSION \
    --json-instances prediction_input.json

### Prediction with googleapiclient

To use predictions within Python, we send the request with `googleapiclient` instead. This lets us store the results in a variable that can be used elsewhere. **Note** that `GOOGLE_APPLICATION_CREDENTIALS` must be exported before we send the request.

In [None]:
import googleapiclient.discovery

service = googleapiclient.discovery.build('ml', 'v1').projects()

name = f'projects/{project_id}/models/{model_name}/versions/{model_version}'
body = {'signature': 'serving_default',  # This is optional
        'instances': x_test[:3].tolist()}

response = service.predict(name = name, body = body).execute()  # Don't forget to call  'execute()'

if 'error' in response:
    raise RuntimeError(response['error'])

response['predictions']
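
Since the model ends in a softmax layer, each prediction holds one score per class, and the predicted digit is the index of the largest score. The sketch below shows this step on a made-up response with shortened score lists. The helper `to_labels`, the key `dense_1`, and the two response shapes it accepts are assumptions for illustration. The actual layout of `response['predictions']` should be inspected first.

```python
def to_labels(predictions):
    """Return the index of the largest score for each prediction."""
    labels = []
    for prediction in predictions:
        # Assumption: a prediction is either a plain list of scores or a
        # dict with a single entry whose value is that list.
        if isinstance(prediction, dict):
            prediction = next(iter(prediction.values()))
        labels.append(max(range(len(prediction)), key=prediction.__getitem__))
    return labels


# Made-up response with shortened score lists, for illustration only.
fake_response = {'predictions': [[0.1, 0.7, 0.2], {'dense_1': [0.05, 0.15, 0.8]}]}
labels = to_labels(fake_response['predictions'])
print(labels)  # prints [1, 2]
```

The resulting labels can then be compared with the corresponding entries of `y_test`.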