## TensorFlow script mode training and serving
Script mode is a training script format for TensorFlow that lets you execute any TensorFlow training script in SageMaker with minimal modification. The SageMaker Python SDK handles transferring the script to a SageMaker training instance. On the training instance, SageMaker's native TensorFlow support sets up training-related environment variables and executes the training script. This notebook uses the SageMaker Python SDK to launch a training job and deploy the trained model.

### Set up the environment
Let's start by setting up the environment:

In [None]:
import os
import sagemaker
from sagemaker import get_execution_role

sagemaker_session = sagemaker.Session()

role = get_execution_role()
region = sagemaker_session.boto_session.region_name

### Training Data
The MNIST dataset has been loaded to the public S3 buckets 'sagemaker-sample-data-<REGION>' under the prefix 'tensorflow/mnist'. There are four .npy files under this prefix:

- train_data.npy
- eval_data.npy
- train_labels.npy
- eval_labels.npy

In [None]:
training_data_uri = 's3://sagemaker-sample-data-{}/tensorflow/mnist'.format(region)

In [None]:
!aws s3 ls $training_data_uri/
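
If you would rather work with the location from Python than from the CLI, the URI first has to be split into a bucket name and a key prefix. The following is a minimal sketch using only the standard library; the helper name `split_s3_uri` and the region `us-west-2` are assumptions for illustration and are not part of this notebook.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/some/prefix' into ('bucket', 'some/prefix').

    Hypothetical helper, shown only to illustrate the URI layout.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("not an S3 URI: {!r}".format(uri))
    # The path starts with '/', which is not part of the S3 key prefix
    return parsed.netloc, parsed.path.lstrip("/")


example_region = "us-west-2"  # assumption: substitute your own region
uri = "s3://sagemaker-sample-data-{}/tensorflow/mnist".format(example_region)
bucket, prefix = split_s3_uri(uri)
print(bucket, prefix)
```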

### Construct a script for distributed training
The cells below download the training script, mnist-tf2.py, and display its contents.

In [None]:
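# Fetch the raw file; the github.com/.../blob/... URL returns an HTML page, not the script
!wget https://raw.githubusercontent.com/srushtii-m/MLOps-With-AWS/main/Custom_Model/mnist-tf2.py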

In [None]:
# TensorFlow 2.x script
!pygmentize 'mnist-tf2.py'

### Create a training job using the TensorFlow estimator
The sagemaker.tensorflow.TensorFlow estimator handles locating the script mode container, uploading our script to an S3 location, and creating a SageMaker training job.

`distribution` configures the distributed training setup. It is required only if you are doing distributed training, either across a cluster of instances or across multiple GPUs. This example uses parameter servers as the distributed training scheme. SageMaker training jobs run on homogeneous clusters. To make parameter servers more performant in the SageMaker setup, a parameter server runs on every instance in the cluster, so there is no need to specify the number of parameter servers to launch.
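
To make the "one parameter server per instance" layout concrete, here is a minimal sketch of a TF_CONFIG-style cluster description. It only illustrates the idea and is not the container's actual code: the function name `build_cluster_spec`, the host names `algo-1` and `algo-2`, and the port numbers are all assumptions.

```python
import json


def build_cluster_spec(hosts, worker_port=2222, ps_port=2223):
    """Return a cluster dict with one parameter server on every host.

    Hypothetical helper: host names and ports are illustrative assumptions.
    """
    if not hosts:
        raise ValueError("at least one host is required")

    # The first host coordinates training; the rest act as workers
    cluster = {"master": ["{}:{}".format(hosts[0], worker_port)]}
    if len(hosts) > 1:
        cluster["worker"] = ["{}:{}".format(h, worker_port) for h in hosts[1:]]
        # Every instance also hosts a parameter server on a separate port
        cluster["ps"] = ["{}:{}".format(h, ps_port) for h in hosts]
    return cluster


cluster = build_cluster_spec(["algo-1", "algo-2"])
print(json.dumps(cluster, indent=2))
```

With two instances, the sketch produces two entries under "ps", one per host, which is why the number of parameter servers never has to be given explicitly.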

`instance_type` specifies the EC2 instance type used for training. This example uses 'ml.m5.large'.

The estimator below trains with a TensorFlow 2.1 script. The script name is set with `entry_point` and the TensorFlow version with `framework_version`.

In [None]:
from sagemaker.tensorflow import TensorFlow
mnist_estimator = TensorFlow(entry_point='mnist-tf2.py',
                             role=role,
                             instance_count=2,
                             instance_type='ml.m5.large',
                             framework_version='2.1.0',
                             py_version='py3',
                             distribution={'parameter_server': {'enabled': True}})

### Calling fit
To start a training job, we call estimator.fit(training_data_uri).

An S3 location is used as the input. fit creates a default channel named 'training', which points to this S3 location. In the training script we can then access the training data from the location stored in SM_CHANNEL_TRAINING. fit accepts a couple of other input types as well.

When training starts, the TensorFlow container executes mnist-tf2.py, passing hyperparameters and model_dir from the estimator as script arguments. Because we didn't define either in this example, no hyperparameters are passed, and model_dir defaults to 's3://<DEFAULT_BUCKET>/<TRAINING_JOB_NAME>', so the script execution is as follows:

python mnist-tf2.py --model_dir s3://<DEFAULT_BUCKET>/<TRAINING_JOB_NAME>
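
On the script side, this invocation could be handled roughly as in the sketch below. It is an assumption about how a script-mode entry point might read its inputs, not the contents of mnist-tf2.py: the `--train` argument name and the `parse_args` helper are hypothetical, while `--model_dir` and SM_CHANNEL_TRAINING come from the description above.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the arguments the container passes to the training script."""
    parser = argparse.ArgumentParser()
    # Passed by the container as: --model_dir s3://<DEFAULT_BUCKET>/<TRAINING_JOB_NAME>
    parser.add_argument("--model_dir", type=str, default=None)
    # Hypothetical argument: defaults to the 'training' channel location
    parser.add_argument(
        "--train", type=str, default=os.environ.get("SM_CHANNEL_TRAINING")
    )
    # parse_known_args tolerates extra arguments this sketch does not declare
    args, _unknown = parser.parse_known_args(argv)
    return args


# Simulate the command line shown above
args = parse_args(["--model_dir", "s3://<DEFAULT_BUCKET>/<TRAINING_JOB_NAME>"])
print("model_dir:", args.model_dir)
print("training data:", args.train)
```

Any hyperparameters set on the estimator would arrive as further `--name value` pairs and could be declared with additional `add_argument` calls.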

When training is complete, the training job will upload the saved model for TensorFlow serving.

Call fit to train a model with the TensorFlow 2.1 script:

In [None]:
mnist_estimator.fit(training_data_uri)

### Deploy the trained model to an endpoint
The deploy() method creates a SageMaker model, which is then deployed to an endpoint to serve prediction requests in real time. Because we trained with script mode, the endpoint uses the TensorFlow Serving container. This serving container runs a web server that is compatible with the SageMaker hosting protocol.

In [None]:
predictor = mnist_estimator.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge')

### Invoke the endpoint
Download the training data and use it as input for inference.

In [None]:
import numpy as np

!aws --region {region} s3 cp s3://sagemaker-sample-data-{region}/tensorflow/mnist/train_data.npy train_data.npy
!aws --region {region} s3 cp s3://sagemaker-sample-data-{region}/tensorflow/mnist/train_labels.npy train_labels.npy

train_data = np.load('train_data.npy')
train_labels = np.load('train_labels.npy')

The formats of the input and the output data correspond directly to the request and response formats of the Predict method in the TensorFlow Serving REST API. SageMaker's TensorFlow Serving endpoints can also accept additional input formats that are not part of the TensorFlow Serving REST API, including the simplified JSON format, line-delimited JSON objects ("jsons" or "jsonlines"), and CSV data.
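
For reference, the row format of a Predict request wraps the inputs in an "instances" list, and the response returns its results under "predictions". The sketch below builds and parses such payloads with the standard library only; the helper names and the tiny four-value inputs are made up for illustration, and the response is hand-written rather than returned by an endpoint.

```python
import json


def build_predict_request(instances):
    """Serialize a batch of inputs in the row ("instances") format."""
    return json.dumps({"instances": instances})


def parse_predict_response(body):
    """Return the list stored under "predictions" in a Predict response."""
    return json.loads(body)["predictions"]


# Two tiny fake inputs of four values each (illustrative data only)
request_body = build_predict_request([[0.0, 0.5, 1.0, 0.0], [1.0, 0.0, 0.0, 0.5]])
print(request_body)

# A hand-written response with made-up scores, shaped like a Predict reply
fake_response = '{"predictions": [[0.1, 0.9], [0.8, 0.2]]}'
print(parse_predict_response(fake_response))
```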

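In [None]:
# Send the first 50 training images to the endpoint
predictions = predictor.predict(train_data[:50])
# Predicted digit for the first image: index of the highest score
np.argmax(predictions['predictions'][0])

In [None]:
# Compare each predicted digit with its label
for i in range(0, 50):
    prediction = np.argmax(predictions['predictions'][i])
    label = train_labels[i]
    print('prediction is {}, label is {}, matched: {}'.format(prediction, label, prediction == label))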
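
The per-sample comparison can also be condensed into a single accuracy figure. The following is a small sketch that works on any response shaped like the one above; the helper names are hypothetical and the scores and labels are made up, so it runs without an endpoint.

```python
def argmax(scores):
    """Index of the largest value in a list of scores."""
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(response, labels):
    """Fraction of samples whose highest-scoring class equals the label."""
    if len(labels) == 0:
        raise ValueError("labels must not be empty")
    predicted = [argmax(scores) for scores in response["predictions"]]
    matches = sum(int(p == int(label)) for p, label in zip(predicted, labels))
    return matches / len(labels)


# Made-up response and labels for three samples and two classes
fake_response = {"predictions": [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]}
fake_labels = [1, 0, 0]
acc = accuracy(fake_response, fake_labels)
print("accuracy: {:.2f}".format(acc))
```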

### Delete the endpoint

In [None]:
predictor.delete_endpoint()