# Train a review classifier with BERT and Amazon SageMaker
##### Train a text classifier using a variant of BERT called RoBERTa - a Robustly Optimized BERT Pretraining Approach - within a PyTorch model run as a SageMaker Training Job.

Let's review the Amazon SageMaker "Bring Your Own Script" scheme:

![](sagemaker_scriptmode.png)

In [None]:
# please ignore warning messages during the installation
!pip install --disable-pip-version-check -q sagemaker==2.35.0
!conda install -q -y pytorch==1.6.0 -c pytorch
!pip install --disable-pip-version-check -q transformers==3.5.1

import boto3, sagemaker, pandas as pd, numpy as np, botocore

config = botocore.config.Config(user_agent_extra='dlai-pds/c2/w2')

# low-level service client of the boto3 session
sm = boto3.client(service_name='sagemaker', config=config)
sm_runtime = boto3.client('sagemaker-runtime', config=config)
sess = sagemaker.Session(sagemaker_client=sm, sagemaker_runtime_client=sm_runtime)

bucket = sess.default_bucket()
role = sagemaker.get_execution_role()
region = sess.boto_region_name

import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format='retina'

## Configure dataset, hyper-parameters and evaluation metrics

In [None]:
# Create the train and validation data channels
processed_train_data_s3_uri = "s3://sagemaker-us-east-1-170235698766/sagemaker-scikit-learn-2022-07-26-06-44-43-422/output/sentiment-train/"
processed_validation_data_s3_uri = "s3://sagemaker-us-east-1-170235698766/sagemaker-scikit-learn-2022-07-26-06-44-43-422/output/sentiment-validation/"

# TrainingInput points at data that already exists in S3; it does not upload anything
s3_input_train_data = sagemaker.inputs.TrainingInput(s3_data=processed_train_data_s3_uri)
s3_input_validation_data = sagemaker.inputs.TrainingInput(s3_data=processed_validation_data_s3_uri)

data_channels = {'train': s3_input_train_data, 'validation': s3_input_validation_data}

# SDK v1 equivalent (s3_input was renamed to TrainingInput in SDK v2): train_data = sagemaker.session.s3_input(s3_train_data, distribution="FullyReplicated", content_type="text/plain", s3_data_type="S3Prefix")

# Configure model hyperparameters
max_seq_length = 128  # maximum number of input tokens passed to the BERT model
freeze_bert_layer = False  # whether to freeze the pretrained BERT layers (controls how deep training reaches into the network)
epochs = 3
learning_rate = 2e-5
train_batch_size = 256
train_steps_per_epoch = 50
validation_batch_size = 256
validation_steps_per_epoch = 50
seed = 42
run_validation = True

# Configure the training infrastructure
train_instance_count = 1
train_instance_type = 'ml.p2.xlarge'
train_volume_size = 256  # EBS volume size in GB
input_mode = 'File'

# Hyperparameters argument for the PyTorch estimator
hyperparameters = {
    'max_seq_length': max_seq_length,
    'freeze_bert_layer': freeze_bert_layer,
    'epochs': epochs,
    'learning_rate': learning_rate,
    'train_batch_size': train_batch_size,
    'train_steps_per_epoch': train_steps_per_epoch,
    'validation_batch_size': validation_batch_size,
    'validation_steps_per_epoch': validation_steps_per_epoch,
    'seed': seed,
    'run_validation': run_validation,
}

# Set up evaluation metrics
# Track validation loss and accuracy. Each `Regex` captures the metric value that the training script emits in its logs, and SageMaker publishes it to CloudWatch, where it can be graphed.
metric_definitions = [
    {'Name': 'validation:loss', 'Regex': 'val_loss: ([0-9.]+)'},
    {'Name': 'validation:accuracy', 'Regex': 'val_acc: ([0-9.]+)'},
]
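
To see how these regular expressions pick values out of the training logs, the sketch below applies them to a made-up log line with Python's `re` module. The log format is an assumption (the real format depends on what `train.py` prints), and `extract_metrics` is a hypothetical helper; only the two patterns come from the configuration above.

```python
import re

# The same metric definitions as in the configuration above.
metric_definitions = [
    {'Name': 'validation:loss', 'Regex': 'val_loss: ([0-9.]+)'},
    {'Name': 'validation:accuracy', 'Regex': 'val_acc: ([0-9.]+)'},
]

# Hypothetical log line; the real format depends on what train.py prints.
sample_log_line = "[epoch: 1] val_loss: 0.69, val_acc: 72.50"


def extract_metrics(log_line, definitions):
    """Return {metric name: captured value} for every pattern that matches."""
    found = {}
    for definition in definitions:
        match = re.search(definition['Regex'], log_line)
        if match:
            found[definition['Name']] = float(match.group(1))
    return found


extracted_metrics = extract_metrics(sample_log_line, metric_definitions)
print(extracted_metrics)
```

If a pattern never matches a log line, the corresponding metric will simply not appear, so it is worth checking the patterns against a real log line from the training script.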



### Set up Debugger and Profiler
Amazon SageMaker Debugger can profile machine learning training jobs, helping you identify and fix training issues caused by hardware resource usage. By setting a few parameters on the SageMaker estimator, with no change to the training code, you can collect infrastructure and model metrics such as CPU and GPU utilization, RAM and GPU RAM usage, data loading time, time spent in ML operators running on CPU and GPU, and distributed training metrics. You can also visualize how much time is spent in different phases, such as preprocessing, the training loop, and postprocessing. If needed, you can drill down into each training epoch, and even into each function in your training script.

Define Debugger Rules as described here: https://docs.aws.amazon.com/sagemaker/latest/dg/debugger-built-in-rules.html

In [None]:
from sagemaker.debugger import Rule, ProfilerRule, rule_configs
from sagemaker.debugger import DebuggerHookConfig  # customizes how debugging information is emitted and saved
from sagemaker.debugger import ProfilerConfig, FrameworkProfile  # ProfilerConfig configures the collection of system and framework metrics for Training Jobs

debugger_hook_config = DebuggerHookConfig(s3_output_path='s3://{}'.format(bucket))  # Debugger output is stored in this S3 location

profiler_config = ProfilerConfig(
    system_monitor_interval_millis=500,  # interval for collecting system metrics (in milliseconds)
    framework_profile_params=FrameworkProfile(  # the object for framework metrics profiling
        local_path="/opt/ml/output/profiler/",
        start_step=5,  # step at which to start profiling
        num_steps=10,  # number of steps to profile
    ),
)

# ProfilerReport monitors and profiles the job using the built-in rules.
# It creates a profiling report that is updated whenever an individual rule is triggered.
# Called without custom parameters, as here, it runs all built-in monitoring and profiling rules with their default values.
# The report can be downloaded while the Training Job is running or after it has finished.
rules = [ProfilerRule.sagemaker(rule_configs.ProfilerReport())]

## Train model
##### Set up the RoBERTa and PyTorch script to run on SageMaker

Set up the PyTorch estimator to train the model. For more information on the PyTorch estimator, see the documentation [here](https://sagemaker.readthedocs.io/en/stable/frameworks/pytorch/sagemaker.pytorch.html).

In [None]:
#container = get_image_uri(region, 'xgboost', repo_version='latest')
#bt_model = sagemaker.estimator.Estimator(container, role, train_instance_count, train_instance_type, train_volume_size, train_max_run, input_mode, output_path, sagemaker_session)

from sagemaker.pytorch import PyTorch as PyTorchEstimator

# Define the estimator
estimator = PyTorchEstimator(
    entry_point='train.py',
    source_dir='./c2w2/src',
    role=role,
    instance_count=train_instance_count,
    instance_type=train_instance_type,  # pass "local" to train locally
    volume_size=train_volume_size,
    py_version='py3',  # selects the matching training image (Python 3)
    framework_version='1.6.0',  # selects the matching training image (PyTorch)
    hyperparameters=hyperparameters,
    metric_definitions=metric_definitions,
    input_mode=input_mode,
    debugger_hook_config=debugger_hook_config,
    profiler_config=profiler_config,
    rules=rules,
)

# Launch the estimator to start training
estimator.fit(
    inputs=data_channels,  # train and validation inputs
    wait=False,  # do not wait for the job to complete before continuing
)
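
In script mode, SageMaker passes each entry of `hyperparameters` to the entry point as a command-line argument (for example `--epochs 3`). `train.py` is not shown here, so the following is only a sketch of how such a script might parse them; the `parse_args` helper and the `str_to_bool` converter are hypothetical. Booleans need care because they arrive as the strings `'True'` or `'False'`, and `bool('False')` is `True` in Python.

```python
import argparse


def str_to_bool(value):
    """Convert strings such as 'True' or 'False' to a real boolean."""
    return str(value).strip().lower() in ('true', '1', 'yes')


def parse_args(argv=None):
    """Parse a subset of the hyperparameters (hypothetical sketch of train.py)."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--max_seq_length', type=int, default=128)
    parser.add_argument('--freeze_bert_layer', type=str_to_bool, default=False)
    parser.add_argument('--epochs', type=int, default=3)
    parser.add_argument('--learning_rate', type=float, default=2e-5)
    parser.add_argument('--train_batch_size', type=int, default=256)
    parser.add_argument('--run_validation', type=str_to_bool, default=True)
    # Ignore arguments this sketch does not declare (for example --seed).
    args, _ = parser.parse_known_args(argv)
    return args


example_args = parse_args(
    ['--epochs', '3', '--freeze_bert_layer', 'False', '--learning_rate', '2e-05', '--seed', '42']
)
print(example_args.epochs, example_args.freeze_bert_layer, example_args.learning_rate)
```

Using `parse_known_args` keeps the sketch tolerant of extra arguments that it does not declare.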

In [None]:
print(estimator.latest_training_job.describe().keys())

In [None]:
# Pull the Training Job name and status from the Training Job description.
training_job_description = estimator.latest_training_job.describe()
training_job_name = training_job_description['TrainingJobName']
training_job_status_primary = training_job_description['TrainingJobStatus']
print('Training Job status: {}'.format(training_job_status_primary))

# Review the Training Job in the console.
from IPython.display import display, HTML
display(HTML('<b>Review <a target="_blank" href="https://console.aws.amazon.com/sagemaker/home?region={}#/jobs/{}">Training Job</a></b>'.format(region, training_job_name)))
display(HTML('<b>Review <a target="_blank" href="https://s3.console.aws.amazon.com/s3/buckets/{}/{}/?region={}&tab=overview">S3 output data</a> after the Training Job has completed</b>'.format(bucket, training_job_name, region)))
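
If you prefer to poll for completion yourself instead of blocking, the status check above can be wrapped in a loop. This is a minimal sketch: `wait_for_training_job` is a hypothetical helper that takes any zero-argument callable returning a description dict, so it can be tried without an AWS account. With a real job you could pass `estimator.latest_training_job.describe`.

```python
import time

# Statuses after which a Training Job no longer changes.
TERMINAL_STATUSES = {'Completed', 'Failed', 'Stopped'}


def wait_for_training_job(describe_fn, poll_seconds=30, max_polls=240):
    """Poll describe_fn() until the job reaches a terminal status."""
    for _ in range(max_polls):
        status = describe_fn()['TrainingJobStatus']
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError('Training Job did not finish within the polling budget')


# Stand-in for estimator.latest_training_job.describe, for illustration only.
fake_statuses = iter(['InProgress', 'InProgress', 'Completed'])
final_status = wait_for_training_job(
    lambda: {'TrainingJobStatus': next(fake_statuses)},
    poll_seconds=0,
)
print('Training Job status: {}'.format(final_status))
```

The `max_polls` cap is a design choice of this sketch so that a stuck job cannot keep the loop running indefinitely.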


In [None]:
# Wait until the Training Job launched above completes
#%%time
estimator.latest_training_job.wait(logs=False)

# Query and plot the training metrics
df_metrics = estimator.training_job_analytics.dataframe()
df_metrics.query("metric_name=='validation:accuracy'").plot(x='timestamp', y='value')

# Analyze Debugger results
display(HTML('<b>Review <a target="_blank" href="https://s3.console.aws.amazon.com/s3/buckets/{}?prefix={}/">S3 debugger output data</a></b>'.format(bucket, training_job_name)))


In [None]:
# Download the SageMaker Debugger profiling report
profiler_report_s3_uri = "s3://{}/{}/rule-output/ProfilerReport/profiler-output".format(bucket, training_job_name)
!aws s3 ls $profiler_report_s3_uri/
!aws s3 cp --recursive $profiler_report_s3_uri ./profiler_report/

# Review the downloaded profiler report
display(HTML('<b>Review <a target="_blank" href="./profiler_report/profiler-report.html">profiler report</a></b>'))


## Deploy the model

In [None]:
# Turn off endpoints when they are not in use, for example with a Lambda function.
# Use an inference pipeline for pre- and post-processing: it chains up to 15 containers that run sequentially on the same EC2 instance.
# For example, a feature-processing container passes its output to the model inference container, whose output goes to a post-processing container (which could, say, branch on the confidence level).
from sagemaker.predictor import Predictor
from sagemaker.serializers import JSONLinesSerializer
from sagemaker.deserializers import JSONLinesDeserializer

# Create a custom SentimentPredictor that bundles a JSON Lines serializer and deserializer.
# PyTorchModel takes a predictor class (predictor_cls), so the configuration is wrapped in a class.
class SentimentPredictor(Predictor):
    def __init__(self, endpoint_name, sagemaker_session):
        super().__init__(
            endpoint_name,
            sagemaker_session=sagemaker_session,
            serializer=JSONLinesSerializer(),
            deserializer=JSONLinesDeserializer(),
        )

# Configure a model for deployment
import time
from sagemaker.pytorch.model import PyTorchModel
timestamp = int(time.time())
pytorch_model_name = '{}-{}-{}'.format(training_job_name, 'pt', timestamp)

model = PyTorchModel(name=pytorch_model_name, model_data=estimator.model_data, predictor_cls=SentimentPredictor, entry_point='inference.py',
                     source_dir='src', framework_version='1.6.0', py_version='py3', role=role)

# Deploy the model to an endpoint
pytorch_endpoint_name = '{}-{}-{}'.format(training_job_name, 'pt', timestamp)
#%%time
predictor = model.deploy(initial_instance_count=1, instance_type='ml.m5.large', endpoint_name=pytorch_endpoint_name)  # pass instance_type="local" for local deployment

# Review the endpoint in the AWS console
display(HTML('<b>Review <a target="_blank" href="https://console.aws.amazon.com/sagemaker/home?region={}#/endpoints/{}">SageMaker REST Endpoint</a></b>'.format(region, pytorch_endpoint_name)))


## Test model
Pass a few sample review strings to the endpoint and inspect the predicted sentiment.

In [None]:
inputs = [
    {"features": ["I love this product!"]},
    {"features": ["OK, but not great."]},
    {"features": ["This is not the right product."]},
]

predictor = SentimentPredictor(endpoint_name=pytorch_endpoint_name, sagemaker_session=sess)
predicted_classes = predictor.predict(inputs)

for predicted_class in predicted_classes:
    print("Predicted class {} with probability {}".format(predicted_class['predicted_label'], predicted_class['probability']))
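
JSON Lines puts one JSON document on each line. The sketch below uses only the standard `json` module to approximate what the serializer sends and how a response is split back into records; it does not call SageMaker. The helpers `to_json_lines` and `from_json_lines` are hypothetical, and the response text is a made-up example that reuses the `predicted_label` and `probability` fields read above.

```python
import json

inputs = [
    {"features": ["I love this product!"]},
    {"features": ["OK, but not great."]},
    {"features": ["This is not the right product."]},
]


def to_json_lines(records):
    """Serialize a list of dicts as one JSON document per line."""
    return "\n".join(json.dumps(record) for record in records)


def from_json_lines(text):
    """Parse JSON Lines text back into a list of Python objects."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


request_body = to_json_lines(inputs)
print(request_body)

# Made-up response body, for illustration only.
sample_response = (
    '{"predicted_label": 1, "probability": 0.97}\n'
    '{"predicted_label": 0, "probability": 0.55}'
)
for record in from_json_lines(sample_response):
    print("Predicted class {} with probability {}".format(record['predicted_label'], record['probability']))
```

Because each line is a complete JSON document, one request can carry several reviews and the response can be read back record by record.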