## Training and Deploying with the AWS SageMaker
In order to productionize the idea of the model to the real world, AWS SageMaker is a no-match to anything. In order to deploy the model to the real world, train and deploy with the AWS SageMaker will require an engineering workflow. 
### Importing the Necessary Libraries.

In [83]:
import pandas as pd
import boto3
import sagemaker
import os

### Getting Hold of the AWS SageMaker credentials, Role and Bucket.
The Current SageMaker Session running throughout this notebook and beyond will be much required to get hold of the underlying bucket, execution role and IAM specifics and specific permissions and privillages of the current user.

In [84]:
session = sagemaker.Session()
role = sagemaker.get_execution_role()
bucket = session.default_bucket()
bucket

'sagemaker-us-west-2-782510500637'

### Uploading to Datasets to S3: This may take significant amount of time because of Large Size of Data.
Clean up of the bucket mentioned above after training the model will be required not to incur additional charges on the AWS Bills or not to exceed the free tier or credits that have been applied to the AWS account.

In [4]:
%%time
data_dir = 'data'
train_prefix = 'train_chest_xray/train'
test_prefix = 'test_chest_xray/test'
#uploading both of these two to S3 for Sagemaker Inference:
train_data = session.upload_data(os.path.join(data_dir, 'workdir'), key_prefix = train_prefix)
test_data = session.upload_data(os.path.join(data_dir, 'testdir'), key_prefix = test_prefix)

In [None]:
empty_check = []
for obj in boto3.resource('s3').Bucket(bucket).objects.all():
    empty_check.append(obj.key)
    print(obj.key)

assert len(empty_check) !=0, 'S3 bucket is empty.'

In [6]:
print(train_data)
print(test_data)

s3://sagemaker-us-west-2-782510500637/train_chest_xray
s3://sagemaker-us-west-2-782510500637/test_chest_xray


In [85]:
train_data = 's3://sagemaker-us-west-2-782510500637/train_chest_xray'

### Training the Model with the SageMaker PyTorch Estimator:
Now we train the model with SageMaker's PyTorch Estimator and pass in the location of the training dataset which is available via the **S3 Bucket** connected to this SageMaker Session. The url to the training data has been provided which will be downloaded in the training instance at the time of training and each of the image file will be passed through a cascade of **Torchvision Transformers** which will resize them and convert them to multidimensional tensors upon which our model can perform stochastic calculations, which will then be converted into Deterministic classification.  

**The training process below will require 30-45 minutes on CUDA and might take 2-3 hours on CPU to complete.**

In [119]:
from sagemaker.pytorch import PyTorch
model_prefix = 'chest_xray_model'
chest_xray_pyt = PyTorch(role = role, 
                         entry_point='train.py',
                         source_dir='sagemaker_scripts', 
                         train_instance_count=1,
                         train_instance_type = 'ml.p2.xlarge', 
                         sagemaker_session = session, 
                         framework_version='0.4.0',
                         hyperparameters={
                             'epochs':3
                         }
                        )                        

In [120]:
%%time
chest_xray_pyt.fit({'train':train_data})

2020-04-10 21:41:20 Starting - Starting the training job...
2020-04-10 21:41:21 Starting - Launching requested ML instances......
2020-04-10 21:42:24 Starting - Preparing the instances for training......
2020-04-10 21:43:44 Downloading - Downloading input data.........
2020-04-10 21:45:14 Training - Downloading the training image..[34mbash: cannot set terminal process group (-1): Inappropriate ioctl for device[0m
[34mbash: no job control in this shell[0m
[34m2020-04-10 21:45:34,353 sagemaker-containers INFO     Imported framework sagemaker_pytorch_container.training[0m
[34m2020-04-10 21:45:34,378 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succeed.[0m

2020-04-10 21:45:36 Training - Training image download completed. Training in progress.[34m2020-04-10 21:45:40,599 sagemaker_pytorch_container.training INFO     Invoking user training script.[0m
[34m2020-04-10 21:45:40,817 sagemaker-containers INFO     Module train does not provide a setup.py

In [None]:
#if no associated training jobs are found, attach the estimator with a training job. 
# chest_xray_pyt = chest_xray_pyt.attach(training_job_name='sagemaker-pytorch-2020-04-09-21-15-35-146', sagemaker_session=session) 

In [121]:
# Creating provisions for Automatic Data Capture :
from time import gmtime, strftime
prefix = 'auto_data_capture'
data_capture_prefix = '{}/datacapture'.format(prefix)
s3_capture_upload_path = 's3://{}/{}'.format(bucket, data_capture_prefix)
reports_prefix = '{}/reports'.format(prefix)
s3_report_path = 's3://{}/{}'.format(bucket,reports_prefix)

from sagemaker.model_monitor import DataCaptureConfig
endpoint_name = 'chest-xray-with-data-capt-'+strftime("%Y-%m-%d-%H-%M-%S", gmtime())

data_capture_config = DataCaptureConfig(enable_capture=True, 
                                        sampling_percentage=70, 
                                        destination_s3_uri=s3_capture_upload_path)

In [122]:
%%time
predictor = chest_xray_pyt.deploy(initial_instance_count=1,
                                  instance_type='ml.m4.xlarge', 
                                  endpoint_name = endpoint_name, 
                                  data_capture_config=data_capture_config)

---------------!CPU times: user 197 ms, sys: 7.77 ms, total: 205 ms
Wall time: 7min 32s


### Performing Inference via the Deployed Endpoint over the Test Dataset:
The following block of code may take significant amount of time as it goes ahead and sends out one single image at a time from the test dataset and performs inference over it and compares both of their labels and calculates overall accuracy. As there are 32 batches in total with 20 images per batch, so it's a time hungry process.

In [123]:
%%time
from torchvision import transforms, datasets
import torch
test_dir = 'data/testdir'
image_transformer = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
test_data = datasets.ImageFolder(test_dir, transform=image_transformer)
batch_size=20
num_workers=0
test_loader = torch.utils.data.DataLoader(test_data,
                                          batch_size=batch_size,
                                          num_workers=num_workers, 
                                          shuffle=True)
dataiter = iter(test_loader)
predictions = []
labels_target = []
for ii in range(len(dataiter)-1):
    images, labels = dataiter.next()
    for i in range(len(images)-1):
        pred = predictor.predict(images[i].unsqueeze_(0))
        pred = pred.argmax()
        predictions.append(pred)
        targ = labels.data[i].item()
        labels_target.append(targ)
from sklearn.metrics import accuracy_score
acc = accuracy_score(predictions, labels_target)
print("Test Accuracy is : ")
print(acc)

Test Accuracy is : 
0.7860780984719864
CPU times: user 25.7 s, sys: 378 ms, total: 26 s
Wall time: 8min 57s


### Performing inference over a single input image:
In order to perfrom the inference over input images the images need to go thorough a series of transformation steps like it needs to be transformed into PIL images then it needs to be resized to 224x224 and after that it needs to be transformed into a tensor. As we feed the model endpoint the transformed image as a multidimensional tensor it returns back probabilities of the input being one of the three classes. We deterministically choose the class using **argmax**.

In [138]:
%%time
import sys
import cv2
classes = ['Bacterial Pneumonia Diagnosed', 'Pneumonia Not Diagnosed', 'Viral Pneumonia Diagnosed']
im = cv2.imread(os.path.join('data/testdir/virus', os.listdir(os.path.join('data/testdir', 'virus'))[60]))
image_transformer = transforms.Compose([transforms.ToPILImage(), transforms.Resize((224, 224)), transforms.ToTensor()])
im = image_transformer(im)
pred = predictor.predict(im.unsqueeze_(0))
sys.stderr.write(classes[pred.argmax()])

CPU times: user 216 ms, sys: 4 ms, total: 220 ms
Wall time: 890 ms


Viral Pneumonia Diagnosed

In [None]:
predictor.delete_endpoint()