## Amazon SageMaker DICOM Training Overview

This example shows how to integrate the [MONAI](http://monai.io) framework into Amazon SageMaker. It includes sample code for MONAI pre-processing transforms and a neural network (DenseNet) that you can use to train a medical image classification model directly on DICOM images.

See also [Build a medical image analysis pipeline on Amazon SageMaker using the MONAI framework](https://aws.amazon.com/blogs/industries/build-a-medical-image-analysis-pipeline-on-amazon-sagemaker-using-the-monai-framework/) for additional details on how to deploy the MONAI model, pipe input data from S3, and run batch inference using SageMaker batch transform.

For more information about PyTorch in SageMaker, please visit the [sagemaker-pytorch-containers](https://github.com/aws/sagemaker-pytorch-containers) and [sagemaker-python-sdk](https://github.com/aws/sagemaker-python-sdk) GitHub repositories.

The sample dataset comes from [COVID-CT-MD](https://github.com/ShahinSHH/COVID-CT-MD). The full dataset contains volumetric chest CT scans (DICOM files) of 169 patients positive for COVID-19 infection, 60 patients with CAP (Community Acquired Pneumonia), and 76 normal patients. For this demo, only 26 images were randomly selected. The selection and preprocessing steps are not included in this demo.

## Install necessary libraries

In [3]:
import os
os.getcwd()

'/home/ec2-user/SageMaker/MONAI-MedicalImage-SageMaker/Classification'

In [4]:
!pip install -r ./code/requirements.txt
!mkdir -p data



In [1]:
!pip install --upgrade torch torchvision  ## upgrade torchvision to ensure consistent performance

Collecting torch
  Using cached torch-1.10.2-cp36-cp36m-manylinux1_x86_64.whl (881.9 MB)


In [2]:
import os
from pathlib import Path
from dotenv import load_dotenv
import sagemaker 

sess = sagemaker.Session()
env_path = Path('.') / 'set.env'
load_dotenv(dotenv_path=env_path)

bucket=sess.default_bucket() ## replace with <your bucket for the dataset>
bucket_path=os.environ.get('BUCKET_PATH')
user=os.environ.get('DICOM_USER')
password = os.environ.get('DICOM_PASSWORD')
datadir = 'data'
print('Bucket: '+bucket)

Bucket: sagemaker-us-east-1-741261399688


## Upload training dataset to S3

For this demo, we use only 25 images for model training. The images have already been downloaded and saved in the `data` folder:

+ `*.dcm` files are the DICOM images
+ `manifest.json` stores the label for each image

In [22]:
image_file_list = os.listdir(datadir)
image_file_list = [x for x in image_file_list if x.endswith('dcm')]

## Preprocess the dataset and display them

In [3]:
import monai
from monai.transforms import Compose, LoadImage, Resize, ScaleIntensity, ToTensor, SqueezeDim, RandRotate, RandFlip, RandZoom
import matplotlib.pyplot as plt
# Define the transform functions.
## Preprocess the dataset with MONAI before training. Based on img.shape, this is a channel-last image.
train_transforms = Compose([
    LoadImage(image_only=True),
    ScaleIntensity(),
    RandRotate(range_x=15, prob=0.5, keep_size=True),
    RandFlip(spatial_axis=0, prob=0.5),
    #RandZoom(min_zoom=0.9, max_zoom=1.1, prob=0.5, keep_size=True),
    Resize(spatial_size=(512,-1)),
    ToTensor()
])
img = train_transforms(datadir+'/'+image_file_list[0])
img.shape ## check image size after preprocessing

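`Resize(spatial_size=(512, -1))` relies on MONAI's convention that a non-positive entry in `spatial_size` keeps the input's own size along that axis. The pure-Python sketch below only illustrates that rule. `resolve_spatial_size` is a hypothetical helper, not a MONAI function, and the example image size is made up.

```python
def resolve_spatial_size(requested, image_size):
    """Replace non-positive entries of `requested` with the matching image dimension."""
    return tuple(img if req <= 0 else req for req, img in zip(requested, image_size))


# Assumed example: a 256 x 300 image resized with spatial_size=(512, -1).
resolved = resolve_spatial_size((512, -1), (256, 300))
print(resolved)
```

With this rule, only the first spatial dimension is forced to 512 and the second one is left unchanged.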

In [None]:
#Display sample of DICOM Images
inf_test = []
inf_test_label = []

trans = Compose([LoadImage(image_only=True), Resize(spatial_size=(512,-1))])
plt.subplots(2, 2, figsize=(8, 8))
for i in range(0,4):
    #s3.download_file(bucket, image_file_list[i], datadir+'/'+image_file_list[i])
    
    img = trans(datadir+'/'+image_file_list[i])
    print(img.shape)
    plt.subplot(2, 2, i + 1)
    plt.xlabel(image_file_list[i])
    plt.imshow(img, cmap='gray')
    plt.title(image_file_list[i].split('-')[0])
    inf_test.append(datadir+'/'+image_file_list[i])
    inf_test_label.append(image_file_list[i].split('-')[0])
    
plt.tight_layout()
plt.show()

In [5]:
datadir_test='test_data'

image_file_list=os.listdir(datadir_test)
image_file_list


['normal-IM0062.dcm', 'cap-IM0032.dcm', 'covid-IM0073.dcm', 'cap-IM0064.dcm']
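The test files listed above carry the class label as a file-name prefix, which later cells recover with `split('-')[0]`. As a minimal sketch, assuming every test file follows that `<label>-<id>.dcm` pattern, the helper below derives the labels and counts the images per class. The name `labels_from_filenames` is hypothetical and not part of the repository.

```python
from collections import Counter


def labels_from_filenames(filenames):
    """Return the label prefix of each '<label>-<id>.dcm' file name."""
    return [name.split('-')[0] for name in filenames]


# File names copied from the listing above.
files = ['normal-IM0062.dcm', 'cap-IM0032.dcm', 'covid-IM0073.dcm', 'cap-IM0064.dcm']
labels = labels_from_filenames(files)
label_counts = Counter(labels)
print(labels)
print(label_counts)
```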

## Data

### Create SageMaker session and S3 location for DICOM dataset

In [31]:
import sagemaker
from sagemaker.s3 import S3Downloader, S3Uploader
import os

sess = sagemaker.Session()
role = sagemaker.get_execution_role()
#inputs='s3://dataset-pathology/CovidTrainingV2'

bucket = sess.default_bucket()

key = 'CovidTraining'
path = os.path.join("s3://", bucket, key)
#S3Uploader.upload('./data', path) 


## Upload the data to S3. You can skip this step if the data has already been uploaded.
inputs = sess.upload_data(path=datadir, bucket=bucket, key_prefix=key)

print('input spec as an S3 path: {}'.format(inputs))

## Train Model
### Training

The ```monai_dicom.py``` script provides all the code we need for training and hosting a SageMaker model (including a `model_fn` function to load the model). The training script is very similar to one you might run outside of SageMaker, but it can read useful properties of the training environment from environment variables, such as:

* SM_MODEL_DIR: A string representing the path to the directory to write model artifacts to. These artifacts are uploaded to S3 for model hosting.
* SM_NUM_GPUS: The number of GPUs available in the current container.
* SM_CURRENT_HOST: The name of the current container on the container network.
* SM_HOSTS: A JSON-encoded list containing all the hosts.

Supposing one input channel, 'training', was used in the call to the PyTorch estimator's fit() method, the following variable will be set, following the format SM_CHANNEL_[channel_name]:

* SM_CHANNEL_TRAINING: A string representing the path to the directory containing data in the 'training' channel.

For more information about training environment variables, please visit [SageMaker Containers](https://github.com/aws/sagemaker-containers).

A typical training script loads data from the input channels, configures training with hyperparameters, trains a model, and saves the model to model_dir so that it can be hosted later. Hyperparameters are passed to your script as arguments and can be retrieved with an argparse.ArgumentParser instance.
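The pattern described above can be sketched as follows. This is not the repository's training script. The argument names and the local fallback values are assumptions, chosen so the snippet also runs outside a SageMaker container.

```python
import argparse
import json
import os


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Hyperparameters arrive as command-line arguments.
    parser.add_argument('--epochs', type=int, default=10)
    parser.add_argument('--backend', type=str, default=None)
    # Environment properties come from SM_* variables; the fallbacks are
    # placeholders for running the sketch locally.
    parser.add_argument('--model-dir', type=str,
                        default=os.environ.get('SM_MODEL_DIR', './model'))
    parser.add_argument('--data-dir', type=str,
                        default=os.environ.get('SM_CHANNEL_TRAINING', './data'))
    parser.add_argument('--hosts', type=json.loads,
                        default=json.loads(os.environ.get('SM_HOSTS', '["localhost"]')))
    parser.add_argument('--current-host', type=str,
                        default=os.environ.get('SM_CURRENT_HOST', 'localhost'))
    parser.add_argument('--num-gpus', type=int,
                        default=int(os.environ.get('SM_NUM_GPUS', 0)))
    return parser.parse_args(argv)


# Simulate the arguments SageMaker would pass for the hyperparameters.
args = parse_args(['--epochs', '100', '--backend', 'gloo'])
print(args.epochs, args.backend, args.model_dir)
```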

In [26]:
!pygmentize source/monai_dicom_json.py

# Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: MIT-0

import argparse
import json
import logging
import os
import sys
import torch
import torch.distributed as dist
import torch.nn as nn
import torch.nn.functional as F
import torch.opt

## Run training in SageMaker

The `PyTorch` class allows us to run our training function as a training job on SageMaker infrastructure. We need to configure it with our training script, an IAM role, the number of training instances, the training instance type, and hyperparameters. In this case we run the training job on an ```ml.m5.2xlarge``` instance, but this example can be run on one or multiple CPU or GPU instances ([full list of available instances](https://aws.amazon.com/sagemaker/pricing/instance-types/)). The `hyperparameters` parameter is a dict of values that will be passed to your training script; you can see how to access these values in the ```monai_dicom.py``` script above.

In [33]:
from sagemaker.pytorch import PyTorch

estimator = PyTorch(entry_point='train.py',
                    source_dir='code',
                    role=role,
                    framework_version='1.5.0',
                    py_version='py3',
                    instance_count=1,
                    instance_type='ml.m5.2xlarge',
                    hyperparameters={
                        'backend': 'gloo',
                        'epochs': 100
                    })

After we've constructed our PyTorch object, we can fit it using the DICOM dataset we uploaded to S3.

In [None]:
%%time
estimator.fit({'train': inputs})

2022-04-22 12:50:12 Starting - Starting the training job...ProfilerReport-1650631812: InProgress
...
2022-04-22 12:51:06 Starting - Preparing the instances for training......
2022-04-22 12:52:07 Downloading - Downloading input data...
2022-04-22 12:52:26 Training - Downloading the training image...
2022-04-22 12:53:08 Training - Training image download completed. Training in progress..
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2022-04-22 12:53:10,505 sagemaker-training-toolkit INFO     Imported framework sagemaker_pytorch_container.training
2022-04-22 12:53:10,519 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2022-04-22 12:53:10,529 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succeed.
2022-04-22 12:53:10,534 sagemaker_pytorch_container

Distributed training - False
Number of gpus available - 0
Training count = 25
Get train data loader
----------
epoch 1/100
inputs shape is ----- torch.Size([25, 512, 512, 1])
inputs shape after is ----- torch.Size([25, 1, 512, 512])
1/0, train_loss: 1.1249
epoch 1 average loss: 1.1249
----------
epoch 2/100
inputs shape is ----- torch.Size([25, 512, 512, 1])
inputs shape after is ----- torch.Size([25, 1, 512, 512])
1/0, train_loss: 1.1110
epoch 2 average loss: 1.1110
----------
epoch 3/100
inputs shape is ----- torch.Size([25, 512, 512, 1])
inputs shape after is ----- torch.Size([25, 1, 512, 512])
1/0, train_loss: 1.1044
epoch 3 average loss: 1.1044
----------
epoch 4/100
inputs shape is ----- torch.Size([25, 512, 512, 1])
inputs shape after is ----- torch.Siz

1/0, train_loss: 0.5861
epoch 70 average loss: 0.5861
----------
epoch 71/100
inputs shape is ----- torch.Size([25, 512, 512, 1])
inputs shape after is ----- torch.Size([25, 1, 512, 512])
1/0, train_loss: 0.6204
epoch 71 average loss: 0.6204
----------
epoch 72/100
inputs shape is ----- torch.Size([25, 512, 512, 1])
inputs shape after is ----- torch.Size([25, 1, 512, 512])
1/0, train_loss: 0.5834
epoch 72 average loss: 0.5834
----------
epoch 73/100
inputs shape is ----- torch.Size([25, 512, 512, 1])
inputs shape after is ----- torch.Size([25, 1, 512, 512])
1/0, train_loss: 0.6370
epoch 73 average loss: 0.6370
----------
epoch 74/100
inputs shape is ----- torch.Size([25, 512, 512, 1])
inputs shape after is ----- torch.Size([25, 1, 512, 512])
1/0, train_loss: 0.5253
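The training log reports one `epoch N average loss` line per epoch. As a small sketch, those values can be pulled out of captured log text with a regular expression, for example to plot the loss curve afterwards. `average_losses` is a hypothetical helper, and the sample lines are copied from the log output above.

```python
import re

LOSS_PATTERN = re.compile(r'epoch (\d+) average loss: ([0-9.]+)')


def average_losses(log_text):
    """Map each epoch number to the average loss reported in the log text."""
    return {int(epoch): float(loss) for epoch, loss in LOSS_PATTERN.findall(log_text)}


sample_log = """
epoch 1 average loss: 1.1249
epoch 2 average loss: 1.1110
epoch 3 average loss: 1.1044
epoch 70 average loss: 0.5861
"""
losses = average_losses(sample_log)
print(losses)
```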

## Deploy the endpoint 

+ Default inference with a `numpy` array as input

+ Customized inference with a `JSON` payload pointing to the image file in S3 [./source/inference.py]

For further information, you may refer to the [pytorch-inference-handler](https://github.com/aws/sagemaker-pytorch-inference-toolkit/blob/master/src/sagemaker_pytorch_serving_container/default_pytorch_inference_handler.py).

In [42]:
## Default inference handler (NumPy input)
predictor = estimator.deploy(initial_instance_count=1, instance_type='ml.m5.xlarge')

-------!

In [None]:
model_data = estimator.model_data  ## S3 URI of the model.tar.gz produced by the training job

In [None]:
## Custom inference script (BYOS)
from sagemaker.pytorch.model import PyTorchModel


model = PyTorchModel(
    entry_point="inference.py", ## inference code with customization
    source_dir="code",          ## folder with the inference code
    role=role,
    model_data=model_data,
    framework_version="1.5.0",
    py_version="py3",
)
## entry_point and source_dir are already set on the model, so deploy() does not need them again
predictor2 = model.deploy(initial_instance_count=1, instance_type='ml.m5.xlarge',
                          serializer=sagemaker.serializers.JSONSerializer(),
                          deserializer=sagemaker.deserializers.JSONDeserializer())

## Inference with deployed endpoints

+ use a `NumPy` array as input for `predictor`

+ use a `JSON` payload as input for `predictor2`

### Option 1: Using the default handler

In [15]:
image_file_list
inf_test_label=[x.split('-')[0] for x in image_file_list]
inf_test_label

['normal', 'cap', 'covid', 'cap']

In [25]:
from code.train import DICOMDataset
import torch

def get_val_data_loader(valX, valY):
    val_transforms = Compose([
    LoadImage(image_only=True),
    ScaleIntensity(),
    Resize(spatial_size=(512,-1)),
    ToTensor()
    ])
    
    dataset = DICOMDataset(valX, valY, val_transforms)
    return torch.utils.data.DataLoader(dataset, batch_size=1, num_workers=1)

## os.listdir returns bare file names, so prepend the folder before loading
test_files = [os.path.join(datadir_test, x) for x in image_file_list]
val_loader = get_val_data_loader(test_files, inf_test_label)

In [26]:
class_names = ["Normal", "Cap", "Covid"]
for i, val_data in enumerate(val_loader):
    inputs = val_data[0].permute(0, 3, 2, 1)  ## channel-last to channel-first
    print(inputs)
    response = predictor.predict(inputs.numpy())
    pred = torch.nn.functional.softmax(torch.tensor(response), dim=1)
    top_p, top_class = torch.topk(pred, 1)
    print('actual class is ', val_data[1])
    
    print("predicted probability: ", pred)
    print('predicted class: ' + class_names[top_class.item()])
    print('predicted class probability: ' + str(round(top_p.item(), 2)))
    print()
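The prediction loop above applies a softmax to the endpoint response and then picks the most probable class. The same post-processing in plain Python, as an illustrative sketch: the logit values are made up, and the class order is assumed to match `class_names` in the loop.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class_names = ["Normal", "Cap", "Covid"]
logits = [2.0, 0.5, -1.0]  # made-up values for illustration
probs = softmax(logits)
# Index of the highest probability, equivalent to a top-1 selection.
top_index = max(range(len(probs)), key=probs.__getitem__)
print(class_names[top_index], round(probs[top_index], 2))
```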


### Option 2: Using the custom handler with image input in S3

In [6]:
datadir_test


'test_data'

In [7]:
key = 'test_data'
#S3Uploader.upload('./data', path) 


## Upload the test data to S3. You can skip this step if the data has already been uploaded.
test_input = sess.upload_data(path=datadir_test, bucket=bucket, key_prefix=key)



In [9]:
from sagemaker.predictor import Predictor
sess = sagemaker.Session()
predictor2 = Predictor(endpoint_name='pytorch-inference-2022-04-22-13-30-08-191', sagemaker_session=sess,
                       serializer=sagemaker.serializers.JSONSerializer(),
                       deserializer=sagemaker.deserializers.JSONDeserializer())

In [11]:
bucket

'sagemaker-us-east-1-741261399688'

In [12]:
%%time
payload={"bucket": bucket,
    "key":"test_data/normal-IM0062.dcm"}

predictor2.predict(payload)

CPU times: user 4.48 ms, sys: 333 µs, total: 4.81 ms
Wall time: 2.67 s


{'results': {'class': 'Normal', 'probability': 0.89}}
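The custom endpoint accepts a JSON payload with `bucket` and `key` fields. The repository's `inference.py` is not shown here, so the following is only a sketch of how an `input_fn` hook could validate such a payload before fetching the object from S3. The function body, the error handling, and the bucket name `my-bucket` are assumptions.

```python
import json


def input_fn(request_body, content_type='application/json'):
    """Parse the JSON request and return the S3 location of the DICOM file."""
    if content_type != 'application/json':
        raise ValueError('Unsupported content type: {}'.format(content_type))
    payload = json.loads(request_body)
    missing = [field for field in ('bucket', 'key') if field not in payload]
    if missing:
        raise ValueError('Missing fields in payload: {}'.format(missing))
    # A real handler would download s3://<bucket>/<key> here (for example with
    # boto3) and apply the validation transforms before calling the model.
    return payload['bucket'], payload['key']


body = json.dumps({"bucket": "my-bucket", "key": "test_data/normal-IM0062.dcm"})
location = input_fn(body)
print(location)
```

Keeping the payload small (a pointer to the object rather than the pixel data) avoids sending large image arrays through the endpoint request.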