## Step 0
### Get to know the environment
- Run Bash commands from this notebook
- Go up and down the directory tree
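
The two bullets above can be sketched as follows. This is a minimal example, assuming a Unix-like environment where the `pwd` command exists. In a notebook cell the same command can be run with the `!pwd` shorthand; the plain-Python form is used here so the cell also runs outside IPython.

```python
import os
import subprocess
from pathlib import Path

# Run a shell command and capture its output (notebook shorthand: !pwd)
result = subprocess.run(["pwd"], capture_output=True, text=True, check=True)
print("pwd says:", result.stdout.strip())

# Remember where we started and list a few entries in that directory
start = Path.cwd()
print("Contents:", sorted(os.listdir(start))[:5])

# Go up one level in the directory tree, then come back down
os.chdir(start.parent)
print("Moved up to:", Path.cwd())
os.chdir(start)
print("Back in:", Path.cwd())
```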

## Step 1
### Download the Kaggle Chest X-ray (Pneumonia) Dataset
- Create a Kaggle account
- Go to the [account](https://www.kaggle.com/udacityinc/account) page.
- Create and download an API token to your personal system.

## Step 2
### Install the Kaggle API \[[Reference](https://www.kaggle.com/docs/api#installation)\]
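
The linked reference installs the client with `pip install kaggle`. The sketch below checks whether the package is already available to the notebook kernel before installing. The helper name `is_installed` is hypothetical, and the install call is left commented out so the cell has no side effects until you enable it.

```python
import importlib.util
import subprocess
import sys

def is_installed(package):
    """Return True if a top-level package can be found by this Python interpreter."""
    return importlib.util.find_spec(package) is not None

# Use the kernel's own interpreter so the package lands in the right environment
install_cmd = [sys.executable, "-m", "pip", "install", "kaggle"]

if is_installed("kaggle"):
    print("kaggle is already installed")
else:
    print("kaggle is missing; install it with:", " ".join(install_cmd))
    # subprocess.run(install_cmd, check=True)  # uncomment to install
```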

## Step 3 
### Set up Kaggle API token \[[Reference](https://www.kaggle.com/docs/api#authentication)\]
- Move the Kaggle API token to a directory named `.kaggle` inside the home directory 

Check the directory we are in.

Create the hidden directory `.kaggle` inside the home directory

Check that the directory has been created.

From the GUI, upload the `kaggle.json` API token file to the current directory,
then move it to the newly created directory.
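
The steps above (check the current directory, create `.kaggle`, confirm it exists, move the token) can be sketched in one cell. This assumes `kaggle.json` was uploaded next to the notebook. The function name `install_kaggle_token` and its `home` parameter are hypothetical, added so the destination can be overridden.

```python
import shutil
from pathlib import Path

def install_kaggle_token(token_path="kaggle.json", home=None):
    """Move the uploaded token into <home>/.kaggle and return its new path."""
    home = Path(home) if home is not None else Path.home()
    kaggle_dir = home / ".kaggle"
    # Create the hidden directory; do nothing if it already exists
    kaggle_dir.mkdir(parents=True, exist_ok=True)
    destination = kaggle_dir / "kaggle.json"
    shutil.move(str(token_path), str(destination))
    return destination

print("Current directory:", Path.cwd())
if Path("kaggle.json").exists():
    print("Token moved to:", install_kaggle_token())
else:
    print("kaggle.json not found here; upload it first")
```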

\[OPTIONAL\] Restrict access rights to the API token.

In [None]:
!chmod 600 /home/ec2-user/.kaggle/kaggle.json

## Step 4
### Set up the dataset in SageMaker
- Create a directory named `data`
- Download the [pneumonia dataset](https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia) using the Kaggle API
- Unzip the dataset
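
One possible way to carry out these three steps from Python is shown below. It assumes the `kaggle` command-line client is installed and the token from Step 3 is in place. The helper names `download_dataset` and `unzip_all` are hypothetical. The download call is left commented out because it needs network access and valid credentials.

```python
import subprocess
import zipfile
from pathlib import Path

# Owner/dataset slug taken from the dataset URL above
DATASET = "paultimothymooney/chest-xray-pneumonia"

def download_dataset(data_dir="data"):
    """Create data_dir and download the dataset archive into it with the Kaggle CLI."""
    Path(data_dir).mkdir(exist_ok=True)
    subprocess.run(
        ["kaggle", "datasets", "download", "-d", DATASET, "-p", str(data_dir)],
        check=True,
    )

def unzip_all(data_dir="data"):
    """Extract every .zip archive found in data_dir in place; return the archives."""
    archives = sorted(Path(data_dir).glob("*.zip"))
    for archive in archives:
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(data_dir)
    return archives

# Usage (requires network access and ~/.kaggle/kaggle.json):
# download_dataset()
# unzip_all()
```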

## Step 5
### Explore a few data samples
- Look at the directory structure of the dataset
- Pay attention to the naming scheme of the image files in the NORMAL and PNEUMONIA sub-directories 
- Plot a few images from the two categories
- Is there a pronounced difference between normal and pneumonia X-rays?
- How large are the images? Is the image size fixed?
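
A starting point for the first and last questions is sketched below. It assumes the archive was extracted to `./data/chest_xray/` with `train`, `val` and `test` folders, each holding one sub-directory per class. The helper names `count_images` and `image_sizes` are hypothetical, and `image_sizes` needs Pillow, which is installed alongside torchvision.

```python
from collections import Counter
from pathlib import Path

def count_images(data_root="./data/chest_xray/"):
    """Return {(split, class_name): number_of_files} for the dataset tree."""
    counts = Counter()
    for split in ("train", "val", "test"):
        split_dir = Path(data_root) / split
        if not split_dir.is_dir():
            continue  # skip splits that are not present
        for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
            counts[(split, class_dir.name)] = sum(
                1 for f in class_dir.iterdir() if f.is_file()
            )
    return counts

def image_sizes(paths):
    """Return the set of distinct (width, height) pairs among the given image files."""
    from PIL import Image  # imported lazily so count_images works without Pillow
    sizes = set()
    for path in paths:
        with Image.open(path) as img:
            sizes.add(img.size)
    return sizes

# Print one line per split and class
for (split, label), n in sorted(count_images().items()):
    print(f"{split:5s} {label:10s} {n}")
```

If `image_sizes` returns more than one pair for a sample of files, the image size is not fixed, which is why the transforms in Step 6 resize and crop every image.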

## Step 6
### Create PyTorch dataloaders for training, validation, and testing
- Decide on data transformations
- Create PyTorch datasets from the folder structure
- Create dataloaders from the corresponding datasets

In [None]:
data_root = './data/chest_xray/'
train_data_dir = 'train'
test_data_dir = 'test'
val_data_dir = 'val'

In [None]:
import os
import torch
from torchvision.datasets import ImageFolder  # builds a dataset from class-named sub-folders
from torchvision import transforms  # image transformations

In [1]:
## Why Imagenet?
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD  = [0.229, 0.224, 0.225]

In [None]:
## All normalizations below use the ImageNet mean and standard deviation,
## as described here: https://discuss.pytorch.org/t/how-to-preprocess-input-for-pre-trained-networks/683/2 

train_transforms = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop((224,224)),
    transforms.RandomRotation(degrees=5),
    transforms.ColorJitter(brightness=0.1, contrast=0.1),
    transforms.ToTensor(),
    transforms.Normalize(IMAGENET_MEAN,
                         IMAGENET_STD)
])

test_transforms =  transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop((224,224)),
    transforms.ToTensor(),
    transforms.Normalize(IMAGENET_MEAN,
                         IMAGENET_STD)
])


val_transforms =  transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop((224,224)),
    transforms.ToTensor(),
    transforms.Normalize(IMAGENET_MEAN,
                         IMAGENET_STD)])

In [None]:
train_dataset = ImageFolder(os.path.join(data_root,train_data_dir),transform=train_transforms)
test_dataset = ImageFolder(os.path.join(data_root,test_data_dir), transform=test_transforms)
val_dataset = ImageFolder(os.path.join(data_root,val_data_dir), transform=val_transforms)
print(train_dataset, test_dataset, val_dataset)

In [None]:
from torch.utils.data import DataLoader

In [None]:
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
# Evaluation results do not depend on sample order, so only the training set is shuffled
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)
val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False)

## Step 7
### Sanity test the data 
- Plot a few random data points

In [None]:
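import numpy as np

def denormalize(x):
    # Undo Normalize on an HWC image array, then clip to [0, 1] for display
    return np.clip(x * np.array(IMAGENET_STD) + np.array(IMAGENET_MEAN), 0, 1)

def tensor_to_img(t):
    # Convert a CHW tensor to an HWC NumPy array
    return t.numpy().transpose(1, 2, 0)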

In [None]:
sample_X, sample_y = next(iter(train_loader))

In [None]:
import matplotlib.pyplot as plt
plt.imshow(denormalize(tensor_to_img(sample_X[7]))); plt.show()

## Step 8
### Shop around for a model \[[Reference](https://pytorch.org/vision/stable/models.html)\]