# Train a PyTorch Classification Model with Azure ML and AML Compute

This driver notebook uses the Azure ML Python SDK to create an AmlCompute cluster for training and a PyTorch estimator that tells Azure ML where to find the required resources and how to run training.

Note:
* Please use the "Python 3.6 - Azure ML" kernel for this notebook or install appropriate library versions below.

## Imports

In [None]:
import sys

# Install/upgrade the Azure ML SDK and other packages with pip,
# using sys.executable so they install into this notebook's kernel
! {sys.executable} -m pip install --upgrade azureml-sdk==1.2.0
! {sys.executable} -m pip install matplotlib
! {sys.executable} -m pip install --upgrade torch==1.2 torchvision==0.3.0

In [None]:
from azureml.core import Workspace, Experiment, Datastore
from azureml.exceptions import ProjectSystemException
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.core import Dataset
from azureml.train.dnn import PyTorch
from azureml.widgets import RunDetails
from azureml.core.model import Model

from torchvision import transforms, datasets

import shutil
import os
import json
import time

In [None]:
# Check core SDK version number
import azureml.core
import torch

print("SDK version: ", azureml.core.VERSION)
print("PyTorch version: ", torch.__version__)

## Diagnostics
Opt in to diagnostics collection to help improve the experience, quality, and security of future releases.

In [None]:
from azureml.telemetry import set_diagnostics_collection

set_diagnostics_collection(send_diagnostics=True)

## Initialize workspace
Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`.

In [None]:
from azureml.core.workspace import Workspace

ws = Workspace.from_config(path='config.json')
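
`Workspace.from_config()` expects `config.json` to hold the workspace identifiers. As a minimal sketch, assuming the usual three keys (`subscription_id`, `resource_group`, `workspace_name`), the helper below checks a config file before you try to connect. The name `missing_workspace_keys` is hypothetical and not part of the SDK.

```python
import json
from pathlib import Path

# Keys that a workspace config.json is assumed to contain.
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def missing_workspace_keys(config_path):
    """Return the required keys that are absent or empty in the config file."""
    config = json.loads(Path(config_path).read_text())
    return [key for key in REQUIRED_KEYS if not config.get(key)]
```

Calling `missing_workspace_keys('config.json')` returns an empty list when all three keys are present.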

## Create or Attach existing AmlCompute
You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, we use Azure ML managed compute ([AmlCompute](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute)) for our remote training compute resource.

**Creation of AmlCompute takes approximately 5 minutes.** If an AmlCompute cluster with that name already exists in your workspace, this code skips the creation step.

As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota.

In [None]:
# choose a name for your cluster - under 16 characters
cluster_name = "gpuforpytorch"

try:
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing compute target.')
except ComputeTargetException:
    print('Creating a new compute target...')
    # AmlCompute config: the cluster autoscales between min_nodes and max_nodes;
    # with min_nodes=0 it scales down to zero nodes when idle
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',
                                                        min_nodes=0,
                                                        max_nodes=3)
    # create the cluster
    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)
    compute_target.wait_for_completion(show_output=True)

Check the provisioning status of the cluster.

In [None]:
# use get_status() to get a detailed status for the current cluster. 
print(compute_target.get_status().serialize())

In [None]:
# Create a project directory and copy training script to it
project_folder = os.path.join(os.getcwd(), 'project')
os.makedirs(project_folder, exist_ok=True)
shutil.copy(os.path.join(os.getcwd(), 'pytorch_train_transfer.py'), project_folder)

## Create an experiment

Create an [Experiment](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#experiment) to track all the runs in your workspace for this transfer learning PyTorch tutorial.

Think of an experiment as a scenario, such as "finding images of people fighting in CCTV feeds". An experiment usually has many "runs", which may involve changes to the data, the hyperparameters, the training code itself, and other optimizations.

In [None]:
# Create an experiment
experiment_name = 'suspicious-behavior'
experiment = Experiment(ws, name=experiment_name)

## Set up Dataset

The data source is a small subset of the CAVIAR dataset, packaged as `caviar_small.zip` at the URL used in the cell below. The following steps create a File Dataset from that URL and register it in the Workspace so that scripts and compute targets can access it.

In [None]:
datastore_name = 'suspicious_behavior'

# Get the datastore to upload prepared data
datastore = ws.get_default_datastore()

# Create a File Dataset from 1 URL path
url_path = ['https://github.com/harris-soh-copeland-puca/SampleFiles/raw/master/caviar_small.zip']
behavior_ds = Dataset.File.from_files(path=url_path)

# Register the dataset so that scripts and compute may access
behavior_ds = behavior_ds.register(workspace=ws,
                                 name='behavior_ds',
                                 description='Subset of CAVIAR dataset')

## Train

To train the PyTorch model we use an Azure ML Estimator specific to PyTorch; see [Train models with Azure Machine Learning using estimator](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-train-ml-models) for more on Estimators. We pass in the Dataset registered earlier, which is mounted on the remote compute target for training.

To learn more about where to read and write files on a local or remote compute target, see [Where to save and write files for Azure Machine Learning experiments](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-save-write-experiment-files).

In [None]:
# Set up for training (the "trans" flag means: use transfer learning,
# which should download a pretrained model on the compute target).
# Files the script writes to ./outputs are uploaded to the run record
# automatically, so the trained model is saved there.
script_params = {
    '--data_dir': behavior_ds.as_named_input('behavior_ds').as_mount(),
    '--num_epochs': 30,
    '--learning_rate': 0.01,
    '--output_dir': './outputs',
    '--trans': 'True'
}
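
The keys of `script_params` are passed to the entry script as command-line arguments. The training script itself is not shown here, so the following is only a sketch of how such a script might parse them with `argparse`; the function names `str_to_bool` and `parse_args` and the default values are hypothetical. Because `--trans` arrives as the string `'True'`, it needs an explicit conversion (in Python, `bool('False')` is `True`).

```python
import argparse


def str_to_bool(value):
    """Convert strings such as 'True' or 'false' to a bool."""
    lowered = value.strip().lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def parse_args(argv=None):
    """Parse the arguments named in script_params (hypothetical sketch)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--data_dir", type=str, required=True)
    parser.add_argument("--num_epochs", type=int, default=30)
    parser.add_argument("--learning_rate", type=float, default=0.01)
    parser.add_argument("--output_dir", type=str, default="./outputs")
    parser.add_argument("--trans", type=str_to_bool, default=True)
    return parser.parse_args(argv)
```

Passing an explicit list, for example `parse_args(['--data_dir', '/tmp/data', '--trans', 'False'])`, makes the parser easy to try out in a notebook cell.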

In [None]:
# Instantiate the PyTorch estimator: it packages the project folder,
# installs the listed pip packages, and runs the entry script on the compute target
estimator = PyTorch(source_directory=project_folder, 
                    script_params=script_params,
                    compute_target=compute_target,
                    entry_script='pytorch_train_transfer.py',
                    use_gpu=True,
                    pip_packages=['matplotlib==3.1.1',
                                  'opencv-python==4.1.1.26', 
                                  'Pillow==6.2.1'],
                   framework_version='1.3')

run = experiment.submit(estimator)

Check run status.

In [None]:
RunDetails(run).show()

## Register model to workspace

Registering the model makes it accessible through the SDK in other runs or experiments.

The following code belongs in the training script, where the run object is available.

```python
model = run.register_model(model_name='pt-dnn', model_path='outputs/model_finetuned.pth')
```

In [None]:
## Alternatively, register within this notebook
# (model_path is the path within the run's stored outputs, not a local path)

## Get one particular Run using run id found in Azure Portal
# from azureml.core import Run
# run = Run(experiment, run_id='suspicious-behavior-...')

# Register model to Models in workspace
model = run.register_model(model_name='suspicious-behavior-pytorch', model_path='outputs/model_finetuned.pth',
                          description='Squeezenet PyTorch model; 30 epochs; 0.01 LR')

## Evaluate model

You will need test images arranged in the following folder structure:

```
data
    \test
        \normal
        \suspicious
```
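
To confirm the layout before running the evaluation cells, a small check like the one below can count the files in each class folder. This is a sketch only: `count_test_images` is a hypothetical helper, and the set of image extensions is an assumption.

```python
from pathlib import Path

# Assumed image extensions; adjust to match your files.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def count_test_images(data_root, classes=("normal", "suspicious")):
    """Return {class_name: image_count} for data_root/test/<class_name>."""
    counts = {}
    for class_name in classes:
        class_dir = Path(data_root) / "test" / class_name
        if not class_dir.is_dir():
            raise FileNotFoundError(f"missing folder: {class_dir}")
        counts[class_name] = sum(
            1 for p in class_dir.iterdir()
            if p.suffix.lower() in IMAGE_EXTENSIONS
        )
    return counts
```

For the layout above, `count_test_images('data')` reports how many images each of the two classes contains, and raises `FileNotFoundError` if a class folder is missing.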

In [None]:
# Download the 40MB small dataset and unzip
! printf "y\n" | wget https://github.com/harris-soh-copeland-puca/SampleFiles/raw/master/caviar_small.zip -O caviar_small.zip
! unzip -q -o caviar_small.zip


In [None]:
# The "data" folder is in the current working directory
data_dir = '.'

Download model

In [None]:
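# Download the registered model file and load it on the CPU
model_path = Model(ws, 'suspicious-behavior-pytorch', version=1).download(exist_ok=True)
model = torch.load(model_path, map_location=torch.device('cpu'))
model.eval()  # evaluation mode: disables dropout during inference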

Data transforms

In [None]:
data_transforms = {
    'test': transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}
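
`transforms.Normalize(mean, std)` rescales each channel as `(value - mean) / std`. The means and standard deviations used here are the ones commonly used with ImageNet-pretrained models. As a worked illustration in plain Python, the hypothetical helper `normalize_pixel` applies the same arithmetic to a single RGB pixel that `ToTensor` has already scaled to the range [0, 1]:

```python
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb, mean=MEAN, std=STD):
    """Apply (value - mean) / std to each channel of one RGB pixel."""
    return [(v - m) / s for v, m, s in zip(rgb, mean, std)]


# A pixel equal to the channel means maps to zero in every channel.
print(normalize_pixel([0.485, 0.456, 0.406]))  # [0.0, 0.0, 0.0]
```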

Datasets and dataloaders

In [None]:
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, 'data', x),
                                          data_transforms[x])
                  for x in ['test']}
dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=1,
                                              shuffle=False, num_workers=0)
               for x in ['test']}
dataset_sizes = {x: len(image_datasets[x]) for x in ['test']}
class_names = image_datasets['test'].classes
print(dataset_sizes['test'])

Perform inference on the test dataset to evaluate the model

In [None]:
# Iterate over data.
running_corrects = 0
for inputs, labels in dataloaders['test']:

    # Don't need to track history 
    with torch.set_grad_enabled(False):
        outputs = model(inputs)
        _, preds = torch.max(outputs, 1)
        
    # Statistics
    running_corrects += torch.sum(preds == labels.data)
    
overall_acc = running_corrects.double() / dataset_sizes['test']

In [None]:
print('Accuracy = ', overall_acc.item())

If the accuracy is very low, try training on more of the CAVIAR dataset or enlarging the dataset with image augmentation such as flipping or blurring. Note that if you add data, you will need to sort the images into the normal and suspicious folders by hand.
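
A single accuracy figure can hide which class is being misclassified. As a sketch in plain Python, assuming you collect the true and predicted class indices into two lists during the inference loop above (those lists and the helper `per_class_accuracy` are hypothetical additions, not part of the original notebook):

```python
from collections import Counter


def per_class_accuracy(true_labels, predicted_labels, class_names):
    """Return {class_name: fraction of that class predicted correctly}."""
    totals = Counter(true_labels)
    correct = Counter(
        t for t, p in zip(true_labels, predicted_labels) if t == p
    )
    return {
        class_names[idx]: correct[idx] / totals[idx]
        for idx in sorted(totals)
    }
```

With `class_names` taken from `image_datasets['test'].classes`, the result maps each class name to the fraction of its test images that were classified correctly.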