# Challenge 3

As part of this challenge you will get familiar with the basic concepts of [Azure Machine Learning](https://azure.microsoft.com/en-us/services/machine-learning/). Relevant links will be provided in the notebook to help you solve the tasks.

Generally, a very good source of information is the [Python SDK reference](https://docs.microsoft.com/en-us/python/api/overview/azure/ml/intro?view=azure-ml-py) for Azure Machine Learning.

## 1. Import Azure ML Python SDK

In [None]:
import azureml.core
print("SDK version:", azureml.core.VERSION)

## 2. Authentication and initializing Azure Machine Learning Workspace

As a first step you have to authenticate against the Azure [Machine Learning Workspace](https://ml.azure.com/). This can be achieved in different ways:

1. **Interactive Login Authentication:** The interactive authentication is suitable for local experimentation on your own computer.
2. **Azure CLI Authentication:** Azure CLI authentication is suitable if you are already using Azure CLI for managing Azure resources, and want to sign in only once.
3. **Managed Service Identity (MSI) Authentication:** The MSI authentication is suitable for code that runs on an Azure resource with a managed identity, for example an Azure virtual machine.
4. **Service Principal Authentication:** The Service Principal authentication is suitable for automated workflows, for example as part of an Azure DevOps build.

For now, we will use the interactive authentication, which is the default mode when using the Azure ML SDK. When you connect to your workspace using `Workspace.from_config`, you will get an interactive login dialog.
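
`Workspace.from_config` reads the workspace coordinates from a `config.json` file. If that file is missing, a sketch like the following could write one. `write_workspace_config` is a hypothetical helper name (not part of the SDK), and the three values in the demo are placeholders that you would replace with your own details.

```python
import json
import os
import tempfile


def write_workspace_config(folder, subscription_id, resource_group, workspace_name):
    """Write a config.json holding the three workspace coordinates."""
    os.makedirs(folder, exist_ok=True)
    config_path = os.path.join(folder, "config.json")
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    with open(config_path, "w") as f:
        json.dump(config, f, indent=2)
    return config_path


# Demo in a throwaway folder with placeholder values (assumptions, not real IDs)
demo_dir = tempfile.mkdtemp()
path = write_workspace_config(demo_dir,
                              "<your-subscription-id>",
                              "<your-resource-group-name>",
                              "<your-workspace-name>")
with open(path) as f:
    print(f.read())
```

In a real project you would pass the notebook folder (or its `.azureml` subfolder) instead of a temporary directory.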

In [None]:
from azureml.core import Workspace

ws = Workspace.from_config()

Note that the user you are authenticated as must have access to the subscription and resource group. If you receive an error like
```
AuthenticationException: You don't have access to xxxxxx-xxxx-xxx-xxx-xxxxxxxxxx subscription. All the subscriptions that you have access to = ...
```
check that you used the correct login and entered the correct subscription ID.

Alternatively, you can also specify the details of your workspace.

In [None]:
'''
# Alternative login method

from azureml.core.authentication import InteractiveLoginAuthentication

interactive_auth = InteractiveLoginAuthentication()

ws = Workspace(subscription_id='<your-subscription-id>',
               resource_group='<your-resource-group-name>',
               workspace_name='<your-workspace-name>',
               auth=interactive_auth)
'''

After logging in, we can print the workspace details.

**TASK**: Print the workspace details below. See here for the workspace object reference: https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.workspace.workspace?view=azure-ml-py

In [None]:
print("Workspace name: " + ws.name, 
      "Azure region: " + ws.location, 
      "Subscription id: " + ws.subscription_id, 
      "Resource group: " + ws.resource_group, sep = '\n')

## 3. Upload and register data

Every workspace comes with a default [datastore](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-access-data) (and you can register more), which is backed by the Azure blob storage account associated with the workspace. We can use it to transfer data from your local machine to the cloud and create a Dataset from it. We will now upload the training images to the default datastore (blob) within your workspace.

By creating a dataset, you create a reference to the data source location. If you applied any subsetting transformations to the dataset, they will be stored in the dataset as well. The data remains in its existing location, so no extra storage cost is incurred.

In [None]:
# List all datastores registered in the current workspace
datastores = ws.datastores
for name, datastore in datastores.items():
    print(name, datastore.datastore_type)

For this challenge we will use the [default datastore](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-access-data#get-datastores-from-your-workspace) that comes with the Azure Machine Learning Workspace.

**TASK**: Retrieve the default datastore for this workspace.

Hint: See the link in the paragraph above.

In [None]:
# get the default datastore
datastore = ws.get_default_datastore()
print(datastore.name, datastore.datastore_type, datastore.account_name, datastore.container_name, sep="\n")

**TASK**: Upload the local folder `./train_dataset` to the target path `train_dataset/image_dataset` on the default datastore.

Hint: https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.data.azure_storage_datastore.azureblobdatastore
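
To see what an upload would transfer before running it, the sketch below walks a local folder and pairs each file with the path it would get under the target prefix. It assumes the upload keeps each file's path relative to the source folder, which matches how the dataset path is built later in this notebook. `planned_uploads` is a hypothetical helper, not an SDK function, and the file names in the demo are made up.

```python
import os
import posixpath
import tempfile


def planned_uploads(src_dir, target_path):
    """Return sorted (local_file, remote_path) pairs for every file under src_dir."""
    pairs = []
    for root, _dirs, files in os.walk(src_dir):
        for file_name in files:
            local_file = os.path.join(root, file_name)
            relative = os.path.relpath(local_file, src_dir)
            # Datastore paths use forward slashes regardless of the local OS
            remote_path = posixpath.join(target_path, *relative.split(os.sep))
            pairs.append((local_file, remote_path))
    return sorted(pairs)


# Demo on a throwaway folder with one empty placeholder file
demo_root = tempfile.mkdtemp()
ants_dir = os.path.join(demo_root, "hymenoptera_data", "train", "ants")
os.makedirs(ants_dir)
open(os.path.join(ants_dir, "sample.jpg"), "w").close()

for local_file, remote_path in planned_uploads(demo_root, "train_dataset/image_dataset"):
    print(remote_path)
```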

In [None]:
import os

datastore.upload(src_dir = os.path.join('.', 'train_dataset'),
                 target_path = 'train_dataset/image_dataset',
                 overwrite = True,
                 show_progress = True)

Now we will register a dataset in the Azure Machine Learning Workspace as a file dataset. A file dataset can be mounted to the compute engine. When you mount a file system, you attach that file system to a directory (mount point) and make it available to the system. Because mounting loads files at the time of processing, it is usually faster than downloading.
Note: mounting is only available for Linux-based compute (DSVM/VM, AmlCompute, HDInsight).

In [None]:
from azureml.core import Dataset

file_dataset = Dataset.File.from_files(path = [(datastore, 'train_dataset/image_dataset/hymenoptera_data')])
file_dataset = file_dataset.register(workspace=ws,
                                     name='hymenoptera_data',
                                     description='hymenoptera training dataset',
                                     create_new_version = True)

file_dataset.to_path()

## 3. Create Compute Engine

In this sample, we want to train a PyTorch image classification model on a remote compute engine on Azure. To do so, we first must create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target).

In this challenge, we want to use Azure ML managed compute ([AmlCompute](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute)) for our remote training compute resource. Once this is created, you are ready to train on your remote compute.

**TASK**: Create a machine learning compute target.

Create an Azure Machine Learning Compute cluster by following steps one to four.
1. Check whether a cluster with the given name already exists.
2. Create the configuration (this step is local and only takes a second). Use the GPU SKU `STANDARD_NC6` and a maximum of 4 nodes.
3. Create the cluster (this step will take about 20 seconds).
4. Provision the VMs to bring the cluster to its initial size. This step will take about 3-5 minutes and produces only sparse output. Please make sure to wait until the call returns before moving to the next cell.

Hint: https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.computetarget?view=azure-ml-py
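
Step 4 blocks until the cluster is ready or a timeout expires. As a rough mental model only, a blocking wait can be sketched as a polling loop like the one below. `wait_until` is a hypothetical helper and the node counts are simulated; this is not how the SDK implements its wait.

```python
import time


def wait_until(condition, timeout_seconds, poll_interval_seconds=1.0):
    """Poll condition() until it returns True or the timeout elapses.

    Returns True if the condition was met, False on timeout.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_seconds)


# Simulated cluster sizes reported by successive status checks (not real output)
node_counts = iter([0, 0, 1, 2])
ready = wait_until(lambda: next(node_counts) >= 2,
                   timeout_seconds=5,
                   poll_interval_seconds=0.01)
print("Cluster ready:", ready)
```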

In [None]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# choose a name for your cluster
cluster_name = "gpucluster"

try:
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing compute target')
except ComputeTargetException:
    print('Creating a new compute target...')
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',
                                                           max_nodes=4)

    # create the cluster
    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)

    # can poll for a minimum number of nodes and for a specific timeout.
    # if no min node count is provided it uses the scale settings for the cluster
    compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)

# use get_status() to get a detailed status for the current cluster. 
print(compute_target.get_status().serialize())

## 4. Create a project directory

Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. This includes the training script and any additional files your training script depends on.
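
The training script used later imports a local `utils` module, so helper files like that need to sit next to it in the project directory. A minimal sketch of staging such a folder could look like this. `stage_project` is a hypothetical helper, and the demo creates a placeholder `utils.py` in a temporary location rather than using your real files.

```python
import os
import shutil
import tempfile


def stage_project(folder, files):
    """Create folder if needed and copy every existing file from files into it.

    Paths that do not point to a file are skipped. Returns the copied destinations.
    """
    os.makedirs(folder, exist_ok=True)
    copied = []
    for source in files:
        if os.path.isfile(source):
            copied.append(shutil.copy(source, folder))
    return copied


# Demo with a placeholder helper module in a throwaway directory
demo_dir = tempfile.mkdtemp()
helper = os.path.join(demo_dir, "utils.py")
with open(helper, "w") as f:
    f.write("# placeholder helper module\n")

project = os.path.join(demo_dir, "train")
print(stage_project(project, [helper, os.path.join(demo_dir, "missing.py")]))
```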

In [None]:
TRAIN_FOLDER_NAME = 'train'
TRAIN_FILE_NAME = 'train.py'

## 5. Create a training script 

Now you will need to create your training scripts in your project folder. This will be done in the next step. In practice, you should be able to take any custom training script as is and run it with Azure ML without having to modify your code.

If you would like to use Azure ML's [tracking and metrics](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#metrics) capabilities, you will have to add a small amount of Azure ML code inside your training script.

In `train.py`, we will log some metrics to our Azure ML run. To do so, we will access the Azure ML Run object within the script:

```python
from azureml.core.run import Run
run = Run.get_context()
```

Further within `train.py`, we log the loss and the top-1 and top-5 accuracy for every training batch:

```python
run.log('train_loss', loss.item())
run.log('train_acc1', acc1.item())
run.log('train_acc5', acc5.item())
```

These run metrics will become particularly important when we begin hyperparameter tuning our model in the "Tune model hyperparameters" section.
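
When trying out the logging logic without a workspace, a small stand-in for the run object can help. The class below is a hypothetical helper, not part of the SDK. It only mimics one behaviour of `run.log`, namely that logging the same metric name several times builds up a series of values, and it assumes that a metric logged once is reported as a single value.

```python
class LocalRun:
    """Hypothetical offline stand-in that records metrics the way run.log is called."""

    def __init__(self):
        self._metrics = {}

    def log(self, name, value):
        # Keep every value so repeated names build up a series
        self._metrics.setdefault(name, []).append(value)

    def get_metrics(self):
        # A metric logged once is returned as a scalar, otherwise as a list
        return {name: values[0] if len(values) == 1 else list(values)
                for name, values in self._metrics.items()}


local_run = LocalRun()
local_run.log('lr', 0.1)
for loss in (0.9, 0.7, 0.6):  # made-up loss values for illustration
    local_run.log('train_loss', loss)

print(local_run.get_metrics())  # {'lr': 0.1, 'train_loss': [0.9, 0.7, 0.6]}
```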

**TASK**: The training script below is missing a few of the metric logging calls. Find the `???` placeholders and complete the script.

Link: https://pytorch.org/docs/stable/torchvision/models.html

In [None]:
%%writefile $TRAIN_FOLDER_NAME/$TRAIN_FILE_NAME

# Sample from torchvision references: https://github.com/pytorch/vision/tree/master/references/classification

from __future__ import print_function
import datetime
import os
import time
import sys
import onnx

import torch
import torch.utils.data
from torch import nn
import torchvision
from torchvision import transforms

from azureml.core import Dataset, Run
run = Run.get_context() # get the Azure ML run object

import utils

try:
    from apex import amp
except ImportError:
    amp = None


def train_one_epoch(model, criterion, optimizer, data_loader, device, epoch, print_freq, apex=False):
    model.train()
    metric_logger = utils.MetricLogger(delimiter="  ")
    metric_logger.add_meter('lr', utils.SmoothedValue(window_size=1, fmt='{value}'))
    metric_logger.add_meter('img/s', utils.SmoothedValue(window_size=10, fmt='{value}'))

    header = 'Epoch: [{}]'.format(epoch)
    for image, target in metric_logger.log_every(data_loader, print_freq, header):
        start_time = time.time()
        image, target = image.to(device), target.to(device)
        output = model(image)
        loss = criterion(output, target)

        optimizer.zero_grad()
        if apex:
            with amp.scale_loss(loss, optimizer) as scaled_loss:
                scaled_loss.backward()
        else:
            loss.backward()
        optimizer.step()

        acc1, acc5 = utils.accuracy(output, target, topk=(1, 5))
        batch_size = image.shape[0]
        metric_logger.update(loss=loss.item(), lr=optimizer.param_groups[0]["lr"])
        metric_logger.meters['acc1'].update(acc1.item(), n=batch_size)
        metric_logger.meters['acc5'].update(acc5.item(), n=batch_size)
        metric_logger.meters['img/s'].update(batch_size / (time.time() - start_time))
        
        # Log metrics in Azure ML
        run.log('train_loss', loss.item())
        run.log('train_acc1', acc1.item())
        run.log('train_acc5', acc5.item())
        run.log('train_img/s', batch_size / (time.time() - start_time))
        
        #run.log('train_loss_{0}'.format(epoch), loss.item())
        #run.log('train_acc1_{0}'.format(epoch), acc1.item())
        #run.log('train_acc5_{0}'.format(epoch), acc5.item())
        #run.log('train_img/s_{0}'.format(epoch), batch_size / (time.time() - start_time))
    
    # gather the stats from all processes
    metric_logger.synchronize_between_processes()

    print(' * Acc@1 {top1.global_avg:.3f} Acc@5 {top5.global_avg:.3f}'
          .format(top1=metric_logger.acc1, top5=metric_logger.acc5))
    
    # Log metrics in Azure ML
    run.log('train_acc1_ga', metric_logger.acc1.global_avg)
    run.log('train_acc5_ga', metric_logger.acc5.global_avg)

def evaluate(model, criterion, data_loader, device, epoch, print_freq=100):
    model.eval()
    metric_logger = utils.MetricLogger(delimiter="  ")
    header = 'Test:'
    with torch.no_grad():
        for image, target in metric_logger.log_every(data_loader, print_freq, header):
            image = image.to(device, non_blocking=True)
            target = target.to(device, non_blocking=True)
            output = model(image)
            loss = criterion(output, target)

            acc1, acc5 = utils.accuracy(output, target, topk=(1, 5))
            # FIXME need to take into account that the datasets
            # could have been padded in distributed setup
            batch_size = image.shape[0]
            metric_logger.update(loss=loss.item())
            metric_logger.meters['acc1'].update(acc1.item(), n=batch_size)
            metric_logger.meters['acc5'].update(acc5.item(), n=batch_size)
            
            # Log metrics in Azure ML
            run.log('val_loss', loss.item())
            run.log('val_acc1', acc1.item())
            run.log('val_acc5', acc5.item())
            
            #run.log('val_loss_{0}'.format(epoch), loss.item())
            #run.log('val_acc1_{0}'.format(epoch), acc1.item())
            #run.log('val_acc5_{0}'.format(epoch), acc5.item())
            
    # gather the stats from all processes
    metric_logger.synchronize_between_processes()

    print(' * Acc@1 {top1.global_avg:.3f} Acc@5 {top5.global_avg:.3f}'
          .format(top1=metric_logger.acc1, top5=metric_logger.acc5))
    
    # Log metrics in Azure ML
    run.log('val_acc1_ga', metric_logger.acc1.global_avg)
    run.log('val_acc5_ga', metric_logger.acc5.global_avg)
    
    return metric_logger.acc1.global_avg


def _get_cache_path(filepath):
    import hashlib
    h = hashlib.sha1(filepath.encode()).hexdigest()
    cache_path = os.path.join("~", ".torch", "vision", "datasets", "imagefolder", h[:10] + ".pt")
    cache_path = os.path.expanduser(cache_path)
    return cache_path


def load_data(traindir, valdir, cache_dataset, distributed, input_size):
    # Data loading code
    print("Loading data")
    normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                     std=[0.229, 0.224, 0.225])

    print("Loading training data")
    st = time.time()
    cache_path = _get_cache_path(traindir)
    if cache_dataset and os.path.exists(cache_path):
        # Attention, as the transforms are also cached!
        print("Loading dataset_train from {}".format(cache_path))
        dataset, _ = torch.load(cache_path)
    else:
        dataset = torchvision.datasets.ImageFolder(
            traindir,
            transforms.Compose([
                transforms.RandomResizedCrop(input_size),
                transforms.RandomHorizontalFlip(),
                transforms.ToTensor(),
                normalize,
            ]))
        if cache_dataset:
            print("Saving dataset_train to {}".format(cache_path))
            utils.mkdir(os.path.dirname(cache_path))
            utils.save_on_master((dataset, traindir), cache_path)
    print("Took", time.time() - st)

    print("Loading validation data")
    cache_path = _get_cache_path(valdir)
    if cache_dataset and os.path.exists(cache_path):
        # Attention, as the transforms are also cached!
        print("Loading dataset_test from {}".format(cache_path))
        dataset_test, _ = torch.load(cache_path)
    else:
        dataset_test = torchvision.datasets.ImageFolder(
            valdir,
            transforms.Compose([
                transforms.Resize(input_size + 32),
                transforms.CenterCrop(input_size),
                transforms.ToTensor(),
                normalize,
            ]))
        if cache_dataset:
            print("Saving dataset_test to {}".format(cache_path))
            utils.mkdir(os.path.dirname(cache_path))
            utils.save_on_master((dataset_test, valdir), cache_path)

    print("Creating data loaders")
    if distributed:
        train_sampler = torch.utils.data.distributed.DistributedSampler(dataset)
        test_sampler = torch.utils.data.distributed.DistributedSampler(dataset_test)
    else:
        train_sampler = torch.utils.data.RandomSampler(dataset)
        test_sampler = torch.utils.data.SequentialSampler(dataset_test)

    return dataset, dataset_test, train_sampler, test_sampler


def main(args):
    if args.apex:
        if sys.version_info < (3, 0):
            raise RuntimeError("Apex currently only supports Python 3. Aborting.")
        if amp is None:
            raise RuntimeError("Failed to import apex. Please install apex from https://www.github.com/nvidia/apex "
                               "to enable mixed-precision training.")

    if args.output_dir:
        utils.mkdir(args.output_dir)

    utils.init_distributed_mode(args)
    print(args)

    device = torch.device(args.device)

    torch.backends.cudnn.benchmark = True
    
    train_dir = os.path.join(args.data_path, 'train')
    val_dir = os.path.join(args.data_path, 'val')
    
    # Creating model
    print('Creating model')
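    # Count the class subfolders of the training directory (one folder per class)
    num_classes = len([entry for entry in os.scandir(train_dir) if entry.is_dir()])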
    model, input_size, params_to_update = utils.initialize_model(num_classes, args)
    model.to(device)
    if args.distributed and args.sync_bn:
        model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)
    
    # Loading data
    print('Loading data')
    dataset, dataset_test, train_sampler, test_sampler = load_data(train_dir, val_dir, args.cache_dataset,
                                                                   args.distributed, input_size)
    data_loader = torch.utils.data.DataLoader(
        dataset, batch_size=args.batch_size,
        sampler=train_sampler, num_workers=args.workers, pin_memory=True)

    data_loader_test = torch.utils.data.DataLoader(
        dataset_test, batch_size=args.batch_size,
        sampler=test_sampler, num_workers=args.workers, pin_memory=True)

    criterion = nn.CrossEntropyLoss()

    optimizer = torch.optim.SGD(
        params_to_update, lr=args.lr, momentum=args.momentum, weight_decay=args.weight_decay)

    if args.apex:
        model, optimizer = amp.initialize(model, optimizer, opt_level=args.apex_opt_level)

    lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=args.lr_step_size, gamma=args.lr_gamma)

    model_without_ddp = model
    if args.distributed:
        model = torch.nn.parallel.DistributedDataParallel(model) #, device_ids=[args.gpu])
        model_without_ddp = model.module

    if args.resume:
        checkpoint = torch.load(args.resume, map_location='cpu')
        model_without_ddp.load_state_dict(checkpoint['model'])
        optimizer.load_state_dict(checkpoint['optimizer'])
        lr_scheduler.load_state_dict(checkpoint['lr_scheduler'])
        args.start_epoch = checkpoint['epoch'] + 1

    if args.test_only:
        print('Testing only')
        evaluate(model, criterion, data_loader_test, device=device, epoch=0)
    else:
        print('Start training')
        start_time = time.time()
        for epoch in range(args.start_epoch, args.epochs):
            if args.distributed:
                train_sampler.set_epoch(epoch)
            run.log('lr', optimizer.param_groups[0]['lr']) # Log metrics in Azure ML
            train_one_epoch(model, criterion, optimizer, data_loader, device, epoch, args.print_freq, args.apex)
            lr_scheduler.step()
            evaluate(model, criterion, data_loader_test, device, epoch)
            if args.output_dir:
                checkpoint = {
                    'model': model_without_ddp.state_dict(),
                    'optimizer': optimizer.state_dict(),
                    'lr_scheduler': lr_scheduler.state_dict(),
                    'epoch': epoch,
                    'args': args}
                utils.save_on_master(
                    checkpoint,
                    os.path.join(args.output_dir, 'model_{}.pth'.format(epoch)))
                utils.save_on_master(
                    checkpoint,
                    os.path.join(args.output_dir, 'checkpoint.pth'))

        total_time = time.time() - start_time
        total_time_str = str(datetime.timedelta(seconds=int(total_time)))
        print('Training time {}'.format(total_time_str))
    
    # Save model as pt and ONNX
    dummy_input = torch.randn(args.batch_size, 3, input_size, input_size, requires_grad=True, device=device)
    if isinstance(model, torch.nn.DataParallel) or isinstance(model, torch.nn.parallel.DistributedDataParallel):
        model = model.module
    torch.save(model, os.path.join(args.output_dir, 'model.pt'))
    torch.onnx.export(model,
                      dummy_input,
                      os.path.join(args.output_dir, 'model.onnx'),
                      export_params=True,
                      opset_version=10,
                      do_constant_folding=True,
                      verbose=True,
                      input_names = ['input'],
                      output_names = ['output'],
                      dynamic_axes={'input' : {0 : 'batch_size'},
                                    'output' : {0 : 'batch_size'}})
    onnx_model = onnx.load(os.path.join(args.output_dir, 'model.onnx'))
    onnx.checker.check_model(onnx_model)

def parse_args():
    import argparse
    parser = argparse.ArgumentParser(description='PyTorch Classification Training')
    
    # Training parameters
    parser.add_argument('--data-path', dest='data_path', default='/tmp/dataset/', help='dataset')
    parser.add_argument('--dataset-name', dest='dataset_name', default=None, help='dataset name')
    parser.add_argument('--model', default='resnet18', help='model')
    parser.add_argument('--device', default='cuda', help='device')
    parser.add_argument('-b', '--batch-size', default=32, type=int)
    parser.add_argument('--epochs', default=90, type=int, metavar='N', help='number of total epochs to run')
    parser.add_argument('-j', '--workers', default=16, type=int, metavar='N', help='number of data loading workers (default: 16)')
    parser.add_argument('--lr', default=0.1, type=float, help='initial learning rate')
    parser.add_argument('--momentum', default=0.9, type=float, metavar='M', help='momentum')
    parser.add_argument('--wd', '--weight-decay', default=1e-4, type=float,
                        metavar='W', help='weight decay (default: 1e-4)',
                        dest='weight_decay')
    parser.add_argument('--lr-step-size', default=30, type=int, help='decrease lr every step-size epochs')
    parser.add_argument('--lr-gamma', default=0.1, type=float, help='decrease lr by a factor of lr-gamma')
    parser.add_argument('--print-freq', default=10, type=int, help='print frequency')
    parser.add_argument('--output-dir', default='outputs', help='path where to save')
    parser.add_argument('--resume', default='', help='resume from checkpoint')
    parser.add_argument('--start-epoch', default=0, type=int, metavar='N',
                        help='start epoch')
    parser.add_argument('--cache-dataset', dest='cache_dataset',
                        help='Cache the datasets for quicker initialization. It also serializes the transforms',
                        action='store_true')
    parser.add_argument('--sync-bn', dest='sync_bn', help='Use sync batch norm', action='store_true')
    parser.add_argument('--test-only', dest='test_only', help='Only test the model', action='store_true')
    parser.add_argument('--pretrained', dest='pretrained', help='Use pre-trained models from the modelzoo', action='store_true')
    parser.add_argument('--finetuning', dest='finetuning', help='Only finetune last layer', action='store_true')
    
    # Mixed precision training parameters
    parser.add_argument('--apex', action='store_true', help='Use apex for mixed precision training')
    parser.add_argument('--apex-opt-level', default='O1', type=str,
                        help='For apex mixed precision training. '
                             'O0 for FP32 training, O1 for mixed precision training. '
                             'For further detail, see https://github.com/NVIDIA/apex/tree/master/examples/imagenet')
    
    # Distributed training parameters
    parser.add_argument('--world-size', dest='world_size', default=1, type=int, help='number of distributed processes')
    parser.add_argument('--dist-backend', default='nccl', type=str, help='distributed backend')
    parser.add_argument('--dist-url', type=str, help='url used to set up distributed training')
    parser.add_argument('--rank', default=-1, type=int, help='rank of the worker')
    
    args = parser.parse_args()
    
    # Load data path from run
    try:
        args.data_path = run.input_datasets[args.dataset_name]
    except Exception:  # e.g. offline run or dataset name not among the run inputs
        print('Could not find registered dataset. Loading default data path.')
    return args

if __name__ == "__main__":
    args = parse_args()
    main(args)


## 6. Create an experiment

An *Experiment* is a logical container in an Azure ML Workspace that represents a collection of trials (individual model runs). It hosts run records which can include run metrics and output artifacts from your experiments.

**TASK**: Fill in the missing values below to create a new experiment in your workspace

In [None]:
from azureml.core import Experiment
exp = Experiment(workspace=ws, name='ch3-pytorch_sample')

## 7. Create Estimator

An estimator object is used to submit the run. Azure Machine Learning has pre-configured estimators for common machine learning frameworks, as well as a generic Estimator. Create a PyTorch estimator by specifying:

- The name of the estimator object, est
- The directory that contains your scripts. All the files in this directory are uploaded into the cluster nodes for execution.
- The training script name, `train.py`
- The input Dataset for training
- The compute target. In this case you will use the AmlCompute you created
- The environment definition for the experiment
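
The estimator also receives the script's command-line arguments as a dictionary. The sketch below illustrates how such a dictionary could translate into an argument list, assuming that an entry whose value is `None` stands for a bare flag (which is what the script's `store_true` options expect). `to_argv` is a hypothetical helper, and the parser is a cut-down copy of a few options from the training script.

```python
import argparse


def to_argv(script_params):
    """Flatten a script_params-style dict into a command-line argument list."""
    argv = []
    for name, value in script_params.items():
        argv.append(name)
        if value is not None:  # None is treated as "flag without a value"
            argv.append(str(value))
    return argv


# Cut-down parser mirroring a few options of the training script
parser = argparse.ArgumentParser()
parser.add_argument('--dataset-name', dest='dataset_name', default=None)
parser.add_argument('--epochs', default=90, type=int)
parser.add_argument('--pretrained', action='store_true')
parser.add_argument('--finetuning', action='store_true')

args = parser.parse_args(to_argv({'--dataset-name': 'hymenoptera_data',
                                  '--epochs': 4,
                                  '--pretrained': None}))
print(args.dataset_name, args.epochs, args.pretrained, args.finetuning)
# prints: hymenoptera_data 4 True False
```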

**TASK**: Complete the estimator creation below.

Hint: https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.estimator.estimator?view=azure-ml-py

In [None]:
from azureml.train.dnn import PyTorch, Nccl

script_params = {
    '--dataset-name': 'hymenoptera_data',
    '--dist-backend': 'nccl',
    '--dist-url': '$AZ_BATCHAI_PYTORCH_INIT_METHOD',
    '--rank': '$AZ_BATCHAI_TASK_INDEX',
    '--world-size': 2,
    '--epochs': 4,
    '--pretrained': None,
    '--finetuning': None
}

est = PyTorch(source_directory=TRAIN_FOLDER_NAME,
              entry_script=TRAIN_FILE_NAME,
              script_params=script_params,
              compute_target=compute_target,
              node_count=2,
              inputs=[file_dataset.as_named_input('hymenoptera_data').as_mount('tmp/dataset_new')],
              distributed_training=Nccl(),
              use_gpu=True,
              framework_version='1.3',
              pip_packages=['azureml-dataprep[pandas,fuse]', 'onnx', 'Pillow==6.1'])

In [None]:
'''
# Alternative
from azureml.train.dnn import PyTorch, Gloo

script_params = {
    '--dataset-name': 'hymenoptera_data',
    '--dist-backend' : 'gloo',
    '--dist-url': '$AZ_BATCHAI_PYTORCH_INIT_METHOD',
    '--rank': '$AZ_BATCHAI_TASK_INDEX',
    '--world-size': 1,
    '--epochs': 1
}

est = PyTorch(source_directory=TRAIN_FOLDER_NAME,
              entry_script=TRAIN_FILE_NAME,
              script_params=script_params,
              compute_target=compute_target,
              node_count=1,
              inputs=[file_dataset.as_named_input('hymenoptera_data').as_mount('tmp/dataset')],
              distributed_training=Gloo(),
              use_gpu=True,
              framework_version='1.3',
              pip_packages=['azureml-dataprep[pandas,fuse]', 'onnx', 'Pillow==6.1'])
              
# not distributed
from azureml.train.dnn import PyTorch

script_params = {
    '--dataset-name': 'hymenoptera_data',
    '--dist-backend': 'nccl',
    '--dist-url': '$AZ_BATCHAI_PYTORCH_INIT_METHOD',
    '--rank': '$AZ_BATCHAI_TASK_INDEX',
    '--world-size': 1,
    '--epochs': 1,
    '--pretrained': None,
    '--finetuning': None
}

est = PyTorch(source_directory=TRAIN_FOLDER_NAME,
              entry_script=TRAIN_FILE_NAME,
              script_params=script_params,
              compute_target=compute_target,
              node_count=1,
              inputs=[file_dataset.as_named_input('hymenoptera_data').as_mount('tmp/dataset')],
              use_gpu=True,
              framework_version='1.3',
              pip_packages=['azureml-dataprep[pandas,fuse]', 'onnx', 'Pillow==6.1'])

'''

## 8. Submit the job

Submit the estimator to the Azure ML experiment to kick off the execution.

**TASK**: Submit the experiment as a new run.

Hint: https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.experiment%28class%29?view=azure-ml-py#methods

In [None]:
run = exp.submit(est)
run.wait_for_completion(show_output=True, wait_post_processing=True)

In [None]:
#run.cancel()

You now have a model trained on a remote cluster. Retrieve all the metrics logged during the run, including the accuracy of the model:

**TASK**: Retrieve the metrics of the run.

Hint: https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.run%28class%29?view=azure-ml-py#methods
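
Once the metrics are available as a dictionary, a few lines of plain Python are enough to summarize them. The sketch below picks the best entry of a metric series. `best_step` is a hypothetical helper, and the numbers are made up purely for illustration; they are not results from this challenge.

```python
def best_step(values):
    """Return (index, value) of the highest entry in a metric series."""
    if not isinstance(values, (list, tuple)):
        values = [values]  # a metric logged only once may come back as a scalar
    index = max(range(len(values)), key=lambda i: values[i])
    return index, values[index]


# Made-up numbers, shaped like a dictionary of logged metrics
example_metrics = {'val_acc1_ga': [10.0, 30.0, 20.0], 'lr': 0.1}

epoch, accuracy = best_step(example_metrics['val_acc1_ga'])
print('Best validation top-1 accuracy {:.1f} at epoch {}'.format(accuracy, epoch))
# prints: Best validation top-1 accuracy 30.0 at epoch 1
```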

In [None]:
run.get_metrics()

In [None]:
# register model from run
from azureml.core import Model

model = run.register_model(model_name='ch3-pytorch-model',
                           model_path='outputs/model.onnx',
                           model_framework=Model.Framework.ONNX,
                           model_framework_version='1.3',
                           datasets=[('Training dataset', file_dataset)],
                           description='PyTorch hymenoptera classification.',
                           tags={'area': 'hymenoptera_data', 'type': 'pytorch'})