
# Local Run Using PyTorch Estimator in Azure ML

In this notebook, we use Azure ML's PyTorch estimator to run our training script locally by using the conda environment created for the tutorial.

In [3]:
import sys

sys.path.append("scripts")
sys.path.append("scripts/cocoapi/PythonAPI/")

import azureml.core
from azureml.core import Workspace, Experiment
from azureml.widgets import RunDetails
from azureml.train.dnn import PyTorch

from dotenv import set_key, get_key, find_dotenv
from utilities import get_auth, download_data

import torch
from scripts.XMLDataset import BuildDataset, get_transform
from scripts.maskrcnn_model import get_model

from PIL import Image, ImageDraw
from IPython.display import display

# check core SDK version number
print("Azure ML SDK Version: ", azureml.core.VERSION)

Azure ML SDK Version:  1.17.0


In [4]:
env_path = find_dotenv(raise_error_if_not_found=True)

## Download data

We first download the dataset that includes the images of store shelves.

In [5]:
data_file = "Data.zip"
data_url = ("https://bostondata.blob.core.windows.net/builddata/{}".format(data_file))
download_data(data_file, data_url)

Extracting files...
Finished extracting.
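
The `download_data` helper comes from the tutorial's `utilities` module, whose source is not shown in this notebook. As a rough sketch of what such a helper might do, assuming it only fetches a zip archive and unpacks it next to the notebook, the hypothetical function `fetch_and_extract` below uses only the standard library. The real helper may behave differently.

```python
import os
import urllib.request
import zipfile


def fetch_and_extract(file_name, url, target_dir="."):
    """Download a zip archive (unless already present) and extract it.

    Hypothetical stand-in for the tutorial's download_data helper.
    Returns the list of member names found in the archive.
    """
    os.makedirs(target_dir, exist_ok=True)
    archive_path = os.path.join(target_dir, file_name)

    # Skip the download when the archive is already on disk.
    if not os.path.exists(archive_path):
        urllib.request.urlretrieve(url, archive_path)

    print("Extracting files...")
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target_dir)
        names = archive.namelist()
    print("Finished extracting.")
    return names
```

Calling `fetch_and_extract(data_file, data_url)` would mirror the cell above. Skipping the download when the archive already exists keeps repeated notebook runs fast.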


## Initialize workspace
Let's load the existing workspace you created earlier in the Azure ML configuration notebook. 

In [6]:
ws = Workspace.from_config(auth=get_auth(env_path))
print(ws.name, ws.resource_group, ws.location, sep="\n")

ProjektAzure
ProjektAzure
eastus
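
`Workspace.from_config` reads the workspace details from a `config.json` file, typically found in the working directory or in a `.azureml` subfolder. The sketch below shows the shape of that file by writing one with the standard library. The function name `write_workspace_config` and the placeholder values are illustrative assumptions; the configuration notebook normally creates this file for you.

```python
import json
import os


def write_workspace_config(subscription_id, resource_group, workspace_name,
                           folder=".azureml"):
    """Write a config.json with the three keys the workspace lookup needs.

    Hypothetical helper for illustration only.
    """
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, "config.json")
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    with open(path, "w") as f:
        json.dump(config, f, indent=2)
    return path
```

For example, `write_workspace_config("<subscription-id>", "<resource-group>", "<workspace-name>")` would produce a file that the cell above could load.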


## Create an Azure ML experiment
Let's create an experiment and give it a name. The script runs will be recorded under this experiment in Azure.

In [7]:
exp = Experiment(workspace=ws, name='torchvision')

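## Use a train.py script

The training logic lives in scripts/train.py. We print the script here to review the command-line arguments it accepts.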

In [8]:
with open("scripts/train.py", "r") as f:
    print(f.read())

import os
import sys

sys.path.append("./cocoapi/PythonAPI/")

import torch
import argparse
import utils
from XMLDataset import BuildDataset, get_transform
from maskrcnn_model import get_model
from engine import train_one_epoch, evaluate

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="PyTorch Object Detection Training")
    parser.add_argument(
        "--data_path", default="./Data/", help="the path to the dataset"
    )
    parser.add_argument("--batch_size", default=2, type=int)
    parser.add_argument(
        "--epochs", default=10, type=int, help="number of total epochs to run"
    )
    parser.add_argument(
        "--workers", default=4, type=int, help="number of data loading workers"
    )
    parser.add_argument(
        "--learning_rate", default=0.005, type=float, help="initial learning rate"
    )
    parser.add_argument("--momentum", default=0.9, type=float, help="momentum")
    parser.add_argument(
        "--weight_decay",
        default=0

## Create a PyTorch estimator

First, we pick the number of epochs to train for. The value below is deliberately low so that the notebook runs quickly. For real training, set it higher (e.g. num_epochs = 10).

In [9]:
num_epochs = 1

In [10]:
script_params = {
    "--data_path": ".",
    "--workers": 8,
    "--learning_rate": 0.005,
    "--epochs": num_epochs,
    "--anchor_sizes": "16,32,64,128,256,512",
    "--anchor_aspect_ratios": "0.25,0.5,1.0,2.0",
    "--rpn_nms_thresh": 0.5,
    "--box_nms_thresh": 0.3,
    "--box_score_thresh": 0.10,
}

estimator = PyTorch(
    source_directory="./scripts",
    script_params=script_params,
    compute_target="local",
    entry_script="train.py",
    use_docker=False,
    user_managed=True,
    use_gpu=True,
)
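
The driver log further down shows the entries of `script_params` arriving at train.py as a flat list of command-line strings. The sketch below reproduces that mapping and parses the result with `argparse`, using a subset of the arguments from train.py. The helper `to_argv` is hypothetical; how the SDK builds the command line internally is an assumption here.

```python
import argparse


def to_argv(script_params):
    """Flatten a {flag: value} dict into a list of command-line strings."""
    argv = []
    for flag, value in script_params.items():
        argv.extend([flag, str(value)])
    return argv


# A subset of the parameters used in this notebook.
script_params = {
    "--data_path": ".",
    "--workers": 8,
    "--learning_rate": 0.005,
    "--epochs": 1,
}

# The same argument definitions that train.py declares for these flags.
parser = argparse.ArgumentParser(description="PyTorch Object Detection Training")
parser.add_argument("--data_path", default="./Data/")
parser.add_argument("--workers", default=4, type=int)
parser.add_argument("--learning_rate", default=0.005, type=float)
parser.add_argument("--epochs", default=10, type=int)

args = parser.parse_args(to_argv(script_params))
print(args.data_path, args.workers, args.learning_rate, args.epochs)
```

Values passed this way override the defaults declared in train.py, which is why `--workers` ends up as 8 rather than 4.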



Next, we point the Python interpreter to the local conda environment built for this tutorial. The Azure ML SDK will run the training script using this environment. We also turn off project snapshot upload to the cloud, since the folder contains a large dataset.

In [11]:
estimator.run_config.environment.python.interpreter_path = ("/anaconda/envs/azureml_py36_pytorch/bin/python")
estimator.run_config.history.snapshot_project = False
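
The interpreter path above is specific to the machine this notebook was run on. Before submitting, it may help to confirm that the path exists on your machine. The sketch below is one way to do that with the standard library; `check_interpreter` is a hypothetical helper, not part of the Azure ML SDK.

```python
import os
import sys


def check_interpreter(path):
    """Return True if path points to an existing, executable file."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


# sys.executable is the interpreter running the current kernel. If the
# notebook kernel already uses the tutorial's conda environment, this is
# a reasonable value for interpreter_path.
print(sys.executable, check_interpreter(sys.executable))
```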

In [16]:
run = exp.submit(estimator)
RunDetails(run).show()




In [17]:
run.wait_for_completion(show_output=True)

RunId: torchvision_1608659833_6a87102a
Web View: https://ml.azure.com/experiments/torchvision/runs/torchvision_1608659833_6a87102a?wsid=/subscriptions/1db0a5ce-7de1-4082-8e25-3c5a4e5a9a98/resourcegroups/ProjektAzure/workspaces/ProjektAzure

Streaming azureml-logs/70_driver_log.txt

[2020-12-22T17:57:16.103369] Entering context manager injector.
[context_manager_injector.py] Command line Options: Namespace(inject=['ProjectPythonPath:context_managers.ProjectPythonPath', 'RunHistory:context_managers.RunHistory', 'TrackUserError:context_managers.TrackUserError'], invocation=['train.py', '--data_path', '.', '--workers', '8', '--learning_rate', '0.005', '--epochs', '1', '--anchor_sizes', '16,32,64,128,256,512', '--anchor_aspect_ratios', '0.25,0.5,1.0,2.0', '--rpn_nms_thresh', '0.5', '--box_nms_thresh', '0.3', '--box_score_thresh', '0.1'])
Script type = None
Starting the daemon thread to refresh tokens in background for process with pid = 90027
Entering Run History Context Manager.
[2020-12-2

Epoch: [0]  [260/279]  eta: 0:00:33  lr: 0.004695  loss: 0.4049 (0.6976)  loss_classifier: 0.0660 (0.1006)  loss_box_reg: 0.2028 (0.1571)  loss_objectness: 0.0338 (0.3126)  loss_rpn_box_reg: 0.1022 (0.1273)  time: 1.8401  data: 0.0065  max mem: 4510
Epoch: [0]  [270/279]  eta: 0:00:16  lr: 0.004874  loss: 0.3912 (0.6892)  loss_classifier: 0.0632 (0.0995)  loss_box_reg: 0.1995 (0.1598)  loss_objectness: 0.0390 (0.3032)  loss_rpn_box_reg: 0.1022 (0.1266)  time: 1.8255  data: 0.0068  max mem: 4510
Epoch: [0]  [278/279]  eta: 0:00:01  lr: 0.005000  loss: 0.4043 (0.6818)  loss_classifier: 0.0614 (0.0984)  loss_box_reg: 0.1923 (0.1598)  loss_objectness: 0.0541 (0.2968)  loss_rpn_box_reg: 0.1198 (0.1268)  time: 1.7782  data: 0.0069  max mem: 4510
Epoch: [0] Total time: 0:08:16 (1.7787 s / it)
creating index...
index created!
Test:  [ 0/50]  eta: 0:00:59  model_time: 0.9090 (0.9090)  evaluator_time: 0.0083 (0.0083)  time: 1.1835  data: 0.2588  max mem: 4510
Test:  [49/50]  eta: 0:00:00  model_

{'runId': 'torchvision_1608659833_6a87102a',
 'target': 'local',
 'status': 'Completed',
 'startTimeUtc': '2020-12-22T17:57:15.099082Z',
 'endTimeUtc': '2020-12-22T18:06:31.563373Z',
 'properties': {'_azureml.ComputeTargetType': 'local',
  'ContentSnapshotId': None,
  'azureml.git.repository_uri': 'https://github.com/ispmor/azure-project.git',
  'mlflow.source.git.repoURL': 'https://github.com/ispmor/azure-project.git',
  'azureml.git.branch': 'main',
  'mlflow.source.git.branch': 'main',
  'azureml.git.commit': '0cd15c57f57b5894818909867bd32604aa9d5ad1',
  'mlflow.source.git.commit': '0cd15c57f57b5894818909867bd32604aa9d5ad1',
  'azureml.git.dirty': 'True'},
 'inputDatasets': [],
 'outputDatasets': [],
 'runDefinition': {'script': 'train.py',
  'useAbsolutePath': False,
  'arguments': ['--data_path',
   '.',
   '--workers',
   '8',
   '--learning_rate',
   '0.005',
   '--epochs',
   '1',
   '--anchor_sizes',
   '16,32,64,128,256,512',
   '--anchor_aspect_ratios',
   '0.25,0.5,1.0,2.0'
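
The progress lines streamed from the driver log above follow a regular pattern, so they can be turned into numbers for plotting or comparison. The sketch below parses one of those lines with regular expressions. It assumes that each metric is printed as a recent smoothed value followed by a running average in parentheses, as torchvision's reference detection utilities do; the function name `parse_log_line` is hypothetical.

```python
import re

# One progress line copied from the driver log above.
LINE = (
    "Epoch: [0]  [260/279]  eta: 0:00:33  lr: 0.004695  "
    "loss: 0.4049 (0.6976)  loss_classifier: 0.0660 (0.1006)  "
    "loss_box_reg: 0.2028 (0.1571)  loss_objectness: 0.0338 (0.3126)  "
    "loss_rpn_box_reg: 0.1022 (0.1273)  time: 1.8401  data: 0.0065  max mem: 4510"
)

# "name: value (average)" pairs, e.g. "loss: 0.4049 (0.6976)".
METRIC = re.compile(r"(\w+): ([\d.]+) \(([\d.]+)\)")
# "[iteration/total]" counters, e.g. "[260/279]" or "[ 0/50]".
STEP = re.compile(r"\[\s*(\d+)/(\d+)\]")


def parse_log_line(line):
    """Extract the iteration counter and metric pairs from a progress line.

    Returns None for lines that carry no iteration counter.
    """
    step = STEP.search(line)
    if step is None:
        return None
    metrics = {
        name: (float(recent), float(average))
        for name, recent, average in METRIC.findall(line)
    }
    return {
        "iteration": int(step.group(1)),
        "total": int(step.group(2)),
        "metrics": metrics,
    }


parsed = parse_log_line(LINE)
print(parsed["iteration"], parsed["metrics"]["loss"])
```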

In [18]:
run.get_file_names()

['azureml-logs/60_control_log.txt',
 'azureml-logs/70_driver_log.txt',
 'logs/azureml/90027_azureml.log',
 'logs/azureml/dataprep/python_span_745bbe04-4801-4aff-a002-23ebacd7fc9c.jsonl',
 'outputs/model_latest.pth']

In [19]:
run.get_metrics()

{'mAP@IoU=0.50': 0.9566988929770025}
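
The metric name refers to intersection over union (IoU): the area shared by a predicted box and a ground-truth box, divided by the area covered by either. Assuming the evaluator follows the usual COCO convention, a detection counts as correct at this setting when its IoU with a ground-truth box is at least 0.5. The sketch below computes IoU for boxes given as (x1, y1, x2, y2); it is an illustration, not the evaluator's own code.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle, if any.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Width and height are clamped at zero for boxes that do not overlap.
    intersection = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# Two 10x10 boxes, the second shifted right by 5 pixels.
print(box_iou((0, 0, 10, 10), (5, 0, 15, 10)))
# The same boxes, shifted by only 2 pixels.
print(box_iou((0, 0, 10, 10), (2, 0, 12, 10)))
```

The first pair overlaps by one third and would not count as a match at the 0.5 threshold; the second pair overlaps by two thirds and would.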

Let's now register this first model.

In [20]:
run.register_model(model_name="torchvision_local_model", model_path="/outputs/model_latest.pth")

Model(workspace=Workspace.create(name='ProjektAzure', subscription_id='1db0a5ce-7de1-4082-8e25-3c5a4e5a9a98', resource_group='ProjektAzure'), name=torchvision_local_model, id=torchvision_local_model:2, version=2, tags={}, properties={})

## Visualize results

Let's download our model and load it to make predictions on our data.

In [21]:
run.download_file("outputs/model_latest.pth")

In [22]:
num_classes = 2
anchor_sizes = "16,32,64,128,256,512"
anchor_aspect_ratios = "0.25,0.5,1.0,2.0"
rpn_nms_threshold = 0.5
box_nms_threshold = 0.3
box_score_threshold = 0.1
num_box_detections = 100
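
The anchor sizes and aspect ratios are passed around as comma-separated strings so that they can travel through the command line unchanged. The source of `get_model` is not shown in this notebook, so the sketch below is only one plausible way to turn such strings into numeric tuples; `parse_number_list` is a hypothetical helper, and how the model groups the values per feature-map level is not covered here.

```python
def parse_number_list(text, cast=float):
    """Split a comma-separated string into a tuple of numbers."""
    return tuple(cast(item) for item in text.split(",") if item.strip())


sizes = parse_number_list("16,32,64,128,256,512", int)
ratios = parse_number_list("0.25,0.5,1.0,2.0", float)
print(sizes)
print(ratios)
```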

In [23]:
# Load Mask RCNN model
model = get_model(
    num_classes,
    anchor_sizes,
    anchor_aspect_ratios,
    rpn_nms_threshold,
    box_nms_threshold,
    box_score_threshold,
    num_box_detections,
)

In [24]:
model_path = "model_latest.pth"
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
# map_location lets a checkpoint saved on a GPU load on a CPU-only machine
model.load_state_dict(torch.load(model_path, map_location=device))
model.to(device)

MaskRCNN(
  (transform): GeneralizedRCNNTransform(
      Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
      Resize(min_size=(800,), max_size=1333, mode='bilinear')
  )
  (backbone): BackboneWithFPN(
    (body): IntermediateLayerGetter(
      (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
      (bn1): FrozenBatchNorm2d(64)
      (relu): ReLU(inplace=True)
      (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
      (layer1): Sequential(
        (0): Bottleneck(
          (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(64)
          (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(64)
          (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(256)
          (relu): ReLU(inplace=True)
          (downsample): 

In [25]:
# Use a random subset of the data to visualize predictions on the images.
data_path = "./scripts"
dataset = BuildDataset(data_path, get_transform(train=False))
indices = torch.randperm(len(dataset)).tolist()
dataset = torch.utils.data.Subset(dataset, indices[-50:])

In [26]:
# Uncomment to draw the predicted boxes on each image in the subset.
# model.eval()
# for idx in range(len(dataset)):
#     img, _ = dataset[idx]
#     with torch.no_grad():
#         prediction = model([img.to(device)])
#     img = Image.fromarray(img.mul(255).permute(1, 2, 0).byte().numpy())
#     boxes = prediction[0]["boxes"].cpu().numpy()
#     print(prediction[0]["scores"])
#     draw = ImageDraw.Draw(img)
#     for x1, y1, x2, y2 in boxes:
#         draw.rectangle(((x1, y1), (x2, y2)), outline="red")
#     display(img)
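
Because the model was built with box_score_threshold = 0.1, the predictions can include many low-confidence boxes, which clutter the drawn images. The sketch below filters boxes by score before drawing. It works on plain Python lists so that it runs without a GPU; the helper `filter_detections` and the cutoff of 0.5 are illustrative assumptions, not part of the tutorial's scripts.

```python
def filter_detections(boxes, scores, min_score=0.5):
    """Keep only the (box, score) pairs whose score is at least min_score."""
    return [(box, score) for box, score in zip(boxes, scores) if score >= min_score]


# Made-up example values in (x1, y1, x2, y2) format.
boxes = [(10, 10, 50, 80), (12, 11, 52, 79), (200, 40, 260, 120)]
scores = [0.92, 0.15, 0.61]

for box, score in filter_detections(boxes, scores):
    print(box, score)
```

With real predictions, the tensors in `prediction[0]["boxes"]` and `prediction[0]["scores"]` could be converted with `.cpu().tolist()` before being passed in.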

In the next notebook, we will [build a custom Docker image and push it to Azure Container Registry](03_BuildDockerImage.ipynb). This image will be used for tuning the hyperparameters of the model on AzureMLCompute.