# Object Detection with PyTorch and Mask R-CNN 

In this tutorial, you will fine-tune a pre-trained [Mask R-CNN](https://arxiv.org/abs/1703.06870) model on images from the [Penn-Fudan Database for Pedestrian Detection and Segmentation](https://www.cis.upenn.edu/~jshi/ped_html/). The dataset has 170 images with 345 instances of pedestrians.

## Prerequisites

- If you are using an Azure Machine Learning Notebook VM, your environment already meets these prerequisites. Otherwise, go through the [Configuration](https://docs.microsoft.com/azure/machine-learning/how-to-configure-environment) steps to install the Azure Machine Learning Python SDK and [create an Azure ML Workspace](https://docs.microsoft.com/azure/machine-learning/how-to-manage-workspace#create-a-workspace).


In [21]:
# Check core SDK version number
import azureml.core

print("SDK version:", azureml.core.VERSION)

SDK version: 1.0.65


## Diagnostics

Opt in to diagnostics collection to help improve the experience, quality, and security of future releases.

In [22]:
from azureml.telemetry import set_diagnostics_collection

set_diagnostics_collection(send_diagnostics=True)

Turning diagnostics collection on. 


## Initialize a workspace

Initialize a [workspace](https://docs.microsoft.com/en-us/azure/machine-learning/concept-workspace) object from the existing workspace you created in the Prerequisites step. [`Workspace.from_config()`](https://docs.microsoft.com/python/api/azureml-core/azureml.core.workspace(class)?view=azure-ml-py#from-config-path-none--auth-none---logger-none---file-name-none-) creates the workspace object from the details stored in `config.json`.

In [23]:
from azureml.core.workspace import Workspace

ws = Workspace.from_config()
print('Workspace name: ' + ws.name, 
      'Azure region: ' + ws.location, 
      'Subscription id: ' + ws.subscription_id, 
      'Resource group: ' + ws.resource_group, sep='\n')

Workspace name: gopalv-ws
Azure region: westus2
Subscription id: 15ae9cb6-95c1-483d-a0e3-b1a1a3b06324
Resource group: aifxdemo
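
The `config.json` file read by `Workspace.from_config()` normally holds three keys: `subscription_id`, `resource_group`, and `workspace_name`. If loading the workspace fails, a quick check of that file can help narrow down the cause. The following is a minimal sketch using only the standard library; `check_workspace_config` is a hypothetical helper name, not part of the Azure ML SDK.

```python
import json
from pathlib import Path

# Keys that a workspace config.json is expected to contain.
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def check_workspace_config(path="config.json"):
    """Hypothetical helper: load config.json and fail if a required key is missing or empty."""
    config = json.loads(Path(path).read_text())
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError(f"{path} is missing required keys: {missing}")
    return config
```

Calling `check_workspace_config()` before `Workspace.from_config()` gives a clearer error message when the file is incomplete.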


## Create or attach existing Azure ML Managed Compute

You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/concept-compute-target) for training your model. In this tutorial, we use [Azure ML managed compute](https://docs.microsoft.com/azure/machine-learning/how-to-set-up-training-targets#amlcompute) for our remote training compute resource. Specifically, the code below creates a `STANDARD_NC6` GPU cluster that autoscales from 0 to 4 nodes.

**Creation of the compute target takes approximately 5 minutes.** If an Azure ML Compute cluster with that name is already in your workspace, this code skips the creation process.

As with other Azure services, there are limits on certain resources associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/azure/machine-learning/how-to-manage-quotas) on the default limits and how to request more quota.

> Note that the code below creates GPU compute. If you instead want to create CPU compute, provide a different VM size to the `vm_size` parameter, such as `STANDARD_D2_V2`.

In [24]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# Read the cluster admin username and password from a local file to avoid
# committing them to Git history; the admin login is useful for debugging GPU usage

login_strings = []
with open('username-pass.txt', 'r') as file:
    for line in file.readlines():
        login_strings.append(line.rstrip('\n'))

print(login_strings)

# choose a name for your cluster
cluster_name = 'gpu-cluster'

try:
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing compute target.')
except ComputeTargetException:
    print('Creating a new compute target...')
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', 
                                                           max_nodes=4,
                                                           admin_username=login_strings[0],
                                                           admin_user_password=login_strings[1])

    # create the cluster
    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)

    compute_target.wait_for_completion(show_output=True)

# use get_status() to get a detailed status for the current cluster. 
print(compute_target.get_status().serialize())

['demouser', 'demo@pass123']
Found existing compute target.
{'currentNodeCount': 0, 'targetNodeCount': 0, 'nodeStateCounts': {'preparingNodeCount': 0, 'runningNodeCount': 0, 'idleNodeCount': 0, 'unusableNodeCount': 0, 'leavingNodeCount': 0, 'preemptedNodeCount': 0}, 'allocationState': 'Steady', 'allocationStateTransitionTime': '2020-03-06T00:44:26.438000+00:00', 'errors': None, 'creationTime': '2020-03-05T14:55:28.441267+00:00', 'modifiedTime': '2020-03-05T15:05:30.544256+00:00', 'provisioningState': 'Succeeded', 'provisioningStateTransitionTime': None, 'scaleSettings': {'minNodeCount': 0, 'maxNodeCount': 4, 'nodeIdleTimeBeforeScaleDown': 'PT120S'}, 'vmPriority': 'Dedicated', 'vmSize': 'STANDARD_NC6'}
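
The cell above reads the admin credentials from `username-pass.txt` and prints them, which leaves the password in the notebook output. A slightly more defensive variant is sketched below. It assumes the same file layout (username on the first line, password on the second); `read_credentials` and `mask` are hypothetical helper names introduced only for illustration.

```python
from pathlib import Path


def read_credentials(path="username-pass.txt"):
    """Hypothetical helper: return (username, password) from a two-line text file."""
    lines = [line.rstrip("\r\n") for line in Path(path).read_text().splitlines()]
    lines = [line for line in lines if line]  # ignore blank lines
    if len(lines) < 2:
        raise ValueError(f"{path} must contain a username line and a password line")
    return lines[0], lines[1]


def mask(secret):
    """Return a same-length placeholder so the secret never appears in output."""
    return "*" * len(secret)
```

With these helpers, `username, password = read_credentials()` followed by `print(username, mask(password))` confirms that the file was read without exposing the password.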


## Train model on the remote compute

### Create a project directory
Create a directory that will contain all the code from your local machine that you will need access to on the remote resource. This includes the training script and any additional files your training script depends on.

In [25]:
import os

project_folder = './pytorch-peds'

try:
    os.makedirs(project_folder, exist_ok=False)
except FileExistsError:
    print(f'project folder {project_folder} exists, moving on...')

project folder ./pytorch-peds exists, moving on...


Possibly helpful: [this link](https://github.com/drabastomek/GTC/blob/master/SJ_2020/workshop/1_Setup/Setup.ipynb), and this sample dockerfile from Jordan:

```
FROM mcr.microsoft.com/azureml/base-gpu:intelmpi2018.3-cuda9.0-cudnn7-ubuntu16.04

# Install Horovod, temporarily using CUDA stubs
RUN ldconfig /usr/local/cuda/lib64/stubs && \
    # Install AzureML SDK
    pip install --no-cache-dir azureml-defaults && \
    # Install TensorFlow, Keras, and supporting packages
    pip install --no-cache-dir tensorflow==2.0.0b1 tensorflow-gpu==2.0.0b1 keras==2.0.8 matplotlib==3.0.3 seaborn==0.9.0 requests==2.21.0 bs4==0.0.1 imageio==2.5.0 sklearn pandas==0.24.2 numpy==1.16.2 hickle==3.4.3 && \
    # Install Horovod
    pip install --no-cache-dir horovod==0.13.5 && \
    ldconfig
```

### Copy training script and dependencies into project directory

In [26]:
import shutil

shutil.copy('data.py', project_folder)
shutil.copy('model.py', project_folder)
shutil.copy('script.py', project_folder)

files_to_copy = ['utils', 'transforms', 'coco_eval', 'engine', 'coco_utils']
for file in files_to_copy:
    shutil.copy('./'+ file + '.py', project_folder)
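
If one of the source files is missing, `shutil.copy` raises partway through and leaves the project folder half-populated. The sketch below checks for every file first and only then copies. `copy_into_project` is a hypothetical helper name, and it assumes the files sit in the current working directory as they do in the cell above.

```python
import shutil
from pathlib import Path


def copy_into_project(file_names, project_folder):
    """Hypothetical helper: copy files into the project folder, failing early if any is missing."""
    missing = [name for name in file_names if not Path(name).is_file()]
    if missing:
        raise FileNotFoundError(f"Missing files: {missing}")
    destination = Path(project_folder)
    destination.mkdir(parents=True, exist_ok=True)
    # shutil.copy returns the path of each newly created file
    return [shutil.copy(name, destination) for name in file_names]
```

For this tutorial the call would look like `copy_into_project(['data.py', 'model.py', 'script.py'], project_folder)`.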

In [27]:
# !git clone https://github.com/pytorch/vision.git

# %cd vision
# !git checkout v0.3.0
# !cp references/detection/utils.py ../
# !cp references/detection/transforms.py ../
# !cp references/detection/coco_eval.py ../
# !cp references/detection/engine.py ../
# !cp references/detection/coco_utils.py ../
# %cd ..



### Download data and upload to Azure blob storage

First we download the sample dataset, and extract the images into local storage.

In [28]:
import urllib.request

from zipfile import ZipFile

data_file = './test.zip'

urllib.request.urlretrieve('https://www.cis.upenn.edu/~jshi/ped_html/PennFudanPed.zip', data_file)
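# use a context manager so the archive is closed, and avoid shadowing the built-in zip()
with ZipFile(file=data_file) as zip_file:
    zip_file.extractall()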
!ls PennFudanPed/

Annotation  PNGImages  PedMasks  added-object-list.txt	readme.txt
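
Before uploading, it can be worth confirming that every image has a matching mask. The sketch below assumes the naming convention visible in the extracted folders, where `PNGImages/FudanPed00001.png` pairs with `PedMasks/FudanPed00001_mask.png`. `find_image_mask_pairs` is a hypothetical helper name.

```python
from pathlib import Path


def find_image_mask_pairs(root="PennFudanPed"):
    """Return (image, mask) path pairs, assuming masks are named <image stem>_mask.png."""
    root = Path(root)
    pairs = []
    for image_path in sorted((root / "PNGImages").glob("*.png")):
        mask_path = root / "PedMasks" / f"{image_path.stem}_mask.png"
        if not mask_path.exists():
            raise FileNotFoundError(f"No mask found for {image_path.name}")
        pairs.append((image_path, mask_path))
    return pairs
```

For the full dataset, `len(find_image_mask_pairs())` should match the 170 images mentioned in the introduction.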


Then, we upload the data files to the datastore associated with this workspace, so that we can access them during training.

In [29]:
# get the default datastore
ds = ws.get_default_datastore()
print(ds.name, ds.datastore_type, ds.account_name, ds.container_name)

ds.upload('./PennFudanPed', target_path='data', overwrite=False)

workspaceblobstore AzureBlob gopalvws3790775563 azureml-blobstore-e47496c6-9688-4277-a05b-ceb722514b9d
Uploading an estimated of 512 files
Target already exists. Skipping upload for data/added-object-list.txt
Target already exists. Skipping upload for data/readme.txt
Target already exists. Skipping upload for data/Annotation/FudanPed00001.txt
Target already exists. Skipping upload for data/Annotation/FudanPed00002.txt
Target already exists. Skipping upload for data/Annotation/FudanPed00003.txt
Target already exists. Skipping upload for data/Annotation/FudanPed00004.txt
Target already exists. Skipping upload for data/Annotation/FudanPed00005.txt
Target already exists. Skipping upload for data/Annotation/FudanPed00006.txt
Target already exists. Skipping upload for data/Annotation/FudanPed00007.txt
Target already exists. Skipping upload for data/Annotation/FudanPed00008.txt
Target already exists. Skipping upload for data/Annotation/FudanPed00009.txt
Target already exists. Skipping upload 

Target already exists. Skipping upload for data/Annotation/PennPed00034.txt
Target already exists. Skipping upload for data/Annotation/PennPed00035.txt
Target already exists. Skipping upload for data/Annotation/PennPed00036.txt
Target already exists. Skipping upload for data/Annotation/PennPed00037.txt
Target already exists. Skipping upload for data/Annotation/PennPed00038.txt
Target already exists. Skipping upload for data/Annotation/PennPed00039.txt
Target already exists. Skipping upload for data/Annotation/PennPed00040.txt
Target already exists. Skipping upload for data/Annotation/PennPed00041.txt
Target already exists. Skipping upload for data/Annotation/PennPed00042.txt
Target already exists. Skipping upload for data/Annotation/PennPed00043.txt
Target already exists. Skipping upload for data/Annotation/PennPed00044.txt
Target already exists. Skipping upload for data/Annotation/PennPed00045.txt
Target already exists. Skipping upload for data/Annotation/PennPed00046.txt
Target alrea

Target already exists. Skipping upload for data/PNGImages/FudanPed00051.png
Target already exists. Skipping upload for data/PNGImages/FudanPed00052.png
Target already exists. Skipping upload for data/PNGImages/FudanPed00053.png
Target already exists. Skipping upload for data/PNGImages/FudanPed00054.png
Target already exists. Skipping upload for data/PNGImages/FudanPed00055.png
Target already exists. Skipping upload for data/PNGImages/FudanPed00056.png
Target already exists. Skipping upload for data/PNGImages/FudanPed00057.png
Target already exists. Skipping upload for data/PNGImages/FudanPed00058.png
Target already exists. Skipping upload for data/PNGImages/FudanPed00059.png
Target already exists. Skipping upload for data/PNGImages/FudanPed00060.png
Target already exists. Skipping upload for data/PNGImages/FudanPed00061.png
Target already exists. Skipping upload for data/PNGImages/FudanPed00062.png
Target already exists. Skipping upload for data/PNGImages/FudanPed00063.png
Target alrea

Target already exists. Skipping upload for data/PNGImages/PennPed00092.png
Target already exists. Skipping upload for data/PNGImages/PennPed00093.png
Target already exists. Skipping upload for data/PNGImages/PennPed00094.png
Target already exists. Skipping upload for data/PNGImages/PennPed00095.png
Target already exists. Skipping upload for data/PNGImages/PennPed00096.png
Target already exists. Skipping upload for data/PedMasks/FudanPed00001_mask.png
Target already exists. Skipping upload for data/PedMasks/FudanPed00002_mask.png
Target already exists. Skipping upload for data/PedMasks/FudanPed00003_mask.png
Target already exists. Skipping upload for data/PedMasks/FudanPed00004_mask.png
Target already exists. Skipping upload for data/PedMasks/FudanPed00005_mask.png
Target already exists. Skipping upload for data/PedMasks/FudanPed00006_mask.png
Target already exists. Skipping upload for data/PedMasks/FudanPed00007_mask.png
Target already exists. Skipping upload for data/PedMasks/FudanPed

Target already exists. Skipping upload for data/PedMasks/PennPed00028_mask.png
Target already exists. Skipping upload for data/PedMasks/PennPed00029_mask.png
Target already exists. Skipping upload for data/PedMasks/PennPed00030_mask.png
Target already exists. Skipping upload for data/PedMasks/PennPed00031_mask.png
Target already exists. Skipping upload for data/PedMasks/PennPed00032_mask.png
Target already exists. Skipping upload for data/PedMasks/PennPed00033_mask.png
Target already exists. Skipping upload for data/PedMasks/PennPed00034_mask.png
Target already exists. Skipping upload for data/PedMasks/PennPed00035_mask.png
Target already exists. Skipping upload for data/PedMasks/PennPed00036_mask.png
Target already exists. Skipping upload for data/PedMasks/PennPed00037_mask.png
Target already exists. Skipping upload for data/PedMasks/PennPed00038_mask.png
Target already exists. Skipping upload for data/PedMasks/PennPed00039_mask.png
Target already exists. Skipping upload for data/PedM

$AZUREML_DATAREFERENCE_0c5e3b1e614a45b797926f0f6914513d

### Register a dataset


In [30]:
from azureml.core import Dataset

dataset_name = 'penn_ds'
datastore_paths = [(ds, 'data')]
penn_ds = Dataset.File.from_files(path=datastore_paths)
penn_ds.register(workspace=ws,
                 name=dataset_name,
                 description='Penn Fudan pedestrian data')

{
  "source": [
    "('workspaceblobstore', 'data')"
  ],
  "definition": [
    "GetDatastoreFiles"
  ],
  "registration": {
    "name": "penn_ds",
    "version": 1,
    "description": "Penn Fudan pedestrian data",
    "workspace": "Workspace.create(name='gopalv-ws', subscription_id='15ae9cb6-95c1-483d-a0e3-b1a1a3b06324', resource_group='aifxdemo')"
  }
}

### Create an experiment

In [31]:
from azureml.core import Experiment

experiment_name = 'pytorch-peds'
experiment = Experiment(ws, name=experiment_name)

### Specify dependencies with a custom Dockerfile

There are a number of ways to [use environments](https://docs.microsoft.com/azure/machine-learning/how-to-use-environments) for specifying dependencies during model training. In this case, we use a custom Dockerfile.

In [32]:
from azureml.core import Environment

my_env = Environment(name='maskr-docker')
my_env.docker.enabled = True
with open("dockerfiles/Dockerfile1", "r") as f:
    dockerfile_contents = f.read()
my_env.docker.base_dockerfile = dockerfile_contents
my_env.docker.base_image = None
my_env.docker.gpu_support = True
my_env.python.interpreter_path = '/opt/miniconda/bin/python'
my_env.python.user_managed_dependencies = True





### Create a ScriptRunConfig

Use the [ScriptRunConfig](https://docs.microsoft.com/python/api/azureml-core/azureml.core.scriptrunconfig?view=azure-ml-py) class to define your run. Specify the source directory, compute target, and environment.

In [33]:
from azureml.core import ScriptRunConfig

model_name = 'pytorch-peds'
output_dir = './outputs'
n_epochs = 10

script_args = [
    '--dataset_name', dataset_name,
    '--model_name', model_name,
    '--output_dir', output_dir,
    '--n_epochs', n_epochs
]
# Add training script to run config
runconfig = ScriptRunConfig(
    source_directory=project_folder,
    script="script.py",
    arguments=script_args)

# Attach compute target to run config
runconfig.run_config.target = cluster_name

# Uncomment the line below if you want to try this locally first
#runconfig.run_config.target = "local"

# Attach environment to run config
runconfig.run_config.environment = my_env
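
The contents of `script.py` are not shown in this notebook, so the following is only a sketch of how a training script could read the four arguments passed in `script_args`. The function name `parse_args`, the help strings, and the default values are assumptions. Arguments reach the script as strings, so `--n_epochs` is converted back to an integer here.

```python
import argparse


def parse_args(argv=None):
    """Parse the command-line arguments that the notebook passes to the training script."""
    parser = argparse.ArgumentParser(description="Fine-tune Mask R-CNN on Penn-Fudan")
    parser.add_argument("--dataset_name", type=str, required=True,
                        help="name of the registered dataset")
    parser.add_argument("--model_name", type=str, required=True,
                        help="name to use for the trained model")
    parser.add_argument("--output_dir", type=str, default="./outputs",
                        help="directory in which to save the model file")
    parser.add_argument("--n_epochs", type=int, default=10,
                        help="number of training epochs")
    # argv=None makes argparse fall back to sys.argv[1:]
    return parser.parse_args(argv)
```

Inside the script, `args = parse_args()` then exposes the values as `args.dataset_name`, `args.model_name`, `args.output_dir`, and `args.n_epochs`.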

### Submit your run

In [34]:
# Submit run 
run = experiment.submit(runconfig)

# to get more details of your run
print(run.get_details())

{'runId': 'pytorch-peds_1583456997_6e930df6', 'target': 'gpu-cluster', 'status': 'Starting', 'properties': {'_azureml.ComputeTargetType': 'amlcompute', 'ContentSnapshotId': 'b9f35cd9-251c-44f1-ac5a-173569acab13', 'azureml.git.repository_uri': 'git@github.com:gvashishtha/pytorch-object.git', 'mlflow.source.git.repoURL': 'git@github.com:gvashishtha/pytorch-object.git', 'azureml.git.branch': 'register-model', 'mlflow.source.git.branch': 'register-model', 'azureml.git.commit': 'a64e8a5075b89ee7220623513505fd39d6016da7', 'mlflow.source.git.commit': 'a64e8a5075b89ee7220623513505fd39d6016da7', 'azureml.git.dirty': 'False', 'AzureML.DerivedImageName': 'azureml/azureml_9125fd9b495cfdec8f7bf56c6d28d91d'}, 'inputDatasets': [], 'runDefinition': {'script': 'script.py', 'useAbsolutePath': False, 'arguments': ['--dataset_name', 'penn_ds', '--model_name', 'pytorch-peds', '--output_dir', './outputs', '--n_epochs', '10'], 'sourceDirectoryDataStore': None, 'framework': 'Python', 'communicator': 'None', '

### Monitor your run

In [35]:
from azureml.widgets import RunDetails

RunDetails(run).show()
run.wait_for_completion(show_output=True)

_UserRunWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': True, 'log_level': 'INFO', 's…

RunId: pytorch-peds_1583456997_6e930df6
Web View: https://mlworkspace.azure.ai/portal/subscriptions/15ae9cb6-95c1-483d-a0e3-b1a1a3b06324/resourceGroups/aifxdemo/providers/Microsoft.MachineLearningServices/workspaces/gopalv-ws/experiments/pytorch-peds/runs/pytorch-peds_1583456997_6e930df6

Streaming azureml-logs/55_azureml-execution-tvmps_606dc9a2785e315fa1507484e3f75e6bbc9e9d83e1f54028bce29457fc6469a4_d.txt

2020-03-06T01:16:29Z Starting output-watcher...
2020-03-06T01:16:29Z IsDedicatedCompute == True, won't poll for Low Pri Preemption
Login Succeeded
Using default tag: latest
latest: Pulling from azureml/azureml_9125fd9b495cfdec8f7bf56c6d28d91d
7ddbc47eeb70: Pulling fs layer
c1bbdc448b72: Pulling fs layer
8c3b70e39044: Pulling fs layer
45d437916d57: Pulling fs layer
d8f1569ddae6: Pulling fs layer
85386706b020: Pulling fs layer
ee9b457b77d0: Pulling fs layer
be4f3343ecd3: Pulling fs layer
30b4effda4fd: Pulling fs layer
b398e882f414: Pulling fs layer
f2e1f2321196: Pulling fs layer
1e87

imgs is ['FudanPed00001.png', 'FudanPed00002.png', 'FudanPed00003.png', 'FudanPed00004.png', 'FudanPed00005.png', 'FudanPed00006.png', 'FudanPed00007.png', 'FudanPed00008.png', 'FudanPed00009.png', 'FudanPed00010.png', 'FudanPed00011.png', 'FudanPed00012.png', 'FudanPed00013.png', 'FudanPed00014.png', 'FudanPed00015.png', 'FudanPed00016.png', 'FudanPed00017.png', 'FudanPed00018.png', 'FudanPed00019.png', 'FudanPed00020.png', 'FudanPed00021.png', 'FudanPed00022.png', 'FudanPed00023.png', 'FudanPed00024.png', 'FudanPed00025.png', 'FudanPed00026.png', 'FudanPed00027.png', 'FudanPed00028.png', 'FudanPed00029.png', 'FudanPed00030.png', 'FudanPed00031.png', 'FudanPed00032.png', 'FudanPed00033.png', 'FudanPed00034.png', 'FudanPed00035.png', 'FudanPed00036.png', 'FudanPed00037.png', 'FudanPed00038.png', 'FudanPed00039.png', 'FudanPed00040.png', 'FudanPed00041.png', 'FudanPed00042.png', 'FudanPed00043.png', 'FudanPed00044.png', 'FudanPed00045.png', 'FudanPed00046.png', 'FudanPed00047.png', 'Fud

 77%|███████▋  | 131M/170M [00:04<00:01, 28.4MB/s]
 79%|███████▊  | 134M/170M [00:04<00:01, 29.3MB/s]
 81%|████████  | 137M/170M [00:04<00:01, 23.1MB/s]
 82%|████████▏ | 139M/170M [00:04<00:01, 22.0MB/s]
 84%|████████▍ | 143M/170M [00:04<00:01, 25.6MB/s]
 86%|████████▌ | 146M/170M [00:05<00:01, 24.8MB/s]
 89%|████████▊ | 151M/170M [00:05<00:00, 28.9MB/s]
 91%|█████████ | 154M/170M [00:05<00:00, 24.4MB/s]
 92%|█████████▏| 157M/170M [00:05<00:00, 18.8MB/s]
 93%|█████████▎| 159M/170M [00:05<00:00, 19.7MB/s]
 95%|█████████▍| 161M/170M [00:05<00:00, 20.4MB/s]
 96%|█████████▌| 163M/170M [00:05<00:00, 20.1MB/s]
 97%|█████████▋| 165M/170M [00:05<00:00, 19.9MB/s]
 99%|█████████▊| 167M/170M [00:06<00:00, 19.3MB/s]
100%|██████████| 170M/170M [00:06<00:00, 28.7MB/s]
Epoch: [0]  [ 0/60]  eta: 0:03:15  lr: 0.000090  loss: 3.5827 (3.5827)  loss_classifier: 0.7385 (0.7385)  loss_box_reg: 0.1523 (0.1523)  loss_mask: 2.6620 (2.6620)  loss_objectness: 0.0224 (0.0224)  loss_rpn_box_reg: 0.0076 (0.0076)  t

Epoch: [2]  [10/60]  eta: 0:01:20  lr: 0.005000  loss: 0.1857 (0.1740)  loss_classifier: 0.0309 (0.0300)  loss_box_reg: 0.0118 (0.0167)  loss_mask: 0.1124 (0.1182)  loss_objectness: 0.0006 (0.0011)  loss_rpn_box_reg: 0.0064 (0.0079)  time: 1.6102  data: 0.0167  max mem: 3597
Epoch: [2]  [20/60]  eta: 0:01:02  lr: 0.005000  loss: 0.1529 (0.1612)  loss_classifier: 0.0218 (0.0247)  loss_box_reg: 0.0096 (0.0132)  loss_mask: 0.1109 (0.1155)  loss_objectness: 0.0004 (0.0008)  loss_rpn_box_reg: 0.0043 (0.0070)  time: 1.5487  data: 0.0068  max mem: 3597
Epoch: [2]  [30/60]  eta: 0:00:47  lr: 0.005000  loss: 0.1613 (0.1828)  loss_classifier: 0.0218 (0.0303)  loss_box_reg: 0.0099 (0.0176)  loss_mask: 0.1147 (0.1252)  loss_objectness: 0.0003 (0.0012)  loss_rpn_box_reg: 0.0078 (0.0086)  time: 1.5862  data: 0.0069  max mem: 3597
Epoch: [2]  [40/60]  eta: 0:00:32  lr: 0.005000  loss: 0.1947 (0.1841)  loss_classifier: 0.0280 (0.0299)  loss_box_reg: 0.0160 (0.0171)  loss_mask: 0.1288 (0.1273)  loss_ob

Epoch: [4]  [ 0/60]  eta: 0:01:25  lr: 0.000500  loss: 0.1049 (0.1049)  loss_classifier: 0.0060 (0.0060)  loss_box_reg: 0.0031 (0.0031)  loss_mask: 0.0913 (0.0913)  loss_objectness: 0.0000 (0.0000)  loss_rpn_box_reg: 0.0046 (0.0046)  time: 1.4213  data: 0.1088  max mem: 3597
Epoch: [4]  [10/60]  eta: 0:01:19  lr: 0.000500  loss: 0.1587 (0.1616)  loss_classifier: 0.0235 (0.0223)  loss_box_reg: 0.0112 (0.0130)  loss_mask: 0.1101 (0.1181)  loss_objectness: 0.0004 (0.0007)  loss_rpn_box_reg: 0.0072 (0.0075)  time: 1.5991  data: 0.0161  max mem: 3597
Epoch: [4]  [20/60]  eta: 0:01:01  lr: 0.000500  loss: 0.1587 (0.1648)  loss_classifier: 0.0239 (0.0248)  loss_box_reg: 0.0092 (0.0122)  loss_mask: 0.1101 (0.1199)  loss_objectness: 0.0004 (0.0012)  loss_rpn_box_reg: 0.0063 (0.0066)  time: 1.5435  data: 0.0068  max mem: 3597
Epoch: [4]  [30/60]  eta: 0:00:46  lr: 0.000500  loss: 0.1417 (0.1679)  loss_classifier: 0.0261 (0.0269)  loss_box_reg: 0.0081 (0.0127)  loss_mask: 0.1003 (0.1194)  loss_ob

Epoch: [6]  [10/60]  eta: 0:01:26  lr: 0.000050  loss: 0.1345 (0.1427)  loss_classifier: 0.0191 (0.0203)  loss_box_reg: 0.0055 (0.0072)  loss_mask: 0.1023 (0.1092)  loss_objectness: 0.0002 (0.0005)  loss_rpn_box_reg: 0.0059 (0.0054)  time: 1.7213  data: 0.0170  max mem: 3597
Epoch: [6]  [20/60]  eta: 0:01:08  lr: 0.000050  loss: 0.1386 (0.1517)  loss_classifier: 0.0191 (0.0235)  loss_box_reg: 0.0065 (0.0103)  loss_mask: 0.1017 (0.1108)  loss_objectness: 0.0002 (0.0005)  loss_rpn_box_reg: 0.0060 (0.0067)  time: 1.6944  data: 0.0070  max mem: 3597
Epoch: [6]  [30/60]  eta: 0:00:50  lr: 0.000050  loss: 0.1517 (0.1531)  loss_classifier: 0.0223 (0.0250)  loss_box_reg: 0.0087 (0.0100)  loss_mask: 0.1017 (0.1107)  loss_objectness: 0.0003 (0.0006)  loss_rpn_box_reg: 0.0076 (0.0068)  time: 1.6688  data: 0.0073  max mem: 3597
Epoch: [6]  [40/60]  eta: 0:00:33  lr: 0.000050  loss: 0.1517 (0.1595)  loss_classifier: 0.0235 (0.0262)  loss_box_reg: 0.0087 (0.0110)  loss_mask: 0.1055 (0.1143)  loss_ob

Epoch: [8]  [10/60]  eta: 0:01:21  lr: 0.000050  loss: 0.1497 (0.1484)  loss_classifier: 0.0212 (0.0219)  loss_box_reg: 0.0056 (0.0099)  loss_mask: 0.1083 (0.1100)  loss_objectness: 0.0003 (0.0006)  loss_rpn_box_reg: 0.0048 (0.0061)  time: 1.6253  data: 0.0276  max mem: 3597
Epoch: [8]  [20/60]  eta: 0:01:04  lr: 0.000050  loss: 0.1502 (0.1577)  loss_classifier: 0.0223 (0.0224)  loss_box_reg: 0.0088 (0.0108)  loss_mask: 0.1068 (0.1162)  loss_objectness: 0.0006 (0.0010)  loss_rpn_box_reg: 0.0069 (0.0071)  time: 1.6110  data: 0.0058  max mem: 3597
Epoch: [8]  [30/60]  eta: 0:00:48  lr: 0.000050  loss: 0.1413 (0.1554)  loss_classifier: 0.0241 (0.0236)  loss_box_reg: 0.0085 (0.0113)  loss_mask: 0.1008 (0.1126)  loss_objectness: 0.0004 (0.0009)  loss_rpn_box_reg: 0.0076 (0.0071)  time: 1.5914  data: 0.0069  max mem: 3597
Epoch: [8]  [40/60]  eta: 0:00:32  lr: 0.000050  loss: 0.1458 (0.1640)  loss_classifier: 0.0257 (0.0255)  loss_box_reg: 0.0093 (0.0120)  loss_mask: 0.1096 (0.1178)  loss_ob

ActivityFailedException: ActivityFailedException:
	Message: Activity Failed:
{
    "error": {
        "message": "Could not locate the provided model_path outputs/model.pt in the set of files uploaded to the run: ['azureml-logs/55_azureml-execution-tvmps_606dc9a2785e315fa1507484e3f75e6bbc9e9d83e1f54028bce29457fc6469a4_d.txt', 'azureml-logs/65_job_prep-tvmps_606dc9a2785e315fa1507484e3f75e6bbc9e9d83e1f54028bce29457fc6469a4_d.txt', 'azureml-logs/70_driver_log.txt', 'azureml-logs/process_info.json', 'azureml-logs/process_status.json', 'logs/azureml/126_azureml.log', 'logs/azureml/job_prep_azureml.log', 'outputs/model.pt']\n                See https://aka.ms/run-logging for more details.",
        "details": [],
        "debugInfo": {
            "type": "ModelPathNotFoundException",
            "message": "ModelPathNotFoundException:\n\tMessage: Could not locate the provided model_path outputs/model.pt in the set of files uploaded to the run: ['azureml-logs/55_azureml-execution-tvmps_606dc9a2785e315fa1507484e3f75e6bbc9e9d83e1f54028bce29457fc6469a4_d.txt', 'azureml-logs/65_job_prep-tvmps_606dc9a2785e315fa1507484e3f75e6bbc9e9d83e1f54028bce29457fc6469a4_d.txt', 'azureml-logs/70_driver_log.txt', 'azureml-logs/process_info.json', 'azureml-logs/process_status.json', 'logs/azureml/126_azureml.log', 'logs/azureml/job_prep_azureml.log', 'outputs/model.pt']\n                See https://aka.ms/run-logging for more details.\n\tInnerException None\n\tErrorResponse \n{\n    \"error\": {\n        \"message\": \"Could not locate the provided model_path outputs/model.pt in the set of files uploaded to the run: ['azureml-logs/55_azureml-execution-tvmps_606dc9a2785e315fa1507484e3f75e6bbc9e9d83e1f54028bce29457fc6469a4_d.txt', 'azureml-logs/65_job_prep-tvmps_606dc9a2785e315fa1507484e3f75e6bbc9e9d83e1f54028bce29457fc6469a4_d.txt', 'azureml-logs/70_driver_log.txt', 'azureml-logs/process_info.json', 'azureml-logs/process_status.json', 'logs/azureml/126_azureml.log', 'logs/azureml/job_prep_azureml.log', 'outputs/model.pt']\\n                See https://aka.ms/run-logging for more details.\"\n    }\n}",
            "stackTrace": "  File \"/mnt/batch/tasks/shared/LS_root/jobs/gopalv-ws/azureml/pytorch-peds_1583456997_6e930df6/mounts/workspaceblobstore/azureml/pytorch-peds_1583456997_6e930df6/azureml-setup/context_manager_injector.py\", line 127, in execute_with_context\n    runpy.run_path(sys.argv[0], globals(), run_name=\"__main__\")\n  File \"/opt/miniconda/lib/python3.8/runpy.py\", line 263, in run_path\n    return _run_module_code(code, init_globals, run_name,\n  File \"/opt/miniconda/lib/python3.8/runpy.py\", line 96, in _run_module_code\n    _run_code(code, mod_globals, init_globals,\n  File \"/opt/miniconda/lib/python3.8/runpy.py\", line 86, in _run_code\n    exec(code, run_globals)\n  File \"script.py\", line 112, in <module>\n    main()\n  File \"script.py\", line 105, in main\n    model = run.register_model(\n  File \"/opt/miniconda/lib/python3.8/site-packages/azureml/core/run.py\", line 1987, in register_model\n    return self._client.register_model(\n  File \"/opt/miniconda/lib/python3.8/site-packages/azureml/_run_impl/run_history_facade.py\", line 379, in register_model\n    raise ModelPathNotFoundException(\n"
        }
    },
    "time": "0001-01-01T00:00:00.000Z"
}
	InnerException None
	ErrorResponse 
{
    "error": {
        "message": "Activity Failed:\n{\n    \"error\": {\n        \"message\": \"Could not locate the provided model_path outputs/model.pt in the set of files uploaded to the run: ['azureml-logs/55_azureml-execution-tvmps_606dc9a2785e315fa1507484e3f75e6bbc9e9d83e1f54028bce29457fc6469a4_d.txt', 'azureml-logs/65_job_prep-tvmps_606dc9a2785e315fa1507484e3f75e6bbc9e9d83e1f54028bce29457fc6469a4_d.txt', 'azureml-logs/70_driver_log.txt', 'azureml-logs/process_info.json', 'azureml-logs/process_status.json', 'logs/azureml/126_azureml.log', 'logs/azureml/job_prep_azureml.log', 'outputs/model.pt']\\n                See https://aka.ms/run-logging for more details.\",\n        \"details\": [],\n        \"debugInfo\": {\n            \"type\": \"ModelPathNotFoundException\",\n            \"message\": \"ModelPathNotFoundException:\\n\\tMessage: Could not locate the provided model_path outputs/model.pt in the set of files uploaded to the run: ['azureml-logs/55_azureml-execution-tvmps_606dc9a2785e315fa1507484e3f75e6bbc9e9d83e1f54028bce29457fc6469a4_d.txt', 'azureml-logs/65_job_prep-tvmps_606dc9a2785e315fa1507484e3f75e6bbc9e9d83e1f54028bce29457fc6469a4_d.txt', 'azureml-logs/70_driver_log.txt', 'azureml-logs/process_info.json', 'azureml-logs/process_status.json', 'logs/azureml/126_azureml.log', 'logs/azureml/job_prep_azureml.log', 'outputs/model.pt']\\n                See https://aka.ms/run-logging for more details.\\n\\tInnerException None\\n\\tErrorResponse \\n{\\n    \\\"error\\\": {\\n        \\\"message\\\": \\\"Could not locate the provided model_path outputs/model.pt in the set of files uploaded to the run: ['azureml-logs/55_azureml-execution-tvmps_606dc9a2785e315fa1507484e3f75e6bbc9e9d83e1f54028bce29457fc6469a4_d.txt', 'azureml-logs/65_job_prep-tvmps_606dc9a2785e315fa1507484e3f75e6bbc9e9d83e1f54028bce29457fc6469a4_d.txt', 'azureml-logs/70_driver_log.txt', 'azureml-logs/process_info.json', 'azureml-logs/process_status.json', 'logs/azureml/126_azureml.log', 
'logs/azureml/job_prep_azureml.log', 'outputs/model.pt']\\\\n                See https://aka.ms/run-logging for more details.\\\"\\n    }\\n}\",\n            \"stackTrace\": \"  File \\\"/mnt/batch/tasks/shared/LS_root/jobs/gopalv-ws/azureml/pytorch-peds_1583456997_6e930df6/mounts/workspaceblobstore/azureml/pytorch-peds_1583456997_6e930df6/azureml-setup/context_manager_injector.py\\\", line 127, in execute_with_context\\n    runpy.run_path(sys.argv[0], globals(), run_name=\\\"__main__\\\")\\n  File \\\"/opt/miniconda/lib/python3.8/runpy.py\\\", line 263, in run_path\\n    return _run_module_code(code, init_globals, run_name,\\n  File \\\"/opt/miniconda/lib/python3.8/runpy.py\\\", line 96, in _run_module_code\\n    _run_code(code, mod_globals, init_globals,\\n  File \\\"/opt/miniconda/lib/python3.8/runpy.py\\\", line 86, in _run_code\\n    exec(code, run_globals)\\n  File \\\"script.py\\\", line 112, in <module>\\n    main()\\n  File \\\"script.py\\\", line 105, in main\\n    model = run.register_model(\\n  File \\\"/opt/miniconda/lib/python3.8/site-packages/azureml/core/run.py\\\", line 1987, in register_model\\n    return self._client.register_model(\\n  File \\\"/opt/miniconda/lib/python3.8/site-packages/azureml/_run_impl/run_history_facade.py\\\", line 379, in register_model\\n    raise ModelPathNotFoundException(\\n\"\n        }\n    },\n    \"time\": \"0001-01-01T00:00:00.000Z\"\n}"
    }
}

The stack trace shows that the run failed when `script.py` called `run.register_model()`, even though `outputs/model.pt` appears in the list of files uploaded to the run. As a workaround, the cells below retrieve the run by its ID, confirm that the model file is present, and register the model from the notebook.

In [38]:
from azureml.core import Run

test = Run(experiment, run_id='pytorch-peds_1583456997_6e930df6')

In [39]:
test.get_file_names()

['azureml-logs/55_azureml-execution-tvmps_606dc9a2785e315fa1507484e3f75e6bbc9e9d83e1f54028bce29457fc6469a4_d.txt',
 'azureml-logs/65_job_prep-tvmps_606dc9a2785e315fa1507484e3f75e6bbc9e9d83e1f54028bce29457fc6469a4_d.txt',
 'azureml-logs/70_driver_log.txt',
 'azureml-logs/75_job_post-tvmps_606dc9a2785e315fa1507484e3f75e6bbc9e9d83e1f54028bce29457fc6469a4_d.txt',
 'azureml-logs/process_info.json',
 'azureml-logs/process_status.json',
 'logs/azureml/126_azureml.log',
 'logs/azureml/job_prep_azureml.log',
 'logs/azureml/job_release_azureml.log',
 'outputs/model.pt']

In [40]:
test.register_model(model_name=model_name, model_path='outputs/model.pt')

Model(workspace=Workspace.create(name='gopalv-ws', subscription_id='15ae9cb6-95c1-483d-a0e3-b1a1a3b06324', resource_group='aifxdemo'), name=pytorch-peds, id=pytorch-peds:1, version=1, tags={}, properties={})

### Retrieve your registered model

In [45]:
from azureml.core import Model

model = Model(workspace=ws, name=model_name)

### Download your model and run predictions

We download the registered model parameters and use them to initialize a model for inference. We then run inference on a single test image and display the results.

In [None]:
import torch
from azureml.core import Dataset
from data import PennFudanDataset
from script import get_transform

from model import get_instance_segmentation_model
from script import NUM_CLASSES

path = model.download(target_dir='.', exist_ok=True)

if torch.cuda.is_available():
    device = torch.device('cuda')
else:
    device = torch.device('cpu')

model = get_instance_segmentation_model(NUM_CLASSES)

model.to(device)

model.load_state_dict(torch.load(path, map_location=device))
model.eval()

penn_ds = Dataset.get_by_name(workspace=ws, name='penn_ds')
dataset_test = PennFudanDataset(penn_ds, get_transform(train=False))


# pick one image from the test set
img, _ = dataset_test[0]
# run inference without tracking gradients (the model is already in evaluation mode)
with torch.no_grad():
    prediction = model([img.to(device)])

### Display the input image

In [None]:
from PIL import Image

Image.fromarray(img.mul(255).permute(1, 2, 0).byte().numpy())

### Display the predicted mask

In [None]:
Image.fromarray(prediction[0]['masks'][0, 0].mul(255).byte().cpu().numpy())