# Object Detection with PyTorch and Mask R-CNN 

In this tutorial, you will fine-tune a pre-trained [Mask R-CNN](https://arxiv.org/abs/1703.06870) model on images from the [Penn-Fudan Database for Pedestrian Detection and Segmentation](https://www.cis.upenn.edu/~jshi/ped_html/). The dataset has 170 images with 345 instances of pedestrians.

## Prerequisites

- If you are using an Azure Machine Learning Notebook VM, your environment already meets these prerequisites. Otherwise, go through the [Configuration](https://docs.microsoft.com/azure/machine-learning/how-to-configure-environment) steps to install the Azure Machine Learning Python SDK and [create an Azure ML Workspace](https://docs.microsoft.com/azure/machine-learning/how-to-manage-workspace#create-a-workspace).


In [1]:
# Check core SDK version number
import azureml.core

print("SDK version:", azureml.core.VERSION)

Failure while loading azureml_run_type_providers. Failed to load entrypoint hyperdrive = azureml.train.hyperdrive:HyperDriveRun._from_run_dto with exception (azureml-core 1.0.65.1 (/home/gopalv/miniconda3/envs/azureml/lib/python3.6/site-packages), Requirement.parse('azureml-core==1.0.69.*')).
Failure while loading azureml_run_type_providers. Failed to load entrypoint azureml.PipelineRun = azureml.pipeline.core.run:PipelineRun._from_dto with exception (azureml-core 1.0.65.1 (/home/gopalv/miniconda3/envs/azureml/lib/python3.6/site-packages), Requirement.parse('azureml-core==1.0.69.*')).
Failure while loading azureml_run_type_providers. Failed to load entrypoint azureml.ReusedStepRun = azureml.pipeline.core.run:StepRun._from_reused_dto with exception (azureml-core 1.0.65.1 (/home/gopalv/miniconda3/envs/azureml/lib/python3.6/site-packages), Requirement.parse('azureml-core==1.0.69.*')).
Failure while loading azureml_run_type_providers. Failed to load entrypoint azureml.StepRun = azureml.pip

SDK version: 1.0.65
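
The warnings in the output above report a version mismatch: the installed `azureml-core` is 1.0.65.1, while other installed packages require `azureml-core==1.0.69.*`. As a minimal sketch of what that pin means, the snippet below checks a version string against an exact pin with an optional trailing wildcard. The helper names `parse_version` and `satisfies_pin` are hypothetical and not part of the Azure ML SDK, and the check covers only this one pin form, not the full requirement syntax.

```python
def parse_version(version: str) -> tuple:
    """Split a dotted version such as '1.0.65.1' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def satisfies_pin(installed: str, pin: str) -> bool:
    """Return True if `installed` matches a pin such as '1.0.69.*' or '1.0.69'.

    Hypothetical helper: it understands only exact pins with an optional
    trailing wildcard, not the full requirement-specifier syntax.
    """
    if pin.endswith(".*"):
        prefix = parse_version(pin[:-2])
        return parse_version(installed)[: len(prefix)] == prefix
    return parse_version(installed) == parse_version(pin)


# The installed version from the warnings does not satisfy the pin.
print(satisfies_pin("1.0.65.1", "1.0.69.*"))  # False
print(satisfies_pin("1.0.69.3", "1.0.69.*"))  # True
```

Upgrading the `azureml-*` packages together, so that their versions agree, is the usual way to clear warnings of this kind.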


## Diagnostics

Opt in to diagnostics collection to help improve the experience, quality, and security of future releases.

In [2]:
from azureml.telemetry import set_diagnostics_collection

set_diagnostics_collection(send_diagnostics=True)

Turning diagnostics collection on. 


## Initialize a workspace

Initialize a [workspace](https://docs.microsoft.com/en-us/azure/machine-learning/concept-workspace) object from the existing workspace you created in the Prerequisites step. The [`Workspace.from_config()`](https://docs.microsoft.com/python/api/azureml-core/azureml.core.workspace(class)?view=azure-ml-py#from-config-path-none--auth-none---logger-none---file-name-none-) method creates the workspace object from the details stored in `config.json`.

In [3]:
from azureml.core.workspace import Workspace

ws = Workspace.from_config()
print('Workspace name: ' + ws.name, 
      'Azure region: ' + ws.location, 
      'Subscription id: ' + ws.subscription_id, 
      'Resource group: ' + ws.resource_group, sep='\n')

If you run your code in unattended mode, i.e., where you can't give a user input, then we recommend to use ServicePrincipalAuthentication or MsiAuthentication.
Please refer to aka.ms/aml-notebook-auth for different authentication mechanisms in azureml-sdk.


Workspace name: gopalv-ws
Azure region: westus2
Subscription id: 15ae9cb6-95c1-483d-a0e3-b1a1a3b06324
Resource group: aifxdemo
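
If `Workspace.from_config()` fails, checking `config.json` itself is a sensible first step. The sketch below assumes the file holds the three keys normally found in a downloaded workspace configuration (`subscription_id`, `resource_group`, `workspace_name`). The helper `load_workspace_config` is hypothetical and the values are placeholders.

```python
import json
import tempfile
from pathlib import Path

# Keys assumed to be present in a workspace config.json.
REQUIRED_KEYS = {"subscription_id", "resource_group", "workspace_name"}


def load_workspace_config(path):
    """Load a workspace config file and check that the expected keys exist."""
    config = json.loads(Path(path).read_text())
    missing = REQUIRED_KEYS - set(config)
    if missing:
        raise KeyError(f"config is missing keys: {sorted(missing)}")
    return config


# Placeholder values for illustration only; substitute your own.
example = {
    "subscription_id": "<your-subscription-id>",
    "resource_group": "<your-resource-group>",
    "workspace_name": "<your-workspace-name>",
}

with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "config.json"
    config_path.write_text(json.dumps(example, indent=2))
    config = load_workspace_config(config_path)

print(sorted(config))
```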


## Create or attach existing Azure ML Managed Compute

You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/concept-compute-target) for training your model. In this tutorial, we use [Azure ML managed compute](https://docs.microsoft.com/azure/machine-learning/how-to-set-up-training-targets#amlcompute) for our remote training compute resource. Specifically, the below code creates a `STANDARD_NC6` GPU cluster that autoscales from 0 to 4 nodes.

**Creating the compute cluster takes approximately 5 minutes.** If an Azure ML Compute cluster with that name already exists in your workspace, this code skips the creation process.

As with other Azure services, there are limits on certain resources associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/azure/machine-learning/how-to-manage-quotas) on the default limits and how to request more quota.

> Note that the below code creates GPU compute. If you instead want to create CPU compute, provide a different VM size to the `vm_size` parameter, such as `STANDARD_D2_V2`.

In [4]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

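# Read the cluster admin username and password from a local file so that
# they are never committed to Git history. Admin access to the nodes is
# useful for debugging GPU usage.
# Expected file layout: username on line 1, password on line 2.
with open('username-pass.txt', 'r') as file:
    login_strings = [line.rstrip('\n') for line in file]

# Note: this prints the password in clear text; remove it outside of a demo.
print(login_strings)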

# choose a name for your cluster
cluster_name = 'gpu-cluster'

try:
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing compute target.')
except ComputeTargetException:
    print('Creating a new compute target...')
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', 
                                                           max_nodes=4,
                                                           admin_username=login_strings[0],
                                                           admin_user_password=login_strings[1])

    # create the cluster
    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)

    compute_target.wait_for_completion(show_output=True)

# use get_status() to get a detailed status for the current cluster. 
print(compute_target.get_status().serialize())

['demouser', 'demo@pass123']
Found existing compute target.
{'currentNodeCount': 1, 'targetNodeCount': 1, 'nodeStateCounts': {'preparingNodeCount': 0, 'runningNodeCount': 1, 'idleNodeCount': 0, 'unusableNodeCount': 0, 'leavingNodeCount': 0, 'preemptedNodeCount': 0}, 'allocationState': 'Steady', 'allocationStateTransitionTime': '2020-03-05T16:21:52.965000+00:00', 'errors': None, 'creationTime': '2020-03-05T14:55:28.441267+00:00', 'modifiedTime': '2020-03-05T15:05:30.544256+00:00', 'provisioningState': 'Succeeded', 'provisioningStateTransitionTime': None, 'scaleSettings': {'minNodeCount': 0, 'maxNodeCount': 4, 'nodeIdleTimeBeforeScaleDown': 'PT120S'}, 'vmPriority': 'Dedicated', 'vmSize': 'STANDARD_NC6'}
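
The cell above prints the admin password in clear text, which can leak through saved notebook output. Below is a minimal sketch of an alternative, assuming the same two-line file layout (username on line 1, password on line 2). `read_credentials` and `mask` are hypothetical helpers, not part of the Azure ML SDK, and the credentials are invented.

```python
import tempfile
from pathlib import Path


def read_credentials(path):
    """Return (username, password) from a two-line text file."""
    lines = [line.strip() for line in Path(path).read_text().splitlines()]
    lines = [line for line in lines if line]  # ignore blank lines
    if len(lines) < 2:
        raise ValueError(f"expected a username and a password in {path}")
    return lines[0], lines[1]


def mask(secret):
    """Replace every character so the value can be logged safely."""
    return "*" * len(secret)


# Demonstration with a throwaway file and made-up credentials.
with tempfile.TemporaryDirectory() as tmp:
    cred_file = Path(tmp) / "username-pass.txt"
    cred_file.write_text("exampleuser\nnot-a-real-password\n")
    username, password = read_credentials(cred_file)

print(username, mask(password))
```

The returned values could then be passed as `admin_username` and `admin_user_password` in place of `login_strings[0]` and `login_strings[1]`.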


## Train model on the remote compute

### Create a project directory
Create a directory that will contain all the code from your local machine that you will need access to on the remote resource. This includes the training script and any additional files your training script depends on.

In [5]:
import os

project_folder = './pytorch-peds'

try:
    os.makedirs(project_folder, exist_ok=False)
except FileExistsError:
    print(f'project folder {project_folder} exists, moving on...')

project folder ./pytorch-peds exists, moving on...


For reference, see [this setup notebook](https://github.com/drabastomek/GTC/blob/master/SJ_2020/workshop/1_Setup/Setup.ipynb) and the following sample Dockerfile from Jordan:

```
FROM mcr.microsoft.com/azureml/base-gpu:intelmpi2018.3-cuda9.0-cudnn7-ubuntu16.04

# Install Horovod, temporarily using CUDA stubs
RUN ldconfig /usr/local/cuda/lib64/stubs && \
    # Install AzureML SDK
    pip install --no-cache-dir azureml-defaults && \
    # Install TensorFlow, Keras, and supporting Python packages
    pip install --no-cache-dir tensorflow==2.0.0b1 tensorflow-gpu==2.0.0b1 keras==2.0.8 matplotlib==3.0.3 seaborn==0.9.0 requests==2.21.0 bs4==0.0.1 imageio==2.5.0 sklearn pandas==0.24.2 numpy==1.16.2 hickle==3.4.3 && \
    # Install Horovod
    pip install --no-cache-dir horovod==0.13.5 && \
    ldconfig
```

### Copy training script and dependencies into project directory

In [31]:
import shutil

shutil.copy('data.py', project_folder)
shutil.copy('model.py', project_folder)
shutil.copy('script.py', project_folder)

files_to_copy = ['utils', 'transforms', 'coco_eval', 'engine', 'coco_utils']
for file in files_to_copy:
    shutil.copy('./'+ file + '.py', project_folder)

'./pytorch-peds/script.py'
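
`shutil.copy` raises an error at the first missing file, which can leave the project folder half-populated. As a sketch of a stricter variant, the hypothetical helper below (`copy_into_project`, not part of any library) checks that every source file exists before copying anything. The demonstration uses empty stand-in files in a temporary directory.

```python
import shutil
import tempfile
from pathlib import Path


def copy_into_project(filenames, source_dir, project_dir):
    """Copy each named file into project_dir, failing early if any is missing."""
    source_dir, project_dir = Path(source_dir), Path(project_dir)
    missing = [name for name in filenames if not (source_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing source files: {missing}")
    project_dir.mkdir(parents=True, exist_ok=True)
    return [shutil.copy(source_dir / name, project_dir) for name in filenames]


# Demonstration in a temporary directory with empty stand-in files.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "src"
    src.mkdir()
    names = ["data.py", "model.py", "script.py"]
    for name in names:
        (src / name).touch()
    copied = copy_into_project(names, src, Path(tmp) / "pytorch-peds")
    copied_names = sorted(Path(p).name for p in copied)

print(copied_names)
```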

In [30]:
# The helper files copied above come from the torchvision detection references.
# Uncomment the lines below to fetch them.
# !git clone https://github.com/pytorch/vision.git
# %cd vision
# !git checkout v0.3.0
# !cp references/detection/utils.py ../
# !cp references/detection/transforms.py ../
# !cp references/detection/coco_eval.py ../
# !cp references/detection/engine.py ../
# !cp references/detection/coco_utils.py ../
# %cd ..



### Download data and upload to Azure blob storage

First we download the sample dataset, and extract the images into local storage.

In [8]:
import urllib.request

from zipfile import ZipFile

data_file = './test.zip'

urllib.request.urlretrieve('https://www.cis.upenn.edu/~jshi/ped_html/PennFudanPed.zip', data_file)
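# Use a context manager so the archive is closed after extraction,
# and avoid shadowing the built-in `zip`.
with ZipFile(file=data_file) as archive:
    archive.extractall()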
!ls PennFudanPed/

Annotation  PNGImages  PedMasks  added-object-list.txt	readme.txt
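
Instance segmentation training needs a mask for every image, so a quick pairing check on the extracted folder is worthwhile before uploading. The sketch below assumes the naming convention `PNGImages/X.png` and `PedMasks/X_mask.png`. `pair_images_with_masks` is a hypothetical helper and is not taken from the tutorial's `data.py`, which is not shown here.

```python
import tempfile
from pathlib import Path


def pair_images_with_masks(root):
    """Return (image, mask) path pairs, matching X.png with X_mask.png."""
    root = Path(root)
    pairs = []
    for image in sorted((root / "PNGImages").glob("*.png")):
        mask = root / "PedMasks" / f"{image.stem}_mask.png"
        if not mask.is_file():
            raise FileNotFoundError(f"no mask found for {image.name}")
        pairs.append((image, mask))
    return pairs


# Demonstration on a tiny stand-in directory tree with empty files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "PennFudanPed"
    (root / "PNGImages").mkdir(parents=True)
    (root / "PedMasks").mkdir()
    for stem in ("FudanPed00001", "PennPed00001"):
        (root / "PNGImages" / f"{stem}.png").touch()
        (root / "PedMasks" / f"{stem}_mask.png").touch()
    pairs = pair_images_with_masks(root)
    pair_names = [(image.name, mask.name) for image, mask in pairs]

print(pair_names)
```

On the real dataset, `len(pair_images_with_masks('./PennFudanPed'))` would be expected to equal the 170 images mentioned in the introduction.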


Then, we upload the data files to the datastore associated with this workspace, so that we can access them during training.

In [9]:
# get the default datastore
ds = ws.get_default_datastore()
print(ds.name, ds.datastore_type, ds.account_name, ds.container_name)

ds.upload('./PennFudanPed', target_path='data', overwrite=False)

workspaceblobstore AzureBlob gopalvws3790775563 azureml-blobstore-e47496c6-9688-4277-a05b-ceb722514b9d
Uploading an estimated of 512 files
Target already exists. Skipping upload for data/added-object-list.txt
Target already exists. Skipping upload for data/readme.txt
Target already exists. Skipping upload for data/Annotation/FudanPed00001.txt
Target already exists. Skipping upload for data/Annotation/FudanPed00002.txt
Target already exists. Skipping upload for data/Annotation/FudanPed00003.txt
Target already exists. Skipping upload for data/Annotation/FudanPed00004.txt
Target already exists. Skipping upload for data/Annotation/FudanPed00005.txt
Target already exists. Skipping upload for data/Annotation/FudanPed00006.txt
Target already exists. Skipping upload for data/Annotation/FudanPed00007.txt
Target already exists. Skipping upload for data/Annotation/FudanPed00008.txt
Target already exists. Skipping upload for data/Annotation/FudanPed00009.txt
Target already exists. Skipping upload 

Target already exists. Skipping upload for data/Annotation/PennPed00038.txt
Target already exists. Skipping upload for data/Annotation/PennPed00039.txt
Target already exists. Skipping upload for data/Annotation/PennPed00040.txt
Target already exists. Skipping upload for data/Annotation/PennPed00041.txt
Target already exists. Skipping upload for data/Annotation/PennPed00042.txt
Target already exists. Skipping upload for data/Annotation/PennPed00043.txt
Target already exists. Skipping upload for data/Annotation/PennPed00044.txt
Target already exists. Skipping upload for data/Annotation/PennPed00045.txt
Target already exists. Skipping upload for data/Annotation/PennPed00046.txt
Target already exists. Skipping upload for data/Annotation/PennPed00047.txt
Target already exists. Skipping upload for data/Annotation/PennPed00048.txt
Target already exists. Skipping upload for data/Annotation/PennPed00049.txt
Target already exists. Skipping upload for data/Annotation/PennPed00050.txt
Target alrea

Target already exists. Skipping upload for data/PNGImages/FudanPed00052.png
Target already exists. Skipping upload for data/PNGImages/FudanPed00053.png
Target already exists. Skipping upload for data/PNGImages/FudanPed00054.png
Target already exists. Skipping upload for data/PNGImages/FudanPed00055.png
Target already exists. Skipping upload for data/PNGImages/FudanPed00056.png
Target already exists. Skipping upload for data/PNGImages/FudanPed00057.png
Target already exists. Skipping upload for data/PNGImages/FudanPed00058.png
Target already exists. Skipping upload for data/PNGImages/FudanPed00059.png
Target already exists. Skipping upload for data/PNGImages/FudanPed00060.png
Target already exists. Skipping upload for data/PNGImages/FudanPed00061.png
Target already exists. Skipping upload for data/PNGImages/FudanPed00062.png
Target already exists. Skipping upload for data/PNGImages/FudanPed00063.png
Target already exists. Skipping upload for data/PNGImages/FudanPed00064.png
Target alrea

Target already exists. Skipping upload for data/PNGImages/PennPed00087.png
Target already exists. Skipping upload for data/PNGImages/PennPed00088.png
Target already exists. Skipping upload for data/PNGImages/PennPed00089.png
Target already exists. Skipping upload for data/PNGImages/PennPed00090.png
Target already exists. Skipping upload for data/PNGImages/PennPed00091.png
Target already exists. Skipping upload for data/PNGImages/PennPed00092.png
Target already exists. Skipping upload for data/PNGImages/PennPed00093.png
Target already exists. Skipping upload for data/PNGImages/PennPed00094.png
Target already exists. Skipping upload for data/PNGImages/PennPed00095.png
Target already exists. Skipping upload for data/PNGImages/PennPed00096.png
Target already exists. Skipping upload for data/PedMasks/FudanPed00001_mask.png
Target already exists. Skipping upload for data/PedMasks/FudanPed00002_mask.png
Target already exists. Skipping upload for data/PedMasks/FudanPed00003_mask.png
Target alr

Target already exists. Skipping upload for data/PedMasks/PennPed00025_mask.png
Target already exists. Skipping upload for data/PedMasks/PennPed00026_mask.png
Target already exists. Skipping upload for data/PedMasks/PennPed00027_mask.png
Target already exists. Skipping upload for data/PedMasks/PennPed00028_mask.png
Target already exists. Skipping upload for data/PedMasks/PennPed00029_mask.png
Target already exists. Skipping upload for data/PedMasks/PennPed00030_mask.png
Target already exists. Skipping upload for data/PedMasks/PennPed00031_mask.png
Target already exists. Skipping upload for data/PedMasks/PennPed00032_mask.png
Target already exists. Skipping upload for data/PedMasks/PennPed00033_mask.png
Target already exists. Skipping upload for data/PedMasks/PennPed00034_mask.png
Target already exists. Skipping upload for data/PedMasks/PennPed00035_mask.png
Target already exists. Skipping upload for data/PedMasks/PennPed00036_mask.png
Target already exists. Skipping upload for data/PedM

$AZUREML_DATAREFERENCE_a3b34009c18b402781d977ec39f7096f
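
The upload log estimates 512 files. That is consistent with 170 images, 170 masks, and 170 annotation files plus the two top-level text files (170 × 3 + 2 = 512). A small sketch for confirming such a count locally before uploading follows. `count_files_by_folder` is a hypothetical helper, and the demonstration uses a miniature stand-in tree.

```python
import tempfile
from collections import Counter
from pathlib import Path


def count_files_by_folder(root):
    """Count files under root, keyed by top-level subfolder ('.' for root)."""
    root = Path(root)
    counts = Counter()
    for path in root.rglob("*"):
        if path.is_file():
            relative = path.relative_to(root)
            key = relative.parts[0] if len(relative.parts) > 1 else "."
            counts[key] += 1
    return counts


# Demonstration: two files per subfolder plus two top-level text files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "PennFudanPed"
    layout = [("Annotation", ".txt"), ("PNGImages", ".png"), ("PedMasks", "_mask.png")]
    for folder, suffix in layout:
        (root / folder).mkdir(parents=True)
        for index in (1, 2):
            (root / folder / f"FudanPed{index:05d}{suffix}").touch()
    (root / "readme.txt").touch()
    (root / "added-object-list.txt").touch()
    counts = count_files_by_folder(root)

print(dict(counts), sum(counts.values()))
```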

### Register a dataset


In [34]:
from azureml.core import Dataset

datastore_paths = [(ds, 'data')]
penn_ds = Dataset.File.from_files(path=datastore_paths)
penn_ds.register(workspace=ws,
                 name='penn_ds',
                 description='Penn Fudan pedestrian data')

{
  "source": [
    "('workspaceblobstore', 'data')"
  ],
  "definition": [
    "GetDatastoreFiles"
  ],
  "registration": {
    "name": "penn_ds",
    "version": 1,
    "description": "Penn Fudan pedestrian data",
    "workspace": "Workspace.create(name='gopalv-ws', subscription_id='15ae9cb6-95c1-483d-a0e3-b1a1a3b06324', resource_group='aifxdemo')"
  }
}

### Create an experiment

In [35]:
from azureml.core import Experiment

experiment_name = 'pytorch-peds'
experiment = Experiment(ws, name=experiment_name)

### Specify dependencies with a custom Dockerfile

There are a number of ways to [use environments](https://docs.microsoft.com/azure/machine-learning/how-to-use-environments) for specifying dependencies during model training. In this case, we use a custom Dockerfile.

In [36]:
from azureml.core import Environment

my_env = Environment(name='maskr-docker')
my_env.docker.enabled = True

# Build the image from a custom Dockerfile instead of a prebuilt base image
with open("dockerfiles/Dockerfile1", "r") as f:
    dockerfile_contents = f.read()
my_env.docker.base_dockerfile = dockerfile_contents
my_env.docker.base_image = None
my_env.docker.gpu_support = True

# Use the Python interpreter already installed in the image and
# leave dependency management to the Dockerfile
my_env.python.interpreter_path = '/opt/miniconda/bin/python'
my_env.python.user_managed_dependencies = True





### Create a ScriptRunConfig

Use the [ScriptRunConfig](https://docs.microsoft.com/python/api/azureml-core/azureml.core.scriptrunconfig?view=azure-ml-py) class to define your run. Specify the source directory, compute target, and environment.

In [37]:
from azureml.core import ScriptRunConfig

# This follows the pattern described at https://docs.microsoft.com/en-us/azure/machine-learning/how-to-use-environments#use-environments-for-training

# Add training script to run config
runconfig = ScriptRunConfig(source_directory=project_folder, script="script.py")

# Attach compute target to run config
runconfig.run_config.target = cluster_name

# Uncomment the line below if you want to try this locally first
#runconfig.run_config.target = "local"

# Attach environment to run config
runconfig.run_config.environment = my_env

### Submit your run

In [38]:
# Submit run 
run = experiment.submit(runconfig)

# to get more details of your run
print(run.get_details())

{'runId': 'pytorch-peds_1583446986_325bdc5c', 'target': 'gpu-cluster', 'status': 'Starting', 'properties': {'_azureml.ComputeTargetType': 'amlcompute', 'ContentSnapshotId': '622a759f-97d4-4e2f-a589-0abd1e4b79ef', 'azureml.git.repository_uri': 'git@github.com:gvashishtha/pytorch-object.git', 'mlflow.source.git.repoURL': 'git@github.com:gvashishtha/pytorch-object.git', 'azureml.git.branch': 'dataset-change', 'mlflow.source.git.branch': 'dataset-change', 'azureml.git.commit': '5b39a0f082ec9e5a099db28f6879900279d0386d', 'mlflow.source.git.commit': '5b39a0f082ec9e5a099db28f6879900279d0386d', 'azureml.git.dirty': 'False', 'AzureML.DerivedImageName': 'azureml/azureml_9125fd9b495cfdec8f7bf56c6d28d91d'}, 'inputDatasets': [], 'runDefinition': {'script': 'script.py', 'useAbsolutePath': False, 'arguments': [], 'sourceDirectoryDataStore': None, 'framework': 'Python', 'communicator': 'None', 'target': 'gpu-cluster', 'dataReferences': {}, 'data': {}, 'jobName': None, 'maxRunDurationSeconds': None, 'n

### Monitor your run

In [39]:
from azureml.widgets import RunDetails

RunDetails(run).show()
run.wait_for_completion(show_output=True)

_UserRunWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': True, 'log_level': 'INFO', 's…

RunId: pytorch-peds_1583446986_325bdc5c
Web View: https://mlworkspace.azure.ai/portal/subscriptions/15ae9cb6-95c1-483d-a0e3-b1a1a3b06324/resourceGroups/aifxdemo/providers/Microsoft.MachineLearningServices/workspaces/gopalv-ws/experiments/pytorch-peds/runs/pytorch-peds_1583446986_325bdc5c

Execution Summary
RunId: pytorch-peds_1583446986_325bdc5c
Web View: https://mlworkspace.azure.ai/portal/subscriptions/15ae9cb6-95c1-483d-a0e3-b1a1a3b06324/resourceGroups/aifxdemo/providers/Microsoft.MachineLearningServices/workspaces/gopalv-ws/experiments/pytorch-peds/runs/pytorch-peds_1583446986_325bdc5c


ActivityFailedException: ActivityFailedException:
	Message: Activity Failed:
{
    "error": {
        "code": "ServiceError",
        "message": "AzureMLCompute job failed.\nJobPreparationError: failed to prepare an environment for the job execution\n\tInfo: Job environment preparation failed on 10.0.0.4.",
        "details": []
    },
    "correlation": {
        "operation": null,
        "request": "6e32ed02991646ab"
    },
    "environment": "westus2",
    "location": "westus2",
    "time": "2020-03-05T22:30:51.379442Z"
}
	InnerException None
	ErrorResponse 
{
    "error": {
        "message": "Activity Failed:\n{\n    \"error\": {\n        \"code\": \"ServiceError\",\n        \"message\": \"AzureMLCompute job failed.\\nJobPreparationError: failed to prepare an environment for the job execution\\n\\tInfo: Job environment preparation failed on 10.0.0.4.\",\n        \"details\": []\n    },\n    \"correlation\": {\n        \"operation\": null,\n        \"request\": \"6e32ed02991646ab\"\n    },\n    \"environment\": \"westus2\",\n    \"location\": \"westus2\",\n    \"time\": \"2020-03-05T22:30:51.379442Z\"\n}"
    }
}
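
The useful part of this exception is buried in a JSON payload after the `Activity Failed:` prefix. Here is a minimal sketch for pulling out the error code and message lines, using text abridged from the output above. `parse_activity_error` is a hypothetical helper, not an SDK function.

```python
import json


def parse_activity_error(message):
    """Extract the JSON payload that follows the 'Activity Failed:' prefix."""
    start = message.index("{")
    payload = json.loads(message[start:])
    error = payload["error"]
    return error["code"], error["message"].splitlines()


# Message text abridged from the output above.
message = (
    "Activity Failed:\n"
    '{"error": {"code": "ServiceError", '
    '"message": "AzureMLCompute job failed.\\nJobPreparationError: failed to '
    'prepare an environment for the job execution", "details": []}}'
)

code, lines = parse_activity_error(message)
print(code)
for line in lines:
    print(" ", line)
```

Here the `JobPreparationError` line suggests that the failure happened while the job environment was being prepared, before `script.py` started, so the custom Dockerfile and the image build logs are a reasonable first place to look.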

### Get your latest run and register your model

In [None]:
from azureml.core import Run

# Replace the run ID below with the ID of your own completed run
run = Run(run_id='pytorch-peds_1583262435_ac3ea423', experiment=experiment)
model = run.register_model(model_name='pytorch_peds', model_path='outputs/model.pt')
model


### Download your model

In [None]:
import torch
path = model.download(target_dir='.', exist_ok=True)
path

# model = torch.load(path)
#torch.load(model.get_model_path(model_name='outputs/model.pt'))

### Test model inferencing

In [None]:
import torch
from azureml.core import Dataset
from data import PennFudanDataset
from script import get_transform

if torch.cuda.is_available():
    device = torch.device('cuda')
else:
    device = torch.device('cpu')

model = torch.load('./model.pt', map_location=device)

penn_ds = Dataset.get_by_name(workspace=ws, name='penn_ds')
dataset_test = PennFudanDataset(penn_ds, get_transform(train=False))



# pick one image from the test set
img, _ = dataset_test[0]
# put the model in evaluation mode
model.eval()
with torch.no_grad():
    prediction = model([img.to(device)])
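
In evaluation mode, torchvision's detection models return one dictionary per input image, with `boxes`, `labels`, `scores`, and, for Mask R-CNN, `masks`. The sketch below filters detections by confidence. It uses plain lists with made-up values as a stand-in so that it runs without a trained model. `filter_detections` and the 0.5 threshold are illustrative assumptions.

```python
def filter_detections(prediction, score_threshold=0.5):
    """Keep the boxes, labels, and scores whose score meets the threshold."""
    keep = [
        index
        for index, score in enumerate(prediction["scores"])
        if float(score) >= score_threshold
    ]
    return {
        key: [prediction[key][index] for index in keep]
        for key in ("boxes", "labels", "scores")
    }


# Made-up stand-in for one image's prediction (boxes are x1, y1, x2, y2).
example_prediction = {
    "boxes": [
        [10.0, 20.0, 110.0, 320.0],
        [200.0, 40.0, 260.0, 300.0],
        [5.0, 5.0, 15.0, 25.0],
    ],
    "labels": [1, 1, 1],
    "scores": [0.98, 0.61, 0.12],
}

confident = filter_detections(example_prediction, score_threshold=0.5)
print(len(confident["boxes"]), confident["scores"])
```

With the real model output, the call would be `filter_detections(prediction[0])`. On tensors, boolean-mask indexing such as `boxes[scores >= 0.5]` is likely the more idiomatic choice.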