
# Train a model using a custom Docker image and Darknet

In this tutorial, you learn how to use a custom Docker image to train models with Azure Machine Learning and the Darknet framework.

## Prerequisites

1. Install Python 3 in your development environment (e.g. a local machine or DSVM). On the command line, run `pip install -r requirements_local.txt` to install the necessary packages in a Python environment (virtual or conda environment).



In [None]:
from azureml.core import Workspace
from azureml.core import Dataset
from azureml.core import Environment
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.core import ScriptRunConfig
from azureml.core import Experiment
from azureml.core.conda_dependencies import CondaDependencies

import os
from uuid import uuid4

## Initialize a workspace
The Azure Machine Learning workspace is the top-level resource for the service. It provides you with a centralized place to work with all the artifacts you create. In the Python SDK, you can access the workspace artifacts by creating a `workspace` object.

Create a workspace object from the `config.json` file.

In [None]:
ws = Workspace.from_config()

## Upload dataset to default Data Store

The default datastore for this workspace is backed by Blob Storage.

Upload the data folder to it. The `data` folder is structured as:
```
    data/
        img/
            image1.jpg
            image1.txt
            image2.jpg
            image2.txt
            ...
        train.txt
        valid.txt
        obj.data
        obj.names
```
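
The listing above does not show the contents of `obj.data` and `obj.names`. As a minimal sketch, assuming the same key/value layout as the `coco.data` file written by the test script later in this tutorial, the two files could be generated as follows. The helper name `write_obj_files`, the class names, and the `outputs/` backup folder are illustrative assumptions, not values taken from this tutorial's dataset.

```python
import os
import tempfile


def write_obj_files(data_dir, class_names, backup_dir="outputs/"):
    """Write obj.names and obj.data into data_dir (hypothetical helper)."""
    os.makedirs(data_dir, exist_ok=True)

    # obj.names: one class name per line, in class-index order
    with open(os.path.join(data_dir, "obj.names"), "w") as f:
        f.write("\n".join(class_names) + "\n")

    # obj.data: key = value pairs; paths assumed relative to the working directory
    obj_data = "\n".join([
        "classes = {}".format(len(class_names)),
        "train = data/train.txt",
        "valid = data/valid.txt",
        "names = data/obj.names",
        "backup = {}".format(backup_dir),  # assumed location for saved weights
    ]) + "\n"
    with open(os.path.join(data_dir, "obj.data"), "w") as f:
        f.write(obj_data)
    return obj_data


# Example with three made-up class names, written to a scratch folder
demo_dir = os.path.join(tempfile.mkdtemp(), "data")
print(write_obj_files(demo_dir, ["apple", "banana", "orange"]))
```

Adjust the paths if your `train.txt` and `valid.txt` live elsewhere relative to the directory from which `darknet` is run.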

Note that `train.txt` looks similar to the following snippet (image paths start from the `data` root), and `valid.txt` follows the same pattern:
```
data/img/image1.jpg
data/img/image2.jpg
...
```

Note that it is recommended to put 5-10% of all images into the `valid.txt` image list. The two lists should not overlap.
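
One way to produce such a split is sketched below. The helper name `split_image_list`, the 10% fraction, and the example image paths are assumptions for illustration; any method that yields two disjoint lists works equally well.

```python
import random


def split_image_list(image_paths, valid_fraction=0.1, seed=0):
    """Return (train, valid) lists with no overlap (hypothetical helper)."""
    paths = sorted(image_paths)
    # Shuffle with a fixed seed so the split is reproducible
    random.Random(seed).shuffle(paths)
    n_valid = max(1, int(round(len(paths) * valid_fraction)))
    # The first n_valid shuffled paths form the validation list, the rest train
    return paths[n_valid:], paths[:n_valid]


# 100 made-up image paths in the layout shown above
images = ["data/img/image{}.jpg".format(i) for i in range(1, 101)]
train_list, valid_list = split_image_list(images, valid_fraction=0.1)

print(len(train_list), len(valid_list))
# Each list could then be written out with one path per line, for example:
# with open("data/train.txt", "w") as f:
#     f.write("\n".join(train_list) + "\n")
```

Because both lists are slices of the same shuffled sequence, every image lands in exactly one of them.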

In [None]:
datastore = ws.get_default_datastore()
datastore.upload(src_dir='./data_products',
                 target_path='data',
                 overwrite=True)

Create an Azure ML Dataset.  A Dataset can reference single or multiple files in your datastores or public urls. The files can be of any format. Dataset provides you with the ability to download or mount the files to your compute. By creating a dataset, you create a reference to the data source location. The data remains in its existing location, so no extra storage cost is incurred.

In [None]:
# initialize file dataset 
ds_paths = [(datastore, 'data/')]
dataset = Dataset.File.from_files(path=ds_paths)

## Prepare scripts
Create a directory titled `darknet_scripts` for the training script and any testing scripts.

In [None]:
os.makedirs('darknet_scripts', exist_ok=True)

### Setup test script

Then run the cell below to create a script in that directory that tests the setup.

In [None]:
%%writefile darknet_scripts/test_setup.py
"""
Azure ML test setup script for Darknet object detection experiment
"""
import os
import requests
import subprocess
import shutil
import argparse


# Test greeting
def greeting():
    print("Welcome to darknet container!")
greeting()

os.makedirs('./outputs', exist_ok=True)

# Arguments
parser = argparse.ArgumentParser()
parser.add_argument('--data-folder', type=str,
                    dest='data_folder', help='data folder')
args = parser.parse_args()

# Look at data folder
print('Data folder is at:', args.data_folder)
print('List all files: ', os.listdir(args.data_folder))

fulldatapath = os.path.join(args.data_folder, "data")

print("Contents of data folder: ")
os.system("ls data")

print("Getting the test image...")
# Get a test image
url = "https://raw.githubusercontent.com/AlexeyAB/darknet/master/data/giraffe.jpg"
response = requests.get(url)
if response.status_code == 200:
    with open("giraffe.jpg", "wb") as f:
        f.write(response.content)

# Make coco.data file
coco_data = """
classes= 80
train  = train.txt
valid  = val.txt
names = coco.names
backup = backup/
eval=coco
"""
print("Making coco.data")
with open("coco.data", "w") as f:
    f.write(coco_data)

print("Getting coco.names...")
# Get the names file
url = "https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/coco.names"
response = requests.get(url)
if response.status_code == 200:
    with open("coco.names", "wb") as f:
        f.write(response.content)

print("Getting the weights file, yolov4.weights...")
# Get the weights file
url = "https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v3_optimal/yolov4.weights"
response = requests.get(url)
if response.status_code == 200:
    with open("yolov4.weights", "wb") as f:
        f.write(response.content)

print("Getting the config file...")
# Get the config file
url = "https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov4.cfg"
response = requests.get(url)
if response.status_code == 200:
    with open("yolov4.cfg", "wb") as f:
        f.write(response.content)

# What is our current working directory?
print("Current working directory: {}".format(os.getcwd()))
print("Contents of directory: ")
os.system("ls")

# Predict with darknet
print("Running darknet detector test!")
# Keep the exit status so it can be reported at the end of the script
retval = os.system("darknet detector test coco.data yolov4.cfg yolov4.weights -thresh 0.25 giraffe.jpg -ext_output")
os.system("ls")

if os.path.exists("predictions.jpg"):
    shutil.copyfile("predictions.jpg", "outputs/predictions.jpg")
if os.path.exists("predictions.png"):
    shutil.copyfile("predictions.png", "outputs/predictions.png")


print("Return value: {}".format(retval))

### Train script

This is the training script. Apart from `num_classes` and `anchors`, which you fill in for your own dataset, it need not be modified, because the data folder and hyperparameters are passed in as arguments. Feel free to improve it, however.
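
The script below customizes the stock `yolov4-tiny.cfg` by string replacement. The arithmetic behind those replacements can be sketched in isolation. The helper name `custom_cfg_values` and the two-line config snippet are hypothetical; the formulas mirror the ones used in the script (filters = (classes + 5) * 3, learning-rate steps at 80% and 90% of `max_batches`).

```python
def custom_cfg_values(num_classes, max_batches):
    """Compute the values substituted into the cfg (hypothetical helper)."""
    return {
        # 3 anchors per yolo layer, each with 4 box coords + 1 objectness + class scores
        "filters": (num_classes + 5) * 3,
        "classes": num_classes,
        "max_batches": max_batches,
        "steps": "{},{}".format(int(0.8 * max_batches), int(0.9 * max_batches)),
    }


values = custom_cfg_values(num_classes=3, max_batches=3000)
print(values)

# Apply two of the replacements to a tiny stand-in for the real config file
snippet = "filters=255\nclasses=80\n"
patched = snippet.replace("filters=255", "filters={}".format(values["filters"]))
patched = patched.replace("classes=80", "classes={}".format(values["classes"]))
print(patched)
```

Note that plain string replacement silently does nothing if the upstream config file changes its default values or spacing, so it is worth inspecting the generated `yolov4-tiny-custom.cfg` in the run outputs.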

In [None]:
%%writefile darknet_scripts/train.py
"""
Azure ML train script for Darknet object detection experiment
"""
import os
import requests
import subprocess
import shutil
import argparse

from azureml.core.run import Run
from azureml.core.model import Model


# Azure ML run to log metrics etc.
run = Run.get_context()

# Fill in number of classes
num_classes = 3

# Fill in with a list of anchor boxes
anchors = "42,98, 53,120, 58,137, 76,181, 109,162, 131,261"

# Test greeting
def greeting():
    print("Welcome to darknet container!")
greeting()

os.makedirs('./outputs', exist_ok=True)

# Arguments
parser = argparse.ArgumentParser()
parser.add_argument('--data-folder', type=str,
                    dest='data_folder', help='data folder')
parser.add_argument('--lr', type=float, default=0.001,
                    dest='lr', help='learning rate')
parser.add_argument('--bs', type=int, default=4,
                    dest='bs', help='minibatch size')
parser.add_argument('--epochs', type=int, default=4,
                    dest='epochs', help='number of epochs')
args = parser.parse_args()

# ========================== Get data ==========================

# Look at data folder
print('Data folder is at:', args.data_folder)
print('List all files: ', os.listdir(args.data_folder))

fulldatapath = args.data_folder

# ========================== Create or download necessary files ==========================
    
# Get the pre-trained weights file
print("Getting the pre-trained weights file, yolov4-tiny.conv.29...")
url = "https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v4_pre/yolov4-tiny.conv.29"
response = requests.get(url)
if response.status_code == 200:
    with open("yolov4-tiny.conv.29", "wb") as f:
        f.write(response.content)

print("Creating the config file...")
# Get the config file
url = "https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov4-tiny.cfg"
response = requests.get(url)
if response.status_code == 200:
    with open("yolov4-tiny.cfg", "wb") as f:
        f.write(response.content)
with open("yolov4-tiny.cfg", "r") as f:
    config_content = f.read()

# Replace LR
config_content = config_content.replace("learning_rate=0.00261", "learning_rate={}".format(args.lr))
# Replace number of filters in CN layer before yolo layer
num_filters = (num_classes+5)*3
config_content = config_content.replace("filters=255", "filters={}".format(num_filters))
# Replace number of classes
config_content = config_content.replace("classes=80", "classes={}".format(num_classes))
# Replace batch size
config_content = config_content.replace("batch=64", "batch={}".format(args.bs))
# Replace max batches/epochs and learning rate stepping epochs
config_content = config_content.replace("max_batches = 2000200", "max_batches={}".format(args.epochs))
config_content = config_content.replace("steps=1600000,1800000", 
                                        "steps={},{}".format(int(0.8*args.epochs), 
                                                             int(0.9*args.epochs)))
# Replace anchors
config_content = config_content.replace("10,14,  23,27,  37,58,  81,82,  135,169,  344,319",
                                       anchors)

with open("yolov4-tiny-custom.cfg", "w") as f:
    f.write(config_content)
if os.path.exists("yolov4-tiny-custom.cfg"):
    shutil.copyfile("yolov4-tiny-custom.cfg", "./outputs/yolov4-tiny-custom.cfg")

# What is our current working directory?
print("Current working directory: {}".format(os.getcwd()))
print("Contents of directory: ")
os.system("ls")

# ========================== Train model ==========================

# Train with darknet
print("Running darknet training experiment for {} epochs!".format(args.epochs))

result = subprocess.run(['darknet', 
                         'detector',
                         'train',
                         'data/obj.data',
                         'yolov4-tiny-custom.cfg', 
                         'yolov4-tiny.conv.29',
                         '-map',
                         '-dont_show',
                         '-clear'], 
                        stdout=subprocess.PIPE).stdout.decode('utf-8')

os.system("ls")

mAP = ""
# Capture mAP of final model (parsed from darknet's captured stdout)
result = result.split("\n")
for line in result:
    # This is what darknet outputs at end
    if "mean average precision (mAP@0.50)" in line:
        mAP = float(line[-8:].replace(" % ", ""))
        # Log to Azure ML workspace
        run.log('mAP0.5_all', mAP)

        
# ========================== Register model - TBD ==========================

# # Get class names as string
# with open("./data/obj.names", "r") as f:
#     class_names = f.read().replace("\n", "_").replace(" ", "").strip()

# model = run.register_model(model_name='darknet-yolov4-tiny',
#                            tags={"mAP0.5_all": mAP,
#                                  "classes": class_names,
#                                  "learning_rate": args.lr,
#                                  "batch_size": args.bs,
#                                  "format": "darknet"},
#                            model_path="./outputs/yolov4-tiny-custom_final.weights")


# ========================== Small test - optional ==========================

with open("data/valid.txt", "r") as f:
    validtxt = f.readlines()
# Pick first image
testimg = validtxt[0].strip()

# Predict with darknet
print("Running darknet detector test!")
os.system("darknet detector test data/obj.data yolov4-tiny-custom.cfg ./outputs/yolov4-tiny-custom_final.weights -thresh 0.25 {} -ext_output".format(testimg))
os.system("ls")

if os.path.exists("predictions.jpg"):
    shutil.copyfile("predictions.jpg", "./outputs/predictions.jpg")

# ========================== Convert model to onnx - TBD ==========================
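
The training script extracts the mAP by slicing the last eight characters of the matching line, which depends on the exact spacing of Darknet's output. A more tolerant alternative, sketched here, uses a regular expression. The sample log line is an assumption about the output format, modeled on the string the script searches for, and `parse_map_percent` is a hypothetical helper.

```python
import re

# Match the mAP line and capture the number that directly precedes the percent sign
MAP_PATTERN = re.compile(
    r"mean average precision \(mAP@0\.50\).*?([0-9]+(?:\.[0-9]+)?)\s*%"
)


def parse_map_percent(log_text):
    """Return the last mAP@0.50 percentage found in log_text, or None."""
    matches = MAP_PATTERN.findall(log_text)
    return float(matches[-1]) if matches else None


# Assumed shape of the relevant darknet output line
sample_log = (
    "some earlier output\n"
    " mean average precision (mAP@0.50) = 0.873412, or 87.34 % \n"
)
print(parse_map_percent(sample_log))
```

Returning `None` when no line matches makes it easy to skip `run.log` instead of logging an empty value.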



## Define your environment
Create an environment object and enable Docker.

In [None]:
darknet_env = Environment("darknet")

It is also possible to use a custom Dockerfile. Use this approach if you need to install non-Python packages as dependencies and remember to set the base image to None. 

This specified base image supports the darknet framework which allows for object detection deep learning capabilities. For more information, see the [darknet GitHub repo](https://github.com/AlexeyAB/darknet). 

When you use a custom Docker image, you might already have your Python environment properly set up. In that case, set the `user_managed_dependencies` flag to `True` to use the image's built-in Python environment.

In [None]:
darknet_env.docker.base_image = None
darknet_env.docker.base_dockerfile = "./Dockerfile"
darknet_env.python.user_managed_dependencies = True

## Create or attach existing AmlCompute
You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you create `AmlCompute` as your training compute resource.

**Creation of AmlCompute takes approximately 5 minutes.** If an AmlCompute cluster with that name already exists in your workspace, this code skips the creation process.

As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota.

In [None]:
# choose a name for your cluster
cluster_name = "gpu-cluster"

try:
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing compute target.')
except ComputeTargetException:
    print('Creating a new compute target...')
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',
                                                           max_nodes=4)

    # create the cluster
    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)

    compute_target.wait_for_completion(show_output=True)

# use get_status() to get a detailed status for the current AmlCompute
print(compute_target.get_status().serialize())

## Create a ScriptRunConfig and submit for training
The ScriptRunConfig configures your job for execution on the desired compute target. Here we loop over a grid of hyperparameters and submit one run per combination. Note that concurrency is limited by the number of nodes in the compute target.

When a training run is submitted using a ScriptRunConfig object, the submit method returns an object of type ScriptRun. The returned ScriptRun object gives you programmatic access to information about the training run. 
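
The nested loops in the cell below enumerate every combination of learning rate and batch size. As a sketch of the same enumeration without the Azure ML SDK, `itertools.product` can build the argument lists up front. The helper name `build_arg_lists` is hypothetical, and the string `"<mounted-dataset>"` is a placeholder for the mounted dataset object that the real cell passes.

```python
from itertools import product

hyperparams = {"learning_rate": [0.0005, 0.001],
               "batch_size": [4, 6]}
epochs = 3000


def build_arg_lists(grid, epochs, data_arg="<mounted-dataset>"):
    """Return one script-argument list per hyperparameter combination."""
    arg_lists = []
    for lr, bs in product(grid["learning_rate"], grid["batch_size"]):
        arg_lists.append(["--data-folder", data_arg,
                          "--lr", lr,
                          "--bs", bs,
                          "--epochs", epochs])
    return arg_lists


arg_lists = build_arg_lists(hyperparams, epochs)
print(len(arg_lists))
for args in arg_lists:
    print(args)
```

With two learning rates and two batch sizes this yields four runs, which matches the `max_nodes=4` setting of the cluster created above, so all four can run concurrently once the nodes are provisioned.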

In [None]:
hyperparams = {"learning_rate": [0.0005, 0.001],
               "batch_size": [4, 6]}
epochs = 3000

# Iterate over hyperparameters
for lr in hyperparams["learning_rate"]:
    for bs in hyperparams["batch_size"]:
    
        script_args = ['--data-folder', 
                       dataset.as_named_input('data').as_mount('data'),
                       '--lr', lr,
                       '--bs', bs,
                       '--epochs', epochs]

        darknet_config = ScriptRunConfig(source_directory='darknet_scripts',
                                        script='train.py',
                                        arguments=script_args,
                                        compute_target=compute_target,
                                        environment=darknet_env)

        run = Experiment(ws,'darknet-custom-image-hyper').submit(darknet_config)