In [None]:
# Imports
import numpy as np

from determined.experimental import Determined
from models import ObjectDetectionModel
from predict import predict
from utils import check_model

# remove warnings
import warnings
warnings.filterwarnings('ignore')

# Set up .detignore file so the checkpoints directory is not packaged into future experiments
!echo checkpoints > .detignore

<img src="https://raw.githubusercontent.com/determined-ai/determined/master/determined-logo.png" align='right' width=150 />

# Building a Geospatial Detection Model with Determined

<img src="https://www.cis.upenn.edu/~jshi/ped_html/images/PennPed00071_1.png" width=400 />


This notebook walks through the benefits of building a deep learning model with Determined. We will build an object detection model trained on the [Penn-Fudan Database for Pedestrian Detection and Segmentation](https://www.cis.upenn.edu/~jshi/ped_html/).


# Table of Contents


<font size="3">
<ol>
  <li>What modeling looks like without Determined</li>
  <li>Building a model with Determined
    <ol>
      <li>Single GPU training</li>
      <li>Cluster-scale multi-GPU training</li>
      <li>Adaptive hyperparameter search</li>
    </ol>
  </li>
</ol>
</font>

# What modeling looks like without Determined

First, let's look at the kind of work modelers do today. Below, we train a model that we found on GitHub and modified, printing validation set metrics after each epoch.

In [None]:
from models import ObjectDetectionModel

NUM_EPOCHS = 10

model = ObjectDetectionModel({'lr': 0.00045, 'm': 0.72})

try:
    for epoch in range(NUM_EPOCHS):
        print(f"Training epoch {epoch + 1} of {NUM_EPOCHS}")
        model.train_one_epoch()
        iou = model.eval()
        print(f"Validation set average IoU: {iou}\n")
except KeyboardInterrupt:
    pass
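
The loop above reports an average IoU (intersection over union) on the validation set. The source does not show how `model.eval()` computes it, so the following is only a minimal sketch of the standard box IoU definition, assuming axis-aligned boxes in `(x_min, y_min, x_max, y_max)` form. The helper name `box_iou` is hypothetical and is not part of `models`.

```python
def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap; clamp to zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    # Union = area(A) + area(B) - overlap counted twice.
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


# A predicted box that partially overlaps a ground-truth box.
print(box_iou((0, 0, 10, 10), (5, 5, 15, 15)))
```

An IoU of 1.0 means the predicted box matches the ground truth exactly, and 0.0 means the two boxes do not overlap at all.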

We might also roll our own simple hyperparameter tuning:

In [None]:
import numpy as np

from models import ObjectDetectionModel

def hp_grid_search():
    for lr in np.logspace(-4, -2, num=10):
        for m in np.linspace(0.7, 0.95, num=10):
            print(f"Training model with learning rate {lr} and momentum {m}")
            model = ObjectDetectionModel({'lr': lr, 'm': m})
            model.train_one_epoch()
            iou = model.eval()
            print(f"Validation set average IoU: {iou}\n")

try:
    hp_grid_search()
except KeyboardInterrupt:
    pass
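
To get a feel for the cost of the grid search above, the sketch below enumerates the same grid without training anything. It reuses the `np.logspace` and `np.linspace` calls from `hp_grid_search`; the variable names are only illustrative.

```python
import itertools

import numpy as np

# Same ranges as hp_grid_search above.
learning_rates = np.logspace(-4, -2, num=10)
momenta = np.linspace(0.7, 0.95, num=10)

# Every (lr, m) pair the nested loops would visit, in the same order.
grid = list(itertools.product(learning_rates, momenta))

print(f"{len(grid)} configurations, trained one after another")
print(f"learning rates from {learning_rates[0]:.1e} to {learning_rates[-1]:.1e}")
print(f"momenta from {momenta[0]:.2f} to {momenta[-1]:.2f}")
```

Because the loops run sequentially on one machine, total wall-clock time grows with the product of the two grid sizes.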

# What's Missing?

<font size="4">This approach works in theory -- we could get a good model, save it, and use it for predictions.  But we're missing a lot from the ideal state:</font>
<font size="4">
<ul style="margin-top: 15px">
  <li style="margin-bottom: 10px">Distributed training</li>
  <li style="margin-bottom: 10px">Parallel search</li>
  <li style="margin-bottom: 10px">Intelligent checkpointing</li>
  <li style="margin-bottom: 10px">Interruptibility and fault tolerance</li>
  <li>Logging of experiment configurations and results</li>
</ul>
</font>

<font size=6><b>Scaled Experimentation with Determined</b></font>

With less work than setting up a limited random search, you can get started with Determined.

## Our First Experiment

For our first example, we run a simple single-GPU training job with fixed hyperparameters.

<img src="https://raw.githubusercontent.com/determined-ai/public_assets/main/images/StartAnExperiment.png" align=left width=330/>

In [None]:
!det e create const.yaml .

And evaluate its performance:

In [None]:
experiment_id = None  # TODO: replace None with your experiment ID

In [None]:
checkpoint = Determined().get_experiment(experiment_id).top_checkpoint()
model = checkpoint.load().model

In [None]:
predict(model, 'test.jpg', 0.5)

## Scaling up to Distributed Training

Determined makes it trivial to move from single-GPU to multi-GPU (and even multi-node) training. Here we'll simply modify the config above to request 8 GPUs instead of 1, and increase the global batch size to increase the data throughput.

In [None]:
!cat distributed.yaml
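
The relationship between the GPU count and the global batch size can be sketched in a few lines. In a Determined experiment config, the number of GPUs is requested under `resources.slots_per_trial`, and the global batch size is split evenly across those slots. The numbers below are hypothetical placeholders, not the values in `const.yaml` or `distributed.yaml`.

```python
# Hypothetical values for illustration only; the real ones live in the YAML files.
single_gpu = {"slots_per_trial": 1, "global_batch_size": 2}
distributed = {"slots_per_trial": 8, "global_batch_size": 16}


def per_slot_batch_size(config):
    """Batch size each GPU sees when the global batch is split evenly."""
    return config["global_batch_size"] // config["slots_per_trial"]


for name, config in [("single GPU", single_gpu), ("distributed", distributed)]:
    print(
        f"{name}: {config['slots_per_trial']} slot(s), "
        f"global batch {config['global_batch_size']}, "
        f"per-slot batch {per_slot_batch_size(config)}"
    )
```

With these placeholder numbers, scaling the global batch size in step with the slot count keeps the per-GPU batch unchanged while each training step processes eight times as much data.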

In [None]:
!det experiment create distributed.yaml .

<img src="https://raw.githubusercontent.com/determined-ai/public_assets/main/images/4GPUexperiment.png" align=left width=530 />

## Run Distributed Hyperparameter Tuning

By building a config file and adapting our code to the Determined trial interface, we can conduct a sophisticated hyperparameter search. Instructions for configuring different types of experiments [can be found in the Determined documentation](https://docs.determined.ai/latest/how-to/index.html).

In [None]:
!cat search.yaml
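
A search config usually describes ranges for each hyperparameter rather than a fixed grid, and the searcher draws candidate configurations from those ranges. The contents of `search.yaml` are not reproduced here, so the sketch below simply assumes the same ranges as the manual grid search earlier in this notebook (learning rate on a log scale, momentum on a linear scale). The function `sample_config` is hypothetical and only illustrates the sampling idea, not Determined's searcher.

```python
import numpy as np


def sample_config(rng):
    """Draw one candidate: log-uniform learning rate, uniform momentum."""
    # Sample the exponent uniformly so each decade of lr is equally likely.
    lr = 10 ** rng.uniform(-4, -2)
    m = rng.uniform(0.7, 0.95)
    return {"lr": float(lr), "m": float(m)}


rng = np.random.default_rng(seed=0)
candidates = [sample_config(rng) for _ in range(5)]

for candidate in candidates:
    print(f"lr={candidate['lr']:.2e}  m={candidate['m']:.3f}")
```

An adaptive search goes further than this by stopping poorly performing candidates early, so more of the compute budget goes to promising configurations.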

## Create your Experiment

Now that you've described your experiment, you simply need to use the command line interface to submit it to the Determined cluster.

In [None]:
!det experiment create search.yaml .

<img src="https://raw.githubusercontent.com/determined-ai/public_assets/main/images/12GPUexperiment.png" align=left width=800 />

# Model Registry

After training, we'll want to actually use our model in some sort of system.  Determined provides a model registry to version your trained models, making them easy to retrieve for inference.

In [None]:
experiment_id = None  # TODO: replace None with your experiment ID
MODEL_NAME = "pedestrian-detection"

In [None]:
# Get the best checkpoint from the training
checkpoint = Determined().get_experiment(experiment_id).top_checkpoint()

In [None]:
model = check_model(MODEL_NAME)

In [None]:
model.register_version(checkpoint.uuid)
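
The `check_model` helper comes from this project's `utils` module and is not shown in the notebook. A plausible reading is that it returns the registry entry for a name and creates it if it does not exist yet. The sketch below illustrates that get-or-create pattern under that assumption. The function `get_or_create_model` and the in-memory `FakeRegistryClient` are hypothetical; with a real cluster, the `client` argument would be a `Determined()` instance, which provides `get_model` and `create_model`.

```python
def get_or_create_model(client, name):
    """Return the registry entry for `name`, creating it on first use."""
    try:
        return client.get_model(name)
    except Exception:
        # Assumption: a missing model surfaces as an exception. The exact
        # exception type depends on the client, so a broad catch is used here.
        return client.create_model(name)


class FakeRegistryClient:
    """In-memory stand-in so the sketch runs without a cluster."""

    def __init__(self):
        self._models = {}

    def get_model(self, name):
        if name not in self._models:
            raise KeyError(name)
        return self._models[name]

    def create_model(self, name):
        self._models[name] = {"name": name, "versions": []}
        return self._models[name]


client = FakeRegistryClient()
first = get_or_create_model(client, "pedestrian-detection")   # creates the entry
second = get_or_create_model(client, "pedestrian-detection")  # reuses it
print(first is second)
```

Passing the client in as an argument keeps the helper easy to exercise without a running cluster.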

# Inference

Once your model is versioned in the model registry, using that model for inference is straightforward:

In [None]:
# Retrieve the latest registered version of the model
latest_version = model.get_version()

In [None]:
# Load the model checkpoint into memory
inference_model = latest_version.checkpoint.load().model

In [None]:
# Run inference as before
predict(inference_model, 'test.jpg')