## 1) Dependencies and SageMaker environment

Prerequisites:

Make sure you have an AWS IAM role that can run SageMaker jobs and has read/write access to the S3 bucket that contains:

  - the trained YOLO model and the inputs generated or used by the training job
  - the results stored under the SageMaker training job name/ID
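
To make the S3 part of these requirements concrete, here is a minimal sketch of a policy document granting read/write access to a single bucket. The helper name `s3_read_write_policy` and the bucket name are hypothetical, and this covers only S3: a role that runs SageMaker jobs typically needs further permissions (for example to pull the container image), which are not shown here.

```python
import json


def s3_read_write_policy(bucket_name):
    """Return an IAM policy document (as a dict) for read/write access to one bucket.

    Hypothetical helper for illustration; adapt the actions to your own setup.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket_name}"],
            },
            {
                # Object-level read/write applies to the keys inside the bucket
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": [f"arn:aws:s3:::{bucket_name}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # "my-example-bucket" is a placeholder, not a bucket from this project
    print(json.dumps(s3_read_write_policy("my-example-bucket"), indent=2))
```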

In [None]:
import cv2
import sagemaker
import numpy as np
import matplotlib.pyplot as plt
from sagemaker import get_execution_role
from sagemaker.utils import name_from_base
from sagemaker.pytorch import PyTorchModel

sagemaker_session = sagemaker.Session()

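# IAM execution role used by SageMaker: replace the placeholder with your own role name.
# On a SageMaker notebook instance, get_execution_role() returns the attached role instead.
role = 'AmazonSageMaker-ExecutionRole-<YOUR_IAM_EXECUTION_ROLE_ID>'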

# you can specify a bucket name here, we're using the default bucket of SageMaker
bucket = sagemaker_session.default_bucket()

latest_training_jobname = '<YOUR LATEST TRAINING JOB NAME/ID>'

# Path of the trained model artefact (tar.gz archive made by the SageMaker estimator from the Yolo output "weights" folder)
# This archive should contain the two trained models from Yolo: last.pt and best.pt
model_artefact = f's3://{bucket}/visualsearch/training-inputs/results/{latest_training_jobname}/output/model.tar.gz'
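
Since the archive is expected to hold both `last.pt` and `best.pt`, it can be worth checking a downloaded copy before deploying. The sketch below is one possible check, not part of the original notebook: the helper name `missing_weights` is hypothetical, and it assumes the archive has already been copied locally (for example with `aws s3 cp`).

```python
import tarfile


def missing_weights(archive_path, expected=("best.pt", "last.pt")):
    """Return the expected weight files that are absent from a model.tar.gz archive."""
    with tarfile.open(archive_path, "r:gz") as archive:
        # Compare on base names so the check works whatever folder layout the archive uses
        present = {name.rsplit("/", 1)[-1] for name in archive.getnames()}
    return [name for name in expected if name not in present]
```

An empty list means both files were found; anything else names the weights that are missing.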


## 2) Build the YOLOv5 runtime container

First, we build our custom YOLOv5 Docker image by running this command from a bash terminal:

    AWS_PROFILE=your-aws-profile-name ./build-and-push.sh visualsearch-yolov5l-runtime

Then we retrieve the full name of the Docker image, which the endpoint will use:

In [None]:
import os

# Read the full image name from the first line of the file, without the trailing newline
with open(os.path.join('container', 'ecr_image_fullname.txt'), 'r') as f:
    container = f.readline().strip()

print(container)
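
A quick sanity check on the value read from the file can catch an empty or truncated image name before deployment. The sketch below assumes the file holds a standard ECR image URI of the form `<account>.dkr.ecr.<region>.amazonaws.com/<repository>[:<tag>]`; that assumption, the pattern, and the helper name `parse_ecr_image_uri` are illustrative and not taken from the build script.

```python
import re

# Assumed shape: <account>.dkr.ecr.<region>.amazonaws.com/<repository>[:<tag>]
ECR_URI_PATTERN = re.compile(
    r"^(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com"
    r"/(?P<repository>[^:@]+)(?::(?P<tag>[^:@]+))?$"
)


def parse_ecr_image_uri(uri):
    """Split an ECR image URI into its parts, or raise ValueError if it does not match."""
    match = ECR_URI_PATTERN.match(uri.strip())
    if match is None:
        raise ValueError(f"Not a recognised ECR image URI: {uri!r}")
    return match.groupdict()
```

Calling `parse_ecr_image_uri(container)` right after reading the file would then fail early if the value is not what the deployment step expects.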

## 3) Runtime model definition as PyTorch model (Yolo v5 is based on PyTorch)

The local folder "code" is embedded into the endpoint to customize the behavior of the container instance.

In this case, a custom Python class (ModelHandler) acts as a singleton that loads the model once and handles inference requests with it (please review the code/model_handler.py file). This class is designed to be compatible with the SageMaker model server interface (https://github.com/aws/sagemaker-inference-toolkit).

In [None]:
# Create SageMaker model and deploy an endpoint
model = PyTorchModel(
    name=name_from_base('visualsearch-yolov5'),
    model_data=model_artefact,
    entry_point='dockerd-entrypoint.py',
    role=role,
    source_dir='code',
    framework_version='1.5',
    py_version='py3',
    image_uri=container,
)


## 4) Endpoint creation

The endpoint instance type can be chosen according to the hardware resources you need. The "local" instance type allows you to test your endpoint locally (docker/docker-compose are required).

In [None]:
#predictor = model.deploy(initial_instance_count=1, instance_type='ml.m5.large')
predictor = model.deploy(initial_instance_count=1, instance_type='ml.g4dn.xlarge')
#predictor = model.deploy(initial_instance_count=1, instance_type='local')


In [None]:
# Print the name of newly created endpoint
print(predictor.endpoint_name) 

#### Invoking a local endpoint with curl:

    curl -v -X POST -F "body=@/path/to/some-image.jpg" http://localhost:8080/models/model/invoke

#### Invoking the SageMaker endpoint remotely:

(use the endpoint name printed out from the previous cell)

    AWS_PROFILE=<your aws profile name> aws sagemaker-runtime invoke-endpoint --endpoint-name visualsearch-yolov5-xxxxxxxx --body fileb:///path/to/some-image.jpg --content-type multipart/form-data  >(cat)
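
The same remote call can be made from Python. The sketch below mirrors the CLI command above (same endpoint name placeholder and content type) and takes the runtime client as a parameter so it can be exercised without AWS access; the helper name `invoke_with_image` is hypothetical, and the format of the returned payload depends on the handler in the "code" folder.

```python
def invoke_with_image(runtime_client, endpoint_name, image_bytes,
                      content_type="multipart/form-data"):
    """Send raw image bytes to a SageMaker endpoint and return the raw response body."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType=content_type,
        Body=image_bytes,
    )
    # The response body is a stream; read() returns the payload as bytes
    return response["Body"].read()


# Usage (requires boto3, AWS credentials, and a deployed endpoint):
#   import boto3
#   client = boto3.client("sagemaker-runtime")
#   with open("/path/to/some-image.jpg", "rb") as f:
#       print(invoke_with_image(client, "visualsearch-yolov5-xxxxxxxx", f.read()))
```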

## 5) Endpoint removal

In [None]:
sagemaker.Session().delete_endpoint(predictor.endpoint_name)