## Logo detection using YoloV2 and Darknet on Amazon SageMaker

This example shows how to build a [Darknet](https://pjreddie.com/darknet/) Docker image for Amazon SageMaker. With this image you can **train** and **deploy** ML models.

There are 3 exercises in total:
  - This is the first one, where you'll create a Docker image to work with Darknet on SageMaker
  - In the second exercise you'll prepare the [Openlogo](https://qmul-openlogo.github.io/index.html) dataset
  - Finally, you'll train, deploy, and test the logo detector

SageMaker provides two libraries that help us create the Docker image:
  - https://github.com/aws/sagemaker-inference-toolkit
  - https://github.com/aws/sagemaker-training-toolkit

So, let's get started.

In [1]:
!rm -rf container && mkdir -p container

### 1.0) First, let's use the SageMaker Inference Toolkit to create a handler for the predictions
SageMaker uses this class whenever it needs to run the model (prediction).
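To make the role of the handler concrete, here is a minimal sketch of the order in which its hooks are expected to run: the model is loaded once, and each request then goes through input parsing, prediction, and output formatting. `ToyHandler` and `handle_request` are hypothetical stand-ins written for illustration; they are not part of the toolkit, and the fake detection is invented only to show the data flow.

```python
import json


class ToyHandler:
    """Stand-in for the real handler; needs neither Darknet nor SageMaker."""

    def default_model_fn(self, model_dir):
        # The real handler loads model.cfg and model.weights from model_dir.
        return {"loaded_from": model_dir}

    def default_input_fn(self, input_data, content_type):
        if content_type not in ("image/jpeg", "image/png"):
            raise ValueError("Invalid content-type: %s" % content_type)
        return input_data

    def default_predict_fn(self, payload, model):
        # Pretend every request yields one detection: (class, score, box).
        return [(0, 0.9, (10.0, 20.0, 30.0, 40.0))]

    def default_output_fn(self, prediction, accept):
        if accept != "application/json":
            raise ValueError("Invalid accept: %s" % accept)
        return json.dumps(prediction)


def handle_request(handler, model, body, content_type, accept):
    """Hypothetical helper that mirrors the per-request hook order."""
    data = handler.default_input_fn(body, content_type)
    prediction = handler.default_predict_fn(data, model)
    return handler.default_output_fn(prediction, accept)


handler = ToyHandler()
model = handler.default_model_fn("/opt/ml/model")  # called once, at startup
response = handle_request(handler, model, b"fake-bytes", "image/png", "application/json")
print(response)
```

Note that the tuples come back as JSON arrays, so a client should expect nested lists rather than tuples.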

In [26]:
%%writefile container/handler.py
import os
import sys
import darknet as dn
import numpy as np
import io

from PIL import Image
from ctypes import pointer,c_int
from sagemaker_inference.default_inference_handler import DefaultInferenceHandler
from sagemaker_inference.default_handler_service import DefaultHandlerService
from sagemaker_inference import content_types, errors, transformer, encoder, decoder

class HandlerService(DefaultHandlerService, DefaultInferenceHandler):
    def __init__(self):
        op = transformer.Transformer(default_inference_handler=self)
        super(HandlerService, self).__init__(transformer=op)
        self.thresh=.5
        self.hier_thresh=.5
        self.nms=.45
        self.num_classes = 335

    ## Loads the model from the disk
    def default_model_fn(self, model_dir):
        cfg_path = os.path.join(model_dir, "model.cfg").encode('utf-8')
        model_path = os.path.join(model_dir, "model.weights").encode('utf-8')

        return dn.load_net(cfg_path, model_path, 0)

    ## Parse and check the format of the input data
    def default_input_fn(self, input_data, content_type):
        if content_type not in ["image/jpeg", "image/png"]:
            raise Exception("Invalid content-type: %s" % content_type)
        # convert to RGB so grayscale or RGBA inputs always yield 3 channels
        img = np.array(Image.open(io.BytesIO(input_data)).convert("RGB"))
        # now let's create a Darknet (C binding) image
        h, w, c = img.shape
        im = dn.make_image(w, h, c)

        img = np.divide(np.rollaxis(img, axis=2, start=0).flatten(), 255.)
        for i in range(h*w*c):
            im.data[i] = img[i]

        return im

    ## Run our model and do the prediction
    def default_predict_fn(self, payload, model):
        num = c_int(0)
        pnum = pointer(num)
        a = dn.predict_image(model, payload)
        print(a[0])
        dets = dn.get_network_boxes(model, payload.w, payload.h, self.thresh, self.hier_thresh, None, 0, pnum)
        num = pnum[0]
        if self.nms:
            dn.do_nms_obj(dets, num, self.num_classes, self.nms)
        res = []
        for j in range(num):
            for i in range(self.num_classes):
                if dets[j].prob[i] > 0:
                    b = dets[j].bbox
                    res.append((i, dets[j].prob[i], (b.x, b.y, b.w, b.h)))
        res = sorted(res, key=lambda x: -x[1])
        dn.free_image(payload)
        dn.free_detections(dets, num)
        return res

    ## Gets the prediction output and format it to be returned to the user
    def default_output_fn(self, prediction, accept):
        if accept != "application/json":
            raise Exception("Invalid accept: %s" % accept)
        return encoder.encode(prediction, accept)

Overwriting container/handler.py
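The `np.rollaxis(...).flatten()` step in `default_input_fn` reorders the pixels from height-width-channel order into the channel-first flat buffer that the Darknet image expects, and the division by 255 scales values into the 0 to 1 range. The sketch below reproduces that reordering in plain Python on a tiny image; `hwc_to_chw_flat` is a hypothetical helper used only for illustration.

```python
def hwc_to_chw_flat(pixels):
    """Convert pixels[row][col][channel] (0-255 ints) into a flat,
    channel-first list of floats in [0, 1]."""
    height = len(pixels)
    width = len(pixels[0])
    channels = len(pixels[0][0])
    flat = []
    for c in range(channels):        # all of channel 0 first, then 1, ...
        for y in range(height):
            for x in range(width):
                flat.append(pixels[y][x][c] / 255.0)
    return flat


# A 1x2 RGB image: one red pixel followed by one green pixel.
image = [
    [[255, 0, 0], [0, 255, 0]],
]
flat = hwc_to_chw_flat(image)
print(flat)  # [1.0, 0.0, 0.0, 1.0, 0.0, 0.0]
```

The value for channel `c`, row `y`, column `x` ends up at index `c*h*w + y*w + x`, which is the same position the handler's copy loop writes to in `im.data`.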


### 1.1) Now we need to create the container entrypoint

This script will **handle** both training and predictions. So, we need to check the command (train or serve) and execute the appropriate code for each operation.
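SageMaker starts the container with `train` or `serve` as its first argument, so the entrypoint only has to validate that argument and branch on it. A minimal sketch of that check, using a hypothetical `parse_mode` helper that is not part of the script in this exercise:

```python
VALID_MODES = ("train", "serve")


def parse_mode(argv):
    """Return the requested mode, or raise if it is missing or unknown.

    argv is expected to look like sys.argv: [script_name, mode, ...].
    """
    if len(argv) < 2 or argv[1] not in VALID_MODES:
        raise ValueError("expected 'train' or 'serve' as the first argument")
    return argv[1]


print(parse_mode(["main.py", "train"]))
print(parse_mode(["main.py", "serve"]))
```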

In [43]:
%%writefile container/main.py
import argparse
import subprocess
import sys
import os
import traceback

from sagemaker_inference import model_server
from sagemaker_training import environment, intermediate_output, params, logging_config, files

logger = logging_config.get_logger()

if __name__ == "__main__":
    if len(sys.argv) < 2 or sys.argv[1] not in ["serve", "train"]:
        raise Exception("Invalid argument: you must pass 'train' for training mode or 'serve' for prediction mode")
        
    if sys.argv[1] == "train":
        
        env = environment.Environment()
        parser = argparse.ArgumentParser()
        logging_config.configure_logger(env.log_level)
        logger.info( "Starting a new training! %s" % env.log_level)
        # https://github.com/aws/sagemaker-training-toolkit/blob/master/ENVIRONMENT_VARIABLES.md

        # reads input channels training and testing from the environment variables
        parser.add_argument("--training", type=str, default=env.channel_input_dirs["training"])
        parser.add_argument("--testing", type=str, default=env.channel_input_dirs["testing"])
        parser.add_argument("--assets", type=str, default=env.channel_input_dirs["assets"])

        parser.add_argument("--model-dir", type=str, default=env.model_dir)
        parser.add_argument("--checkpoints-dir", type=str, default=env.output_intermediate_dir)

        parser.add_argument("--num-classes", type=int, default=env.hyperparameters.get("num_classes"))
        parser.add_argument("--cfg", type=str, default=env.hyperparameters.get("cfg"))
        parser.add_argument("--weights", type=str, default=env.hyperparameters.get("weights"))

        parser.add_argument("--train-file", type=str, default=env.hyperparameters.get("train_file"))
        parser.add_argument("--test-file", type=str, default=env.hyperparameters.get("test_file"))
        parser.add_argument("--names-file", type=str, default=env.hyperparameters.get("names_file"))

        args,unknown = parser.parse_known_args()

        logger.info("ENV: %s" % (env) )
        logger.info("ARGS: %s" % (args) )
        
        command = ["darknet", "detector", "train", "/tmp/temp.data"]

        if args.cfg is None or not os.path.isfile(os.path.join(args.assets, args.cfg)):
            raise Exception("You need to provide a valid .cfg file: %s" % args.cfg)
        command.append( os.path.join(args.assets, args.cfg) )

        if args.weights is not None:
            weights_file = os.path.join(args.assets,  args.weights )
            if not os.path.isfile(weights_file):
                raise Exception('You defined an invalid weights file')
            command.append(weights_file)

        train_file = os.path.join(args.training, args.train_file)
        test_file = os.path.join(args.testing, args.test_file)
        names_file = os.path.join(args.assets,  args.names_file )
        model_prefix = os.path.join(args.checkpoints_dir, args.cfg.split('.')[0])
        model_cfg_filename = os.path.join(args.assets, args.cfg)

        subprocess.call(['sed', '-i', '-e', 's#^#%s/#' % args.training, train_file])
        subprocess.call(['sed', '-i', '-e', 's#^#%s/#' % args.testing, test_file])

        with open('/tmp/temp.data', 'w') as f:
            f.write("classes=%d\n" % args.num_classes)
            f.write("train=%s\n" % train_file)
            f.write("valid=%s\n" % test_file)
            f.write("names=%s\n" % names_file )
            f.write("backup=%s\n" % args.checkpoints_dir)

        gpus = ','.join([str(i) for i in range(env.num_gpus)])
        if gpus != '': 
            command += ["-gpus", gpus]
        logger.info(command)
        
        intermediate_sync = None
        try:
            region = os.environ.get("AWS_REGION", os.environ.get(params.REGION_NAME_ENV))
            s3_endpoint_url = os.environ.get(params.S3_ENDPOINT_URL, None)
            logger.info("Starting intermediate sync. %s: %s - %s" % (region, env.sagemaker_s3_output(), s3_endpoint_url))
            intermediate_sync = intermediate_output.start_sync(
                env.sagemaker_s3_output(), region, endpoint_url=s3_endpoint_url
            )
            logger.info(intermediate_sync)
            # check_call raises if a command fails, so the except block below reports the failure
            subprocess.check_call(command)
            new_model_cfg = os.path.join(args.model_dir, "model.cfg")
            subprocess.check_call(["cp", model_cfg_filename, new_model_cfg])
            subprocess.check_call(["mv", "%s_final.weights" % model_prefix, os.path.join(args.model_dir, "model.weights")])

            # we need to set batch and subdivisions to 1 to accept 1 image per prediction
            subprocess.check_call(['sed', '-i', '-e', r's#^batch\s*=\s*[0-9]\+#batch=1#', new_model_cfg])
            subprocess.check_call(['sed', '-i', '-e', r's#^subdivisions\s*=\s*[0-9]\+#subdivisions=1#', new_model_cfg])
            
            logger.info("Reporting training SUCCESS")
            files.write_success_file()
        except Exception as e:
            failure_msg = "framework error: \n%s\n%s" % (traceback.format_exc(), str(e))
            logger.error("Reporting training FAILURE")
            logger.error(failure_msg)
            files.write_failure_file(failure_msg)
        finally:
            if intermediate_sync:
                intermediate_sync.join()
    else:
        model_server.start_model_server(handler_service="serving.handler")

Overwriting container/main.py
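The two `sed` calls at the end of training rewrite the copied `model.cfg` so that the network processes one image per prediction. For readers less familiar with `sed`, the same substitution can be sketched with Python's `re` module. `set_inference_batch` is a hypothetical helper and the sample config text is made up for illustration.

```python
import re


def set_inference_batch(cfg_text):
    """Force batch=1 and subdivisions=1 on lines that start with those keys."""
    cfg_text = re.sub(r"^batch\s*=\s*[0-9]+", "batch=1", cfg_text, flags=re.MULTILINE)
    cfg_text = re.sub(
        r"^subdivisions\s*=\s*[0-9]+", "subdivisions=1", cfg_text, flags=re.MULTILINE
    )
    return cfg_text


sample = "[net]\nbatch=64\nsubdivisions = 8\nwidth=416\n# batch=64\n"
rewritten = set_inference_batch(sample)
print(rewritten)
```

Because the pattern is anchored at the start of the line, commented-out entries such as `# batch=64` are left untouched, which matches the behavior of the `sed` expressions.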


### 1.2) Finally, we need to create the Dockerfile to prepare our container

In [44]:
%%writefile container/Dockerfile
FROM nvidia/cuda:10.1-cudnn7-devel

# Set a docker label declaring that this container does not support multi-model endpoints
LABEL com.amazonaws.sagemaker.capabilities.multi-models=false
# Set a docker label to enable container to use SAGEMAKER_BIND_TO_PORT environment variable if present
LABEL com.amazonaws.sagemaker.capabilities.accept-bind-to-port=true

ENV DEBIAN_FRONTEND=noninteractive
ENV TZ=Europe/London

RUN apt-get update -y && apt-get -y install \
    --no-install-recommends default-jdk build-essential git python3.6 python3.6-dev python3-pip

RUN apt-get clean && rm -rf /var/cache/apt && \
    apt-get -y autoremove && apt-get -y autoclean && \
    rm -rf /var/cache/apt /var/lib/apt/lists/*

RUN mkdir -p /opt/ml/code
RUN git clone https://github.com/pjreddie/darknet.git && \
    sed -i 's#GPU=0#GPU=1#' darknet/Makefile && \
    sed -i 's#CUDNN=0#CUDNN=1#' darknet/Makefile && \
    cd darknet && make -j && \
    mv darknet /usr/bin && \
    mkdir -p /opt/ml/code && \
    mv python/darknet.py /opt/ml/code && \
    mv libdarknet.so /usr/lib && \
    ldconfig
RUN rm -rf darknet

RUN pip3 --no-cache-dir install -U setuptools 
RUN pip3 --no-cache-dir install -U multi-model-server sagemaker-inference sagemaker-training 2to3 wheel

RUN 2to3 -w /opt/ml/code/darknet.py

ENV PYTHONUNBUFFERED=TRUE
ENV PYTHONDONTWRITEBYTECODE=TRUE
ENV PYTHONPATH="/opt/ml/code:${PYTHONPATH}"

COPY main.py /opt/ml/code/main.py
COPY handler.py /opt/ml/code/serving/handler.py

# Not used: main.py is invoked directly by the ENTRYPOINT below
#ENV SAGEMAKER_PROGRAM main.py

ENTRYPOINT ["python3", "/opt/ml/code/main.py"]

Overwriting container/Dockerfile


### 2.0) We can use the local Docker daemon to build our image

In [None]:
!docker build -t darknet:latest container/

### 3.0) Upload the image to ECR
Before executing the next cell, go to Amazon ECR and create a new repository called **darknet**.
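The image name used for tagging and pushing follows the pattern `<account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>`. The sketch below builds that name in Python and, as an optional alternative to the console, creates the repository with boto3. Both helper names are hypothetical, the account ID in the example is a placeholder, and the URI pattern is assumed to apply to the standard commercial regions.

```python
def ecr_image_uri(account_id, region, repository="darknet", tag="1.0"):
    """Compose the full ECR image name used by `docker tag` and `docker push`."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


def ensure_repository(name="darknet"):
    """Create the ECR repository if it does not exist yet (requires AWS credentials)."""
    import boto3  # imported lazily so the URI helper works without boto3 installed

    ecr = boto3.client("ecr")
    try:
        ecr.create_repository(repositoryName=name)
    except ecr.exceptions.RepositoryAlreadyExistsException:
        pass  # nothing to do, the repository is already there


# Placeholder account ID, for illustration only.
print(ecr_image_uri("123456789012", "us-east-1"))
```

Calling `ensure_repository()` before the push cell would replace the manual console step, assuming the notebook role is allowed to call `ecr:CreateRepository`.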

In [46]:
import boto3

account_id = boto3.client("sts").get_caller_identity()["Account"]
region = boto3.session.Session().region_name

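# `aws ecr get-login` exists only in AWS CLI v1; with v2, pipe `aws ecr get-login-password` into `docker login`
!$(aws ecr get-login --no-include-email --region $region)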
!docker tag darknet:latest "$account_id".dkr.ecr."$region".amazonaws.com/darknet:1.0
!docker push "$account_id".dkr.ecr."$region".amazonaws.com/darknet:1.0

https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded
The push refers to repository [715445047862.dkr.ecr.us-east-1.amazonaws.com/darknet]
1.0: digest: sha256:272e229e2be9caf64f6d0a3a318d00c8f453b51ee5e900e268fede423eda988f size: 4513
