# Better Performance with MKL BLAS on MXNet 1.6 Deep Learning Containers

Inference latency is often one of the most important factors in deciding whether to deploy a model to a production environment. Small increases in latency can be costly, so every bit of performance helps reduce overall cost. Because of our obsession with customers, the MXNet team has improved inference latency in the MXNet 1.6 Deep Learning Containers (DLC). In this post, we discuss the change made to the MXNet 1.6.0 DLC so that it uses highly optimized matrix operators: MXNet is compiled with a dependency on Intel MKL BLAS instead of the default, oneDNN. As you will see, this can reduce latency by up to 30%, making it a worthwhile option for customers to adopt in their production environment. To show the enhancement in more detail, we first use the MNIST dataset to briefly give context for performing inference on Amazon SageMaker. Then we discuss the differences between the MKL BLAS and oneDNN libraries. Lastly, we show how to enable the enhancement in your environment so that you can take advantage of the performance boost.

**Note: MXNet 1.7+ Deep Learning Containers include this enhancement by default, so this solution applies to customers who want to stay on MXNet 1.6.0.**

## Inference on Amazon SageMaker

Amazon SageMaker makes it easy to deploy, host, and maintain models. Choosing which framework or deep learning container to use is a matter of setting a parameter. For the remainder of this post, we use the MNIST dataset to give quick examples for context and to show the performance difference between the MKL BLAS library and the oneDNN library. But first, here is a brief example of how to perform inference in SageMaker.

### Setup

First, we define a few variables that are needed to perform operations in SageMaker.

In [84]:
!pip install gluonnlp --quiet
!pip install bert --quiet
!pip install mxnet --upgrade --quiet


In [85]:
!pip install gluoncv --quiet


In [86]:
from sagemaker import get_execution_role
from sagemaker.session import Session
import sagemaker
from sagemaker.mxnet.model import MXNetModel, MXNetPredictor
import pandas as pd
from sagemaker import utils
import boto3
import gzip
import os
import struct
import numpy as np
import tempfile
import mxnet as mx
import tarfile
import urllib.request
import gluonnlp as nlp
import cv2
from transform import BERTDatasetTransform
from gluonnlp.calibration import BertLayerCollector
from utils import test_bert, deploy_bert, deploy_model
# S3 bucket for saving code and model artifacts.
# Feel free to specify a different bucket here if you wish.
bucket = Session().default_bucket()

# Bucket location where results of model training are saved.
model_artifacts_location = 's3://{}/mxnet-mnist-example/artifacts'.format(bucket)

# IAM execution role that gives SageMaker access to resources in your AWS account.
# We can use the SageMaker Python SDK to get the role from our notebook environment. 
role = get_execution_role()
sagemaker_session = Session()
region = boto3.Session().region_name
test_data_location = 'sagemaker-sample-data-{}'.format(region)

### Setup Data

This dataset contains 10,000 images that are 28x28 pixels each.

In [87]:
tweets = pd.read_csv('training.1600000.processed.noemoticon.csv', encoding="ISO-8859-1", names=["target", "ids", "date", "flag", "user", "text"])

### Create an inference Endpoint

We use the ``MXNetModel`` object to load model data and deploy an ``MXNetPredictor``. This creates a SageMaker **Endpoint**, a hosted prediction service that we can use to perform inference.

The arguments to the ``deploy`` function allow us to set the number and type of instances that will be used for the Endpoint. Here we will deploy the model to a single ``ml.m4.xlarge`` instance. By not setting the ``image`` parameter, ``MXNetModel`` uses the default deep learning container image.
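The deployment described above might be sketched as follows. This is a minimal sketch, not the notebook's own helper: `build_model_kwargs` and `deploy_endpoint` are hypothetical names, and the `image` keyword follows the wording in this post (newer releases of the SageMaker Python SDK name this argument `image_uri`). The `sagemaker` import is kept inside the function so the keyword-building logic can be read and exercised on its own.

```python
def build_model_kwargs(model_data, role, entry_point, image=None,
                       framework_version="1.6.0", py_version="py3"):
    """Collect the keyword arguments for MXNetModel.

    When `image` is None the key is left out entirely, so the SDK picks
    the default deep learning container for the framework version.
    """
    kwargs = {
        "model_data": model_data,        # S3 path to model.tar.gz
        "role": role,                    # IAM execution role
        "entry_point": entry_point,      # inference script
        "framework_version": framework_version,
        "py_version": py_version,
    }
    if image is not None:
        kwargs["image"] = image          # assumption: SDK v1 keyword name
    return kwargs


def deploy_endpoint(model_data, role, entry_point, image=None,
                    instance_type="ml.m4.xlarge"):
    """Create the model and host it on a single instance."""
    # Imported lazily so build_model_kwargs works without the SDK installed.
    from sagemaker.mxnet.model import MXNetModel

    model = MXNetModel(**build_model_kwargs(model_data, role, entry_point, image))
    return model.deploy(initial_instance_count=1, instance_type=instance_type)
```

Calling `deploy_endpoint(model_data, role, "inference.py")` with a model archive and script of your own (both placeholders here) would use the default container, while passing `image=...` pins a specific container build.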

## MKL BLAS vs oneDNN

MKL BLAS and oneDNN are libraries of math routines used to perform mathematical operations on data. You can think of them as the low-level routines that programming languages and frameworks call to perform computations. MXNet uses these libraries for operations such as dot products and other computationally expensive operations. MKL BLAS is Intel's implementation of BLAS, with highly optimized operators for CPU. These operators, such as the ``dot`` operator, are much faster than those in the default library, oneDNN. To take advantage of this speedup, the MXNet team packaged MXNet 1.6 with Intel's MKL BLAS library in a deep learning container that is available for use. In the diagram below, you can see the changes that are made between the different math libraries and MXNet versions.
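One way to check which math libraries a given MXNet build was compiled with is to query its runtime feature flags. The sketch below is an illustration rather than part of the original notebook: the helper names are hypothetical, and it assumes the MXNet 1.x flag names, where oneDNN is still reported under its earlier name `MKLDNN`.

```python
# BLAS-related feature flags reported by MXNet 1.x builds.
BLAS_FLAGS = ("BLAS_MKL", "BLAS_OPEN", "BLAS_ATLAS", "BLAS_APPLE")


def summarize_math_backends(enabled):
    """Summarize a mapping of feature name -> bool.

    Returns the BLAS flags that are switched on and whether the
    MKLDNN (oneDNN) flag is set.
    """
    blas = [name for name in BLAS_FLAGS if enabled.get(name, False)]
    return {"blas": blas, "mkldnn": bool(enabled.get("MKLDNN", False))}


def mxnet_enabled_flags():
    """Read the relevant flags from the installed MXNet build."""
    # Imported lazily; requires MXNet 1.5 or later for mx.runtime.
    import mxnet as mx

    features = mx.runtime.Features()
    return {name: features.is_enabled(name) for name in BLAS_FLAGS + ("MKLDNN",)}
```

Running `summarize_math_backends(mxnet_enabled_flags())` inside each container would show which BLAS flag that build reports.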

### Performance

To see the performance increase, we now describe how to deploy the MKL BLAS and oneDNN based deep learning containers using SageMaker.
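The cells in this section call an `inference` helper whose definition is not shown in this notebook. A minimal sketch of what such a helper could look like is given here, under two assumptions: the endpoint's response is a dict with a `"time"` entry (as in the BERT cells later in this notebook), and the payload can be passed to `predict` unchanged. The printed line mirrors the `Avg: ... Std: ... with 10 loops` output seen in the results.

```python
import statistics


def inference(payload, predictor, loops=10):
    """Invoke the endpoint `loops` times and summarize the reported latency.

    Assumption: predictor.predict(payload) returns a mapping whose "time"
    entry holds the latency measured inside the inference script.
    """
    times = []
    for _ in range(loops):
        output = predictor.predict(payload)
        times.append(float(output["time"]))

    avg = statistics.mean(times)
    # Population standard deviation, the same convention as numpy.std.
    std = statistics.pstdev(times)
    print("Avg:", avg, "Std:", std, "with", loops, "loops")
    return times
```

Returning the raw list keeps the individual measurements available, which is useful for spotting a slow first request caused by warm-up.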

#### oneDNN 

Notice that the ``image`` parameter is set to a URI. This is the deep learning container URI, and the ``v3.7`` suffix specifies the build of the MXNet 1.6 container that is compiled with oneDNN.

In [5]:
ecr_image = "763104351884.dkr.ecr.us-east-1.amazonaws.com/mxnet-inference:1.6.0-cpu-py36-ubuntu16.04-v3.7"
onednn_predictor = deploy_bert(sagemaker_session, ecr_image, 'ml.m4.xlarge', "1.6.0", role)

Deploying...
-----------------!
Predicting...


In [9]:
times = inference(tweets.text[:100], onednn_predictor)

Avg: 55.409968686103824 Std: 0.12656422301418305 with 10 loops


[55.78212642669678,
 55.4226975440979,
 55.36242318153381,
 55.377811670303345,
 55.33304786682129,
 55.36034035682678,
 55.37596607208252,
 55.37017631530762,
 55.3291597366333,
 55.38593769073486]

#### MKL BLAS

To specify the deep learning container with MKL BLAS, change the version identifier in the image URI to ``v3.8``.

In [10]:
ecr_image = "763104351884.dkr.ecr.us-east-1.amazonaws.com/mxnet-inference:1.6.0-cpu-py36-ubuntu16.04-v3.8"
mkl_predictor = deploy_bert(sagemaker_session, ecr_image, 'ml.m4.xlarge', "1.6.0", role)

Deploying...
-----------------!
Predicting...
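The two image URIs used so far differ only in the trailing build tag. A small helper can make that explicit and avoid copy-paste mistakes. The function name is hypothetical; the registry, repository, and tag layout are copied from the URIs in this notebook, which only covers `us-east-1`.

```python
# Registry and repository exactly as they appear in this notebook (us-east-1).
REGISTRY = "763104351884.dkr.ecr.us-east-1.amazonaws.com"
REPOSITORY = "mxnet-inference"


def mxnet_inference_image(build_tag, device="cpu", framework_version="1.6.0"):
    """Build the container URI for a given build tag ("v3.7" or "v3.8").

    device="cpu" yields the cpu-py36 tag; device="gpu" yields the
    gpu-py36-cu101 tag used for the GPU images in this notebook.
    """
    if device not in ("cpu", "gpu"):
        raise ValueError("device must be 'cpu' or 'gpu'")
    platform = "cpu-py36" if device == "cpu" else "gpu-py36-cu101"
    tag = f"{framework_version}-{platform}-ubuntu16.04-{build_tag}"
    return f"{REGISTRY}/{REPOSITORY}:{tag}"


onednn_image = mxnet_inference_image("v3.7")   # oneDNN build
mkl_image = mxnet_inference_image("v3.8")      # MKL BLAS build
```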


In [11]:
times = inference(tweets.text[:100], mkl_predictor)

Avg: 48.03284316062927 Std: 0.09309706624115292 with 10 loops
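To put the two averages side by side, the relative latency reduction can be computed directly from the numbers printed above. This is plain arithmetic on this one BERT run (100 tweets, 10 loops, `ml.m4.xlarge`); the function name is ours, and results for other models or instance types may differ.

```python
def latency_reduction_pct(baseline, candidate):
    """Percentage by which `candidate` latency is lower than `baseline`."""
    return 100.0 * (baseline - candidate) / baseline


onednn_avg = 55.409968686103824   # oneDNN container (v3.7), from the output above
mkl_avg = 48.03284316062927       # MKL BLAS container (v3.8), from the output above

reduction = latency_reduction_pct(onednn_avg, mkl_avg)
print(f"MKL BLAS average latency is {reduction:.1f}% lower than oneDNN")
```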


In [15]:
ecr_image = "763104351884.dkr.ecr.us-east-1.amazonaws.com/mxnet-inference:1.6.0-cpu-py36-ubuntu16.04-v3.8"
mklq_predictor = deploy_bert("bert_sst_quantized.tar.gz", sagemaker_session, ecr_image, 'ml.m4.xlarge', "1.6.0", role, script="bert_inference_quantized.py")

Deploying...
-------------------!
Endpoint Name: bert-1605561706-78b3


In [17]:
times = inference(tweets.text[:100], mklq_predictor)

Avg: 59.52104277610779 Std: 0.1776845656988009 with 10 loops


## SSD

In [112]:
ecr_image = "763104351884.dkr.ecr.us-east-1.amazonaws.com/mxnet-inference:1.6.0-gpu-py36-cu101-ubuntu16.04-v3.7"
ssd_predictor = deploy_model("ssd_512_resnet50_v1_voc.tar.gz", "ssd-test", sagemaker_session, ecr_image, 'ml.c5.9xlarge', "1.6.0", role, script="ssd_inference-batch50.py")

Deploying...
-------------------------------------*

UnexpectedStatusException: Error hosting endpoint ssd-test-1606152661-97f8: Failed. Reason:  The primary container for production variant AllTraffic did not pass the ping health check. Please check CloudWatch logs for this endpoint..

In [None]:
#ssd_predictor = MXNetPredictor("ssd-test-1605820634-af4b")
img = cv2.imread("street_small.jpg")
times_ssd = inference(img, ssd_predictor)

In [None]:
ecr_image = "763104351884.dkr.ecr.us-east-1.amazonaws.com/mxnet-inference:1.6.0-gpu-py36-cu101-ubuntu16.04-v3.8"
mkl_ssd_predictor = deploy_model("ssd_512_resnet50_v1_voc.tar.gz", "ssd-test-mkl", sagemaker_session, ecr_image, 'ml.p2.8xlarge', "1.6.0", role, script="ssd_inference.py")

Deploying...
------------

In [None]:
times_ssd_mkl = inference(img, mkl_ssd_predictor)

In [54]:
model_data = "s3://sagemaker-us-east-1-254243159864/ssd-model/ssd_512_resnet50_v1_voc-ml_m4.tar.gz"
ecr_image = "763104351884.dkr.ecr.us-east-1.amazonaws.com/mxnet-inference:1.6.0-cpu-py36-ubuntu16.04-v3.8"
neo_mkl_ssd_predictor = deploy_model(model_data, "ssd-neo-mkl", sagemaker_session, ecr_image, 'ml.m4.xlarge', "1.6.0", role, script="ssd_inference2.py")

Deploying...
---------------!
Endpoint Name: ssd-neo-mkl-1605736102-329f


In [None]:
times_ssd_neo_mkl = inference(img, neo_mkl_ssd_predictor)

## Faster-RCNN

In [57]:
ecr_image = "763104351884.dkr.ecr.us-east-1.amazonaws.com/mxnet-inference:1.6.0-cpu-py36-ubuntu16.04-v3.7"
frcnn_predictor = deploy_model("faster_rcnn_resnet50_v1b_voc.tar.gz", "frcnn-test", sagemaker_session, ecr_image, 'ml.m4.xlarge', "1.6.0", role, script="faster_rcnn_inference.py")

Deploying...
-------------!
Endpoint Name: frcnn-test-1605737440-df00


In [58]:
img = cv2.imread("street_small.jpg")
times_frcnn = inference(img, frcnn_predictor)

Avg: 11.732010388374329 Std: 0.1700353891263145 with 10 loops


In [59]:
ecr_image = "763104351884.dkr.ecr.us-east-1.amazonaws.com/mxnet-inference:1.6.0-cpu-py36-ubuntu16.04-v3.8"
frcnn_mkl_predictor = deploy_model("faster_rcnn_resnet50_v1b_voc.tar.gz", "frcnn--mkl-test", sagemaker_session, ecr_image, 'ml.m4.xlarge', "1.6.0", role, script="faster_rcnn_inference.py")

Deploying...
--------------!
Endpoint Name: frcnn--mkl-test-1605738209-c4bf


In [60]:
img = cv2.imread("street_small.jpg")
times_frcnn_mkl = inference(img, frcnn_mkl_predictor)

Avg: 11.451266074180603 Std: 0.1609549418581635 with 10 loops


In [62]:
frcnn_17_predictor = deploy_model("faster_rcnn_resnet50_v1b_voc.tar.gz", "frcnn--mkl-test", sagemaker_session, ecr_image, 'ml.m4.xlarge', "1.7.0", role, script="faster_rcnn_inference.py")

Deploying...
-----------------!
Endpoint Name: frcnn--mkl-test-1605738953-bb57


In [64]:
img = cv2.imread("street_small.jpg")
frcnn_17_predictor = MXNetPredictor("frcnn--mkl-test-1605738953-bb57")
times_frcnn_17 = inference(img, frcnn_17_predictor)

Avg: 11.490782737731934 Std: 0.14977126993494028 with 10 loops


In [65]:
ecr_image = "763104351884.dkr.ecr.us-east-1.amazonaws.com/mxnet-inference:1.6.0-cpu-py36-ubuntu16.04-v3.7"
frcnn_predictor50 = deploy_model("faster_rcnn_resnet50_v1b_voc.tar.gz", "frcnn-test", sagemaker_session, ecr_image, 'ml.m4.xlarge', "1.6.0", role, script="faster_rcnn_inference_batch50.py")

ecr_image = "763104351884.dkr.ecr.us-east-1.amazonaws.com/mxnet-inference:1.6.0-cpu-py36-ubuntu16.04-v3.8"
frcnn_mkl_predictor50 = deploy_model("faster_rcnn_resnet50_v1b_voc.tar.gz", "frcnn--mkl-test", sagemaker_session, ecr_image, 'ml.m4.xlarge', "1.6.0", role, script="faster_rcnn_inference_batch50.py")

Deploying...
---------------!
Endpoint Name: frcnn-test-1605740117-a6cf
Deploying...
-----------------!
Endpoint Name: frcnn--mkl-test-1605740582-9555


In [66]:
img = cv2.imread("street_small.jpg")
times_frcnn50 = inference(img, frcnn_predictor50)
times_frcnn_mkl50 = inference(img, frcnn_mkl_predictor50)

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (500) from model with message "Error in operator fasterrcnn0_concat0: [23:12:06] src/operator/nn/concat.cc:67: Check failed: shape_assign(&(*in_shape)[i], dshape): Incompatible input shape: expected [300,-1], got [15000,4]
Stack trace:
  [bt] (0) /usr/local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x32952b) [0x7ff2beeed52b]
  [bt] (1) /usr/local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x9edc0b) [0x7ff2bf5b1c0b]
  [bt] (2) /usr/local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x35704fc) [0x7ff2c21344fc]
  [bt] (3) /usr/local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x3573d88) [0x7ff2c2137d88]
  [bt] (4) /usr/local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x358d9cf) [0x7ff2c21519cf]
  [bt] (5) /usr/local/lib/python3.6/site-packages/mxnet/libmxnet.so(mxnet::CachedOp::CheckDynamicShapeExists(mxnet::Context const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, bool)+0x3f3) [0x7ff2c2163cb3]
  [bt] (6) /usr/local/lib/python3.6/site-packages/mxnet/libmxnet.so(mxnet::CachedOp::Forward(std::shared_ptr<mxnet::CachedOp> const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&)+0xaac) [0x7ff2c2167a7c]
  [bt] (7) /usr/local/lib/python3.6/site-packages/mxnet/libmxnet.so(MXInvokeCachedOp+0x3f0) [0x7ff2c20675d0]
  [bt] (8) /usr/local/lib/python3.6/site-packages/mxnet/libmxnet.so(MXInvokeCachedOpEx+0x40) [0x7ff2c2068640]


Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/sagemaker_inference/transformer.py", line 126, in transform
    result = self._transform_fn(self._model, input_data, content_type, accept)
  File "/opt/ml/model/code/faster_rcnn_inference_batch50.py", line 76, in transform_fn
    cid, score, bbox = net(batch)
  File "/usr/local/lib/python3.6/site-packages/mxnet/gluon/block.py", line 758, in __call__
    out = self.forward(*args)
  File "/usr/local/lib/python3.6/site-packages/mxnet/gluon/block.py", line 1393, in forward
    return self._call_cached_op(x, *args)
  File "/usr/local/lib/python3.6/site-packages/mxnet/gluon/block.py", line 1085, in _call_cached_op
    out = self._cached_op(*cargs)
  File "/usr/local/lib/python3.6/site-packages/mxnet/_ctypes/ndarray.py", line 170, in __call__
    ctypes.byref(out_stypes)))
  File "/usr/local/lib/python3.6/site-packages/mxnet/base.py", line 255, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: Error in operator fasterrcnn0_concat0: [23:12:06] src/operator/nn/concat.cc:67: Check failed: shape_assign(&(*in_shape)[i], dshape): Incompatible input shape: expected [300,-1], got [15000,4]
Stack trace:
  [bt] (0) /usr/local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x32952b) [0x7ff2beeed52b]
  [bt] (1) /usr/local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x9edc0b) [0x7ff2bf5b1c0b]
  [bt] (2) /usr/local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x35704fc) [0x7ff2c21344fc]
  [bt] (3) /usr/local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x3573d88) [0x7ff2c2137d88]
  [bt] (4) /usr/local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x358d9cf) [0x7ff2c21519cf]
  [bt] (5) /usr/local/lib/python3.6/site-packages/mxnet/libmxnet.so(mxnet::CachedOp::CheckDynamicShapeExists(mxnet::Context const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, bool)+0x3f3) [0x7ff2c2163cb3]
  [bt] (6) /usr/local/lib/python3.6/site-packages/mxnet/libmxnet.so(mxnet::CachedOp::Forward(std::shared_ptr<mxnet::CachedOp> const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&)+0xaac) [0x7ff2c2167a7c]
  [bt] (7) /usr/local/lib/python3.6/site-packages/mxnet/libmxnet.so(MXInvokeCachedOp+0x3f0) [0x7ff2c20675d0]
  [bt] (8) /usr/local/lib/python3.6/site-packages/mxnet/libmxnet.so(MXInvokeCachedOpEx+0x40) [0x7ff2c2068640]


". See https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#logEventViewer:group=/aws/sagemaker/Endpoints/frcnn-test-1605740117-a6cf in account 254243159864 for more information.

## (Optional) Delete the Endpoint

After you have finished with this example, remember to delete the prediction endpoint to release the instance(s) associated with it.

In [36]:
# Delete the endpoints created above so their instances stop incurring charges.
onednn_predictor.delete_endpoint()
mkl_predictor.delete_endpoint()
mklq_predictor.delete_endpoint()

## BERT

In [47]:
ecr_image = "763104351884.dkr.ecr.us-east-1.amazonaws.com/mxnet-inference:1.6.0-cpu-py36-ubuntu16.04-v3.7"
onednn_predictor = test_bert(sagemaker_session, ecr_image, 'ml.m4.xlarge', "1.6.0", role)

Deploying...
-------------------!
Predicting...


In [48]:
times = []
for i in range(10):
    output = onednn_predictor.predict(list(tweets.text[:100]))
    times.append(output['time'])

In [49]:
import numpy as np
print("Avg:", np.mean(times))
print("Std:", np.std(times))

Avg: 55.713221716880795
Std: 0.20049065918782286


In [50]:
ecr_image = "763104351884.dkr.ecr.us-east-1.amazonaws.com/mxnet-inference:1.6.0-cpu-py36-ubuntu16.04-v3.8"
mkl_predictor = test_bert(sagemaker_session, ecr_image, 'ml.m4.xlarge', "1.6.0", role)

Deploying...
---------------!
Predicting...


In [51]:
times = []
for i in range(10):
    output = mkl_predictor.predict(list(tweets.text[:100]))
    times.append(output["time"])

In [52]:
print("Avg:", np.mean(times))
print("Std:", np.std(times))

Avg: 48.21942739486694
Std: 0.34405162161675223


In [9]:
ecr_image = "763104351884.dkr.ecr.us-east-1.amazonaws.com/mxnet-inference:1.6.0-cpu-py36-ubuntu16.04-v3.8"
mklq_predictor = deploy_bert("bert_sst_quantized.tar.gz", sagemaker_session, ecr_image, 'ml.m4.xlarge', "1.6.0", role, script="bert_inference_quantized.py")

Deploying...
-----------------!
Endpoint Name: bert-1605559584-3701


In [None]:
times = []
for i in range(10):
    output = mklq_predictor.predict(list(tweets.text[:100]))
    times.append(output['time'])

In [None]:
print("Avg:", np.mean(times))
print("Std:", np.std(times))