# Better Performance with MKL BLAS on MXNet 1.6 Deep Learning Containers

Inference latency is often one of the most important factors in deciding whether to deploy a model to production. Small increases in latency can be costly, so every performance gain helps reduce overall cost. Because of our obsession with customers, the MXNet team has built a solution that improves inference latency on MXNet 1.6 Deep Learning Containers (DLCs). In this post, we discuss the improvement made to the MXNet 1.6.0 DLC so that it uses highly optimized matrix operators. The enhancement comes from compiling MXNet with a dependency on Intel MKL BLAS instead of the default, oneDNN. As you will see, latency can drop by up to 30%, which makes this a worthwhile option for customers to adopt in their production environments.

To show the enhancement in more detail, we first use the MNIST dataset to give brief context on performing inference on Amazon SageMaker. Then we discuss the differences between the MKL BLAS and oneDNN libraries. Lastly, we show how to use the enhancement in your own environment so that you can take advantage of the performance boost.

**Note: MXNet 1.7+ Deep Learning Containers include this enhancement by default, so this solution applies to customers who don't want to move off MXNet 1.6.0.**

## Inference on Amazon SageMaker

Amazon SageMaker makes it easy to deploy, host, and maintain models. Choosing which framework or deep learning container to use is a matter of setting a parameter. For the remainder of this post, we use the MNIST dataset to give quick examples for context and to show the performance difference between the MKL BLAS and oneDNN libraries. But first, here is a brief example of how to perform inference in SageMaker.

### Setup

First, we define a few variables that are needed to perform operations in SageMaker:

In [23]:
from sagemaker import get_execution_role
from sagemaker.session import Session
from sagemaker.mxnet.model import MXNetModel, MXNetPredictor
import boto3
import gzip
import os
import struct
import numpy as np
# S3 bucket for saving code and model artifacts.
# Feel free to specify a different bucket here if you wish.
bucket = Session().default_bucket()

# Bucket location where results of model training are saved.
model_artifacts_location = 's3://{}/mxnet-mnist-example/artifacts'.format(bucket)

# IAM execution role that gives SageMaker access to resources in your AWS account.
# We can use the SageMaker Python SDK to get the role from our notebook environment. 
role = get_execution_role()
sagemaker_session = Session()
region = boto3.Session().region_name
test_data_location = 'sagemaker-sample-data-{}'.format(region)

### Download the MNIST test dataset

This dataset contains 10,000 images, each 28x28 pixels.

In [9]:
# Download into the local 'test' directory, which load_data('test') reads from later.
sagemaker_session.download_data('test', test_data_location, key_prefix="mxnet/mnist/test")

### Define Utility Functions

These functions load the data into memory.

In [20]:
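def load_data(path):
    """Load the gzipped MNIST labels and images found under ``path``."""
    with gzip.open(find_file(path, "labels.gz")) as flbl:
        # Skip the 8-byte header (magic number and item count).
        struct.unpack(">II", flbl.read(8))
        labels = np.frombuffer(flbl.read(), dtype=np.int8)
    with gzip.open(find_file(path, "images.gz")) as fimg:
        # The 16-byte header holds the magic number, image count, rows, and columns.
        _, _, rows, cols = struct.unpack(">IIII", fimg.read(16))
        images = np.frombuffer(fimg.read(), dtype=np.uint8).reshape(len(labels), rows, cols)
        # Add a channel axis and scale pixel values to the range [0, 1].
        images = images.reshape(images.shape[0], 1, 28, 28).astype(np.float32) / 255
    return labels, images


def find_file(root_path, file_name):
    """Return the path of the first file named ``file_name`` under ``root_path``."""
    for root, dirs, files in os.walk(root_path):
        if file_name in files:
            return os.path.join(root, file_name)
    # Fail with a clear message instead of silently returning None.
    raise FileNotFoundError("{} not found under {}".format(file_name, root_path))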


In [21]:
labels, images = load_data('test')
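
The header handling in `load_data` can be easier to follow on a tiny synthetic file. The sketch below is only an illustration and assumes the MNIST files follow the usual IDX layout: big-endian 32-bit header fields followed by raw bytes. The helper names `make_idx_images` and `parse_idx_images` and the 2x3 image size are made up for the example.

```python
import gzip
import io
import struct

import numpy as np


def make_idx_images(images):
    """Serialize a (count, rows, cols) uint8 array into gzipped IDX bytes (hypothetical helper)."""
    count, rows, cols = images.shape
    # 0x00000803 is the magic number for an IDX file of unsigned bytes with 3 dimensions.
    header = struct.pack(">IIII", 0x00000803, count, rows, cols)
    return gzip.compress(header + images.tobytes())


def parse_idx_images(blob):
    """Parse gzipped IDX image bytes the same way load_data does (hypothetical helper)."""
    with gzip.open(io.BytesIO(blob)) as fimg:
        # ">IIII" reads four big-endian unsigned 32-bit integers from the 16-byte header.
        magic, count, rows, cols = struct.unpack(">IIII", fimg.read(16))
        pixels = np.frombuffer(fimg.read(), dtype=np.uint8)
    return magic, pixels.reshape(count, rows, cols)


# Two synthetic 2x3 "images" stand in for the real 28x28 MNIST digits.
sample = np.arange(12, dtype=np.uint8).reshape(2, 2, 3)
magic, decoded = parse_idx_images(make_idx_images(sample))
print(hex(magic), decoded.shape)
```

The round trip shows why `load_data` reads exactly 16 bytes before the pixel data: the shape of the array is stored in the header, not inferred from the file size.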

### Create an inference Endpoint

We use the ``MXNetModel`` object to load the model data and deploy an ``MXNetPredictor``. This creates a SageMaker **Endpoint**, a hosted prediction service that we can use to perform inference.

The arguments to the ``deploy`` function allow us to set the number and type of instances that will be used for the Endpoint. Here we will deploy the model to a single ``ml.m4.xlarge`` instance. By not setting the ``image`` parameter, ``MXNetModel`` uses the default deep learning container image.

In [None]:
model = MXNetModel(
    model_data="s3://sagemaker-us-west-2-738657245266/mxnet-mnist-example/artifacts/mxnet-training-2020-10-21-21-49-55-741/output/model.tar.gz",
    role=role,
    entry_point="inference.py",
    framework_version="1.6.0",
    py_version="py3"
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type='ml.m4.xlarge',
    endpoint_name="default-image"
)

In [28]:
%%time
for image in images:
    predictor.predict(image)

CPU times: user 30.1 s, sys: 573 ms, total: 30.7 s
Wall time: 1min 26s
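
A single wall-clock total hides how individual requests behave. As a rough sketch, the per-request latency distribution could be collected as shown here. The helper `measure_latency` and the stand-in `fake_predict` are hypothetical; in the notebook you would pass `predictor.predict` and `images` instead. Timings taken from the client include network and serialization overhead, not just model compute.

```python
import time

import numpy as np


def measure_latency(predict_fn, inputs):
    """Time each call to predict_fn and return summary statistics in milliseconds."""
    latencies_ms = []
    for item in inputs:
        start = time.perf_counter()
        predict_fn(item)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    latencies = np.asarray(latencies_ms)
    return {
        "count": int(latencies.size),
        "mean_ms": float(latencies.mean()),
        "p50_ms": float(np.percentile(latencies, 50)),
        "p99_ms": float(np.percentile(latencies, 99)),
    }


def fake_predict(image):
    """Stand-in for predictor.predict so the sketch runs without an endpoint."""
    return image.sum()


# Synthetic inputs with the same per-image shape as the MNIST arrays above.
fake_images = np.zeros((100, 1, 28, 28), dtype=np.float32)
stats = measure_latency(fake_predict, fake_images)
print(stats)
```

Reporting the median and a high percentile alongside the mean makes it easier to tell a genuine speedup from run-to-run noise.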


## MKL BLAS vs oneDNN

MKL BLAS and oneDNN are libraries of math routines used to perform mathematical operations on data. You can think of these as the low-level routines that programming languages call to perform computations. MXNet uses these libraries for operations such as dot products and other computationally expensive work. MKL BLAS is Intel's implementation of BLAS, with operators that are highly optimized for CPUs. These operators, such as the ``dot`` operator, are much faster than those in the default library, oneDNN. To take advantage of this speed boost, the MXNet team packaged MXNet 1.6 with Intel's MKL BLAS library in a deep learning container that is available for use. The accompanying diagram shows the changes made between the different math libraries and MXNet versions.
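
To make the role of a BLAS library concrete, the sketch below compares a pure-Python matrix product with `numpy.dot`, which typically hands the work to whichever BLAS implementation NumPy was built against. This illustrates the kind of routine involved, not MXNet's internals, and `naive_matmul` is a name invented for the example. Timings depend on the machine, so none are quoted here.

```python
import time

import numpy as np


def naive_matmul(a, b):
    """Multiply two 2-D arrays with explicit Python loops (no BLAS involved)."""
    rows, inner = a.shape
    inner_b, cols = b.shape
    assert inner == inner_b, "inner dimensions must match"
    out = np.zeros((rows, cols), dtype=np.float64)
    for i in range(rows):
        for j in range(cols):
            total = 0.0
            for k in range(inner):
                total += a[i, k] * b[k, j]
            out[i, j] = total
    return out


rng = np.random.default_rng(0)
a = rng.random((40, 30))
b = rng.random((30, 20))

start = time.perf_counter()
slow = naive_matmul(a, b)
naive_seconds = time.perf_counter() - start

start = time.perf_counter()
fast = np.dot(a, b)
blas_seconds = time.perf_counter() - start

# Both paths compute the same product; only the speed differs.
print("results match:", np.allclose(slow, fast))
print("naive: {:.6f}s, np.dot: {:.6f}s".format(naive_seconds, blas_seconds))
```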

### Performance

To see the performance difference, we now describe how to deploy the oneDNN-based and MKL BLAS-based deep learning containers using SageMaker.

#### oneDNN 

Notice that the ``image`` parameter is set to a URI. This is the deep learning container URI, and the ``v3.7`` suffix specifies a version of the MXNet 1.6 container that is compiled with oneDNN.

In [None]:
oneDNN_model = MXNetModel(
    model_data="s3://sagemaker-us-west-2-738657245266/mxnet-mnist-example/artifacts/mxnet-training-2020-10-21-21-49-55-741/output/model.tar.gz",
    role=role,
    entry_point="inference.py",
    framework_version="1.6.0",
    py_version="py3",
    image="763104351884.dkr.ecr.us-west-2.amazonaws.com/mxnet-inference:1.6.0-cpu-py36-ubuntu16.04-v3.7"
)

oneDNN_predictor = oneDNN_model.deploy(
    initial_instance_count=1,
    instance_type='ml.m4.xlarge',
    endpoint_name="oneDNN-image"
)

Parameter image will be renamed to image_uri in SageMaker Python SDK v2.



In [31]:
%%time
for image in images:
    oneDNN_predictor.predict(image)

CPU times: user 29.6 s, sys: 530 ms, total: 30.1 s
Wall time: 1min 23s


#### MKL BLAS

To specify the deep learning container with MKL BLAS, change the version identifier in the image URI to ``v3.8``.

In [33]:
mklblas_model = MXNetModel(
    model_data="s3://sagemaker-us-west-2-738657245266/mxnet-mnist-example/artifacts/mxnet-training-2020-10-21-21-49-55-741/output/model.tar.gz",
    role=role,
    entry_point="inference.py",
    framework_version="1.6.0",
    py_version="py3",
    image="763104351884.dkr.ecr.us-west-2.amazonaws.com/mxnet-inference:1.6.0-cpu-py36-ubuntu16.04-v3.8"
)

mklblas_predictor = mklblas_model.deploy(
    initial_instance_count=1,
    instance_type='ml.m4.xlarge',
    endpoint_name="mklblas-image"
)

Parameter image will be renamed to image_uri in SageMaker Python SDK v2.
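
Since the two containers differ only in the tag suffix, a small helper can make the comparison less error-prone. The sketch below simply assembles the same URI strings used in this post; `dlc_image_uri` is a hypothetical helper, and the account ID, repository name, and tag layout are copied from the URIs above rather than looked up from any API.

```python
def dlc_image_uri(build_version, region="us-west-2"):
    """Assemble the MXNet 1.6.0 CPU inference image URI for a given build version.

    The pieces below are taken from the URIs shown in this post; they are
    assumptions for other regions or builds, so verify before relying on them.
    """
    account = "763104351884"
    repository = "mxnet-inference"
    tag = "1.6.0-cpu-py36-ubuntu16.04-{}".format(build_version)
    return "{}.dkr.ecr.{}.amazonaws.com/{}:{}".format(account, region, repository, tag)


onednn_uri = dlc_image_uri("v3.7")   # oneDNN build used earlier
mklblas_uri = dlc_image_uri("v3.8")  # MKL BLAS build
print(onednn_uri)
print(mklblas_uri)
```

Either string can then be passed as the ``image`` argument of ``MXNetModel``, exactly as in the cells above.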



In [34]:
%%time
for image in images:
    mklblas_predictor.predict(image)

CPU times: user 29.2 s, sys: 564 ms, total: 29.7 s
Wall time: 1min 41s


## (Optional) Delete the Endpoints

After you have finished with this example, remember to delete the prediction endpoints to release the instances associated with them.

In [36]:
predictor.delete_endpoint()
oneDNN_predictor.delete_endpoint()
mklblas_predictor.delete_endpoint()
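
If the benchmark is run as a script rather than interactively, one way to avoid leaving endpoints running after an error is to wrap the work in a context manager. The following is a sketch only: `cleanup_endpoints` and `DummyPredictor` are hypothetical names, and the only assumption about real predictors is the `delete_endpoint()` method already used above.

```python
from contextlib import contextmanager


@contextmanager
def cleanup_endpoints(*predictors):
    """Yield the predictors and delete every endpoint on exit, even after an error."""
    try:
        yield predictors
    finally:
        for predictor in predictors:
            try:
                predictor.delete_endpoint()
            except Exception as exc:  # keep going so the remaining endpoints are deleted
                print("failed to delete endpoint: {}".format(exc))


class DummyPredictor:
    """Stand-in for a SageMaker predictor so the sketch runs without AWS access."""

    def __init__(self, name):
        self.name = name
        self.deleted = False

    def delete_endpoint(self):
        self.deleted = True


dummies = [DummyPredictor("oneDNN-image"), DummyPredictor("mklblas-image")]
with cleanup_endpoints(*dummies) as active:
    pass  # run the timing loops here
print([d.deleted for d in dummies])
```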