A port of Intel(R) MKL-DNN for a non-JIT chip (NEC SX)

Generic MKL-DNN for vector compilers

This fork of MKL-DNN provides the same API for non-Intel chips, targeting:

  • NEC SX-Aurora TSUBASA chip, ncc compiler
  • NEC SX, sxcc compiler

It provides a "vanilla" build that removes Intel-specific JIT instructions as well as an Aurora build with a few optimized instructions for Aurora. The optimizations are a work in progress.

GEMM convolutions on SX and Aurora attain about 60-75% of the chip's theoretical peak FLOPS. For Aurora, higher efficiencies are available for several primitives via the hand-coded VEDNN library.

We plan to develop some simple JIT examples for convolutions on NEC's Aurora chip.

The last merge with upstream was around v0.16 of MKL-DNN (about September 2018).

[Erik Kruus, NEC Labs America]

Getting started on NEC Aurora

First untar the ve*.tar.gz tarballs at the top level of the source directory. VEDNN contains optimized implementations of various convolution kernels.

tar xvfz vednnx.tar.gz
tar xvfz vejit.tar.gz

Before building, make sure CC and CXX are set to ncc and nc++ and that they are in your path. Also, NLC_HOME must be set to the base directory of the NEC Numeric Library Collection (for BLAS). For example:

export PATH=/opt/nec/ve/bin:$PATH
export NLC_HOME=/opt/nec/ve/nlc/2.0.0
export CC=ncc
export CXX=nc++
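
The settings above can be sanity-checked before starting a build. The following is a minimal sketch, not part of this repository: `check_build_env` is a hypothetical helper name, and it only verifies what the paragraph above asks for (CC and CXX values, both compilers on the path, NLC_HOME pointing at a directory).

```shell
#!/bin/sh
# Hypothetical helper (not shipped with this fork): report problems with the
# build environment described above. Returns 0 when everything looks right.
check_build_env() {
    rc=0
    # CC and CXX must name the NEC compilers.
    [ "${CC:-}" = "ncc" ]   || { echo "CC should be ncc (got '${CC:-}')"; rc=1; }
    [ "${CXX:-}" = "nc++" ] || { echo "CXX should be nc++ (got '${CXX:-}')"; rc=1; }
    # Both compilers must be reachable through PATH.
    for tool in ncc nc++; do
        command -v "$tool" >/dev/null 2>&1 || { echo "$tool not found in PATH"; rc=1; }
    done
    # NLC_HOME must be set and must be an existing directory.
    [ -n "${NLC_HOME:-}" ] && [ -d "$NLC_HOME" ] || {
        echo "NLC_HOME is unset or not a directory"; rc=1;
    }
    return $rc
}
```

A typical use would be `check_build_env || exit 1` at the top of a wrapper script, so that a misconfigured shell fails early instead of partway through compilation.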

Make sure you are using CMake 3.8 or later. Then, to build with VEDNN optimized kernels:

./ -ajq

Alternatively, you can build with GEMM convolution kernels by omitting the j flag: ./ -aq.
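
The CMake 3.8 requirement mentioned above can also be checked from the shell. This is a sketch under stated assumptions: `version_ge` and `cmake_version` are hypothetical helper names, and the parsing assumes that the first line of `cmake --version` has the usual form `cmake version X.Y.Z`.

```shell
#!/bin/sh
# version_ge A B: succeed if dotted version A is greater than or equal to B.
# Fields are compared numerically from left to right; missing fields count as 0.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = (i <= na) ? x[i] + 0 : 0
            yi = (i <= nb) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

# cmake_version: print the version reported by the cmake found in PATH.
# Assumes the first output line looks like "cmake version X.Y.Z".
cmake_version() {
    cmake --version 2>/dev/null | awk 'NR == 1 { print $3 }'
}
```

For example, `version_ge "$(cmake_version)" 3.8 || echo "CMake 3.8 or later is required"` prints a warning when the installed CMake is too old or missing.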

The libraries will be built in build-vej (if using VEDNN) or build (if using GEMM), and a script of the compilation session will be in build-vej.log or build.log. You may benchmark the implementation with

cd build-vej/tests/benchdnn
ve_exec ./benchdnn --mode=P

which will time forward passes through 204 layers from various popular networks (AlexNet, VGG 19, ResNet-50, GoogleNet v1 and GoogleNet v2). The output format and additional options for testing backward passes or testing different layers are described in the benchdnn README. To run on a specific VE node, use the VE_NODE_NUMBER environment variable, and to run with multiple threads, use the OMP_NUM_THREADS environment variable.
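
To illustrate how the two environment variables combine, here is a small sketch that repeats a command for several thread counts on one VE node. `run_thread_sweep` is a hypothetical helper, not something provided by benchdnn, and the thread counts in the usage comment are arbitrary example values.

```shell
#!/bin/sh
# run_thread_sweep NODE "THREADS..." CMD [ARGS...]
# Run CMD once per thread count, with VE_NODE_NUMBER and OMP_NUM_THREADS set
# in the environment of each run. Stops at the first failing run.
run_thread_sweep() {
    node=$1
    threads=$2
    shift 2
    for t in $threads; do
        echo "== VE node $node, $t thread(s) =="
        VE_NODE_NUMBER=$node OMP_NUM_THREADS=$t "$@" || return 1
    done
}

# Example (from build-vej/tests/benchdnn; thread counts are illustrative):
#   run_thread_sweep 0 "1 2 4 8" ve_exec ./benchdnn --mode=P
```

Because the variables are set only for each invocation, the calling shell's own environment is left unchanged.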

More build options:

./ -h      # help
./ -tt     # Intel x86 build and run some tests (w/ jit)
                   # add 'v' for *vanilla* (no jit)
                   # --> build/    and build.log
# Platform Aurora:
# untar the ve*.tar.gz distro tarballs
./ -adttT  # a:Aurora platform, d:debug compile, T:trace cmake decisions
                   # a=NEC Aurora; use S for NEC SX

Without the -q flag, Doxygen documentation will be produced in install/share/doc/mkldnn/reference/html/index.html.

Original ...

Intel MKL-DNN repository migrated to The old address will continue to be available and will redirect to the new repo. Please update your links.

Intel(R) Math Kernel Library for Deep Neural Networks (Intel(R) MKL-DNN)

v0.16 beta

Intel(R) Math Kernel Library for Deep Neural Networks (Intel(R) MKL-DNN) is an open source performance library for deep learning applications. The library accelerates deep learning applications and frameworks on Intel(R) architecture. Intel(R) MKL-DNN contains vectorized and threaded building blocks which you can use to implement deep neural networks (DNN) with C and C++ interfaces.

DNN functionality optimized for Intel architecture is also included in Intel(R) Math Kernel Library (Intel(R) MKL). The API of the Intel MKL implementation is not compatible with Intel MKL-DNN and does not include certain new and experimental features.

This release contains performance-critical functions that improve the performance of the following deep learning topologies and their variations.

Application                                  Example topology
Image recognition                            AlexNet, VGG, GoogleNet, ResNet, MobileNet
Image segmentation                           FCN, SegNet, MaskRCNN, U-Net
Volumetric segmentation                      3D-Unet
Object detection                             SSD, Faster R-CNN, Yolo
Neural Machine Translation (experimental)    GNMT
Speech Recognition (experimental)            DeepSpeech
Adversarial Networks                         DCGAN, 3DGAN
Reinforcement Learning                       A3C

Intel MKL-DNN is used in the following software products:


Intel MKL-DNN is licensed under Apache License Version 2.0. This software includes the following third party components:


  • Introduction explains programming model and basic concepts
  • Reference manual provides detailed functionality description
  • Examples demonstrate use of C and C++ APIs in simple topologies
  • Tutorial provides step by step installation instructions and an example walkthrough


Please submit your questions, feature requests and bug reports on GitHub issues page.

WARNING The following functionality has preview status and might change without prior notification in future releases:

  • Convolutions with s16 data type in source, weights or destination
  • Convolutions and auxiliary primitives for 3D spatial data
  • RNN, LSTM and GRU primitives

How to Contribute

We welcome community contributions to Intel MKL-DNN. If you have an idea how to improve the library:

  • Share your proposal via GitHub issues.
  • Ensure you can build the product and run all the examples with your patch
  • In the case of a larger feature, create a test
  • Submit a pull request

We will review your contribution and, if any additional fixes or modifications are necessary, may provide feedback to guide you. When accepted, your pull request will be merged into the repository.

System Requirements

Intel MKL-DNN supports Intel(R) 64 architecture and compatible architectures. The library is optimized for the systems based on

  • Intel Atom(R) processor with Intel(R) SSE4.1 support
  • 4th, 5th, 6th and 7th generation Intel(R) Core processor
  • Intel(R) Xeon(R) processor E5 v3 family (formerly Haswell)
  • Intel Xeon processor E5 v4 family (formerly Broadwell)
  • Intel Xeon Platinum processor family (formerly Skylake)
  • Intel(R) Xeon Phi(TM) processor x200 product family (formerly Knights Landing)
  • Intel Xeon Phi processor x205 product family (formerly Knights Mill)

and compatible processors.

The software dependencies are:

  • CMake 2.8.0 or later
  • Doxygen 1.8.5 or later
  • C++ compiler with C++11 standard support

The software was validated on RedHat* Enterprise Linux 7 with

on Windows Server* 2012 R2 with

on macOS* 10.13 (High Sierra) with

The implementation uses OpenMP* 4.0 SIMD extensions. We recommend using Intel(R) Compiler for the best performance results.


Download Intel MKL-DNN source code or clone the repository to your system

	git clone

Ensure that all software dependencies are in place and have at least minimal supported version.

Intel MKL-DNN can take advantage of the optimized matrix-matrix multiplication (GEMM) function from Intel MKL. The dynamic library with this functionality is included in the repository. If you choose to build Intel MKL-DNN with the binary dependency, download the Intel MKL small libraries using the provided script

	cd scripts && ./ && cd ..
	cd scripts && call prepare_mkl.bat && cd ..

or manually from GitHub release section and unpack it to the external directory in the repository root.

You can choose to build Intel MKL-DNN without the binary dependency. The resulting version will be fully functional; however, the performance of certain convolution shapes and sizes, and of inner product relying on the SGEMM function, may be suboptimal.

Intel MKL-DNN uses a CMake-based build system

	mkdir -p build && cd build && cmake .. && make

Intel MKL-DNN includes unit tests implemented using the googletest framework. To validate your build, run:

	make test

Documentation is provided inline and can be generated in HTML format with Doxygen:

	make doc

Documentation will reside in build/reference/html folder.


	make install

will place the header files, libraries and documentation in /usr/local. To change the installation path, use the option -DCMAKE_INSTALL_PREFIX=<prefix> when invoking CMake.
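
The configure, build and install steps above can be combined with a custom prefix as follows. This is a minimal sketch: `build_and_install` is a hypothetical wrapper name and the prefix in the usage comment is an arbitrary example. Only the CMake option and make targets named in the text are used.

```shell
#!/bin/sh
# build_and_install PREFIX
# Hypothetical wrapper: configure in ./build with the given install prefix,
# then build and install. Run it from the repository root.
build_and_install() {
    prefix=$1
    mkdir -p build || return 1
    (
        # The subshell keeps the caller's working directory unchanged.
        cd build &&
        cmake -DCMAKE_INSTALL_PREFIX="$prefix" .. &&
        make &&
        make install
    )
}

# Example (the prefix is illustrative):
#   build_and_install "$HOME/mkldnn-install"
```

Installing under a user-writable prefix avoids needing elevated privileges for /usr/local.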

Linking your application

Intel MKL-DNN includes several header files providing C and C++ APIs for the functionality and several dynamic libraries depending on how Intel MKL-DNN was built. Intel OpenMP runtime and Intel MKL small libraries are not installed for a standalone Intel MKL-DNN build.

File                      Description
lib/                      Intel MKL-DNN dynamic library
lib/                      Intel OpenMP* runtime library
lib/                      Intel MKL small library for GNU* OpenMP runtime
lib/                      Intel MKL small library for Intel(R) OpenMP runtime
include/mkldnn.h          C header
include/mkldnn.hpp        C++ header
include/mkldnn_types.h    auxiliary C header

Intel MKL-DNN uses OpenMP* for parallelism and requires an OpenMP runtime library to work. As different OpenMP runtimes may not be binary compatible, it is important to ensure that only one OpenMP runtime is used throughout the application. Having more than one OpenMP runtime initialized may lead to undefined behavior resulting in incorrect results or crashes.

Intel MKL-DNN library built with binary dependency will link against Intel OpenMP runtime included with Intel MKL small libraries package. Intel OpenMP runtime is binary compatible with GNU OpenMP and CLANG OpenMP runtimes and is recommended for the best performance results. Here are example linklines for GNU C++ compiler and Intel C++ compiler.

	g++ -std=c++11 -I${MKLDNNROOT}/include -L${MKLDNNROOT}/lib simple_net.cpp -lmkldnn -lmklml_intel -liomp5
	icpc -std=c++11 -qopenmp -I${MKLDNNROOT}/include -L${MKLDNNROOT}/lib simple_net.cpp -lmkldnn -lmklml_intel

Using GNU compiler with -fopenmp and -liomp5 options will link the application with both Intel and GNU OpenMP runtime libraries. This will lead to undefined behavior of the application.
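
One way to spot this situation on Linux is to inspect the dynamic dependencies of the linked binary. The sketch below is an assumption-laden illustration rather than an official check: the helper names are hypothetical, it relies on `ldd` and `grep -o` being available, and it only recognizes runtimes by the library names libiomp5, libgomp and libomp.

```shell
#!/bin/sh
# list_openmp_runtimes: read "ldd" output on stdin and print the distinct
# OpenMP runtime library names it mentions (libiomp5, libgomp, libomp).
list_openmp_runtimes() {
    grep -oE 'lib(iomp5|gomp|omp)\.so[.0-9]*' | sort -u
}

# check_single_openmp BINARY: succeed if BINARY depends on at most one
# OpenMP runtime according to ldd; fail if two or more are found.
check_single_openmp() {
    n=$(( $(ldd "$1" | list_openmp_runtimes | wc -l) ))
    [ "$n" -le 1 ]
}

# Example (binary name is illustrative):
#   check_single_openmp ./simple_net || echo "multiple OpenMP runtimes linked"
```

This only sees libraries recorded as dynamic dependencies; a runtime loaded later (for example through a plugin) would not be detected.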

Intel MKL-DNN library built standalone will use the OpenMP runtime supplied by the compiler, so as long as both the library and the application use the same compiler, the correct OpenMP runtime will be used.

	g++ -std=c++11 -fopenmp -I${MKLDNNROOT}/include -L${MKLDNNROOT}/lib simple_net.cpp -lmkldnn
	icpc -std=c++11 -qopenmp -I${MKLDNNROOT}/include -L${MKLDNNROOT}/lib simple_net.cpp -lmkldnn

Legal Information
