diff --git a/BUILD.md b/BUILD.md index b3827fee0d4c1..7227559050347 100644 --- a/BUILD.md +++ b/BUILD.md @@ -1,43 +1,45 @@ -# Building ONNX Runtime - Getting Started +# Building ONNX Runtime *Dockerfiles are available [here](https://github.com/microsoft/onnxruntime/tree/master/tools/ci_build/github/linux/docker) to help you get started.* *Pre-built packages are available at the locations indicated [here](https://github.com/microsoft/onnxruntime#official-builds).* -## To build the baseline CPU version of ONNX Runtime from source: -1. Checkout the source tree: +## Getting Started: Build the baseline CPU version of ONNX Runtime from source + +### Pre-Requisites +* Checkout the source tree: ``` git clone --recursive https://github.com/Microsoft/onnxruntime cd onnxruntime ``` -2. Install cmake-3.13 or better from https://cmake.org/download/. +* Install cmake-3.13 or higher from https://cmake.org/download/. -**On Windows:** -3. (optional) Install protobuf 3.6.1 from source code (cmake/external/protobuf). CMake flag protobuf\_BUILD\_SHARED\_LIBS must be turned OFF. After the installation, you should have the 'protoc' executable in your PATH. -4. (optional) Install onnx from source code (cmake/external/onnx) - ``` - export ONNX_ML=1 - python3 setup.py bdist_wheel - pip3 install --upgrade dist/*.whl - ``` -5. Run `build.bat --config RelWithDebInfo --build_shared_lib --parallel`. +### Build Instructions +#### Windows +``` +.\build.bat --config RelWithDebInfo --build_shared_lib --parallel +``` +The default Windows CMake Generator is Visual Studio 2017, but you can also use the newer Visual Studio 2019 by passing `--cmake_generator "Visual Studio 16 2019"` to `.\build.bat` + -*Note: The default Windows CMake Generator is Visual Studio 2017, but you can also use the newer Visual Studio 2019 by passing `--cmake_generator "Visual Studio 16 2019"` to build.bat.* +#### Linux +``` +./build.sh --config RelWithDebInfo --build_shared_lib --parallel +``` -**On Linux:** +#### Notes -3. 
(optional) Install protobuf 3.6.1 from source code (cmake/external/protobuf). CMake flag protobuf\_BUILD\_SHARED\_LIBS must be turned ON. After the installation, you should have the 'protoc' executable in your PATH. It is recommended to run `ldconfig` to make sure protobuf libraries are found.
-4. If you installed your protobuf in a non standard location it would be helpful to set the following env var:`export CMAKE_ARGS="-DONNX_CUSTOM_PROTOC_EXECUTABLE=full path to protoc"` so ONNX build can find it. Also run `ldconfig ` so the linker can find protobuf libraries.
-5. (optional) Install onnx from source code (cmake/external/onnx)
+* Please note that these instructions produce a RelWithDebInfo build (release optimizations with debug information), which may have performance tradeoffs compared to a plain Release build.
+* The build script runs all unit tests by default for native builds, and skips tests by default for cross-compiled builds.
+* If you need to install protobuf 3.6.1 from source code (cmake/external/protobuf), please note:
+  * CMake flag protobuf\_BUILD\_SHARED\_LIBS must be turned OFF. After the installation, you should have the 'protoc' executable in your PATH. It is recommended to run `ldconfig` to make sure protobuf libraries are found.
+  * If you installed protobuf in a non-standard location, set the following env var so the ONNX build can find it: `export CMAKE_ARGS="-DONNX_CUSTOM_PROTOC_EXECUTABLE=full path to protoc"`. Also run `ldconfig` so the linker can find the protobuf libraries.
+* If you'd like to install onnx from source code (cmake/external/onnx), use:
```
export ONNX_ML=1
python3 setup.py bdist_wheel
pip3 install --upgrade dist/*.whl
```
-6. Run `./build.sh --config RelWithDebInfo --build_shared_lib --parallel`.
-
-The build script runs all unit tests by default (for native builds and skips tests by default for cross-compiled builds). 
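The cmake 3.13-or-higher requirement above can be checked before starting a long build. A minimal sketch, assuming a POSIX shell with coreutils' `sort -V`; the `version_at_least` helper is illustrative and not part of the build scripts:

```shell
# Sketch: fail fast if the cmake on PATH is older than the required 3.13.
# version_at_least is a hypothetical helper, not part of build.sh.
version_at_least() {
  # True when $1 >= $2, comparing version components with sort -V.
  [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

required=3.13.0
have=$(cmake --version 2>/dev/null | awk 'NR==1{print $3}')
have=${have:-0.0.0}   # cmake not installed yet

if version_at_least "$have" "$required"; then
  echo "cmake $have is new enough"
else
  echo "cmake $have is too old; need >= $required" >&2
fi
```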
- --- # Supported architectures and build environments @@ -69,14 +71,25 @@ The build script runs all unit tests by default (for native builds and skips tes |Windows 10 | YES | Not tested | |Linux | NO | YES(gcc>=5.0) | -ONNX Runtime Python bindings support Python 3.5, 3.6 and 3.7. +## System Requirements +For other system requirements and other dependencies, please see [this section](./README.md#system-requirements-pre-requisite-dependencies). --- +# Common Build Instructions +|Description|Command|Additional description| +|-----------|-----------|-----------| +|**Basic build**|build.bat (Windows)
./build.sh (Linux)||
+|**Debug build**|--config RelWithDebInfo|Builds with debug information (release optimizations plus debug symbols)|
+|**Use OpenMP**|--use_openmp|OpenMP will parallelize some of the code for potential performance improvements. This is not recommended when running on a single thread.|
+|**Build using parallel processing**|--parallel|This is strongly recommended to speed up the build.|
+|**Build Shared Library**|--build_shared_lib||
+|**Build Python wheel**|--build_wheel||
+|**Build C# and C packages**|--build_csharp||
+
# Additional Build Instructions
-The complete list of build options can be found by running `./build.sh (or ./build.bat) --help`
+The complete list of build options can be found by running `./build.sh (or .\build.bat) --help`
-* [Docker on Linux](#Docker-on-Linux)
* [ONNX Runtime Server (Linux)](#Build-ONNX-Runtime-Server-on-Linux)
**Execution Providers**
@@ -86,7 +99,7 @@ The complete list of build options can be found by running `./build.sh (or ./bui
* [Intel nGraph](#nGraph)
* [Intel OpenVINO](#openvino)
* [Android NNAPI](#Android)
-* [Nuphar](#Nuphar)
+* [Nuphar Model Compiler](#Nuphar)
* [DirectML](#DirectML)
**Options**
@@ -98,159 +111,152 @@ The complete list of build options can be found by running `./build.sh (or ./bui
* [ARM](#ARM)
---
-## Docker on Linux
-Install Docker: `https://docs.docker.com/install/`
-**CPU**
-```
-cd tools/ci_build/github/linux/docker
-docker build -t onnxruntime_dev --build-arg OS_VERSION=16.04 -f Dockerfile.ubuntu .
-docker run --rm -it onnxruntime_dev /bin/bash
-```
+## Build ONNX Runtime Server on Linux
+Read more about ONNX Runtime Server [here](https://github.com/microsoft/onnxruntime/blob/master/docs/ONNX_Runtime_Server_Usage.md)
-**GPU**
-If you need GPU support, please also install:
-1. nvidia driver. Before doing this please add `nomodeset rd.driver.blacklist=nouveau` to your linux [kernel boot parameters](https://www.kernel.org/doc/html/v4.17/admin-guide/kernel-parameters.html).
-2. 
nvidia-docker2: [Install doc](`https://github.com/NVIDIA/nvidia-docker/wiki/Installation-(version-2.0)`) +### Pre-Requisites +* ONNX Runtime server (and only the server) requires you to have Go installed to build, due to building BoringSSL. + See https://golang.org/doc/install for installation instructions. -To test if your nvidia-docker works: +### Build Instructions ``` -docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi +./build.sh --config RelWithDebInfo --build_server --use_openmp --parallel ``` -Then build a docker image. We provided a sample for use: -``` -cd tools/ci_build/github/linux/docker -docker build -t cuda_dev -f Dockerfile.ubuntu_gpu . -``` +ONNX Runtime Server supports sending logs to [rsyslog](https://www.rsyslog.com/) daemon. To enable it, please build with an additional parameter: `--cmake_extra_defines onnxruntime_USE_SYSLOG=1`. -Then run it +Build command: ``` -./tools/ci_build/github/linux/run_dockerbuild.sh +./build.sh --config RelWithDebInfo --build_server --use_openmp --parallel --cmake_extra_defines onnxruntime_USE_SYSLOG=1 ``` --- -## Build ONNX Runtime Server on Linux -Read more about ONNX Runtime Server [here](https://github.com/microsoft/onnxruntime/blob/master/docs/ONNX_Runtime_Server_Usage.md) -1. ONNX Runtime server (and only the server) requires you to have Go installed to build, due to building BoringSSL. - See https://golang.org/doc/install for installation instructions. -2. In the ONNX Runtime root folder, run `./build.sh --config RelWithDebInfo --build_server --use_openmp --parallel` -3. ONNX Runtime Server supports sending log to [rsyslog](https://www.rsyslog.com/) daemon. To enable it, please build with an additional parameter: `--cmake_extra_defines onnxruntime_USE_SYSLOG=1`. 
The build command will look like this: `./build.sh --config RelWithDebInfo --build_server --use_openmp --parallel --cmake_extra_defines onnxruntime_USE_SYSLOG=1` - ---- ## Execution Providers ### CUDA -For Linux, please use [this Dockerfile](https://github.com/microsoft/onnxruntime/blob/master/tools/ci_build/github/linux/docker/Dockerfile.ubuntu_gpu) and refer to instructions above for [building with Docker on Linux](#Docker-on-Linux) - -ONNX Runtime supports CUDA builds. You will need to download and install [CUDA](https://developer.nvidia.com/cuda-toolkit) and [cuDNN](https://developer.nvidia.com/cudnn). - -ONNX Runtime is built and tested with CUDA 10.0 and cuDNN 7.3 using the Visual Studio 2017 14.11 toolset (i.e. Visual Studio 2017 v15.3). -CUDA versions from 9.1 up to 10.1, and cuDNN versions from 7.1 up to 7.4 should also work with Visual Studio 2017. +#### Pre-Requisites +* Install [CUDA](https://developer.nvidia.com/cuda-toolkit) and [cuDNN](https://developer.nvidia.com/cudnn) + * ONNX Runtime is built and tested with CUDA 10.0 and cuDNN 7.3 using the Visual Studio 2017 14.11 toolset (i.e. Visual Studio 2017 v15.3). CUDA versions from 9.1 up to 10.1, and cuDNN versions from 7.1 up to 7.4 should also work with Visual Studio 2017. + * The path to the CUDA installation must be provided via the CUDA_PATH environment variable, or the `--cuda_home parameter` + * The path to the cuDNN installation (include the `cuda` folder in the path) must be provided via the cuDNN_PATH environment variable, or `--cudnn_home parameter`. The cuDNN path should contain `bin`, `include` and `lib` directories. + * The path to the cuDNN bin directory must be added to the PATH environment variable so that cudnn64_7.dll is found. - - The path to the CUDA installation must be provided via the CUDA_PATH environment variable, or the `--cuda_home parameter`. 
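The CUDA and cuDNN locations above can also be passed straight on the command line instead of through environment variables. A sketch for Linux; `/usr/local/cuda` and `/usr` are assumed install locations and must be adjusted for your machine:

```shell
# Sketch: supply the CUDA and cuDNN homes explicitly via the documented flags.
# The two paths below are assumptions about a typical Linux install.
CUDA_HOME=/usr/local/cuda
CUDNN_HOME=/usr

cmd="./build.sh --use_cuda --cuda_home $CUDA_HOME --cudnn_home $CUDNN_HOME"
echo "$cmd"
```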
- - The path to the cuDNN installation (include the `cuda` folder in the path) must be provided via the cuDNN_PATH environment variable, or `--cudnn_home parameter`. The cuDNN path should contain `bin`, `include` and `lib` directories. - - The path to the cuDNN bin directory must be added to the PATH environment variable so that cudnn64_7.dll is found. - -You can build with: +#### Build Instructions +##### Windows ``` -./build.sh --use_cuda --cudnn_home /usr --cuda_home /usr/local/cuda (Linux) -./build.bat --use_cuda --cudnn_home --cuda_home (Windows) +.\build.bat --use_cuda --cudnn_home --cuda_home ``` -Depending on compatibility between the CUDA, cuDNN, and Visual Studio 2017 versions you are using, you may need to explicitly install an earlier version of the MSVC toolset. -- CUDA 10.0 is known to work with toolsets from 14.11 up to 14.16 (Visual Studio 2017 15.9), and should continue to work with future Visual Studio versions - - https://devblogs.microsoft.com/cppblog/cuda-10-is-now-available-with-support-for-the-latest-visual-studio-2017-versions/ -- CUDA 9.2 is known to work with the 14.11 MSVC toolset (Visual Studio 15.3 and 15.4) - -To install the 14.11 MSVC toolset, see - -To use the 14.11 toolset with a later version of Visual Studio 2017 you have two options: +##### Linux +``` +./build.sh --use_cuda --cudnn_home --cuda_home +``` -1. Setup the Visual Studio environment variables to point to the 14.11 toolset by running vcvarsall.bat, prior to running the build script - - e.g. if you have VS2017 Enterprise, an x64 build would use the following command -`"C:\Program Files (x86)\Microsoft Visual Studio\2017\Enterprise\VC\Auxiliary\Build\vcvarsall.bat" amd64 -vcvars_ver=14.11` - - For convenience, build.amd64.1411.bat will do this and can be used in the same way as build.bat. - - e.g.` .\build.amd64.1411.bat --use_cuda` +A Dockerfile is available [here](./tools/ci_build/github/linux/docker/Dockerfile.ubuntu_gpu). -2. 
Alternatively if you have CMake 3.12 or later you can specify the toolset version via the `--msvc_toolset` build script parameter. - - e.g. `.\build.bat --msvc_toolset 14.11` + +#### Notes +* Depending on compatibility between the CUDA, cuDNN, and Visual Studio 2017 versions you are using, you may need to explicitly install an earlier version of the MSVC toolset. + * CUDA 10.0 is [known to work](https://devblogs.microsoft.com/cppblog/cuda-10-is-now-available-with-support-for-the-latest-visual-studio-2017-versions/) with toolsets from 14.11 up to 14.16 (Visual Studio 2017 15.9), and should continue to work with future Visual Studio versions + * CUDA 9.2 is known to work with the 14.11 MSVC toolset (Visual Studio 15.3 and 15.4) + * To install the 14.11 MSVC toolset, see [this page](https://blogs.msdn.microsoft.com/vcblog/2017/11/15/side-by-side-minor-version-msvc-toolsets-in-visual-studio-2017). + * To use the 14.11 toolset with a later version of Visual Studio 2017 you have two options: + 1. Setup the Visual Studio environment variables to point to the 14.11 toolset by running vcvarsall.bat, prior to running the build script. e.g. if you have VS2017 Enterprise, an x64 build would use the following command `"C:\Program Files (x86)\Microsoft Visual Studio\2017\Enterprise\VC\Auxiliary\Build\vcvarsall.bat" amd64 -vcvars_ver=14.11` For convenience, .\build.amd64.1411.bat will do this and can be used in the same way as .\build.bat. e.g. ` .\build.amd64.1411.bat --use_cuda` -_Side note: If you have multiple versions of CUDA installed on a Windows machine and are building with Visual Studio, CMake will use the build files for the highest version of CUDA it finds in the BuildCustomization folder. + 2. Alternatively, if you have CMake 3.12 or later you can specify the toolset version via the `--msvc_toolset` build script parameter. e.g. 
`.\build.bat --msvc_toolset 14.11` + +* If you have multiple versions of CUDA installed on a Windows machine and are building with Visual Studio, CMake will use the build files for the highest version of CUDA it finds in the BuildCustomization folder. e.g. C:\Program Files (x86)\Microsoft Visual Studio\2017\Enterprise\Common7\IDE\VC\VCTargets\BuildCustomizations\. -If you want to build with an earlier version, you must temporarily remove the 'CUDA x.y.*' files for later versions from this directory._ +If you want to build with an earlier version, you must temporarily remove the 'CUDA x.y.*' files for later versions from this directory. + --- ### TensorRT -ONNX Runtime supports the TensorRT execution provider (released as preview). You will need to download and install [CUDA](https://developer.nvidia.com/cuda-toolkit), [cuDNN](https://developer.nvidia.com/cudnn) and [TensorRT](https://developer.nvidia.com/nvidia-tensorrt-download). -The TensorRT execution provider for ONNX Runtime is built and tested with CUDA 10.1, cuDNN 7.6 and TensorRT 6.0.1.5. +See more information on the TensorRT Execution Provider [here](./docs/execution_providers/TensorRT-ExecutionProvider.md). - - The path to the CUDA installation must be provided via the CUDA_PATH environment variable, or the `--cuda_home parameter`. The CUDA path should contain `bin`, `include` and `lib` directories. - - The path to the CUDA `bin` directory must be added to the PATH environment variable so that `nvcc` is found. - - The path to the cuDNN installation (path to folder that contains libcudnn.so) must be provided via the cuDNN_PATH environment variable, or `--cudnn_home parameter`. -- The path to TensorRT installation must be provided via the `--tensorrt_home parameter`. +#### Pre-Requisites +* Install [CUDA](https://developer.nvidia.com/cuda-toolkit) and [cuDNN](https://developer.nvidia.com/cudnn) + * The TensorRT execution provider for ONNX Runtime is built and tested with CUDA 10.1 and cuDNN 7.6. 
+ * The path to the CUDA installation must be provided via the CUDA_PATH environment variable, or the `--cuda_home parameter`. The CUDA path should contain `bin`, `include` and `lib` directories. + * The path to the CUDA `bin` directory must be added to the PATH environment variable so that `nvcc` is found. + * The path to the cuDNN installation (path to folder that contains libcudnn.so) must be provided via the cuDNN_PATH environment variable, or `--cudnn_home parameter`. + * Install [TensorRT](https://developer.nvidia.com/nvidia-tensorrt-download) + * The TensorRT execution provider for ONNX Runtime is built and tested with TensorRT 6.0.1.5. + * The path to TensorRT installation must be provided via the `--tensorrt_home parameter`. -You can build from source on Linux by using the following `cmd` from the onnxruntime directory: +#### Build Instructions +##### Linux ``` -./build.sh --cudnn_home --cuda_home --use_tensorrt --tensorrt_home (Linux) +./build.sh --cudnn_home --cuda_home --use_tensorrt --tensorrt_home ``` +Dockerfile instructions are available [here](./dockerfiles#tensorrt) + --- ### MKLDNN and MKLML -To build ONNX Runtime with MKL-DNN support, build it with `./build.sh --use_mkldnn` -To build ONNX Runtime using MKL-DNN built with dependency on MKL small libraries, build it with `./build.sh --use_mkldnn --use_mklml` +See more information on MKL-DNN and MKL-ML [here](./docs/execution_providers/MKL-DNN-ExecutionProvider.md). ---- +#### Build Instructions +##### Linux +MKL-DNN: `./build.sh --use_mkldnn` -### nGraph -ONNX runtime with nGraph as an execution provider (released as preview) can be built on Linux as follows : `./build.sh --use_ngraph`. Similarly, on Windows use `.\build.bat --use_ngraph` +MKL-DNN built with dependency on MKL small libraries: `./build.sh --use_mkldnn --use_mklml` --- -### OpenVINO - -ONNX Runtime supports OpenVINO Execution Provider to enable deep learning inference using Intel® OpenVINOTM Toolkit. 
This execution provider supports several Intel hardware device types - CPU, integrated GPU, Intel® MovidiusTM VPUs and Intel® Vision accelerator Design with 8 Intel MovidiusTM MyriadX VPUs. - -The OpenVINO Execution Provider can be built using the following commands: - -- Currently supports and validated on two versions of OpenVINO: OpenVINO 2018 R5.0.1 and OpenVINO 2019 R1.1(Recommended). Install the OpenVINO release along with its dependencies from ([https://software.intel.com/en-us/openvino-toolkit](https://software.intel.com/en-us/openvino-toolkit)).For windows, please download 2019 R1.1 windows installer - -- Install the model optimizer prerequisites for ONNX by running - For Linux: - - /deployment_tools/model_optimizer/install_prerequisites/install_prerequisites_onnx.sh - - For Windows: - - /deployment_tools/model_optimizer/install_prerequisites/install_prerequisites_onnx.bat - -- Initialize the OpenVINO environment by running the setupvars in \\/bin using the below command: - - source setupvars.sh (Linux) - - setupvars.bat (Windows) +### nGraph +See more information on the nGraph Execution Provider [here](./docs/execution_providers/nGraph-ExecutionProvider.md). 
-- To configure Intel® Processor Graphics(GPU), please follow the installation steps from (https://docs.openvinotoolkit.org/2019_R1.1/_docs_install_guides_installing_openvino_linux.html#additional-GPU-steps (Linux))
- (https://docs.openvinotoolkit.org/2019_R1.1/_docs_install_guides_installing_openvino_windows.html#Install-GPU (Windows))
+#### Build Instructions
+##### Windows
+```
+.\build.bat --use_ngraph
+```
-- To configure Intel® MovidiusTM USB, please follow the getting started guide from (https://docs.openvinotoolkit.org/2019_R1.1/_docs_install_guides_installing_openvino_linux.html#additional-NCS-steps (Linux))
-(https://docs.openvinotoolkit.org/2019_R1.1/_docs_install_guides_installing_openvino_windows.html#usb-myriad (Windows))
+##### Linux
+```
+./build.sh --use_ngraph
+```
-- To configure Intel® Vision Accelerator Design based on 8 MovidiusTM MyriadX VPUs, please follow the configuration guide from (https://docs.openvinotoolkit.org/2019_R1.1/_docs_install_guides_installing_openvino_linux.html#install-VPU (Linux))
-(https://docs.openvinotoolkit.org/2019_R1.1/_docs_install_guides_installing_openvino_windows.html#hddl-myriad (Windows))
+---
-- To configure Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA, please follow the configuration guide from (https://docs.openvinotoolkit.org/2019_R1.1/_docs_install_guides_VisionAcceleratorFPGA_Configure_2019R1.html)
+### OpenVINO
+See more information on the OpenVINO Execution Provider [here](./docs/execution_providers/OpenVINO-ExecutionProvider.md).
+
+#### Pre-Requisites
+* Install the OpenVINO release along with its dependencies: [Windows](https://software.intel.com/en-us/openvino-toolkit), [Linux](https://software.intel.com/en-us/openvino-toolkit).
+  * For Linux, OpenVINO 2018 R5.0.1 and OpenVINO 2019 R1.1 (Recommended) are currently supported and validated.
+  * For Windows, download the 2019 R1.1 Windows Installer. 
+* Install the model optimizer prerequisites for ONNX by running: + * Windows: `/deployment_tools/model_optimizer/install_prerequisites/install_prerequisites_onnx.bat` + * Linux: `/deployment_tools/model_optimizer/install_prerequisites/install_prerequisites_onnx.sh` +* Initialize the OpenVINO environment by running the setupvars in `\\/bin` using `setupvars.bat` (Windows) or `source setupvars.sh` (Linux) + * To configure Intel® Processor Graphics(GPU) please follow these instructions: [Windows](https://docs.openvinotoolkit.org/2019_R1.1/_docs_install_guides_installing_openvino_windows.html#Install-GPU), [Linux](https://docs.openvinotoolkit.org/2019_R1.1/_docs_install_guides_installing_openvino_linux.html#additional-GPU-steps) + * To configure Intel® MovidiusTM USB, please follow this getting started guide: [Windows](https://docs.openvinotoolkit.org/2019_R1.1/_docs_install_guides_installing_openvino_windows.html#usb-myriad), [Linux](https://docs.openvinotoolkit.org/2019_R1.1/_docs_install_guides_installing_openvino_linux.html#additional-NCS-steps) + * To configure Intel® Vision Accelerator Design based on 8 MovidiusTM MyriadX VPUs, please follow this configuration guide: [Windows](https://docs.openvinotoolkit.org/2019_R1.1/_docs_install_guides_installing_openvino_windows.html#hddl-myriad), [Linux](https://docs.openvinotoolkit.org/2019_R1.1/_docs_install_guides_installing_openvino_linux.html#install-VPU) + + +#### Build Instructions +##### Windows +``` +.\build.bat --config RelWithDebInfo --use_openvino +``` +*Note: The default Windows CMake Generator is Visual Studio 2017, but you can also use the newer Visual Studio 2019 by passing `--cmake_generator "Visual Studio 16 2019"` to `.\build.bat`* +##### Linux +``` +./build.sh --config RelWithDebInfo --use_openvino +``` -- Build ONNX Runtime using the below command. 
For Linux: @@ -258,13 +264,14 @@ The OpenVINO Execution Provider can be built using the following commands: For Windows: - build.bat --config RelWithDebInfo --use_openvino + .\build.bat --config RelWithDebInfo --use_openvino - *Note: The default Windows CMake Generator is Visual Studio 2017, but you can also use the newer Visual Studio 2019 by passing `--cmake_generator "Visual Studio 16 2019"` to build.bat.* + *Note: The default Windows CMake Generator is Visual Studio 2017, but you can also use the newer Visual Studio 2019 by passing `--cmake_generator "Visual Studio 16 2019"` to `.\build.bat`* --use_openvino: Builds the OpenVINO Execution Provider in ONNX Runtime. - : Specifies the hardware target for building OpenVINO Execution Provider. Below are the options for different Intel target devices. + +* ``: Specifies the hardware target for building OpenVINO Execution Provider. Below are the options for different Intel target devices. | Hardware Option | Target Device | | --------------- | ------------------------| @@ -275,7 +282,7 @@ The OpenVINO Execution Provider can be built using the following commands: | VAD-M_FP16 | Intel® Vision Accelerator Design based on 8 MovidiusTM MyriadX VPUs | | VAD-F_FP32 | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA | -For more information on OpenVINO Execution Provider's ONNX Layer support, Topology support, and Intel hardware enabled, please refer to the document OpenVINO-ExecutionProvider.md in $onnxruntime_root/docs/execution_providers +For more information on OpenVINO Execution Provider's ONNX Layer support, Topology support, and Intel hardware enabled, please refer to the document [OpenVINO-ExecutionProvider.md](./docs/execution_providers/OpenVINO-ExecutionProvider.md) in $onnxruntime_root/docs/execution_providers --- @@ -285,34 +292,33 @@ For more information on OpenVINO Execution Provider's ONNX Layer support, To 1. Get Android NDK from https://developer.android.com/ndk/downloads. 
Please unzip it after downloading. -2. Get a pre-compiled protoc: - - You may get it from https://github.com/protocolbuffers/protobuf/releases/download/v3.6.1/protoc-3.6.1-linux-x86_64.zip. Please unzip it after downloading. +2. Get a pre-compiled protoc from [here](https://github.com/protocolbuffers/protobuf/releases/download/v3.6.1/protoc-3.6.1-linux-x86_64.zip). Please unzip it after downloading. 3. Denote the unzip destination in step 1 as $ANDROID_NDK, append `-DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DONNX_CUSTOM_PROTOC_EXECUTABLE=path/to/protoc` to your cmake args, run cmake and make to build it. -Note: For 32-bit devices, replace `-DANDROID_ABI=arm64-v8a` to `-DANDROID_ABI=armeabi-v7a`. +#### Notes +* For 32-bit devices, replace `-DANDROID_ABI=arm64-v8a` with `-DANDROID_ABI=armeabi-v7a`. --- -### Nuphar -ONNX Runtime supports Nuphar execution provider (released as preview). It is an execution provider built on top of [TVM](https://github.com/dmlc/tvm) and [LLVM](https://llvm.org). Currently it targets to X64 CPU. - -The Nuphar execution provider for ONNX Runtime is built and tested with LLVM 9.0.0. Because of TVM's requirement when building with LLVM, you need to build LLVM from source: +### NUPHAR +See more information on the Nuphar Execution Provider [here](./docs/execution_providers/Nuphar-ExecutionProvider.md). -Windows with Visual Studio 2017: (Note here builds release flavor. Debug build of LLVM would be needed to build with Debug flavor of ONNX Runtime) -``` -REM download llvm source code 9.0.0 and unzip to \llvm\source\path, then install to \llvm\install\path -cd \llvm\source\path -mkdir build -cd build -cmake .. 
-G "Visual Studio 15 2017 Win64" -DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_ENABLE_DIA_SDK=OFF -msbuild llvm.sln /maxcpucount /p:Configuration=Release /p:Platform=x64 -cmake -DCMAKE_INSTALL_PREFIX=\llvm\install\path -DBUILD_TYPE=Release -P cmake_install.cmake -``` - -Note that following LLVM cmake patch is necessary to make the build work on Windows, Linux does not need to apply the patch. -The patch is to fix the linking warning LNK4199 caused by [LLVM commit](https://github.com/llvm-mirror/llvm/commit/148f823e4845c9a13faea62e3105abb80b39e4bc) +#### Pre-Requisites +* The Nuphar execution provider for ONNX Runtime is built and tested with LLVM 9.0.0. Because of TVM's requirement when building with LLVM, you need to build LLVM from source. To build the debug flavor of ONNX Runtime, you need the debug build of LLVM. + * Windows (Visual Studio 2017): + ``` + REM download llvm source code 9.0.0 and unzip to \llvm\source\path, then install to \llvm\install\path + cd \llvm\source\path + mkdir build + cd build + cmake .. -G "Visual Studio 15 2017 Win64" -DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_ENABLE_DIA_SDK=OFF + msbuild llvm.sln /maxcpucount /p:Configuration=Release /p:Platform=x64 + cmake -DCMAKE_INSTALL_PREFIX=\llvm\install\path -DBUILD_TYPE=Release -P cmake_install.cmake + ``` + +*Note that following LLVM cmake patch is necessary to make the build work on Windows, Linux does not need to apply the patch.* +The patch is to fix the linking warning LNK4199 caused by this [LLVM commit](https://github.com/llvm-mirror/llvm/commit/148f823e4845c9a13faea62e3105abb80b39e4bc) ``` diff --git "a/lib\\Support\\CMakeLists.txt" "b/lib\\Support\\CMakeLists.txt" @@ -342,66 +348,88 @@ index 7dfa97c..6d99e71 100644 set_property(TARGET LLVMSupport PROPERTY LLVM_SYSTEM_LIBS "${system_libs}") ``` + * Linux + Download llvm source code 9.0.0 and unzip to /llvm/source/path, then install to /llvm/install/path + ``` + cd /llvm/source/path + mkdir build + cd build + cmake .. 
-DLLVM_TARGETS_TO_BUILD=X86 -DCMAKE_BUILD_TYPE=Release + make -j$(nproc) + cmake -DCMAKE_INSTALL_PREFIX=/llvm/install/path -DBUILD_TYPE=Release -P cmake_install.cmake + ``` -Linux: +#### Build Instructions +##### Windows ``` -# download llvm source code 9.0.0 and unzip to /llvm/source/path, then install to /llvm/install/path -cd /llvm/source/path -mkdir build -cd build -cmake .. -DLLVM_TARGETS_TO_BUILD=X86 -DCMAKE_BUILD_TYPE=Release -cmake --build. -cmake -DCMAKE_INSTALL_PREFIX=/llvm/install/path -DBUILD_TYPE=Release -P cmake_install.cmake +.\build.bat --use_tvm --use_llvm --llvm_path=\llvm\install\path\lib\cmake\llvm --use_mklml --use_nuphar --build_shared_lib --build_csharp --enable_pybind --config=Release ``` -Then you can build from source by using following command from the onnxruntime directory: -Windows: +* These instructions build the release flavor. The Debug build of LLVM would be needed to build with the Debug flavor of ONNX Runtime. + +##### Linux: ``` -build.bat --use_tvm --use_llvm --llvm_path=\llvm\install\path\lib\cmake\llvm --use_mklml --use_nuphar --build_shared_lib --build_csharp --enable_pybind --config=Release +./build.sh --use_tvm --use_llvm --llvm_path=/llvm/install/path/lib/cmake/llvm --use_mklml --use_nuphar --build_shared_lib --build_csharp --enable_pybind --config=Release ``` -Linux: +Dockerfile instructions are available [here](https://github.com/microsoft/onnxruntime/tree/master/dockerfiles#nuphar-public-preview) + +### DirectML +See more information on the DirectML execution provider [here](./docs/execution_providers/DirectML-ExecutionProvider.md). +#### Windows ``` -./build.sh --use_tvm --use_llvm --llvm_path=/llvm/install/path/lib/cmake/llvm --use_mklml --use_nuphar --build_shared_lib --build_csharp --enable_pybind --config=Release +.\build.bat --use_dml ``` +#### Notes +The DirectML execution provider supports building for both x64 and x86 architectures. DirectML is only supported on Windows. 
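To recap the provider-specific flags from the sections above, a small shell sketch; the `provider` variable and the `case` mapping are illustrative, and the flags shown are only the minimal ones documented here (some providers, e.g. Nuphar, require additional flags as listed in their own sections):

```shell
# Sketch: map an execution-provider name to its documented build flag.
provider=${1:-dml}   # e.g. cuda, tensorrt, mkldnn, ngraph, openvino, nuphar, dml

case "$provider" in
  cuda)     flag="--use_cuda" ;;
  tensorrt) flag="--use_tensorrt" ;;
  mkldnn)   flag="--use_mkldnn" ;;
  ngraph)   flag="--use_ngraph" ;;
  openvino) flag="--use_openvino" ;;
  nuphar)   flag="--use_nuphar" ;;
  dml)      flag="--use_dml" ;;
  *) echo "unknown provider: $provider" >&2; exit 1 ;;
esac

echo "$flag"
```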
+
---
## Options
### OpenMP
+#### Build Instructions
+##### Windows
```
-./build.sh --use_openmp (for Linux)
-./build.bat --use_openmp (for Windows)
+.\build.bat --use_openmp
```
----
-
-### OpenBLAS
-**Windows**
-Instructions how to build OpenBLAS for windows can be found here https://github.com/xianyi/OpenBLAS/wiki/How-to-use-OpenBLAS-in-Microsoft-Visual-Studio#build-openblas-for-universal-windows-platform.
-
-Once you have the OpenBLAS binaries, build ONNX Runtime with `./build.bat --use_openblas`
+##### Linux
+```
+./build.sh --use_openmp
+```
-**Linux**
-For Linux (e.g. Ubuntu 16.04), install libopenblas-dev package
-`sudo apt-get install libopenblas-dev` and build with `./build.sh --use_openblas`

---
-### DirectML
-
-To build onnxruntime with the [DirectML execution provider](./docs/execution_providers/DirectML-ExecutionProvider.md) included, supply the `--use_dml` parameter to build.bat. e.g.
+### OpenBLAS
+#### Pre-Requisites
+* OpenBLAS
+  * Windows: See build instructions [here](https://github.com/xianyi/OpenBLAS/wiki/How-to-use-OpenBLAS-in-Microsoft-Visual-Studio#build-openblas-for-universal-windows-platform)
+  * Linux: Install the libopenblas-dev package `sudo apt-get install libopenblas-dev`
-    build.bat --use_dml
+#### Build Instructions
+##### Windows
+```
+.\build.bat --use_openblas
+```
-The DirectML execution provider supports building for both x64 and x86 architectures. DirectML is only supported on Windows.
+##### Linux
+```
+./build.sh --use_openblas
+```
----
+---

## Architectures
### x86
-  - For Windows, just add --x86 argument when launching build.bat
-  - For Linux, it must be built out of a x86 os, --x86 argument also needs be specified to build.sh
+#### Build Instructions
+##### Windows
+* Add the `--x86` argument when launching `.\build.bat`
+
+##### Linux
+* Must be built on an x86 OS
+* Add the `--x86` argument to `./build.sh`

---

@@ -411,53 +439,7 @@ We have experimental support for Linux ARM builds. 
Windows on ARM is well tested #### Cross compiling for ARM with Docker (Linux/Windows - FASTER, RECOMMENDED) This method allows you to compile using a desktop or cloud VM. This is much faster than compiling natively and avoids out-of-memory issues that may be encountered when on lower-powered ARM devices. The resulting ONNX Runtime Python wheel (.whl) file is then deployed to an ARM device where it can be invoked in Python 3 scripts. -The Dockerfile used in these instructions specifically targets Raspberry Pi 3/3+ running Raspbian Stretch. The same approach should work for other ARM devices, but may require some changes to the Dockerfile such as choosing a different base image (Line 0: `FROM ...`). - -1. Install DockerCE on your development machine by following the instructions [here](https://docs.docker.com/install/) -2. Create an empty local directory - ```bash - mkdir onnx-build - cd onnx-build - ``` -3. Save the Dockerfile to your new directory - - [Dockerfile.arm32v7](https://github.com/Microsoft/onnxruntime/blob/master/dockerfiles/Dockerfile.arm32v7) -4. Run docker build - - This will build all the dependencies first, then build ONNX Runtime and its Python bindings. This will take several hours. - ```bash - docker build -t onnxruntime-arm32v7 -f Dockerfile.arm32v7 . - ``` -5. Note the full path of the `.whl` file - - - Reported at the end of the build, after the `# Build Output` line. - - It should follow the format `onnxruntime-0.3.0-cp35-cp35m-linux_armv7l.whl`, but version number may have changed. You'll use this path to extract the wheel file later. -6. Check that the build succeeded - - Upon completion, you should see an image tagged `onnxruntime-arm32v7` in your list of docker images: - ```bash - docker images - ``` -7. 
Extract the Python wheel file from the docker image
-
- (Update the path/version of the `.whl` file with the one noted in step 5)
- ```bash
- docker create -ti --name onnxruntime_temp onnxruntime-arm32v7 bash
- docker cp onnxruntime_temp:/code/onnxruntime/build/Linux/MinSizeRel/dist/onnxruntime-0.3.0-cp35-cp35m-linux_armv7l.whl .
- docker rm -fv onnxruntime_temp
- ```
- This will save a copy of the wheel file, `onnxruntime-0.3.0-cp35-cp35m-linux_armv7l.whl`, to your working directory on your host machine.
-8. Copy the wheel file (`onnxruntime-0.3.0-cp35-cp35m-linux_armv7l.whl`) to your Raspberry Pi or other ARM device
-9. On device, install the ONNX Runtime wheel file
- ```bash
- sudo apt-get update
- sudo apt-get install -y python3 python3-pip
- pip3 install numpy
-
- # Install ONNX Runtime
- # Important: Update path/version to match the name and location of your .whl file
- pip3 install onnxruntime-0.3.0-cp35-cp35m-linux_armv7l.whl
- ```
-10. Test installation by following the instructions [here](https://microsoft.github.io/onnxruntime/)
+See the instructions for the Dockerfile [here](./dockerfiles/README.md#arm-32v7).

#### Cross compiling on Linux (without Docker)
1. Get the corresponding toolchain. For example, if your device is Raspberry Pi and the device os is Ubuntu 16.04, you may use gcc-linaro-6.3.1 from [https://releases.linaro.org/components/toolchain/binaries](https://releases.linaro.org/components/toolchain/binaries)
@@ -541,5 +523,5 @@ ls -l /code/onnxruntime/build/Linux/MinSizeRel/dist/*.whl
1. Download and install Visual C++ compilers and libraries for ARM(64). If you have Visual Studio installed, please use the Visual Studio Installer (look under the section `Individual components` after choosing to `modify` Visual Studio) to download and install the corresponding ARM(64) compilers and libraries.
-2. Use `build.bat` and specify `--arm` or `--arm64` as the build option to start building.
Preferably use `Developer Command Prompt for VS` or make sure all the installed cross-compilers are findable from the command prompt being used to build using the PATH environmant variable.
+2. Use `.\build.bat` and specify `--arm` or `--arm64` as the build option to start building. Preferably use `Developer Command Prompt for VS`, or make sure all the installed cross-compilers are findable from the command prompt being used to build via the PATH environment variable.

diff --git a/README.md b/README.md
index bf56476a1142c..f70e4efdf912d 100644
--- a/README.md
+++ b/README.md
@@ -6,25 +6,26 @@
[![Build Status](https://dev.azure.com/onnxruntime/onnxruntime/_apis/build/status/Linux%20GPU%20CI%20Pipeline?label=Linux+GPU)](https://dev.azure.com/onnxruntime/onnxruntime/_build/latest?definitionId=12)
[![Build Status](https://dev.azure.com/onnxruntime/onnxruntime/_apis/build/status/MacOS%20CI%20Pipeline?label=MacOS+CPU)](https://dev.azure.com/onnxruntime/onnxruntime/_build/latest?definitionId=13)

-**ONNX Runtime** is a performance-focused complete scoring engine for Open Neural Network Exchange (ONNX) models, with an open extensible architecture to continually address the latest developments in AI and Deep Learning. ONNX Runtime stays up to date with the ONNX standard with complete implementation of **all** ONNX operators, and supports all ONNX releases (1.2+) with both future and backwards compatibility. Please refer to [this page](docs/Versioning.md) for ONNX opset compatibility details.
+**ONNX Runtime** is a performance-focused complete scoring engine for Open Neural Network Exchange (ONNX) models, with an open extensible architecture to continually address the latest developments in AI and Deep Learning. ONNX Runtime stays up to date with the ONNX standard and supports all operators from the ONNX v1.2+ spec with both forwards and backwards compatibility. Please refer to [this page](docs/Versioning.md) for ONNX opset compatibility details.
[ONNX](https://onnx.ai) is an interoperable format for machine learning models supported by various ML and DNN frameworks and tools. The universal format makes it easier to interoperate between frameworks and maximize the reach of hardware optimization investments. *** + **[Key Features](#key-features)** +**[Samples and Tutorials](./samples)** + **Setup** * [Installation](#installation) -* [APIs and Official Binaries](#apis-and-official-builds) -* [Building from Source](#building-from-source) + * [APIs and Official Binaries](#apis-and-official-builds) + * [Building from Source](#building-from-source) **Usage** * [Getting ONNX Models](#getting-onnx-models) * [Deploying ONNX Runtime](#deploying-onnx-runtime) * [Performance Tuning](#performance-tuning) -**[Examples and Tutorials](#examples-and-tutorials)** - **More Info** * [Technical Design Details](#technical-design-details) * [Extensibility Options](#extensibility-options) @@ -51,30 +52,30 @@ ONNX Runtime supports both CPU and GPU. Using various graph optimizations and ac Currently ONNX Runtime supports the following accelerators: * MLAS (Microsoft Linear Algebra Subprograms) -* [DirectML](./docs/execution_providers/DirectML-ExecutionProvider.md) -* [MKL-DNN](./docs/execution_providers/MKL-DNN-ExecutionProvider.md) - [subgraph optimization](./docs/execution_providers/MKL-DNN-Subgraphs.md) -* MKL-ML + +* NVIDIA CUDA +* Intel MKL-ML +* [Intel MKL-DNN](./docs/execution_providers/MKL-DNN-ExecutionProvider.md) - [subgraph optimization](./docs/execution_providers/MKL-DNN-Subgraphs.md) * [Intel nGraph](./docs/execution_providers/nGraph-ExecutionProvider.md) -* CUDA -* [TensorRT](./docs/execution_providers/TensorRT-ExecutionProvider.md) -* [OpenVINO](./docs/execution_providers/OpenVINO-ExecutionProvider.md) -* [Nuphar](./docs/execution_providers/Nuphar-ExecutionProvider.md) +* [NVIDIA TensorRT](./docs/execution_providers/TensorRT-ExecutionProvider.md) +* [Intel 
OpenVINO](./docs/execution_providers/OpenVINO-ExecutionProvider.md) +* [Nuphar Model Compiler](./docs/execution_providers/Nuphar-ExecutionProvider.md) +* [DirectML](./docs/execution_providers/DirectML-ExecutionProvider.md) -Not all variations are supported in the [official release builds](#apis-and-official-builds), but can be built from source following [these instructions](./BUILD.md). Find Dockerfiles [here](./dockerfiles). +Not all variations are supported in the [official release builds](#apis-and-official-builds), but can be built from source following [these instructions](./BUILD.md). We are continuously working to integrate new execution providers for further improvements in latency and efficiency. If you are interested in contributing a new execution provider, please see [this page](docs/AddingExecutionProvider.md). ## Cross Platform -[API documentation and package installation](#installation) +ONNX Runtime is currently available for Linux, Windows, and Mac with Python, C#, C++, and C APIs. Please see [API documentation and package installation](#installation). -ONNX Runtime is currently available for Linux, Windows, and Mac with Python, C#, C++, and C APIs. If you have specific scenarios that are not supported, please share your suggestions and scenario details via [Github Issues](https://github.com/microsoft/onnxruntime/issues). *** # Installation **Quick Start:** The [ONNX-Ecosystem Docker container image](https://github.com/onnx/onnx-docker/tree/master/onnx-ecosystem) is available on Dockerhub and includes ONNX Runtime (CPU, Python), dependencies, tools to convert from various frameworks, and Jupyter notebooks to help get started. -Additional dockerfiles for some features can be found [here](./dockerfiles). +Additional dockerfiles can be found [here](./dockerfiles). ## APIs and Official Builds @@ -83,6 +84,7 @@ Additional dockerfiles for some features can be found [here](./dockerfiles). 
* [C](docs/C_API.md) * [C#](docs/CSharp_API.md) * [C++](./include/onnxruntime/core/session/onnxruntime_cxx_api.h) +* [Ruby](https://github.com/ankane/onnxruntime) (external project) ### Official Builds | | CPU (MLAS+Eigen) | CPU (MKL-ML) | GPU (CUDA) @@ -110,9 +112,7 @@ system. * Follow similar procedure to configure other locales on other platforms. ## Building from Source -If additional build flavors are needed, please find instructions on building from source at [Build ONNX Runtime](BUILD.md). For production scenarios, it's strongly recommended to build from an [official release branch](https://github.com/microsoft/onnxruntime/releases). - -Dockerfiles are available [here](./tools/ci_build/github/linux/docker) to help you get started. +If additional build flavors and/or dockerfiles are needed, please find instructions at [Build ONNX Runtime](BUILD.md). For production scenarios, it's strongly recommended to build only from an [official release branch](https://github.com/microsoft/onnxruntime/releases). *** # Usage @@ -126,58 +126,28 @@ Dockerfiles are available [here](./tools/ci_build/github/linux/docker) to help y * [E2E training on Azure Machine Learning Services](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-onnx) ## Deploying ONNX Runtime +### Cloud ONNX Runtime can be deployed to the cloud for model inferencing using [Azure Machine Learning Services](https://azure.microsoft.com/en-us/services/machine-learning-service). See [detailed instructions](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-build-deploy-onnx) and [sample notebooks](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/deployment/onnx). **ONNX Runtime Server (beta)** is a hosted application for serving ONNX models using ONNX Runtime, providing a REST API for prediction. 
Usage details can be found [here](./docs/ONNX_Runtime_Server_Usage.md), and image installation instructions are [here](./dockerfiles#onnx-runtime-server-preview).

+### IoT and edge devices
+The expanding selection of IoT devices with sensors and consistent signal streams introduces new opportunities to move AI workloads to the edge.
+
+This is particularly important when there are massive volumes of incoming data/signals that may not be efficient or useful to push to the cloud due to storage or latency considerations. Consider: surveillance tapes where 99% of footage is uneventful, or real-time person detection scenarios where immediate action is required. In these scenarios, executing model inferencing directly on the target device is crucial.
+
+To deploy AI workloads to these edge devices and take advantage of hardware acceleration capabilities on the target device, see [these reference implementations](https://github.com/Azure-Samples/onnxruntime-iot-edge).
+
+### Local applications
+ONNX Runtime packages are published to PyPI and NuGet (see [Official Builds](#official-builds)) and/or can be [built from source](./BUILD.md) for local application development. Find samples [here](https://github.com/microsoft/onnxruntime/tree/master/samples/c_cxx) using the C++ API.
+
+On newer Windows 10 devices (1809+), ONNX Runtime is available by default as part of the OS and is accessible via the [Windows Machine Learning APIs](https://docs.microsoft.com/en-us/windows/ai/windows-ml/). Find tutorials [here](https://docs.microsoft.com/en-us/windows/ai/windows-ml/get-started-desktop) for building a Windows Desktop or UWP application using WinML.
+
## Performance Tuning
ONNX Runtime is open and extensible, supporting a broad set of configurations and execution providers for model acceleration. For performance tuning guidance, please see [this page](./docs/ONNX_Runtime_Perf_Tuning.md).
To tune performance for ONNX models, the [ONNX Go Live tool "OLive"](https://github.com/microsoft/OLive) provides an easy-to-use pipeline for converting models to ONNX and optimizing performance for inferencing with ONNX Runtime. -*** -# Examples and Tutorials -## Python -**Inference only** -* [Model Inferencing (single node Sigmoid)](https://github.com/onnx/onnx-docker/blob/master/onnx-ecosystem/inference_demos/simple_onnxruntime_inference.ipynb) -* [Model Inferencing (Resnet50)](https://github.com/onnx/onnx-docker/blob/master/onnx-ecosystem/inference_demos/resnet50_modelzoo_onnxruntime_inference.ipynb) -* [Model Inferencing](https://github.com/onnx/onnx-docker/tree/master/onnx-ecosystem/inference_demos) using [ONNX-Ecosystem Docker image](https://github.com/onnx/onnx-docker/tree/master/onnx-ecosystem) -* [Model Inferencing using ONNX Runtime Server (SSD Single Shot MultiBox Detector)](https://github.com/onnx/tutorials/blob/master/tutorials/OnnxRuntimeServerSSDModel.ipynb) - -**Inference with model conversion** -* [SKL Pipeline: Train, Convert, and Inference](https://microsoft.github.io/onnxruntime/auto_examples/plot_train_convert_predict.html#sphx-glr-auto-examples-plot-train-convert-predict-py) -* [Keras: Convert and Inference](https://microsoft.github.io/onnxruntime/auto_examples/plot_dl_keras.html#sphx-glr-auto-examples-plot-dl-keras-py) - -**Inference and deploy through AzureML** -* Inferencing using [ONNX Model Zoo](https://github.com/onnx/models) models: - * [Facial Expression Recognition](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/deployment/onnx/onnx-inference-facial-expression-recognition-deploy.ipynb) - * [MNIST Handwritten Digits](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/deployment/onnx/onnx-inference-mnist-deploy.ipynb) - * [Resnet50 Image 
Classification](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/deployment/onnx/onnx-modelzoo-aml-deploy-resnet50.ipynb) -* Convert existing model for Inferencing: - * [TinyYolo](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/deployment/onnx/onnx-convert-aml-deploy-tinyyolo.ipynb) -* Train a model with PyTorch and Inferencing: - * [MNIST](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/deployment/onnx/onnx-train-pytorch-aml-deploy-mnist.ipynb) - -* GPU: Inferencing with TensorRT Execution Provider (AKS) - * [FER+](./docs/python/notebooks/onnx-inference-byoc-gpu-cpu-aks.ipynb) - -**Inference and Deploy wtih Azure IoT Edge** - * [Intel OpenVINO](http://aka.ms/onnxruntime-openvino) - * [NVIDIA TensorRT on Jetson Nano (ARM64)](http://aka.ms/onnxruntime-arm64) - -**Other** -* [Running ONNX model tests](./docs/Model_Test.md) - - -## C# -* [Inferencing Tutorial](./docs/CSharp_API.md#getting-started) - - -## C/C++ -* [C - Inferencing (SqueezeNet)](./csharp/test/Microsoft.ML.OnnxRuntime.EndToEndTests.Capi/C_Api_Sample.cpp) -* [C++ - Inferencing (SqueezeNet)](./csharp/test/Microsoft.ML.OnnxRuntime.EndToEndTests.Capi/CXX_Api_Sample.cpp) -* [C++ - Inferencing (MNIST)](./samples/c_cxx/MNIST) - *** # Technical Design Details * [High level architectural design](docs/HighLevelDesign.md) diff --git a/dockerfiles/README.md b/dockerfiles/README.md index 45aad1bd1d5e7..70bb31ddd336f 100644 --- a/dockerfiles/README.md +++ b/dockerfiles/README.md @@ -1,19 +1,21 @@ -# Docker containers for ONNX Runtime +# Docker Containers for ONNX Runtime -#### Build Flavors: Dockerfiles +**Dockerfiles** -- [Arm 32v7](Dockerfile.arm32v7) -- [Build from source (CPU)](Dockerfile.source) -- [CUDA + CUDNN](Dockerfile.cuda) -- [nGraph](Dockerfile.ngraph) -- [TensorRT](Dockerfile.tensorrt) -- [OpenVINO](Dockerfile.openvino) -- [ONNX Runtime Server](Dockerfile.server) -- [Nuphar](Dockerfile.nuphar) -#### 
Published Microsoft Container Registry (MCR) Images
+- CPU: [Dockerfile](Dockerfile.source), [Instructions](#cpu)
+- CUDA + CUDNN: [Dockerfile](Dockerfile.cuda), [Instructions](#cuda)
+- nGraph: [Dockerfile](Dockerfile.ngraph), [Instructions](#ngraph)
+- TensorRT: [Dockerfile](Dockerfile.tensorrt), [Instructions](#tensorrt)
+- OpenVINO: [Dockerfile](Dockerfile.openvino), [Instructions](#openvino)
+- Nuphar: [Dockerfile](Dockerfile.nuphar), [Instructions](#nuphar)
+- ARM 32v7: [Dockerfile](Dockerfile.arm32v7), [Instructions](#arm-32v7)
+- ONNX-Ecosystem (CPU + Converters): [Dockerfile](https://github.com/onnx/onnx-docker/blob/master/onnx-ecosystem/Dockerfile), [Instructions](https://github.com/onnx/onnx-docker/tree/master/onnx-ecosystem)
+- ONNX Runtime Server: [Dockerfile](Dockerfile.server), [Instructions](#onnx-runtime-server)

-Use `docker pull` with any of the images and tags below to pull an image and try for yourself. Note that the build from source (CPU), CUDA, and TensorRT images include additional dependencies like miniconda for compatibility with AzureML image deployment.
+**Published Microsoft Container Registry (MCR) Images**
+
+Use `docker pull` with any of the images and tags below to pull an image and try for yourself. Note that the CPU, CUDA, and TensorRT images include additional dependencies like miniconda for compatibility with AzureML image deployment.

**Example**: Run `docker pull mcr.microsoft.com/azureml/onnxruntime:latest-cuda` to pull the latest released docker image with ONNX Runtime GPU, CUDA, and CUDNN support.
@@ -26,80 +28,72 @@ Use `docker pull` with any of the images and tags below to pull an image and try | OpenVino (MYRIAD) | mcr.microsoft.com/azureml/onnxruntime | :v0.5.0-openvino-r1.1-myriad | :latest-openvino-myriad | | Server | mcr.microsoft.com/onnxruntime/server | :v0.4.0, :v0.5.0 | :latest | -## Build from Source -#### Linux 16.04, CPU, Python Bindings +--- + +# Building and using Docker images + +## CPU +**Ubuntu 16.04, CPU, Python Bindings** 1. Build the docker image from the Dockerfile in this repository. ``` - # If you have a Linux machine, preface this command with "sudo" - docker build -t onnxruntime-source -f Dockerfile.source . ``` 2. Run the Docker image ``` - # If you have a Linux machine, preface this command with "sudo" - docker run -it onnxruntime-source ``` ## CUDA -#### Linux 16.04, CUDA 10.0, CuDNN 7 +**Ubuntu 16.04, CUDA 10.0, CuDNN 7** 1. Build the docker image from the Dockerfile in this repository. ``` - # If you have a Linux machine, preface this command with "sudo" - docker build -t onnxruntime-cuda -f Dockerfile.cuda . ``` 2. Run the Docker image ``` - # If you have a Linux machine, preface this command with "sudo" - docker run -it onnxruntime-cuda ``` -## nGraph (Public Preview) -#### Linux 16.04, Python Bindings +## nGraph +*Public Preview* + +**Ubuntu 16.04, Python Bindings** 1. Build the docker image from the Dockerfile in this repository. ``` - # If you have a Linux machine, preface this command with "sudo" - docker build -t onnxruntime-ngraph -f Dockerfile.ngraph . ``` 2. Run the Docker image ``` - # If you have a Linux machine, preface this command with "sudo" - docker run -it onnxruntime-ngraph ``` ## TensorRT -#### Linux 16.04, TensorRT 5.0.2 +**Ubuntu 16.04, TensorRT 5.0.2** 1. Build the docker image from the Dockerfile in this repository. ``` - # If you have a Linux machine, preface this command with "sudo" - docker build -t onnxruntime-trt -f Dockerfile.tensorrt . ``` 2. 
Run the Docker image ``` - # If you have a Linux machine, preface this command with "sudo" - docker run -it onnxruntime-trt ``` -## OpenVINO (Public Preview) -#### Linux 16.04, Python Bindings +## OpenVINO +*Public Preview* + +**Ubuntu 16.04, Python Bindings** 1. Build the onnxruntime image for one of the accelerators supported below. @@ -122,7 +116,7 @@ Use `docker pull` with any of the images and tags below to pull an image and try | MYRIAD_FP16 | Intel MovidiusTM USB sticks | | VAD-M_FP16 | Intel Vision Accelerator Design based on MovidiusTM MyriadX VPUs | -## CPU +### OpenVINO on CPU 1. Retrieve your docker image in one of the following ways. @@ -140,7 +134,7 @@ Use `docker pull` with any of the images and tags below to pull an image and try docker run -it onnxruntime-cpu ``` -## GPU +### OpenVINO on GPU 1. Retrieve your docker image in one of the following ways. - Build the docker image from the DockerFile in this repository. @@ -156,7 +150,7 @@ Use `docker pull` with any of the images and tags below to pull an image and try ``` docker run -it --device /dev/dri:/dev/dri onnxruntime-gpu:latest ``` -## Myriad VPU Accelerator +### OpenVINO on Myriad VPU Accelerator 1. Retrieve your docker image in one of the following ways. - Build the docker image from the DockerFile in this repository. @@ -173,8 +167,8 @@ Use `docker pull` with any of the images and tags below to pull an image and try docker run -it --network host --privileged -v /dev:/dev onnxruntime-myriad:latest ``` -======= -## VAD-M Accelerator Version + +### OpenVINO on VAD-M Accelerator Version 1. Retrieve your docker image in one of the following ways. - Build the docker image from the DockerFile in this repository. 
@@ -191,40 +185,92 @@ Use `docker pull` with any of the images and tags below to pull an image and try docker run -it --device --mount type=bind,source=/var/tmp,destination=/var/tmp --device /dev/ion:/dev/ion onnxruntime-hddl:latest ``` -## ONNX Runtime Server (Public Preview) -#### Linux 16.04 +## ARM 32v7 +*Public Preview* -1. Build the docker image from the Dockerfile in this repository - ``` - docker build -t {docker_image_name} -f Dockerfile.server . - ``` +The Dockerfile used in these instructions specifically targets Raspberry Pi 3/3+ running Raspbian Stretch. The same approach should work for other ARM devices, but may require some changes to the Dockerfile such as choosing a different base image (Line 0: `FROM ...`). -2. Run the ONNXRuntime server with the image created in step 1 +1. Install DockerCE on your development machine by following the instructions [here](https://docs.docker.com/install/) +2. Create an empty local directory + ```bash + mkdir onnx-build + cd onnx-build + ``` +3. Save the Dockerfile to your new directory + - [Dockerfile.arm32v7](./Dockerfile.arm32v7) +4. Run docker build + This will build all the dependencies first, then build ONNX Runtime and its Python bindings. This will take several hours. + ```bash + docker build -t onnxruntime-arm32v7 -f Dockerfile.arm32v7 . + ``` +5. Note the full path of the `.whl` file + + - Reported at the end of the build, after the `# Build Output` line. + - It should follow the format `onnxruntime-0.3.0-cp35-cp35m-linux_armv7l.whl`, but version number may have changed. You'll use this path to extract the wheel file later. +6. Check that the build succeeded + + Upon completion, you should see an image tagged `onnxruntime-arm32v7` in your list of docker images: + ```bash + docker images + ``` +7. 
Extract the Python wheel file from the docker image + + (Update the path/version of the `.whl` file with the one noted in step 5) + ```bash + docker create -ti --name onnxruntime_temp onnxruntime-arm32v7 bash + docker cp onnxruntime_temp:/code/onnxruntime/build/Linux/MinSizeRel/dist/onnxruntime-0.3.0-cp35-cp35m-linux_armv7l.whl . + docker rm -fv onnxruntime_temp + ``` + This will save a copy of the wheel file, `onnxruntime-0.3.0-cp35-cp35m-linux_armv7l.whl`, to your working directory on your host machine. +8. Copy the wheel file (`onnxruntime-0.3.0-cp35-cp35m-linux_armv7l.whl`) to your Raspberry Pi or other ARM device +9. On device, install the ONNX Runtime wheel file + ```bash + sudo apt-get update + sudo apt-get install -y python3 python3-pip + pip3 install numpy + + # Install ONNX Runtime + # Important: Update path/version to match the name and location of your .whl file + pip3 install onnxruntime-0.3.0-cp35-cp35m-linux_armv7l.whl + ``` +10. Test installation by following the instructions [here](https://microsoft.github.io/onnxruntime/) + + +## Nuphar +*Public Preview* + +**Ubuntu 16.04, Python Bindings** + +1. Build the docker image from the Dockerfile in this repository. ``` - docker run -v {localModelAbsoluteFolder}:{dockerModelAbsoluteFolder} -p {your_local_port}:8001 {imageName} --model_path {dockerModelAbsolutePath} + docker build -t onnxruntime-nuphar -f Dockerfile.nuphar . ``` -3. Send HTTP requests to the container running ONNX Runtime Server - Send HTTP requests to the docker container through the binding local port. Here is the full [usage document](https://github.com/Microsoft/onnxruntime/blob/master/docs/ONNX_Runtime_Server_Usage.md). +2. Run the Docker image + ``` - curl -X POST -d "@request.json" -H "Content-Type: application/json" http://0.0.0.0:{your_local_port}/v1/models/mymodel/versions/3:predict + docker run -it onnxruntime-nuphar ``` -## Nuphar (Public Preview) -#### Linux 16.04, Python Bindings +## ONNX Runtime Server +*Public Preview* -1. 
Build the docker image from the Dockerfile in this repository.
- ```
- # If you have a Linux machine, preface this command with "sudo"
+**Ubuntu 16.04**

- docker build -t onnxruntime-nuphar -f Dockerfile.nuphar .
+1. Build the docker image from the Dockerfile in this repository
+ ```
+ docker build -t {docker_image_name} -f Dockerfile.server .
 ```
-2. Run the Docker image
+2. Run the ONNXRuntime server with the image created in step 1
 ```
- # If you have a Linux machine, preface this command with "sudo"
+ docker run -v {localModelAbsoluteFolder}:{dockerModelAbsoluteFolder} -p {your_local_port}:8001 {imageName} --model_path {dockerModelAbsolutePath}
+ ```
+3. Send HTTP requests to the container running ONNX Runtime Server

- docker run -it onnxruntime-nuphar
+ Send HTTP requests to the docker container through the binding local port. Here is the full [usage document](https://github.com/Microsoft/onnxruntime/blob/master/docs/ONNX_Runtime_Server_Usage.md).
+ ```
+ curl -X POST -d "@request.json" -H "Content-Type: application/json" http://0.0.0.0:{your_local_port}/v1/models/mymodel/versions/3:predict
 ```
diff --git a/docs/ONNX_Runtime_Perf_Tuning.md b/docs/ONNX_Runtime_Perf_Tuning.md
index 272f8a02aefe4..f0056cd48d763 100644
--- a/docs/ONNX_Runtime_Perf_Tuning.md
+++ b/docs/ONNX_Runtime_Perf_Tuning.md
@@ -1,39 +1,40 @@
-# How to tune ONNX Runtime Performance?
+# ONNX Runtime Performance Tuning

## Why do we need to tune performance?
-ONNX Runtime is designed to be open and scalable, it created the concept of "Execution Provider" to represents different execution kernels.
+ONNX Runtime is designed to be open and extensible, with its concept of an "Execution Provider" to represent different execution kernels. See the [design overview](./HighLevelDesign.md).

-ONNX Runtime right now supports 4 CPU execution providers, which are, default(MLAS), MKL-ML, MKLDNN and nGraph. For nVidia GPU, we support CUDA and TensorRT execution providers.
(Technically, MKL-ML is not an formal execution provider since it can only enabled by using build options and does not support GetCapability interface.) -For different models and different hardware, there is no silver bullet which can always perform the best. And even for a single execution provider, many times you have several knobs to tune, like thread number, wait policy etc. +ONNX Runtime supports a variety of execution providers across CPU and GPU: [see the list here](../README.md#high-performance). +For different models and different hardware, there is no silver bullet which can always perform the best. Even for a single execution provider, often there are several knobs that can be tuned (e.g. thread number, wait policy etc.). -This document will document some basic tools and knobs you could leverage to find the best performace for your model and your hardware. +This document covers basic tools and knobs that can be leveraged to find the best performance for your model and hardware. +## Is there a tool to help with performance tuning? +Yes, the onnxruntime_perf_test.exe tool (available from the build drop) can be used to test various knobs. Please find the usage instructions using `onnxruntime_perf_test.exe -h`. -## How do I use different execution providers? -**Please be kindly noted that this is subject to change. We will try to make it consistent and easy to use across different language bindings in the future.** +Additionally, the [ONNX Go Live "OLive" tool](https://github.com/microsoft/OLive) provides an easy-to-use pipeline for converting models to ONNX and optimizing performance with ONNX Runtime. The tool can help identify the optimal runtime configuration to get the best performance on the target hardware for the model. + +## Using different execution providers ### Python API -For official python package which are released to Pypi, we only support the default CPU (MLAS) and default GPU execution provider. 
If you want to get other execution providers,
-you need to build from source.
+Official Python packages on PyPI only support the default CPU (MLAS) and default GPU (CUDA) execution providers. For other execution providers, you need to build from source. Please refer to the [build instructions](../BUILD.md); the example commands below build the Python wheel in the RelWithDebInfo configuration with parallel compilation.
+
+For example:
+
+`MKLDNN: ./build.sh --config RelWithDebInfo --use_mkldnn --build_wheel --parallel`
+
+`CUDA: ./build.sh --config RelWithDebInfo --use_cuda --build_wheel --parallel`

-Here are the build instructions:
-* MKLDNN: ./build.sh --config RelWithDebInfo --use_mkldnn --build_wheel --parallel
-* MKLML: ./build.sh --config RelWithDebInfo --use_mklml --build_wheel --parallel
-* nGraph: ./build.sh --config RelWithDebInfo --use_ngraph --build_wheel --parallel
-* CUDA: ./build.sh --config RelWithDebInfo --use_cuda --build_wheel --parallel
-* TensorRT: ./build.sh --config RelWithDebInfo --use_tensorrt --build_wheel --parallel

### C and C# API
-Official release (nuget package) supports default (MLAS) and MKL-ML for CPU, and CUDA for GPU. For other execution providers, you need to build from source.
+Official release (NuGet package) supports default (MLAS) and MKL-ML for CPU, and CUDA for GPU. For other execution providers, you need to build from source. Append `--build_csharp` to the instructions to build both C# and C packages.
+
+For example:
+
+`MKLDNN: ./build.sh --config RelWithDebInfo --use_mkldnn --build_csharp --parallel`

-Similarly, here are the cmds to build from source. --build_csharp will build both C# and C package.
-* MKLDNN: ./build.sh --config RelWithDebInfo --use_mkldnn --build_csharp --parallel -* MKLML: ./build.sh --config RelWithDebInfo --use_mklml --build_csharp --parallel -* nGraph: ./build.sh --config RelWithDebInfo --use_ngraph --build_csharp --parallel -* CUDA: ./build.sh --config RelWithDebInfo --use_cuda --build_csharp --parallel -* TensorRT: ./build.sh --config RelWithDebInfo --use_tensorrt --build_csharp --parallel +`CUDA: ./build.sh --config RelWithDebInfo --use_cuda --build_csharp --parallel` -In order to use MKLDNN, nGraph, CUDA, or TensorRT execution provider, you need to call C API OrtSessionOptionsAppendExecutionProvider. Here is one example for CUDA execution provider: +In order to use MKLDNN, nGraph, CUDA, or TensorRT execution provider, you need to call the C API OrtSessionOptionsAppendExecutionProvider. Here is an example for the CUDA execution provider: C API Example: ```c @@ -54,13 +55,12 @@ so.AppendExecutionProvider(ExecutionProvider.MklDnn); var session = new InferenceSession(modelPath, so); ``` -## How to tune performance for a specific execution provider? +## Tuning performance for a specific execution provider ### Default CPU Execution Provider (MLAS) -Default execution provider use different knobs to control thread number. Here are some details: -** Please kindly noted that those are subject to change in the future** +The default execution provider uses different knobs to control the thread number. 
-For default CPU execution provider, you can try following knobs in Python API:
+For the default CPU execution provider, you can try the following knobs in the Python API:
```python
import onnxruntime as rt

@@ -70,30 +70,34 @@ sess_options.intra_op_num_threads=2
sess_options.execution_mode=rt.ExecutionMode.ORT_SEQUENTIAL
sess_options.set_graph_optimization_level(2)
```
-* sess_options.intra_op_num_threads=2 controls how many thread do you want to use to run your model
-* sess_options.execution_mode=rt.ExecutionMode.ORT_SEQUENTIAL controls whether you want to execute operators in your graph sequentially or in parallel. Usually when the model has many branches, setting this option to ExecutionMode.ORT_PARALLEL will give you better performance.
-* When sess_options.execution_mode=rt.ExecutionMode.ORT_PARALLEL, you can set sess_options.inter_op_num_threads to control the
+
+* Thread Count
+  * `sess_options.intra_op_num_threads=2` controls the number of threads used to run the model
+* Sequential vs Parallel Execution
+  * `sess_options.execution_mode=rt.ExecutionMode.ORT_SEQUENTIAL` controls whether the operators in the graph should run sequentially or in parallel. Usually when a model has many branches, setting this option to `ExecutionMode.ORT_PARALLEL` will provide better performance.
+  * When `sess_options.execution_mode=rt.ExecutionMode.ORT_PARALLEL`, you can set `sess_options.inter_op_num_threads` to control the
number of threads used to parallelize the execution of the graph (across nodes).
-* sess_options.set_graph_optimization_level(2). Default is 1. Please see [onnxruntime_c_api.h](../include/onnxruntime/core/session/onnxruntime_c_api.h#L241) (enum GraphOptimizationLevel) for the full list of all optimization levels.
+* Graph Optimization Level
+  * `sess_options.set_graph_optimization_level(2)`. The default is 1. Please see [onnxruntime_c_api.h](../include/onnxruntime/core/session/onnxruntime_c_api.h#L241) (enum GraphOptimizationLevel) for the full list of all optimization levels.
### MKL_DNN/nGraph/MKL_ML Execution Provider
-MKL_DNN, MKL_ML and nGraph all depends on openmp for parallization. For those execution providers, we need to use openmp enviroment variable to tune the performance.
+MKL_DNN, MKL_ML and nGraph all depend on OpenMP for parallelization. For these execution providers, we need to use OpenMP environment variables to tune the performance.

The most widely used environment variables are:
+
* OMP_NUM_THREADS=n
-* OMP_WAIT_POLICY=PASSIVE/ACTIVE
+  * Controls the thread pool size
-As you can tell from the name, OMP_NUM_THREADS controls the thread pool size, while OMP_WAIT_POLICY controls whether enable thread spinning or not.
-OMP_WAIT_POLICY=PASSIVE is also called throughput mode, it will yield CPU after finishing current task. OMP_WAIT_POLICY=ACTIVE will not yield CPU, instead it will have a while loop to check
-whether next task is ready or not. Use PASSIVE if your CPU usage already high, use ACTIVE when you want to trade CPU with latency.
+* OMP_WAIT_POLICY=PASSIVE/ACTIVE
+  * Controls whether thread spinning is enabled
+  * PASSIVE is also called throughput mode and will yield the CPU after finishing the current task
+  * ACTIVE will not yield the CPU; instead it will run a while loop to check whether the next task is ready
+  * Use PASSIVE if your CPU usage is already high, and use ACTIVE when you want to trade CPU usage for lower latency

-## Is there a tool to help tune the performance easily?
-Yes, we have created a tool named onnxruntime_perf_test.exe, and you find it at the build drop.
-You can use this tool to test all those knobs easily. Please find the usage of this tool by onnxruntime_perf_test.exe -h

The [ONNX Go Live "OLive" tool](https://github.com/microsoft/OLive) provides an easy-to-use pipeline for converting models to ONNX and optimizing performance with ONNX Runtime. The tool can help identify the optimal runtime configuration to get the best performance on the target hardware for the model.
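As a concrete sketch, the OpenMP variables above can also be set from Python, provided they are set before the runtime initializes its thread pool (the values of 4 threads and PASSIVE here are illustrative assumptions, not recommendations):

```python
import os

# OpenMP reads these variables when the runtime initializes its thread pool,
# so set them before importing onnxruntime. The values below are illustrative.
os.environ["OMP_NUM_THREADS"] = "4"        # thread pool size
os.environ["OMP_WAIT_POLICY"] = "PASSIVE"  # throughput mode: yield CPU between tasks

# import onnxruntime as rt  # import only after the environment is configured
print(os.environ["OMP_NUM_THREADS"], os.environ["OMP_WAIT_POLICY"])  # prints: 4 PASSIVE
```

Setting the variables in the shell (`export OMP_NUM_THREADS=4`) before launching the process has the same effect.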
-## How to enable profiling and view the generated JSON file?
+## Profiling and Performance Report

You can enable ONNX Runtime latency profiling in code:

@@ -103,10 +107,9 @@ import onnxruntime as rt

sess_options = rt.SessionOptions()
sess_options.enable_profiling = True
```
-Or, if you are using the onnxruntime_perf_test.exe tool, you can add -p [profile_file] to enable performance profiling.
+If you are using the onnxruntime_perf_test.exe tool, you can add `-p [profile_file]` to enable performance profiling.

-In both ways, you will get a JSON file, which contains the detailed performance data (threading, latency of each operator, etc). This file is a standard performance tracing file, and to view it in a user friendly way, you can open it by using chrome://tracing:
+In both cases, you will get a JSON file which contains the detailed performance data (threading, latency of each operator, etc.). This file is a standard performance tracing file; to view it in a user-friendly way, you can open it using chrome://tracing:

* Open chrome browser
* Type chrome://tracing in the address bar
* Load the generated JSON file
-
diff --git a/docs/execution_providers/MKL-DNN-ExecutionProvider.md b/docs/execution_providers/MKL-DNN-ExecutionProvider.md
index 92019d7d0878c..1e5d302b3d148 100644
--- a/docs/execution_providers/MKL-DNN-ExecutionProvider.md
+++ b/docs/execution_providers/MKL-DNN-ExecutionProvider.md
@@ -1,13 +1,13 @@
# MKL-DNN Execution Provider
-Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN) is an open-source performance library for deep-learning applications. The library accelerates deep-learning applications and frameworks on Intel® architecture and Intel® Processor Graphics Architecture. Intel MKL-DNN contains vectorized and threaded building blocks that you can use to implement deep neural networks (DNN) with C and C++ interfaces.
For more visit MKL-DNN documentation at (https://intel.github.io/mkl-dnn/)
+Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN) is an open-source performance library for deep-learning applications. The library accelerates deep-learning applications and frameworks on Intel® architecture and Intel® Processor Graphics Architecture. Intel MKL-DNN contains vectorized and threaded building blocks that you can use to implement deep neural networks (DNN) with C and C++ interfaces. For more, please see the MKL-DNN documentation at https://intel.github.io/mkl-dnn/.

-Intel and Microsoft have developed MKL-DNN Execution Provider (EP) for ONNX Runtime to accelerate performance of ONNX Runtime using Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN) optimized primitives
+Intel and Microsoft have developed the MKL-DNN Execution Provider (EP) for ONNX Runtime to accelerate the performance of ONNX Runtime using Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN) optimized primitives.

-## MKL-DNN/MKLML
-To build ONNX Runtime with MKL-DNN support, build it with `./build.sh --use_mkldnn`
+For information on how MKL-DNN optimizes subgraphs, see [Subgraph Optimization](./MKL-DNN-Subgraphs.md)

-To build ONNX Runtime using MKL-DNN built with dependency on MKL small libraries, build it with `./build.sh --use_mkldnn --use_mklml`
+## Build
+For build instructions, please see the [BUILD page](../../BUILD.md#mkldnn-and-mklml).

## Supported OS
* Ubuntu 16.04
@@ -16,9 +16,8 @@ To build ONNX Runtime using MKL-DNN built with dependency on MKL small libraries

## Supported backend
* CPU
-* More to be added soon!

-## Using the nGraph execution provider
+## Using the MKL-DNN Execution Provider
### C/C++
The MKLDNNExecutionProvider execution provider needs to be registered with ONNX Runtime to enable it in the inference session.
```
@@ -28,7 +27,7 @@ status = session_object.Load(model_file_name);
```
The C API details are [here](../C_API.md#c-api).
-## Python +### Python When using the python wheel from the ONNX Runtime built with MKL-DNN execution provider, it will be automatically prioritized over the CPU execution provider. Python APIs details are [here](https://aka.ms/onnxruntime-python). ## Performance Tuning diff --git a/docs/execution_providers/Nuphar-ExecutionProvider.md b/docs/execution_providers/Nuphar-ExecutionProvider.md index 097fe9364f3ac..05c6d57bfcc81 100644 --- a/docs/execution_providers/Nuphar-ExecutionProvider.md +++ b/docs/execution_providers/Nuphar-ExecutionProvider.md @@ -1,23 +1,23 @@ -## Nuphar Execution Provider (preview) +# Nuphar Execution Provider (preview) NUPHAR stands for Neural-network Unified Preprocessing Heterogeneous ARchitecture. As an execution provider in the ONNX Runtime, it is built on top of [TVM](https://github.com/dmlc/tvm) and [LLVM](https://llvm.org) to accelerate ONNX models by compiling nodes in subgraphs into optimized functions via JIT. It also provides JIT caching to save compilation time at runtime. -This execution provider release is currently in preview. With the Nuphar execution provider, the ONNX Runtime delivers better inferencing performance on the same hardware compared to generic X64 CPU acceleration, especially for quantized recurrent neural networks. Various products at Microsoft have seen up to a 5x improvement in performance with no loss of accuracy, by running quantized LSTMs via the Nuphar execution provider in the ONNX Runtime. +Developers can tap into the power of Nuphar through ONNX Runtime to accelerate inferencing of ONNX models. The Nuphar execution provider comes with a common ONNX to TVM lowering [library](../../onnxruntime/core/codegen) that can potentially be reused by other execution providers to leverage TVM. With the Nuphar execution provider, the ONNX Runtime delivers better inferencing performance on the same hardware compared to generic X64 CPU acceleration, especially for quantized recurrent neural networks. 
Various products at Microsoft have seen up to a 5x improvement in performance with no loss of accuracy, by running quantized LSTMs via the Nuphar execution provider in the ONNX Runtime. -### Build Nuphar execution provider -Developers can now tap into the power of Nuphar through ONNX Runtime to accelerate inferencing of ONNX models. Besides, the Nuphar execution provider also comes with a common ONNX to TVM lowering [library](../../onnxruntime/core/codegen), that could be reused by other execution providers to leverage TVM. Instructions to build the Nuphar execution provider from source is available [here](../../BUILD.md#nuphar). +## Build +For build instructions, please see the [BUILD page](../../BUILD.md#nuphar). -### Using the Nuphar execution provider -#### C/C++ +## Using the Nuphar execution provider +### C/C++ The Nuphar execution provider needs to be registered with ONNX Runtime to enable in the inference session. The C API details are [here](../C_API.md#c-api). ### Python You can use the Nuphar execution provider via the python wheel from the ONNX Runtime build. The Nuphar execution provider will be automatically prioritized over the default CPU execution providers, thus no need to separately register the execution provider. Python APIs details are [here](../python/api_summary.rst#api-summary). -### Using onnxruntime_perf_test/onnx_test_runner for performance and accuracy test +## Performance and Accuracy Testing You can test your ONNX model's performance with [onnxruntime_perf_test](../../onnxruntime/test/perftest/README.md), or test accuracy with [onnx_test_runner](../../onnxruntime/test/onnx/README.txt). To run these tools with the Nuphar execution provider, please pass `-e nuphar` in command line options. 
-### Model conversion/quantization +## Model Conversion and Quantization You may use Python script [model_editor.py](../../onnxruntime/core/providers/nuphar/scripts/model_editor.py) to turn LSTM/GRU/RNN ops to Scan ops for a given model, and then use [model_quantizer.py](../../onnxruntime/core/providers/nuphar/scripts/model_quantizer.py) to quantize MatMul ops into MatMulInteger ops. We use dynamic per-row quantization for inputs of LSTM MatMul, so MatMul becomes three parts: quantization, MatMulInteger and dequantization. Weights for MatMulInteger are statically quantized per-column to int8. We have observed good speed-up and no loss of accuracy with this quantization scheme inside Scan for various LSTM models. @@ -36,7 +36,7 @@ As an experiment, you may test conversion and quantization on [the BiDAF model]( Speed-up in this model is ~20% on Intel Xeon E5-1620v4 (Note that AVX2 is required for Nuphar int8 GEMV performance), when comparing CPU execution provider with the floating point model with LSTM ops, vs. the Nuphar execution provider with quantized MatMulInteger inside Scan ops. Profile shows that most of the cost is in input projection outside of Scan ops, which uses MKL SGEMM. It's worth noting that MKL int8 GEMM is about the same speed as SGEMM in this model, so quantization of SGEMMs outside of Scan won't help performance. We are looking at ways to speedup int8 GEMM for better performance on quantized models. -### JIT caching +## JIT caching You may cache JIT binaries to reduce model loading time spent in JIT, using [create_shared.cmd](../../onnxruntime/core/providers/nuphar/scripts/create_shared.cmd) on Windows with Visual Studio 2017, or [create_shared.sh](../../onnxruntime/core/providers/nuphar/scripts/create_shared.sh) on Linux with gcc. 
Windows @@ -63,6 +63,9 @@ create_shared.sh -c /path/to/jit/cache/NUPHAR_CACHE_VERSION [-m optional_model_f # run Nuphar inference again with cached JIT dll ``` + +## Debugging + ### NGEMM NGEMM (Nuphar GEMM) is an optimized low-precision GEMM implementation based on compiler techniques. Please refer to our paper for more details of NGEMM: ["NGEMM: Optimizing GEMM for Deep Learning via Compiler-based Techniques"](https://arxiv.org/abs/1910.00178). @@ -79,7 +82,7 @@ NGEMM has default tiling parameters, but users can overwrite them through enviro This enviornment variable is to control the loop permutation in GEMM. The default is to not apply any loop permutation. Other options are "inner/outer/all",referring to apply permutations to only inner tile loops / only outer loops / both inner and outer loops, respectively. -### Debugging + There are several [environment variables](../../onnxruntime/core/codegen/common/settings.h) to dump debug information during code generation, plus [some more environment variables](../../onnxruntime/core/providers/nuphar/common/nuphar_settings.h) to dump/control the Nuphar execution provider. You can set environment variables prior to inference to dump debug info to the console. To list some most useful ones: * CODEGEN_DUMP_LOWER @@ -105,7 +108,7 @@ There are several [environment variables](../../onnxruntime/core/codegen/common/ Set it to "1" to dump partitions. -### Settings +## Settings When there are conflicts of environment variables running Nuphar in multiple processes, user can specify settings string when creating the Nuphar execution provider. The string comprises of comma separated key:value pairs. Keys should be lower cased environment variable names as shown above, and separated from corresponding values with colon. For example, the equivalent string of setting environment variables of NUPHAR_CACHE_PATH/NUPHAR_CACHE_MODEL_CHECKSUM would be "nuphar_cache_path:, nuphar_cache_model_checksum:". 
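The settings string described above can be assembled mechanically. Here is a minimal sketch in Python; the cache path and checksum values are hypothetical placeholders:

```python
# Build a Nuphar settings string: comma-separated key:value pairs, with
# keys being lower-cased environment variable names as described above.
# The values here are hypothetical placeholders.
env_settings = {
    "NUPHAR_CACHE_PATH": "/tmp/nuphar_cache",
    "NUPHAR_CACHE_MODEL_CHECKSUM": "abc123",
}
nuphar_settings = ", ".join(f"{k.lower()}:{v}" for k, v in env_settings.items())
print(nuphar_settings)
# prints: nuphar_cache_path:/tmp/nuphar_cache, nuphar_cache_model_checksum:abc123
```

The resulting string is what gets passed when creating the Nuphar execution provider, as the C/C++ and Python usage examples show.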
* Using in C/C++ @@ -134,7 +137,7 @@ onnxruntime.capi._pybind_state.set_nuphar_settings(nuphar_settings) sess = onnxruntime.InferenceSession(model_path) ``` -### Known issues +## Known issues * ONNX shape inference dependency To save runtime JIT cost, Nuphar requires models to have shape inference information from ONNX after model is loaded. Some nodes in ONNX can generate dynamic output tensor shapes from input data value, i.e. ConstantOfShape, Tile, Slice in opset 10, Compress, etc. Those ops may block ONNX shape inference and make the part of graph after such nodes not runnable in Nuphar. @@ -155,4 +158,4 @@ git submodule sync git submodule foreach --recursive git stash git submodule foreach --recursive git clean -fd git submodule update --init --recursive -``` \ No newline at end of file +``` diff --git a/docs/execution_providers/OpenVINO-ExecutionProvider.md b/docs/execution_providers/OpenVINO-ExecutionProvider.md index 1d5838268d3f6..e52a6466e1b49 100644 --- a/docs/execution_providers/OpenVINO-ExecutionProvider.md +++ b/docs/execution_providers/OpenVINO-ExecutionProvider.md @@ -1,10 +1,13 @@ -# Hardware Enabled with OpenVINO Execution Provider +# OpenVINO Execution Provider OpenVINO Execution Provider enables deep learning inference on Intel CPUs, Intel integrated GPUs and Intel® MovidiusTM Vision Processing Units (VPUs). Please refer to [this](https://software.intel.com/en-us/openvino-toolkit/hardware) page for details on the Intel hardware supported. -# ONNX Layers supported using OpenVINO +## Build +For build instructions, please see the [BUILD page](../../BUILD.md#openvino). -Below table shows the ONNX layers supported using OpenVINO Execution Provider and the mapping between ONNX layers and OpenVINO layers. The below table also lists the Intel hardware support for each of the layers. 
CPU refers to Intel®
+## ONNX Layers supported using OpenVINO
+
+The table below shows the ONNX layers supported using OpenVINO Execution Provider and the mapping between ONNX layers and OpenVINO layers. It also lists the Intel hardware support for each of the layers. CPU refers to Intel® Atom, Core, and Xeon processors. GPU refers to the Intel Integrated Graphics. VPU refers to USB based Intel® Movidius™ VPUs as well as Intel® Vision Accelerator Design with Intel® Movidius™ MyriadX VPU.
@@ -35,11 +38,11 @@ VPUs as well as Intel® Vision accelerator Design with Intel Movidiu

*MatMul is supported in GPU only when the following layer is an Add layer in the topology.

-# Topology Support
+## Topology Support

Below topologies are supported from ONNX open model zoo using OpenVINO Execution Provider

-## Image Classification Networks
+### Image Classification Networks

| **Topology** | **CPU** | **GPU** | **VPU** |
| --- | --- | --- | --- |
@@ -68,7 +71,7 @@ Below topologies are supported from ONNX open model zoo using OpenVINO Execution
| vgg19 | Yes | Yes | Yes

-## Image Recognition Networks
+### Image Recognition Networks

| **Topology** | **CPU** | **GPU** | **VPU** |
| --- | --- | --- | --- |
@@ -76,14 +79,14 @@ Below topologies are supported from ONNX open model zoo using OpenVINO Execution

**Inception_v1 and MNIST are supported in OpenVINO R1.1 and are not supported in OpenVINO R5.0.1.

-## Object Detection Networks
+### Object Detection Networks

| **Topology** | **CPU** | **GPU** | **VPU** |
| --- | --- | --- | --- |
|TinyYOLOv2 | Yes | Yes | Yes
| ResNet101\_DUC\_HDC | Yes | No | No

-# Application code changes for VAD-M performance scaling
+## Application code changes for VAD-M performance scaling

VAD-M has 8 VPUs and is suitable for applications that require multiple inferences to run in parallel. We use a batching approach for performance scaling on VAD-M.
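The batching idea can be sketched independently of the imaging code that follows. In this minimal illustration, the grouping helper is an assumption for illustration only, with the batch size of 8 matching VAD-M's 8 VPUs:

```python
# Group preprocessed inputs into batches so the 8 VPUs on VAD-M can run
# inferences in parallel; each batch would then be fed to sess.run as one call.
def make_batches(inputs, batch_size=8):
    return [inputs[i:i + batch_size] for i in range(0, len(inputs), batch_size)]

# 20 hypothetical preprocessed inputs -> batches of 8, 8, and 4
batches = make_batches(list(range(20)))
print([len(b) for b in batches])  # prints: [8, 8, 4]
```

The sample below does the equivalent with `numpy.concatenate`, stacking individual image tensors along the batch axis before calling `sess.run`.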
@@ -100,14 +103,14 @@ import numpy
import time
import glob
~~~
-### Load the input onnx model
+#### Load the input onnx model
~~~
sess = rt.InferenceSession(str(sys.argv[1]))
print("\n")
~~~
-### Preprocessing input images
+#### Preprocessing input images
~~~
for i in range(iters):
    y = None
@@ -126,11 +129,11 @@ for i in range(iters):
    y = numpy.concatenate((y,x), axis=0)
~~~
-### Start Inference
+#### Start Inference
~~~
res = sess.run([sess.get_outputs()[0].name], {sess.get_inputs()[0].name: y})
~~~
-### Post-processing output results
+#### Post-processing output results
~~~
print("Output probabilities:")
i = 0
diff --git a/docs/execution_providers/TensorRT-ExecutionProvider.md b/docs/execution_providers/TensorRT-ExecutionProvider.md
index 2290c7b55f0d6..56a86acaab0c3 100644
--- a/docs/execution_providers/TensorRT-ExecutionProvider.md
+++ b/docs/execution_providers/TensorRT-ExecutionProvider.md
@@ -1,14 +1,14 @@
-## TensortRT Execution Provider
+# TensorRT Execution Provider
The TensorRT execution provider in the ONNX Runtime makes use of NVIDIA's [TensorRT](https://developer.nvidia.com/tensorrt) Deep Learning inferencing engine to accelerate ONNX models on their family of GPUs. Microsoft and NVIDIA worked closely to integrate the TensorRT execution provider with ONNX Runtime.

With the TensorRT execution provider, the ONNX Runtime delivers better inferencing performance on the same hardware compared to generic GPU acceleration.

-### Build TensorRT execution provider
-Developers can now tap into the power of TensorRT through ONNX Runtime to accelerate inferencing of ONNX models. Instructions to build the TensorRT execution provider from source are available [here](../../BUILD.md#build). [Dockerfiles](../../dockerfiles#tensorrt-version-preview) are available for convenience.
+## Build
+For build instructions, please see the [BUILD page](../../BUILD.md#tensorrt).
-### Using the TensorRT execution provider -#### C/C++ +## Using the TensorRT execution provider +### C/C++ The TensortRT execution provider needs to be registered with ONNX Runtime to enable in the inference session. ``` InferenceSession session_object{so}; @@ -20,23 +20,22 @@ The C API details are [here](../C_API.md#c-api). ### Python When using the Python wheel from the ONNX Runtime build with TensorRT execution provider, it will be automatically prioritized over the default GPU or CPU execution providers. There is no need to separately register the execution provider. Python APIs details are [here](https://microsoft.github.io/onnxruntime/api_summary.html). - -### Sample +#### Sample Please see [this Notebook](../python/notebooks/onnx-inference-byoc-gpu-cpu-aks.ipynb) for an example of running a model on GPU using ONNX Runtime through Azure Machine Learning Services. -### Performance Tuning +## Performance Tuning For performance tuning, please see guidance on this page: [ONNX Runtime Perf Tuning](../ONNX_Runtime_Perf_Tuning.md) When/if using [onnxruntime_perf_test](../../onnxruntime/test/perftest#onnxruntime-performance-test), use the flag `-e tensorrt` -### Configuring Engine Max Batch Size and Workspace Size +## Configuring Engine Max Batch Size and Workspace Size By default TensorRT execution provider builds an ICudaEngine with max batch size = 1 and max workspace size = 1 GB One can override these defaults by setting environment variables ORT_TENSORRT_MAX_BATCH_SIZE and ORT_TENSORRT_MAX_WORKSPACE_SIZE. e.g. 
on Linux

-#### override default batch size to 10
+### Override default batch size to 10
export ORT_TENSORRT_MAX_BATCH_SIZE=10

-#### override default max workspace size to 2GB
+### Override default max workspace size to 2GB
export ORT_TENSORRT_MAX_WORKSPACE_SIZE=2147483648

diff --git a/docs/execution_providers/nGraph-ExecutionProvider.md b/docs/execution_providers/nGraph-ExecutionProvider.md
index aa8dca11dbb2b..3c53e5d461ae5 100644
--- a/docs/execution_providers/nGraph-ExecutionProvider.md
+++ b/docs/execution_providers/nGraph-ExecutionProvider.md
@@ -1,27 +1,26 @@

- +

-## nGraph Execution Provider +# nGraph Execution Provider [nGraph](https://github.com/NervanaSystems/ngraph) is a deep learning compiler from Intel®. The integration of nGraph as an execution provider (EP) into ONNX Runtime accelerates performance of ONNX model workloads across wide range of hardware offerings. Microsoft and Intel worked closely to integrate the nGraph EP with ONNX Runtime to showcase the benefits of quantization (int8). The nGraph EP leverages Intel® DL Boost and delivers performance increase with minimal loss of accuracy relative to FP32 ONNX models. With the nGraph EP, the ONNX Runtime delivers better inference performance across range of Intel hardware including Intel® Xeon® Processors compared to a generic CPU execution provider. -### Build nGraph execution provider -Developers can now tap into the power of nGraph through ONNX Runtime to accelerate inference performance of ONNX models. Instructions for building the nGraph execution provider from the source is available [here](../../BUILD.md#nGraph). +## Build +For build instructions, please see the [BUILD page](../../BUILD.md#nGraph). +## Supported OS While the nGraph Compiler stack supports various operating systems and backends ([full list available here](https://www.ngraph.ai/ecosystem)), the nGraph execution provider for ONNX Runtime is validated for the following: -### Supported OS * Ubuntu 16.04 * Windows 10 (`DEX_ONLY` mode is only one supported for the moment, codegen mode is work-in-progress.) * More to be added soon! -### Supported backend +## Supported backend * CPU -* More to be added soon! -### Using the nGraph execution provider -#### C/C++ +## Using the nGraph execution provider +### C/C++ To use nGraph as execution provider for inferencing, please register it as below. ``` InferenceSession session_object{so}; @@ -33,7 +32,7 @@ The C API details are [here](../C_API.md#c-api). 
### Python When using the python wheel from the ONNX Runtime built with nGraph execution provider, it will be automatically prioritized over the CPU execution provider. Python APIs details are [here](../python/api_summary.rst#api-summary). -### Performance Tuning +## Performance Tuning For performance tuning, please see guidance on this page: [ONNX Runtime Perf Tuning](../ONNX_Runtime_Perf_Tuning.md) When/if using [onnxruntime_perf_test](../../onnxruntime/test/perftest), use the flag -e ngraph diff --git a/samples/README.md b/samples/README.md new file mode 100644 index 0000000000000..cd93dc3bbfcb4 --- /dev/null +++ b/samples/README.md @@ -0,0 +1,52 @@ +# ONNX Runtime Samples and Tutorials + +Here you will find various samples, tutorials, and reference implementations for using ONNX Runtime. +For a list of available dockerfiles and published images to help with getting started, see [this page](../dockerfiles/README.md). + +* [Python](#Python) +* [C#](#C) +* [C/C++](#CC) +*** + +## Python +**Inference only** +* [Basic Model Inferencing (single node Sigmoid) on CPU](https://github.com/onnx/onnx-docker/blob/master/onnx-ecosystem/inference_demos/simple_onnxruntime_inference.ipynb) +* [Model Inferencing (Resnet50) on CPU](https://github.com/onnx/onnx-docker/blob/master/onnx-ecosystem/inference_demos/resnet50_modelzoo_onnxruntime_inference.ipynb) +* [Model Inferencing on CPU](https://github.com/onnx/onnx-docker/tree/master/onnx-ecosystem/inference_demos) using [ONNX-Ecosystem Docker image](https://github.com/onnx/onnx-docker/tree/master/onnx-ecosystem) +* [Model Inferencing on CPU using ONNX Runtime Server (SSD Single Shot MultiBox Detector)](https://github.com/onnx/tutorials/blob/master/tutorials/OnnxRuntimeServerSSDModel.ipynb) +* [Model Inferencing using NUPHAR Execution Provider](../docs/python/notebooks/onnxruntime-nuphar-tutorial.ipynb) + +**Inference with model conversion** +* [SKL Pipeline: Train, Convert, and 
Inference](https://microsoft.github.io/onnxruntime/tutorial.html)
+* [Keras: Convert and Inference](https://microsoft.github.io/onnxruntime/auto_examples/plot_dl_keras.html#sphx-glr-auto-examples-plot-dl-keras-py)
+
+**Inference and deploy through AzureML**
+* Inferencing on CPU using [ONNX Model Zoo](https://github.com/onnx/models) models:
+  * [Facial Expression Recognition](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/deployment/onnx/onnx-inference-facial-expression-recognition-deploy.ipynb)
+  * [MNIST Handwritten Digits](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/deployment/onnx/onnx-inference-mnist-deploy.ipynb)
+  * [Resnet50 Image Classification](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/deployment/onnx/onnx-modelzoo-aml-deploy-resnet50.ipynb)
+* Inferencing on CPU with a model conversion step for existing models:
+  * [TinyYolo](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/deployment/onnx/onnx-convert-aml-deploy-tinyyolo.ipynb)
+* Inferencing on CPU with PyTorch model training:
+  * [MNIST](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/deployment/onnx/onnx-train-pytorch-aml-deploy-mnist.ipynb)
+
+  *For additional information on training in AzureML, please see [AzureML Training Notebooks](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/training)*
+
+* Inferencing on GPU with TensorRT Execution Provider (AKS)
+  * [FER+](../docs/python/notebooks/onnx-inference-byoc-gpu-cpu-aks.ipynb)
+
+**Inference and Deploy with Azure IoT Edge**
+  * [Intel OpenVINO](http://aka.ms/onnxruntime-openvino)
+  * [NVIDIA TensorRT on Jetson Nano (ARM64)](http://aka.ms/onnxruntime-arm64)
+
+**Other**
+* [Running ONNX model tests](./docs/Model_Test.md)
+* [Common Errors with
explanations](https://microsoft.github.io/onnxruntime/auto_examples/plot_common_errors.html#sphx-glr-auto-examples-plot-common-errors-py)
+
+## C#
+* [Inferencing Tutorial](../docs/CSharp_API.md#getting-started)
+
+## C/C++
+* [C - Inferencing (SqueezeNet)](../csharp/test/Microsoft.ML.OnnxRuntime.EndToEndTests.Capi/C_Api_Sample.cpp)
+* [C++ - Inferencing (SqueezeNet)](../csharp/test/Microsoft.ML.OnnxRuntime.EndToEndTests.Capi/CXX_Api_Sample.cpp)
+* [C++ - Inferencing (MNIST)](../samples/c_cxx/MNIST)
diff --git a/samples/c_cxx/README.md b/samples/c_cxx/README.md
index a372c4a85b36b..3e15190ec336f 100644
--- a/samples/c_cxx/README.md
+++ b/samples/c_cxx/README.md
@@ -14,9 +14,8 @@ This directory contains a few C/C++ sample applications for demoing onnxruntime
You may get a precompiled libpng library from [https://onnxruntimetestdata.blob.core.windows.net/models/libpng.zip](https://onnxruntimetestdata.blob.core.windows.net/models/libpng.zip)

## Install ONNX Runtime
-You may either get a prebuit onnxruntime from nuget.org, or build it from source by following the [BUILD.md document](../../../BUILD.md).
-If you build it by yourself, you must append the "--build_shared_lib" flag to your build command. Like:
-
+You may either get a prebuilt onnxruntime from nuget.org, or build it from source by following the [build instructions](../../BUILD.md).
+If you build it yourself, you must append the "--build_shared_lib" flag to your build command.
```
build.bat --config RelWithDebInfo --build_shared_lib --parallel
```