Enable integration with mimalloc memory allocator #1673

Merged
ybrnathan merged 25 commits into microsoft:master from kile0:kile/mimalloc
Sep 14, 2019

Contributor

kile0 commented Aug 22, 2019

Description:
Enables substituting the mimalloc memory allocator for the default memory allocator. The --use_mimalloc build flag is off by default.

It's important to note why the Windows and Linux/macOS builds of mimalloc differ in the CMake changes below. The mimalloc project does have a CMake project that builds on Windows, but the DLL it produces lacks the hooks needed to override malloc on Windows at runtime. This is a known issue, so mimalloc provides the needed hooks via a special Visual Studio solution (both 2017 and 2019 are now supported). Linux/macOS don't appear to have this issue and so can depend on the default CMakeLists.txt.
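To make the description above concrete, here is a minimal Python sketch of how a build driver could expose an opt-in flag and pick a platform-specific mimalloc build path. Only the `--use_mimalloc` flag name comes from this PR. The function names, the returned labels, and the CMake variable `onnxruntime_USE_MIMALLOC` are assumptions for illustration and may not match the actual changes in tools/ci_build/build.py.

```python
import argparse
import platform


def parse_args(argv=None):
    """Parse a tiny subset of build options (hypothetical driver, not build.py itself)."""
    parser = argparse.ArgumentParser(description="Illustrative build driver sketch")
    # store_true makes the flag opt-in: when omitted, the value is False (off by default).
    parser.add_argument(
        "--use_mimalloc",
        action="store_true",
        help="Use mimalloc in place of the default memory allocator",
    )
    return parser.parse_args(argv)


def mimalloc_build_path(system=None):
    """Return which mimalloc build route applies on the given OS.

    The labels are hypothetical. They mirror the PR description: Windows needs
    the special Visual Studio solution to get the runtime malloc override hooks,
    while Linux/macOS can rely on mimalloc's default CMakeLists.txt.
    """
    system = system or platform.system()
    if system == "Windows":
        return "vs_solution"
    return "cmake_subdirectory"


def cmake_defines(args):
    """Translate parsed options into CMake -D arguments.

    The variable name below is an assumption, not confirmed by this thread.
    """
    value = "ON" if args.use_mimalloc else "OFF"
    return [f"-Donnxruntime_USE_MIMALLOC={value}"]
```

With no arguments the sketch leaves mimalloc disabled, which matches the "off by default" behavior described above.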

Motivation and Context

  • Why is this change required? What problem does it solve?
    mimalloc performs better than the default ONNXRuntime allocator. In locally run experiments, I observed a ~10% performance improvement with mimalloc.

kile0 requested a review from a team as a code owner August 22, 2019 17:29
Contributor Author

kile0 commented Aug 22, 2019

It should also be noted that I don't have access to non-Windows hardware, so I haven't been able to test the Linux/macOS builds with mimalloc. The Linux/macOS code path (add_subdirectory etc.) does build successfully when run on Windows (it's just missing the runtime malloc override hooks), so it most likely works on other OSes, but I can't be 100% sure.

Contributor

snnn commented Aug 22, 2019

/azp run

Contributor

snnn commented Aug 22, 2019

May I ask how you observed the ~10% performance improvement? Could you share more details so that I can reproduce your experiment?

@azure-pipelines

Azure Pipelines successfully started running 21 pipeline(s), but failed to run 1 pipeline(s).

Contributor

snnn commented Aug 22, 2019

BTW, we didn't enable jemalloc for some reason. Overriding the global malloc/free functions in a production environment is generally bad, especially on Linux, because Linux has a global symbol table. If you want to override malloc, you must do it before any malloc call happens. That means it can't be done just inside onnxruntime; you'd also need to override Python's malloc/free, pthread's, etc.
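One common way to satisfy the "before any malloc call" constraint on Linux is to preload the allocator when the process starts, so the dynamic loader resolves malloc/free to it for the whole process (the interpreter and its libraries included). The Python sketch below only builds the environment for such a launch. The helper name `preload_env`, the library path, and the script name are hypothetical, and this is not necessarily how this PR wires mimalloc in.

```python
import os


def preload_env(allocator_lib, base_env=None):
    """Return a copy of the environment with allocator_lib first in LD_PRELOAD.

    Preloading happens at process start, before the program itself calls
    malloc, which is why it must be set for the child process rather than
    from inside an already running one.
    """
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD", "")
    # Keep any libraries that were already preloaded, after the allocator.
    env["LD_PRELOAD"] = f"{allocator_lib}:{existing}" if existing else allocator_lib
    return env


# Hypothetical usage (path and script name are placeholders):
#   import subprocess
#   subprocess.run(["python", "run_inference.py"],
#                  env=preload_env("/usr/lib/libmimalloc.so"))
```

This keeps the allocator choice outside the library being loaded, which is the deployment concern raised in the comment above.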

Contributor Author

kile0 commented Aug 22, 2019

@snnn Nathan Yan was kind enough to provide one of the first-party models offline so I measured with and without mimalloc. Here are the results:

Hardware: Intel Xeon Gold 6252 CPU @ 2.10GHz
ONNXRuntime commit SHA: a6a4c4c (From: Mon Aug 12)
Compilation command: build.bat --config RelWithDebInfo --build_wheel --parallel --cmake_generator="Visual Studio 16 2019"
Run Command: onnxruntime_perf_test.exe -t 60 -o 3 classifier.onnx .\output.txt
Methodology: 8 runs, with the max/min runs thrown out

                Reference   With mimalloc   % Improvement
Average (ms)    0.0213      0.0189          11.2
Std Dev         0.0006      0.0003
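The methodology above (drop the fastest and slowest run, then average the rest) and the improvement figure can be sketched in Python as follows. The function names are hypothetical, and using the sample standard deviation is an assumption, since the comment doesn't say which variant was used. No per-run timings were posted, so the only numbers used are the two reported averages.

```python
import statistics


def trimmed_stats(runs):
    """Drop the single min and max run, then return (mean, sample std dev)."""
    if len(runs) < 4:
        # At least two values must remain after trimming to compute a std dev.
        raise ValueError("need at least 4 runs")
    kept = sorted(runs)[1:-1]
    return statistics.mean(kept), statistics.stdev(kept)


def percent_improvement(reference, candidate):
    """Relative latency reduction of candidate versus reference, in percent."""
    return (reference - candidate) / reference * 100.0


# Reported averages from the table above, in milliseconds.
print(percent_improvement(0.0213, 0.0189))
```

Computed from the rounded averages, this gives roughly 11.3%, in line with the reported 11.2%. The small gap is plausibly due to rounding of the averages before they were posted.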

@ybrnathan
Contributor

Kile0 is working with me on performance improvement. I will share details of the model offline.

Contributor

snnn commented Aug 22, 2019

The model only took 0.02 ms for each inference?

Comment thread tools/ci_build/build.py Outdated
Comment thread tools/ci_build/build.py

Contributor

snnn commented Sep 5, 2019

/azp run

@azure-pipelines

Azure Pipelines successfully started running 21 pipeline(s), but failed to run 1 pipeline(s).

Contributor

snnn commented Sep 5, 2019

/azp run

@azure-pipelines

Azure Pipelines successfully started running 21 pipeline(s), but failed to run 1 pipeline(s).

Comment thread cmake/onnxruntime_providers.cmake Outdated

Contributor

snnn commented Sep 5, 2019

/azp run

@azure-pipelines

Azure Pipelines successfully started running 21 pipeline(s).

Contributor

snnn left a comment

/azp run

Contributor

snnn commented Sep 6, 2019

/azp run

@azure-pipelines

Azure Pipelines successfully started running 21 pipeline(s).

Contributor

snnn commented Sep 6, 2019

/azp run

@azure-pipelines

Azure Pipelines successfully started running 21 pipeline(s).

Contributor

snnn commented Sep 8, 2019

/azp run

@azure-pipelines

Azure Pipelines successfully started running 21 pipeline(s).

Contributor

snnn commented Sep 9, 2019

/azp run

@azure-pipelines

Azure Pipelines successfully started running 21 pipeline(s).

Contributor

snnn commented Sep 9, 2019

/azp run

@azure-pipelines

Azure Pipelines successfully started running 21 pipeline(s).

Contributor

snnn commented Sep 10, 2019

/azp run

@azure-pipelines

Azure Pipelines successfully started running 21 pipeline(s).

3 participants