Enable integration with mimalloc memory allocator #1673
ybrnathan merged 25 commits into microsoft:master
…fcarena when using mimalloc
|
It should also be noted that I don't have access to non-Windows hardware, so I haven't been able to test the Linux/macOS builds with mimalloc. The Linux/macOS code path (add_subdirectory etc.) does build successfully when run on Windows (it's just missing the runtime malloc override hooks), so it most likely works on other OSes, but I can't be 100% sure. |
|
/azp run |
|
May I know how you observed the ~10% performance improvement? Could you share more details so that I can reproduce your experiment? |
|
Azure Pipelines successfully started running 21 pipeline(s), but failed to run 1 pipeline(s). |
|
BTW, we didn't enable jemalloc for a reason. Overriding the global malloc/free functions in a production environment is generally bad, especially on Linux, because Linux has a global symbol table. If you want to override malloc, you must do it before any malloc call happens. That means it can't be done just inside onnxruntime; you'd also need to override the malloc/free used by Python, pthread, etc. |
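To make the concern about process-wide overrides concrete, here is a minimal C++ sketch. It is not taken from this PR or from mimalloc: it uses replacement of the global `operator new`/`operator delete` as a stand-in for malloc/free interposition, and the counter `g_new_calls` and the function `allocations_made_by_library_code` are hypothetical names. The point it illustrates is that a global replacement is picked up by code you did not write (here, the standard library containers), not only by the component that installed it.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>
#include <vector>

// Hypothetical counter for illustration; not part of onnxruntime or mimalloc.
static std::atomic<std::size_t> g_new_calls{0};

// Replacing the global operator new applies to the whole program:
// every caller that goes through it is affected, whoever wrote that caller.
void* operator new(std::size_t size) {
    g_new_calls.fetch_add(1, std::memory_order_relaxed);
    if (void* p = std::malloc(size == 0 ? 1 : size)) {
        return p;
    }
    throw std::bad_alloc{};
}

// Matching replacements so memory from the operator new above is released with free.
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Counts how many times the replaced operator new ran while standard
// library containers (code we did not write) allocated their storage.
std::size_t allocations_made_by_library_code(std::size_t n) {
    const std::size_t before = g_new_calls.load();
    {
        std::vector<int> values(n, 1);  // heap storage for n ints
        std::string text(256, 'x');     // long enough to need heap storage
        volatile std::size_t sink = values.size() + text.size();
        (void)sink;                     // keep the objects observably used
    }
    return g_new_calls.load() - before;
}
```

Calling `allocations_made_by_library_code(1000)` should report at least one allocation routed through the replacement, even though the function never calls `new` directly. An allocator swapped in at the malloc level has the same reach, which is why it has to be in place before the first allocation in the process.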
|
@snnn Nathan Yan was kind enough to provide one of the first-party models offline so I measured with and without mimalloc. Here are the results: Hardware: Intel Xeon Gold 6252 CPU @ 2.10GHz
|
|
Kile0 is working with me on performance improvement. I will share details of the model offline. |
|
The model only took 0.02 ms for each inference? |
|
/azp run |
|
Azure Pipelines successfully started running 21 pipeline(s). |
This reverts commit 7648673.
Description:
Enables substituting the mimalloc memory allocator for the default memory allocator. The --use_mimalloc flag is off by default.
It's important to note why the Windows and Linux/macOS builds of mimalloc differ in the CMake changes below. While the mimalloc project does have a CMake project that builds on Windows, the mimalloc DLL it produces doesn't have the hooks needed to override malloc on Windows at runtime. As this is a known issue, mimalloc provides the needed hooks via a special VS solution (both 2017 and 2019 are now supported). Linux/macOS don't appear to have this issue and so can depend on the default CMakeLists.txt.
Motivation and Context
Mimalloc has better performance than the default ONNXRuntime allocator. In locally run experiments I observed a ~10% performance improvement with mimalloc.