Why does `enable_cpu_mem_arena` have such a large effect on memory usage during inference? #11627
This is expected. If you disable the arena, heap memory allocation will take time, so inference latency will increase. One drawback of the default arena extend strategy is that it might allocate more memory than needed, which could be wasteful. If you want to save memory without impacting latency, I recommend setting the execution provider options as in the following Python code:
Then, after the session is created, run inference once with the input (warm-up query) that needs the most memory. That will allocate just enough memory for your needs, and also ensure that future inferences do not allocate heap memory.
Thank you so much for confirming that this is expected behavior, for explaining the trade-offs involved here, and for providing a more detailed configuration to address this issue. ❤️ Would it be worth adding a section in the documentation that covers the CPU memory arena? I read through the Tune performance page before opening this issue, but there is currently no mention of `enable_cpu_mem_arena`.
@joshuacwnewton, the suggestion sounds good. It would be worth having a section about arena settings.
Here is some info related to the arena. Example code: onnxruntime/onnxruntime/test/shared_lib/test_inference.cc, lines 1877 to 1906 at f0dff6b. It is for the C API, and some of those settings might not be available in the Python API.
@snnn, the arena is enabled by default for CPU. https://onnxruntime.ai/docs/api/python/api_summary.html#sessionoptions mentions that the option `enable_cpu_mem_arena` has a default value of True.
Describe the bug
I'm performing inference using the Python API and a small ONNX model (~2MB) that was converted from a Keras `.h5` model. When running `ort_sess.run()` with default settings, memory usage skyrockets from ~200MB to ~6GB.

Searching past GitHub issues, I found mention of `enable_cpu_mem_arena`. Setting this to `False` completely addresses the issue.

The docs describe `enable_cpu_mem_arena` only briefly, so I have some questions to better understand what's actually going on here, e.g. are there any downsides to setting `enable_cpu_mem_arena = False`?

Urgency
None.
System information
To Reproduce
Here is a `.zip` containing both an `.onnx` model file and a `.npy` array you can load to use for `input`: enable_cpu_memory_area_example.zip

Expected behavior
Not pre-allocating 6GB of memory for a 2MB model.