Help needed for a NUMA-effects reproducer #865
Comments
Update: Following the answer from https://community.intel.com/t5/Intel-oneAPI-Threading-Building/What-is-the-current-state-of-art-solution-to-NUMA-effects-with/m-p/1405677/emcs_t/S2h8ZW1haWx8dG9waWNfc3Vic2NyaXB0aW9ufEw2REUwNjBVNUtMSjhOfDE0MDU2Nzd8U1VCU0NSSVBUSU9OU3xoSw#M15138, I ran this benchmark: https://github.com/Apress/pro-TBB/blob/master/ch20/fig_20_05.cpp, with the only change that I instead of Reason for the change was:
What I am doing is comparing the running time of tasks pinned to 1 NUMA domain versus 2 NUMA domains, e.g.:
I should have minimal noise on my system.
@ikabadzhov have you seen that OpenMP shows better bandwidth than oneTBB, or are you just looking for a case that highlights that oneTBB can, in some configurations, show poor performance without NUMA features activated? There are some key differences in how oneTBB and OpenMP behave by default. By default, oneTBB does not pin threads to cores, while OpenMP does. By default, oneTBB uses work-stealing and auto-partitioning to balance the load across cores, while OpenMP uses static partitioning into roughly equal-sized chunks and then a repeatable static scheduling of those chunks to pinned threads. Depending on the workload, these defaults can show different tolerance to NUMA effects. You can mimic many of these OpenMP defaults in oneTBB by using NUMA-aware task_arenas and/or static partitioners.
@ikabadzhov is this issue still relevant for you? Could you please respond?
Thanks a lot for the replies. The issue can be closed. My main goal was to observe NUMA effects in ROOT's dataframe. For that I used tests from here. After inspecting the behaviour of ROOT with and without a NUMA-aware TBB arena, the brief conclusions are:
1. We see NUMA effects (within ROOT's dataframe) only at low core counts, and those effects decrease proportionally as the number of cores increases.
2. Applying a NUMA-aware TBB arena does help at the lower core counts.
We did not end up applying the NUMA-aware mechanism, because:
1. the most interesting cases (high core counts) already perform well;
2. there were several limitations that we had to bypass manually, for instance having several nested ForEach calls within a vector of arenas; for the purpose of the benchmarks we restructured our end to have only a single ForEach call;
3. it would require a newer TBB version (which might not be available).
I might be missing something, but 1. was the main motivation.
Dear experts,
My question is whether there is a TBB benchmark that shows NUMA effects. I have tried map-reduce operations on long vectors, but I never saw any NUMA effects. Do I correctly understand from https://link.springer.com/chapter/10.1007/978-1-4842-4398-5_20 that TBB has its own mechanism to optimally decide where work is done?
Note that, using OpenMP, NUMA problems are very visible.
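For comparison, the kind of OpenMP kernel where NUMA effects show up clearly looks roughly like the sketch below (the function name `omp_sum` and the sizes are illustrative). With OpenMP's default pinned, static schedule, first-touch initialization in a parallel loop places each page on the node of the thread that writes it first, so each thread later reads mostly local memory, and any mismatch between thread placement and page placement is visible in the timings.

```cpp
#include <cstddef>
#include <vector>

// Sum over a large vector with OpenMP's default static schedule.
// The first parallel loop performs first-touch initialization, so the
// pages end up distributed across the NUMA nodes of the pinned threads.
double omp_sum(std::size_t n) {
    std::vector<double> v(n);
    #pragma omp parallel for schedule(static)
    for (long long i = 0; i < (long long)n; ++i) v[i] = 1.0;  // first touch

    double acc = 0.0;
    #pragma omp parallel for schedule(static) reduction(+ : acc)
    for (long long i = 0; i < (long long)n; ++i) acc += v[i];
    return acc;
}
```

Compiled without OpenMP support the pragmas are ignored and the code runs serially, so the sketch stays valid either way.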