[OpenMP] Performance degradation from devices reorganization #75677
Comments
@llvm/issue-subscribers-openmp Author: None (dhruvachak)
@dhruvachak Could you please provide a profile (LIBOMPTARGET_PROFILE) for the issue? We (= @fel-cab) tried to replicate it, and it does not show up on our end. We saw a ~10% slowdown between Oct 23 and 30; maybe that is what you are seeing?
Thanks for trying to replicate it. What system did you try it on? With how many GPUs?
We are running on Frontier. 1 Node, 8 GPUs
Did you run it with 8 MPI processes?
Yes
And you don't have something which silently sets ROCR_VISIBLE_DEVICES, right?
Not that I know of. I've been running SPEChpc weekly, with the LLVM weekly build on Frontier, for about a year. I have not used ROCR_VISIBLE_DEVICES. This is the first time I have heard of it.
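One way to rule out a silently set ROCR_VISIBLE_DEVICES is to print it from inside each rank. The following is a minimal sketch, not something used in this thread. It assumes the launcher exposes the node-local rank through `OMPI_COMM_WORLD_LOCAL_RANK` (Open MPI) or `SLURM_LOCALID` (Slurm); the helper names `visible_devices` and `local_rank` are hypothetical.

```python
import os

# Assumption: one of these launcher-provided variables holds the node-local rank.
LOCAL_RANK_VARS = ("OMPI_COMM_WORLD_LOCAL_RANK", "SLURM_LOCALID")


def visible_devices(env=None):
    """Return the ROCR_VISIBLE_DEVICES entries as a list, or None if unset."""
    env = os.environ if env is None else env
    value = env.get("ROCR_VISIBLE_DEVICES")
    if value is None:
        return None
    # The variable holds a comma-separated list of device identifiers.
    return [item.strip() for item in value.split(",") if item.strip()]


def local_rank(env=None):
    """Return the node-local rank if a known launcher variable is present."""
    env = os.environ if env is None else env
    for name in LOCAL_RANK_VARS:
        if name in env:
            return int(env[name])
    return None


if __name__ == "__main__":
    # Run this under the same launcher and job script as the benchmark.
    print(f"local rank {local_rank()}: ROCR_VISIBLE_DEVICES={visible_devices()}")
```

If every rank reports `None`, nothing in the job environment is restricting the visible devices, which matches the default configuration described in the issue.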
The profile will include the time we spend in deviceInit, so it will be easy to see.
Hi @fel-cab, I will take a run at this on Frontier sometime tomorrow; kinda swamped today. Maybe I can call you at some point and compare notes?
Sounds good.
We saw the degradation on PCIe interconnects. That's why you're not seeing the performance degradation on your side.
Can you please share a profile w/ and w/o this patch (or your proposed PR)?
I don't have any more details than the overall numbers. Slight improvements with my PR on 8GPU PCIe and no real difference on the non-PCIe 8GPU system tested. There could well be other cases where this patch makes a difference but that's the data I have for now.
The profile system is literally built in: just run it with the env var set to a file name and share the results. We can then see where the extra time is spent.
I'm not sure why oversubscription should be a problem here. The issue should only concern the initial runtime setup time, no?
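For anyone comparing two such profiles, here is a small sketch that totals the time per event name. It assumes the file written via LIBOMPTARGET_PROFILE is Chrome-trace-style JSON with a top-level `traceEvents` list whose complete events carry `ph == "X"` and a `dur` in microseconds. The function names and the file name `profile.json` are hypothetical.

```python
import json
from collections import defaultdict


def summarize_trace(trace):
    """Sum the duration of complete events by name, largest total first."""
    totals = defaultdict(int)
    for event in trace.get("traceEvents", []):
        # In the Chrome trace format, "X" marks a complete event with a "dur" field.
        if event.get("ph") == "X":
            totals[event.get("name", "<unnamed>")] += event.get("dur", 0)
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


def summarize_file(path):
    """Load a trace file (assumed to be JSON) and summarize it."""
    with open(path) as f:
        return summarize_trace(json.load(f))
```

A run would look something like `LIBOMPTARGET_PROFILE=profile.json ./app` followed by `summarize_file("profile.json")`, once before and once after the patch. With several MPI ranks, each rank may need its own file name so the ranks do not overwrite each other's output.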
In that case, the profile will clearly show that part. Let's simply take a look.
@dhruvachak Can we close this? |
Yes, not sure whether this is reproducible any more. |
@ronlieb collected some performance numbers using versions of the upstream compiler that show degradations after the devices reorganization. It appears that #74397 is one of the patches causing the degradation. Currently, all available devices on a system are being initialized eagerly. Previously, only devices that were being used by an application would be initialized.
These are the llvm-project before/after commits used for comparison.
Before SHA: bb0f162
After SHA: 77c40ea
SPEChpc 2021 benchmark 505.lbm degraded 10% when using 8 MPI ranks on a system with 8 AMD GPUs. The configuration tested is the default setting without the env-var ROCR_VISIBLE_DEVICES set.
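To illustrate why eager initialization could matter with 8 ranks on 8 GPUs, here is a toy model of the two strategies. It is only a sketch and not the libomptarget implementation: `DeviceManager`, `total_initializations`, and the assumption that each rank uses exactly one device (`rank % num_devices`) are all hypothetical.

```python
class DeviceManager:
    """Toy stand-in for a per-process runtime that tracks initialized devices."""

    def __init__(self, num_devices, eager):
        self.num_devices = num_devices
        self.initialized = set()
        if eager:
            # Eager: set up every available device at startup.
            for device_id in range(num_devices):
                self._init_device(device_id)

    def _init_device(self, device_id):
        # Placeholder for the real, potentially expensive device setup.
        self.initialized.add(device_id)

    def get_device(self, device_id):
        if not 0 <= device_id < self.num_devices:
            raise IndexError(f"no such device: {device_id}")
        # Lazy: set up a device only on first use.
        if device_id not in self.initialized:
            self._init_device(device_id)
        return device_id


def total_initializations(num_ranks, num_devices, eager):
    """Count device initializations across all ranks, one device used per rank."""
    total = 0
    for rank in range(num_ranks):
        manager = DeviceManager(num_devices, eager)
        manager.get_device(rank % num_devices)
        total += len(manager.initialized)
    return total
```

Under these assumptions, `total_initializations(8, 8, eager=True)` counts 64 initializations while `total_initializations(8, 8, eager=False)` counts 8. Whether that difference explains the measured slowdown is something only a profile can confirm.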