Slowdown due to tracking_enabled() in 2.04.00 (found by Albany app) #1016
Comments
Try it again with current develop (soon to be master) where some rendezvous stuff was improved again.
Currently the suspect is actually the
Regarding your 2nd point, this is a question I have had for a while. Is it in fact true that functions within a kernel can and should take views by const reference?
Although ideally we would have a near-zero-overhead shallow copy mechanism, it seems that the current mechanism is slow enough to affect a certain application. I would personally write my own code using const references, until the overhead of the system improves (which it very well might by next release).
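To make the by-value versus const-reference point concrete, here is a minimal sketch in plain C++. `MockView`, `copy_count`, and the helper functions are hypothetical stand-ins written for illustration; this is not the Kokkos::View implementation, and the counter only imitates the idea of per-copy bookkeeping. It shows that a helper taking the handle by value pays for one shallow copy per call, while a helper taking it by const reference pays for none.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical global counter that imitates per-copy bookkeeping work.
// (Assumption for illustration only; not how Kokkos tracks allocations.)
inline long& copy_count() {
  static long count = 0;
  return count;
}

// Hypothetical stand-in for a view handle: a non-owning pointer plus a size.
struct MockView {
  double* data;
  std::size_t size;

  MockView(double* d, std::size_t n) : data(d), size(n) {}

  // Every shallow copy does some extra work, modeled here as a counter bump.
  MockView(const MockView& other) : data(other.data), size(other.size) {
    ++copy_count();
  }

  double operator()(std::size_t i) const {
    assert(i < size);
    return data[i];
  }
};

// Helper that takes the view by value: one shallow copy per call.
inline double get_by_value(MockView v, std::size_t i) { return v(i); }

// Helper that takes the view by const reference: no copy.
inline double get_by_cref(const MockView& v, std::size_t i) { return v(i); }

// Sums all entries through the by-value helper and returns how many
// shallow copies were made along the way.
inline long copies_by_value(const MockView& v, double& sum) {
  const long before = copy_count();
  sum = 0.0;
  for (std::size_t i = 0; i < v.size; ++i) sum += get_by_value(v, i);
  return copy_count() - before;
}

// Same loop through the by-const-reference helper.
inline long copies_by_cref(const MockView& v, double& sum) {
  const long before = copy_count();
  sum = 0.0;
  for (std::size_t i = 0; i < v.size; ++i) sum += get_by_cref(v, i);
  return copy_count() - before;
}
```

In this mock, summing a 4-entry view through `get_by_value` triggers 4 copies and through `get_by_cref` triggers 0. Whether the real per-copy cost matters in a given kernel is something only application benchmarking can show.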
If you do any shallow copies inside the kernel, also make sure you are using unmanaged views, otherwise the overhead of allocation tracking can really kill performance.
kokkos-kernels and Tpetra tend to use unmanaged Views in kernels, and take unmanaged Views before taking subviews.
Current indications are that this issue affects unmanaged views as well.
Now going to attempt various mitigations. First a baseline with the latest Kokkos develop, Trilinos develop, and Albany master:
Making
Made t_tracking_enabled a static class member to assist with this [#1016]
Making
Seems like a tiny improvement, not sure if it's worth it or not, at least for this application.
Keeping
Both optimizations put together:
On the
@ibaned can you comment if this is back to normal?
@stanmoore1 I think we've removed the main source of overhead introduced in Kokkos 2.04.00, and we may even have made it faster than before because we found other issues and repaired them. Merge commit 8c8a483 is the key one. At the end of the day though, the only real truth is application benchmarking with old and new Kokkos versions.
@ikalash and I have been tracking down a slowdown in the Albany CISM climate integration, and unfortunately it bisects down to the last Kokkos promotion.
Currently we're seeing a consistent 7% slowdown in the application test we're working with, with the only difference being which version of Kokkos it's compiling against.
gahansen/Albany#159 contains the detailed analysis we've done to date, which hints at several possible causes of this performance regression within Kokkos.
I suspect this will take some effort amongst the Kokkos team, but it would be good if we could find a way to restore performance.
I know there were already concerns at the time of promotion (namely, the global rendezvous performance) but it's not immediately obvious to me that this is the culprit.