Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slowdown due to tracking_enabled() in 2.04.00 (found by Albany app) #1016

Closed
ibaned opened this issue Aug 4, 2017 · 15 comments
Closed

Slowdown due to tracking_enabled() in 2.04.00 (found by Albany app) #1016

ibaned opened this issue Aug 4, 2017 · 15 comments
Assignees
Labels
Bug Broken / incorrect code; it could be Kokkos' responsibility, or others’ (e.g., Trilinos)
Milestone

Comments

@ibaned
Copy link
Contributor

ibaned commented Aug 4, 2017

@ikalash and I have been tracking down a slowdown in the Albany CISM climate integration, and unfortunately it bisects down to the last Kokkos promotion.

Currently we're seeing a consistent 7% slowdown in the application test we're working with, with the only difference being which version of Kokkos its compiling against.

gahansen/Albany#159 contains the detailed analysis we've done to date, which hints at several possible causes of this performance regression within Kokkos.

I suspect this will take some effort amongst the Kokkos team, but it would be good if we could find a way to restore performance.
I know there were already concerns at the time of promotion (namely, the global rendezvous performance) but its not immediately obvious to me that this is the culprit.

@ibaned ibaned added the Bug Broken / incorrect code; it could be Kokkos' responsibility, or others’ (e.g., Trilinos) label Aug 4, 2017
@ibaned ibaned added this to the 2017-September (mid) milestone Aug 4, 2017
@hcedwar hcedwar modified the milestones: Backlog, 2017-September (mid) Aug 30, 2017
@crtrott
Copy link
Member

crtrott commented Dec 13, 2017

Try it again with current develop (soon to be master) where some rendezvous stuff was improved again.

@crtrott crtrott modified the milestones: Backlog, 2018 February Dec 13, 2017
@ibaned
Copy link
Contributor Author

ibaned commented Jan 5, 2018

Currently the suspect is actually the tracking_enabled function that was changed to query a thread-local variable during the early stages of the OpenMP backend refactor by @dsunder . Before and after commit 8216185, there is a 14% increase in the runtime of an ice sheet modeling test (50 seconds to 57 seconds). This suggests to me that portions of the code in Trilinos and Albany are doing large amounts of shallow copying of views, possibly within kernels. I suggest we do two things:

  • discuss with @dsunder regarding the thread-private implementation and search for a compromise
  • identify and reduce unecessary shallow copies in user code (for example, functions which are called in kernel code can take a view by const reference instead of by value).

@dholladay00
Copy link

Regarding your 2nd point, this is a question I have had for a while. Is it in fact true that functions within a kernel can and should take views by const reference?

@ibaned
Copy link
Contributor Author

ibaned commented Jan 5, 2018

Although ideally we would have a near-zero-overhead shallow copy mechanism, it seems that the current mechanism is slow enough to affect a certain application. I would personally write my own code using const references, until the overhead of the system improves (which it very well might by next release).

@stanmoore1
Copy link
Contributor

If you do any shallow copies inside the kernel, also make sure you are using unmanaged views, otherwise the overhead of allocation tracking can really kill performance.

@mhoemmen
Copy link
Contributor

mhoemmen commented Jan 5, 2018

kokkos-kernels and Tpetra tend to use unmanaged Views in kernels, and take unmanaged Views before taking subviews.

@ibaned
Copy link
Contributor Author

ibaned commented Jan 9, 2018

Current indications are that this issue affects unmanaged views as well.

@ibaned
Copy link
Contributor Author

ibaned commented Jan 9, 2018

Now going to attempt various mitigations. First a baseline with the latest Kokkos develop, Trilinos develop, and Albany master:

1/1 Test #128: FO_GIS_Gis20km_Tpetra ............   Passed   54.55 sec
1/1 Test #128: FO_GIS_Gis20km_Tpetra ............   Passed   55.75 sec
1/1 Test #128: FO_GIS_Gis20km_Tpetra ............   Passed   53.76 sec

@ibaned
Copy link
Contributor Author

ibaned commented Jan 9, 2018

Making tracking_enabled() an inline function (bcb78974):

1/1 Test #128: FO_GIS_Gis20km_Tpetra ............   Passed   51.34 sec
1/1 Test #128: FO_GIS_Gis20km_Tpetra ............   Passed   49.85 sec
1/1 Test #128: FO_GIS_Gis20km_Tpetra ............   Passed   49.81 sec

ibaned added a commit that referenced this issue Jan 9, 2018
Made t_tracking_enabled a static class member
to assist with this
[#1016]
@ibaned
Copy link
Contributor Author

ibaned commented Jan 9, 2018

Making t_tracking_enabled a global variable instead of a thread-local variable:

1/1 Test #128: FO_GIS_Gis20km_Tpetra ............   Passed   51.39 sec
1/1 Test #128: FO_GIS_Gis20km_Tpetra ............   Passed   48.53 sec
1/1 Test #128: FO_GIS_Gis20km_Tpetra ............   Passed   48.54 sec

Seems like a tiny improvement, not sure if its worth it or not, at least for this application.

@ibaned
Copy link
Contributor Author

ibaned commented Jan 10, 2018

Keeping tracking_enabled() non-inline (i.e. slow) but reducing the number of calls to it by fixing certain things in Kokkos::View (2b046cc):

1/1 Test #128: FO_GIS_Gis20km_Tpetra ............   Passed   52.37 sec
1/1 Test #128: FO_GIS_Gis20km_Tpetra ............   Passed   51.26 sec
1/1 Test #128: FO_GIS_Gis20km_Tpetra ............   Passed   52.04 sec

@ibaned
Copy link
Contributor Author

ibaned commented Jan 11, 2018

Both optimizations put together:

1/1 Test #128: FO_GIS_Gis20km_Tpetra ............   Passed   51.73 sec
1/1 Test #128: FO_GIS_Gis20km_Tpetra ............   Passed   51.17 sec
1/1 Test #128: FO_GIS_Gis20km_Tpetra ............   Passed   48.46 sec

@ibaned
Copy link
Contributor Author

ibaned commented Jan 18, 2018

On the kokkos-dev machine:

Albany 7660bba3effb1ab19302330dc32f8b5f681272b4
Trilinos ec787f84c3458bd77130b4b52f76d1fff76b8b0b
Kokkos 0a89b4aab5dba2ae302a2bf964fb203702bcd9c3

1/1 Test #45: FO_GIS_Gis20km_Tpetra ............   Passed   66.79 sec
1/1 Test #45: FO_GIS_Gis20km_Tpetra ............   Passed   64.92 sec
1/1 Test #45: FO_GIS_Gis20km_Tpetra ............   Passed   65.48 sec

Kokkos 7fd4dcb03cb9f5af7e3ec4d05cd4a636096a4129

1/1 Test #45: FO_GIS_Gis20km_Tpetra ............   Passed   50.71 sec
1/1 Test #45: FO_GIS_Gis20km_Tpetra ............   Passed   50.87 sec
1/1 Test #45: FO_GIS_Gis20km_Tpetra ............   Passed   50.50 sec

@stanmoore1
Copy link
Contributor

@ibaned can you comment if this is back to normal?

@ibaned
Copy link
Contributor Author

ibaned commented Jan 18, 2018

@stanmoore1 I think we've removed the main source of overhead introduced in Kokkos 2.04.00, and we may even have made it faster than before because we found other issues and repaired them. Merge commit 8c8a483 is the key one. At the end of the day though, the only real truth is application benchmarking with old and new Kokkos versions.

@ibaned ibaned changed the title Slowdown in Albany climate application Slowdown due to tracking_enabled() in 2.04.00 (found by Albany app) Jan 19, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Broken / incorrect code; it could be Kokkos' responsibility, or others’ (e.g., Trilinos)
Projects
None yet
Development

No branches or pull requests

8 participants