Don't use __constant__ cache for lock arrays, enable once per run update instead of once per call #1385
@crtrott let me get this straight. I think these are true?
If so, some questions:
Yes, right on every front. And no, this is not an issue as long as we don't hit a hundred million translation units, and while I am hesitant to put anything beyond our capacity to produce stupendously complex software, I believe even we won't reach that ;-)
To answer the other question: we need to update the first time we call a kernel in a given translation unit. That's what I tried to do, though actually, thinking about it, I might do it more often than that (i.e. once per kernel), but either should solve our performance issue.
Okay, so basically each translation unit will have an initialize, but it is persistent, so if we visit the same translation unit twice it won't re-copy the arrays. That sounds pretty acceptable to me. Can we get a PR for this by the February milestone? I can help if needed.
The PR is already there. You just need to approve it 👍
Address issue #1385 not using __constant__ for lock arrays on CUDA
Based on the discussion in #1375 I checked what happens if you don't use constant cache for the lock arrays. Basically we lose something like 2% in a "big atomics" benchmark (kokkos/benchmarks/atomics using
./test.cuda 100000 100000 100 1000 1 100 5
) both on Pascal and Kepler. But in miniMD for a small test I get the same performance as with RDC now (that test doesn't need the lock arrays). Being able to do this is based on the fact that device symbols have different scope semantics than device constant symbols, according to discussions with NVIDIA folks.