CUDA atomic_fetch_sub for doubles is hitting CAS instead of intrinsic #1624

Closed
crtrott opened this issue May 16, 2018 · 3 comments
Comments

@crtrott
Member

crtrott commented May 16, 2018

This manifested in LAMMPS and ExaMiniMD where atomic_fetch_sub was exercised via the -= operator of an atomic view.
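For readers unfamiliar with the distinction in the title, here is a minimal host-side C++ sketch of the two paths an atomic subtract on a double can take. It is not Kokkos' implementation, and the function names `fetch_sub_cas`, `fetch_add_stand_in`, and `fetch_sub_via_add` are hypothetical. The first function is a generic compare-and-swap (CAS) retry loop. The second path rewrites the subtraction as an addition of the negated value, so it can be forwarded to whatever atomic add the platform provides. On a CUDA device of compute capability 6.0 or higher that would be the `atomicAdd(double*, double)` intrinsic. On the host, the stand-in below is itself a CAS loop, because `std::atomic<double>::fetch_add` requires C++20.

```cpp
#include <atomic>

// Generic fallback (hypothetical helper): emulate fetch_sub with a CAS loop.
// Under contention every failed exchange forces another iteration.
inline double fetch_sub_cas(std::atomic<double>& target, double val) {
  double observed = target.load();
  // On failure compare_exchange_weak stores the current value in `observed`,
  // so the next attempt is computed from fresh data.
  while (!target.compare_exchange_weak(observed, observed - val)) {
  }
  return observed;  // value held before the subtraction
}

// Stand-in for a native atomic add (hypothetical helper). Assumption: on a
// CUDA device with compute capability >= 6.0 this role would be played by
// atomicAdd(double*, double); here it is a CAS loop only so that the sketch
// compiles as plain C++11.
inline double fetch_add_stand_in(std::atomic<double>& target, double val) {
  double observed = target.load();
  while (!target.compare_exchange_weak(observed, observed + val)) {
  }
  return observed;
}

// Subtraction expressed as addition of the negated operand, so the call can
// be routed to the platform's atomic add instead of a dedicated CAS loop.
inline double fetch_sub_via_add(std::atomic<double>& target, double val) {
  return fetch_add_stand_in(target, -val);
}
```

Both variants return the previous value and leave the same result in `target`. The difference the issue describes is purely one of cost: a hardware atomic add completes in one operation, whereas a CAS loop may retry many times when many threads update the same address.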

@crtrott crtrott added Bug Broken / incorrect code; it could be Kokkos' responsibility, or others’ (e.g., Trilinos) Blocks Promotion Overview issue for release-blocking bugs labels May 16, 2018
@crtrott crtrott added this to the 2018 April milestone May 16, 2018
@crtrott crtrott self-assigned this May 16, 2018
@mhoemmen
Contributor

woah

@stanmoore1
Contributor

This caused a huge performance regression in LAMMPS.

crtrott added a commit that referenced this issue May 16, 2018
crtrott added a commit that referenced this issue May 16, 2018
Fix #1624 (AtomicSub on CUDA) and #1626 (KOKKOS_INLINE_FUNCTION) in UniqueToken for Serial
@crtrott crtrott added InDevelop and removed Blocks Promotion Overview issue for release-blocking bugs labels May 16, 2018
@stanmoore1
Contributor

stanmoore1 commented May 16, 2018

I confirm that #1627 fixes the performance regression in LAMMPS. Thanks.
