Hotfix/stackframe #1369

Merged
merged 19 commits into from
Apr 8, 2023

maddyscientist
Member

This PR primarily serves to remove the induced stack frame in a number of kernels:

  • Use flavor/chirality swizzle trick in non-degenerate Twisted-mass preconditioned clover application kernel
  • Laplace and staggered quark smearing kernels with NVSHMEM (these lacked __forceinline__)
  • Use SharedMemoryCache to act as virtual registers (Symanzik improved Wilson-flow and STOUT kernels)
    • In doing so we add support for multiple concurrent dynamic cache objects through use of an offset to the base shared-memory pointer
  • Reordering of clover derivative kernel to reduce register pressure
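
As a rough illustration of the shared-memory bullet above, the following is a minimal sketch of two cache objects sharing one dynamic shared-memory allocation, each starting at a different byte offset from the base pointer. It is not QUDA's SharedMemoryCache: the `SmemView` type, the `two_caches` kernel, and the launch sizes are hypothetical names invented here, and the sketch assumes each thread only touches its own slot, so no block synchronization is needed.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (NOT QUDA's SharedMemoryCache): a typed view into the
// single dynamic shared-memory allocation, starting offset_bytes past its base.
template <typename T>
struct SmemView {
  T *data;
  __device__ __forceinline__ explicit SmemView(unsigned offset_bytes) {
    // One untyped base array avoids conflicting extern declarations per T.
    extern __shared__ __align__(16) unsigned char smem_base[];
    data = reinterpret_cast<T *>(smem_base + offset_bytes);
  }
  __device__ __forceinline__ T &operator[](int i) { return data[i]; }
};

// Each thread parks two intermediates in shared memory ("virtual registers")
// instead of holding them in real registers or spilling to a stack frame.
__global__ void two_caches(const float *in, float *out, int n) {
  const int i = blockIdx.x * blockDim.x + threadIdx.x;
  SmemView<float> a(0);                           // first cache at the base
  SmemView<float> b(blockDim.x * sizeof(float));  // second cache, offset past the first
  if (i < n) {
    a[threadIdx.x] = 2.0f * in[i];
    b[threadIdx.x] = in[i] + 1.0f;
    out[i] = a[threadIdx.x] + b[threadIdx.x];     // each thread reads only its own slots
  }
}

int main() {
  const int n = 256, threads = 128;
  float *in = nullptr, *out = nullptr;
  if (cudaMallocManaged(&in, n * sizeof(float)) != cudaSuccess) return 1;
  if (cudaMallocManaged(&out, n * sizeof(float)) != cudaSuccess) return 1;
  for (int i = 0; i < n; ++i) in[i] = static_cast<float>(i);

  // Dynamic shared memory must cover both caches: 2 * threads floats per block.
  const size_t smem_bytes = 2 * threads * sizeof(float);
  two_caches<<<n / threads, threads, smem_bytes>>>(in, out, n);
  if (cudaDeviceSynchronize() != cudaSuccess) return 1;

  // Check against the host-side formula 2*x + (x + 1) = 3*x + 1.
  int mismatches = 0;
  for (int i = 0; i < n; ++i)
    if (out[i] != 3.0f * in[i] + 1.0f) ++mismatches;
  std::printf("mismatches: %d\n", mismatches);

  cudaFree(in);
  cudaFree(out);
  return mismatches == 0 ? 0 : 1;
}
```

The point of the offset is that the kernel launch reserves a single dynamic shared-memory region, so any second cache object has to be told where the first one ends. How QUDA computes and passes that offset is not shown in this PR description.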

Some other minor changes:

@kostrzewa
Member

Doesn't seem to break anything on 8xP100 in the tmLQCD HMC.

Review comment on include/quda_matrix.h (outdated, resolved)
Review comment on lib/gauge_wilson_flow.cu (outdated, resolved)
@weinbe2 mentioned this pull request Apr 7, 2023
Contributor

@weinbe2 left a comment

Looks great, thanks @maddyscientist! Once Jenkins completes, this is good to merge.

3 participants