@ssahasra ssahasra commented Dec 2, 2025

Flat instructions need a waitcnt(0) on both VMEM and LDS accesses, but only when the instruction really is using flat addressing. The LDS DMA instructions (on GFX9) have the FLAT flag set, but their semantics are well defined: they update only VM_CNT (on GFX9), and hence do not need to be treated like actual flat instructions.
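
To make the distinction concrete, here is a minimal, self-contained sketch of the rule described above. It is a toy model, not the code in this patch: `ToyInst`, `countersUpdated`, `needsWaitOnBothCounters`, and the two counter constants are hypothetical names invented for illustration, and the model assumes that only an instruction which may reach both VMEM and LDS through a flat address needs the conservative wait on both counters.

```cpp
// Toy model only: every name here is hypothetical and is not an LLVM API.
constexpr unsigned VmCnt = 1u << 0;   // stands in for VM_CNT
constexpr unsigned LgkmCnt = 1u << 1; // stands in for LGKM_CNT

struct ToyInst {
  bool HasFlatEncoding; // models the FLAT bit in TSFlags (an encoding property)
  bool IsLdsDma;        // models an LDS DMA instruction on GFX9
  bool MayAccessVmem;   // the address may resolve to VMEM
  bool MayAccessLds;    // the address may resolve to LDS
};

// Returns a bitmask of the counters this instruction is assumed to update.
constexpr unsigned countersUpdated(const ToyInst &I) {
  if (!I.HasFlatEncoding)
    return 0; // instructions without the FLAT encoding are outside this model
  if (I.IsLdsDma)
    return VmCnt; // per the description: LDS DMA updates only VM_CNT on GFX9
  unsigned Mask = 0;
  if (I.MayAccessVmem)
    Mask |= VmCnt;
  if (I.MayAccessLds)
    Mask |= LgkmCnt;
  return Mask;
}

// Only an instruction that may touch both memories needs the wait on both.
constexpr bool needsWaitOnBothCounters(const ToyInst &I) {
  return countersUpdated(I) == (VmCnt | LgkmCnt);
}

// A truly flat access is treated conservatively.
static_assert(needsWaitOnBothCounters(ToyInst{true, false, true, true}),
              "truly flat access waits on both counters");
// An LDS DMA instruction carries the FLAT encoding but is not treated as flat.
static_assert(!needsWaitOnBothCounters(ToyInst{true, true, true, true}),
              "LDS DMA is tracked through VM_CNT only");
```

The two `static_assert` lines show the intended difference: the FLAT encoding bit alone does not decide the outcome; what matters is which counters the instruction can actually update.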

@ssahasra ssahasra merged commit cb8ce28 into llvm:main Dec 4, 2025
11 of 12 checks passed
@ssahasra ssahasra deleted the ssahasra/ldsdma-noflat branch December 4, 2025 11:53
// - it will require that both the VM and LGKM be flushed to zero if it is
// pending when a VM or LGKM dependency occurs.
if (FlatASCount > 1)
// If this is a truly flat memory operation, then it accesss both VMEM and

Nit: accesses

Comment on lines +2298 to +2299
// For example, LDS DMA operations have FLAT set in their TSFlags for
// unspecified reasons, but they are not flat operations)

FLAT in TSFlags means they use the FLAT encoding, which is true for all FLAT_*, GLOBAL_*, and SCRATCH_* instructions.

kcloudy0717 pushed a commit to kcloudy0717/llvm-project that referenced this pull request Dec 4, 2025
…m#170263)