11 changes: 7 additions & 4 deletions llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
@@ -2291,10 +2291,13 @@ void SIInsertWaitcnts::updateEventWaitcntAfter(MachineInstr &Inst,
ScoreBrackets->updateByEvent(LDS_ACCESS, Inst);
}

-// This is a flat memory operation that access both VMEM and LDS, so note it
-// - it will require that both the VM and LGKM be flushed to zero if it is
-// pending when a VM or LGKM dependency occurs.
-if (FlatASCount > 1)
+// If this is a truly flat memory operation, then it accesss both VMEM and
Contributor

Nit: accesses

+// LDS, so note it - it will require that both the VM and LGKM be flushed to
+// zero if it is pending when a VM or LGKM dependency occurs.
+//
+// For example, LDS DMA operations have FLAT set in their TSFlags for
+// unspecified reasons, but they are not flat operations)
Comment on lines +2298 to +2299
Contributor

FLAT in TSFlags means they use the FLAT encoding, which is true for all FLAT_*, GLOBAL_*, and SCRATCH_* instructions.

+if (!SIInstrInfo::isLDSDMA(Inst) && FlatASCount > 1)
ScoreBrackets->setPendingFlat();
} else if (SIInstrInfo::isVMEM(Inst) &&
!llvm::AMDGPU::getMUBUFIsBufferInv(Inst.getOpcode())) {
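
The condition added in this hunk can be read in isolation as a small predicate. The sketch below is only an illustration: `MemInstSketch` and `marksPendingFlat` are hypothetical names, not LLVM types or APIs, and `FlatASCount` is assumed to count the address spaces the access may touch, as the comment in the hunk suggests.

```cpp
// Hypothetical stand-in for the two properties the hunk inspects.
// Neither the struct nor the function exists in LLVM.
struct MemInstSketch {
  bool IsLDSDMA;        // mirrors the result of SIInstrInfo::isLDSDMA(Inst)
  unsigned FlatASCount; // assumed: number of address spaces the access may touch
};

// Returns true when the instruction would be recorded as a pending flat
// access under the new condition: LDS DMA is excluded even though it uses
// the FLAT encoding, and more than one address space must be involved.
inline bool marksPendingFlat(const MemInstSketch &I) {
  return !I.IsLDSDMA && I.FlatASCount > 1;
}
```

Under the old condition only `FlatASCount > 1` was checked, so an LDS DMA instruction with a count above one would also have been marked as pending flat.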
3 changes: 2 additions & 1 deletion llvm/test/CodeGen/AMDGPU/lds-dma-waits.ll
@@ -107,9 +107,10 @@ define amdgpu_kernel void @global_load_lds_dword_2_arrays(ptr addrspace(1) nocap
; GFX9-NEXT: s_lshl_b32 s1, s3, 2
; GFX9-NEXT: v_mov_b32_e32 v0, s0
; GFX9-NEXT: v_mov_b32_e32 v1, s1
-; GFX9-NEXT: s_waitcnt vmcnt(0)
+; GFX9-NEXT: s_waitcnt vmcnt(2)
; GFX9-NEXT: ds_read_b32 v0, v0
; GFX9-NEXT: ; wave barrier
+; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: ds_read_b32 v1, v1 offset:256
; GFX9-NEXT: ds_read_b32 v1, v1 offset:256
; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: global_store_dwordx2 v2, v[0:1], s[6:7]
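
The test change replaces a full `vmcnt(0)` wait before the first `ds_read_b32` with a partial `vmcnt(2)` wait and moves the full wait to just before the second read. As a rough sketch of why a non-zero count can suffice, the hypothetical helper below (not part of LLVM) computes the largest `vmcnt` value that still guarantees one particular operation has finished. It assumes the operations complete in issue order, and the numbers in the usage note are illustrative, not taken from the test.

```cpp
// Hypothetical helper, not an LLVM API. Given how many vector-memory
// operations have been issued and the zero-based issue index of the one
// whose result is needed, return the largest vmcnt value that still
// guarantees that operation has completed.
// Assumes in-order completion and NeededIndex < Issued.
constexpr unsigned requiredVmcnt(unsigned Issued, unsigned NeededIndex) {
  // Every operation issued after NeededIndex may remain outstanding.
  return Issued - 1 - NeededIndex;
}
```

For instance, with four operations issued, waiting on the second one allows two to remain in flight (`requiredVmcnt(4, 1)` is 2), while waiting on the last one requires a count of zero.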