AMDGPU does not use write2 with AGPR inputs #155769

@arsenm

#155765 adds some tests which demonstrate suboptimal ds_write2* formation. Both write2 data operands need to be in the same register subclass, either both AGPR or both VGPR. The class cannot be controlled individually per operand.

Additionally, the pass does not try to constrain AV input registers to VGPR, which would also enable more cases. For example:

define void @ds_write2_b32_av_v(ptr addrspace(3) %lds) #0 {
  %gep.0 = getelementptr inbounds [512 x float], ptr addrspace(3) %lds, i32 0, i32 10
  %gep.1 = getelementptr inbounds [512 x float], ptr addrspace(3) %lds, i32 0, i32 24
  %av0 = call i32 asm "; def $0", "=^VA"()
  %v0 = call i32 asm "; def $0", "=v"()
  store i32 %av0, ptr addrspace(3) %gep.0
  store i32 %v0, ptr addrspace(3) %gep.1
  ret void
}

Here %v0 is already a VGPR, while %av0 is in an AV class. We can try calling constrainRegClass on %av0's register to constrain it to VGPR in order to enable the fold.

To fix this, we should start with the following:
1 - Try to constrain the AV classes to VGPRs so that the existing instructions can be used
2 - Define new pseudoinstruction variants of DS_WRITE2* with AGPR data operands
3 - Handle ds_write2* instructions in AMDGPURewriteAGPRCopyMFMA, which has the same problem as MFMAs
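
The selection order implied by steps 1 and 2 can be sketched as a small standalone toy model. This is not LLVM code: `RegClass`, `Write2Form`, `canConstrainTo`, and `pickWrite2Form` are hypothetical names invented for illustration, and the real pass would query register classes through MachineRegisterInfo rather than an enum. The sketch assumes an AV register can be narrowed to either VGPR or AGPR, while a pure VGPR or AGPR register cannot change class.

```cpp
#include <optional>

// Toy model only: none of these names are LLVM APIs.
// AV stands for a register that may be allocated as either a VGPR or an AGPR.
enum class RegClass { VGPR, AGPR, AV };

// Which (hypothetical) ds_write2 variant to form.
enum class Write2Form { VGPRData, AGPRData };

// A register can be constrained to Target if it is already in that class,
// or if it is in the AV superclass.
constexpr bool canConstrainTo(RegClass RC, RegClass Target) {
  return RC == Target || RC == RegClass::AV;
}

// Both data operands must end up in the same subclass.
// Step 1: prefer the existing VGPR form, constraining AV operands to VGPR.
// Step 2: otherwise fall back to an AGPR-data pseudo, if both operands allow it.
// Mixed pure-VGPR and pure-AGPR operands yield no form in this model.
constexpr std::optional<Write2Form> pickWrite2Form(RegClass Data0,
                                                   RegClass Data1) {
  if (canConstrainTo(Data0, RegClass::VGPR) &&
      canConstrainTo(Data1, RegClass::VGPR))
    return Write2Form::VGPRData;
  if (canConstrainTo(Data0, RegClass::AGPR) &&
      canConstrainTo(Data1, RegClass::AGPR))
    return Write2Form::AGPRData;
  return std::nullopt;
}
```

Under these assumptions, the `ds_write2_b32_av_v` example above (one AV operand, one VGPR operand) would select the existing VGPR form, and an AGPR plus AV pair would select the AGPR pseudo. Step 3, rewriting the choice after register allocation decisions are known, is not modeled here.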

Additionally, the same problem exists for atomics whose return values are used. The data input and the return value need to be in the same register class. We will need another set of AGPR pseudos for those.
