@broxigarchen
Contributor

@broxigarchen broxigarchen commented Oct 29, 2025

The issue in #157795 is also reproducible on gfx12.

This workaround should only take effect in true16 mode. Although gfx12 does not yet have true16 mode enabled, downstream branches may test true16 mode manually, so enable the workaround now.
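
As a rough illustration of why this feature can change waitcnt placement, here is a toy model in Python. It is not LLVM code. The names `Half` and `needs_wait` are hypothetical, and the semantics (a D16 load treated as writing the whole 32-bit VGPR when the feature is on) are an assumption inferred from the feature name, not taken from the backend source.

```python
# Toy model only: decides whether a read of a 16-bit register half must wait
# for a D16 load that is still in flight. All names here are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class Half:
    vgpr: int   # index of the 32-bit VGPR, e.g. 0 for v0
    hi: bool    # False = low half (v0.l), True = high half (v0.h)


def needs_wait(pending_dst: Half, read: Half, d16_writes_32bit_vgpr: bool) -> bool:
    """Return True if `read` has to wait for the pending D16 load to `pending_dst`."""
    if pending_dst.vgpr != read.vgpr:
        # Different 32-bit VGPRs never conflict in this model.
        return False
    if d16_writes_32bit_vgpr:
        # Assumed workaround behavior: the load counts as writing both halves.
        return True
    # Without the feature, only the half that is actually loaded conflicts.
    return pending_dst.hi == read.hi


v0_lo = Half(vgpr=0, hi=False)
v0_hi = Half(vgpr=0, hi=True)
v1_lo = Half(vgpr=1, hi=False)
```

Under these assumptions, a pending load into `v0_lo` followed by a read of `v0_hi` needs a wait only when the feature is on, which is the kind of extra waitcnt a true16 gfx12 test would be expected to show.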

@broxigarchen broxigarchen marked this pull request as ready for review October 29, 2025 16:00
@broxigarchen broxigarchen changed the title from "[AMDGPU] enable D16Writes32BitVgpr for gfx12" to "[AMDGPU][CodeGen] enable D16Writes32BitVgpr for gfx12" Oct 29, 2025
@llvmbot
Member

llvmbot commented Oct 29, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Brox Chen (broxigarchen)

Changes

The issue in #157795 is also reproducible on gfx12.


Full diff: https://github.com/llvm/llvm-project/pull/165587.diff

1 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPU.td (+2-1)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td
index 54d94b1f8682e..84121138a40df 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.td
@@ -1614,7 +1614,8 @@ def FeatureGFX12 : GCNSubtargetFeatureGeneration<"GFX12",
    FeatureDefaultComponentBroadcast, FeatureMaxHardClauseLength32,
    FeatureAtomicFMinFMaxF32GlobalInsts, FeatureAtomicFMinFMaxF32FlatInsts,
    FeatureIEEEMinimumMaximumInsts, FeatureMinimum3Maximum3F32,
-   FeatureMinimum3Maximum3F16, FeatureAgentScopeFineGrainedRemoteMemoryAtomics
+   FeatureMinimum3Maximum3F16, FeatureAgentScopeFineGrainedRemoteMemoryAtomics,
+   FeatureD16Writes32BitVgpr
   ]
 >;
 

Contributor

@Sisyph Sisyph left a comment

There should be a test. For example we should see a new waitcnt on the true16 gfx12 path in some test.

@broxigarchen broxigarchen force-pushed the main-fix-gfx12-d16 branch 2 times, most recently from a4d3236 to ad88a93 Compare October 31, 2025 19:02
@broxigarchen
Contributor Author

broxigarchen commented Oct 31, 2025

> There should be a test. For example we should see a new waitcnt on the true16 gfx12 path in some test.

It seems we don't have any gfx12 RUN line that generates a 16-bit-related waitcnt. I updated one test to add gfx12 RUN lines with and without the d16writevgpr32 attribute, but the difference is still not very easy to see in the diff.
