Add MI355X details #108

Merged
msaroufim merged 4 commits into main from amd-comp-3 on Mar 4, 2026

@msaroufim commented Mar 4, 2026

Summary

  • amd-mxfp4-mm: MXFP4 matrix multiplication — 4/4 tests pass, benchmarks pass
  • amd-moe-mxfp4: Fused MoE with MXFP4 quantization — tests fail due to aiter non-determinism (see known issue below)
  • amd-mixed-mla: Mixed-precision Multi-head Latent Attention — 4/4 tests pass, 8/8 benchmarks pass
  • Problems sourced from AMD-AIM/reference-kernels branch 20260209
  • Competition YAML: amd_202602.yaml
  • Shared eval.py includes regex fix for underscore-containing test case keys and boolean value parsing
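The eval.py change in the last bullet is not shown in this description. As a rough illustration only, assuming test cases are written as `key: value` pairs separated by semicolons (an assumed format), parsing that accepts underscored keys and boolean values could look like the sketch below. `PAIR_RE`, `parse_value`, and `parse_test_case` are hypothetical names, not the actual eval.py code.

```python
import re

# Hypothetical pattern (not taken from eval.py): a key made of letters,
# digits, and underscores, a colon, then a value up to the next semicolon.
PAIR_RE = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*)\s*:\s*([^;]+?)\s*(?:;|$)")


def parse_value(text):
    """Convert a raw value to bool, int, or float; fall back to str."""
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text


def parse_test_case(line):
    """Parse 'key: value; key_2: value' into a dict of typed values."""
    return {key: parse_value(raw) for key, raw in PAIR_RE.findall(line)}


if __name__ == "__main__":
    print(parse_test_case("num_heads: 16; use_tp: true; seed: 42"))
```

The point of the sketch is that the key group must allow `_`, and that `true`/`false` need to be checked before the numeric casts.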

Known issue

The moe-mxfp4 reference submission calls aiter's fused_moe kernel, which is non-deterministic on MI355X: it produces different results across calls with identical inputs, so the reference fails correctness checks against itself. This is documented in the task.yml.
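The failure mode above can be illustrated with a small self-consistency check: call a kernel several times on identical inputs and compare each output to the first. This is a minimal sketch using plain Python lists and stand-in kernels. It does not call aiter, and `is_deterministic`, `stable_sum`, and `drifting_sum` are hypothetical names introduced for illustration.

```python
import itertools


def is_deterministic(kernel, inputs, runs=3, tol=0.0):
    """Call kernel repeatedly on identical inputs; compare each output to the first."""
    baseline = kernel(*inputs)
    for _ in range(runs - 1):
        result = kernel(*inputs)
        if len(result) != len(baseline):
            return False
        if any(abs(a - b) > tol for a, b in zip(result, baseline)):
            return False
    return True


def stable_sum(xs):
    # Stand-in for a deterministic kernel: same inputs, same output.
    return [sum(xs)]


_calls = itertools.count()


def drifting_sum(xs):
    # Stand-in for a non-deterministic kernel: output depends on hidden state.
    return [sum(xs) + next(_calls) * 1e-3]


if __name__ == "__main__":
    data = ([1.0, 2.0, 3.0],)
    print(is_deterministic(stable_sum, data))
    print(is_deterministic(drifting_sum, data))
```

A reference that fails this kind of check cannot pass a correctness test against itself unless the comparison tolerance is wider than the run-to-run variation.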

Test plan

  • E2E tested all 3 problems on MI355X via kernelbot GitHub runner (mia1-p02-g29)
  • mxfp4-mm: 4/4 tests pass, 1/1 benchmark pass (0.015 ms)
  • mixed-mla: 4/4 tests pass, 8/8 benchmarks pass (0.58–35.7 ms)
  • moe-mxfp4: runs through pipeline but fails correctness (aiter bug, documented)

Mark Saroufim added 2 commits March 3, 2026 18:13
3 problems targeting MI355X from AMD-AIM/reference-kernels@20260209.
Also fixes eval.py regex to support underscored keys and booleans.

aiter's fused_moe kernel produces different results across calls with
identical inputs on gfx950, causing the reference submission to fail
correctness checks against itself.
@msaroufim changed the title from "Add AMD February 2026 competition (MI355X)" to "Add MI355X details" on Mar 4, 2026
Mark Saroufim added 2 commits March 4, 2026 07:29
- mixed-mla: Add tp (tensor parallel) parameter, variable num_heads,
  qseqlen=4 prefill cases, updated test/benchmark shapes
- moe-mxfp4: Updated benchmark shapes with TP=4/TP=8 variants,
  different batch sizes
- mxfp4-mm: Added m=32 benchmark, adjusted shape set

aiter JIT compilation on first run can take 10+ minutes on MI355X,
causing test timeouts. Bump all timeouts to 1800s (30 min).
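To see why a first run can exceed a timeout that later runs meet easily, it can help to time the first call separately from the ones that follow. The sketch below is a generic illustration, not part of this PR. `time_first_and_steady` is a hypothetical helper, and the `clock` parameter exists only so the timer can be swapped out.

```python
import time


def time_first_and_steady(fn, repeats=5, clock=time.perf_counter):
    """Return (first_call_seconds, mean_seconds_of_later_calls)."""
    start = clock()
    fn()  # the first call pays any one-time cost, such as JIT compilation
    first = clock() - start

    later = []
    for _ in range(repeats):
        start = clock()
        fn()
        later.append(clock() - start)
    return first, sum(later) / len(later)


if __name__ == "__main__":
    first, steady = time_first_and_steady(lambda: sum(range(100_000)))
    print(f"first call: {first:.6f} s, steady state: {steady:.6f} s")
```

A timeout has to cover the first number, while benchmark results should reflect the second.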
@msaroufim merged commit 9f717a2 into main on Mar 4, 2026
msaroufim added a commit that referenced this pull request Mar 4, 2026