fix: allow addptr for pipe gm_addr lowering #486
zhangstevenunity merged 1 commit into hw-native-sys:main from
Conversation
/run a3
Code Review
This pull request introduces a new transformation stage (Stage 1.75) in PTOViewToMemref.cpp to fold AddPtrOp chains used by InitializeL2G2LPipeOp into memref.reinterpret_cast operations, and removes the previous strict error checking for unhandled AddPtrOp instances. Feedback highlights that the current logic is too restrictive because it only processes AddPtrOp if all its users are InitializeL2G2LPipeOp, which causes failures when pointers are shared across different operations. It is recommended to refactor the logic to iterate over InitializeL2G2LPipeOp instances directly and to move the IRRewriter instantiation outside the loop for better performance.
```cpp
SmallVector<mlir::pto::AddPtrOp, 8> addPtrsForPipeInit;
func.walk([&](mlir::pto::AddPtrOp op) {
  bool eligible = !op->use_empty();
  for (Operation *user : op->getUsers()) {
    auto init = dyn_cast<mlir::pto::InitializeL2G2LPipeOp>(user);
    if (!init || init.getGmAddr() != op.getResult()) {
      eligible = false;
      break;
    }
  }
  if (eligible)
    addPtrsForPipeInit.push_back(op);
});

for (auto op : addPtrsForPipeInit) {
  IRRewriter rewriter(ctx);
  rewriter.setInsertionPoint(op);
  Location loc = op.getLoc();

  // Fold the whole addptr chain into a single accumulated offset.
  Value base = op.getPtr();
  Value totalOffset = ensureIndex(rewriter, loc, op.getOffset(), op);
  while (auto add = base.getDefiningOp<mlir::pto::AddPtrOp>()) {
    Value off = ensureIndex(rewriter, loc, add.getOffset(), add);
    totalOffset = rewriter.create<arith::AddIOp>(loc, totalOffset, off);
    base = add.getPtr();
  }

  auto baseMrTy = dyn_cast<MemRefType>(base.getType());
  if (!baseMrTy || baseMrTy.getRank() != 1)
    continue;

  // Build a rank-1 memref with dynamic offset/stride layout and replace
  // the addptr with a memref.reinterpret_cast carrying the folded offset.
  int64_t dyn = ShapedType::kDynamic;
  auto layout = StridedLayoutAttr::get(ctx, dyn, {dyn});
  auto targetTy = MemRefType::get({dyn}, baseMrTy.getElementType(), layout,
                                  baseMrTy.getMemorySpace());
  SmallVector<OpFoldResult, 1> sizes{rewriter.getIndexAttr(1)};
  SmallVector<OpFoldResult, 1> strides{rewriter.getIndexAttr(1)};
  auto rc = rewriter.create<memref::ReinterpretCastOp>(
      loc, targetTy, base, OpFoldResult(totalOffset), sizes, strides);
  rc->setAttr("pto.addptr_trace", rewriter.getUnitAttr());
  rewriter.replaceOp(op, rc.getResult());
}
```
The current implementation of Stage 1.75 is too restrictive because it only folds an AddPtrOp if all of its users are InitializeL2G2LPipeOp. This logic fails in scenarios where an addptr is shared between an initialize_l2g2l_pipe and another operation (even if that other operation is already handled, like a load_scalar or another addptr in a chain).
For example, in a chain arg -> add1 -> init1 and arg -> add1 -> add2 -> init2, add1 will not be collected because it has a user (add2) that is not an init_pipe. Consequently, init1 will continue to use add1, which will have an invalid memref input after Stage 0, leading to verification failures.
It is recommended to follow the pattern used in Stage 1 (MakeTensorView): walk the InitializeL2G2LPipeOp operations instead, and fold the addptr chain specifically for the gm_addr operand. The existing cleanup loop will then correctly erase any addptr that becomes unused.
Additionally, the IRRewriter should be instantiated once outside the loop for better efficiency.
```cpp
SmallVector<mlir::pto::InitializeL2G2LPipeOp, 8> pipeInits;
func.walk([&](mlir::pto::InitializeL2G2LPipeOp op) { pipeInits.push_back(op); });

// Instantiate the rewriter once, outside the loop.
IRRewriter rewriter(ctx);
for (auto op : pipeInits) {
  Value gmAddr = op.getGmAddr();
  auto addPtr = gmAddr.getDefiningOp<mlir::pto::AddPtrOp>();
  if (!addPtr)
    continue;
  rewriter.setInsertionPoint(op);
  Location loc = op.getLoc();

  // Accumulate offsets along the whole addptr chain down to the base.
  Value base = addPtr.getPtr();
  Value totalOffset = ensureIndex(rewriter, loc, addPtr.getOffset(), addPtr);
  while (auto add = base.getDefiningOp<mlir::pto::AddPtrOp>()) {
    Value off = ensureIndex(rewriter, loc, add.getOffset(), add);
    totalOffset = rewriter.create<arith::AddIOp>(loc, totalOffset, off);
    base = add.getPtr();
  }

  auto baseMrTy = dyn_cast<MemRefType>(base.getType());
  if (!baseMrTy || baseMrTy.getRank() != 1)
    continue;

  int64_t dyn = ShapedType::kDynamic;
  auto layout = StridedLayoutAttr::get(ctx, dyn, {dyn});
  auto targetTy = MemRefType::get({dyn}, baseMrTy.getElementType(), layout,
                                  baseMrTy.getMemorySpace());
  SmallVector<OpFoldResult, 1> sizes{rewriter.getIndexAttr(1)};
  SmallVector<OpFoldResult, 1> strides{rewriter.getIndexAttr(1)};
  auto rc = rewriter.create<memref::ReinterpretCastOp>(
      loc, targetTy, base, OpFoldResult(totalOffset), sizes, strides);
  rc->setAttr("pto.addptr_trace", rewriter.getUnitAttr());
  op.setGmAddr(rc.getResult());
}
```
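To make the difference between the two strategies concrete, here is a small Python model (illustrative only; `AddPtr` and `InitPipe` are hypothetical stand-ins for the MLIR ops, and the real pass works on use-def chains, not Python objects) run on the reviewer's example, `arg -> add1 -> init1` and `arg -> add1 -> add2 -> init2`:

```python
# Python model of the two collection strategies (illustrative only;
# AddPtr/InitPipe are hypothetical stand-ins for the MLIR ops).
from dataclasses import dataclass, field

@dataclass(eq=False)
class AddPtr:
    ptr: object
    users: list = field(default_factory=list)

@dataclass(eq=False)
class InitPipe:
    gm_addr: AddPtr

def collect_restrictive(add_ptrs):
    """Stage 1.75 as reviewed: keep an addptr only if ALL users are init_pipe."""
    return [a for a in add_ptrs
            if a.users and all(isinstance(u, InitPipe) for u in a.users)]

def collect_per_init(inits):
    """Recommended: start from each init_pipe and take its gm_addr chain head."""
    return [i.gm_addr for i in inits if isinstance(i.gm_addr, AddPtr)]

# arg -> add1 -> init1  and  arg -> add1 -> add2 -> init2
arg = "arg"
add1 = AddPtr(arg)
add2 = AddPtr(add1)
init1 = InitPipe(add1)
init2 = InitPipe(add2)
add1.users = [init1, add2]   # add2 is not an init_pipe, so add1 is rejected
add2.users = [init2]

assert collect_restrictive([add1, add2]) == [add2]       # add1 is missed
assert collect_per_init([init1, init2]) == [add1, add2]  # both handled
```

The restrictive filter drops `add1`, leaving `init1` with an unfolded operand; the per-init walk reaches every `gm_addr` chain regardless of sharing.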
Codex Review (this comment is auto-updated by the review bot)
Summary: no issues detected in PR #486; returned findings=[].
Findings: No issues found.
A3 board test completed (some cases skipped)
Force-pushed 6c290ac to 45f61f3
A5 board test failed
Log tail:
A3 board test failed (Qwen 1/2)
Failing cases:
A3 board test failed (full run 2/2)
Failing cases:
fix issue #481
Problem
./build/tools/ptoas/ptoas --pto-arch=a3 --enable-insert-sync addptr.pto fails with an error about the addptr type/use chain.
Cause
aic/aiv_initialize_pipe is first lowered to initialize_l2g2l_pipe, but the addptr on this gm_addr path was never resolved; in addition, the original rule was overly strict about which addptr use sites are allowed.
Solution
In PTOViewToMemref, add addptr folding for the initialize_l2g2l_pipe(gm_addr) case (lowering the chain to memref.reinterpret_cast offset semantics), and remove the hard-failure restriction so that any remaining unfolded addptr is handled by later stages. With this change the command above passes.
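The fold itself reduces to walking defining ops back to the base pointer while summing offsets; a minimal Python model of that walk (illustrative only — the real pass emits arith.addi ops and a memref.reinterpret_cast instead of computing integers):

```python
# Minimal model of the addptr-chain fold: walk defining ops back to the
# base buffer and accumulate offsets (illustrative; not the pass code).
from dataclasses import dataclass

@dataclass(frozen=True)
class AddPtr:
    ptr: object   # base: another AddPtr, or the underlying buffer
    offset: int

def fold_chain(op: AddPtr):
    total, base = op.offset, op.ptr
    while isinstance(base, AddPtr):
        total += base.offset
        base = base.ptr
    return base, total

# addptr(addptr(buf, 8), 4) folds to (buf, offset 12)
buf = "gm_buffer"
assert fold_chain(AddPtr(AddPtr(buf, 8), 4)) == (buf, 12)
```

In the actual pass the resulting (base, total offset) pair becomes the operands of the memref.reinterpret_cast that replaces the gm_addr operand.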