fix: allow addptr for pipe gm_addr lowering #486
zhangstevenunity merged 1 commit into hw-native-sys:main from
Conversation
/run a3
Code Review
This pull request introduces a new transformation stage (Stage 1.75) in PTOViewToMemref.cpp to fold AddPtrOp chains used by InitializeL2G2LPipeOp into memref.reinterpret_cast operations, and removes the previous strict error checking for unhandled AddPtrOp instances. Feedback highlights that the current logic is too restrictive because it only processes AddPtrOp if all its users are InitializeL2G2LPipeOp, which causes failures when pointers are shared across different operations. It is recommended to refactor the logic to iterate over InitializeL2G2LPipeOp instances directly and to move the IRRewriter instantiation outside the loop for better performance.
```cpp
SmallVector<mlir::pto::AddPtrOp, 8> addPtrsForPipeInit;
func.walk([&](mlir::pto::AddPtrOp op) {
  bool eligible = !op->use_empty();
  for (Operation *user : op->getUsers()) {
    auto init = dyn_cast<mlir::pto::InitializeL2G2LPipeOp>(user);
    if (!init || init.getGmAddr() != op.getResult()) {
      eligible = false;
      break;
    }
  }
  if (eligible)
    addPtrsForPipeInit.push_back(op);
});

for (auto op : addPtrsForPipeInit) {
  IRRewriter rewriter(ctx);
  rewriter.setInsertionPoint(op);
  Location loc = op.getLoc();

  // Fold the whole addptr chain into a single accumulated offset.
  Value base = op.getPtr();
  Value totalOffset = ensureIndex(rewriter, loc, op.getOffset(), op);
  while (auto add = base.getDefiningOp<mlir::pto::AddPtrOp>()) {
    Value off = ensureIndex(rewriter, loc, add.getOffset(), add);
    totalOffset = rewriter.create<arith::AddIOp>(loc, totalOffset, off);
    base = add.getPtr();
  }

  auto baseMrTy = dyn_cast<MemRefType>(base.getType());
  if (!baseMrTy || baseMrTy.getRank() != 1)
    continue;

  // Build a rank-1 memref with dynamic offset/stride layout and replace
  // the addptr with a memref.reinterpret_cast carrying the folded offset.
  int64_t dyn = ShapedType::kDynamic;
  auto layout = StridedLayoutAttr::get(ctx, dyn, {dyn});
  auto targetTy = MemRefType::get({dyn}, baseMrTy.getElementType(), layout,
                                  baseMrTy.getMemorySpace());
  SmallVector<OpFoldResult, 1> sizes{rewriter.getIndexAttr(1)};
  SmallVector<OpFoldResult, 1> strides{rewriter.getIndexAttr(1)};
  auto rc = rewriter.create<memref::ReinterpretCastOp>(
      loc, targetTy, base, OpFoldResult(totalOffset), sizes, strides);
  rc->setAttr("pto.addptr_trace", rewriter.getUnitAttr());
  rewriter.replaceOp(op, rc.getResult());
}
```
The current implementation of Stage 1.75 is too restrictive because it only folds an AddPtrOp if all of its users are InitializeL2G2LPipeOp. This logic fails in scenarios where an addptr is shared between an initialize_l2g2l_pipe and another operation (even if that other operation is already handled, like a load_scalar or another addptr in a chain).
For example, in a chain arg -> add1 -> init1 and arg -> add1 -> add2 -> init2, add1 will not be collected because it has a user (add2) that is not an init_pipe. Consequently, init1 will continue to use add1, which will have an invalid memref input after Stage 0, leading to verification failures.
It is recommended to follow the pattern used in Stage 1 (MakeTensorView): walk the InitializeL2G2LPipeOp operations instead, and fold the addptr chain specifically for the gm_addr operand. The existing cleanup loop will then correctly erase any addptr that becomes unused.
Additionally, the IRRewriter should be instantiated once outside the loop for better efficiency.
```cpp
SmallVector<mlir::pto::InitializeL2G2LPipeOp, 8> pipeInits;
func.walk([&](mlir::pto::InitializeL2G2LPipeOp op) { pipeInits.push_back(op); });

// Instantiate the rewriter once, outside the loop.
IRRewriter rewriter(ctx);
for (auto op : pipeInits) {
  Value gmAddr = op.getGmAddr();
  auto addPtr = gmAddr.getDefiningOp<mlir::pto::AddPtrOp>();
  if (!addPtr)
    continue;
  rewriter.setInsertionPoint(op);
  Location loc = op.getLoc();

  // Accumulate offsets along the whole addptr chain down to the base.
  Value base = addPtr.getPtr();
  Value totalOffset = ensureIndex(rewriter, loc, addPtr.getOffset(), addPtr);
  while (auto add = base.getDefiningOp<mlir::pto::AddPtrOp>()) {
    Value off = ensureIndex(rewriter, loc, add.getOffset(), add);
    totalOffset = rewriter.create<arith::AddIOp>(loc, totalOffset, off);
    base = add.getPtr();
  }

  auto baseMrTy = dyn_cast<MemRefType>(base.getType());
  if (!baseMrTy || baseMrTy.getRank() != 1)
    continue;

  int64_t dyn = ShapedType::kDynamic;
  auto layout = StridedLayoutAttr::get(ctx, dyn, {dyn});
  auto targetTy = MemRefType::get({dyn}, baseMrTy.getElementType(), layout,
                                  baseMrTy.getMemorySpace());
  SmallVector<OpFoldResult, 1> sizes{rewriter.getIndexAttr(1)};
  SmallVector<OpFoldResult, 1> strides{rewriter.getIndexAttr(1)};
  auto rc = rewriter.create<memref::ReinterpretCastOp>(
      loc, targetTy, base, OpFoldResult(totalOffset), sizes, strides);
  rc->setAttr("pto.addptr_trace", rewriter.getUnitAttr());
  op.setGmAddr(rc.getResult());
}
```
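To make the difference between the two strategies concrete, here is a small Python model (illustrative only; `AddPtr` and `InitPipe` are hypothetical stand-ins for the MLIR ops, and the real pass works on use-def chains, not Python objects) run on the reviewer's example, `arg -> add1 -> init1` and `arg -> add1 -> add2 -> init2`:

```python
# Python model of the two collection strategies (illustrative only;
# AddPtr/InitPipe are hypothetical stand-ins for the MLIR ops).
from dataclasses import dataclass, field

@dataclass(eq=False)
class AddPtr:
    ptr: object
    users: list = field(default_factory=list)

@dataclass(eq=False)
class InitPipe:
    gm_addr: AddPtr

def collect_restrictive(add_ptrs):
    """Stage 1.75 as reviewed: keep an addptr only if ALL users are init_pipe."""
    return [a for a in add_ptrs
            if a.users and all(isinstance(u, InitPipe) for u in a.users)]

def collect_per_init(inits):
    """Recommended: start from each init_pipe and take its gm_addr chain head."""
    return [i.gm_addr for i in inits if isinstance(i.gm_addr, AddPtr)]

# arg -> add1 -> init1  and  arg -> add1 -> add2 -> init2
arg = "arg"
add1 = AddPtr(arg)
add2 = AddPtr(add1)
init1 = InitPipe(add1)
init2 = InitPipe(add2)
add1.users = [init1, add2]   # add2 is not an init_pipe, so add1 is rejected
add2.users = [init2]

assert collect_restrictive([add1, add2]) == [add2]       # add1 is missed
assert collect_per_init([init1, init2]) == [add1, add2]  # both handled
```

The restrictive filter drops `add1`, leaving `init1` with an unfolded operand; the per-init walk reaches every `gm_addr` chain regardless of sharing.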
Codex Review (this comment is auto-updated by the review bot)
Summary: no issues detected in PR #486; returned findings=[].
Findings: No issues found.
A3 board test completed (some cases skipped)
Force-pushed 6c290ac to 45f61f3
A5 board test failed
Log tail:
A3 board test failed (Qwen 1/2)
Failing cases:
A3 board test failed (full run 2/2)
Failing cases:
fix issue #481
Problem
./build/tools/ptoas/ptoas --pto-arch=a3 --enable-insert-sync addptr.pto fails with an error about the addptr type/use chain.
Cause
aic/aiv_initialize_pipe is first lowered to initialize_l2g2l_pipe, but the addptr on this gm_addr path was never resolved; in addition, the original rule was overly strict about which addptr use sites are allowed.
Solution
In PTOViewToMemref, add addptr folding for the initialize_l2g2l_pipe(gm_addr) case (lowering the chain to memref.reinterpret_cast offset semantics), and remove the hard-failure restriction so that any remaining unfolded addptr is handled by later stages. With this change the command above passes.
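The fold itself reduces to walking defining ops back to the base pointer while summing offsets; a minimal Python model of that walk (illustrative only — the real pass emits arith.addi ops and a memref.reinterpret_cast instead of computing integers):

```python
# Minimal model of the addptr-chain fold: walk defining ops back to the
# base buffer and accumulate offsets (illustrative; not the pass code).
from dataclasses import dataclass

@dataclass(frozen=True)
class AddPtr:
    ptr: object   # base: another AddPtr, or the underlying buffer
    offset: int

def fold_chain(op: AddPtr):
    total, base = op.offset, op.ptr
    while isinstance(base, AddPtr):
        total += base.offset
        base = base.ptr
    return base, total

# addptr(addptr(buf, 8), 4) folds to (buf, offset 12)
buf = "gm_buffer"
assert fold_chain(AddPtr(AddPtr(buf, 8), 4)) == (buf, 12)
```

In the actual pass the resulting (base, total offset) pair becomes the operands of the memref.reinterpret_cast that replaces the gm_addr operand.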