Allow selected zero-valid tileop verifier paths #720

Merged
zhangstevenunity merged 1 commit into main from codex/issue708-zero-valid-subview
May 28, 2026

Conversation

@zhangstevenunity
Collaborator

@zhangstevenunity zhangstevenunity commented May 28, 2026

Fixes #708.

This PR relaxes verifier/type semantics for selected zero-valid tile regions:

  • permit explicit subview valid_row/valid_col constants of 0 and allow the result tilebuf to carry v_row=0 or v_col=0
  • allow selected tileop verifier paths to accept valid_shape 0 where the operation can consistently describe an empty valid region, including load/prefetch, matmul valid m/k/n bounds, rem tmp rows, histogram/mgather-style fixed valid extents, and store valid-column ranges
  • keep tgemv row-valid, row/col reduction, and rowexpand one-sided empty cases at their previous stricter verifier constraints
  • keep EmitC no-op/erase behavior out of scope

Validation:

  • cmake --build build-wsl --target ptoas -j8
  • issue708_zero_valid_tileops.pto
  • issue708_zero_valid_subview.pto
  • subview_validshape_guard.pto
  • issue708_zero_valid_tgemv_invalid.pto
  • issue708_zero_valid_row_reduction_invalid.pto
  • issue708_zero_valid_col_reduction_invalid.pto
  • issue708_zero_valid_rowexpand_invalid.pto
  • tstore_forms_emitc.pto --check-prefix=A3
  • tcvt_emitc.pto --check-prefix=A3/A5

@reedhecre

reedhecre commented May 28, 2026

Codex Review

This comment is updated automatically by the review bot.

  • PR: #720 Allow selected zero-valid tileop verifier paths
  • Author: zhangstevenunity
  • Base/Head: main / codex/issue708-zero-valid-subview
  • Head SHA: 1f9dab4d1e2c
  • Trigger: new commits pushed to the PR
  • Generated At: 2026-05-28T06:01:30Z
  • Previous Head SHA: 858a55d45b15
  • Status: completed

Summary

Verifier relaxations in this PR are too broad for mgather/mscatter, thistogram, and trem/trems: they now accept zero-valid selector/tmp shapes that can still reach runtime on non-empty ops.

Findings

  1. P1 `mgather`/`mscatter` row-mode now accepts selector tiles with no valid selector lane lib/PTO/IR/PTO.cpp:3422

The new isKnownZeroOrUnitExtent checks widen the row-mode index contract from [1, R] / [R, 1] to [0|1, R] / [R, 0|1]. That means a non-empty gather/scatter can now verify with idx.valid_shape = [0, R] or [R, 0], even though row mode still needs one valid selector row/column to provide idx[r]. These IRs lower unchanged to MGATHER/MSCATTER, so this is a real wrong-results/runtime risk rather than just a doc mismatch. If the goal was only to allow empty ops, the zero case needs to be gated on data.valid_row == 0.
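The gating suggested above can be sketched as a standalone check. This is not the PTO.cpp verifier: `rowModeSelectorOk`, `ValidShape`, `matches`, and the local `kDynamic` sentinel are hypothetical names, and passing R as a separate `rows` argument is an assumption about how the contract is parameterized.

```cpp
#include <array>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for ShapedType::kDynamic (not the MLIR definition).
constexpr int64_t kDynamic = std::numeric_limits<int64_t>::min();
using ValidShape = std::array<int64_t, 2>;

// A dynamic extent cannot be checked statically, so it matches anything.
inline bool matches(int64_t extent, int64_t want) {
  return extent == kDynamic || extent == want;
}

// Row-mode selector check: idx.valid_shape must be [1, R] or [R, 1].
// A zero selector extent is accepted only when the data tile is statically
// empty (dataValidRow == 0), so a non-empty gather always has a selector lane.
bool rowModeSelectorOk(ValidShape idx, int64_t rows, int64_t dataValidRow) {
  bool strict = (matches(idx[0], 1) && matches(idx[1], rows)) ||
                (matches(idx[0], rows) && matches(idx[1], 1));
  if (strict)
    return true;
  bool emptySelector = (idx[0] == 0 && matches(idx[1], rows)) ||
                       (matches(idx[0], rows) && idx[1] == 0);
  return emptySelector && dataValidRow == 0;
}
```

With this shape, `idx.valid_shape = [0, R]` verifies only alongside `data.valid_row == 0`; a dynamic `data.valid_row` is treated conservatively and rejected.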

  2. P1 `thistogram` zero-valid relaxation admits nonsensical non-empty histograms lib/PTO/IR/PTO.cpp:6900

The histogram changes allow idx.valid_shape[1] = 0 on the ui16 path, allow idx.valid_shape[0] = 0 on the ui32 byte=0/1/2 paths, and also allow dst.valid_shape[1] = 0 unconditionally. There is no accompanying check that the source histogram tile is empty, so previously-rejected cases like a non-empty src with an empty selector column now verify and still lower to THISTOGRAM. Since the op semantics consume idx to choose histogram buckets, accepting empty selector extents for non-empty inputs is a correctness/runtime bug.

  3. P2 `trem`/`trems` now accept `tmp.v_row = 0` even for non-empty outputs lib/PTO/IR/PTO.cpp:8047

These verifiers used to require the scratch tile to have at least one valid row; after this change they only reject negative rows. The new all-zero test case is fine, but the relaxation is broader than that: a kernel with dst.valid_row > 0 and tmp.valid_row = 0 now passes because the only remaining tmp check is on valid columns. That contradicts the existing op contract for TREM/TREMS and lets obviously under-provisioned scratch tiles reach the backend unchanged.
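A narrower relaxation for the scratch tile could look roughly like the sketch below, which ties `tmp.valid_row = 0` to a statically empty destination. The function name `checkRemTmpRows`, the error strings, and the local `kDynamic` sentinel are all hypothetical; the real verifier may order or phrase its checks differently.

```cpp
#include <cstdint>
#include <limits>
#include <optional>
#include <string>

// Hypothetical stand-in for ShapedType::kDynamic (not the MLIR definition).
constexpr int64_t kDynamic = std::numeric_limits<int64_t>::min();

// Returns an error message, or std::nullopt when the scratch rows are acceptable.
std::optional<std::string> checkRemTmpRows(int64_t tmpValidRow,
                                           int64_t dstValidRow) {
  if (tmpValidRow == kDynamic)
    return std::nullopt; // nothing to decide statically
  if (tmpValidRow < 0)
    return std::string("expects tmp valid_shape[0] to be non-negative");
  // Zero scratch rows are only consistent with a statically empty dst;
  // a dynamic dst row count is rejected here to stay conservative.
  if (tmpValidRow == 0 && dstValidRow != 0)
    return std::string(
        "expects tmp valid_shape[0] to be non-zero when dst is non-empty");
  return std::nullopt;
}
```

This keeps the all-zero test case legal while still rejecting `dst.valid_row > 0` paired with `tmp.valid_row = 0`.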

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request enables support for statically empty valid regions (where valid dimensions can be zero) in the PTO IR and during lowering to EmitC. When a static valid dimension or row is zero, the lowering process now elides emitting hardware calls (such as TSTORE, TCVT, TMOV, and TPUSH), effectively treating them as no-ops. The review feedback suggests generalizing the hasStaticZeroValidDim and hasStaticZeroValidRow helper functions in PTOToEmitC.cpp to support other PTO types with valid shapes, such as TensorViewType and PartitionTensorViewType, rather than limiting them only to TileBufType.

Comment thread lib/PTO/Transforms/PTOToEmitC.cpp Outdated
Comment on lines +426 to +441
static bool hasStaticZeroValidDim(Type type) {
  auto tileTy = dyn_cast<pto::TileBufType>(type);
  if (!tileTy)
    return false;
  ArrayRef<int64_t> validShape = tileTy.getValidShape();
  return llvm::any_of(validShape,
                      [](int64_t dim) { return dim == 0; });
}

static bool hasStaticZeroValidRow(Type type) {
  auto tileTy = dyn_cast<pto::TileBufType>(type);
  if (!tileTy)
    return false;
  ArrayRef<int64_t> validShape = tileTy.getValidShape();
  return !validShape.empty() && validShape[0] == 0;
}

medium

The current implementation of hasStaticZeroValidDim and hasStaticZeroValidRow only supports pto::TileBufType. However, other types like pto::TensorViewType and pto::PartitionTensorViewType also have valid shapes and can be passed to operations like pto.tpush (as documented in the manual). If these types are used, the zero-valid dimension check will return false, preventing the elision of the EmitC hardware calls.

We should generalize these helpers to support all PTO types that have a valid shape.

static bool hasStaticZeroValidDim(Type type) {
  ArrayRef<int64_t> validShape;
  if (auto tileTy = dyn_cast<pto::TileBufType>(type))
    validShape = tileTy.getValidShape();
  else if (auto viewTy = dyn_cast<pto::TensorViewType>(type))
    validShape = viewTy.getValidShape();
  else if (auto partTy = dyn_cast<pto::PartitionTensorViewType>(type))
    validShape = partTy.getValidShape();
  else
    return false;
  return llvm::any_of(validShape, [](int64_t dim) { return dim == 0; });
}

static bool hasStaticZeroValidRow(Type type) {
  ArrayRef<int64_t> validShape;
  if (auto tileTy = dyn_cast<pto::TileBufType>(type))
    validShape = tileTy.getValidShape();
  else if (auto viewTy = dyn_cast<pto::TensorViewType>(type))
    validShape = viewTy.getValidShape();
  else if (auto partTy = dyn_cast<pto::PartitionTensorViewType>(type))
    validShape = partTy.getValidShape();
  else
    return false;
  return !validShape.empty() && validShape[0] == 0;
}

@zhangstevenunity zhangstevenunity force-pushed the codex/issue708-zero-valid-subview branch from a29013f to a29413d on May 28, 2026 03:54
@zhangstevenunity zhangstevenunity changed the title from "Fix zero-valid subview no-op lowering" to "Allow zero-valid subview verifier paths" on May 28, 2026
@zhangstevenunity zhangstevenunity force-pushed the codex/issue708-zero-valid-subview branch from a29413d to 858a55d on May 28, 2026 04:54
@zhangstevenunity zhangstevenunity changed the title from "Allow zero-valid subview verifier paths" to "Allow zero-valid tileop verifier paths" on May 28, 2026
@zhangstevenunity zhangstevenunity marked this pull request as ready for review May 28, 2026 04:59

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 858a55d45b

Comment thread lib/PTO/IR/PTO.cpp
Comment on lines 8402 to 8405
if (srcValid[0] != ShapedType::kDynamic && dstValid[0] != ShapedType::kDynamic &&
    srcValid[0] != dstValid[0])
  return emitOpError("expects src and dst to have the same valid_shape[0]");
if (srcValid[0] != ShapedType::kDynamic && srcValid[0] == 0)
  return emitOpError("expects src valid_shape[0] to be non-zero");
if (srcValid[1] != ShapedType::kDynamic && srcValid[1] == 0)
  return emitOpError("expects src valid_shape[1] to be non-zero");
if (dstValid[0] != ShapedType::kDynamic && dstValid[0] == 0)
  return emitOpError("expects dst valid_shape[0] to be non-zero");
if (dstValid[1] != ShapedType::kDynamic && dstValid[1] == 0)
  return emitOpError("expects dst valid_shape[1] to be non-zero");
return success();

P2: Reject non-empty expands from zero-valid scalar

When the source column vector has valid_shape[1] == 0 but the destination has a positive valid column count, this verifier now succeeds because the removed non-zero checks leave only the row-count comparison. That accepts a trowexpand that tries to fill a non-empty destination from an empty scalar column, and the EmitC lowering still emits TROWEXPAND unconditionally, so the invalid case can reach codegen instead of being rejected or modeled as an empty/no-op result.

@zhangstevenunity zhangstevenunity force-pushed the codex/issue708-zero-valid-subview branch from 858a55d to 1f9dab4 on May 28, 2026 05:51
@zhangstevenunity zhangstevenunity changed the title from "Allow zero-valid tileop verifier paths" to "Allow" on May 28, 2026
@zhangstevenunity zhangstevenunity changed the title from "Allow" to "Allow selected zero-valid tileop verifier paths" on May 28, 2026
@zhangstevenunity zhangstevenunity merged commit 4f4ddea into main May 28, 2026
14 checks passed
@reedhecre

A5 board test passed

  • Trigger: merged
  • Source commit: 4f4ddeab580e
  • Result summary: OK 21 / FAIL 0 / SKIP 0
  • Log: /root/ptoas-board-monitor-a5/logs/20260528_141706_merged_pr720.log
  • Results TSV: /root/ptoas-board-monitor-a5/logs/20260528_141706_merged_pr720.tsv

@reedhecre

A3 board test failed

  • Trigger: merged
  • Source commit: 4f4ddeab580e
  • Result summary: OK 217 / FAIL 2 / SKIP 1
  • Log: /home/zhongxuan/ptoas-board-monitor/runtime/logs/20260528_141706_merged_pr720.log
  • Failed stage: board-validation / exit=1

Failed cases

  • syncall_binding (run, exit=1)
  • tprefetch_async_binding (run, exit=1)

@reedhecre

A3 board test failure details: PR #720

syncall_binding

stage=run info=exit=1

[ERROR] aclrtSynchronizeStream(stream) failed: 507014 (/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260528_141706_merged_pr720/npu_validation/SyncAll/syncall_binding/main.cpp:84)
[ERROR] RecentErrMsg: EZ9999: Inner Error!
EZ9999[PID: 1693560] 2026-05-28-15:01:08.714.167 (EZ9999):  The error from device(chipId:2, dieId:0), serial number is 284, there is an exception of aicore error, core id is 6, error code = 0, dump info: pc start: 0x124800000000, current: 0x124800000188, vec error info: 0, mte error info: 0xc503000030, ifu error info: 0x212c200090900, ccu error info: 0x40a01900778000d8, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd00028c, para base: 0x12c100000000.[FUNC:PrintCoreInfo][FILE:device_error_core_proc.cc][LINE:645]
        TraceBack (most recent call last):
       The extend info: errcode:(0, 0, 0) errorStr: timeout or trap error. fixp_error0 info: 0x3000030, fixp_error1 info: 0xc5, fsmId:1, tslot:2, thread:0, ctxid:0, blk:0, sublk:0, subErrType:4.[FUNC:PrintCoreInfo][FILE:device_error_core_proc.cc][LINE:658]
       Kernel task happen error, retCode=0x25, [aicore timeout].[FUNC:PreCheckTaskErr][FILE:davinci_kernel_task.cc][LINE:1729]
       AICORE Kernel task happen error, retCode=0x25.[FUNC:GetError][FILE:stream.cc][LINE:1475]
       [AIC_INFO] after execute:args print end[FUNC:GetError][FILE:stream.cc][LINE:1475]
       [DFX_INFO]Aicore kernel execute failed, device_id=4, stream_id=46, report_stream_id=46, task_id=0, flip_num=0, fault kernel_name=_Z22syncall_binding_kernelPii, fault kernel info ext=_Z22syncall_binding_kernelPii, program id=0, hash=3129332313788381512.[FUNC:GetError][FILE:stream.cc][LINE:1475]
       rtStreamSynchronize execution failed, reason=aicore timeout[FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:65]
       synchronize stream failed, runtime result = 507014[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:148]
[2026-05-28 15:01:09] ERROR: testcase failed (exit 1): syncall_binding

tprefetch_async_binding

stage=run info=exit=1

[ERROR] aclrtSynchronizeStream(stream) failed: 507035 (/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260528_141706_merged_pr720/npu_validation/TPrefetchAsync/tprefetch_async_binding/main.cpp:91)
[ERROR] RecentErrMsg: EZ9999: Inner Error!
EZ9999[PID: 1999804] 2026-05-28-15:01:47.411.913 (EZ9999):  The error from device(chipId:2, dieId:0), serial number is 285, there is an exception of aivec error, core id is 47, error code = 0, dump info: pc start: 0x124800000000, current: 0x124800000160, vec error info: 0x1e000000a8, mte error info: 0xa50313208b, ifu error info: 0x212c081200200, ccu error info: 0x52, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd00028c, para base: 0x12c100000000.[FUNC:PrintCoreInfo][FILE:device_error_core_proc.cc][LINE:645]
        TraceBack (most recent call last):
       The extend info: errcode:(0, 0x200000000000000, 0) errorStr: The MPU address access is invalid. fixp_error0 info: 0x313208b, fixp_error1 info: 0xa5, fsmId:0, tslot:3, thread:0, ctxid:0, blk:0, sublk:0, subErrType:4.[FUNC:PrintCoreInfo][FILE:device_error_core_proc.cc][LINE:658]
       Kernel task happen error, retCode=0x31, [vector core exception].[FUNC:PreCheckTaskErr][FILE:davinci_kernel_task.cc][LINE:1729]
       AIV Kernel happen error, retCode=0x31.[FUNC:GetError][FILE:stream.cc][LINE:1475]
       [AIC_INFO] after execute:args print end[FUNC:GetError][FILE:stream.cc][LINE:1475]
       [DFX_INFO]Aicore kernel execute failed, device_id=4, stream_id=46, report_stream_id=46, task_id=0, flip_num=0, fault kernel_name=_Z30tprefetch_async_binding_kernelPfPa, fault kernel info ext=_Z30tprefetch_async_binding_kernelPfPa, program id=0, hash=8435686547367685641.[FUNC:GetError][FILE:stream.cc][LINE:1475]
       rtStreamSynchronize execution failed, reason=vector core exception[FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:65]
       synchronize stream failed, runtime result = 507035[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:148]
[2026-05-28 15:01:48] ERROR: testcase failed (exit 1): tprefetch_async_binding

Likai-19 pushed a commit to Likai-19/PTOAS that referenced this pull request May 30, 2026
…ons (hw-native-sys#708)

PyPTO's dual-AIV no-op replay (SplitMode::None on a2a3) stamps a fully-empty
valid region (v_row/v_col = 0) on the inactive lane's tiles. This makes that
"no useful output" marker legal end-to-end through PTOAS for the affected ops,
completing the PTOAS-side work for hw-native-sys#708 beyond hw-native-sys#720 (which fixed only the
explicit-operand subview path).

pto.subview inferred path (no valid operand, result type declares v=0):
- SubViewOp::verify: accept dstValid[dim]==0 when the valid operand is omitted.
- Lowering (PTOViewToMemref clampSubViewValidDim/lowerSubViewOps): derive the
  bind_tile valid extent from the result type's valid_shape instead of
  backfilling the subview size, so the 0 marker survives lowering instead of
  being silently widened back to a real write on the no-op lane.
- SubViewOp::isCompatibleReturnTypes override (PTOOps.td extraClassDeclaration):
  accept a declared empty (0) in place of the size-inferred valid extent at
  parse time; verify() backstops any operand/type contradiction.
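
A rough model of the lowering change described above, reduced to a single dimension. Both functions are illustrative sketches: `backfillFromSize` is only a guess at the earlier behaviour, `validFromResultType` is a guess at the new one, and falling back to the subview size for a dynamic extent is an assumption. None of these names exist in PTOViewToMemref.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical stand-in for ShapedType::kDynamic (not the MLIR definition).
constexpr int64_t kDynamic = std::numeric_limits<int64_t>::min();

// Guessed old behaviour: any extent that is not a positive static value is
// replaced by the subview size, so a declared 0 is widened to a full write.
int64_t backfillFromSize(int64_t declaredValid, int64_t subviewSize) {
  return declaredValid > 0 ? declaredValid : subviewSize;
}

// Guessed new behaviour: a static extent from the result type wins, including
// 0; only a dynamic extent falls back to the subview size.
int64_t validFromResultType(int64_t declaredValid, int64_t subviewSize) {
  return declaredValid == kDynamic ? subviewSize : declaredValid;
}
```

Under this model a declared `v_row = 0` on a 16-row subview stays 0 through lowering, where the backfilling variant would turn it back into 16.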

Reduction / row-expand ops (blocker hw-native-sys#3): add a fully-empty-dst (valid==[0,0])
early-accept to verifyRowReductionValidRegion (trowmax/trowsum),
verifyColReductionValidRegion (tcolmax/tcolsum), TRowExpandOp::verify, and
verifyTRowExpandReduceLikeOp (trowexpandmax/min/expdif). Only the [0,0] marker
is accepted; one-sided/partial empties still fall through to the existing
strict checks, so hw-native-sys#720's *_invalid.pto negative tests stay green.
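
The early-accept pattern can be sketched on top of the strict row-expand checks quoted in the #720 review thread. This is a standalone approximation, not the code in PTO.cpp: `verifyRowExpandValid`, `ValidShape`, and the local `kDynamic` are hypothetical, and placing the [0,0] test before the row-count comparison is an assumption.

```cpp
#include <array>
#include <cstdint>
#include <limits>
#include <optional>
#include <string>

// Hypothetical stand-in for ShapedType::kDynamic (not the MLIR definition).
constexpr int64_t kDynamic = std::numeric_limits<int64_t>::min();
using ValidShape = std::array<int64_t, 2>;

// Returns an error message, or std::nullopt when the valid regions verify.
std::optional<std::string> verifyRowExpandValid(ValidShape src, ValidShape dst) {
  // Early accept: a fully-empty destination writes zero elements.
  if (dst[0] == 0 && dst[1] == 0)
    return std::nullopt;
  // Everything else, including one-sided empties, takes the strict path.
  if (src[0] != kDynamic && dst[0] != kDynamic && src[0] != dst[0])
    return std::string("expects src and dst to have the same valid_shape[0]");
  for (int i = 0; i < 2; ++i) {
    if (src[i] == 0)
      return "expects src valid_shape[" + std::to_string(i) + "] to be non-zero";
    if (dst[i] == 0)
      return "expects dst valid_shape[" + std::to_string(i) + "] to be non-zero";
  }
  return std::nullopt;
}
```

A source with `valid_shape[1] == 0` feeding a destination with positive valid columns is still rejected, which is the case the #720 negative tests cover.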

Scope / deferred:
- This commits to the "hardware Rv=0 is a no-op" branch of the item-3 audit
  (pto-isa#143), which is not verified here; mitigated by accepting only the
  fully-empty marker (an op that writes zero elements).
- EmitC still emits the instruction for a v=0 op (e.g. TSTORE/TROWMAX); ISA-level
  elision/no-op is out of scope, matching hw-native-sys#720.
- TRowExpandAddOp and the argmax/argmin family are left strict (not in scope).

Tests: 4 new lit fixtures (subview inferred positive + 2 negative bounds;
reductions positive). Full lit suite passes (287/287).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

[Feature] Support odd-axis tile shapes via physical-even + odd-valid split, and define a codegen no-op contract for valid_row=0

2 participants