[codex] add a3 test3 validshape tpush/tpop sample#472

Draft
zhangstevenunity wants to merge 1 commit into main from codex/add-a3-test3-validshape-pop

@zhangstevenunity
Collaborator

What changed

  • add a new sample at test/samples/TPushTPop/a3/test3/kernel.pto
  • keep the cube/vector handoff structure under TPushTPop
  • update the sample so the accumulator tile uses dynamic valid shape metadata
  • set %acc_tile to valid shape (8, 16) after pto.tmatmul
  • pop the result on the vector side as a 4x16 tile and print it

Why

This adds the exact A3 sample variant requested for the tmatmul -> set_validshape -> vec pop flow, but under a3/test3 instead of a3/test2.

Impact

  • provides a focused regression/sample case for changing acc valid shape before tpush/tpop
  • demonstrates consuming an 8x16 valid accumulator tile from the vector side in 4x16 chunks
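
The chunking described above can be sketched in plain Python. This is an illustrative model only, not PTO code: the helper name `valid_row_chunks` is hypothetical, and it assumes the valid region occupies the leading rows of the accumulator tile.

```python
def valid_row_chunks(valid_rows, chunk_rows):
    """Return (start, stop) row ranges that cover the valid rows in fixed-size chunks."""
    if chunk_rows <= 0 or valid_rows % chunk_rows != 0:
        raise ValueError("valid_rows must be a positive multiple of chunk_rows")
    return [(r, r + chunk_rows) for r in range(0, valid_rows, chunk_rows)]


# An 8x16 valid region consumed as 4x16 tiles yields two row ranges.
chunks = valid_row_chunks(valid_rows=8, chunk_rows=4)
print(chunks)  # [(0, 4), (4, 8)]
```

Under these assumptions, each pushed accumulator tile needs two vector-side pops before its valid rows are fully consumed.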

Validation

  • local file creation and local commit completed successfully
  • I did not run ptoas validation in this environment because a runnable ptoas binary was not available in the workspace
  • direct shell push to GitHub was blocked by local network access to github.com:443, so the branch/file/PR were created through the GitHub connector instead


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a new PTO kernel file to test TPush and TPop operations between cube and vector cores. However, the implementation contains several critical issues. There is a logical mismatch between the data produced in @cube_func and consumed in @vec_func, resulting in data being discarded prematurely due to incorrect tile sizes and improper use of tfree_from_aic. Additionally, multiple operations—including aic_initialize_pipe, reserve_buffer, tload, tmov, and various pipe operations—violate the PTO dialect definitions specified in PTOOps.td regarding required operands, return types, and attribute formats.

Comment on lines +35 to +41
scf.for %i = %c0 to %c4 step %c1 {
%vec_tile = pto.tpop_from_aic ins(%vec_l0c : !pto.async_buffer<core="vector", direction="in", slots=4, slot_size=1024>) { split = 0 : index } : !pto.tile_buf<loc=vec, dtype=f32, rows=4, cols=16, v_row=4, v_col=16, blayout=row_major, slayout=none_box, fractal=512, pad=0>
%vec_print = pto.alloc_tile : !pto.tile_buf<loc=vec, dtype=f32, rows=4, cols=16, v_row=4, v_col=16, blayout=row_major, slayout=none_box, fractal=512, pad=0>
pto.tmov ins(%vec_tile : !pto.tile_buf<loc=vec, dtype=f32, rows=4, cols=16, v_row=4, v_col=16, blayout=row_major, slayout=none_box, fractal=512, pad=0>) outs(%vec_print : !pto.tile_buf<loc=vec, dtype=f32, rows=4, cols=16, v_row=4, v_col=16, blayout=row_major, slayout=none_box, fractal=512, pad=0>)
pto.tprint ins(%vec_print : !pto.tile_buf<loc=vec, dtype=f32, rows=4, cols=16, v_row=4, v_col=16, blayout=row_major, slayout=none_box, fractal=512, pad=0>)
pto.tfree_from_aic ins(%vec_l0c : !pto.async_buffer<core="vector", direction="in", slots=4, slot_size=1024>)
}

high

There is a logical mismatch between the data produced on the cube side and consumed on the vector side. The cube side pushes 4 tiles of 8x16 (Line 24). The vector side pops 4 times, but each pop is only 4x16 (Line 36). Furthermore, tfree_from_aic is called in every iteration (Line 40), which releases the entire slot. This means half of the data in each pushed slot is discarded. To process all data in 4x16 chunks as described in the PR, the vector loop should run 8 times, and tfree should only be called after every two pops.
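
The accounting behind this comment can be checked with a small Python model. It is a sketch, not PTO semantics: `simulate` is a hypothetical helper, and it assumes, as the comment states, that each pushed slot holds 8 valid rows, each pop reads 4 rows from the current slot, and a free releases the whole slot.

```python
def simulate(pushed_slots, rows_per_slot, rows_per_pop, pops, pops_per_free):
    """Count rows consumed and rows discarded under a given pop/free schedule."""
    consumed = 0
    discarded = 0
    slot = 0          # index of the slot currently at the head of the pipe
    read_in_slot = 0  # rows already popped from that slot
    for n in range(1, pops + 1):
        if slot >= pushed_slots:
            break  # nothing left to pop
        read_in_slot += rows_per_pop
        consumed += rows_per_pop
        if n % pops_per_free == 0:
            # Assumption: freeing releases the whole slot, so unread rows are lost.
            discarded += max(0, rows_per_slot - read_in_slot)
            slot += 1
            read_in_slot = 0
    return consumed, discarded


# Schedule in the PR: 4 pops, free after every pop.
print(simulate(4, 8, 4, pops=4, pops_per_free=1))  # (16, 16)
# Schedule suggested in the review: 8 pops, free after every second pop.
print(simulate(4, 8, 4, pops=8, pops_per_free=2))  # (32, 0)
```

In this model the PR's schedule consumes 16 of the 32 valid rows and drops the other 16, while the suggested schedule consumes all 32.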

Comment on lines +11 to +14
pto.aic_initialize_pipe 4, 1
%left_l1 = pto.reserve_buffer "cube" [1] { slot_size = 256 } : !pto.async_buffer<core="cube", direction="in", slots=1, slot_size=256>
%right_l1 = pto.reserve_buffer "cube" [1] { slot_size = 256 } : !pto.async_buffer<core="cube", direction="in", slots=1, slot_size=256>
%acc_l0c = pto.reserve_buffer "cube" [4] { slot_size = 1024 } : !pto.async_buffer<core="cube", direction="out", slots=4, slot_size=1024>

high

The operations pto.aic_initialize_pipe and pto.reserve_buffer do not follow the definitions in PTOOps.td. aic_initialize_pipe is missing the attribute dictionary braces and the required operands (gm_slot_buffer, c2v_consumer_buf, v2c_consumer_buf). reserve_buffer uses an incorrect syntax and returns a !pto.async_buffer type, whereas the ODS specifies it returns an i32 address.

Comment on lines +16 to +20
%left_tile = pto.tload ins(%left[%c0] : memref<64xf32, #pto.address_space<gm>>) : !pto.tile_buf<loc=mat, dtype=f32, rows=16, cols=16, v_row=16, v_col=4, blayout=row_major, slayout=none_box, fractal=512, pad=0>
%offset = arith.muli %i, %c64 : index
%right_tile = pto.tload ins(%right[%offset] : memref<256xf32, #pto.address_space<gm>>) : !pto.tile_buf<loc=mat, dtype=f32, rows=16, cols=16, v_row=4, v_col=16, blayout=row_major, slayout=none_box, fractal=512, pad=0>
%left_tile_l0a = pto.tmov ins(%left_tile : !pto.tile_buf<loc=mat, dtype=f32, rows=16, cols=16, v_row=16, v_col=4, blayout=row_major, slayout=none_box, fractal=512, pad=0>) outs(!pto.tile_buf<loc=left, dtype=f32, rows=16, cols=16, v_row=16, v_col=4, blayout=row_major, slayout=row_major, fractal=512, pad=0>)
%right_tile_l0b = pto.tmov ins(%right_tile : !pto.tile_buf<loc=mat, dtype=f32, rows=16, cols=16, v_row=4, v_col=16, blayout=row_major, slayout=none_box, fractal=512, pad=0>) outs(!pto.tile_buf<loc=right, dtype=f32, rows=16, cols=16, v_row=4, v_col=16, blayout=row_major, slayout=col_major, fractal=512, pad=0>)

high

Several operations use an incorrect syntax relative to the PTO dialect definition:

  • pto.tload (Lines 16, 18) is used in a functional style, but it is defined as a Destination-Passing Style (DPS) operation in PTOOps.td requiring an outs operand.
  • pto.tmov (Lines 19, 20) provides a type in the outs clause instead of a destination operand.

Comment on lines +24 to +40
pto.tpush_to_aiv ins(%acc_tile : !pto.tile_buf<loc=acc, dtype=f32, rows=16, cols=16, v_row=?, v_col=?, blayout=col_major, slayout=row_major, fractal=1024, pad=0>) outs(%acc_l0c : !pto.async_buffer<core="cube", direction="out", slots=4, slot_size=1024>) { split = 0 : index }
}
return
}

func.func @vec_func() {
%c0 = arith.constant 0 : index
%c1 = arith.constant 1 : index
%c4 = arith.constant 4 : index
pto.aiv_initialize_pipe 1, 4
%vec_l0c = pto.reserve_buffer "vector" [4] { slot_size = 1024 } : !pto.async_buffer<core="vector", direction="in", slots=4, slot_size=1024>
scf.for %i = %c0 to %c4 step %c1 {
%vec_tile = pto.tpop_from_aic ins(%vec_l0c : !pto.async_buffer<core="vector", direction="in", slots=4, slot_size=1024>) { split = 0 : index } : !pto.tile_buf<loc=vec, dtype=f32, rows=4, cols=16, v_row=4, v_col=16, blayout=row_major, slayout=none_box, fractal=512, pad=0>
%vec_print = pto.alloc_tile : !pto.tile_buf<loc=vec, dtype=f32, rows=4, cols=16, v_row=4, v_col=16, blayout=row_major, slayout=none_box, fractal=512, pad=0>
pto.tmov ins(%vec_tile : !pto.tile_buf<loc=vec, dtype=f32, rows=4, cols=16, v_row=4, v_col=16, blayout=row_major, slayout=none_box, fractal=512, pad=0>) outs(%vec_print : !pto.tile_buf<loc=vec, dtype=f32, rows=4, cols=16, v_row=4, v_col=16, blayout=row_major, slayout=none_box, fractal=512, pad=0>)
pto.tprint ins(%vec_print : !pto.tile_buf<loc=vec, dtype=f32, rows=4, cols=16, v_row=4, v_col=16, blayout=row_major, slayout=none_box, fractal=512, pad=0>)
pto.tfree_from_aic ins(%vec_l0c : !pto.async_buffer<core="vector", direction="in", slots=4, slot_size=1024>)

high

The frontend pipe operations do not match the ODS:

  • pto.tpush_to_aiv (Line 24) and pto.tfree_from_aic (Line 40) should not have an outs or ins operand respectively.
  • pto.tpop_from_aic (Line 36) should not have an ins operand.
  • The split attribute in these operations should be an i8 integer attribute (e.g., 0 : i8), not an index.

@zhangstevenunity
Collaborator Author

/run a3 test/samples/TPushTPop/a3/test3/kernel.pto

@reedhecre

A3 board test failed

  • Trigger: manual
  • Source commit: f0607284e26e
  • Result summary: OK 0 / FAIL 0 / SKIP 0
  • Log: /home/zhongxuan/ptoas-board-monitor/runtime/logs/20260413_161904_manual_pr472.log
  • Manual command: /run a3 test/samples/TPushTPop/a3/test3/kernel
  • Triggered by: zhangstevenunity
  • Specified test case: test/samples/TPushTPop/a3/test3/kernel
  • Triggering comment: [codex] add a3 test3 validshape tpush/tpop sample #472 (comment)
  • Failed stage: build-ptoas / exit=1

Log tail

ransforms/PTOToEmitC.cpp: At global scope:
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260413_161904_manual_pr472/repo/lib/PTO/Transforms/PTOToEmitC.cpp:6743:20: warning: ‘std::string maskPatternTok(mlir::pto::MaskPatternAttr)’ defined but not used [-Wunused-function]
 6743 | static std::string maskPatternTok(mlir::pto::MaskPatternAttr a) {
      |                    ^~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260413_161904_manual_pr472/repo/lib/PTO/Transforms/PTOToEmitC.cpp:4190:20: warning: ‘std::string getPipeName(mlir::pto::PIPE)’ defined but not used [-Wunused-function]
 4190 | static std::string getPipeName(pto::PIPE pipe) {
      |                    ^~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260413_161904_manual_pr472/repo/lib/PTO/Transforms/PTOToEmitC.cpp:2601:13: warning: ‘Role inferSubviewRole(mlir::memref::SubViewOp)’ defined but not used [-Wunused-function]
 2601 | static Role inferSubviewRole(memref::SubViewOp sv) {
      |             ^~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260413_161904_manual_pr472/repo/lib/PTO/Transforms/PTOToEmitC.cpp:2461:13: warning: ‘void inferTileMNK(mlir::func::FuncOp, int&, int&, int&)’ defined but not used [-Wunused-function]
 2461 | static void inferTileMNK(func::FuncOp f, int &M, int &N, int &K) {
      |             ^~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260413_161904_manual_pr472/repo/lib/PTO/Transforms/PTOToEmitC.cpp:2448:19: warning: ‘KernelKind inferKernelKind(mlir::func::FuncOp)’ defined but not used [-Wunused-function]
 2448 | static KernelKind inferKernelKind(func::FuncOp f) {
      |                   ^~~~~~~~~~~~~~~
ninja: build stopped: subcommand failed.
===== END STAGE build-ptoas rc=1 @ 2026-04-13 16:19:57 =====

@zhangstevenunity
Collaborator Author

/run a3 test/samples/TPushTPop/a3/test3/kernel

@reedhecre

A3 board test failed

  • Trigger: manual
  • Source commit: f0607284e26e
  • Result summary: OK 0 / FAIL 0 / SKIP 0
  • Log: /home/zhongxuan/ptoas-board-monitor/runtime/logs/20260413_164604_manual_pr472.log
  • Manual command: /run a3 test/samples/TPushTPop/a3/test3/kernel
  • Triggered by: zhangstevenunity
  • Specified test case: test/samples/TPushTPop/a3/test3/kernel
  • Triggering comment: [codex] add a3 test3 validshape tpush/tpop sample #472 (comment)
  • Failed stage: build-ptoas / exit=1

Log tail

ransforms/PTOToEmitC.cpp: At global scope:
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260413_164604_manual_pr472/repo/lib/PTO/Transforms/PTOToEmitC.cpp:6743:20: warning: ‘std::string maskPatternTok(mlir::pto::MaskPatternAttr)’ defined but not used [-Wunused-function]
 6743 | static std::string maskPatternTok(mlir::pto::MaskPatternAttr a) {
      |                    ^~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260413_164604_manual_pr472/repo/lib/PTO/Transforms/PTOToEmitC.cpp:4190:20: warning: ‘std::string getPipeName(mlir::pto::PIPE)’ defined but not used [-Wunused-function]
 4190 | static std::string getPipeName(pto::PIPE pipe) {
      |                    ^~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260413_164604_manual_pr472/repo/lib/PTO/Transforms/PTOToEmitC.cpp:2601:13: warning: ‘Role inferSubviewRole(mlir::memref::SubViewOp)’ defined but not used [-Wunused-function]
 2601 | static Role inferSubviewRole(memref::SubViewOp sv) {
      |             ^~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260413_164604_manual_pr472/repo/lib/PTO/Transforms/PTOToEmitC.cpp:2461:13: warning: ‘void inferTileMNK(mlir::func::FuncOp, int&, int&, int&)’ defined but not used [-Wunused-function]
 2461 | static void inferTileMNK(func::FuncOp f, int &M, int &N, int &K) {
      |             ^~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260413_164604_manual_pr472/repo/lib/PTO/Transforms/PTOToEmitC.cpp:2448:19: warning: ‘KernelKind inferKernelKind(mlir::func::FuncOp)’ defined but not used [-Wunused-function]
 2448 | static KernelKind inferKernelKind(func::FuncOp f) {
      |                   ^~~~~~~~~~~~~~~
ninja: build stopped: subcommand failed.
===== END STAGE build-ptoas rc=1 @ 2026-04-13 16:47:00 =====

Collaborator Author

Superseded by #475.

The new PR carries the PTO syntax rewrite that compiles with the available local ptoas and keeps the requested set_validshape(8,16) -> vector pop 4x16 behavior.
