
Fix p2p comm insert sync modeling #707

Open
zhangstevenunity wants to merge 3 commits into main from codex/issue706-p2p-comm-sync

Conversation

@zhangstevenunity (Collaborator) commented May 27, 2026

Summary

  • Model pto.comm.tput and pto.comm.tget for both InsertSync and GraphSyncSolver.
  • In GSS, split synchronous p2p calls into MTE2 staging and MTE3 commit RW phases so scratch writes are attributed to MTE2 and scratch reads to MTE3.
  • Skip only the internal phase pair for the same TPut/TGet op, while preserving loop-carried same-op dependencies.
  • Add issue706 lit coverage for InsertSync, GraphSyncSolver, and the GSS staging-tile reuse case that requires PIPE_V -> PIPE_MTE2.
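The phase split in the summary can be illustrated with a minimal, hypothetical model; it is not the GraphSyncSolver implementation. `Pipe`, `RWPhase`, `splitP2P`, `conflicts`, and `requiredSyncs` are illustrative names, and the sketch assumes a tput stages data from a GM source into a UB scratch tile on MTE2 before MTE3 commits the tile to the remote destination. Under these assumptions it shows why attributing the scratch write to MTE2 turns a preceding vector op that still reads the staging tile into a PIPE_V -> PIPE_MTE2 dependency.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical pipe set for illustration; real PTO pipe enums may differ.
enum class Pipe { V, MTE2, MTE3 };

const char *pipeName(Pipe p) {
  switch (p) {
    case Pipe::V: return "PIPE_V";
    case Pipe::MTE2: return "PIPE_MTE2";
    case Pipe::MTE3: return "PIPE_MTE3";
  }
  return "PIPE_UNKNOWN";
}

// One read/write phase: the pipe that executes it and the buffers it touches.
struct RWPhase {
  int opId;
  Pipe pipe;
  std::vector<std::string> reads;
  std::vector<std::string> writes;
};

// Split a synchronous p2p call into an MTE2 staging phase (writes the scratch
// tile) and an MTE3 commit phase (reads the scratch tile).
std::vector<RWPhase> splitP2P(int opId, const std::string &src,
                              const std::string &scratch,
                              const std::string &dst) {
  return {RWPhase{opId, Pipe::MTE2, {src}, {scratch}},
          RWPhase{opId, Pipe::MTE3, {scratch}, {dst}}};
}

static bool contains(const std::vector<std::string> &bufs,
                     const std::string &b) {
  for (const auto &x : bufs)
    if (x == b) return true;
  return false;
}

// True if `later` must wait for `earlier` (RAW, WAR or WAW on any buffer).
bool conflicts(const RWPhase &earlier, const RWPhase &later) {
  for (const auto &w : earlier.writes)
    if (contains(later.reads, w) || contains(later.writes, w)) return true;
  for (const auto &r : earlier.reads)
    if (contains(later.writes, r)) return true;
  return false;
}

// Collect the "SRC -> DST" pipe pairs that `earlier` forces on `later` phases.
std::vector<std::string> requiredSyncs(const RWPhase &earlier,
                                       const std::vector<RWPhase> &later) {
  std::vector<std::string> out;
  for (const auto &p : later)
    if (p.pipe != earlier.pipe && conflicts(earlier, p))
      out.push_back(std::string(pipeName(earlier.pipe)) + " -> " +
                    pipeName(p.pipe));
  return out;
}
```

With `RWPhase vecOp{0, Pipe::V, {"stage"}, {"out"}}`, calling `requiredSyncs(vecOp, splitP2P(1, "gm_src", "stage", "remote_dst"))` yields a single `PIPE_V -> PIPE_MTE2` entry. Modeling the whole call as one MTE3 phase instead yields `PIPE_V -> PIPE_MTE3`, which in this model does not order the MTE2 staging write that actually overwrites the tile.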

Tests

  • cmake -G Ninja -S . -B build-wsl ... from .codex/worktrees/issue706-p2p-gss-phase
  • ninja -C build-wsl tools/ptoas/ptoas
  • build-wsl/tools/ptoas/ptoas --pto-arch=a3 --pto-level=level3 --enable-insert-sync test/lit/pto/issue706_comm_p2p_insert_sync.pto | /home/rdp/llvm-workspace/llvm-project/build-shared/bin/FileCheck test/lit/pto/issue706_comm_p2p_insert_sync.pto
  • build-wsl/tools/ptoas/ptoas --pto-arch=a3 --pto-level=level3 --enable-graph-sync-solver --graph-sync-solver-event-id-max=64 test/lit/pto/issue706_comm_p2p_insert_sync_gss.pto | /home/rdp/llvm-workspace/llvm-project/build-shared/bin/FileCheck test/lit/pto/issue706_comm_p2p_insert_sync_gss.pto
  • build-wsl/tools/ptoas/ptoas --pto-level=level3 --pto-arch=a5 --enable-graph-sync-solver --graph-sync-solver-event-id-max=64 test/lit/pto/issue664_mscatter_pipe_selection_gss.pto | /home/rdp/llvm-workspace/llvm-project/build-shared/bin/FileCheck test/lit/pto/issue664_mscatter_pipe_selection_gss.pto
  • build-wsl/tools/ptoas/ptoas --pto-level=level3 --pto-arch a5 --enable-graph-sync-solver --graph-sync-solver-event-id-max=64 test/lit/pto/insert_sync_level3_enable_gss.pto | /home/rdp/llvm-workspace/llvm-project/build-shared/bin/FileCheck test/lit/pto/insert_sync_level3_enable_gss.pto
  • build-wsl/tools/ptoas/ptoas --pto-level=level3 --pto-arch a5 --enable-graph-sync-solver --graph-sync-solver-event-id-max=8 --emit-pto-ir test/lit/pto/graph_sync_solver_basic.pto | /home/rdp/llvm-workspace/llvm-project/build-shared/bin/FileCheck test/lit/pto/graph_sync_solver_basic.pto
  • build-wsl/tools/ptoas/ptoas --pto-arch=a3 test/lit/pto/comm_p2p_emitc.pto 2>&1 | /home/rdp/llvm-workspace/llvm-project/build-shared/bin/FileCheck test/lit/pto/comm_p2p_emitc.pto --check-prefix=A3

@gemini-code-assist

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@reedhecre commented May 27, 2026

Codex Review

This comment is updated automatically by the review bot.

  • PR: Fix p2p comm insert sync modeling #707
  • Author: zhangstevenunity
  • Base/Head: main / codex/issue706-p2p-comm-sync
  • Head SHA: a4afa97e07c9
  • Trigger: new commits pushed to the PR
  • Generated At: 2026-05-27T09:13:34Z
  • Previous Head SHA: da3694618847
  • Status: completed

Summary

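The GraphSyncSolver version of the fix still misses the synchronization required when a p2p stage buffer is reused inside a loop, which is a runtime correctness risk.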

Findings

  1. P1: GSS also skips cross-iteration phase conflicts of the same p2p op (lib/PTO/Transforms/GraphSyncSolver/SyncSolver.cpp:2325)

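This code unconditionally skips the two-phase pairing of pto::TPutOp/pto::TGetOp whenever rwOp1->op == rwOp2->op. GraphSyncSolver models loops by duplicating the same loop body into two sets of occurrences in order to discover loop-carried hazards, so for a given pto.comm.tput/tget, the MTE3 phase in iter0 and the MTE2 phase in iter1 still share the same MLIR Operation* and are filtered out by this condition as well. As a result, p2p communication that reuses the same ping/pong stage buffer inside a loop may still lack the required MTE3 -> MTE2 synchronization: the next iteration can overwrite the stage buffer while the previous iteration is still consuming it, which is a real runtime bug. The newly added test cases are all straight-line code, so CI does not cover this scenario.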
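The finding can be illustrated with a small, hypothetical model of the skip predicate; the actual code at SyncSolver.cpp:2325 is not reproduced here. `Occurrence`, `skipBuggy`, `skipFixed`, and `needsSync` are illustrative names, `op` stands in for the shared MLIR `Operation*`, and `iteration` marks which duplicated copy of the loop body a phase belongs to. Comparing only the op pointer drops the iter0 MTE3 -> iter1 MTE2 hazard, while also comparing the copy index skips just the internal phase pair of one dynamic instance, which matches the behavior the PR summary describes.

```cpp
#include <cassert>

// Hypothetical occurrence model: loop handling duplicates the loop body, so
// each phase records the op it came from and the copy it belongs to.
enum class Pipe { MTE2, MTE3 };

struct Occurrence {
  const void *op;    // stands in for the shared mlir::Operation*
  int iteration;     // 0 or 1: which duplicated copy of the loop body
  Pipe pipe;
  bool writesStage;  // MTE2 staging writes the stage buffer
  bool readsStage;   // MTE3 commit reads the stage buffer
};

// Buggy filter: skips every pair that comes from the same op.
bool skipBuggy(const Occurrence &a, const Occurrence &b) {
  return a.op == b.op;
}

// Fixed filter: skips only the internal phase pair of one dynamic instance.
bool skipFixed(const Occurrence &a, const Occurrence &b) {
  return a.op == b.op && a.iteration == b.iteration;
}

// A sync is needed for a cross-pipe RAW or WAR hazard on the stage buffer,
// unless the skip filter discards the pair.
bool needsSync(const Occurrence &earlier, const Occurrence &later,
               bool (*skip)(const Occurrence &, const Occurrence &)) {
  if (skip(earlier, later)) return false;
  bool war = earlier.readsStage && later.writesStage;
  bool raw = earlier.writesStage && later.readsStage;
  return (war || raw) && earlier.pipe != later.pipe;
}
```

For a tput whose iter0 commit reads the stage buffer and whose iter1 staging overwrites it, `needsSync(commit0, stage1, skipBuggy)` is false while `needsSync(commit0, stage1, skipFixed)` is true; the same-iteration staging/commit pair is still skipped by the fixed filter.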

zhangstevenunity marked this pull request as ready for review on May 27, 2026 at 05:25

