feat(transforms): add intra-core GraphSyncSolver pass for PTO#613
feat(transforms): add intra-core GraphSyncSolver pass for PTO#613zhangstevenunity wants to merge 2 commits intomainfrom
Conversation
Use the repository PR386 eight-line CANN license header on all GraphSyncSolver sources so CI license checks pass. Co-authored-by: Cursor <cursoragent@cursor.com>
There was a problem hiding this comment.
Code Review
This pull request adds a graph-based intra-core synchronization solver to the PTO dialect, featuring Dijkstra-based reachability analysis and Welsh-Powell event ID coloring. The implementation includes a hierarchical IR translator and a dedicated code generator for sync operations. Review feedback highlights performance concerns in the conflict detection logic, specifically regarding redundant graph reconstructions and memory allocations in hot paths. It also recommends adding diagnostics for unsupported multi-block control flow to avoid silent analysis failures.
| handleSetWaitConflict(o1, o2, src, dst); | ||
| } | ||
|
|
||
| bool Solver::checkGraphConflict(Occurrence *o1, Occurrence *o2, |
There was a problem hiding this comment.
The checkGraphConflict function is called for every memory conflict between every pair of operations (potentially GraphSolver and populating it from scratch by walking all ancestor scopes on every call can become a significant performance bottleneck for large functions. Consider reusing a single GraphSolver instance and incrementally updating it, or at least avoiding the redundant getAllParents() allocations to improve performance.
| return; | ||
| gs.addConflictPair(cp); | ||
| }; | ||
| for (auto *p : o1->getAllParents()) { |
There was a problem hiding this comment.
The call to o1->getAllParents() allocates and returns a SmallVector by value. Since this is inside a hot loop in checkGraphConflict, it is more efficient to walk up the parent pointers directly to avoid unnecessary memory allocations.
| for (auto *p : o1->getAllParents()) { | |
| for (Occurrence *p = o1->parentOcc; p; p = p->parentOcc) { |
| for (auto *cp : it->second) | ||
| consider(cp); | ||
| } | ||
| for (auto *p : o2->getAllParents()) { |
| if (!isFunctionRegion && region.getBlocks().size() > 1) { | ||
| // Conservative: arbitrary CFG inside non-function regions is not | ||
| // supported in this minimal port. Bail out by treating it as empty. | ||
| return scope; | ||
| } |
There was a problem hiding this comment.
Bailing out and treating regions with multiple blocks as empty is a dangerous silent failure. While this is a minimal port, it should at least emit a warning or diagnostic to inform the user that synchronization analysis is being skipped for this part of the code, which could lead to incorrect execution due to missing sync operations.
Codex Review该评论由 review 机器人自动更新。
Summary发现 4 个高优先级问题:新 GraphSyncSolver 已知会触发 Findings
PR 自己在这里注明:
这里对 same-pipe conflict 无条件发出
CLI 把
新的冲突枚举只检查了读写、写读、写写三类别名对,没有把现有 InsertSync 里那条“ACC 上跨 pipe 的 read/read 也必须同步”的规则移植过来。主线 |
- Add *_gss.pto companions plus README for GraphSyncSolver coverage. - Align issue454 nested-loop InsertSync checks with PIPE_FIX staging. - Relax issue564 tail checks (barrier/TADD/TSTORE vs TPUSH) for level3 codegen. - issue428 GSS companion anchors on TMATMUL and DAV_CUBE tail (no tail helper).
Use the repository PR386 eight-line CANN license header on all GraphSyncSolver sources so CI license checks pass.