Skip to content

[WIP] Fix insert-sync control-flow: avoid deadlocks and #128 crash#132

Merged
zhangstevenunity merged 4 commits into
hw-native-sys:mainfrom
TaoTao-real:codex/fix-128-insert-sync-double-buffer
Feb 27, 2026
Merged

[WIP] Fix insert-sync control-flow: avoid deadlocks and #128 crash#132
zhangstevenunity merged 4 commits into
hw-native-sys:mainfrom
TaoTao-real:codex/fix-128-insert-sync-double-buffer

Conversation

@TaoTao-real
Copy link
Copy Markdown
Contributor

@TaoTao-real TaoTao-real commented Feb 26, 2026

Fixes #128

Problem

ptoas --enable-insert-sync could crash in SyncEventIdAllocation with:

Assertion `begin < end' failed.

Root cause (#128)

The sync insertion phase could build an invalid syncOperations_[syncIndex] group where syncPair[0] is not a SET op (e.g. it becomes a pipe_barrier). Then event-id allocation treats that op as setFlag, computes a non-forward lifetime interval (begin >= end), and aborts.

This was triggered by double-buffer + dynamic-shape IR with control flow, where the old BlockSyncAnalysis branch-merge logic injected “phantom” SETs and could reuse a syncIndex belonging to a barrier-only group.

Fix (#128, AscendNPU-IR style)

Make the analysis/insertion phase maintain the allocator invariant (begin < end) instead of adding allocator-side fallbacks:

  • Port loop/branch handling to AscendNPU-IR’s InjectSync approach:
    • Use a fixed-size SyncRecordList (per multi-buffer slot) with { alreadySync: bool[], syncFinder: map }.
    • Merge branch state via intersection only; remove the “phantom set” compensation injection.
    • For loop-carried (back-edge) analysis, clone the full loop slice (compound/loop/branch/placeholder) and analyze via InsertBackForSync so lifetime intervals remain forward.
  • Keep the strict assert(begin < end) in SyncEventIdAllocation (with a debug print when violated).

Additional fix: avoid auto-sync deadlock for tmatmulk (#112 regression)

During remote NPU validation, the auto-sync variant (manual sync removed) could hang because we hoisted WAIT_EVENT out of if/else regions but did not sink the matched SET_EVENT out of the regions. This leaves conditional set_flag statements inside branches while corresponding wait_flag are unconditional; since wait_flag consumes the flag on Ascend, later iterations can wait on an event-id that is never re-set on that path.

Fix: restore AscendNPU-IR behavior in MoveSyncState to sink SET_EVENT out of if/else when the matched WAIT_EVENT is outside the branch region. This materializes a safe “merge” signal at IF_END, and prevents deadlock/hangs for tmatmulk_autosync.

Additional fix: movfp NPU validation (TMOV_FP tile layout)

Remote NPU validation was failing for movfp with a pto-isa static_assert (and could manifest as aicore exception) because the generated C++ emitted the wrong tile layouts for TMOV_FP.

Root cause: PTOToEmitC's lowering for pto.pointer_cast treated TileBufConfigAttr's enum attrs (BLayoutAttr / SLayoutAttr / PadValueAttr) as raw IntegerAttr, so it always defaulted to BLayout::RowMajor + SLayout::NoneBox even when the MLIR !pto.tile_buf type requested BLayout::ColMajor + SLayout::RowMajor.

Fix: decode those enum attrs properly and emit the requested layout/fractal/pad into the generated Tile<...> template parameters.

Tests

  • Add regression reproducer: test/samples/InsertSync/add_double_dynamic.py.
  • Add sample guards in test/samples/runop.sh for tmatmulk_autosync to catch known deadlock signatures.
  • Local: PYTHONPATH=<mlir_core>:<PTOAS/build/python> PTOAS_BIN=... bash test/samples/runop.sh all
  • Remote NPU validation (subset): tmatmulk, tmatmulk_autosync, add_double_dynamic, movfp

Misc

  • Update a few Python samples to use the current generated op helpers (e.g. pto.tgatherb, pto.mgather, pto.tmins, pto.txor/txors) so runop.sh all stays green.

@TaoTao-real TaoTao-real force-pushed the codex/fix-128-insert-sync-double-buffer branch from adc055b to 54ac891 Compare February 26, 2026 09:23
@TaoTao-real TaoTao-real changed the title Fix --enable-insert-sync crash on double-buffer dynamic IR [WIP] Fix --enable-insert-sync crash on double-buffer dynamic IR Feb 26, 2026
@TaoTao-real TaoTao-real force-pushed the codex/fix-128-insert-sync-double-buffer branch from 54ac891 to c998729 Compare February 26, 2026 12:45
@TaoTao-real TaoTao-real changed the title [WIP] Fix --enable-insert-sync crash on double-buffer dynamic IR Fix #128: avoid insert-sync reverse lifetimes (Ascend-style analysis) Feb 26, 2026
@TaoTao-real TaoTao-real changed the title Fix #128: avoid insert-sync reverse lifetimes (Ascend-style analysis) [WIP] Fix #128: avoid insert-sync reverse lifetimes (Ascend-style analysis) Feb 27, 2026
@TaoTao-real TaoTao-real changed the title [WIP] Fix #128: avoid insert-sync reverse lifetimes (Ascend-style analysis) Fix insert-sync control-flow: avoid deadlocks and #128 crash Feb 27, 2026
@TaoTao-real
Copy link
Copy Markdown
Contributor Author

Manual remote NPU validation (Ascend910, CANN 8.5.0, DEVICE_ID=2, PTO_ISA_COMMIT=082b94a3): movfp OK, tmatmulk OK, tmatmulk_autosync OK, add_double_dynamic OK.

@TaoTao-real TaoTao-real changed the title Fix insert-sync control-flow: avoid deadlocks and #128 crash [WIP] Fix insert-sync control-flow: avoid deadlocks and #128 crash Feb 27, 2026
lishengtao added 4 commits February 27, 2026 16:33
Port loop/branch sync analysis to AscendNPU-IR style to guarantee begin<end for event-id allocation. Add a reproducer sample and fix outdated Python sample op names so runop.sh all passes.
@TaoTao-real TaoTao-real force-pushed the codex/fix-128-insert-sync-double-buffer branch from 12d0bfc to 42ee599 Compare February 27, 2026 08:39
@zhangstevenunity zhangstevenunity merged commit 3ffbfa5 into hw-native-sys:main Feb 27, 2026
2 of 9 checks passed
@learning-chip
Copy link
Copy Markdown
Contributor

#141 reverted this commit so #128 is still not fixed on master😅

@learning-chip
Copy link
Copy Markdown
Contributor

OK I see #149 now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ptoas --enable-insert-sync fails on double-buffer example at SyncEventIdAllocation, Assertion 'begin < end' failed

3 participants