Missing MTE2→V synchronization when loop body has zero iterations

## Summary

`--enable-insert-sync` fails to insert a synchronization barrier between `TLOAD` (MTE2 pipe) operations before a loop and `TROWEXPANDDIV` (V pipe) after the loop, when the loop executes zero iterations at runtime.

## Version

ptoas v0.26 (v0.25 is likely affected as well)

## Reproduction

**Input `.pto` pattern** (simplified from a real qwen3 attention kernel):

```mlir
// Before loop: 3 TLOADs from GM to UB (MTE2 pipe)
TLOAD(oi_tile, gm_oi)       // MTE2: load to UB[6304]
TLOAD(mi_tile, gm_mi)       // MTE2: load to UB[10400]
TLOAD(li_tile, gm_li)       // MTE2: load to UB[10432]

// Loop: iterates from 1 to ctx_blocks (dynamic scalar)
for (v41 = 1; v41 < ctx_blocks; v41++) {
    // Loop body has proper MTE2<->V synchronization via wait_flag/set_flag
    ...
}

// After loop: V pipe reads from UB
TROWEXPANDDIV(result, oi_tile, li_tile)   // V pipe: reads UB[6304] and UB[10432]
```

**Compile command:**
```bash
ptoas input.pto -o output.cpp --enable-insert-sync --pto-level=level3
```

**Generated C++ (relevant section):**
```cpp
// Lines 74-75: V pipe grants MTE2 permission to proceed
set_flag(PIPE_V, PIPE_MTE2, EVENT_ID0);
set_flag(PIPE_V, PIPE_MTE2, EVENT_ID1);

// Lines 76-88: MTE2 loads (async, no completion wait)
TLOAD(v29, v32);   // oi → UB[6304]
TLOAD(v33, v36);   // mi → UB[10400]
TLOAD(v37, v40);   // li → UB[10432]

// Line 89: Loop — when ctx_blocks=1, range is [1,1) → ZERO iterations
for (size_t v41 = 1; v41 < ctx_blocks; v41++) {
    // Contains wait_flag(PIPE_V, PIPE_MTE2, EVENT_ID0) at line 96
    // This implicitly syncs the initial TLOADs — but ONLY if the loop executes
    ...
}

// Line 204: Only syncs V pipe, NOT MTE2!
pipe_barrier(PIPE_V);

// Line 205: V pipe reads from UB — TLOADs may not have completed!
TROWEXPANDDIV(v88, v29, v37);
```

## Root Cause

When the loop executes at least one iteration, the `wait_flag(PIPE_V, PIPE_MTE2, ...)` inside the loop body implicitly ensures that the initial TLOADs have completed before any V pipe operation reads the UB data.

When the loop executes **zero iterations**, this synchronization is skipped entirely. The `pipe_barrier(PIPE_V)` at line 204 of the generated C++ only orders V pipe operations against each other; it does **not** wait for MTE2 to complete. The subsequent `TROWEXPANDDIV` therefore reads stale or uninitialized UB data.
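
The difference between the two cases can be illustrated with a small host-side model. This is only a sketch: `PipeModel` and its methods are hypothetical stand-ins, not the real `set_flag`/`wait_flag`/`pipe_barrier` intrinsics, and the model tracks nothing except whether the pre-loop loads are still in flight.

```cpp
#include <cassert>
#include <cstddef>

// Toy host-side model (hypothetical, not the device intrinsics).
struct PipeModel {
    bool mte2_in_flight = false;  // a TLOAD was issued and nothing waited on it yet
    bool hazard = false;          // V read UB while a TLOAD was still in flight

    void tload() { mte2_in_flight = true; }
    void wait_for_mte2() { mte2_in_flight = false; }  // stands in for the in-loop wait_flag
    void barrier_v() {}                               // PIPE_V only: MTE2 state unchanged
    void barrier_all() { mte2_in_flight = false; }    // PIPE_ALL also drains MTE2
    void v_read() { if (mte2_in_flight) hazard = true; }
};

// Mirrors the shape of the generated code. Returns true if the post-loop
// V read raced with the pre-loop loads.
bool post_loop_read_is_hazardous(std::size_t ctx_blocks, bool use_barrier_all) {
    PipeModel m;
    m.tload();  // oi
    m.tload();  // mi
    m.tload();  // li
    for (std::size_t v41 = 1; v41 < ctx_blocks; ++v41) {
        // Loop-body loads and their own sync are omitted; only the wait
        // that covers the pre-loop TLOADs is modeled.
        m.wait_for_mte2();
    }
    if (use_barrier_all) {
        m.barrier_all();
    } else {
        m.barrier_v();
    }
    m.v_read();  // TROWEXPANDDIV
    return m.hazard;
}
```

In this model, `post_loop_read_is_hazardous(1, false)` reports a hazard, while `ctx_blocks = 2` or `use_barrier_all = true` does not, which matches the behavior described in this issue.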

## Impact

- **Deterministic wrong results** (not intermittent) when the loop trip count is 0
- Output values are wildly incorrect (e.g., ±398 vs expected ±0.1), with occasional NaN
- Affects any kernel in which pre-loop TLOADs feed post-loop V-pipe operations across a loop whose trip count can be zero

## Workaround

Manually changing line 204 from `pipe_barrier(PIPE_V)` to `pipe_barrier(PIPE_ALL)` in the generated C++ fixes the issue. All tests pass after this change.

## Expected Fix

The `--enable-insert-sync` pass should insert an MTE2→V barrier (e.g., `set_flag(PIPE_MTE2, PIPE_V, ...); wait_flag(PIPE_MTE2, PIPE_V, ...)` or `pipe_barrier(PIPE_ALL)`) after the loop when all of the following hold:
1. There are pending MTE2 operations (TLOADs) before the loop
2. The loop trip count is dynamic (may be zero)
3. V pipe operations after the loop read from UB addresses written by those TLOADs
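
The three conditions above could be expressed as a single predicate. The sketch below is illustrative only: `UbRange`, `overlaps`, and `needs_post_loop_mte2_to_v_sync` are hypothetical names, not ptoas data structures or APIs, and it assumes the pass can see UB byte ranges for the relevant operations.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of a UB region touched by an operation.
struct UbRange {
    std::size_t begin;  // first byte offset in UB
    std::size_t size;   // length in bytes
};

// Two half-open ranges [begin, begin + size) overlap if each starts
// before the other ends.
bool overlaps(const UbRange& a, const UbRange& b) {
    return a.begin < b.begin + b.size && b.begin < a.begin + a.size;
}

// Returns true when all three conditions hold:
//   1. there are MTE2 writes pending before the loop,
//   2. the loop may execute zero iterations,
//   3. a post-loop V read overlaps one of those writes.
bool needs_post_loop_mte2_to_v_sync(const std::vector<UbRange>& pre_loop_mte2_writes,
                                    const std::vector<UbRange>& post_loop_v_reads,
                                    bool trip_count_may_be_zero) {
    if (pre_loop_mte2_writes.empty() || !trip_count_may_be_zero) {
        return false;
    }
    for (const UbRange& w : pre_loop_mte2_writes) {
        for (const UbRange& r : post_loop_v_reads) {
            if (overlaps(w, r)) {
                return true;
            }
        }
    }
    return false;
}
```

For the reproduction above, the writes would start at UB offsets 6304, 10400, and 10432 and the reads at 6304 and 10432, so the predicate would return true whenever the trip count is not known to be at least one.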

## Context

Discovered while investigating qwen3 decode attention kernel failures in hw-native-sys/pypto#1098. The kernel implements online softmax over a dynamic number of context blocks. When `seq_len ≤ SEQ_TILE` (so `ctx_blocks = 1`), the accumulation loop runs over the empty range [1, 1) and executes zero iterations, which triggers this bug.
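
The arithmetic behind the zero-trip case can be sketched as follows. This assumes the block count is the ceiling of `seq_len / SEQ_TILE`, which is consistent with the description above but is not taken from the kernel source; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: the number of context blocks is ceil(seq_len / seq_tile).
std::size_t ctx_blocks_for(std::size_t seq_len, std::size_t seq_tile) {
    assert(seq_tile > 0);
    return (seq_len + seq_tile - 1) / seq_tile;
}

// Trip count of `for (v41 = 1; v41 < ctx_blocks; v41++)`.
std::size_t accumulation_trip_count(std::size_t ctx_blocks) {
    return ctx_blocks > 1 ? ctx_blocks - 1 : 0;
}
```

Under that assumption, any `seq_len` from 1 up to `SEQ_TILE` gives one block and a trip count of zero, while `SEQ_TILE + 1` gives two blocks and a single iteration.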
