Port frontend tile fusion to EmitC mainline #679

Open
Zhendong404 wants to merge 1 commit into hw-native-sys:main from Zhendong404:feature-tile-fusion-frontend

Zhendong404 (Contributor) commented May 16, 2026

Summary

This PR ports the frontend-only tile fusion path onto the current EmitC mainline and intentionally leaves VPTO-only backend lifecycle optimizations out of scope.

Included in this PR:

  • restore pto.fusion_region / pto.yield IR surface needed by the frontend fusion flow
  • port PreFusionAnalysis, FusionPlan, OpScheduling, and PTOFusionRegionGen
  • preserve fusion regions through the shared mainline until an explicit pre-EmitC flatten stage
  • add PTOMarkLastUse and emit final C++ as [[pto::last_use(...)]] CALLEE(...)
  • add/regenerate focused tile-fusion, last-use, and non-fused control coverage

Explicitly not included:

  • PTOLowLevelLoopFusion
  • PTOFusionPredicateElision
  • PTOFusionLoadStoreElision
  • any VPTO backend-dependent post-fusion lifecycle cleanup
  • OpenSpec artifacts

Validation

  • cmake --build build --target ptoas -j4
  • /home/zhangzhendong/ptoas-workspace/llvm-project/build-shared/bin/llvm-lit -sv build/test/lit/tile_fusion
  • /home/zhangzhendong/ptoas-workspace/llvm-project/build-shared/bin/llvm-lit -sv build/test/lit/tile_fusion/mark_last_use_slot_mask_level2.pto build/test/lit/tile_fusion/mark_last_use_repeated_ssa_level2.pto

Zhendong404 marked this pull request as ready for review May 16, 2026 06:22

gemini-code-assist (bot) left a comment

Code Review

This pull request ports frontend tile fusion capabilities to the EmitC mainline, introducing pto.fusion_region and pto.yield operations along with passes for analysis, planning, scheduling, and region formation. The implementation ensures fusion occurs on tile-native PTO IR and is preserved through the shared mainline passes until a final flattening stage. Review feedback correctly identifies non-deterministic logic in the liveness analysis where lastLocalConsumer is assigned without considering block order, as Value::getUses() returns uses in an arbitrary sequence. An improvement to string reservation in the C++ post-processing logic was also suggested to optimize performance by reducing reallocations.

        if (nodeIt == computeNodeByOp.end())
          continue;
        appendUniqueNode(state.live.consumerNodes, nodeIt->second);
        state.live.lastLocalConsumer = nodeIt->second;

medium

The assignment of lastLocalConsumer here is non-deterministic because Value::getUses() returns uses in an arbitrary order, not in block order. Since node.id is assigned in block order, update lastLocalConsumer only when the current node.id is greater than the previously recorded one.

        unsigned consumerId = nodeIt->second;
        appendUniqueNode(state.live.consumerNodes, consumerId);
        if (!state.live.lastLocalConsumer || consumerId > *state.live.lastLocalConsumer)
          state.live.lastLocalConsumer = consumerId;
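
As an illustration of the suggestion above, here is a minimal standalone sketch of the max-based update. It assumes lastLocalConsumer behaves like a std::optional<unsigned> and that node ids increase in block order; LiveInfo, recordConsumer, and collectConsumers are hypothetical stand-ins, not names from the PR.

```cpp
#include <algorithm>
#include <optional>
#include <vector>

// Hypothetical stand-in for the per-value liveness record discussed above.
struct LiveInfo {
  std::vector<unsigned> consumerNodes;
  std::optional<unsigned> lastLocalConsumer;
};

// Record one consumer node. The last consumer is the largest id seen so far,
// so the result does not depend on the order in which uses are visited.
void recordConsumer(LiveInfo &live, unsigned consumerId) {
  if (std::find(live.consumerNodes.begin(), live.consumerNodes.end(),
                consumerId) == live.consumerNodes.end())
    live.consumerNodes.push_back(consumerId);
  if (!live.lastLocalConsumer || consumerId > *live.lastLocalConsumer)
    live.lastLocalConsumer = consumerId;
}

// Feed a sequence of consumer ids (in any visiting order) into a fresh record.
LiveInfo collectConsumers(const std::vector<unsigned> &visitOrder) {
  LiveInfo live;
  for (unsigned id : visitOrder)
    recordConsumer(live, id);
  return live;
}
```

With this shape, visiting consumers as 3, 1, 2 or as 1, 2, 3 yields the same lastLocalConsumer, whereas a plain assignment would keep whichever id happened to be visited last.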

        if (nodeIt == computeNodeByOp.end())
          continue;
        appendUniqueNode(writeLive.consumerNodes, nodeIt->second);
        writeLive.lastLocalConsumer = nodeIt->second;

medium

Similar to the liveness finalization above, lastLocalConsumer for write instances should be updated using a maximum check to ensure it correctly identifies the last consumer in block order, regardless of the iteration order of getUses().

        unsigned consumerId = nodeIt->second;
        appendUniqueNode(writeLive.consumerNodes, consumerId);
        if (!writeLive.lastLocalConsumer || consumerId > *writeLive.lastLocalConsumer)
          writeLive.lastLocalConsumer = consumerId;

Comment thread lib/PTO/Transforms/CppPostprocess.cpp Outdated
}

std::string replacement;
replacement.reserve(callee.size() + lastUseArgs.size() + 32);

medium

To minimize reallocations when constructing the replacement string, consider including the size of the original arguments string (argsRef) in the initial reservation.

Suggested change
- replacement.reserve(callee.size() + lastUseArgs.size() + 32);
+ replacement.reserve(callee.size() + lastUseArgs.size() + argsRef.size() + 32);
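
To make the reservation concrete, the following is a rough sketch of how such a replacement string might be assembled. The output shape follows the `[[pto::last_use(...)]] CALLEE(...)` form named in the PR summary; the function buildLastUseCall and its exact punctuation are assumptions for illustration, not the code in CppPostprocess.cpp.

```cpp
#include <string>

// Hypothetical helper: builds "[[pto::last_use(<lastUseArgs>)]] <callee>(<argsRef>)".
std::string buildLastUseCall(const std::string &callee,
                             const std::string &lastUseArgs,
                             const std::string &argsRef) {
  std::string replacement;
  // Reserve room for all three variable parts plus the fixed punctuation
  // (22 characters here), so the appends below should not need to reallocate.
  replacement.reserve(callee.size() + lastUseArgs.size() + argsRef.size() + 32);
  replacement += "[[pto::last_use(";
  replacement += lastUseArgs;
  replacement += ")]] ";
  replacement += callee;
  replacement += "(";
  replacement += argsRef;
  replacement += ")";
  return replacement;
}
```

Without argsRef.size() in the reservation, a long argument list would be the part most likely to push the string past its reserved capacity.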

Zhendong404 force-pushed the feature-tile-fusion-frontend branch 2 times, most recently from 806764b to 59bf8fb, May 16, 2026 06:30
reedhecre commented May 16, 2026

Codex Review

This comment is updated automatically by the review bot.

  • PR: #679 Port frontend tile fusion to EmitC mainline
  • Author: Zhendong404
  • Base/Head: main / feature-tile-fusion-frontend
  • Head SHA: 15ec7fc3a355
  • Trigger: new commits pushed to the PR
  • Generated At: 2026-05-16T15:24:43Z
  • Previous Head SHA: 59bf8fb61c40
  • Status: completed

Summary

PR #679 has a correctness issue in PTOViewToMemref: lowering treshape/bitcast can rewrite a pto.fusion_region result to the region-internal yielded value, producing invalid SSA/dominance for legal post-region users.

Findings

  1. [P1] View-like lowering leaks fusion-region internals across the region boundary (lib/PTO/Transforms/PTOViewToMemref.cpp:169)

buildTileBufViewLikeValue() now calls resolveTileBufViewLikeSource() before reconcileFusionRegionResultTypes() has rewritten pto.fusion_region results to memrefs. When the source is a pto.fusion_region result, resolveTileBufViewLikeSource() walks through pto.yield and returns the yielded SSA value from inside the region. The subsequent outer pto.bind_tile/view-like rewrite is then built with a value defined inside the region body, which violates dominance once the use stays outside the region. Any legal pattern where a fused region result feeds a later pto.treshape/pto.bitcast will therefore produce invalid IR or fail compilation. Metadata backtracking through pto.yield is fine, but the actual operand used outside the region must remain the region result.
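
The distinction drawn in this finding, between reading metadata through pto.yield and choosing the operand, can be sketched with a toy model. This is not the MLIR API and not the PR's code: ToyValue, ViewSource, and resolveViewSource are hypothetical names used only to illustrate the rule.

```cpp
#include <string>

// Hypothetical toy value: a region result points at the value yielded inside
// the region; values defined inside the region must not be used outside it.
struct ToyValue {
  std::string name;
  bool definedInsideRegion;
  const ToyValue *yieldedFrom; // non-null only for a fusion-region result
  std::string layout;          // stand-in for tile metadata
};

struct ViewSource {
  const ToyValue *operand;      // value the outer view-like op should consume
  const ToyValue *metadataFrom; // value whose metadata describes the tile
};

// Walk through yields to find metadata, but keep the original value as the
// operand so a use outside the region never refers to a region-internal value.
ViewSource resolveViewSource(const ToyValue &value) {
  const ToyValue *meta = &value;
  while (meta->yieldedFrom)
    meta = meta->yieldedFrom;
  return {&value, meta};
}
```

In this model, a post-region view op built from a region result would take its layout from the yielded value while still consuming the region result itself, which is the behavior the finding asks for.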

Zhendong404 force-pushed the feature-tile-fusion-frontend branch from 59bf8fb to 15ec7fc, May 16, 2026 14:56