Skip to content

[Core] Add SpaceToDepth fusion pattern#27747

Merged
hariharans29 merged 13 commits intomainfrom
hari/space_to_depth_fusion
Mar 22, 2026
Merged

[Core] Add SpaceToDepth fusion pattern#27747
hariharans29 merged 13 commits intomainfrom
hari/space_to_depth_fusion

Conversation

@hariharans29
Copy link
Member

@hariharans29 hariharans29 commented Mar 18, 2026

Description

Fuses a pattern of 4 Slices + 1 Concat into 1 SpaceToDepth op (+1 Gather if the order doesn't match the expected default pattern). Saves about ~1ms for Yolox_tiny which is a sub-15ms model on an AVX512 machine. So, it is good savings.

Motivation and Context

Improves performance especially when the SpaceToDepth operation occurs in a cheap real-time model

Original SpaceToDepth operation in the model (4 slices + 1 Concat is a lot of memory transactions that can be replaced by one optimized kernel):

image

@hariharans29 hariharans29 requested a review from Copilot March 18, 2026 20:41
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new Level1 optimizer fusion that recognizes a 4x Slice (stride-2, phase offsets) + Concat(axis=1) downsample pattern and replaces it with SpaceToDepth (plus optional Gather to preserve non-canonical block ordering).

Changes:

  • Introduces SliceConcatToSpaceToDepthFusion graph transformer and integrates it into Level1 transformer generation.
  • Implements fusion logic to replace Slice/Concat subgraphs with SpaceToDepth, optionally followed by Gather for channel reordering.
  • Adds optimizer unit tests covering canonical, Constant-node-input, and permuted block order cases.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 7 comments.

File Description
onnxruntime/test/optimizer/graph_transform_test.cc Adds tests validating the new fusion behavior (including Constant-based Slice inputs and permuted block order).
onnxruntime/core/optimizer/slice_concat_to_space_to_depth_fusion.h Declares the new SliceConcatToSpaceToDepthFusion transformer.
onnxruntime/core/optimizer/slice_concat_to_space_to_depth_fusion.cc Implements the matching and rewrite logic for Slice+Concat → SpaceToDepth (+ optional Gather).
onnxruntime/core/optimizer/graph_transformer_utils.cc Registers the new transformer in the Level1 transformer pipeline.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a new Level1 graph optimization that recognizes a specific Slice×4 + Concat downsampling pattern and fuses it into a SpaceToDepth (and optionally a Gather to preserve non-canonical slice ordering), improving runtime efficiency for eligible models.

Changes:

  • Added SliceConcatToSpaceToDepthFusion transformer implementation and integrated it into Level1 transformer generation.
  • Added optimizer tests covering canonical fusion, constant-input variants, permuted block order (requires Gather), and negative (non-triggering) scenarios.
  • Updated minimal-build optimizer source allowlist to include the new transformer.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
onnxruntime/test/optimizer/graph_transform_test.cc Adds test coverage for the new Slice+Concat → SpaceToDepth fusion (and non-trigger cases).
onnxruntime/core/optimizer/slice_concat_to_space_to_depth_fusion.h Declares the new graph transformer.
onnxruntime/core/optimizer/slice_concat_to_space_to_depth_fusion.cc Implements pattern matching and rewrite to SpaceToDepth (+ optional Gather).
onnxruntime/core/optimizer/graph_transformer_utils.cc Registers the transformer in Level1 pipeline.
cmake/onnxruntime_optimizer.cmake Adds new source files to minimal-build allowlist.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Contributor

@tianleiwu tianleiwu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

This PR introduces a highly valuable graph optimization connecting a common pattern generated by YOLO's Focus layers (Concat of 4 phase-shifted strided Slice operations) into ONNX's SpaceToDepth operator. The transformation correctly mitigates the heavy memory bus constraints imposed by piecewise slicing and concatenation, yielding performance wins for YOLO variants.

The mathematical mapping leveraged here relies on an elegant quirk of ONNXRuntime's SpaceToDepth native output layout p * C + c, perfectly aligning with the channel dimension interleaved sequence of Concat([slices...], axis=1). The algorithm is robustly designed, securely verifying phase overlaps without hardcoding static dimension constraints where unnecessary.

There are just a few isolated concerns to address, primarily involving the optimizer's execution flow within minimal builds and a few overly restrictive structural matching rules.

Findings by Component

Subgraph Matching & Analysis (slice_concat_to_space_to_depth_fusion.cc)

Positive: The way GetSliceInfo integrates axes tracking correctly populates missing slice domains with .starts zero-fills and .ends INT_MAX fills is immaculate. Likewise, TryGetPhasePermutation perfectly computes the re-indexing necessary to handle alternate Focus phase orderings cleanly via an auxiliary Gather.

  • ⚠️ Concern - Missed Match on Overshoot Ends: In IsFullExtentEnd (Line 183), the check end == dim_value risks rejecting functionally equivalent full-extent slices. When Slice operations specify overshoot values (e.g., ends = [1000] for a dimension of size 8), the ONNX specification natively clamps this, meaning it captures the full extent.
    • Fix Suggestion: Relax the check to allow end >= dim_value given that Slice clamps at boundaries.
      return TryGetStaticInputDim(input, axis, dim_value) && end >= dim_value;
  • ⚠️ Concern - Missing Node Edge Generation for Gather: As identified in another review, in non-canonical cases the created Gather node is disjointed from the SpaceToDepth node within the internal Graph dependency mappings. While Replacement_end ties the IO args properly, the Node edge is missing.
    • Fix Suggestion: Add explicit topological edging right after binding the execution provider.
      gather.SetExecutionProviderType(provider_type);
      graph.AddEdge(space_to_depth.Index(), gather.Index(), 0, 0); // Connect Data Edge
      replacement_end = &gather;

Graph Pass Initialization (GraphTransformer::ApplyImpl)

  • ⚠️ Concern - Caching Stale Operator Topologies: Within SliceConcatToSpaceToDepthFusion::ApplyImpl (Line 549), graph_viewer.GetNodesInTopologicalOrder() evaluates the graph nodes exactly once. Because FuseSliceConcatToSpaceToDepth drastically alters graph boundaries (mutating upstream links and deleting multiple nodes), relying on the static node_topology_list for subsequent nodes may reference invalidated pointer offsets downstream (especially in consecutive pattern instances).
    • Fix Suggestion: Re-initialize the outer loop execution over GraphViewer whenever modified is marked true, similarly to how other level 1 graph rewrites handle cascading topology updates.

Code Style & Hygiene

Positive: Naming schemes, constant declarations, inline vectors allocation tracking, and helper extractions all align beautifully with the surrounding codebase in optimizer. Utilizing MoveAllNodeInputEdges and carefully wiping the auxiliary constant edges for the 1st slice node was a brilliant strategy to bypass native ONNX initializer mismatches.

--
This review focuses selectively to cover unique structural and boundary concerns. Outstanding points involving minimal build lookups and Float16 constraints raised by Copilot in earlier reviews remain equally valid and advised.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…on.cc

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@hariharans29 hariharans29 requested a review from tianleiwu March 21, 2026 05:07
@hariharans29 hariharans29 merged commit 3fa4205 into main Mar 22, 2026
91 checks passed
@hariharans29 hariharans29 deleted the hari/space_to_depth_fusion branch March 22, 2026 21:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants