Skip to content

Arm backend: Preserve duplicate output slots with TOSA identity fanout#18866

Merged
AdrianLundell merged 4 commits intopytorch:mainfrom
AdrianLundell:change-1225529
Apr 22, 2026
Merged

Arm backend: Preserve duplicate output slots with TOSA identity fanout#18866
AdrianLundell merged 4 commits intopytorch:mainfrom
AdrianLundell:change-1225529

Conversation

@AdrianLundell
Copy link
Copy Markdown
Collaborator

@AdrianLundell AdrianLundell commented Apr 14, 2026

When FuseEqualPlaceholdersPass fuses equal constant placeholders, the graph output can contain the same node in multiple output slots. In this case ToTosaMemoryFormatPass was rewriting the output node with replace_input_with() while inserting output transposes.

That rewrote all matching occurrences at once, so duplicated logical output slots were collapsed onto the same transpose node instead of remaining distinct.

Fix this by handling duplicate outputs in the output rewrite path. For shared output nodes, create a single boundary TOSA TRANSPOSE and preserve distinct output slots by inserting TOSA IDENTITY fanout nodes for later duplicates.

This keeps insert_input_transpose() focused on normal input rewrites, avoids duplicating equivalent transposes for shared outputs, and preserves the output slot structure expected by later lowering and serialization stages.

Add regression coverage for FuseEqualPlaceholdersPass + ToTosaMemoryFormatPass with duplicate outputs, and add TOSA IDENTITY dialect and visitor coverage.

cc @digantdesai @freddan80 @per @zingo @oscarandersson8218 @mansnils @Sebastian-Larsson @robell

When FuseEqualPlaceholdersPass fuses equal constant placeholders,
the graph output can contain the same node in multiple output slots.
In this case ToTosaMemoryFormatPass was rewriting the output node
with replace_input_with() while inserting output transposes.

That rewrote all matching occurrences at once, so duplicated
logical output slots were collapsed onto the same transpose node
instead of remaining distinct.

Fix this by handling duplicate outputs in the output rewrite path.
For shared output nodes, create a single boundary TOSA TRANSPOSE
and preserve distinct output slots by inserting TOSA IDENTITY
fanout nodes for later duplicates.

This keeps insert_input_transpose() focused on normal input
rewrites, avoids duplicating equivalent transposes for shared
outputs, and preserves the output slot structure expected by later
lowering and serialization stages.

Add regression coverage for FuseEqualPlaceholdersPass +
ToTosaMemoryFormatPass with duplicate outputs, and add TOSA
IDENTITY dialect and visitor coverage.

Signed-off-by: Baris Demir <baris.demir@arm.com>
Change-Id: Ie14bc88bfadaad7f993b71ef1b5332b5953b72c8
@AdrianLundell AdrianLundell added partner: arm For backend delegation, kernels, demo, etc. from the 3rd-party partner, Arm ciflow/trunk release notes: none Do not include this in the release notes labels Apr 14, 2026
@pytorch-bot
Copy link
Copy Markdown

pytorch-bot Bot commented Apr 14, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18866

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

❌ 3 New Failures, 4 Pending, 4 Unrelated Failures

As of commit bf44c30 with merge base 6be4fb5 (image):

NEW FAILURES - The following jobs have failed:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 14, 2026
@zingo
Copy link
Copy Markdown
Collaborator

zingo commented Apr 14, 2026

Hi @digantdesai this adds file, maybe you want to check this?

Copilot AI review requested due to automatic review settings April 20, 2026 11:15
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes an Arm backend lowering issue where duplicated logical output slots (caused by FuseEqualPlaceholdersPass) could be inadvertently collapsed during ToTosaMemoryFormatPass output rewrites. The solution preserves output-slot structure by introducing tosa.IDENTITY fanout nodes for repeated outputs, while avoiding redundant boundary transposes.

Changes:

  • Add a tosa.IDENTITY backend dialect op plus serializer visitor support.
  • Introduce EnsureUniqueOutputNodesPass and integrate it into the Arm TOSA pipeline after ToTosaMemoryFormatPass.
  • Update output-transpose insertion logic to avoid creating duplicate/dead transposes for shared outputs, and add regression tests covering duplicate-output scenarios.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
backends/arm/tosa/dialect/ops/identity.py Registers the new TOSA IDENTITY dialect op (fake impl for shape/dtype propagation).
backends/arm/tosa/dialect/__init__.py Ensures the identity op is imported/registered with the dialect.
backends/arm/operators/op_tosa_identity.py Adds serializer lowering via IdentityVisitor for tosa.IDENTITY.default.
backends/arm/operators/__init__.py Registers the new identity visitor module.
backends/arm/_passes/to_tosa_memory_format_pass.py Avoids inserting redundant/dead output transposes when outputs share the same producer node.
backends/arm/_passes/ensure_unique_output_nodes_pass.py New pass that inserts tosa.IDENTITY nodes so each output leaf is a distinct node even when producers are shared.
backends/arm/_passes/arm_pass_manager.py Adds EnsureUniqueOutputNodesPass into the default Arm TOSA pipeline after memory-format rewriting.
backends/arm/_passes/__init__.py Exposes EnsureUniqueOutputNodesPass from the passes package.
backends/arm/test/passes/test_to_tosa_memory_format.py Adds regression tests for duplicate constant outputs with memory-format rewriting + placeholder fusion.
backends/arm/test/passes/test_ensure_unique_output_nodes_pass.py Adds focused unit tests for EnsureUniqueOutputNodesPass.
backends/arm/test/misc/test_tosa_dialect_identity.py Adds dialect registration + fake-mode coverage for tosa.IDENTITY.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +244 to +248
passes_with_exported_program=[
FuseEqualPlaceholdersPass,
ToTosaMemoryFormatPass,
EnsureUniqueOutputNodesPass,
],
Copy link

Copilot AI Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EnsureUniqueOutputNodesPass is included in passes_with_exported_program, whose harness wrapper instantiates passes as PassClass(ep) unconditionally. Since EnsureUniqueOutputNodesPass doesn't take an exported_program parameter, this positional arg will get bound to ArmPass.__init__'s tfa_pass and can silently change pass behavior. Either move this pass out of passes_with_exported_program (keeping execution order correct), or update EnsureUniqueOutputNodesPass.__init__ to accept an exported_program positional argument and ignore it so instantiation is safe.

Copilot uses AI. Check for mistakes.
Comment on lines +16 to +23
class EnsureUniqueOutputNodesPass(ArmPass):
"""Ensure each graph output leaf references a unique producer node.

If the same node appears multiple times in the output structure, insert a
``tosa.IDENTITY`` node for each occurrence and replace the repeated output
entries with those identity nodes.

"""
Copy link

Copilot AI Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pass inherits ArmPass.__init__(tfa_pass=False, *args, **kwargs), so accidental construction as EnsureUniqueOutputNodesPass(exported_program) (as done by some test harness paths) will bind the exported program object to tfa_pass and treat the pass as a transform-for-annotation pass. Define an explicit __init__ that accepts an optional/ignored exported_program positional parameter and always forwards tfa_pass=False to ArmPass to avoid this silent behavior change.

Copilot uses AI. Check for mistakes.
@AdrianLundell
Copy link
Copy Markdown
Collaborator Author

No buck2 changes required:

  • backends/arm/test/passes/test_ensure_unique_output_nodes_pass.py is covered by native.glob(["passes/test_*.py"]).
  • New _passes/operators/tosa dialect ops Python sources are covered by existing glob(...) source rules.

The new test is not covered in backends/arm/tests/target.bzl, but many other misc tests are already missing there.

Copy link
Copy Markdown
Collaborator

@zingo zingo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK to merge

with graph.inserting_before(output_node):
identity_node = create_node(
graph,
exir_ops.backend.tosa.IDENTITY.default,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for output placeholders I guess to not mess up the signature?

@github-actions github-actions Bot added the module: arm Issues related to arm backend label Apr 22, 2026
Signed-off-by: Adrian Lundell <adrian.lundell@arm.com>
Copilot AI review requested due to automatic review settings April 22, 2026 09:18
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +36 to +51
validate_num_inputs(self.target, inputs, 1)
validate_same_dtype(self.target, [inputs[0], output], ts)
validate_valid_dtype(
self.target,
[inputs[0], output],
[
ts.DType.BOOL,
ts.DType.INT8,
ts.DType.INT16,
ts.DType.INT32,
ts.DType.FP16,
ts.DType.FP32,
ts.DType.BF16,
],
self.tosa_spec,
)
Copy link

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

validate_valid_dtype doesn’t use tosa_spec, so passing a hard-coded list here effectively enables INT16/BF16 for all specs. This can allow lowering an IDENTITY on dtypes that the active TOSA profile/extensions don’t support. Build supported_dtypes dynamically from self.tosa_spec (similar to TransposeVisitor) and only include INT/FP/BF16 types when the spec indicates support.

Copilot uses AI. Check for mistakes.
@AdrianLundell AdrianLundell merged commit f9506f8 into pytorch:main Apr 22, 2026
430 of 443 checks passed
Gasoonjia pushed a commit that referenced this pull request Apr 23, 2026
#18866)

When FuseEqualPlaceholdersPass fuses equal constant placeholders, the
graph output can contain the same node in multiple output slots. In this
case ToTosaMemoryFormatPass was rewriting the output node with
replace_input_with() while inserting output transposes.

That rewrote all matching occurrences at once, so duplicated logical
output slots were collapsed onto the same transpose node instead of
remaining distinct.

Fix this by handling duplicate outputs in the output rewrite path. For
shared output nodes, create a single boundary TOSA TRANSPOSE and
preserve distinct output slots by inserting TOSA IDENTITY fanout nodes
for later duplicates.

This keeps insert_input_transpose() focused on normal input rewrites,
avoids duplicating equivalent transposes for shared outputs, and
preserves the output slot structure expected by later lowering and
serialization stages.

Add regression coverage for FuseEqualPlaceholdersPass +
ToTosaMemoryFormatPass with duplicate outputs, and add TOSA IDENTITY
dialect and visitor coverage.

---------

Signed-off-by: Baris Demir <baris.demir@arm.com>
Signed-off-by: Adrian Lundell <adrian.lundell@arm.com>
Co-authored-by: Baris Demir <baris.demir@arm.com>
zeel2104 pushed a commit to zeel2104/executorch that referenced this pull request May 5, 2026
pytorch#18866)

When FuseEqualPlaceholdersPass fuses equal constant placeholders, the
graph output can contain the same node in multiple output slots. In this
case ToTosaMemoryFormatPass was rewriting the output node with
replace_input_with() while inserting output transposes.

That rewrote all matching occurrences at once, so duplicated logical
output slots were collapsed onto the same transpose node instead of
remaining distinct.

Fix this by handling duplicate outputs in the output rewrite path. For
shared output nodes, create a single boundary TOSA TRANSPOSE and
preserve distinct output slots by inserting TOSA IDENTITY fanout nodes
for later duplicates.

This keeps insert_input_transpose() focused on normal input rewrites,
avoids duplicating equivalent transposes for shared outputs, and
preserves the output slot structure expected by later lowering and
serialization stages.

Add regression coverage for FuseEqualPlaceholdersPass +
ToTosaMemoryFormatPass with duplicate outputs, and add TOSA IDENTITY
dialect and visitor coverage.

---------

Signed-off-by: Baris Demir <baris.demir@arm.com>
Signed-off-by: Adrian Lundell <adrian.lundell@arm.com>
Co-authored-by: Baris Demir <baris.demir@arm.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/trunk CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. module: arm Issues related to arm backend partner: arm For backend delegation, kernels, demo, etc. from the 3rd-party partner, Arm release notes: none Do not include this in the release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants