Arm backend: Preserve duplicate output slots with TOSA identity fanout#18866
Arm backend: Preserve duplicate output slots with TOSA identity fanout#18866AdrianLundell merged 4 commits intopytorch:mainfrom
Conversation
When FuseEqualPlaceholdersPass fuses equal constant placeholders, the graph output can contain the same node in multiple output slots. In this case ToTosaMemoryFormatPass was rewriting the output node with replace_input_with() while inserting output transposes. That rewrote all matching occurrences at once, so duplicated logical output slots were collapsed onto the same transpose node instead of remaining distinct. Fix this by handling duplicate outputs in the output rewrite path. For shared output nodes, create a single boundary TOSA TRANSPOSE and preserve distinct output slots by inserting TOSA IDENTITY fanout nodes for later duplicates. This keeps insert_input_transpose() focused on normal input rewrites, avoids duplicating equivalent transposes for shared outputs, and preserves the output slot structure expected by later lowering and serialization stages. Add regression coverage for FuseEqualPlaceholdersPass + ToTosaMemoryFormatPass with duplicate outputs, and add TOSA IDENTITY dialect and visitor coverage. Signed-off-by: Baris Demir <baris.demir@arm.com> Change-Id: Ie14bc88bfadaad7f993b71ef1b5332b5953b72c8
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18866
Note: Links to docs will display an error until the docs builds have been completed. ❗ 1 Active SEVsThere are 1 currently active SEVs. If your PR is affected, please view them below: ❌ 3 New Failures, 4 Pending, 4 Unrelated FailuresAs of commit bf44c30 with merge base 6be4fb5 ( NEW FAILURES - The following jobs have failed:
FLAKY - The following jobs failed but were likely due to flakiness present on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
|
Hi @digantdesai this adds file, maybe you want to check this? |
There was a problem hiding this comment.
Pull request overview
This PR fixes an Arm backend lowering issue where duplicated logical output slots (caused by FuseEqualPlaceholdersPass) could be inadvertently collapsed during ToTosaMemoryFormatPass output rewrites. The solution preserves output-slot structure by introducing tosa.IDENTITY fanout nodes for repeated outputs, while avoiding redundant boundary transposes.
Changes:
- Add a
tosa.IDENTITYbackend dialect op plus serializer visitor support. - Introduce
EnsureUniqueOutputNodesPassand integrate it into the Arm TOSA pipeline afterToTosaMemoryFormatPass. - Update output-transpose insertion logic to avoid creating duplicate/dead transposes for shared outputs, and add regression tests covering duplicate-output scenarios.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
backends/arm/tosa/dialect/ops/identity.py |
Registers the new TOSA IDENTITY dialect op (fake impl for shape/dtype propagation). |
backends/arm/tosa/dialect/__init__.py |
Ensures the identity op is imported/registered with the dialect. |
backends/arm/operators/op_tosa_identity.py |
Adds serializer lowering via IdentityVisitor for tosa.IDENTITY.default. |
backends/arm/operators/__init__.py |
Registers the new identity visitor module. |
backends/arm/_passes/to_tosa_memory_format_pass.py |
Avoids inserting redundant/dead output transposes when outputs share the same producer node. |
backends/arm/_passes/ensure_unique_output_nodes_pass.py |
New pass that inserts tosa.IDENTITY nodes so each output leaf is a distinct node even when producers are shared. |
backends/arm/_passes/arm_pass_manager.py |
Adds EnsureUniqueOutputNodesPass into the default Arm TOSA pipeline after memory-format rewriting. |
backends/arm/_passes/__init__.py |
Exposes EnsureUniqueOutputNodesPass from the passes package. |
backends/arm/test/passes/test_to_tosa_memory_format.py |
Adds regression tests for duplicate constant outputs with memory-format rewriting + placeholder fusion. |
backends/arm/test/passes/test_ensure_unique_output_nodes_pass.py |
Adds focused unit tests for EnsureUniqueOutputNodesPass. |
backends/arm/test/misc/test_tosa_dialect_identity.py |
Adds dialect registration + fake-mode coverage for tosa.IDENTITY. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| passes_with_exported_program=[ | ||
| FuseEqualPlaceholdersPass, | ||
| ToTosaMemoryFormatPass, | ||
| EnsureUniqueOutputNodesPass, | ||
| ], |
There was a problem hiding this comment.
EnsureUniqueOutputNodesPass is included in passes_with_exported_program, whose harness wrapper instantiates passes as PassClass(ep) unconditionally. Since EnsureUniqueOutputNodesPass doesn't take an exported_program parameter, this positional arg will get bound to ArmPass.__init__'s tfa_pass and can silently change pass behavior. Either move this pass out of passes_with_exported_program (keeping execution order correct), or update EnsureUniqueOutputNodesPass.__init__ to accept an exported_program positional argument and ignore it so instantiation is safe.
| class EnsureUniqueOutputNodesPass(ArmPass): | ||
| """Ensure each graph output leaf references a unique producer node. | ||
|
|
||
| If the same node appears multiple times in the output structure, insert a | ||
| ``tosa.IDENTITY`` node for each occurrence and replace the repeated output | ||
| entries with those identity nodes. | ||
|
|
||
| """ |
There was a problem hiding this comment.
This pass inherits ArmPass.__init__(tfa_pass=False, *args, **kwargs), so accidental construction as EnsureUniqueOutputNodesPass(exported_program) (as done by some test harness paths) will bind the exported program object to tfa_pass and treat the pass as a transform-for-annotation pass. Define an explicit __init__ that accepts an optional/ignored exported_program positional parameter and always forwards tfa_pass=False to ArmPass to avoid this silent behavior change.
|
No buck2 changes required:
The new test is not covered in backends/arm/tests/target.bzl, but many other misc tests are already missing there. |
| with graph.inserting_before(output_node): | ||
| identity_node = create_node( | ||
| graph, | ||
| exir_ops.backend.tosa.IDENTITY.default, |
There was a problem hiding this comment.
Just for output placeholders I guess to not mess up the signature?
Signed-off-by: Adrian Lundell <adrian.lundell@arm.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| validate_num_inputs(self.target, inputs, 1) | ||
| validate_same_dtype(self.target, [inputs[0], output], ts) | ||
| validate_valid_dtype( | ||
| self.target, | ||
| [inputs[0], output], | ||
| [ | ||
| ts.DType.BOOL, | ||
| ts.DType.INT8, | ||
| ts.DType.INT16, | ||
| ts.DType.INT32, | ||
| ts.DType.FP16, | ||
| ts.DType.FP32, | ||
| ts.DType.BF16, | ||
| ], | ||
| self.tosa_spec, | ||
| ) |
There was a problem hiding this comment.
validate_valid_dtype doesn’t use tosa_spec, so passing a hard-coded list here effectively enables INT16/BF16 for all specs. This can allow lowering an IDENTITY on dtypes that the active TOSA profile/extensions don’t support. Build supported_dtypes dynamically from self.tosa_spec (similar to TransposeVisitor) and only include INT/FP/BF16 types when the spec indicates support.
#18866) When FuseEqualPlaceholdersPass fuses equal constant placeholders, the graph output can contain the same node in multiple output slots. In this case ToTosaMemoryFormatPass was rewriting the output node with replace_input_with() while inserting output transposes. That rewrote all matching occurrences at once, so duplicated logical output slots were collapsed onto the same transpose node instead of remaining distinct. Fix this by handling duplicate outputs in the output rewrite path. For shared output nodes, create a single boundary TOSA TRANSPOSE and preserve distinct output slots by inserting TOSA IDENTITY fanout nodes for later duplicates. This keeps insert_input_transpose() focused on normal input rewrites, avoids duplicating equivalent transposes for shared outputs, and preserves the output slot structure expected by later lowering and serialization stages. Add regression coverage for FuseEqualPlaceholdersPass + ToTosaMemoryFormatPass with duplicate outputs, and add TOSA IDENTITY dialect and visitor coverage. --------- Signed-off-by: Baris Demir <baris.demir@arm.com> Signed-off-by: Adrian Lundell <adrian.lundell@arm.com> Co-authored-by: Baris Demir <baris.demir@arm.com>
pytorch#18866) When FuseEqualPlaceholdersPass fuses equal constant placeholders, the graph output can contain the same node in multiple output slots. In this case ToTosaMemoryFormatPass was rewriting the output node with replace_input_with() while inserting output transposes. That rewrote all matching occurrences at once, so duplicated logical output slots were collapsed onto the same transpose node instead of remaining distinct. Fix this by handling duplicate outputs in the output rewrite path. For shared output nodes, create a single boundary TOSA TRANSPOSE and preserve distinct output slots by inserting TOSA IDENTITY fanout nodes for later duplicates. This keeps insert_input_transpose() focused on normal input rewrites, avoids duplicating equivalent transposes for shared outputs, and preserves the output slot structure expected by later lowering and serialization stages. Add regression coverage for FuseEqualPlaceholdersPass + ToTosaMemoryFormatPass with duplicate outputs, and add TOSA IDENTITY dialect and visitor coverage. --------- Signed-off-by: Baris Demir <baris.demir@arm.com> Signed-off-by: Adrian Lundell <adrian.lundell@arm.com> Co-authored-by: Baris Demir <baris.demir@arm.com>
When FuseEqualPlaceholdersPass fuses equal constant placeholders, the graph output can contain the same node in multiple output slots. In this case ToTosaMemoryFormatPass was rewriting the output node with replace_input_with() while inserting output transposes.
That rewrote all matching occurrences at once, so duplicated logical output slots were collapsed onto the same transpose node instead of remaining distinct.
Fix this by handling duplicate outputs in the output rewrite path. For shared output nodes, create a single boundary TOSA TRANSPOSE and preserve distinct output slots by inserting TOSA IDENTITY fanout nodes for later duplicates.
This keeps insert_input_transpose() focused on normal input rewrites, avoids duplicating equivalent transposes for shared outputs, and preserves the output slot structure expected by later lowering and serialization stages.
Add regression coverage for FuseEqualPlaceholdersPass + ToTosaMemoryFormatPass with duplicate outputs, and add TOSA IDENTITY dialect and visitor coverage.
cc @digantdesai @freddan80 @per @zingo @oscarandersson8218 @mansnils @Sebastian-Larsson @robell