Arm backend: Avoid running passes with no matching target ops #19839
🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19839

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

❌ 1 New Failure, 3 Unrelated Failures

As of commit d22f5c1 with merge base acce7cd:

NEW FAILURE - The following job has failed:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.
@claude Please find the email address that causes the unknown commit author error
Claude finished @YufengShi-dudu's task in 1m 37s

Finding the problematic commit email

The problematic email is

The EasyCLA bot successfully resolved:

To fix this, either:
@claude How can I remove the Per's
Claude finished @YufengShi-dudu's task in 24s

Removing the
Add ArmPass.should_run_pass() as a reusable early-exit hook before call() starts the normal ExportPass retracing path. The default hook returns true, preserving existing behavior for ArmPass subclasses.

Introduce ArmOpTargetedPass for passes that only transform a known set of operator targets. It implements should_run_pass() by scanning the current graph and nested GraphModules for matching target operators. If no matching target operator is found, the pass returns an unmodified PassResult.

For passes that already gate transformations with allowed_to_transform(), allow the target pre-scan to apply the same check before deciding whether the pass needs to run. This avoids running TFA passes when all matching target nodes are marked as disallowed.

The should_run_pass() hook and ArmOpTargetedPass pre-scan avoid rebuilding graphs for decomposition and rewrite passes that cannot affect the current graph. The speedup is most visible on large models.

Single-run paired benchmarks on Arm backend model tests across FP32, INT, VGF no-quant, and VGF quant variants:

| Model       | E2E avg | Pass-manager avg |
|-------------|--------:|-----------------:|
| T5-small    |  +30.5% |           +47.5% |
| DeepLabV3   |  +12.9% |           +49.8% |
| Wav2Letter  |  +16.9% |           +51.2% |
| InceptionV3 |  +22.2% |           +46.5% |
| MobileNetV2 |  +22.2% |           +52.5% |
| MobileNetV3 |  +29.9% |           +54.6% |

Model rows are unweighted averages over successful variants.

Unweighted average across 23 successful model/target variants:
E2E speedup: +22.4%
Pass-manager speedup: +50.5%

Change-Id: Iaa09638473a1d6d1e2ce98f5a0e3fc3a14378143
Signed-off-by: Yufeng Shi <yufeng.shi@arm.com>
Co-authored-by: Erik Lundell <erik.lundell@arm.com>
Force-pushed from 80f629c to 920dc9d
oscarandersson8218 left a comment

New file `test_arm_op_targeted_pass.py` should be covered by the glob in `targets.bzl`. Should be OK to merge.
Add ArmPass.should_run_pass() as a reusable early-exit hook before
call() starts the normal ExportPass retracing path. The default hook
returns true, preserving existing behavior for ArmPass subclasses.
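
As a rough illustration of the early-exit hook described above, here is a minimal sketch built on toy stand-ins rather than the real ExecuTorch classes. `SketchArmPass`, `SkipEmptyGraphs`, the `_retrace` placeholder and the list-of-op-names "graph" are hypothetical names invented for this example. Only `should_run_pass()`, `call()` and `PassResult` echo names from the description, and their real signatures may differ.

```python
from dataclasses import dataclass


@dataclass
class PassResult:
    """Toy stand-in for a pass result: the graph plus a 'modified' flag."""
    graph_module: object
    modified: bool


class SketchArmPass:
    """Hypothetical base class illustrating the early-exit hook."""

    def should_run_pass(self, graph_module) -> bool:
        # Default hook: always run, so existing subclasses behave as before.
        return True

    def call(self, graph_module) -> PassResult:
        if not self.should_run_pass(graph_module):
            # Early exit: return the same object, flagged as unmodified.
            return PassResult(graph_module, modified=False)
        return self._retrace(graph_module)

    def _retrace(self, graph_module) -> PassResult:
        # Placeholder for the expensive retracing path (assumption:
        # here it simply copies the toy graph).
        return PassResult(list(graph_module), modified=True)


class SkipEmptyGraphs(SketchArmPass):
    """Example subclass that opts out when there is nothing to do."""

    def should_run_pass(self, graph_module) -> bool:
        return len(graph_module) > 0


graph = ["aten.add"]
default_result = SketchArmPass().call(graph)

empty = []
skipped_result = SkipEmptyGraphs().call(empty)
```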

Introduce ArmOpTargetedPass for passes that only transform a known
set of operator targets. It implements should_run_pass() by scanning
the current graph and nested GraphModules for matching target
operators. If no matching target operator is found, the pass returns
an unmodified PassResult.
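
The pre-scan over the current graph and nested graph modules could look roughly like the sketch below. It is pure Python with toy classes standing in for `torch.fx` graphs: `Node`, `ToyGraphModule`, `has_target_op`, `SketchOpTargetedPass`, `DecomposeDivSketch` and the operator strings are all hypothetical, and storing a nested module directly in `node.target` is a simplification of how real graphs reference submodules.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    op: str         # "call_function" for operator calls, "get_attr" for submodule references
    target: object  # operator name, or (in this toy model) a nested ToyGraphModule


@dataclass
class ToyGraphModule:
    nodes: list = field(default_factory=list)


def has_target_op(gm: ToyGraphModule, target_ops: frozenset) -> bool:
    """Return True if gm, or any module nested inside it, calls a target op."""
    for node in gm.nodes:
        if isinstance(node.target, ToyGraphModule):
            # Recurse into nested graph modules.
            if has_target_op(node.target, target_ops):
                return True
        elif node.op == "call_function" and node.target in target_ops:
            return True
    return False


class SketchOpTargetedPass:
    """Hypothetical pass base that only runs when a target op is present."""
    target_ops: frozenset = frozenset()

    def should_run_pass(self, gm: ToyGraphModule) -> bool:
        return has_target_op(gm, self.target_ops)


class DecomposeDivSketch(SketchOpTargetedPass):
    target_ops = frozenset({"aten.div"})


inner = ToyGraphModule([Node("call_function", "aten.div")])
outer = ToyGraphModule([Node("call_function", "aten.add"), Node("get_attr", inner)])
flat = ToyGraphModule([Node("call_function", "aten.add")])
```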

For passes that already gate transformations with
allowed_to_transform(), allow the target pre-scan to apply the same
check before deciding whether the pass needs to run. This avoids
running TFA passes when all matching target nodes are marked as
disallowed.
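
One way to picture the pre-scan applying the same gate is the sketch below. It assumes, purely for illustration, that a node opts out through a `"disallowed"` flag in its metadata. That flag, the toy `Node` class, the `check_allowed` parameter and the free-function form of `allowed_to_transform()` and `should_run_pass()` are hypothetical and are not taken from the PR.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    target: str
    meta: dict = field(default_factory=dict)


def allowed_to_transform(node: Node) -> bool:
    # Hypothetical rule: a node opts out via a flag in its metadata.
    return not node.meta.get("disallowed", False)


def should_run_pass(nodes, target_ops, check_allowed=True) -> bool:
    """Return True only if some target node would actually be transformed."""
    for node in nodes:
        if node.target not in target_ops:
            continue
        if check_allowed and not allowed_to_transform(node):
            # A matching but disallowed node does not justify running the pass.
            continue
        return True
    return False


nodes = [Node("aten.div", {"disallowed": True}), Node("aten.add")]
gated = should_run_pass(nodes, {"aten.div"})
ungated = should_run_pass(nodes, {"aten.div"}, check_allowed=False)

# Adding one allowed match is enough to make the pass run again.
with_allowed_match = should_run_pass(nodes + [Node("aten.div")], {"aten.div"})
```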

The should_run_pass() hook and ArmOpTargetedPass pre-scan avoid
rebuilding graphs for decomposition and rewrite passes that cannot
affect the current graph. The speedup is most visible on large models.
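
Single-run paired benchmarks on Arm backend model tests
across FP32, INT, VGF no-quant, and VGF quant variants:

| Model       | E2E avg | Pass-manager avg |
|-------------|--------:|-----------------:|
| T5-small    |  +30.5% |           +47.5% |
| DeepLabV3   |  +12.9% |           +49.8% |
| Wav2Letter  |  +16.9% |           +51.2% |
| InceptionV3 |  +22.2% |           +46.5% |
| MobileNetV2 |  +22.2% |           +52.5% |
| MobileNetV3 |  +29.9% |           +54.6% |

Model rows are unweighted averages over successful variants.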

Unweighted average across 23 successful model/target variants:

- E2E speedup: +22.4%
- Pass-manager speedup: +50.5%
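
As a quick arithmetic cross-check, the snippet below averages the six per-model rows from the benchmark table in the commit message. The mean of the rows comes to about +22.43% for E2E and +50.35% for the pass manager. The second figure differs slightly from the reported +50.5%, which is what one would expect if the models do not all contribute the same number of successful variants (23 is not a multiple of 6), because a mean of per-model means weights variants differently from a mean over all variants. The variable names are illustrative only.

```python
from statistics import mean

# (E2E avg, pass-manager avg) in percent, copied from the benchmark table.
rows = {
    "T5-small": (30.5, 47.5),
    "DeepLabV3": (12.9, 49.8),
    "Wav2Letter": (16.9, 51.2),
    "InceptionV3": (22.2, 46.5),
    "MobileNetV2": (22.2, 52.5),
    "MobileNetV3": (29.9, 54.6),
}

# Mean of the per-model rows (not the per-variant mean quoted in the PR).
e2e_mean_of_rows = mean(e2e for e2e, _ in rows.values())
pm_mean_of_rows = mean(pm for _, pm in rows.values())

print(f"E2E mean of model rows: +{e2e_mean_of_rows:.2f}%")
print(f"Pass-manager mean of model rows: +{pm_mean_of_rows:.2f}%")
```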
Change-Id: Iaa09638473a1d6d1e2ce98f5a0e3fc3a14378143
cc @digantdesai @freddan80 @per @zingo @oscarandersson8218 @mansnils @Sebastian-Larsson @robell @rascani