Arm backend: Remove evaluation from aot_arm_compiler.py #18037
Sebastian-Larsson merged 1 commit into pytorch:main
Conversation
Model evaluation, i.e. computing and presenting a model's top-1/top-5 accuracy on ImageNet along with some other useful metrics, is to be moved into a new Python program. Removing the evaluation feature from aot_arm_compiler.py is part of that migration.

Signed-off-by: Martin Lindström <Martin.Lindstroem@arm.com>
Change-Id: Idff2a670cd1ebdbf4426adddeff0edf246b0bb46
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18037
Note: Links to docs will display an error until the docs builds have been completed. ❌ 1 Awaiting Approval, 3 New Failures as of commit 46ee7f3 with merge base fde943a. NEW FAILURES: the following jobs have failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorchbot label ciflow/trunk

To add these label(s) (ciflow/trunk) to the PR, please first approve the workflows that are awaiting approval (scroll to the bottom of this page). This helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows.

@pytorchbot label "partner: arm"
@pytorchbot label "release notes: arm" |
Pull request overview
Migrates Arm AOT compiler example script away from in-script model evaluation (accuracy/metrics), as part of moving evaluation into a separate program.
Changes:
- Removed evaluator lazy-import logic and all evaluation execution paths from aot_arm_compiler.py.
- Deprecated the --evaluate/--evaluate_config CLI flags (hidden from --help) and added a runtime log when they're supplied.
- Simplified quantization calibration to run using example_inputs only (removed dataset/DataLoader-based calibration paths).
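The overview above says the deprecated flags are hidden from --help but still accepted. A minimal sketch of how that can be done with argparse (the flag names come from the PR; everything else here is an assumption, not the actual aot_arm_compiler.py code):

```python
import argparse
import logging

parser = argparse.ArgumentParser(description="AOT Arm compiler (sketch)")
# Deprecated flags stay parseable for backwards compatibility but are
# hidden from --help by using argparse.SUPPRESS as the help text.
parser.add_argument("--evaluate", action="store_true", help=argparse.SUPPRESS)
parser.add_argument("--evaluate_config", default=None, help=argparse.SUPPRESS)

args = parser.parse_args(["--evaluate"])
if args.evaluate or args.evaluate_config:
    # Runtime log when a deprecated flag is supplied.
    logging.error(
        "Model evaluation is no longer supported in this script. "
        "Ignore and continue."
    )
print(args.evaluate)  # True: the flag still parses, it just does nothing
```

The same pattern works for any flag you want to retire without breaking existing invocations.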
```python
# Calibrate model using example inputs
# TODO: Add support for using a calibration dataset
m(*example_inputs)
```
Quantization calibration previously supported using a calibration dataset (e.g., evaluator-provided data or per-model calibration inputs). Calibrating with a single call on example_inputs can significantly degrade quantization quality for real models and is a behavior change beyond removing evaluation. Consider adding a dedicated --calibration_data / --calibration_dataset option (independent of evaluation) or keeping a minimal dataset-loading path so calibration can run over multiple batches when needed.
Suggested change: replace

```python
# Calibrate model using example inputs
# TODO: Add support for using a calibration dataset
m(*example_inputs)
```

with

```python
# Calibrate model using example inputs or a sequence of calibration batches.
# TODO: Add support for using a dedicated calibration dataset CLI option.
def _run_calibration(module, example_inputs_arg):
    """
    Run calibration over one or more batches.

    Supports the following shapes for example_inputs_arg:
    - Single batch: a Tensor or a tuple/list of Tensors (original behavior).
    - Multiple batches: a list/tuple whose elements are themselves lists/tuples,
      each one a batch, allowing calibration over several input batches.
    """
    if isinstance(example_inputs_arg, (list, tuple)) and example_inputs_arg:
        # A container of containers is interpreted as multiple batches;
        # a container of Tensors is a single (possibly multi-input) batch.
        if isinstance(example_inputs_arg[0], (list, tuple)):
            for batch in example_inputs_arg:
                module(*batch)
            return
        module(*example_inputs_arg)
        return
    # Fallback: a bare Tensor, treated as a single-input batch.
    module(example_inputs_arg)

_run_calibration(m, example_inputs)
```
```python
# Calibrate model using example inputs
# TODO: Add support for using a calibration dataset
prepared(*example_inputs)
```
Cortex-M quantization calibration now runs only once on example_inputs. If example_inputs is a single sample, this can produce poor activation ranges and lead to incorrect quantized numerics. Consider supporting an optional calibration dataset input (separate from the removed evaluation feature) so calibration can iterate over representative samples.
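The multi-batch calibration this comment asks for can be sketched framework-agnostically. In the sketch below, `calibrate` is a hypothetical helper and the callable stands in for the prepared module; a real implementation would call the prepared torch module on tensors drawn from a representative dataset:

```python
def calibrate(module, calibration_batches):
    """Run the prepared module over several representative batches so the
    observers see realistic activation ranges. `module` is any callable
    model; `calibration_batches` is an iterable of argument tuples."""
    n = 0
    for batch in calibration_batches:
        module(*batch)
        n += 1
    return n  # number of batches observed

# Toy stand-in for a prepared model: records each call it receives.
calls = []
fake_model = lambda *inputs: calls.append(inputs)
num = calibrate(fake_model, [(i,) for i in range(5)])
print(num)  # 5
```

Iterating like this keeps the single-sample `prepared(*example_inputs)` path as the trivial one-batch case.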
```python
logging.error(
    "Model evaluation is no longer supported in this script. Ignore and continue."
)
```
The deprecation message for --evaluate/--evaluate_config is not very actionable, and the script continues successfully, which can mislead users into thinking evaluation ran. Suggest either failing fast with a clear error that points to the replacement program/command, or downgrading to a warning that explicitly states how to run evaluation now.
Suggested change: replace

```python
logging.error(
    "Model evaluation is no longer supported in this script. Ignore and continue."
)
```

with

```python
raise SystemExit(
    "Model evaluation (--evaluate / --evaluate_config) is no longer supported in "
    "this script.\n"
    "This script is intended only for model export/compilation and does not run "
    "evaluation.\n"
    "Please run evaluation using the dedicated evaluation tooling or scripts for "
    "your project (see the project documentation for details)."
)
```
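The warning-based alternative mentioned in the comment could look roughly like this. `handle_deprecated_evaluate_flags` is a hypothetical helper name and the message text is a placeholder, not the project's actual wording:

```python
import logging

def handle_deprecated_evaluate_flags(evaluate, evaluate_config):
    """Warn loudly but continue, telling the user where evaluation lives now.
    Returns True if a deprecated flag was supplied and ignored."""
    if evaluate or evaluate_config:
        logging.warning(
            "--evaluate/--evaluate_config are deprecated and ignored; "
            "this script only exports/compiles the model. "
            "Run evaluation with the dedicated evaluation program instead."
        )
        return True
    return False

print(handle_deprecated_evaluate_flags(True, None))  # True
```

Compared with the fail-fast SystemExit variant, this keeps existing CI invocations that still pass the flags from breaking, at the cost of being easier to overlook.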
cc @digantdesai @freddan80 @per @zingo @oscarandersson8218 @mansnils @Sebastian-Larsson @robell