Arm backend: Remove evaluation from aot_arm_compiler.py #18037
Sebastian-Larsson merged 1 commit into pytorch:main
Conversation
Model evaluation, i.e. computing and presenting a model's top-1/top-5 accuracy on ImageNet along with some other useful metrics, is to be moved into a new Python program. Removing the evaluation feature from aot_arm_compiler.py is part of that migration.

Signed-off-by: Martin Lindström <Martin.Lindstroem@arm.com>
Change-Id: Idff2a670cd1ebdbf4426adddeff0edf246b0bb46
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18037
Note: Links to docs will display an error until the docs builds have been completed. ❌ 1 Awaiting Approval, 3 New Failures as of commit 46ee7f3 with merge base fde943a. NEW FAILURES: the following jobs have failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorchbot label ciflow/trunk

To add these label(s) (ciflow/trunk) to the PR, please first approve the workflows that are awaiting approval (scroll to the bottom of this page). This helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows.

@pytorchbot label "partner: arm"
@pytorchbot label "release notes: arm" |
Pull request overview
Migrates Arm AOT compiler example script away from in-script model evaluation (accuracy/metrics), as part of moving evaluation into a separate program.
Changes:
- Removed evaluator lazy-import logic and all evaluation execution paths from aot_arm_compiler.py.
- Deprecated the --evaluate/--evaluate_config CLI flags (hidden from --help) and added a runtime log when they're supplied.
- Simplified quantization calibration to run using example_inputs only (removed dataset/DataLoader-based calibration paths).
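The overview above says the deprecated flags are hidden from --help but still accepted. A minimal sketch of how that can be done with argparse (the flag names come from the PR; everything else here is an assumption, not the actual aot_arm_compiler.py code):

```python
import argparse
import logging

parser = argparse.ArgumentParser(description="AOT Arm compiler (sketch)")
# Deprecated flags stay parseable for backwards compatibility but are
# hidden from --help by using argparse.SUPPRESS as the help text.
parser.add_argument("--evaluate", action="store_true", help=argparse.SUPPRESS)
parser.add_argument("--evaluate_config", default=None, help=argparse.SUPPRESS)

args = parser.parse_args(["--evaluate"])
if args.evaluate or args.evaluate_config:
    # Runtime log when a deprecated flag is supplied.
    logging.error(
        "Model evaluation is no longer supported in this script. "
        "Ignore and continue."
    )
print(args.evaluate)  # True: the flag still parses, it just does nothing
```

The same pattern works for any flag you want to retire without breaking existing invocations.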
```python
# Calibrate model using example inputs
# TODO: Add support for using a calibration dataset
m(*example_inputs)
```
Quantization calibration previously supported using a calibration dataset (e.g., evaluator-provided data or per-model calibration inputs). Calibrating with a single call on example_inputs can significantly degrade quantization quality for real models and is a behavior change beyond removing evaluation. Consider adding a dedicated --calibration_data / --calibration_dataset option (independent of evaluation) or keeping a minimal dataset-loading path so calibration can run over multiple batches when needed.
Suggested change: replace

```python
# Calibrate model using example inputs
# TODO: Add support for using a calibration dataset
m(*example_inputs)
```

with

```python
# Calibrate model using example inputs or a sequence of calibration batches.
# TODO: Add support for using a dedicated calibration dataset CLI option.
def _run_calibration(module, example_inputs_arg):
    """
    Run calibration over one or more batches.

    Supports the following shapes for example_inputs_arg:
    - Single batch: a Tensor or a tuple/list of Tensors (original behavior).
    - Multiple batches: a list/tuple whose elements are themselves lists/tuples,
      each one a batch, allowing calibration over several input batches.
    """
    if isinstance(example_inputs_arg, (list, tuple)) and example_inputs_arg:
        # A container of containers is interpreted as multiple batches;
        # a container of Tensors is a single (possibly multi-input) batch.
        if isinstance(example_inputs_arg[0], (list, tuple)):
            for batch in example_inputs_arg:
                module(*batch)
            return
        module(*example_inputs_arg)
        return
    # Fallback: a bare Tensor, treated as a single-input batch.
    module(example_inputs_arg)

_run_calibration(m, example_inputs)
```
```python
# Calibrate model using example inputs
# TODO: Add support for using a calibration dataset
prepared(*example_inputs)
```
Cortex-M quantization calibration now runs only once on example_inputs. If example_inputs is a single sample, this can produce poor activation ranges and lead to incorrect quantized numerics. Consider supporting an optional calibration dataset input (separate from the removed evaluation feature) so calibration can iterate over representative samples.
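The multi-batch calibration this comment asks for can be sketched framework-agnostically. In the sketch below, `calibrate` is a hypothetical helper and the callable stands in for the prepared module; a real implementation would call the prepared torch module on tensors drawn from a representative dataset:

```python
def calibrate(module, calibration_batches):
    """Run the prepared module over several representative batches so the
    observers see realistic activation ranges. `module` is any callable
    model; `calibration_batches` is an iterable of argument tuples."""
    n = 0
    for batch in calibration_batches:
        module(*batch)
        n += 1
    return n  # number of batches observed

# Toy stand-in for a prepared model: records each call it receives.
calls = []
fake_model = lambda *inputs: calls.append(inputs)
num = calibrate(fake_model, [(i,) for i in range(5)])
print(num)  # 5
```

Iterating like this keeps the single-sample `prepared(*example_inputs)` path as the trivial one-batch case.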
```python
logging.error(
    "Model evaluation is no longer supported in this script. Ignore and continue."
)
```
The deprecation message for --evaluate/--evaluate_config is not very actionable, and the script continues successfully, which can mislead users into thinking evaluation ran. Suggest either failing fast with a clear error that points to the replacement program/command, or downgrading to a warning that explicitly states how to run evaluation now.
Suggested change: replace

```python
logging.error(
    "Model evaluation is no longer supported in this script. Ignore and continue."
)
```

with

```python
raise SystemExit(
    "Model evaluation (--evaluate / --evaluate_config) is no longer supported in "
    "this script.\n"
    "This script is intended only for model export/compilation and does not run "
    "evaluation.\n"
    "Please run evaluation using the dedicated evaluation tooling or scripts for "
    "your project (see the project documentation for details)."
)
```
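The warning-based alternative mentioned in the comment could look roughly like this. `handle_deprecated_evaluate_flags` is a hypothetical helper name and the message text is a placeholder, not the project's actual wording:

```python
import logging

def handle_deprecated_evaluate_flags(evaluate, evaluate_config):
    """Warn loudly but continue, telling the user where evaluation lives now.
    Returns True if a deprecated flag was supplied and ignored."""
    if evaluate or evaluate_config:
        logging.warning(
            "--evaluate/--evaluate_config are deprecated and ignored; "
            "this script only exports/compiles the model. "
            "Run evaluation with the dedicated evaluation program instead."
        )
        return True
    return False

print(handle_deprecated_evaluate_flags(True, None))  # True
```

Compared with the fail-fast SystemExit variant, this keeps existing CI invocations that still pass the flags from breaking, at the cost of being easier to overlook.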
cc @digantdesai @freddan80 @per @zingo @oscarandersson8218 @mansnils @Sebastian-Larsson @robell