
Conversation

Collaborator

@Jiseong-oh Jiseong-oh commented Sep 22, 2025

Summary

  • Implemented quantization strategies for the ENN backend.
  • Added support for applying ENN's quantization strategies when lowering models.
  • Verified multiple quantized models successfully.

Test plan

python -m executorch.examples.samsung.scripts.${MODEL_NAME} -c e9955 -p A8W8

cc @SS-JIA @digantdesai @kimishpatel


pytorch-bot bot commented Sep 22, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/14464

Note: Links to docs will display an error until the docs builds have been completed.

❌ 5 New Failures, 27 Pending, 3 Unrelated Failures

As of commit 3d58d14 with merge base d39992f:


This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed label Sep 22, 2025
@Jiseong-oh Jiseong-oh added the partner: samsung and release notes: exynos labels Sep 22, 2025
)


def get_enn_pass_list(edge_program: ExportedProgram) -> List[PassType]:
Contributor

Why not apply these passes in enn_preprocess.py instead? With the current pattern, users will have to remember to call get_enn_pass_list() and pass the result to to_edge_transform_and_lower(), which creates a point of failure if they forget to call it.

It seems to me that at the very least AnnotateQparamsPass should be moved to enn_preprocess.py, since it is required to preserve the quantization parameters. If it is not applied, then the FoldQDQPass() applied at the start of enn_preprocess.py will erase all quantization parameter information.
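For context, here is a rough sketch of the calling pattern being discussed, assuming to_edge_transform_and_lower comes from executorch.exir; the tiny model and the commented-out partitioner argument are placeholders, and only get_enn_pass_list is taken from the code under review (its import path is omitted here).

import torch
from executorch.exir import to_edge_transform_and_lower


class TinyModel(torch.nn.Module):  # stand-in for a real (quantized) model
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(8, 4)

    def forward(self, x):
        return self.linear(x)


example_inputs = (torch.randn(1, 8),)
exported = torch.export.export(TinyModel().eval(), example_inputs)

# The pattern being questioned: the caller must remember to fetch the ENN
# pass list and forward it explicitly; if they forget, the quantization
# annotations are lost once FoldQDQPass runs inside enn_preprocess.py.
enn_passes = get_enn_pass_list(exported)  # from the code under review
edge = to_edge_transform_and_lower(
    exported,
    transform_passes=enn_passes,
    # partitioner=[EnnPartitioner(...)],  # backend partitioner, assumed name
)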

Contributor

We want some passes to run before to_backend, and we define to_edge_transform_and_lower_enn to make this easy to use. AnnotateQparamsPass can be moved to enn_preprocess.py, so we have moved it now. Thanks for the recommendation.


Thanks for pointing this out. I'll address this item in the next update.

# Quantize constant tensors that still carry float data at serialization time
# (see the discussion below).
if need_quantize and data is not None:
    if isinstance(data, np.ndarray):
        data = torch.tensor(data)
    data = quantize_tensor(
Contributor

I'm curious in what situations this is needed. Typically, after the pt2e quantization process, weight tensors should already be quantized and have int8 data type.

Contributor

It seems constant tensors are not quantized after convert_pt2e; they just have quant/dequant nodes inserted after the constant tensor nodes. So we quantize all the constant tensors here.

Contributor

Would you happen to have an example of this behaviour? My understanding is that the activation tensor will have quant/dequant, but the weight tensor will have only a quant node. If you inspect the tensor data backing the weight tensor, it should have int8 type.

For instance, here are some logs I collected a while back while debugging MobileNet V2 quantized with the XNNPACK quantizer:

# dq input node:
torch.float32: torch.Size([1, 384, 14, 14]) = quantized_decomposed::dequantize_per_tensor(torch.int8: torch.Size([1, 384, 14, 14]), 0.06534181535243988, 16, -128, 127, torch.int8,  ...)
# dq weight node:
torch.float32: torch.Size([96, 384, 1, 1]) = quantized_decomposed::dequantize_per_channel(torch.int8: torch.Size([96, 384, 1, 1]), torch.float32: torch.Size([96]), torch.int64: torch.Size([96]), 0, -127, 127, torch.int8,  ...)
# weight tensor:
tensor: torch.Size([96, 384, 1, 1]), torch.int8, -127, 127
# scales tensor:
tensor: torch.Size([96]), torch.float32, 0.004114292096346617, 0.009169017896056175
# zeros tensor:
tensor: torch.Size([96]), torch.int64, 0, 0
# conv node:
torch.float32: torch.Size([1, 96, 14, 14]) = aten::conv2d(torch.float32: torch.Size([1, 384, 14, 14]), torch.float32: torch.Size([96, 384, 1, 1]), torch.float32: torch.Size([96]),  ...)
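For anyone reproducing this check, here is a small sketch of the inspection described above, assuming converted is the GraphModule returned by convert_pt2e (plain torch.fx traversal, nothing backend-specific):

import torch


def dump_constant_dtypes(converted: torch.fx.GraphModule) -> None:
    # Per the behaviour described above, conv/linear weights should already
    # appear as int8 constants after convert_pt2e, while activations only get
    # quant/dequant node pairs inserted around them.
    for node in converted.graph.nodes:
        if node.op != "get_attr":
            continue
        obj = converted
        for atom in node.target.split("."):  # get_attr targets may be dotted
            obj = getattr(obj, atom)
        if isinstance(obj, torch.Tensor):
            print(f"{node.target}: dtype={obj.dtype}, shape={tuple(obj.shape)}")

# For a quantized MV2 conv weight this prints something like
# "<weight attr>: dtype=torch.int8, shape=(96, 384, 1, 1)", matching the logs above.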

@Sangsooko Sangsooko Oct 2, 2025

As you can see in the following code, which is part of MobileBERT, the div has a quantized input and a constant input.
The code above changes the constant 5.656854249492381 into a quantized value.
Because this change is only needed for the internal operation on real hardware, the code can be moved into our backend code.

    %div : [num_users=1] = call_function[target=torch.ops.aten.div.Tensor](args = (%dequantize_per_tensor_default_424, 5.656854249492381), kwargs = {})
    %quantize_per_tensor_default_425 :   [num_users=1] = call_function[target=torch.ops.quantized_decomposed.quantize_per_tensor.default](args = (%div, 0.15715381503105164, -78, -128, 127, torch.int8), kwargs = {})
    %dequantize_per_tensor_default_425 : [num_users=1] = call_function[target=torch.ops.quantized_decomposed.dequantize_per_tensor.default](args = (%quantize_per_tensor_default_425, 0.15715381503105164, -78, -128, 127, torch.int8), kwargs = {out_dtype: torch.float16})

If you think the additional quantization process after convert_pt2e is not appropriate according to the ExecuTorch code policy, we will move this code to our backend in the next update.
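To make the step concrete, here is a minimal sketch of what quantizing such a scalar constant amounts to; quantize_tensor in this PR is the real implementation, and the scale/zero-point below are simply the ones visible in the snippet, reused for illustration (which parameters the backend actually assigns to the constant is decided by the annotation pass):

import torch


def quantize_constant(value, scale, zero_point, qmin=-128, qmax=127, dtype=torch.int8):
    # Affine-quantize a float constant, e.g. the 5.656854... divisor above.
    data = torch.as_tensor(value, dtype=torch.float32)
    q = torch.clamp(torch.round(data / scale) + zero_point, qmin, qmax)
    return q.to(dtype)


print(quantize_constant(5.656854249492381, 0.15715381503105164, -78))
# tensor(-42, dtype=torch.int8)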

Contributor

Understood. I figured it might have been for binary ops that operate with constant args. In that case LGTM!

@SS-JIA
Contributor

SS-JIA commented Sep 23, 2025

Also, please run lintrunner to fix the lint errors.

# install lintrunner dependencies
pip install lintrunner
pip install lintrunner_adapters

cd executorch
lintrunner -a --verbose

@SS-JIA
Contributor

SS-JIA commented Sep 30, 2025

Overall LGTM, but before I stamp:

  1. Could you guys rebase to latest main and re-submit the PR? I want to see if the failures get resolved.
  2. There's a remaining comment regarding manually quantizing constant tensors when serializing the model. I don't think this should be required, since the quantization workflow should already be quantizing the constant tensors for you.

@Jiseong-oh Jiseong-oh force-pushed the exynos-quantize-support branch 2 times, most recently from 1eee0b5 to 1d1f37c on October 2, 2025 09:16
Contributor

@SS-JIA SS-JIA left a comment

LGTM! Thanks for addressing the comments.

Would you mind rebasing one more time before merging? Just to ensure that the merge base is fairly up to date. Thanks!

Contributor

@SS-JIA SS-JIA left a comment

Sorry, one more thing. In addition to rebasing, can you include the fix for getting the AI_LITECORE_API_KEY that's implemented in #14866?

The fix was to move up the point where secrets-env is declared.

@SS-JIA
Contributor

SS-JIA commented Oct 7, 2025

I decided to just merge #14866, so please just rebase past it and this PR LGTM!

@Jiseong-oh Jiseong-oh force-pushed the exynos-quantize-support branch 2 times, most recently from 44353d3 to b8215f4 on October 9, 2025 06:50
Jiseong-oh and others added 10 commits October 10, 2025 08:43
1. Add quantization strategies for the enn-backend
2. Add support for ENN's quantization strategies
3. Provide example code for MV2

Co-authored-by:  chen.zhao <chen03.zhao@samsung.com>
Co-authored-by:  sangsoo.Ko <sangsoo.ko@samsung.com>
Models contain: dlv3/edsr/iv3/iv4/mv3/resnet50/vit/w2l

Co-authored-by:  chen.zhao <chen03.zhao@samsung.com>
Co-authored-by:  sangsoo.Ko <sangsoo.ko@samsung.com>
The current models are supported in each script; execute these scripts to verify the validity of the quantization.

Co-authored-by: chong-chen <chong.chen@samsung.com>
- This model needs an updated version of LiteCore.
- This SDK can support the mv3 quant model.
Fix comments

Co-authored-by: chen03.zhao <chen03.zhao@samsung.com>
As the title shows

Co-authored-by: chong-chen <chong.chen@samsung.com>
- This model needs an updated version of LiteCore.

Signed-off-by: jiseong.oh <jiseong.oh@samsung.com>
- This SDK can support the mv3 quant model.

Signed-off-by: jiseong.oh <jiseong.oh@samsung.com>
For ic4, the image shape should be (299, 299); it doesn't need CenterCrop.

Co-authored-by: xz-linghu <xz.linghu@samsung.com>
@Jiseong-oh Jiseong-oh force-pushed the exynos-quantize-support branch from d21a2ae to 44a5e9e on October 9, 2025 23:43
@SS-JIA
Contributor

SS-JIA commented Oct 10, 2025

Validated that the Samsung test is passing via #14977. I believe this PR cannot access the repository API secret because the merge branch is from a fork.

@SS-JIA SS-JIA merged commit 8b67236 into pytorch:main Oct 10, 2025
260 of 269 checks passed
@Jiseong-oh Jiseong-oh deleted the exynos-quantize-support branch October 10, 2025 04:26