Conversation

@per
Collaborator

@per per commented Dec 9, 2024

Summary

Adds a folding pass to fold in q and dq nodes.

Test plan

Added test for the new pass
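
For context, the pass conceptually replaces the q/dq node pairs around a target operator with quantization parameters stored in the operator node's meta. Below is a minimal sketch of such a folding pass over a torch.fx graph; the `fold_qdq` name and the `input_qparams`/`output_qparams` meta keys follow the review snippets further down, but the pass structure itself is an illustrative assumption, not the implementation merged in this PR.

```python
# Illustrative sketch of a q/dq folding pass, not the exact pass in this PR.
# Assumes the PT2E quantized_decomposed ops are registered (the import below
# registers them in recent PyTorch versions).
import torch
import torch.ao.quantization.fx._decomposed  # noqa: F401
from torch.fx import GraphModule, Node

Q = torch.ops.quantized_decomposed.quantize_per_tensor.default
DQ = torch.ops.quantized_decomposed.dequantize_per_tensor.default


def fold_qdq(gm: GraphModule) -> GraphModule:
    for node in list(gm.graph.nodes):
        if node.op != "call_function" or node.target in (Q, DQ):
            continue
        # Fold dq producers: stash (scale, zp, qmin, qmax, dtype) per input
        # index in node.meta and rewire the op to consume the quantized tensor.
        input_qparams = {}
        for i, arg in enumerate(node.args):
            if isinstance(arg, Node) and arg.target == DQ:
                input_qparams[i] = arg.args[1:]
                node.replace_input_with(arg, arg.args[0])
        if input_qparams:
            node.meta["input_qparams"] = input_qparams
        # Fold q consumers: stash the output qparams and bypass the q node.
        for user in list(node.users):
            if user.op == "call_function" and user.target == Q:
                node.meta["output_qparams"] = {0: user.args[1:]}
                user.replace_all_uses_with(node)
                gm.graph.erase_node(user)
    gm.graph.eliminate_dead_code()  # drops the now-unused dq nodes
    gm.recompile()
    return gm
```

On the reviewer's chain example further down, a pass of this shape collapses q0->dq0->op1->q2->dq2->op2->q3->dq3 into q0->op1*->op2*->dq3, with the ops carrying their quantization parameters in meta.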

@pytorch-bot

pytorch-bot bot commented Dec 9, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/7240

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 2a03d6f with merge base 3f7eb3b:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed label Dec 9, 2024
@per per requested a review from digantdesai December 9, 2024 12:14
@per per added the partner: arm, ciflow/trunk, and topic: not user facing labels Dec 9, 2024
@per
Collaborator Author

per commented Dec 10, 2024

```python
)

output.shape = tosa_shape(output.shape, output.dim_order)
min_output = tosa_graph.addIntermediate(output.shape, ts.DType.INT32)
```
Contributor


Suggested change

```diff
-min_output = tosa_graph.addIntermediate(output.shape, ts.DType.INT32)
+max_output = tosa_graph.addIntermediate(output.shape, ts.DType.INT32)
```

Collaborator Author


Yepp!

Comment on lines 46 to 57
```python
x_scale = input_qparams[0].scale
x_zp = input_qparams[0].zp

y_scale = input_qparams[1].scale
y_zp = input_qparams[1].zp

assert (
    x_zp == y_zp
), "Different zp for inputs, MAX should be quantized with shared quantization!"
assert (
    x_scale == y_scale
), "Different scale for input, MAX should be quantized with shared quantization!"
```
Contributor


refactor this as a util to assert shared qconfigs across inputs?

Collaborator Author


Yes, will fix it up.
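
Sketched from the review suggestion, such a utility might look roughly like this (the `assert_shared_qparams` name and signature are hypothetical; `QuantArgs` is the parameter container used in the snippets below):

```python
def assert_shared_qparams(input_qparams: dict[int, "QuantArgs"]) -> None:
    # Hypothetical helper: verify all inputs share zero point and scale, as
    # required by ops (e.g. MAX) quantized with shared quantization.
    first = input_qparams[0]
    for qargs in input_qparams.values():
        assert qargs.zp == first.zp, (
            "Different zp for inputs, op should be quantized with shared quantization!"
        )
        assert qargs.scale == first.scale, (
            "Different scale for inputs, op should be quantized with shared quantization!"
        )
```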


```python
class SimpleQuantizeModel(torch.nn.Module):
    def forward(self, x):
        return x + x
```
Contributor


nit: maybe make it slightly more complicated, with >1 input tensors and >1 add nodes? Maybe something like max((x + x), (y + y))?

Also test a chain of nodes, i.e. q0->dq0->op1->q2->dq2->op2->q3->dq3 => q0->op1*->op2*->dq3

Collaborator Author


Ack.
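
Spelled out, the reviewer's suggested model could look something like this (the class name and the use of torch.maximum are illustrative):

```python
import torch


class TwoInputQuantizeModel(torch.nn.Module):
    # Two input tensors, two add nodes, and a maximum combining them,
    # per the review suggestion above.
    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        return torch.maximum(x + x, y + y)
```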

```python
dim_order = tensor.dim_order
tensor.shape = [tensor.shape[i] for i in dim_order]

qargs = list(cast(dict[int, QuantArgs], node.meta["input_qparams"]).values())
```
Contributor


Assert input_qparams in node.meta
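
i.e. something along these lines (the message text is illustrative):

```python
assert "input_qparams" in node.meta, "No input quantization parameters in node meta"
```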

"""
assert len(node.meta["output_qparams"]) == 1

qargs_out = cast(dict[int, QuantArgs], node.meta["output_qparams"])[0]
Contributor


Same here.

```python
return rescaled_nodes, min_scale


def insert_rescale_node_back_to_int8(
```
Contributor


Suggested change

```diff
-def insert_rescale_node_back_to_int8(
+def insert_rescale_node_to_int8(
```

per added 7 commits December 13, 2024 12:19
Reuse the logic from the node-visiting quantization handling, but
fetch the quantization parameters from the node meta values instead.

Signed-off-by: Per Åstrand <per.astrand@arm.com>
Change-Id: I9a7bbf6384284e60118756ec5661f6b11847aba7
Fold DQ/Q nodes into the target operators specified to the pass.

Signed-off-by: Per Åstrand <per.astrand@arm.com>
Change-Id: I8a09dc0b887dd5f3915ca157f578ecf51772a1a2
Uses the fold DQ/Q pass to encapsulate the quantization information within the node.

Signed-off-by: Per Åstrand <per.astrand@arm.com>
Change-Id: I3adbab7e2a23a0208a03bbc423b38c15221a4959
Signed-off-by: Per Åstrand <per.astrand@arm.com>
Change-Id: I9230209ed3d6cc0b5ec7a35512248648bb8380ee
Signed-off-by: Per Åstrand <per.astrand@arm.com>
Change-Id: I6154e13a5a6b75549862709d632ee6dd5c8b0e7f
Adds a helper function to retrieve QuantArgs from node.meta and clean up
the handling a bit by introducing the __eq__ operator for QuantArgs.

Signed-off-by: Per Åstrand <per.astrand@arm.com>
Change-Id: I519a9a286a36a278f40ffb6c679192a54d9f940d
Signed-off-by: Per Åstrand <per.astrand@arm.com>
Change-Id: I2d133f4347d9999c770e5337162c222368c212f2
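
Regarding the QuantArgs helper commit above, a rough sketch of what it describes, with field names taken from the review snippets (the actual merged definition may differ):

```python
from typing import cast

import torch
from torch.fx import Node


class QuantArgs:
    def __init__(self, scale: float, zp: int, qmin: int, qmax: int, dtype: torch.dtype):
        self.scale = scale
        self.zp = zp
        self.qmin = qmin
        self.qmax = qmax
        self.dtype = dtype

    def __eq__(self, other):
        # Equal when all quantization parameters match; this turns the
        # shared-quantization checks above into a plain == comparison.
        if not isinstance(other, QuantArgs):
            return NotImplemented
        return (self.scale, self.zp, self.qmin, self.qmax, self.dtype) == (
            other.scale, other.zp, other.qmin, other.qmax, other.dtype
        )


def get_input_qparams(node: Node) -> dict[int, QuantArgs]:
    # Helper to retrieve QuantArgs stashed in node.meta by the folding pass.
    assert "input_qparams" in node.meta, "No input quantization parameters in node meta"
    return cast(dict[int, QuantArgs], node.meta["input_qparams"])
```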
@per per force-pushed the quantization_folding branch from 4a46eec to 2a03d6f Compare December 13, 2024 12:06
@per
Collaborator Author

per commented Dec 13, 2024

The failing pull / unittest / macos / macos-job (pull_request) check seems to be unrelated (test_flamingo_vision_encoder).

@per per requested a review from digantdesai December 13, 2024 13:23
@per per merged commit 99d5b80 into pytorch:main Dec 16, 2024
105 of 106 checks passed
@per per deleted the quantization_folding branch December 16, 2024 08:45