Precompute T1 offset for quantized conv2d NHWC in TIE kernel (#18960)
meta-codesync[bot] merged 1 commit into pytorch:main
Conversation
@abeakkas has exported this pull request. If you are a Meta employee, you can view the originating Diff in D100690813.
Summary: Move the zero-point correction term `t1[oc] = -input_zero_point * sum(weight[oc])` from runtime (a malloc, `compute_t1_..._DWH` call, and free on every inference) to compile time via a new `PrecomputeForQuantizedConvPass`, mirroring the existing linear pass. The precomputed offset is threaded through a new optional `offset` parameter on `cadence::quantized_conv2d_nhwc.per_tensor` (defaulting to `None` for backwards compatibility). The now-dead `compute_t1_..._DWH` functions are removed. The TIE kernels assume the offset parameter is present, as in the `quantized_linear` case. Differential Revision: D100690813
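A minimal sketch of the precomputation being moved to compile time (illustrative only; `precompute_t1` is a hypothetical helper, not the actual ExecuTorch pass). For a quantized conv, the accumulator `sum_k (q_in[k] - zp_in) * q_w[oc, k]` expands to `sum_k q_in[k] * q_w[oc, k] - zp_in * sum_k q_w[oc, k]`, so the second term `t1[oc] = -input_zero_point * sum(weight[oc])` depends only on constant data and can be folded ahead of time:

```python
import numpy as np

def precompute_t1(weight: np.ndarray, input_zero_point: int) -> np.ndarray:
    """Per-output-channel zero-point correction.

    Assumes an NHWC-style kernel layout of shape
    (out_channels, kernel_h, kernel_w, in_channels).
    """
    oc = weight.shape[0]
    # Sum every weight element contributing to a given output channel,
    # widening to int32 so int8 weights cannot overflow.
    per_channel_sum = weight.reshape(oc, -1).sum(axis=1, dtype=np.int32)
    return (-input_zero_point * per_channel_sum).astype(np.int32)

# Example: 2 output channels, 1x1 kernel, 3 input channels.
w = np.array([[[[1, 2, 3]]], [[[4, 5, 6]]]], dtype=np.int8)
t1 = precompute_t1(w, input_zero_point=2)
# t1[oc] = -2 * sum(weight[oc]) -> [-12, -30]
```

At runtime the kernel then only adds the precomputed `t1[oc]` into each output channel's accumulator instead of recomputing the weight sums per inference.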