
memory efficient per-channel fq: use it everywhere, delete old version #51265

Closed · wants to merge 2 commits

Commits on Jan 28, 2021

  1. memory efficient per-channel fq: use it everywhere, delete old version

    Summary:
    
    This PR is the cleanup after #51159. At a high level, it makes the new
    definition of per-channel fake_quant the one used by autograd, while keeping
    the old function around as a thin wrapper so the user-facing API stays the same.
    
    In detail:
    
    1. Point fake_quantize_per_channel_affine's implementation at fake_quantize_per_channel_affine_cachemask (a sketch of this wrapper pattern follows the list).
    2. Delete the fake_quantize_per_channel_affine backward; autograd will automatically use the cachemask backward instead.
    3. Delete all the fake_quantize_per_channel_affine kernels, since nothing uses them anymore.
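
    A minimal Python sketch of the thin-wrapper pattern from step 1. This is an
    illustration rather than the actual ATen implementation; it assumes the
    cachemask op is exposed as torch.fake_quantize_per_channel_affine_cachemask
    and returns an (output, mask) pair, with the mask discarded on the forward path.

    ```
    import torch

    # Hypothetical thin wrapper: forward to the cachemask variant and drop the
    # mask, so autograd only ever differentiates through the cachemask op.
    def fake_quantize_per_channel_affine(x, scale, zero_point, axis,
                                         quant_min, quant_max):
        out, _mask = torch.fake_quantize_per_channel_affine_cachemask(
            x, scale, zero_point, axis, quant_min, quant_max)
        return out
    ```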
    
    Test Plan:
    
    ```
    python test/test_quantization.py TestFakeQuantize
    ```
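
    As a quick sanity check outside the test suite, here is a minimal usage
    sketch (the shapes and qparams are made-up values) exercising the public op
    plus a backward pass, which now routes through the cachemask kernel:

    ```
    import torch

    x = torch.randn(2, 3, requires_grad=True)
    scale = torch.ones(3)
    zero_point = torch.zeros(3, dtype=torch.int32)  # integer per-channel zero points
    y = torch.fake_quantize_per_channel_affine(
        x, scale, zero_point, axis=1, quant_min=0, quant_max=255)
    y.sum().backward()  # x.grad is masked to elements inside [quant_min, quant_max]
    ```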
    
    Reviewers:
    
    Subscribers:
    
    Tasks:
    
    Tags:
    
    [ghstack-poisoned]
    vkuzo committed Jan 28, 2021 (commit 3b53e9b)
  2. Update on "memory efficient per-channel fq: use it everywhere, delete old version"
    
    (Commit message body identical to commit 1 above.)
    vkuzo committed Jan 28, 2021 (commit 2830016)