[qconv/qlinear] Enabling intra-op parallelism #26692

dskhudia · 2019-09-24T00:06:31Z

Stack from ghstack:

[qconv/qlinear] Enabling intra-op parallelism #26692

Adding intra-op parallelism for qconv and qlinear.

export OMP_NUM_THREADS=4
python test/test_quantized.py TestQuantizedConv.test_qconv
python test/test_quantized.py TestQuantizedLinear.test_qlinear
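
The same test plan can be driven from Python. This is a minimal sketch, assuming it is run from the root of a PyTorch checkout. The helper names `threaded_env` and `build_commands` are hypothetical and not part of PyTorch. It sets `OMP_NUM_THREADS` only for the child processes and skips a command if the test file is not present.

```python
import os
import subprocess
import sys

# Test targets taken from the PR description.
TESTS = [
    "TestQuantizedConv.test_qconv",
    "TestQuantizedLinear.test_qlinear",
]


def threaded_env(num_threads):
    """Return a copy of the current environment with OMP_NUM_THREADS set."""
    env = dict(os.environ)
    env["OMP_NUM_THREADS"] = str(num_threads)
    return env


def build_commands(test_file="test/test_quantized.py"):
    """Build one command line per test target (hypothetical helper)."""
    return [[sys.executable, test_file, name] for name in TESTS]


if __name__ == "__main__":
    env = threaded_env(4)
    for cmd in build_commands():
        if not os.path.exists(cmd[1]):
            # Assumption: outside a PyTorch checkout there is nothing to run.
            print("skipping (test file not found):", " ".join(cmd))
            continue
        subprocess.run(cmd, env=env, check=True)
```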

~~TODO: Performance numbers.~~
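qconv performance (I had to set the thread count manually with at::set_num_threads(4)). The op indices that show the same performance are groupwise convolutions; parallelism for groupwise convolutions is not yet supported.

![Resnext101-32x4d shapes](https://user-images.githubusercontent.com/37562707/65555703-2ff3e900-dee2-11e9-8e27-d61d05ebf9b9.png)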
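
For readers unfamiliar with the term, intra-op parallelism means that a single operator call divides its own work among several threads. The sketch below illustrates the idea with a plain-Python matrix product whose output rows are split into contiguous chunks. It is an illustration under stated assumptions, not the partitioning used by this PR or by FBGEMM; `split_range`, `matmul_rows`, and `parallel_matmul` are hypothetical names. Python threads will not show a real speedup here because of the interpreter lock, so the sketch demonstrates only the partitioning.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(begin, end, num_chunks):
    """Split [begin, end) into at most num_chunks contiguous chunks."""
    total = end - begin
    if total <= 0 or num_chunks <= 0:
        return []
    chunk = -(-total // num_chunks)  # ceiling division
    return [(s, min(s + chunk, end)) for s in range(begin, end, chunk)]


def matmul_rows(a, b, row_begin, row_end):
    """Compute rows [row_begin, row_end) of the product a @ b."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols]
            for row in a[row_begin:row_end]]


def parallel_matmul(a, b, num_threads):
    """One operator call whose output rows are divided among worker threads."""
    chunks = split_range(0, len(a), num_threads)
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # pool.map preserves chunk order, so rows come back in order.
        parts = list(pool.map(lambda c: matmul_rows(a, b, *c), chunks))
    return [row for part in parts for row in part]


if __name__ == "__main__":
    a = [[1, 2], [3, 4], [5, 6]]
    b = [[1, 0], [0, 1]]
    # The threaded result must match the single-threaded one.
    assert parallel_matmul(a, b, 2) == matmul_rows(a, b, 0, len(a))
```

An operator that has no such split for a given case, as the description says of groupwise convolutions, runs that case on one thread and therefore shows the same timing regardless of the thread count.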

Differential Revision: D17540567


jamesr66a · 2019-09-24T00:10:54Z

Yeah seems reasonable, but definitely need perf numbers before approval


jianyuh

LGTM!


facebook-github-bot · 2019-10-02T20:43:48Z

This pull request has been merged in 3eefc54.

Summary:
Pull Request resolved: pytorch/pytorch#26692

Adding intra-op parallelism for qconv and qlinear.

export OMP_NUM_THREADS=4
python test/test_quantized.py TestQuantizedConv.test_qconv
python test/test_quantized.py TestQuantizedLinear.test_qlinear

TODO: Performance numbers.

ghstack-source-id: 91135613

Test Plan:
export OMP_NUM_THREADS=4
python test/test_quantized.py TestQuantizedConv.test_qconv
python test/test_quantized.py TestQuantizedLinear.test_qlinear

Differential Revision: D17540567

fbshipit-source-id: e9962bdf0c25fd3ac4bd0673eee1edd697924406



…28477)

Summary:
Pull Request resolved: #28477

Similar to #26692, we would like to enable the intra-op parallelism for dynamic Linear op.

ghstack-source-id: 92419573

Test Plan:
CI

Test Benchmark:
```
import time
import torch

K, N = 1024, 1024
print('M', 'nthread=1', 'nthread=2', 'nthread=4', 'nthread=8', 'nthread=16', sep=', ')

for M in range(512, 2049, 512):
    print(M, sep=',', end=', ')
    for num_threads in (1, 2, 4, 8, 16,):

        torch.set_num_threads(num_threads)

        x = torch.rand(M, K)
        w = torch.rand(K, N)

        NITER = 20

        # Test dynamic quantized
        q_w = torch.quantize_per_tensor(w, 0.01, 0, dtype=torch.qint8)
        packed_w = torch.ops.quantized.linear_prepack(q_w, None)
        s = time.time()
        for i in range(NITER):
            torch.ops.quantized.linear_dynamic(x, packed_w)
        elapsed_per_iter_dyn_quant = (time.time() - s) / NITER

        print("{:0.2f}".format(2.0*M*N*K/elapsed_per_iter_dyn_quant/1E9), end=', ')
    print("\n", end='')
```

Before this Diff:
```
(base) [root@rtptest10054.frc2 ~/jhuang_test/dynamic_quant]# python benchmark_quantize_dynamic.py
M, nthread=1, nthread=2, nthread=4, nthread=8, nthread=16
512, 119.28, 139.50, 141.66, 141.58, 141.42,
1024, 122.42, 141.21, 123.09, 141.85, 123.03,
1536, 122.80, 122.18, 141.39, 123.25, 141.35,
2048, 123.41, 141.34, 123.62, 140.55, 123.76,
```

After this Diff:
```
(base) [root@rtptest10054.frc2 ~/jhuang_test/dynamic_quant]# python benchmark_quantize_dynamic.py
M, nthread=1, nthread=2, nthread=4, nthread=8, nthread=16
512, 123.29, 271.99, 508.66, 882.83, 1295.07,
1024, 126.05, 273.15, 515.42, 914.11, 877.63,
1536, 142.48, 236.85, 524.10, 481.32, 970.81,
2048, 124.76, 279.03, 433.73, 958.67, 1045.82,
```

Differential Revision: D18074757

fbshipit-source-id: ad5b43477d2187c818c137093c6d6af02d5ca1d5
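
The benchmark prints `2.0*M*N*K/elapsed/1E9`, that is, billions of operations per second for each `M` and thread count. To read the two result tables side by side, the following sketch computes the after/before ratio for every cell and inverts the formula to recover seconds per iteration. The numbers are copied from the tables in this commit message; the helper names `speedups` and `seconds_per_iter` are hypothetical, and no timing is rerun here.

```python
# Values copied from the "Before this Diff" and "After this Diff" tables.
THREADS = (1, 2, 4, 8, 16)
BEFORE = {
    512: (119.28, 139.50, 141.66, 141.58, 141.42),
    1024: (122.42, 141.21, 123.09, 141.85, 123.03),
    1536: (122.80, 122.18, 141.39, 123.25, 141.35),
    2048: (123.41, 141.34, 123.62, 140.55, 123.76),
}
AFTER = {
    512: (123.29, 271.99, 508.66, 882.83, 1295.07),
    1024: (126.05, 273.15, 515.42, 914.11, 877.63),
    1536: (142.48, 236.85, 524.10, 481.32, 970.81),
    2048: (124.76, 279.03, 433.73, 958.67, 1045.82),
}
K, N = 1024, 1024  # same K and N as the benchmark script


def speedups(before, after):
    """Return after/before throughput ratios, keyed by M."""
    return {m: tuple(a / b for a, b in zip(after[m], before[m]))
            for m in before}


def seconds_per_iter(m, gops):
    """Invert the benchmark formula: gops = 2*M*N*K / elapsed / 1e9."""
    return 2.0 * m * N * K / (gops * 1e9)


if __name__ == "__main__":
    print("M", *("nthread={}".format(t) for t in THREADS), sep=", ")
    for m, row in speedups(BEFORE, AFTER).items():
        print(m, *("{:.2f}x".format(s) for s in row), sep=", ")
```

With one thread the ratios stay close to 1, which is expected since the change only affects how work is divided among threads. The scaling at 8 and 16 threads is uneven across `M` in the reported numbers, so a single run like this is best read as indicative rather than precise.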



pytorchbot added the module: operators and oncall: quantization labels Sep 24, 2019

dskhudia requested review from jianyuh, dzhulgakov and raghuramank100 and removed request for jianyuh September 24, 2019 00:07

dskhudia requested a review from ilia-cher September 24, 2019 00:25

dskhudia added this to the 1.3 milestone Sep 24, 2019

facebook-github-bot closed this in pytorch/FBGEMM@f8da6e6 Sep 25, 2019

facebook-github-bot added the merged label Sep 25, 2019

dskhudia reopened this Sep 25, 2019

dskhudia removed the merged label Sep 25, 2019

dskhudia mentioned this pull request Sep 28, 2019

[operator_benchmarks] Set number of threads for operator_benchmarks #27010

Closed

jianyuh approved these changes Sep 30, 2019


ilia-cher approved these changes Sep 30, 2019


facebook-github-bot closed this in 3eefc54 Oct 2, 2019

facebook-github-bot added the merged label Oct 2, 2019

jianyuh mentioned this pull request Oct 22, 2019

Enabling intra-op parallelism for dynamic quantized Linear operator #28477

Closed

facebook-github-bot deleted the gh/dskhudia/11/head branch October 28, 2019 22:08

mruberry added the Merged label Oct 28, 2020
