[pt][quant] Optimized qadd_scalar #34925
Adds an optimized path for qadd scalar. For one model, total `quantized::add_scalar` time drops from 55.840ms to 4.637ms (about 12x), and total self CPU time drops from 125.394ms to 73.930ms.

### Before
```
-------------------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------
Name                       Self CPU total %  Self CPU total    CPU total %       CPU total         CPU time avg      Number of Calls
-------------------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------
quantize_per_tensor        0.12%             155.807us         0.12%             155.807us         155.807us         1
quantized::conv2d          25.50%            31.981ms          25.50%            31.981ms          273.343us         117
quantized::add_scalar      44.53%            55.840ms          44.53%            55.840ms          809.281us         69
quantized::relu6           1.25%             1.570ms           1.25%             1.570ms           22.749us          69
quantized::mul_scalar      10.73%            13.449ms          10.73%            13.449ms          194.914us         69
quantized::mul             16.67%            20.904ms          16.67%            20.904ms          227.220us         92
adaptive_avg_pool2d        0.03%             41.713us          0.69%             862.922us         35.955us          24
_adaptive_avg_pool2d       0.65%             821.209us         0.65%             821.209us         34.217us          24
sigmoid                    0.15%             182.344us         0.15%             182.344us         7.928us           23
quantized::add             0.34%             431.939us         0.34%             431.939us         26.996us          16
dropout                    0.00%             1.936us           0.00%             1.936us           1.936us           1
view                       0.01%             10.281us          0.01%             10.281us          10.281us          1
dequantize                 0.00%             4.562us           0.00%             4.562us           4.562us           1
-------------------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------
Self CPU time total: 125.394ms
```

### After
```
-------------------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------
Name                       Self CPU total %  Self CPU total    CPU total %       CPU total         CPU time avg      Number of Calls
-------------------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------
quantize_per_tensor        0.18%             130.534us         0.18%             130.534us         130.534us         1
quantized::conv2d          42.29%            31.267ms          42.29%            31.267ms          267.243us         117
quantized::add_scalar      6.27%             4.637ms           6.27%             4.637ms           67.205us          69
quantized::relu6           1.77%             1.312ms           1.77%             1.312ms           19.008us          69
quantized::mul_scalar      18.92%            13.991ms          18.92%            13.991ms          202.768us         69
quantized::mul             28.49%            21.059ms          28.49%            21.059ms          228.904us         92
adaptive_avg_pool2d        0.06%             45.242us          1.27%             942.522us         39.272us          24
_adaptive_avg_pool2d       1.21%             897.280us         1.21%             897.280us         37.387us          24
sigmoid                    0.22%             160.282us         0.22%             160.282us         6.969us           23
quantized::add             0.56%             416.276us         0.56%             416.276us         26.017us          16
dropout                    0.00%             1.245us           0.00%             1.245us           1.245us           1
view                       0.01%             7.122us           0.01%             7.122us           7.122us           1
dequantize                 0.01%             5.952us           0.01%             5.952us           5.952us           1
-------------------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------
Self CPU time total: 73.930ms
```

Differential Revision: [D20500848](https://our.internmc.facebook.com/intern/diff/D20500848/)
LGTM
  return Vec256<c10::qint32>::loadu(result_vals);
#endif
}
Can these be implemented as members on `Vec256<c10::qint32>` instead of free functions?
They are implemented the same way for float. I am not sure if there is a reason for them to be free functions.
https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/cpu/vec256/vec256_float.h#L244-L262
      "Only per tensor affine is supported for now!!");
  TORCH_CHECK(
      self.qscheme() == kPerTensorAffine,
      "Only per tensor affine is supported for now!!");
@dskhudia I think the code in the comment below is no longer relevant with your changes. Can you update that as well to reflect the new requant flow?
Thanks. Good catch. I forgot about it.
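For readers following this thread without the diff open, here is a minimal reference sketch of what adding a scalar to a per-tensor affine quantized tensor means: dequantize each element, add the scalar, and requantize with the output scale and zero point. It is not the optimized path from this PR and does not reproduce the requant flow discussed above. `QTensorU8`, `quantize_u8`, and `add_scalar_reference` are hypothetical names for illustration, not ATen APIs, and the uint8 range and caller-supplied output parameters are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for a per-tensor affine quantized tensor (not an ATen type).
struct QTensorU8 {
  std::vector<std::uint8_t> values;  // quantized integer values
  float scale;                       // real = scale * (q - zero_point)
  std::int32_t zero_point;
};

// Quantize one real value to uint8: round to nearest, then saturate to [0, 255].
inline std::uint8_t quantize_u8(float x, float scale, std::int32_t zero_point) {
  assert(scale > 0.0f);
  const long q = std::lrint(x / scale) + zero_point;
  return static_cast<std::uint8_t>(std::min<long>(std::max<long>(q, 0), 255));
}

// Reference (unoptimized) add-scalar: dequantize, add c, requantize.
// The output scale and zero point are supplied by the caller (an assumption
// of this sketch; a real kernel may derive them itself).
inline QTensorU8 add_scalar_reference(
    const QTensorU8& in,
    float c,
    float out_scale,
    std::int32_t out_zero_point) {
  QTensorU8 out{
      std::vector<std::uint8_t>(in.values.size()), out_scale, out_zero_point};
  for (std::size_t i = 0; i < in.values.size(); ++i) {
    const float real =
        in.scale * (static_cast<std::int32_t>(in.values[i]) - in.zero_point);
    out.values[i] = quantize_u8(real + c, out_scale, out_zero_point);
  }
  return out;
}
```

A scalar loop like this is useful mainly as a correctness baseline when checking a vectorized kernel; values that would overflow the uint8 range saturate rather than wrap.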
This pull request has been merged in 506996c.