quantized tensor: add support for advanced indexing #49129
Conversation
Summary: Implements support for the indexing of quantized tensors with lists of dims, such as

```
xq_slice = xq[:, [0], :, :]
```

At least a few things need to happen before this is ready for review:

1. Verify whether the general idea of `TensorAdditionalOptions` is the right abstraction
2. Verify whether `TensorAdditionalOptions` needs the same style as `TensorOptions` (no optionals, etc.)

Test Plan:

```
python test/test_quantization.py TestQuantizedOps.test_advanced_indexing
```

Reviewers:
Subscribers:
Tasks:
Tags:

[ghstack-poisoned]
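For context, a minimal sketch of what this advanced-indexing syntax already does on a regular float tensor; the PR's goal is to make the same syntax work when the tensor is quantized (tensor shapes here are illustrative, not from the PR):

```python
import torch

# Advanced indexing with a list of indices on a regular float tensor.
x = torch.arange(24.).reshape(1, 3, 2, 4)
# A list index ([0]) selects along dim 1 but keeps that dim with size 1,
# unlike a plain integer index, which would drop it.
x_slice = x[:, [0], :, :]
print(x_slice.shape)  # torch.Size([1, 1, 2, 4])
```

The same expression on a quantized `xq` raised an error before this work, because the underlying `index` kernel and `TensorIterator` output allocation did not handle quantized dtypes.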
💊 CI failures summary and remediations (Dr. CI, as of commit a75af59):

🕵️ 3 new failures recognized by patterns. The following CI failures do not appear to be due to upstream breakages: pytorch_linux_xenial_py3_6_gcc5_4_build (1/3), step "Build".
```
if (options.dtype() == at::kQInt8 || options.dtype() == at::kQUInt8 ||
    options.dtype() == at::kQInt32) {
  // get the first input and copy its quantization parameters
  const auto& first_input = operands_[num_outputs_];
```
So what if the quantization parameters on the two tensors are different? That seems like a totally reasonable situation, and then copying the quantization parameters from the first input seems like a totally arbitrary choice.
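To illustrate the reviewer's point, a small hedged example (values chosen arbitrarily): two quantized tensors can hold the same real values under different quantization parameters, so their integer representations differ, and copying the first input's `(scale, zero_point)` to the output is indeed an arbitrary choice:

```python
import torch

# Same real value 1.0, different quantization parameters.
a = torch.quantize_per_tensor(torch.tensor([1.0]), scale=0.1,  zero_point=0, dtype=torch.quint8)
b = torch.quantize_per_tensor(torch.tensor([1.0]), scale=0.05, zero_point=0, dtype=torch.quint8)
# Different integer representations for the same real value:
print(a.int_repr().item(), b.int_repr().item())  # 10 20
```

An output that reused `a`'s qparams would silently reinterpret values coming from `b`.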
If the correct quantization of the output tensor is something that varies from operator to operator, it may make sense to force kernels to do the computation of the quantization. There is some precedent for doing this in TensorIterator with the `static_shape` parameter, where kernels can say "Hey, I know you've got some built-in shape computation, but I actually happen to know what the output shape is, please use this." I could easily imagine something similar for quantization.
yeah, I don't think the current solution is super clean, I'm also not sure if the alternative would be better.
The two high level ways I saw to do this were:
- add a quantized path to the current fp32 indexing logic (current PR). This reuses the size calculations and the kernels, but has to add some metadata for quantized tensors in places where it previously did not exist.
- dispatch to something like `quantized_index`, which would have to reimplement the indexing logic (with some pieces potentially reusable from fp32, but likely not all)
Thoughts on any options I might be missing here? Does 1 sound like the right call?
for now, it's option 1, with the non-relevant parts asserted out, but I'm flexible
@vkuzo The nuance here is that the code paths you are editing in TensorIterator aren't just for fp32 indexing: they are for every unary, binary and reduction operation in PyTorch. So if you want to add something here, there is an obligation to at least have some plausible argument why the logic here is appropriate for all other operations.
I don't mind option one, but this PR ain't it. An implementation that more closely follows the idea of (1) is to take the existing indexing kernel (not in TensorIterator) and program TensorIterator on what the intended quantization result is.
```
  } else {
    // non-quantized path
    op.tensor = at::empty(sizes, options);
  }
} else {
  op.tensor = at::empty_strided(sizes, strides, options);
}
```
Below, there is logic for handling what happens in the `out=` case. Imagine that you do indexing into a pre-existing quantized tensor. What should the quantization parameters be? What if the tensor is to be resized in this situation?
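For readers unfamiliar with the `out=` pattern being discussed, a minimal float-tensor sketch (using `index_select` purely for illustration; the PR concerns the general `index` path):

```python
import torch

# Indexing into a pre-existing output tensor via out=.
x = torch.arange(12.).reshape(3, 4)
out = torch.empty(2, 4)
torch.index_select(x, 0, torch.tensor([0, 2]), out=out)
print(out)  # rows 0 and 2 of x
```

For a quantized `out`, the open question above is which qparams the result should carry, and what happens if `out` must be resized.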
asserted it out for now
The change to `MetaBase::set_output` looks incomplete. Oddly, the CI is passing; this is probably because none of the quantized operations have been made structured yet.
Summary: Implements support for the indexing of quantized tensors with lists of dims, such as

```
xq_slice = xq[:, [0], :, :]
```

If helpful for reviewers, the things originally broken were, in order:

1. `computeDeviceType` did not handle `DispatchKey::QuantizedCPU` (fix: added)
2. quantization params were not present in `TensorIterator::set_output` so they can be used to properly create the quantized tensor (fix: created `TensorQuantizationOptions` and threaded it through the relevant places)
3. `index` kernel was not enabled for quantized dtypes (fix: enable it)

Note: this PR only handles per-Tensor qparams. We don't expect to need this for per-channel qparams any time soon; ideally we can pay the eng cost for enabling that when it is needed.

Test Plan:

```
python test/test_quantization.py TestQuantizedOps.test_advanced_indexing
```

Differential Revision: [D25451651](https://our.internmc.facebook.com/intern/diff/D25451651)

[ghstack-poisoned]
To clarify, do you mean accounting for the changes in #48718? Yes, that was not in the original PR, but it is there in the latest update.
Summary: This is a less ambitious redo of #49129. We make the

```
xq_slice = xq[:, [0], :, :]
```

indexing syntax work if `xq` is a quantized Tensor. For now, we are making the code not crash, with an inefficient `dq -> index -> q` implementation. A future PR can optimize performance by removing the unnecessary memory copies (which will require some non-trivial changes to TensorIterator).

Test Plan:

```
python test/test_quantization.py TestQuantizedOps.test_advanced_indexing
```

[ghstack-poisoned]
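The `dq -> index -> q` fallback described above can be sketched at the Python level (the PR performs the equivalent inside the op; tensor shapes and qparams here are illustrative):

```python
import torch

x = torch.randn(1, 3, 4, 4)
xq = torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=torch.quint8)

# dq -> index -> q: dequantize, index with the existing fp32 kernel,
# then requantize the result with the input's quantization parameters.
xq_slice = torch.quantize_per_tensor(
    xq.dequantize()[:, [0], :, :],
    xq.q_scale(), xq.q_zero_point(), xq.dtype)
print(xq_slice.shape)  # torch.Size([1, 1, 4, 4])
```

This round-trips through fp32 memory, which is why the summary calls it inefficient: a native implementation would index the int8 storage directly.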
#49346) Summary: Pull Request resolved: #49346

This is a less ambitious redo of #49129. We make the

```
xq_slice = xq[:, [0], :, :]
```

indexing syntax work if `xq` is a quantized Tensor. For now, we are making the code not crash, with an inefficient `dq -> index -> q` implementation. A future PR can optimize performance by removing the unnecessary memory copies (which will require some non-trivial changes to TensorIterator).

Test Plan:

```
python test/test_quantization.py TestQuantizedOps.test_advanced_indexing
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D25539365

fbshipit-source-id: 98485875aaaf5743e1a940e170258057691be4fa
Thanks for the review! We went with a simpler and less performant approach of using the fp32 kernel for now in #49346; we might revisit supporting this natively in the future.
Stack from ghstack:
Summary:
Implements support for the indexing of quantized tensors with lists of dims, such as

```
xq_slice = xq[:, [0], :, :]
```
If helpful for reviewers, the things originally broken were, in order:

1. `computeDeviceType` did not handle `DispatchKey::QuantizedCPU` (fix: added)
2. quantization params were not present in `TensorIterator::set_output` so they can be used to properly create the quantized tensor (fix: created `TensorQuantizationOptions` and threaded it through the relevant places)
3. `index` kernel was not enabled for quantized dtypes (fix: enable it)

Note: this PR only handles per-Tensor qparams. We don't expect to need this for per-channel qparams any time soon; ideally we can pay the eng cost for enabling that when it is needed.
Test Plan:

```
python test/test_quantization.py TestQuantizedOps.test_advanced_indexing
```
Reviewers:
Subscribers:
Tasks:
Tags:
Differential Revision: D25451651
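As context for the per-Tensor vs per-channel qparams note in the description above, a minimal sketch of the distinction (shapes and qparams chosen arbitrarily for illustration):

```python
import torch

x = torch.randn(2, 3)

# Per-tensor: a single (scale, zero_point) pair for the whole tensor.
xq_t = torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=torch.qint8)

# Per-channel: one (scale, zero_point) pair per slice along `axis`.
scales = torch.tensor([0.1, 0.2, 0.05])
zero_points = torch.zeros(3, dtype=torch.long)
xq_c = torch.quantize_per_channel(x, scales, zero_points, axis=1, dtype=torch.qint8)

print(xq_t.qscheme())  # torch.per_tensor_affine
print(xq_c.qscheme())  # torch.per_channel_affine
```

Per-channel output allocation would need a vector of qparams threaded through `set_output`, which is the extra engineering cost the note defers.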