Add division overload with rounding_mode selection #50280
Conversation
As mentioned in gh-43874, this adds a `rounding_mode={'true', 'trunc', 'floor'}` argument so `torch.div` can be used as a replacement for `floor_divide` during the transitional period. I've included dedicated kernels for truncated and floor division which aren't strictly necessary for float, but do significantly better (~2x) than doing true division followed by a separate rounding kernel. Note: I introduce new overloads for `aten::div` instead of just adding a default `rounding_mode` because various JIT passes rely on the exact operator schema. [ghstack-poisoned]
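The three modes behave like Python's `/`, `math.trunc`, and `math.floor` applied to the quotient. A minimal pure-Python sketch of the semantics (a hypothetical helper for illustration, not the actual kernels) shows how they diverge for negative operands:

```python
import math

def div(a, b, rounding_mode="true"):
    """Pure-Python sketch of the three rounding modes."""
    q = a / b  # true division
    if rounding_mode == "true":
        return q
    if rounding_mode == "trunc":
        return math.trunc(q)   # round toward zero
    if rounding_mode == "floor":
        return math.floor(q)   # round toward negative infinity
    raise ValueError(f"unknown rounding_mode: {rounding_mode}")

# The modes only differ when the quotient is negative and inexact:
print(div(-7, 2))           # -3.5
print(div(-7, 2, "trunc"))  # -3
print(div(-7, 2, "floor"))  # -4
```

For positive quotients, `trunc` and `floor` agree; the distinction only matters below zero, which is exactly where `floor_divide`'s historical behavior was surprising.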
💊 CI failures summary (Dr. CI, as of commit 71b0cfe): 🕵️ 3 new failures recognized by patterns that do not appear to be due to upstream breakages, including pytorch_xla_linux_bionic_py3_6_clang9_build (step: "(Optional) Merge target branch").
@@ -1129,6 +1129,12 @@ TORCH_LIBRARY_IMPL(aten, Batched, m) {
  BINARY_POINTWISE_VA(rsub, Scalar);
  BINARY_POINTWISE(mul);
  BINARY_POINTWISE(div);
  {
    using Binop = Tensor (*)(const Tensor&, const Tensor&, std::string);
@rzou would you take a look here?
It was, thanks @peterbell10. Darn autocomplete!
cc @zou3519
this lgtm!
} else if (isIntegralType(dtype, /*includeBool*/ false)) {
  // There's no SIMD integer division, so don't try to vectorize it.
  // TODO: if the divisor is a scalar, rewrite as multiplication by a constant.
  AT_DISPATCH_INTEGRAL_TYPES(iter.common_dtype(), "div_floor_cpu", [&]() {
This is inconsistent between using `dtype` and `iter.common_dtype()`.
I have removed all uses of `iter.dtype()`. If instead you meant the variable `dtype`, then I would note that it's assigned from `iter.common_dtype()` above. Just a bit less to type.
I realize the value is the same; just for readability the code might want to stick to either `dtype` or `iter.common_dtype()`. No big deal either way.
    });
  });
} else {
  AT_DISPATCH_FLOATING_TYPES_AND2(kBFloat16, kHalf, iter.common_dtype(), "div_floor_cpu", [&]() {
Same `dtype` vs `iter.common_dtype()` here, too.
Hi @mruberry, I think I can get the pt/xla PR ready this week. I will ping you when that is ready.
@mruberry PTAL when you can. Have addressed your comments and rebased.
torch/_torch_docs.py (outdated)
* ``"true"`` - default behavior. Performs no rounding and, if both :attr:`input` and
  :attr:`other` are integer types, promotes the inputs to the default scalar type.
  Equivalent to true division in Python (the ``/`` operator) and NumPy's ``np.true_divide``.
* ``"trunc"`` - rounds the results of the division down.
The descriptions for trunc and floor are identical: "rounds the results of the division down". For trunc I think we can say it rounds towards zero?
@@ -641,8 +641,10 @@ def freeze_rng_state():
def set_default_dtype(dtype):
    saved_dtype = torch.get_default_dtype()
    torch.set_default_dtype(dtype)
    yield
    torch.set_default_dtype(saved_dtype)
    try:
Thank you for fixing this.
One doc nit, otherwise looks awesome.
@JackCaoG let us know when this is safe to merge.
ghstack-source-id: 372c4beabf4db06072fd3608947fc50c9c1718bd Pull Request resolved: pytorch#50280
One of the test failures is real: test_div_rounding_numpy_cuda_bfloat16. We can skip the test for simplicity to unblock this landing. @JackCaoG's PR is ready to go, so I'd like to land this during PST business hours on Tuesday, February 2nd.
In hindsight, the BFloat16 comparison with random test data is unlikely to work perfectly. It's sensitive to exact rounding and, since NumPy is rounding to float32 precision, it will occasionally get different answers.
We can disable the test for now. In the future we could use a fixture to ensure there are no rounding issues.
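To see why exact comparisons are fragile: bfloat16 keeps only the top 7 mantissa bits of a float32, so operands that differ in float32 can collapse to the same bfloat16 value and push a quotient across an integer boundary. A rough sketch, emulating bfloat16 by truncating a float32 to its top 16 bits (real hardware rounds to nearest-even, so this is only an approximation):

```python
import math
import struct

def to_bf16(x):
    # Emulate bfloat16 by keeping only the top 16 bits of the float32
    # encoding (sign + 8 exponent bits + 7 mantissa bits).
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

a, b = 1.0, 1.001
# In float32/float64 precision, 1.0 / 1.001 is just below 1, so floor is 0...
print(math.floor(a / b))                    # 0
# ...but 1.001 is not representable in bfloat16 and collapses to 1.0,
# so the bfloat16 quotient is exactly 1 and floor is 1.
print(math.floor(to_bf16(a) / to_bf16(b)))  # 1
```

A reference computed at a different precision can therefore legitimately disagree with the kernel near integer boundaries, which is why skipping or fixturing the test is reasonable.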
    }
    return floordiv;
  },
  [](Vec256<scalar_t> a, Vec256<scalar_t> b) -> Vec256<scalar_t>{
This is triggering some internal build issues. Adding a vectorized function can be a little tricky because we often have to stub them out on some platforms, like Android.
Since we're so close to the branch cut, I propose removing the copysign implementation and this vectorized implementation. We can file an issue and add them back in a later PR where we can take our time and focus on that issue.
@mruberry this should be good now. Removed all Vec256 changes and unvectorized `floor_divide`.
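For reference, the unvectorized floor kernel reduces to truncated division plus a sign correction; a scalar Python sketch of that approach (my reading of the technique, not the exact C++ kernel) is:

```python
import math

def floor_div(a, b):
    # Truncate toward zero first, then step down by one when the remainder
    # is nonzero and has the opposite sign from the divisor -- that is
    # exactly the case where trunc and floor disagree.
    q = math.trunc(a / b)
    r = a - q * b
    if r != 0 and (r < 0) != (b < 0):
        q -= 1
    return q

# Matches floor of the true quotient across all sign combinations:
for a, b in [(7, 2), (-7, 2), (7, -2), (-7, -2), (6, 3)]:
    assert floor_div(a, b) == math.floor(a / b)
```

Computing it this way avoids a second full division and keeps exact results for integer inputs, since only truncation, multiplication, and subtraction are involved.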
FYI there's another internal issue that I'm reviewing now. I'll keep this updated.
Update: hacking through internal infra issues still.
Update: blocking infra team confirms it's fixed its issue. This should land today.
Landed. Some changes had to be made internally:
Stack from ghstack:

As mentioned in gh-43874, this adds a `rounding_mode={'true', 'trunc', 'floor'}` argument so `torch.div` can be used as a replacement for `floor_divide` during the transitional period.

I've included dedicated kernels for truncated and floor division which aren't strictly necessary for float, but do perform significantly better (~2x) than doing true division followed by a separate rounding kernel.

Note: I introduce new overloads for `aten::div` instead of just adding a default `rounding_mode` because various JIT passes rely on the exact operator schema.

Differential Revision: D26123271