[WIP] [Type Promotion] Add floating type promotion support for (some of the) Unary Floating UFuncs #33322
Conversation
💊 CircleCI build failures summary and remediations

As of commit 313b668 (more details on the Dr. CI page):

🕵️ 1 new failure recognized by patterns. The following build failure does not appear to be due to upstream breakages: pytorch_linux_xenial_py3_6_gcc5_4_build (1/1), Step: "Build" (full log | pattern match details) <confirmed not flaky by 2 failures>
Note: I am working on splitting this into ops which need this promotion and those which don't (it looks like not all the Unary Ops should have this feature, like …)
aten/src/ATen/native/UnaryOps.cpp
Outdated
Tensor result = at::empty({0}, self.options());
return out_impl(result, self);
// This enables int-to-float implicit dtype conversions
ScalarType promoted_dtype = promoteIntToFloats(self);
we'll need a way to differentiate which ops shouldn't promote int->float, e.g. torch.abs()
From the checks I did on the NumPy ops, there are 4 types of conversions on the Unary Ops:
- Type 1: int8 -> float16, int16 -> float32, int32 -> float64, int64 -> float64, bool -> float16
- Type 2: (int8, int16, int32, int64, bool) -> float64
- Type 3: bool -> int8
- Type 4: bool -> float16
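For reference, a minimal check of the four patterns above (a sketch assuming NumPy is installed; the dtype comments reflect the classification reported in this thread and may differ across NumPy versions):

import numpy as np

# Type 1: result width follows the integer width
print(np.sqrt(np.array([1], dtype=np.int8)).dtype)    # float16
print(np.sqrt(np.array([1], dtype=np.int16)).dtype)   # float32
print(np.sqrt(np.array([1], dtype=np.int64)).dtype)   # float64

# Type 2: all integral/bool inputs go straight to float64
print(np.angle(np.array([1], dtype=np.int8)).dtype)   # float64

# Type 3: bool promotes to int8 (as reported above)
print(np.conjugate(np.array([True])).dtype)           # int8

# Type 4: bool promotes to float16 (as reported above)
print(np.round(np.array([True])).dtype)               # float16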
Also, some ops are not defined in NumPy (like sigmoid, polygamma, mvlgamma). What do you suggest? Should we follow a type promotion strategy similar to NumPy's? That would lead us to have conditions for each op to go through a specific type (1/2/3/4) of dtype promotion.
In case you are interested in which ops use which type of dtype promotion, here is a list of the ops (not all of them are listed):
- Ops with no upcasting: abs, real, imag, bitwise_not, clip, neg
- Ops with changes (Type 1): ceil, expm1, floor, log, log10, log1p, log2, sin, sinh, sqrt, trunc, atan, cos, tan
- Ops with changes (Type 2): angle
- Ops with changes (Type 3): conjugate, reciprocal, square
- Ops with changes (Type 4): round
@gchanan do you think we'd be willing to deviate from NumPy on some of these UnaryOp promotions?
- (not promoting) These seem reasonable.
- Seems reasonable.
- This case is for one operator (angle) that expects a complex argument. The angle of any real argument is 0. These ops currently handle (most) integral types without promoting, and I'm not convinced we should change that behavior as part of this ticket.
- bool -> int8 for conjugate/reciprocal/square seems a bit weird to me. E.g., I'd expect t.square() to be the same as t * t (see the quick check after this list). It's also possible we'd want to skip boolean tensors for these operators, since ops like bool_tensor.reciprocal() would crash if they contain any False values.
- round: This inconsistent behavior seems odd since none of the other integral types promote for that op. I'd be tempted to deviate from NumPy on this one, or just not implement it for bool.
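A quick check of the t.square() point (a Python sketch against current PyTorch; the int8 result mentioned in the comment is NumPy's Type 3 rule, not existing PyTorch behavior):

import torch

b = torch.tensor([True, False])
print((b * b).dtype)   # multiplying bool tensors keeps dtype torch.bool
# Under NumPy's Type 3 rule, b.square() would instead come back as int8,
# which is the inconsistency with "t.square() == t * t" raised above.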
The latest commit contains example code allowing dtype promotion matching NumPy's on sample functions. Here are the results on sample ops:
I welcome inputs and suggestions on this prototype. @nairbv @mcarilli
I've added sample test functions for an op of each type/category, in …
aten/src/ATen/native/UnaryOps.cpp
Outdated
dtype = (self.device().type() == DeviceType::CPU) ? kFloat : kHalf;
break;
case kShort:
  dtype = kFloat;
might be simpler to just put the return statements in each case here instead of saving and breaking to return later.
Agreed. This should be resolved here: 2203eb9
This commit:
1. Renames the TypePromotionStrategy enum values from the earlier TypeN format to more descriptive names (IntBoolToFloats, IntBoolToFloat64, BoolToInt8, BoolToFloat16), as suggested by @nairbv.
2. Returns dtypes directly from the switch cases instead of storing, breaking, and returning later, as suggested by @nairbv.
3. Removes extra lines that just take up space, as suggested by @mcarilli.

Merge branch 'temp-unary-ops' into int-to-float-unary-ops

This commit also removes the unary floating ops from simple_unary_ops and adds them to a new list, unary_floating_ufuncs.
if (isIntegralType(self.scalar_type(), /*includeBool=*/ true)) {
  const auto scalar_type = typeMetaToScalarType(c10::get_default_dtype());
  Tensor result = at::empty({0}, self.options().dtype(scalar_type));
  return out_impl(result, self.to(scalar_type));
Out of the loop for a bit fighting with CI for autocasting... @mruberry, git blame (or credit!) says it's your work (from a merged branch) to split unary...impl into two versions. The other version (unary_floating_ufunc_op_impl), for integer tensor input, creates result as the default type and sends self and result along to TensorIterator as they are. This version precasts self to match the output type, so TensorIterator doesn't need to handle mismatched types internally.

Splitting those two versions is a good way to deal with the fact that some backend functions may handle mismatched self+result types and some may not. Afaict the IMPLEMENT_UNARY_FLOATING_UFUNC_OP_VEC macros are the only place unary_floating_ufunc_cast_op_impl is used, i.e., the ops handled by those macros are the ones you decided should receive precasting.

I'd like to make sure I understand how you and @krshrimali decided which ops receive precasting. The ops handled by IMPLEMENT_UNARY_FLOATING_UFUNC_OP_VEC match the ops from this (possibly outdated) comment, with the addition of erfinv and lgamma. If I'm reading the comment correctly, the ops listed there don't support internal int->default_type promotion when called on CUDA tensors, which is a reasonable criterion to route them through unary_floating_ufunc_cast_op_impl rather than unary_floating_ufunc_op_impl.

However (if I'm reading it correctly), those same ops do support internal promotion when called with CPU inputs. Right now the _impl each op calls is based only on the op itself. Since internal promotion support depends on both the op and self's device, do you think it makes sense to have unary_floating_ufunc_cast_op_impl choose not to precast if is_cuda() is false?

Side note: unary_floating_ufunc_cast_op_impl only precasts integer inputs; in other words, floating point inputs never precast no matter which _impl they route through, and always rely on internal TensorIterator promotion. That's great for GPU performance, and will be great for performance with the hoped-for output dtype=... argument as well.
I agree with what you suggest here, @mcarilli. For the functions using the helper macros, adding a simple if-else condition works fine with all the tests:
#define IMPLEMENT_UNARY_FLOATING_UFUNC_OP_CORE(op)                     \
  Tensor op(const Tensor& self) {                                      \
    /* CPU backends promote internally, so only precast on CUDA. */    \
    if (!self.is_cuda())                                                \
      return unary_floating_ufunc_op_impl(self, at::op##_out);         \
    else                                                                \
      return unary_floating_ufunc_cast_op_impl(self, at::op##_out);    \
  }
@mruberry - What are your views?
aten/src/ATen/native/UnaryOps.cpp
Outdated
Tensor& expm1_(Tensor& self) { return unary_op_impl_(self, at::expm1_out); }
Tensor& expm1_out(Tensor& result, const Tensor& self) { return unary_floating_ufunc_op_impl_out(result, self, expm1_stub); }
Tensor expm1(const Tensor& self) { return unary_floating_ufunc_op_impl(self, at::expm1_out); }
Tensor& expm1_(Tensor& self) { return unary_floating_ufunc_op_impl_(self, at::expm1_out); }

Tensor& frac_out(Tensor& result, const Tensor& self) { return unary_op_impl_out(result, self, frac_stub); }
frac is a composite that can alternatively be implemented as
(self - torch.floor(torch.abs(self)) * torch.sign(self))
The call to floor should promote to floating. Seems like frac should be a unary floating ufunc, too.
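A quick sanity check of that composite form (a Python sketch; torch.frac here is the existing op, used only for comparison on floating inputs):

import torch

def frac_composite(x):
    # frac(x) = x - floor(|x|) * sign(x), per the expression above
    return x - torch.floor(torch.abs(x)) * torch.sign(x)

t = torch.tensor([-1.5, 0.25, 2.75])
print(frac_composite(t))   # tensor([-0.5000,  0.2500,  0.7500])
print(torch.frac(t))       # matches the composite form on floating inputs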
Added frac as a unary floating ufunc (and in the tests). It does not support complex types on either CPU or CUDA.
It's not in the ONNX operator list (https://github.com/onnx/onnx/blob/master/docs/Operators.md), so no changes are needed there. Also added it in the JIT shape_analysis.cpp file as a unary floating ufunc.
aten/src/ATen/native/UnaryOps.cpp
Outdated
Tensor result = at::empty({0}, self.options());               \
at::op##_out(result, self);                                   \
return result;                                                \
return unary_floating_ufunc_cast_op_impl(self, at::op##_out); \
This still needs to be updated to cast if not CPU. The query should be if (CPU) ... else ...
Done! 71e23c6
{
  "aten::acos(Tensor self) -> Tensor",
  "aten::angle(Tensor self) -> Tensor",
  "aten::tanh(Tensor self) -> Tensor",
tanh appears twice
Sorry, my bad. Fixed here: 71e23c6
# Test for out= code paths
for in_type in my_float_types + int_types:
    if self.device_type not in unary_ufuncs[op_str].complex_on:
Isn't this break premature? That is, isn't this saying that if you don't support complex we shouldn't test the out= code path at all?
This PR proposes to add type promotion (int and bool types to the default floating type) to the unary ops which are appropriate for this kind of type promotion (called Unary Floating UFuncs here).

Some context (also see issue #28703): NumPy follows type promotion for its unary ufuncs as listed below:
Type 1: int8 -> float16, int16 -> float32, int32 -> float64, int64 -> float64, bool -> float16
Type 2: (int8, int16, int32, int64, bool) -> float64
Type 3: bool -> int8
Type 4: bool -> float16
After offline and online discussions with @nairbv, @mruberry, and @mcarilli, it was decided that we don't want to promote all the ops listed above (see why: #33322 (comment)). JAX's NumPy rules for type promotion of unary ufuncs were also explored, and after reaching consensus we decided to promote int and bool tensors to the default floating type for the following ops:
Note: This PR does not enable floating type promotion for in-place code paths for obvious reasons.
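For illustration, a minimal sketch of why the in-place paths are excluded (a Python sketch of the behavior this PR proposes; the in-place call is commented out because it cannot succeed on integer storage):

import torch

t = torch.arange(4, dtype=torch.int32)
print(torch.sin(t).dtype)   # out-of-place: promoted to the default floating dtype
# t.sin_()                  # in-place would have to write floating results back into
#                           # int32 storage, so the in-place path is not promoted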
Comprehensive list of changes proposed in this PR:
- test_torch.py file. This generates tests for all the unary floating ufuncs. The tests also cover complex types and float-to-complex type promotion for the appropriate ops.
- test_pytorch_onnx_onnxruntime.py file.

cc: @mruberry, @nairbv, @mcarilli