
[WIP] [Type Promotion] Add floating type promotion support for (some of the) Unary Floating UFuncs #33322

Closed · 85 commits

Conversation

@krshrimali (Contributor) commented Feb 13, 2020

This PR proposes to add type promotion (Int and Bool types to the default floating type) to the unary ops for which this kind of promotion is appropriate (called Unary Floating UFuncs here).

Some context (also see issue #28703): NumPy applies the following type promotion rules for its unary ufuncs (a quick reproduction script follows the lists below):

Type 1: int8 → float16, int16 → float32, int32 → float64, int64 → float64, bool → float16
Type 2: (int8, int16, int32, int64, bool) → float64
Type 3: bool → int8
Type 4: bool → float16

  • Ops with no upcasting: abs, real, imag, bitwise_not, clip, neg
  • Ops with changes (Type 1): ceil, expm1, floor, log, log10, log1p, log2, sin, sinh, sqrt, trunc, atan, cos, tan
  • Ops with changes (Type 2): angle
  • Ops with changes (Type 3): conjugate, reciprocal, square
  • Ops with changes (Type 4): round
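
For reference, a small script to reproduce the rules above on a local NumPy install. Treat it as illustrative: the comments simply restate the rules listed above, and the exact result dtypes can differ across NumPy versions.

import numpy as np

checks = {
    "ceil (Type 1)":   np.ceil(np.int8(1)),        # rules above: int8 -> float16
    "angle (Type 2)":  np.angle(np.int8(1)),       # rules above: int -> float64
    "square (Type 3)": np.square(np.bool_(True)),  # rules above: bool -> int8
    "round (Type 4)":  np.round(np.bool_(True)),   # rules above: bool -> float16
}
for name, result in checks.items():
    print(name, np.asarray(result).dtype)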

After offline and online discussions with @nairbv, @mruberry, and @mcarilli, it was decided that we don't want to promote all the ops listed above (see why: #33322 (comment)). JAX's NumPy-style rules for type promotion of unary ufuncs were also explored, and after reaching consensus we decided to promote Int and Bool tensors to the default floating type for the following ops:

sin, cos, tan, asin, acos, atan, sinh, cosh, tanh, ceil, floor, exp, expm1, log, log10, log1p, log2, sqrt, rsqrt, trunc, erf, erfc, erfinv, lgamma, digamma, sigmoid

Note: This PR does not enable floating type promotion for in-place code paths, since an in-place op cannot change the dtype of the tensor it writes into.
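
A minimal sketch of the user-facing behavior this PR aims for on the out-of-place paths (the result dtype depends on torch.get_default_dtype(); comments assume the float32 default):

import torch

x = torch.arange(4)              # int64 tensor
print(torch.sin(x).dtype)        # intended: default floating type (torch.float32)

b = torch.tensor([True, False])
print(torch.sqrt(b).dtype)       # intended: default floating type as well

# In-place variants are left alone: x.sin_() cannot store a floating
# result in x's integer storage, so no promotion is added there.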

Comprehensive list of changes proposed in this PR:

  1. Add floating type promotion (see the list of floating unary ufuncs and the rules above) to the appropriate unary universal functions.
  2. Add a test class for Unary Floating UFuncs in test_torch.py that generates tests for all the unary floating ufuncs; the tests also cover complex types and float-to-complex type promotion for the appropriate ops (a rough sketch follows this list).
  3. Add tests for Unary Floating UFuncs in test_pytorch_onnx_onnxruntime.py.
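
As a rough illustration of the test-generation approach in item 2; the op list and helper names below (UNARY_FLOATING_UFUNCS, _make_test) are placeholders for this sketch, not the actual code added in the PR:

import unittest
import torch

# Hypothetical subset of the op list; the PR maintains its own list.
UNARY_FLOATING_UFUNCS = ["sin", "cos", "sqrt", "log"]

class TestUnaryFloatingUFuncs(unittest.TestCase):
    pass

def _make_test(op_name):
    def test(self):
        op = getattr(torch, op_name)
        x = torch.arange(1, 5)  # integer input should promote
        self.assertTrue(op(x).dtype.is_floating_point)
    return test

for name in UNARY_FLOATING_UFUNCS:
    setattr(TestUnaryFloatingUFuncs, f"test_{name}_int_promotion", _make_test(name))

if __name__ == "__main__":
    unittest.main()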

cc: @mruberry, @nairbv, @mcarilli

@dr-ci (bot) commented Feb 13, 2020

💊 CircleCI build failures summary and remediations

As of commit 313b668 (more details on the Dr. CI page):


  • 1/4 failures introduced in this PR

  • 3/4 broken upstream at merge base ada4077 on Mar 26 from 2:55pm to 4:31pm (3 commits; f7f7c4e - 3622e1c)

Please rebase on the viable/strict branch.

    Since your merge base is older than viable/strict, run these commands:

    git fetch https://github.com/pytorch/pytorch viable/strict
    git rebase FETCH_HEAD
    

    Check out the recency history of this "viable master" tracking branch.


🕵️ 1 new failure recognized by patterns

The following build failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_build (1/1)

Step: "Build" (full log | pattern match details) <confirmed not flaky by 2 failures>

CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/pytorch_build_data.py 
Auto-merging .circleci/cimodel/data/pytorch_build_data.py 
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/dimensions.py 
Auto-merging .circleci/cimodel/data/dimensions.py 
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/caffe2_build_definitions.py 
Auto-merging .circleci/cimodel/data/caffe2_build_definitions.py 
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/caffe2_build_data.py 
Auto-merging .circleci/cimodel/data/caffe2_build_data.py 
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/binary_build_data.py 
Auto-merging .circleci/cimodel/data/binary_build_data.py 
Automatic merge failed; fix conflicts and then commit the result. 

🚧 3 upstream failures:

These were probably caused by upstream breakages:



@ngimel added the module: numpy label Feb 13, 2020
@krshrimali changed the title from "[Type Promotion] Allow int to float dtype promotion for Unary Ops" to "[WIP] [Do not review yet] [Type Promotion] Allow int to float dtype promotion for Unary Ops" Feb 14, 2020
@krshrimali (Contributor, Author) commented Feb 14, 2020

Note: I am working on splitting this between ops that need this promotion and ops that don't (it looks like not all unary ops should have this feature, e.g. abs). Adding tests for the modifications is also on the TODO list.

@krshrimali changed the title from "[WIP] [Do not review yet] [Type Promotion] Allow int to float dtype promotion for Unary Ops" to "[WIP] [Type Promotion] Allow int to float dtype promotion for Unary Ops" Feb 14, 2020
Tensor result = at::empty({0}, self.options());
return out_impl(result, self);
// This enables int-to-float implicit dtype conversions
ScalarType promoted_dtype = promoteIntToFloats(self);
Collaborator:

we'll need a way to differentiate which ops shouldn't promote int->float, e.g. torch.abs()

@krshrimali (Contributor, Author) Feb 18, 2020:

From the checks I did on the NumPy ops, there are 4 types of conversions for the unary ops:

  1. Type 1: int8 → float16, int16 → float32, int32 → float64, int64 → float64, bool → float16
  2. Type 2: (int8, int16, int32, int64, bool) → float64
  3. Type 3: bool → int8
  4. Type 4: bool → float16

Also, some ops are not defined in NumPy (like sigmoid, polygamma, mvlgamma). What do you suggest? Should we follow a type promotion strategy similar to NumPy's? In that case, we would need per-op conditions to route each op through a specific type (1/2/3/4) of dtype promotion.

In case you want to know which ops use which type of dtype promotion, here is a partial list:

  • Ops with no upcasting: abs, real, imag, bitwise_not, clip, neg
  • Ops with changes (Type 1): ceil, expm1, floor, log, log10, log1p, log2, sin, sinh, sqrt, trunc, atan, cos, tan
  • Ops with changes (Type 2): angle
  • Ops with changes (Type 3): conjugate, reciprocal, square
  • Ops with changes (Type 4): round

Collaborator:

@gchanan do you think we'd be willing to deviate from NumPy on some of these UnaryOp promotions?

  1. (No upcasting) These seem reasonable.
  2. (Type 1) Seems reasonable.
  3. (Type 2) This case covers a single operator, angle, which expects a complex argument; the angle of any real argument is 0. These ops currently handle (most) integral types without promoting, and I'm not convinced we should change that behavior as part of this ticket.
  4. (Type 3) bool -> int8 for conjugate/reciprocal/square seems a bit weird to me. E.g. I'd expect t.square() to be the same as t * t. It's also possible we'd want to skip boolean tensors for these operators, since ops like bool_tensor.reciprocal() would crash if they contain any False values.
  5. (Type 4) round: this inconsistent behavior seems odd, since none of the other integral types promote for that op. I'd be tempted to deviate from NumPy on this one, or just not implement it for bool.

@krshrimali changed the title from "[WIP] [Type Promotion] Allow int to float dtype promotion for Unary Ops" to "[WIP] [Type Promotion] Allow Unary Ops dtype promotion to match NumPy" Feb 18, 2020
@krshrimali (Contributor, Author):
The latest commit is example code that allows dtype promotion matching NumPy for a few sample functions. Here are the results on sample ops:

>>> x = torch.tensor(2, device="cuda")
>>> y = torch.tensor(False, device="cuda") 

>>> torch.abs(x).dtype # No change
torch.int64

>>> torch.ceil(x).dtype # Type 1 (int64 to float64)
torch.float64
>>> torch.angle(x).dtype # Type 2 (int64 to float64)
torch.float64
>>> torch.conj(y).dtype # Type 3 (bool to int8)
torch.int8
>>> torch.round(y).dtype # Type 4 (bool to fp16 if cuda, else fp32)
torch.float16

I welcome inputs and suggestions on this prototype. @nairbv @mcarilli

@krshrimali (Contributor, Author):
I've added sample test functions for an op of each type/category in test_torch.py. Adding one for each op would mean a lot of code repetition; to avoid that, we can have one test function per category that runs over all the respective ops. I'm still figuring out the best way to do that and would appreciate any input (one possible shape is sketched below).
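
One possible shape for the one-test-per-category idea, purely as an illustration; the category map and the ops in it are placeholders based on the tables in this thread, not the code in this PR:

import unittest
import torch

# Hypothetical mapping: category name -> (ops, expected result dtype for an int input).
CATEGORIES = {
    "no_upcast":         (["abs", "neg"],         lambda t: t.dtype),
    "int_bool_to_float": (["sin", "sqrt", "log"], lambda t: torch.get_default_dtype()),
}

class TestUnaryOpPromotionByCategory(unittest.TestCase):
    def _check_category(self, ops, expected_dtype):
        t = torch.arange(1, 5)  # int64 input
        for name in ops:
            with self.subTest(op=name):
                self.assertEqual(getattr(torch, name)(t).dtype, expected_dtype(t))

    def test_no_upcast(self):
        self._check_category(*CATEGORIES["no_upcast"])

    def test_int_bool_to_float(self):
        self._check_category(*CATEGORIES["int_bool_to_float"])

if __name__ == "__main__":
    unittest.main()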

test/test_torch.py (outdated review thread, resolved)
dtype = (self.device().type() == DeviceType::CPU) ? kFloat : kHalf;
break;
case kShort:
dtype = kFloat;
Collaborator:

might be simpler to just put the return statements in each case here instead of saving and breaking to return later.

Contributor Author:

Agreed. This should be resolved here: 2203eb9

This commit:

1. Adds more descriptive names in the enum TypePromotionStrategy (IntBoolToFloats, IntBoolToFloat64, BoolToInt8, BoolToFloat16; earlier a TypeN format was used), as suggested by @nairbv
2. Returns dtypes directly from the switch cases (earlier: store, break, and return), as suggested by @nairbv
3. Removes extra lines that just took up space, as suggested by @mcarilli

Merge branch 'temp-unary-ops' into int-to-float-unary-ops
@krshrimali changed the title from "[Type Promotion] Add floating type promotion support for (some of the) Unary Floating UFuncs" to "[WIP] [Type Promotion] Add floating type promotion support for (some of the) Unary Floating UFuncs" Mar 21, 2020
This commit also removes the unary floating ops from simple_unary_ops and adds them to a new list, unary_floating_ufuncs.
if (isIntegralType(self.scalar_type(), /*includeBool=*/ true)) {
const auto scalar_type = typeMetaToScalarType(c10::get_default_dtype());
Tensor result = at::empty({0}, self.options().dtype(scalar_type));
return out_impl(result, self.to(scalar_type));
@mcarilli (Collaborator) Mar 23, 2020:

I've been out of the loop for a bit fighting with CI for autocasting... @mruberry, git blame (or credit!) says it's your work (from a merged branch) to split unary...impl into two versions. The other version (unary_floating_ufunc_op_impl), for integer tensor input, creates result as the default type and sends self and result along to TensorIterator as they are. This version precasts self to match the output type, so TensorIterator doesn't need to handle mismatched types internally.

Splitting those two versions is a good way to deal with the fact that some backend functions may handle mismatched self+result types and some may not. As far as I can tell, the IMPLEMENT_UNARY_FLOATING_UFUNC_OP_VEC macros are the only place unary_floating_ufunc_cast_op_impl is used, i.e., the ops handled by those macros are the ones you decided should receive precasting.

I'd like to make sure I understand how you and @krshrimali decided which ops receive precasting. The ops handled by IMPLEMENT_UNARY_FLOATING_UFUNC_OP_VEC match the ops from this (possibly outdated) comment, with the addition of erfinv and lgamma. If I'm reading the comment correctly, the ops listed there don't support internal int->default_type promotion when called on CUDA tensors, which is a reasonable criterion to route them through unary_floating_ufunc_cast_op_impl rather than unary_floating_ufunc_op_impl.

However (if I'm reading it correctly), those same ops do support internal promotion when called with CPU inputs. Right now the _impl each op calls is based only on the op itself. Since internal promotion support depends on both the op and self's device, do you think it makes sense to have unary_floating_ufunc_cast_op_impl choose not to precast if is_cuda() is false?

Side note: unary_floating_ufunc_cast_op_impl only precasts integer inputs; in other words, floating point inputs are never precast no matter which _impl they route through, and always rely on internal TensorIterator promotion. That's great for GPU performance, and will be great for performance with the hoped-for output dtype=... argument as well.
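
To make the precast vs. internal-promotion distinction above concrete, a small Python-level sketch using sin and the out= path for illustration; this is not the actual _impl code:

import torch

x = torch.arange(4)  # int64 input
result = torch.empty(0, dtype=torch.get_default_dtype())

# Precasting (the unary_floating_ufunc_cast_op_impl strategy): convert the
# input to the result dtype first, so the kernel only sees matching dtypes.
torch.sin(x.to(result.dtype), out=result)

# Internal promotion (the unary_floating_ufunc_op_impl strategy): hand the
# integer input and floating result to the kernel as-is and let
# TensorIterator cast per element; whether a given backend kernel supports
# this is exactly the device-dependent question raised above.
torch.sin(x, out=result)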

Contributor Author:

I agree with what you suggest here, @mcarilli. For the functions using the helper macros, adding a simple if-else condition passes all the tests:

#define IMPLEMENT_UNARY_FLOATING_UFUNC_OP_CORE(op)                      \
  Tensor op(const Tensor& self) {                                       \
    if (!self.is_cuda())                                                 \
      return unary_floating_ufunc_op_impl(self, at::op##_out);          \
    else                                                                 \
      return unary_floating_ufunc_cast_op_impl(self, at::op##_out);     \
  }

@mruberry - What are your views?

Tensor& expm1_(Tensor& self) { return unary_op_impl_(self, at::expm1_out); }
Tensor& expm1_out(Tensor& result, const Tensor& self) { return unary_floating_ufunc_op_impl_out(result, self, expm1_stub); }
Tensor expm1(const Tensor& self) { return unary_floating_ufunc_op_impl(self, at::expm1_out); }
Tensor& expm1_(Tensor& self) { return unary_floating_ufunc_op_impl_(self, at::expm1_out); }

Tensor& frac_out(Tensor& result, const Tensor& self) { return unary_op_impl_out(result, self, frac_stub); }
Collaborator:

frac is a composite that can alternatively be implemented as

(self - torch.floor(torch.abs(self)) * torch.sign(self))

The call to floor should promote to floating. Seems like frac should be a unary floating ufunc, too.
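
A small check of that composite against torch.frac on floating inputs (a sketch only; the promotion of integer/bool inputs is what the proposed change would add):

import torch

def frac_composite(self):
    # frac(x) via other unary ufuncs; floor() driving the promotion is
    # what would make frac promote integer/bool inputs as well.
    return self - torch.floor(torch.abs(self)) * torch.sign(self)

x = torch.tensor([1.5, -1.5, 2.25])
print(torch.frac(x))      # tensor([ 0.5000, -0.5000,  0.2500])
print(frac_composite(x))  # matches torch.frac on floating inputs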

@krshrimali (Contributor, Author) Mar 30, 2020:

Added frac as a unary floating ufunc (and in the tests). It does not support complex types on either CPU or CUDA.

It's not in the ONNX operator list (https://github.com/onnx/onnx/blob/master/docs/Operators.md), so no changes are needed there. Also added it as a unary floating ufunc in the JIT shape_analysis.cpp file.

Tensor result = at::empty({0}, self.options()); \
at::op##_out(result, self); \
return result; \
return unary_floating_ufunc_cast_op_impl(self, at::op##_out); \
Collaborator:

This still needs to be updated to cast if not CPU. The query should be if (CPU) ... else ...

Contributor Author:

Done! 71e23c6

{
"aten::acos(Tensor self) -> Tensor",
"aten::angle(Tensor self) -> Tensor",
"aten::tanh(Tensor self) -> Tensor",
Collaborator:

tanh appears twice

Contributor Author:

Sorry, my bad. Fixed here: 71e23c6


# Test for out= code paths
for in_type in my_float_types + int_types:
    if self.device_type not in unary_ufuncs[op_str].complex_on:
Collaborator:

Isn't this break premature? That is, isn't this saying that if the op doesn't support complex, we don't test the out= code path at all?

@krshrimali (Contributor, Author):
Closing this since it was taken up in another PR. Thank you @mcarilli @mruberry for being patient and helping me with this PR. I really appreciate it; apologies that it couldn't make it upstream.

@krshrimali krshrimali closed this Jun 14, 2021
Labels: cla signed · module: numpy · module: type promotion · open source · triaged

9 participants