Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[te] Fix bugs with shift operators #49271

Closed
wants to merge 6 commits into from

Conversation

bertmaher
Copy link
Contributor

@bertmaher bertmaher commented Dec 12, 2020

Stack from ghstack:

Two things:

  1. These throw exceptions in their constructor, which causes a segfault (*), so
    move the exceptions to ::make.
  2. They technically support FP types but the rules are complicated so let's not
    bother.

(*) The reason for the segfault: all Exprs including these inherit from
KernelScopedObject, whose constructor adds the object to a list for destruction
at the end of the containing KernelArena's lifetime. But if the derived-class
constructor throws, the object is deleted even though it's still in the
KernelArena's list. So when the KernelArena is itself deleted, it double-frees
the pointer and dies. I've also fixed And, Or, and Xor in this diff.

Differential Revision: D25512052

Two things:

1. These throw exceptions in their constructor, which causes a segfault (*), so
   move the exceptions to ::make.
2. They technically support FP types but the rules are complicated so let's not
   bother.

(*) The reason for the segfault: all Exprs including these inherit from
KernelScopedObject, whose constructor adds the object to a list for destruction
at the end of the containing KernelArena's lifetime.  But if the derived-class
constructor throws, the object is deleted even though it's still in the
KernelArena's list.  So when the KernelArena is itself deleted, it double-frees
the pointer and dies.  I've also fixed And, Or, and Xor in this diff.

Differential Revision: [D25512052](https://our.internmc.facebook.com/intern/diff/D25512052/)

[ghstack-poisoned]
@facebook-github-bot facebook-github-bot added cla signed oncall: jit Add this issue/PR to JIT oncall triage queue labels Dec 12, 2020
bertmaher added a commit that referenced this pull request Dec 12, 2020
Two things:

1. These throw exceptions in their constructor, which causes a segfault (*), so
   move the exceptions to ::make.
2. They technically support FP types but the rules are complicated so let's not
   bother.

(*) The reason for the segfault: all Exprs including these inherit from
KernelScopedObject, whose constructor adds the object to a list for destruction
at the end of the containing KernelArena's lifetime.  But if the derived-class
constructor throws, the object is deleted even though it's still in the
KernelArena's list.  So when the KernelArena is itself deleted, it double-frees
the pointer and dies.  I've also fixed And, Or, and Xor in this diff.

Differential Revision: [D25512052](https://our.internmc.facebook.com/intern/diff/D25512052/)

ghstack-source-id: 118453866
Pull Request resolved: #49271
@bertmaher bertmaher requested review from Krovatkin, eellison, ZolotukhinM, penguinwu and zheng-xq and removed request for Krovatkin December 12, 2020 00:21
@facebook-github-bot
Copy link
Contributor

facebook-github-bot commented Dec 12, 2020

💊 CI failures summary and remediations

As of commit a7bd0b6 (more details on the Dr. CI page):


  • 4/4 failures possibly* introduced in this PR
    • 1/4 non-CircleCI failure(s)

🕵️ 3 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_build (1/3)

Step: "(Optional) Merge target branch" (full log | diagnosis details | 🔁 rerun)

Automatic merge failed; fix conflicts and then commit the result.
Turn off this advice by setting config variable advice.detachedHead to false

HEAD is now at a7bd0b6357 fix interp on "[te] Fix bugs with shift operators"
+ git reset --hard a7bd0b6357ee1ea8eaf434d227b622309380a6bf
HEAD is now at a7bd0b6357 fix interp on "[te] Fix bugs with shift operators"
+ git merge --allow-unrelated-histories --no-edit --no-ff 25bc90628109699ebd72a5a4bf68750f98f9251c
Auto-merging torch/csrc/jit/passes/tensorexpr_fuser.cpp
CONFLICT (content): Merge conflict in torch/csrc/jit/passes/tensorexpr_fuser.cpp
Auto-merging test/test_jit_fuser_te.py
CONFLICT (content): Merge conflict in test/test_jit_fuser_te.py
Automatic merge failed; fix conflicts and then commit the result.


Exited with code exit status 1

See CircleCI build pytorch_linux_xenial_cuda9_2_cudnn7_py3_gcc5_4_build (2/3)

Step: "(Optional) Merge target branch" (full log | diagnosis details | 🔁 rerun)

Automatic merge failed; fix conflicts and then commit the result.
Turn off this advice by setting config variable advice.detachedHead to false

HEAD is now at a7bd0b6357 fix interp on "[te] Fix bugs with shift operators"
+ git reset --hard a7bd0b6357ee1ea8eaf434d227b622309380a6bf
HEAD is now at a7bd0b6357 fix interp on "[te] Fix bugs with shift operators"
+ git merge --allow-unrelated-histories --no-edit --no-ff 25bc90628109699ebd72a5a4bf68750f98f9251c
Auto-merging torch/csrc/jit/passes/tensorexpr_fuser.cpp
CONFLICT (content): Merge conflict in torch/csrc/jit/passes/tensorexpr_fuser.cpp
Auto-merging test/test_jit_fuser_te.py
CONFLICT (content): Merge conflict in test/test_jit_fuser_te.py
Automatic merge failed; fix conflicts and then commit the result.


Exited with code exit status 1

See CircleCI build pytorch_xla_linux_bionic_py3_6_clang9_build (3/3)

Step: "(Optional) Merge target branch" (full log | diagnosis details | 🔁 rerun)

Automatic merge failed; fix conflicts and then commit the result.
Turn off this advice by setting config variable advice.detachedHead to false

HEAD is now at a7bd0b6357 fix interp on "[te] Fix bugs with shift operators"
+ git reset --hard a7bd0b6357ee1ea8eaf434d227b622309380a6bf
HEAD is now at a7bd0b6357 fix interp on "[te] Fix bugs with shift operators"
+ git merge --allow-unrelated-histories --no-edit --no-ff 25bc90628109699ebd72a5a4bf68750f98f9251c
Auto-merging torch/csrc/jit/passes/tensorexpr_fuser.cpp
CONFLICT (content): Merge conflict in torch/csrc/jit/passes/tensorexpr_fuser.cpp
Auto-merging test/test_jit_fuser_te.py
CONFLICT (content): Merge conflict in test/test_jit_fuser_te.py
Automatic merge failed; fix conflicts and then commit the result.


Exited with code exit status 1


This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

This comment has been revised 35 times.

Copy link
Contributor

@Krovatkin Krovatkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

if (lhs->dtype() != rhs->dtype()) {
throw malformed_input("bad dtype in And");
if (lhs.dtype() != rhs.dtype()) {
throw malformed_input("lhs/rhs dtype mismatch");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

out of curiosity: these throws used to be in a c-tor?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, throwing in an Expr ctor is bad news because of the way KernelScopedObject and KernelArena work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks!

if (lhs->dtype() != rhs->dtype()) {
throw malformed_input("bad dtype in And");
if (lhs.dtype() != rhs.dtype()) {
throw malformed_input("lhs/rhs dtype mismatch");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to double check, we can't do `(uint64_t)42 << (uint8_t)5 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah the TE dsl requires operand types to match, and when coming from pytorch we enforce that at the kernel translation layer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks!

Two things:

1. These throw exceptions in their constructor, which causes a segfault (*), so
   move the exceptions to ::make.
2. They technically support FP types but the rules are complicated so let's not
   bother.

(*) The reason for the segfault: all Exprs including these inherit from
KernelScopedObject, whose constructor adds the object to a list for destruction
at the end of the containing KernelArena's lifetime.  But if the derived-class
constructor throws, the object is deleted even though it's still in the
KernelArena's list.  So when the KernelArena is itself deleted, it double-frees
the pointer and dies.  I've also fixed And, Or, and Xor in this diff.

Differential Revision: [D25512052](https://our.internmc.facebook.com/intern/diff/D25512052/)

[ghstack-poisoned]
bertmaher added a commit that referenced this pull request Dec 12, 2020
Pull Request resolved: #49271

Two things:

1. These throw exceptions in their constructor, which causes a segfault (*), so
   move the exceptions to ::make.
2. They technically support FP types but the rules are complicated so let's not
   bother.

(*) The reason for the segfault: all Exprs including these inherit from
KernelScopedObject, whose constructor adds the object to a list for destruction
at the end of the containing KernelArena's lifetime.  But if the derived-class
constructor throws, the object is deleted even though it's still in the
KernelArena's list.  So when the KernelArena is itself deleted, it double-frees
the pointer and dies.  I've also fixed And, Or, and Xor in this diff.
ghstack-source-id: 118467480

Differential Revision: [D25512052](https://our.internmc.facebook.com/intern/diff/D25512052/)
throw malformed_input("bad dtype in Lshift");
}
}
: BitwiseOpNode(lhs, rhs, IRNodeType::kLshift) {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where can I find the definition of whatever operation that maps to kLshift? Intuitively I thought shift operators would take a tensor as 1st operand and an integer as 2nd operand (e.g., tensorB = tensorA << 1). Or does it really support sth like tensorB = tensorA << tensorC?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

native_functions.yaml is a good source - it lists all (or majority?) of the ops that we have in pytorch.
Another useful list is what ops we accept from the fuser: https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/passes/tensorexpr_fuser.cpp#L137-L140
And one more useful place is what we know how to lower to TE:
https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/tensorexpr/kernel.cpp#L879

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ZolotukhinM lshift is defined in native_functions.yaml. And it supports the 2nd operand to be of tensor type or scalar type.

- func: __ilshift__.Scalar(Tensor(a!) self, Scalar other) -> Tensor(a!)
use_c10_dispatcher: full
variants: method
dispatch:
CPU, CUDA: __ilshift__
- func: __ilshift__.Tensor(Tensor(a!) self, Tensor other) -> Tensor(a!)
use_c10_dispatcher: full
variants: method
dispatch:
CPU, CUDA: __ilshift__

In the constructor of kLshift, we assume both operands to be tensor types. What happens if the 2nd operand is a scalar? Is it promoted to tensor types somewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Lshift constructor actually just takes Exprs, which can either be a scalar or a tensor element. For the scalar case we'd generate something like lhs(i) << scalar, for the tensor case we'd have lhs(i) << rhs(i). (where i is standing in for whatever indexing expression we have for the tensor).

In building the TE from a jit::Graph (kernel.cpp) we have a lookup function called tensorOrConstant that will lookup a jit::Value in a map and return the appropriate Expr, whether it's a tensor or a constant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, also, I don't think we handle non-constant scalars in the TE codegen. I don't remember why but it wouldn't surprise me if there are a ton of corner cases since scalars are coming from the python type system.

Two things:

1. These throw exceptions in their constructor, which causes a segfault (*), so
   move the exceptions to ::make.
2. They technically support FP types but the rules are complicated so let's not
   bother.

(*) The reason for the segfault: all Exprs including these inherit from
KernelScopedObject, whose constructor adds the object to a list for destruction
at the end of the containing KernelArena's lifetime.  But if the derived-class
constructor throws, the object is deleted even though it's still in the
KernelArena's list.  So when the KernelArena is itself deleted, it double-frees
the pointer and dies.  I've also fixed And, Or, and Xor in this diff.

Differential Revision: [D25512052](https://our.internmc.facebook.com/intern/diff/D25512052/)

[ghstack-poisoned]
Two things:

1. These throw exceptions in their constructor, which causes a segfault (*), so
   move the exceptions to ::make.
2. They technically support FP types but the rules are complicated so let's not
   bother.

(*) The reason for the segfault: all Exprs including these inherit from
KernelScopedObject, whose constructor adds the object to a list for destruction
at the end of the containing KernelArena's lifetime.  But if the derived-class
constructor throws, the object is deleted even though it's still in the
KernelArena's list.  So when the KernelArena is itself deleted, it double-frees
the pointer and dies.  I've also fixed And, Or, and Xor in this diff.

Differential Revision: [D25512052](https://our.internmc.facebook.com/intern/diff/D25512052/)

[ghstack-poisoned]
bertmaher added a commit that referenced this pull request Dec 14, 2020
Pull Request resolved: #49271

Two things:

1. These throw exceptions in their constructor, which causes a segfault (*), so
   move the exceptions to ::make.
2. They technically support FP types but the rules are complicated so let's not
   bother.

(*) The reason for the segfault: all Exprs including these inherit from
KernelScopedObject, whose constructor adds the object to a list for destruction
at the end of the containing KernelArena's lifetime.  But if the derived-class
constructor throws, the object is deleted even though it's still in the
KernelArena's list.  So when the KernelArena is itself deleted, it double-frees
the pointer and dies.  I've also fixed And, Or, and Xor in this diff.
ghstack-source-id: 118556876

Differential Revision: [D25512052](https://our.internmc.facebook.com/intern/diff/D25512052/)
Two things:

1. These throw exceptions in their constructor, which causes a segfault (*), so
   move the exceptions to ::make.
2. They technically support FP types but the rules are complicated so let's not
   bother.

(*) The reason for the segfault: all Exprs including these inherit from
KernelScopedObject, whose constructor adds the object to a list for destruction
at the end of the containing KernelArena's lifetime.  But if the derived-class
constructor throws, the object is deleted even though it's still in the
KernelArena's list.  So when the KernelArena is itself deleted, it double-frees
the pointer and dies.  I've also fixed And, Or, and Xor in this diff.

Differential Revision: [D25512052](https://our.internmc.facebook.com/intern/diff/D25512052/)

[ghstack-poisoned]
Two things:

1. These throw exceptions in their constructor, which causes a segfault (*), so
   move the exceptions to ::make.
2. They technically support FP types but the rules are complicated so let's not
   bother.

(*) The reason for the segfault: all Exprs including these inherit from
KernelScopedObject, whose constructor adds the object to a list for destruction
at the end of the containing KernelArena's lifetime.  But if the derived-class
constructor throws, the object is deleted even though it's still in the
KernelArena's list.  So when the KernelArena is itself deleted, it double-frees
the pointer and dies.  I've also fixed And, Or, and Xor in this diff.

Differential Revision: [D25512052](https://our.internmc.facebook.com/intern/diff/D25512052/)

[ghstack-poisoned]
bertmaher added a commit that referenced this pull request Dec 15, 2020
Pull Request resolved: #49271

Two things:

1. These throw exceptions in their constructor, which causes a segfault (*), so
   move the exceptions to ::make.
2. They technically support FP types but the rules are complicated so let's not
   bother.

(*) The reason for the segfault: all Exprs including these inherit from
KernelScopedObject, whose constructor adds the object to a list for destruction
at the end of the containing KernelArena's lifetime.  But if the derived-class
constructor throws, the object is deleted even though it's still in the
KernelArena's list.  So when the KernelArena is itself deleted, it double-frees
the pointer and dies.  I've also fixed And, Or, and Xor in this diff.
ghstack-source-id: 118594998

Differential Revision: [D25512052](https://our.internmc.facebook.com/intern/diff/D25512052/)
bertmaher added a commit to bertmaher/pytorch that referenced this pull request Dec 15, 2020
Summary:
Pull Request resolved: pytorch#49271

Two things:

1. These throw exceptions in their constructor, which causes a segfault (*), so
   move the exceptions to ::make.
2. They technically support FP types but the rules are complicated so let's not
   bother.

(*) The reason for the segfault: all Exprs including these inherit from
KernelScopedObject, whose constructor adds the object to a list for destruction
at the end of the containing KernelArena's lifetime.  But if the derived-class
constructor throws, the object is deleted even though it's still in the
KernelArena's list.  So when the KernelArena is itself deleted, it double-frees
the pointer and dies.  I've also fixed And, Or, and Xor in this diff.
ghstack-source-id: 118594998

Test Plan: `buck test //caffe2/test:jit`

Differential Revision: D25512052

fbshipit-source-id: f3ca16f208c427cd3d740e8971302d8d504240fb
@facebook-github-bot
Copy link
Contributor

This pull request has been merged in f4e15c4.

@facebook-github-bot facebook-github-bot deleted the gh/bertmaher/51/head branch December 19, 2020 15:18
hwangdeyu pushed a commit to hwangdeyu/pytorch that referenced this pull request Jan 6, 2021
Summary:
Pull Request resolved: pytorch#49396

Pull Request resolved: pytorch#49271

Two things:

1. These throw exceptions in their constructor, which causes a segfault (*), so
   move the exceptions to ::make.
2. They technically support FP types but the rules are complicated so let's not
   bother.

(*) The reason for the segfault: all Exprs including these inherit from
KernelScopedObject, whose constructor adds the object to a list for destruction
at the end of the containing KernelArena's lifetime.  But if the derived-class
constructor throws, the object is deleted even though it's still in the
KernelArena's list.  So when the KernelArena is itself deleted, it double-frees
the pointer and dies.  I've also fixed And, Or, and Xor in this diff.
ghstack-source-id: 118594998

Test Plan: `buck test //caffe2/test:jit`

Reviewed By: bwasti

Differential Revision: D25512052

fbshipit-source-id: 42670b3be0cc1600dc5cda6811f7f270a2c88bba
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla signed Merged oncall: jit Add this issue/PR to JIT oncall triage queue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants