
Support dataclasses in TorchScript #72901

Closed · norabelrose opened this issue Feb 16, 2022 · 25 comments
Labels: oncall: jit (add this issue/PR to the JIT oncall triage queue)

@norabelrose (Contributor)

🚀 The feature, motivation and pitch

Currently, TorchScript supports namedtuples as well as custom Python classes that meet certain requirements. Unfortunately, it's not possible to subclass a namedtuple to add extra methods (since inheritance is not supported), nor is it possible to use dataclasses (since the @dataclass decorator adds implementations of __init__ and __eq__ whose source code is not accessible with inspect.getsource). This means that anyone wanting to use custom data structures in TorchScript has to write them "the old-fashioned way," manually implementing all the magic methods they'd like to use. In this respect, TorchScript is less user-friendly than JAX + Flax, which has a flax.struct.dataclass decorator that allows you to use (immutable) dataclass-like objects in jitted JAX code.
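To make the contrast concrete, here is a plain-Python sketch (hypothetical `NormalManual`/`NormalData` classes, with `float` standing in for `Tensor` so the snippet is self-contained) of what writing a structure "the old-fashioned way" costs compared to `@dataclass`:

```python
from dataclasses import dataclass

# "Old-fashioned way": every magic method is spelled out by hand.
class NormalManual:
    def __init__(self, mu: float, sigma: float) -> None:
        self.mu = mu
        self.sigma = sigma

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, NormalManual):
            return NotImplemented
        return (self.mu, self.sigma) == (other.mu, other.sigma)

# The dataclass version: __init__ and __eq__ are generated automatically,
# but their generated source is invisible to inspect.getsource, which is
# what trips up the TorchScript frontend.
@dataclass
class NormalData:
    mu: float
    sigma: float
```

Both classes behave identically for construction and equality; the dataclass simply saves the boilerplate.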

It seems like there are two possible ways to address this problem. First, we could add a new dataclass-like decorator, perhaps torch.jit.dataclass, which would transform a class just like the usual @dataclass decorator does but in a way that is compatible with TorchScript:

@torch.jit.dataclass
class Normal:
    mu: Tensor
    sigma: Tensor

    def log_prob(self, value: Tensor) -> Tensor:
        ...

The other approach would be to add a special case into the TorchScript frontend that recognizes dataclasses, resulting in a double-decorator syntax like this:

@torch.jit.script
@dataclass
class Normal:
    mu: Tensor
    sigma: Tensor

    def log_prob(self, value: Tensor) -> Tensor:
        ...

I think I'm partial to the first approach, since it is slightly more compact and might be a bit easier to implement, and it's also more similar to the Flax implementation, but I'm open to the latter approach as well. I'm also interested in contributing a PR to implement this feature if it seems that there's a consensus that this is a good idea.

Alternatives

No response

Additional context

No response

@facebook-github-bot facebook-github-bot added the oncall: jit Add this issue/PR to JIT oncall triage queue label Feb 16, 2022
@github-actions github-actions bot added this to Need triage in JIT Triage Feb 16, 2022
@qihqi (Contributor) commented Feb 16, 2022

Hi Nora,

This is a good feature to have. Thanks for proposing it and offering to contribute a PR. I think the second approach is preferable, if feasible to implement, given that it does not add extra APIs to the torch.jit namespace and is more flexible.

I guess implementation-wise it would be: 1. detect that the class is a dataclass; 2. strip the @dataclass-generated methods that are not TorchScript-compatible; 3. add new ones that are TorchScript-compatible? Is that what you have in mind?

@vadimkantorov (Contributor)

Maybe related proposal: #61686 - remove the need for Module's obligatory super().__init__() call and make initialization of the module's attributes lazy in __getattr__

@norabelrose (Contributor, Author) commented Feb 17, 2022

@qihqi On reflection I think you're right, the second approach would probably be better. I've started reading through the relevant parts of the PyTorch source as well as the CPython implementation of dataclasses tonight to figure out the best way to implement this. It doesn't seem like it should be too hard; we can use the dataclasses.is_dataclass() function to detect whether a class is a dataclass and patch in our own implementations of the relevant magic methods, if and only if calling inspect.getsourcefile() on them fails. If inspect.getsourcefile() succeeds, it probably means the user manually implemented the method and we wouldn't want to override that.
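A minimal sketch of that detection logic (hypothetical `methods_needing_synthesis` helper; the real frontend would hook something like this into its compilation path):

```python
import dataclasses
import inspect

def methods_needing_synthesis(cls: type) -> list:
    """Return the magic methods of a dataclass whose source is unavailable.

    Methods generated by @dataclass are created with exec(), so
    inspect.getsource() raises OSError for them; a method the user wrote
    by hand has real source and is left alone.
    """
    if not dataclasses.is_dataclass(cls):
        return []
    synthesized = []
    for name in ("__init__", "__eq__", "__lt__", "__le__", "__gt__", "__ge__"):
        fn = cls.__dict__.get(name)
        if fn is None:
            continue
        try:
            inspect.getsource(fn)  # succeeds for user-written methods
        except (OSError, TypeError):
            synthesized.append(name)  # generated: needs our own version
    return synthesized
```

A non-dataclass returns an empty list, so the frontend's existing path would be untouched for ordinary classes.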

@vadimkantorov That's also a nice proposal, but I don't think it's directly related to this one. We could implement dataclasses in TorchScript without removing the need to call super().__init__() in nn.Module subclasses.

I do think it would also be nice to support subclassing/inheritance in TorchScript, and if it looks like it wouldn't be that hard to implement, I might go ahead and make a PR for that too. One thing at a time though.

@vadimkantorov (Contributor) commented Feb 17, 2022

If I understand correctly, dataclasses auto-implement __init__, so to have a module-dataclass one has to call the base Module's __init__ in the derived class's __post_init__. I'm not sure attributes can be set by the auto-generated __init__ before Module's base __init__ has been called...

Yep, my comments mainly concern regular eager PyTorch, not even TorchScript
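The ordering problem can be seen with a pure-Python stand-in for nn.Module (hypothetical `Base`/`Block` classes; the real nn.Module.__setattr__ has an analogous guard for parameters and submodules):

```python
from dataclasses import dataclass

class Base:
    """Stand-in for nn.Module: attributes may only be set after __init__."""
    def __init__(self):
        object.__setattr__(self, "_initialized", True)

    def __setattr__(self, name, value):
        if not getattr(self, "_initialized", False):
            raise RuntimeError(
                f"cannot assign {name!r} before Base.__init__() call")
        object.__setattr__(self, name, value)

@dataclass
class Block(Base):
    width: int = 8

    def __post_init__(self):
        # Too late: the generated __init__ assigns `width` (and fails)
        # before it ever reaches __post_init__.
        Base.__init__(self)
```

Constructing `Block()` raises RuntimeError: the auto-generated __init__ assigns the field before __post_init__ gets a chance to run the base initializer.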

@qihqi (Contributor) commented Feb 18, 2022

@norabelrose Feel free to start and please add me to PR review when ready.

@vadimkantorov Q: Why do you want a dataclass that is a subclass of nn.Module? I think Nora is proposing adding dataclass support without the need to subclass nn.Module.

@norabelrose (Contributor, Author) commented Feb 18, 2022

After looking at the source code a little bit, it seems like the tricky part of implementing this feature is that the Python frontend assumes and enforces the invariant that all JIT-able code has to have a definite source file, presumably for the purpose of generating error messages. If we synthesize the magic methods for dataclasses then these methods will (probably?) not have a source file, so it looks like we need to get rid of that assumption. As a nice side effect, this should allow for JIT compiling classes inside of IPython notebooks, which currently fails. I'll work on this tonight and over the weekend.

EDIT: Never mind about the IPython notebook thing, it seems that there are other reasons why that still wouldn't work with this fix.

@qihqi (Contributor) commented Feb 18, 2022

> After looking at the source code a little bit, it seems like the tricky part of implementing this feature is that the Python frontend assumes and enforces the invariant that all JIT-able code has to have a definite source file, presumably for the purpose of generating error messages. If we synthesize the magic methods for dataclasses then these methods will (probably?) not have a source file, so it looks like we need to get rid of that assumption. As a nice side effect, this should allow for JIT compiling classes inside of IPython notebooks, which currently fails. I'll work on this tonight and over the weekend.
>
> EDIT: Never mind about the IPython notebook thing, it seems that there are other reasons why that still wouldn't work with this fix.

Btw, we can also generate source from TorchScript IR once we have it. This code path is used when saving a module to .pt files.

@vadimkantorov (Contributor) commented Feb 19, 2022

> Q: Why do you want a dataclass that is a subclass of nn.Module?

This is frequent for modules that are composed of other modules, whose constructors trivially assign the passed arguments to fields. Another case where the mandatory __init__ call hurts is when we want to change an existing base class to derive from Module (e.g. Optimizer) without forcing an __init__ call to be introduced in every derived class.

Another instance is making Transforms into Modules post-factum: #31009

@norabelrose (Contributor, Author)

I do agree with @vadimkantorov that supporting nn.Module subclasses as dataclasses would be useful, mainly to cut down on repetitive boilerplate __init__ methods. I think once dataclasses are supported in TorchScript, there will be one less barrier to allowing dataclass nn.Modules.

@norabelrose (Contributor, Author) commented Feb 20, 2022

@qihqi I think my PR is ready for you to take a look at. I've implemented all the basic features and added some unit tests.

facebook-github-bot pushed a commit that referenced this issue Feb 28, 2022
Summary:
Fixes #72901.

Since we can't get access to the source code for synthesized magic methods on dataclasses, we have to synthesize our own versions. `torch/jit/_dataclass_impls.py` has the code that does this.

What's supported
- Synthesized `__init__`, `__eq__`, and the comparison magic methods when `order=True` is set on the dataclass decorator
- Default values for fields
- `__post_init__`, including using `InitVar` fields inside of `__post_init__`, on Python 3.8+
- Overriding `__eq__` or any of the comparison magic methods to provide your own implementation

What's not supported
- Default factory initializers for fields
- Frozen dataclasses
- `InitVar` on Python 3.7
- `__repr__` and `__hash__` (these are actually implemented, but the TorchScript interpreter won't call them)
- Using the `!=` operator on dataclasses inside TorchScript; this is because TorchScript requires that you implement `__ne__` to use this operator, whereas in regular Python the `!=` operator will resolve to the negation of whatever is returned by `__eq__` if there's no `__ne__`. Dataclasses don't actually synthesize an `__ne__` method for this reason. I've been toying with different ways to fix this but `!=` is not working in this PR at the moment.
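The `!=` fallback behavior described in the last bullet can be demonstrated in plain Python (illustrative class, unrelated to TorchScript itself):

```python
class OnlyEq:
    """Defines __eq__ but, like a dataclass, no explicit __ne__."""
    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        return self.x == other.x

a, b = OnlyEq(1), OnlyEq(2)
# Plain Python derives `!=` from the negation of __eq__ when __ne__ is
# absent, which is why dataclasses never synthesize __ne__; the
# TorchScript interpreter has no such fallback.
print(a != b)  # True, via negated __eq__
```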

Pull Request resolved: #73066

Reviewed By: mrshenli

Differential Revision: D34398107

Pulled By: qihqi

fbshipit-source-id: f5a792555c88f3631f97837a96687e4890660a32
JIT Triage automation moved this from Need triage to Done Feb 28, 2022
cyyever pushed two commits to cyyever/pytorch_private that referenced this issue Mar 3, 2022 (both cherry picked from commit ea7f077dc49a4ee75ca0d1409aedd85228952881, with the same summary as the commit above).
@norabelrose (Contributor, Author)

My PR fixing this issue got merged, but was then reverted a few days later with a commit stating that it "broke fluent2 tests." I don't know what fluent2 is (I'm assuming it's some internal Meta thing), but I'm honestly a bit upset that this was reverted without any explanation as to how I could fix the issue. If @qihqi or anyone else could help me get this re-landed, I'd greatly appreciate it. I don't want my work to have gone to waste.

@qihqi (Contributor) commented Mar 8, 2022

Hi @norabelrose, yes, sorry about that. I'll try to get it relanded.

> I'm honestly a bit upset that this was reverted without any explanation

Totally understand... I am also in the process of figuring out what tickled it (it was a segfault on their end, which is super weird given that this is a pure-Python change). Anyway, I'll keep you posted. Apologies again. I'll keep this issue open to track progress.

@qihqi qihqi reopened this Mar 8, 2022
JIT Triage automation moved this from Done to In progress Mar 8, 2022
@norabelrose (Contributor, Author)

@qihqi Thank you for this, I really appreciate it.

@qihqi (Contributor) commented Mar 14, 2022

Update so far:
The failing test involved a dataclass that looks like this:

class A(Enum):
    INPUT = ["input"]

    MANIFOLD = [
        "input",
        "before_fusion_projection",
        "after_fusion_projection",
        "after_classifier_projection",
    ]


@dataclass
class B:
    def __init__(self, alpha: float = 0.125, scheme: A = A.INPUT):
        self.alpha = alpha
        self.scheme = scheme

I am in the process of isolating a test case. My hunch is that either the involvement of an Enum or the fact that B has no declared fields has something to do with it.

@norabelrose (Contributor, Author)

@qihqi Oh if I had to guess the problem is that B is a dataclass but it's overriding __init__, which you're not supposed to do. That's a pretty odd use case that I'm not sure we should support; maybe the test case should be modified? What happens if you just remove the dataclass decorator from B?
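The behavior norabelrose describes can be checked in plain Python: when a class already defines __init__, @dataclass leaves it alone, and with no annotated class attributes it records no fields either (illustrative class modeled on the failing test, minus the Enum):

```python
from dataclasses import dataclass, fields

@dataclass
class B:
    # No annotated class-level fields, and a hand-written __init__:
    # @dataclass keeps the user's __init__ and generates nothing for it.
    def __init__(self, alpha: float = 0.125):
        self.alpha = alpha
```

So the decorator is effectively a no-op here for construction purposes, which makes it a pretty odd combination to script.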

@qihqi (Contributor) commented Mar 16, 2022

Hmmm, weird: removing @dataclass didn't work. Removing the try-except in this bit:

    try:
        src = inspect.getsource(fn)
    except OSError:
        return {}

made the test pass. It's like they explicitly want the OSError somehow...

@qihqi (Contributor) commented Mar 17, 2022

Hi @norabelrose, I think this works: #74353, please take a look. If you'd prefer to patch it and submit it as your own PR, that's OK too.

Long story short on the findings:

  1. We used to throw OSError when we can't find source for things. Some upstream functions depend on it to signify "nothing to see, move along".
  2. We added support for dataclasses, so now every model that happens to declare some dataclasses as input automatically has more stuff processed by jit.
  3. Among them there is an Enum that uses lists as values (which is weird). Then this line:
     return torch._C.unify_type_list(ir_types)
     which claimed to never return None (it should return AnyType in the worst case) actually returned None. This None is passed verbatim to:
     auto enum_value = toIValue(enum_name_value.attr("value"), value_type);
     with value_type == nullptr, and it segfaulted.

Just patching the bug in annotations.py is not sufficient, as the user used lists as Enum values and TorchScript only supports String / Int / Float. It will now give a RuntimeError instead, which is still much better than crashing.

The current solution works because, if the user doesn't explicitly call torch.jit.script on a dataclass (via decorator or otherwise), then we continue raising OSError while processing it, preserving the old behavior.

@norabelrose (Contributor, Author)

@qihqi Thanks for your work on this, this seems like a fine solution. I'm happy to let your version of the PR be merged.

@qihqi (Contributor) commented May 2, 2022

> @qihqi Are there any updates on this? It looks like internal Meta tests caused the PR to get reverted again.

This time it's a GitHub CI failure: https://github.com/pytorch/pytorch/runs/5763579312?check_suite_focus=true

It has to do with GPU tests; I am still in the process of figuring out what happened (I haven't managed to repro locally yet).

qihqi added a commit to qihqi/pytorch that referenced this issue May 3, 2022
…orch#74353) (pytorch#76771)

Summary:
Pull Request resolved: pytorch#76771

Pull Request resolved: pytorch#74353

Repatched `d00de0d43598522b8f6ab2de553b6aaf6768faa5` by Nora Belrose (norabelrose), with the following changes:
* Register fake source of generated methods in linecache so that inspect.getsource will succeed.
* This patching is only triggered if the given dataclass was previously passed to torch.jit.script. Effectively we make this feature opt-in.
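The linecache trick described in the first bullet can be sketched in isolation (hypothetical names; this mirrors the general approach used by tools like doctest and IPython rather than the exact PR code):

```python
import inspect
import linecache

def compile_with_fake_source(src: str, name: str, filename: str):
    """Exec `src` and register it in linecache so inspect.getsource works."""
    namespace = {}
    exec(compile(src, filename, "exec"), namespace)
    # A linecache entry is (size, mtime, lines, fullname); mtime=None marks
    # an in-memory "file" that checkcache() will never evict.
    linecache.cache[filename] = (len(src), None, src.splitlines(True), filename)
    return namespace[name]

src = "def generated_eq(self, other):\n    return self is other\n"
generated_eq = compile_with_fake_source(src, "generated_eq", "<jit-sketch>")

# Without the linecache entry this would raise OSError; with it, the
# frontend (or anyone else) can read the synthesized source back.
print(inspect.getsource(generated_eq))
```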

## Original Summary:
Fixes pytorch#72901.

Since we can't get access to the source code for synthesized magic methods on dataclasses, we have to synthesize our own versions. torch/jit/_dataclass_impls.py has the code that does this.

What's supported
- Synthesized `__init__`, `__eq__`, and the comparison magic methods when `order=True` is set on the dataclass decorator
- Default values for fields
- `__post_init__`, including using `InitVar` fields inside of `__post_init__`, on Python 3.8+
- Overriding `__eq__` or any of the comparison magic methods to provide your own implementation

What's not supported
- Default factory initializers for fields
- Frozen dataclasses
- `InitVar` on Python 3.7
- `__repr__` and `__hash__` (these are actually implemented, but the TorchScript interpreter won't call them)
- Using the `!=` operator on dataclasses inside TorchScript; this is because TorchScript requires that you implement `__ne__` to use this operator, whereas in regular Python the `!=` operator will resolve to the negation of whatever is returned by `__eq__` if there's no `__ne__`. Dataclasses don't actually synthesize an `__ne__` method for this reason. I've been toying with different ways to fix this but `!=` is not working in this PR at the moment.

Pull Request resolved: pytorch#74889

Test Plan:
unittest

Also run previously failed test:
```
buck test mode/dev-nosan //fblearner/flow/projects/fluent2/definition/transformers/contrib/faim/test:tests -- --exact 'fblearner/flow/projects/fluent2/definition/transformers/contrib/faim/test:tests - test_mixmatch_multiclass (fblearner.flow.projects.fluent2.definition.transformers.contrib.faim.test.faim_mixmatch_test.TestFaimTransformerMixMatch)'
```
passes

Reviewed By: zhxchen17

Differential Revision: D35206262

Pulled By: qihqi

fbshipit-source-id: ed96735d11b1c9a996dc68c2cfc5bdff7797822b
qihqi added two more commits to qihqi/pytorch that referenced this issue May 4, 2022, with the same summary as above.
@leg0m4n commented May 21, 2022

Hi, are there any updates on this issue, or any workaround? TorchScript isn't working on my NN, which has a dataclass.

qihqi added a commit to qihqi/pytorch that referenced this issue May 21, 2022, with the same summary as above.
@qihqi (Contributor) commented Jun 6, 2022

@leg0m4n @norabelrose @mostafarohani
Hey folks, here is the current state of things:
The PR is still open at #76771 and is currently failing this CI:
https://github.com/pytorch/pytorch/runs/6534842970?check_suite_focus=true
The error message is:

  File "test/run_test.py", line 1074, in <module>
    main()
  File "test/run_test.py", line 1052, in main
    raise RuntimeError(err_message)
RuntimeError: test_jit_fuser_te failed! Received signal: SIGSEGV

Running test_jit_fuser_te.py locally passes (with some skipped tests).
There are other CI failures on the PR too, but I think they have the same root cause.

History

  • @norabelrose landed the first version on GH, but it got reverted from Phabricator when a meta-internal test failed
  • I tried to fix the meta-internal tests, seemingly successfully, and landed that one, but then the trunk CI failure pointed to it as the culprit, so it got reverted again
  • I reapplied the changes to see if rebasing helps, but the failure seems pretty consistent

What's next?

I will make posts in internal groups / PyTorch forums to solicit help, as I feel this is falling outside my area of expertise.

I want this feature, what can I do to help?

A few things would be helpful:

  • Pull this patch and try to reproduce the CI failures locally; if you do, share the commands for reproducing it
  • Help figure out how to get better error messages out of the failure above, and share them
  • Commenting that you really want this also helps, as it makes other people more willing to help

@davidberard98 (Contributor) commented Jun 6, 2022

@qihqi For local testing, are you running with PYTORCH_TEST_WITH_SLOW=1?

I was able to repro a double free with PYTORCH_TEST_WITH_SLOW=1 python test/test_jit_fuser_te.py -k test_binary_tensor_scalar_ops. I ran with gdb and got this:

#0  0x00007ffff70aea4f in raise () from /lib64/libc.so.6
#1  0x00007ffff7081db5 in abort () from /lib64/libc.so.6
#2  0x00007ffff70f1057 in __libc_message () from /lib64/libc.so.6
#3  0x00007ffff70f81bc in malloc_printerr () from /lib64/libc.so.6
#4  0x00007ffff70f9abc in _int_free () from /lib64/libc.so.6
#5  0x00007fff613def9a in torch::Library::~Library (this=0x7fff614d4fe0 <vision::TORCH_LIBRARY_FRAGMENT_static_init_torchvision_1>,
    __in_chrg=<optimized out>) at /data/users/dberard/pytorch/torch/include/torch/library.h:532
#6  0x00007fff613f47ee in torch::detail::TorchLibraryInit::~TorchLibraryInit (
    this=0x7fff614d4fe0 <vision::TORCH_LIBRARY_FRAGMENT_static_init_torchvision_1>, __in_chrg=<optimized out>)
    at /data/users/dberard/pytorch/torch/include/torch/library.h:824
#7  0x00007ffff70b11ec in __run_exit_handlers () from /lib64/libc.so.6
#8  0x00007ffff70b1320 in exit () from /lib64/libc.so.6
#9  0x00005555557bce88 in Py_Exit (sts=0) at /tmp/build/80754af9/python-split_1629314979626/work/Python/pylifecycle.c:2437
#10 0x00005555557bcebb in handle_system_exit () at /tmp/build/80754af9/python-split_1629314979626/work/Python/pythonrun.c:722
#11 0x00005555557bcf00 in _PyErr_PrintEx (tstate=0x5555559139d0, set_sys_last_vars=1)
    at /tmp/build/80754af9/python-split_1629314979626/work/Python/pythonrun.c:732
#12 0x00005555557c11f2 in pyrun_simple_file (flags=0x7fffffffd6d8, closeit=<optimized out>, filename=0x7ffff7ede8f0, fp=<optimized out>)
    at /tmp/build/80754af9/python-split_1629314979626/work/Python/pythonrun.c:455
#13 PyRun_SimpleFileExFlags (fp=<optimized out>, filename=<optimized out>, closeit=<optimized out>, flags=0x7fffffffd6d8)
    at /tmp/build/80754af9/python-split_1629314979626/work/Python/pythonrun.c:482
#14 0x00005555557c191c in pymain_run_file (cf=0x7fffffffd6d8, config=0x555555914300)
    at /tmp/build/80754af9/python-split_1629314979626/work/Modules/main.c:379
#15 pymain_run_python (exitcode=0x7fffffffd6d0) at /tmp/build/80754af9/python-split_1629314979626/work/Modules/main.c:604
#16 Py_RunMain () at /tmp/build/80754af9/python-split_1629314979626/work/Modules/main.c:683
#17 0x00005555557c1aa9 in Py_BytesMain (argc=<optimized out>, argv=<optimized out>)
    at /tmp/build/80754af9/python-split_1629314979626/work/Modules/main.c:1129
#18 0x00007ffff709aca3 in __libc_start_main () from /lib64/libc.so.6
#19 0x0000555555743ec4 in _start (

qihqi added a commit to qihqi/pytorch that referenced this issue Jun 6, 2022
…orch#74353) (pytorch#76771)

Summary:
Pull Request resolved: pytorch#76771

Pull Request resolved: pytorch#74353

Repatched `d00de0d43598522b8f6ab2de553b6aaf6768faa5` by Nora Belrose (norabelrose). With following changes:
* Register fake source of generated methods in linecache so that inspect.get_source will succeed.
* this patching is only triggered if the given dataclass passed to torch.jit.script previously. Effectively we make this feature opt-in.

Fixes pytorch#72901.

Since we can't get access to the source code for synthesized magic methods on dataclasses, we have to synthesize our own versions. torch/jit/_dataclass_impls.py has the code that does this.

What's supported:

* Synthesized `__init__`, `__eq__`, and the comparison magic methods when `order=True` is set on the `@dataclass` decorator
* Default values for fields
* `__post_init__`, including using `InitVar` fields inside of `__post_init__`, on Python 3.8+
* Overriding `__eq__` or any of the comparison magic methods to provide your own implementation

What's not supported:

* Default factory initializers for fields
* Frozen dataclasses
* `InitVar` on Python 3.7
* `__repr__` and `__hash__` (these are actually implemented, but the TorchScript interpreter won't call them)
* Using the `!=` operator on dataclasses inside TorchScript. TorchScript requires that you implement `__ne__` to use this operator, whereas in regular Python `!=` resolves to the negation of whatever `__eq__` returns when no `__ne__` is defined. Dataclasses deliberately don't synthesize an `__ne__` method for this reason. I've been toying with different ways to fix this, but `!=` is not working in this PR at the moment.
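The `!=` fallback behavior described above can be checked in plain Python, outside TorchScript (the class here is illustrative):

```python
from dataclasses import dataclass

@dataclass
class Pair:
    a: int
    b: int

# dataclasses synthesizes __eq__ but deliberately does not add __ne__ ...
assert "__eq__" in Pair.__dict__
assert "__ne__" not in Pair.__dict__

# ... because the Python interpreter falls back to negating __eq__:
assert Pair(1, 2) != Pair(1, 3)
assert not (Pair(1, 2) != Pair(1, 2))
```

TorchScript has no such fallback, which is why `!=` on dataclasses fails to compile even though `==` works.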

Pull Request resolved: pytorch#74889

Test Plan:
unittest

Also run previously failed test:
```
buck test mode/dev-nosan //fblearner/flow/projects/fluent2/definition/transformers/contrib/faim/test:tests -- --exact 'fblearner/flow/projects/fluent2/definition/transformers/contrib/faim/test:tests - test_mixmatch_multiclass (fblearner.flow.projects.fluent2.definition.transformers.contrib.faim.test.faim_mixmatch_test.TestFaimTransformerMixMatch)'
```
passes

Reviewed By: zhxchen17

Differential Revision: D35206262

Pulled By: qihqi

fbshipit-source-id: 3b72e24907f792592a74f3ac922684d4c6db73a2
qihqi added a commit to qihqi/pytorch that referenced this issue Jun 7, 2022
qihqi added a commit to qihqi/pytorch that referenced this issue Jun 7, 2022
qihqi added a commit to qihqi/pytorch that referenced this issue Jun 7, 2022
@qihqi

qihqi commented Jun 7, 2022

Thanks @davidberard98, with this hint I managed to fix it. I'll try to get it landed soon.

JIT Triage automation moved this from In progress to Done Jun 7, 2022
facebook-github-bot pushed a commit that referenced this issue Jun 8, 2022
…#76771) (#74353)

Summary:
Pull Request resolved: #74353

Repatched `d00de0d43598522b8f6ab2de553b6aaf6768faa5` by Nora Belrose (norabelrose), with the following changes:
* Register the fake source of the generated methods in linecache so that inspect.getsource will succeed.
* This patching is only triggered if the given dataclass was previously passed to torch.jit.script, effectively making the feature opt-in.

## Original Summary:
Fixes #72901.

#74889
#76771
Approved by: https://github.com/seemethere

Test Plan:
contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/13dff3b2c2f2420f6458804dcc80ded5598636ab

Test plan from GitHub:
unittest

Also run previously failed test:
```
buck test mode/dev-nosan //fblearner/flow/projects/fluent2/definition/transformers/contrib/faim/test:tests -- --exact 'fblearner/flow/projects/fluent2/definition/transformers/contrib/faim/test:tests - test_mixmatch_multiclass (fblearner.flow.projects.fluent2.definition.transformers.contrib.faim.test.faim_mixmatch_test.TestFaimTransformerMixMatch)'
```
passes

Reviewed By: seemethere, osalpekar, zhxchen17

Differential Revision: D35206262

Pulled By: qihqi

fbshipit-source-id: 4b43477930f99ca2dd26073c0ec60f19bb4c8372
@WillJStone

+1
