ATen operator API versioning #38973
Comments
A few thoughts:
Also cc @ailzhang for some perspective from XLA.
Thanks, @ezyang. Let me clarify my understanding. We actually have two issues:
Regarding issue 1, we understand that it is hard to provide a fully compatible solution to end users: either we would spend a lot of effort maintaining several extension versions, or PyTorch would have to provide a compatible API. In my mind, issue 1 is a long-term discussion, so we hope that issue 2 can be solved at the current stage. We think there should be one clear, high-level warning shown to end users, rather than varied, detailed warnings for different ATen ops. A detailed warning is good enough for a developer in debug mode, but it is not clear to an end user in release mode.
Yeah, I agree that it's probably too much to push for compatibility across multiple PyTorch versions for now (although it's a good long-term goal), and a compatibility check/warning might be good enough and feasible. From XLA's experience, though, C++ API-level changes mostly show up as compile errors (like function signature changes). @arthuryuan1987, can you provide a few examples of runtime errors due to API changes as well? Thanks!
The error message trick in #38739 might be relevant here.
We think the ATen operator API includes two parts: the operator signature and the operator dispatch strategy. Let me show you the error log for each separately:
Of course, if we rebase the extension to support PyTorch v1.5, we will generate the registration code automatically and get a compilation error from the mismatch between the generated code and our native implementation. That is a separate discussion (development or debugging mode). What we want to discuss here is that the error log is confusing for end users (release mode).
We think an API version warning would be clearer to end users.
@arthuryuan1987 I see,
@arthuryuan1987 I imagine there are some simple rewordings of these error messages that could make things clearer for users. Do you want to submit a PR doing this? Add me as a reviewer.
@ailzhang, for 1), to release our extension packages with corresponding @ezyang, you see, the call stack (in 1)) might be wordy for end users. In addition, end users will get different call stacks if several ATen operator APIs change. Yes, I can submit a PR.
Being able to release a single package for multiple minor versions of ATen is going to be a hard path to go down. We historically have made ZERO ABI compatibility guarantees, even across minor versions, and infrastructurally speaking we're not set up to do this in the future. If it makes you feel better, we don't release minor versions that often, so it is essentially just having to do major version releases.
I agree with what you said about compatibility among minor versions. If ATen API compatibility only breaks on major version releases, that will be good for backend extensions. We always release extension packages separately for each major version (PyTorch v1.4, v1.5, v1.6).
🚀 Feature
When implementing a new out-of-source ATen backend extension for PyTorch, we found that ATen operator APIs are incompatible from version to version (even among minor versions, for example v1.5.x).
We expect that ATen operator API versioning could be provided to improve the user experience when an out-of-source extension does not match PyTorch (ATen operator API).
Motivation
End users may get a runtime error about an ATen operator API mismatch when they try different PyTorch minor versions with a given Intel Extension for PyTorch. For example,
Extension v0.1 is based on PyTorch v1.5.0.
An end user may try extension v0.1 on PyTorch v1.5.3+ and get an ATen runtime error due to ATen operator API changes.
In addition, different workloads may hit different ATen runtime errors (because different operator APIs changed). An ATen runtime error is good enough for an extension developer, but it is not friendly enough for an end user.
So, intuitively, we want to raise a single warning up front at runtime if some ATen operator APIs have changed, which is friendlier to users and should not introduce risk.
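The coarsest form of such an up-front warning can be sketched as a plain release-number comparison. This is only an illustration, not the extension's or PyTorch's actual code: `parse_release` and `check_pytorch_version` are hypothetical helpers, and the sketch assumes that any difference in the release number is worth a warning.

```python
import re
import warnings


def parse_release(version: str) -> tuple:
    """Return (major, minor, patch) from a string such as '1.5.0' or '1.5.3+cpu'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def check_pytorch_version(built_against: str, runtime: str) -> bool:
    """Hypothetical load-time check: emit one warning if the releases differ.

    Returns True when the release numbers match and False otherwise.
    """
    built = parse_release(built_against)
    current = parse_release(runtime)
    if built == current:
        return True
    # One high-level message for the end user instead of a per-operator call stack.
    warnings.warn(
        f"This extension was built against PyTorch {built_against} but is "
        f"running on PyTorch {runtime}; ATen operator APIs may not match.",
        RuntimeWarning,
        stacklevel=2,
    )
    return False
```

An extension could call something like `check_pytorch_version("1.5.0", torch.__version__)` when it is imported, so the user sees one message before any operator fails.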
Pitch
We expect to have ATen operator API versioning for a runtime check that raises a warning at extension loading time if the PyTorch ATen operator API version is not supported by the extension.
P.S. We thought of checking the PyTorch version only, but it would take a huge effort to investigate ATen operator API changes across all PyTorch versions (including all minor versions).
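One way to picture the check is as a comparison of operator schemas, sketched below. Everything here is an assumption made for illustration: PyTorch does not expose an ATen operator API version as described in this issue, the operator names and schema strings are made up, and how the runtime schema table would be obtained is left open.

```python
import warnings

# Hypothetical: schemas recorded when the extension was built (made-up operators).
BUILT_SCHEMAS = {
    "example::op_a": "example::op_a(Tensor self) -> Tensor",
    "example::op_b": "example::op_b(Tensor self, Scalar alpha) -> Tensor",
}


def find_schema_mismatches(built: dict, runtime: dict) -> list:
    """Return the sorted names of operators that are missing or changed at runtime."""
    return sorted(name for name, schema in built.items() if runtime.get(name) != schema)


def check_operator_api(built: dict, runtime: dict) -> bool:
    """Hypothetical load-time check: one consolidated warning for all mismatches."""
    mismatched = find_schema_mismatches(built, runtime)
    if not mismatched:
        return True
    warnings.warn(
        f"This extension does not support the ATen operator API of the installed "
        f"PyTorch; {len(mismatched)} operator(s) differ: {', '.join(mismatched)}.",
        RuntimeWarning,
        stacklevel=2,
    )
    return False
```

Collecting every mismatch into a single message keeps the end-user warning the same regardless of which workload runs, while the operator list remains available to developers.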