DeBERTa/DeBERTa-v2/SEW Support for torch 1.11 #16043
Conversation
```python
logger = logging.get_logger(__name__)

convert_to_dtype = not version.parse(torch.__version__) < version.parse("1.11")
```
Maybe a better name for that value would be something more specific to its purpose, but then the name starts being long and @sgugger gets angry, wdyt?
```diff
- convert_to_dtype = not version.parse(torch.__version__) < version.parse("1.11")
+ convert_softmax_tensor_to_dtype = not version.parse(torch.__version__) < version.parse("1.11")
```
I'd maybe just add a `do_convert...` to make it a bit clearer it's a flag and not a function
don't mind the name after :-)
patrickvonplaten left a comment:
Thanks!
sgugger left a comment:
For the torch int div we wrote our own function that does the test internally. I think we should do this the same way and write our own `_softmax_backward_data` inside `pytorch_utils` which will do the test internally, then import this one here.
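The suggested pattern could look roughly like the following — a minimal runnable sketch, not the actual implementation. The flag name `is_torch_less_than_1_11` and the wrapper signature come from the diff below in this conversation; `TORCH_VERSION` is a hypothetical stand-in for `torch.__version__` so the sketch runs without torch installed:

```python
from packaging import version

# Hypothetical stand-in for torch.__version__ (assumption for this sketch).
TORCH_VERSION = "1.10.2"

# Computed once at import time; the torch version cannot change at runtime.
is_torch_less_than_1_11 = version.parse(TORCH_VERSION) < version.parse("1.11")

def softmax_backward_data(parent, grad_output, output, dim, self):
    """Call `_softmax_backward_data` with the signature the installed torch
    expects, hiding the version test from the modeling code."""
    if is_torch_less_than_1_11:
        # torch <= 1.10: the last argument is the input tensor itself.
        return parent._softmax_backward_data(grad_output, output, dim, self)
    # torch >= 1.11: the last argument is the input's dtype.
    return parent._softmax_backward_data(grad_output, output, dim, self.dtype)
```

Modeling code then imports this one wrapper and never branches on the torch version itself, mirroring the `torch_int_div` helper.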
Addressed your comment @sgugger, could you do a second review? As seen with Sylvain offline, I've moved out the
```python
is_torch_less_than_1_8 = version.parse(torch.__version__) < version.parse("1.8.0")
is_torch_less_than_1_11 = version.parse(torch.__version__) < version.parse("1.11")
```
The torch version cannot change during runtime, so this is harmless
```python
return torch.div(tensor1, tensor2, rounding_mode="floor")
```

```python
def softmax_backward_data(parent, grad_output, output, dim, self):
```
The `self` argument comes from the signature of the PyTorch function, which is identical.
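As a brief aside on the `torch_int_div` snippet above: `rounding_mode="floor"` is spelled out because floor division and truncation disagree on negative operands (the `rounding_mode` argument itself only exists from torch 1.8, hence the `is_torch_less_than_1_8` flag). A plain-Python illustration of the distinction, no torch needed:

```python
# Floor vs. truncation rounding on negative operands:
a, b = -7, 2

floor_div = a // b       # floor: rounds toward negative infinity
trunc_div = int(a / b)   # truncation: rounds toward zero

print(floor_div)  # -4
print(trunc_div)  # -3
```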
@sgugger @LysandreJik Thanks for your awesome work on building this immensely valuable ecosystem (and community!).

I don't think we have a patch planned. We will have 4.18 released probably next week instead :-)

AWESOME! looking forward to it!
Note that this evaluates to True for pre-releases such as '1.11.0a0+b6df043', so the error is still present.
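To make the pitfall concrete, here is a small sketch with `packaging`. PEP 440 orders pre-releases before the final release, so a 1.11 nightly — which already ships the new API — still satisfies the "less than 1.11" check; comparing on `base_version` is one possible workaround (the fix actually chosen in `transformers` may differ):

```python
from packaging import version

nightly = version.parse("1.11.0a0+b6df043")

# Pre-releases compare as older than the final release, so the
# "less than 1.11" check wrongly takes the pre-1.11 code path here.
print(nightly < version.parse("1.11"))  # True

# Possible workaround: strip the pre-release/local segments first.
base = version.parse(nightly.base_version)  # "1.11.0"
print(base < version.parse("1.11"))  # False
```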
The internal torch method `_softmax_backward_data` changed API between 1.10 and 1.11: its last argument went from being a tensor to being a dtype. This PR updates the concerned models so that they are correctly supported.
Torch 1.11: https://github.com/pytorch/pytorch/blame/e47a5a64bbf4d388b70397e3237f9d5710ee4c9c/tools/autograd/derivatives.yaml#L1861
Before: https://github.com/pytorch/pytorch/blame/768cfaa8f86bf7c7b0af441d1536f060274c27a0/tools/autograd/derivatives.yaml#L1704