[Gradient checkpointing] Enable for Deberta + DebertaV2 + SEW-D #14175
Conversation
tests/test_modeling_deberta_v2.py (outdated diff context)

```python
self.parent.assertEqual(result.start_logits.shape, (self.batch_size, self.seq_length))
self.parent.assertEqual(result.end_logits.shape, (self.batch_size, self.seq_length))

def create_and_check_forward_and_backwards(
```
add tests to verify that gradient checkpointing works as expected.
This should eventually be part of the common tests, with a `tests_gradient_checkpointing` flag set to `True` for models that can do it and `False` for models that cannot. I had a feeling this was already the case, but it seems not!
(Can/should be done in a separate PR.)
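The per-model opt-out flag suggested above could look like the sketch below. This is illustrative only: the flag name `test_gradient_checkpointing` and the mixin are assumptions for the sake of the example, not the actual names in transformers' common test suite.

```python
import unittest

class GradientCheckpointingTestMixin:
    # Illustrative flag (assumption, not the real transformers attribute):
    # models that cannot checkpoint set this to False to skip the test.
    test_gradient_checkpointing = True

    def test_gradient_checkpointing_enabled(self):
        if not self.test_gradient_checkpointing:
            self.skipTest("gradient checkpointing not supported by this model")
        # In the real suite this would run a forward/backward pass with
        # checkpointing enabled and assert that gradients are produced.
        self.ran_checkpointing_test = True

class SupportedModelTest(GradientCheckpointingTestMixin, unittest.TestCase):
    pass  # inherits the default flag: the test runs

class UnsupportedModelTest(GradientCheckpointingTestMixin, unittest.TestCase):
    test_gradient_checkpointing = False  # opts out: the test is skipped
```

A model tester then opts in or out with a single class attribute instead of duplicating the test per model file.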
Actually it already is! Will remove the newly added tests - test coverage is good already
```diff
-        return_att=False,
+        output_attentions=False,
```
anton-l left a comment:
LGTM, thanks for making it work :)
What does this PR do?
This PR adds `gradient_checkpointing` for DebertaV2, and thus also enables it for SEW-D.

Before submitting
- Did you read the contributor guidelines, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
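For context, the `gradient_checkpointing` feature this PR enables trades compute for memory: each encoder layer's activations are discarded after the forward pass and recomputed during backward. Below is a minimal PyTorch sketch of that mechanism, not the actual DebertaV2 implementation; the class and attribute names are illustrative.

```python
import torch
from torch.utils.checkpoint import checkpoint

class TinyEncoder(torch.nn.Module):
    """Toy stack of layers illustrating the checkpointing pattern."""

    def __init__(self, hidden=8, num_layers=2):
        super().__init__()
        self.layers = torch.nn.ModuleList(
            torch.nn.Linear(hidden, hidden) for _ in range(num_layers)
        )
        self.gradient_checkpointing = False  # illustrative flag

    def forward(self, hidden_states):
        for layer in self.layers:
            if self.gradient_checkpointing and self.training:
                # Activations of `layer` are not stored; the layer's forward
                # is re-run during the backward pass to recompute them.
                hidden_states = checkpoint(layer, hidden_states, use_reentrant=False)
            else:
                hidden_states = layer(hidden_states)
        return hidden_states
```

Outputs and gradients are identical with and without checkpointing; only the memory/compute trade-off changes.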
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.