
[Gradient checkpointing] Enable for Deberta + DebertaV2 + SEW-D#14175

Merged
patrickvonplaten merged 5 commits into huggingface:master from patrickvonplaten:add_gradient_checkpointing_debert_v2
Oct 27, 2021

Conversation

Contributor

@patrickvonplaten patrickvonplaten commented Oct 27, 2021

What does this PR do?

This PR adds gradient_checkpointing for DebertaV2 and thus also enables it for SEW-D.
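The mechanism being enabled here trades compute for memory: intermediate activations are recomputed during the backward pass instead of being stored during the forward pass. A minimal, self-contained sketch of the underlying PyTorch primitive (not code from this PR; `transformers` wires this up per encoder layer):

```python
import torch
from torch.utils.checkpoint import checkpoint

torch.manual_seed(0)
# A small stand-in for one encoder layer.
layer = torch.nn.Sequential(
    torch.nn.Linear(8, 8), torch.nn.ReLU(), torch.nn.Linear(8, 1)
)
x = torch.randn(4, 8, requires_grad=True)

# Plain forward/backward: activations are kept in memory.
out_plain = layer(x).sum()
grad_plain, = torch.autograd.grad(out_plain, x)

# Checkpointed forward/backward: activations are recomputed on backward.
out_ckpt = checkpoint(layer, x, use_reentrant=False)
grad_ckpt, = torch.autograd.grad(out_ckpt.sum(), x)

# Recomputation is deterministic here, so the gradients are identical.
assert torch.allclose(grad_plain, grad_ckpt)
```

The result is the same gradient at a lower peak-memory cost, paid for with one extra forward pass per checkpointed segment.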

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@patrickvonplaten patrickvonplaten changed the title up [Gradient checkpointing] Enable for DebertaV2 Oct 27, 2021
self.parent.assertEqual(result.start_logits.shape, (self.batch_size, self.seq_length))
self.parent.assertEqual(result.end_logits.shape, (self.batch_size, self.seq_length))

def create_and_check_forward_and_backwards(
Contributor Author


add tests to verify that gradient checkpointing works as expected.

Member


This should eventually be part of the common tests with a tests_gradient_checkpointing flag set to True for models that can do it and to False for models that cannot. I had a feeling this was already the case, but it seems not!

(Can/should be done in a separate PR)
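The flag-based common test suggested above might be sketched as follows (hypothetical; the attribute name is taken from the comment, everything else is invented and is not the actual transformers test suite):

```python
import unittest


class ModelTesterMixin:
    # Hypothetical opt-in flag, per the suggestion above: model test classes
    # set this to True when the model supports gradient checkpointing.
    tests_gradient_checkpointing = False

    def test_gradient_checkpointing(self):
        if not self.tests_gradient_checkpointing:
            self.skipTest("model does not support gradient checkpointing")
        # ...run a forward/backward pass with gradient checkpointing enabled
        # and check that the loss and gradients are produced as expected...


class DebertaV2ModelTest(ModelTesterMixin, unittest.TestCase):
    tests_gradient_checkpointing = True  # supported after this PR
```

Models that cannot checkpoint simply leave the flag at `False` and the common test is skipped rather than failing.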

Contributor Author


Actually it already is! Will remove the newly added tests - test coverage is good already

@patrickvonplaten patrickvonplaten requested review from LysandreJik, anton-l and sgugger and removed request for LysandreJik October 27, 2021 12:30
Member

@LysandreJik LysandreJik left a comment


Yes, LGTM

Comment on lines -622 to +641
- return_att=False,
+ output_attentions=False,
Member


Cleaner!
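The rename aligns the layer signature with the library-wide `output_attentions` keyword in place of the Deberta-specific `return_att`. A heavily simplified, hypothetical illustration of the convention (only the kwarg name comes from the diff; all other names and values are invented):

```python
def layer_forward(hidden_states, output_attentions=False):
    # Stand-in for real attention weights and the real layer computation.
    attention_probs = [0.5, 0.5]
    if output_attentions:
        # When requested, the attention probabilities are returned
        # alongside the layer output.
        return hidden_states, attention_probs
    return hidden_states
```

Using one name across all models lets callers toggle attention outputs uniformly instead of remembering per-model spellings.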


@patrickvonplaten patrickvonplaten changed the title [Gradient checkpointing] Enable for DebertaV2 [Gradient checkpointing] Enable for Deberta + DebertaV2 + SEW-D Oct 27, 2021
Member

@anton-l anton-l left a comment


LGTM, thanks for making it work :)

Collaborator

@sgugger sgugger left a comment


LGTM, very clean!

@patrickvonplaten patrickvonplaten merged commit 6200fd7 into huggingface:master Oct 27, 2021
@patrickvonplaten patrickvonplaten deleted the add_gradient_checkpointing_debert_v2 branch October 27, 2021 13:47