Feed forward chunking others #6365

Merged

Conversation

Pradhy729
Contributor

Adding feed forward chunking to other models. Based on #6024
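
For readers new to the feature, here is a minimal sketch of the idea behind feed forward chunking (illustrative only; the helper name and default chunk dimension below are assumptions, not the library's `apply_chunking_to_forward`): the feed forward block is applied to slices of the sequence dimension one at a time, which lowers peak memory while producing the same output as a single pass.

```python
import torch

def chunked_feed_forward(forward_fn, hidden_states, chunk_size, chunk_dim=1):
    # Illustrative helper (not the transformers API): run the feed forward
    # block on chunks of the sequence dimension and concatenate the results.
    # The result equals forward_fn(hidden_states); only peak memory changes.
    if chunk_size == 0:
        return forward_fn(hidden_states)
    chunks = hidden_states.split(chunk_size, dim=chunk_dim)
    return torch.cat([forward_fn(chunk) for chunk in chunks], dim=chunk_dim)
```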

@codecov

codecov bot commented Aug 9, 2020

Codecov Report

Merging #6365 into master will increase coverage by 2.04%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master    #6365      +/-   ##
==========================================
+ Coverage   78.42%   80.47%   +2.04%     
==========================================
  Files         156      156              
  Lines       28129    28152      +23     
==========================================
+ Hits        22061    22655     +594     
+ Misses       6068     5497     -571     
Impacted Files Coverage Δ
src/transformers/configuration_reformer.py 100.00% <ø> (ø)
src/transformers/modeling_bert.py 88.26% <ø> (-0.17%) ⬇️
src/transformers/modeling_utils.py 87.35% <ø> (+0.19%) ⬆️
src/transformers/configuration_utils.py 96.62% <100.00%> (+1.38%) ⬆️
src/transformers/modeling_albert.py 83.50% <100.00%> (+0.13%) ⬆️
src/transformers/modeling_distilbert.py 97.84% <100.00%> (+1.65%) ⬆️
src/transformers/modeling_longformer.py 92.02% <100.00%> (+0.07%) ⬆️
src/transformers/modeling_reformer.py 96.09% <100.00%> (ø)
src/transformers/modeling_xlm.py 91.31% <100.00%> (+0.07%) ⬆️
src/transformers/modeling_xlnet.py 83.42% <100.00%> (+0.11%) ⬆️
... and 17 more


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update fb7330b...80c6b27.

@patrickvonplaten
Contributor

#6024 is merged :-) Great work @Pradhy729! It would be a good idea to rebase this PR to current master so that you can easily leverage the tests that were added in #6024 just by setting the flag test_chunking=True for all models you want to add here.

@Pradhy729
Contributor Author

Yes - definitely will do. Was just waiting for the merge. Thanks for adding the tests.

@Pradhy729
Contributor Author

Pradhy729 commented Aug 12, 2020

@patrickvonplaten Feed forward chunking has been added for the following:

  1. Albert
  2. DistilBERT
  3. Longformer
  4. XLNet
  5. XLM

Also changed the apply_chunking_to_forward signature so that the callable is the first positional argument.

@Pradhy729 changed the title from "[WIP] Feed forward chunking others" to "Feed forward chunking others" on Aug 12, 2020
@Pradhy729
Contributor Author

Hi @patrickvonplaten, can you review and approve if this looks good?

@@ -188,6 +188,7 @@ def __init__(self, **kwargs):
self.pad_token_id = kwargs.pop("pad_token_id", None)
self.eos_token_id = kwargs.pop("eos_token_id", None)
self.decoder_start_token_id = kwargs.pop("decoder_start_token_id", None)
self.chunk_size_feed_forward = kwargs.pop("chunk_size_feed_forward", 0)

great, thanks for adding this!
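
As a quick illustration of how this new config attribute is used from the outside (a sketch based on this PR, not an excerpt from it): any config built on PretrainedConfig picks the kwarg up, and the model's layers read it to decide the feed forward chunk size.

```python
from transformers import BertConfig, BertModel

# chunk_size_feed_forward defaults to 0 (no chunking); a positive value makes
# the feed forward layers process the sequence in chunks of that many tokens
# along the sequence dimension.
config = BertConfig(chunk_size_feed_forward=64)
model = BertModel(config)
```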


Can you move the docstring from Reformer to this file and delete the corresponding docstring / config variable from reformer?


actually it's already done - never mind

@@ -1447,7 +1447,7 @@ def prune_layer(


def apply_chunking_to_forward(
chunk_size: int, chunk_dim: int, forward_fn: Callable[..., torch.Tensor], *input_tensors
forward_fn: Callable[..., torch.Tensor], chunk_size: int, chunk_dim: int, *input_tensors

Thanks for changing that. @LysandreJik - as you said, this is the better order for the arguments and should be fine in terms of backward compatibility.
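
For reference, a small usage sketch of the new argument order (the layer below is a toy stand-in, not code from the PR): the forward callable comes first, followed by the chunk size and chunk dimension, then the input tensors.

```python
import torch
from transformers.modeling_utils import apply_chunking_to_forward

intermediate = torch.nn.Linear(768, 3072)
output = torch.nn.Linear(3072, 768)

def feed_forward_chunk(attention_output):
    # Toy feed forward block standing in for a model's intermediate/output layers.
    return output(torch.nn.functional.gelu(intermediate(attention_output)))

hidden_states = torch.rand(2, 128, 768)
# New order: callable first, then chunk_size and chunk_dim, then the inputs.
layer_output = apply_chunking_to_forward(feed_forward_chunk, 32, 1, hidden_states)
assert layer_output.shape == hidden_states.shape
```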

@@ -60,7 +60,7 @@ class ModelTesterMixin:
test_resize_embeddings = True
test_head_masking = True
test_missing_keys = True
test_chunking = False
test_chunking = True

Can you remove the test_chunking=True statement in other test files as well?
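
For context, a minimal sketch of the equivalence this shared test flag exercises (a paraphrase, not the actual test added in #6024): with identical weights, a model with chunking enabled must produce the same hidden states as one with chunking disabled.

```python
import copy
import torch
from transformers import BertConfig, BertModel

def check_feed_forward_chunking():
    # Sketch of the invariant: chunked and unchunked feed forward passes on
    # identically initialized models must produce the same hidden states.
    config = BertConfig(hidden_size=32, num_hidden_layers=2, num_attention_heads=4,
                        intermediate_size=37, vocab_size=99)
    config_chunked = copy.deepcopy(config)
    config_chunked.chunk_size_feed_forward = 2  # chunk the 8-token sequence in slices of 2

    input_ids = torch.randint(0, 99, (2, 8))

    torch.manual_seed(0)
    model = BertModel(config)
    torch.manual_seed(0)
    model_chunked = BertModel(config_chunked)

    model.eval()
    model_chunked.eval()
    with torch.no_grad():
        expected = model(input_ids)[0]
        chunked = model_chunked(input_ids)[0]

    assert torch.allclose(expected, chunked, atol=1e-5)
```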

@patrickvonplaten
Contributor

Hey @Pradhy729 - this looks great!

  1. Can you add the docstring for chunk_size_feed_forward as explained in the comment above, and delete the corresponding config param and docstring from Reformer? (You can just cut & paste the Reformer docstring here.)
  2. Can you please remove the test_chunking=True statements in the model-specific test files? I think it's only in test_modeling_bert.py actually.
  3. It would be awesome if you could try to rebase the branch onto master (git fetch upstream master, git rebase upstream/master).
    If you have too many merge conflicts, then I'll do it :-)

@Pradhy729
Contributor Author

@patrickvonplaten
Done. Please review and let me know if there's anything else.

@patrickvonplaten
Contributor

LGTM! @Pradhy729 - great work!

@sgugger
Collaborator

Great addition, thanks a lot!

@patrickvonplaten
Contributor

Merging! Good job @Pradhy729

@patrickvonplaten patrickvonplaten merged commit 2a7402c into huggingface:master Aug 19, 2020
Zigur pushed a commit to Zigur/transformers that referenced this pull request Oct 26, 2020
* Feed forward chunking for Distilbert & Albert

* Added ff chunking for many other models

* Change model signature

* Added chunking for XLM

* Cleaned up by removing some variables.

* remove test_chunking flag

Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
fabiocapsouza pushed a commit to fabiocapsouza/transformers that referenced this pull request Nov 15, 2020
* Feed forward chunking for Distilbert & Albert

* Added ff chunking for many other models

* Change model signature

* Added chunking for XLM

* Cleaned up by removing some variables.

* remove test_chunking flag

Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
fabiocapsouza added a commit to fabiocapsouza/transformers that referenced this pull request Nov 15, 2020