Editing RoBERTa max sequence length #1011
Comments
You don't need to change the max sequence length to reduce memory usage; you can reduce the batch size instead.
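In fairseq's CLI, reducing the batch size corresponds to `--max-sentences`, and `--update-freq` compensates by accumulating gradients over several small batches so the effective batch size stays the same. Below is a minimal plain-PyTorch sketch of that accumulation idea; `model`, `loss_fn`, and the micro-batch list are hypothetical stand-ins, not fairseq internals:

```python
# Sketch of the idea behind fairseq's --update-freq: emulate a large
# batch by accumulating gradients over several small micro-batches,
# which lowers peak activation memory for the same effective batch.
def train_step(model, optimizer, micro_batches, loss_fn):
    optimizer.zero_grad()
    n = len(micro_batches)
    for inputs, targets in micro_batches:
        loss = loss_fn(model(inputs), targets)
        (loss / n).backward()  # scale so gradients match one big-batch step
    optimizer.step()
```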
Thanks for the response! I'm still trying to fine-tune on RACE; unfortunately, I can't fit even a 1-element batch into my 11 GB GPU, so it's hard to make progress from here.
Now that I think about it, maybe what I want is an option to truncate inputs to a shorter maximum sequence length.
I don't think that would help much here. Note that the code runs the network separately for each "choice", so it's implicitly storing 5x as many activations as the number of tokens in any one instance. You could also try working off of the smaller roberta.base model.
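That per-choice loop has roughly the following shape. This is a sketch built on fairseq's RoBERTa torch.hub interface; the scoring head (summing features of the first token) is a placeholder for illustration, not the actual sentence-ranking head:

```python
import torch

def score_choices(roberta, passage_and_question, choices):
    # One forward pass per answer choice. With gradients enabled, the
    # activations of every pass stay alive until backward, so peak
    # memory grows linearly with len(choices) (the 5x mentioned above).
    scores = []
    for choice in choices:
        tokens = roberta.encode(passage_and_question, choice)
        feats = roberta.extract_features(tokens)  # shape: (1, T, C)
        scores.append(feats[:, 0, :].sum(dim=-1))  # placeholder score head
    return torch.cat(scores)  # shape: (len(choices),)
```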
Makes sense, thanks @myleott!
Summary:
Pull Request resolved: fairinternal/fairseq-py#1011
Pull Request resolved: #1620

Make the Fairseq transformer scriptable. Discussion points on possible code refactoring:

(1) The original decoder output is a tuple (x, {"attn": attn, "inner_states": inner_states}). TorchScript does not support dictionaries whose values have different types (attn: Tensor, inner_states: List[Tensor]). The current workaround is to use [attn] for the attention field and access it downstream via output["attn"][0]; this pattern is already used in the fairspeq custom transformer code. Another (maybe cleaner) alternative is to use a namedtuple for the decoder output, but that involves many downstream changes too.

(2) TorchScript currently doesn't support **kwargs, and some unused arguments may get passed in due to polymorphism. The only workaround for now is to declare the possibly unused arguments explicitly (e.g. line 666 in transformer.py).

Reviewed By: myleott
Differential Revision: D19234599
fbshipit-source-id: db3dd364ecf3ae14fb7ac8c0928bd0ebe250f19d
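Both discussion points are easy to reproduce standalone. The sketch below (function names are illustrative, not fairseq's actual API) shows the list-wrapping workaround for the mixed-type dictionary and the explicit declaration of an unused argument in place of **kwargs:

```python
from typing import Dict, List, Optional
import torch

# (1) Mixed-type dict workaround: wrap the single attention tensor in a
#     one-element list so every value has the same type, List[Tensor].
@torch.jit.script
def decoder_extra(attn: torch.Tensor,
                  inner_states: List[torch.Tensor]) -> Dict[str, List[torch.Tensor]]:
    return {"attn": [attn], "inner_states": inner_states}

# (2) No **kwargs in TorchScript: arguments that some callers pass but
#     this function ignores must be declared explicitly (hypothetical).
@torch.jit.script
def forward_compat(x: torch.Tensor,
                   unused_alignment: Optional[torch.Tensor] = None) -> torch.Tensor:
    return x

extra = decoder_extra(torch.zeros(2, 3), [torch.zeros(2, 3)])
attn = extra["attn"][0]  # downstream access pattern described above
```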
Hi!
Is it possible to use the pretrained RoBERTa with a smaller max sequence length? I'm looking for the analogue of --max_seq_length in the original BERT code; the use case is that I'd like to reduce my GPU memory usage during fine-tuning (more info at https://github.com/google-research/bert#out-of-memory-issues). Sorry if this is a silly question, I just haven't been able to find the setting and am not sure whether it's supported. Thanks!
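For reference, one way to emulate --max_seq_length at the token level is to truncate the encoded tokens before running the model. Here is a sketch using fairseq's torch.hub interface; the truncation policy (keep the head of the sequence plus the final EOS token) is ad hoc, not an official option:

```python
import torch

# Load pretrained RoBERTa via torch.hub (downloads weights on first use).
roberta = torch.hub.load('pytorch/fairseq', 'roberta.base')
roberta.eval()

max_len = 128  # ad-hoc cap, well below RoBERTa's 512-position limit
tokens = roberta.encode('a very long passage ...')  # 1-D tensor with BOS/EOS
if tokens.size(0) > max_len:
    # Keep the first max_len - 1 tokens and re-append the EOS token.
    tokens = torch.cat([tokens[:max_len - 1], tokens[-1:]])

features = roberta.extract_features(tokens)  # shape: (1, T, C), T <= max_len
```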