Sequence length limited #17
Comments
Hmm, that doesn't seem right. Have you tried running the Colab? Want to share your script?
How deep is your network? Have you tried turning on reversibility?
The depths of both the encoder and decoder are 3. Let me get back to you with more details later.
How much memory are you working with? Can you show me your full settings?
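For reference, if this is the lucidrains `routing-transformer` package, reversibility is enabled through a constructor flag. This is a hedged config sketch from memory; the exact argument names may differ across versions, so check the repo README:

```python
# Sketch: enabling reversible residual layers in routing-transformer.
# Argument names are assumptions and may vary by package version.
from routing_transformer import RoutingTransformerLM

model = RoutingTransformerLM(
    num_tokens=20000,
    dim=512,
    depth=3,           # example depth
    max_seq_len=2048,
    causal=True,
    reversible=True    # trades extra compute for much lower activation memory
)
```

Reversible layers recompute activations during the backward pass instead of storing them, which is why they help with OOM at long sequence lengths.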
The settings are shown as follows. Similarly, I also had this problem when implementing the Reformer model: after about 500 batches, it went out of memory.
So first, turn on reversibility; second, decrease your batch size and use gradient accumulation instead.
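As an illustration of the second suggestion, here is a generic PyTorch sketch of gradient accumulation (a toy model standing in for the real network, not this thread's actual training code): it keeps the effective batch size while holding only one small micro-batch in memory per step.

```python
import torch

# Toy model; the shapes here are arbitrary placeholders.
model = torch.nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
accum_steps = 4  # 4 micro-batches of size 4 ~= one effective batch of 16

optimizer.zero_grad()
for step in range(8):  # stands in for iterating over a real data loader
    x = torch.randn(4, 8)  # micro-batch of 4 instead of the full 16
    y = torch.randn(4, 1)
    loss = torch.nn.functional.mse_loss(model(x), y)
    # Scale the loss so the accumulated gradients average over micro-batches.
    (loss / accum_steps).backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()       # one optimizer step per effective batch
        optimizer.zero_grad()
```

Only the micro-batch's activations live in memory at once; gradients accumulate in `.grad` buffers, which are tiny compared to activations at long sequence lengths.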
I turned on reversibility and set the batch size to 8, but training stopped at batch 172 with a RuntimeError: CUDA out of memory.
@guohanyang1994 make your batch size even smaller and increase your gradient accumulation |
Could I ask why CUDA out of memory occurs partway through training (at batch 172) rather than at the beginning?
Hard to say without seeing the code, but are your batches different sizes? It's possible it takes that long to hit the longest combination of sequence lengths. |
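To illustrate that point: with variable-length data, memory peaks only when the longest sequences happen to land in the same batch, which can take many iterations. A hypothetical helper (not code from this project) that bounds the worst case by clipping every sequence to a hard cap:

```python
# Hypothetical helper: clip every sequence to a hard cap so that no batch,
# however its samples are drawn, can exceed the worst-case memory budget.
MAX_LEN = 2048

def truncate_batch(sequences, max_len=MAX_LEN):
    """Return the batch with each sequence clipped to at most max_len tokens."""
    return [seq[:max_len] for seq in sequences]

batch = [list(range(3000)), list(range(100))]  # one long, one short sequence
clipped = truncate_batch(batch)
```

If OOM still occurs with a hard cap in place, the peak is not coming from sequence-length variation.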
The batch size is fixed at 4 and the sequence length is capped at 2048, but training still stops at around batch 1200. I am still confused :(
@guohanyang1994 are you sure you don't have a memory leak? Routing Transformer has been trained on GPT3 sized datasets successfully by others, so I doubt there's any problems with the framework |
Oh yes, it was exactly a memory leak. I have fixed it; thank you so much, and sorry to bother you, I'm an NLP beginner.
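For other readers hitting the same symptom: one classic cause of this kind of leak in PyTorch is accumulating the loss tensor itself, which keeps every iteration's autograd graph alive. A generic sketch of the pattern (an assumption about a common cause, not this user's actual bug):

```python
import torch

model = torch.nn.Linear(8, 1)
running_loss = 0.0
for _ in range(100):
    x = torch.randn(4, 8)
    loss = torch.nn.functional.mse_loss(model(x), torch.randn(4, 1))
    # Leaky version: `running_loss += loss` would keep a reference to every
    # iteration's autograd graph, so GPU memory grows until CUDA OOM.
    running_loss += loss.item()  # .item() detaches to a plain Python float
```

Accumulating `loss.item()` (or `loss.detach()`) frees each iteration's graph as soon as the backward pass is done.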
ok np :D |
I tried this model, but the sequence length the Routing Transformer can process seemed limited: I set the batch size to 16 and the sequence length to 1024, but it ran out of GPU memory.