
Managing GPU memory for token length more than 4000 #7

Closed
sibinbh opened this issue May 21, 2022 · 1 comment

Comments


sibinbh commented May 21, 2022

Hi,

Your code helped a lot in understanding the chunking process. When I try to fine-tune with a token length of 4000+, the model breaks with an out-of-memory exception. I have tried a batch size of 2, and a larger 48 GB GPU as well. I can see that we keep pushing chunks onto the GPU, which causes memory exhaustion. Is there a way to better manage memory for samples represented by 4000+ tokens?
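For context, two standard ways to lower peak GPU memory when fine-tuning over many 512-token chunks are gradient checkpointing and mixed-precision training. The sketch below is a general illustration using the PyTorch and Hugging Face transformers APIs, not this repository's code; the model name and the training_step helper are placeholders.

```python
# General sketch (not this repo's code): reduce peak GPU memory while
# fine-tuning by recomputing activations in the backward pass and by
# running the forward pass in mixed precision.
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased").cuda()
model.gradient_checkpointing_enable()        # trade extra compute for less activation memory
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
scaler = torch.cuda.amp.GradScaler()         # mixed precision shrinks activation tensors

def training_step(input_ids, attention_mask, labels):
    """One optimization step over a batch of chunked inputs (placeholder helper)."""
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        out = model(input_ids=input_ids.cuda(),
                    attention_mask=attention_mask.cuda(),
                    labels=labels.cuda())
    scaler.scale(out.loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return out.loss.item()
```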

@MichalBrzozowski91
Collaborator

Hi, we made some major changes in this repo. One added feature is the parameter maximal_text_length, which lets you truncate the text before the chunking process. As you mentioned, the process for longer texts requires a lot of GPU memory. Setting the parameter to something like 4096 or 2048 might be a good compromise between the memory constraints and making use of the longer context.
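For illustration, a minimal sketch of truncating to maximal_text_length before splitting the text into overlapping chunks is shown below. It uses the generic Hugging Face tokenizer API rather than this repository's exact interface; the chunk_text helper and the chunk-size/stride constants are illustrative assumptions.

```python
# Illustrative sketch (not the repo's exact API): cap the tokenized text at
# MAXIMAL_TEXT_LENGTH tokens, then split it into overlapping 512-token chunks.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

MAXIMAL_TEXT_LENGTH = 4096   # hard cap applied before chunking
CHUNK_SIZE = 510             # leaves room for [CLS] and [SEP] in each 512-token chunk
STRIDE = 256                 # overlap between consecutive chunks

def chunk_text(text):
    # Tokenize without the model's 512-token limit, then truncate manually.
    ids = tokenizer(text, add_special_tokens=False, truncation=False)["input_ids"]
    ids = ids[:MAXIMAL_TEXT_LENGTH]

    chunks = []
    for start in range(0, len(ids), CHUNK_SIZE - STRIDE):
        window = ids[start : start + CHUNK_SIZE]
        chunks.append([tokenizer.cls_token_id] + window + [tokenizer.sep_token_id])
        if start + CHUNK_SIZE >= len(ids):
            break
    return chunks
```

With a cap of 4096 tokens the model sees at most a handful of 512-token chunks per sample, which bounds GPU memory while still covering far more context than plain truncation to 512.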
