
Managing GPU memory for token length more than 4000 #7

Closed
sibinbh opened this issue May 21, 2022 · 1 comment

Comments


sibinbh commented May 21, 2022

Hi,

Your code helped a lot in understanding the chunking process. When I try to fine-tune with a token length of 4000+, the model breaks with an out-of-memory exception. I have tried a batch size of 2, and a larger 48 GB GPU as well. I can see that we keep pushing chunks onto the GPU, which causes memory exhaustion. Is there a way to better manage memory for samples represented by 4000+ tokens?
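For context, two standard ways to lower peak GPU memory when fine-tuning over many 512-token chunks are gradient checkpointing and mixed-precision training. The sketch below is a general illustration using the PyTorch and Hugging Face transformers APIs, not this repository's code; the model name and the training_step helper are placeholders.

```python
# General sketch (not this repo's code): reduce peak GPU memory while
# fine-tuning by recomputing activations in the backward pass and by
# running the forward pass in mixed precision.
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased").cuda()
model.gradient_checkpointing_enable()        # trade extra compute for less activation memory
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
scaler = torch.cuda.amp.GradScaler()         # mixed precision shrinks activation tensors

def training_step(input_ids, attention_mask, labels):
    """One optimization step over a batch of chunked inputs (placeholder helper)."""
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        out = model(input_ids=input_ids.cuda(),
                    attention_mask=attention_mask.cuda(),
                    labels=labels.cuda())
    scaler.scale(out.loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return out.loss.item()
```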

@MichalBrzozowski91
Collaborator

Hi, we made some major changes in this repo. One added feature is the parameter maximal_text_length, which lets you truncate the text before the chunking process. As you mentioned, the process for longer texts requires a lot of GPU memory. Setting the parameter to something like 4096 or 2048 might be a good compromise between the memory constraints and making use of the longer context.
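For illustration, a minimal sketch of truncating to maximal_text_length before splitting the text into overlapping chunks is shown below. It uses the generic Hugging Face tokenizer API rather than this repository's exact interface; the chunk_text helper and the chunk-size/stride constants are illustrative assumptions.

```python
# Illustrative sketch (not the repo's exact API): cap the tokenized text at
# MAXIMAL_TEXT_LENGTH tokens, then split it into overlapping 512-token chunks.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

MAXIMAL_TEXT_LENGTH = 4096   # hard cap applied before chunking
CHUNK_SIZE = 510             # leaves room for [CLS] and [SEP] in each 512-token chunk
STRIDE = 256                 # overlap between consecutive chunks

def chunk_text(text):
    # Tokenize without the model's 512-token limit, then truncate manually.
    ids = tokenizer(text, add_special_tokens=False, truncation=False)["input_ids"]
    ids = ids[:MAXIMAL_TEXT_LENGTH]

    chunks = []
    for start in range(0, len(ids), CHUNK_SIZE - STRIDE):
        window = ids[start : start + CHUNK_SIZE]
        chunks.append([tokenizer.cls_token_id] + window + [tokenizer.sep_token_id])
        if start + CHUNK_SIZE >= len(ids):
            break
    return chunks
```

With a cap of 4096 tokens the model sees at most a handful of 512-token chunks per sample, which bounds GPU memory while still covering far more context than plain truncation to 512.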
