
Impossible to finetune model bigger than 124M or 125M parameters (colab) #291

Blazeolmo opened this issue Mar 10, 2022 · 1 comment


@Blazeolmo
When I try to finetune GPT-2 355M (because GPT-Neo 350M is broken), I always get a CUDA out-of-memory error no matter what I do: not with a T4 (and no, I'm not planning on getting Colab Pro just to play with a text-generation AI), not with fp16 (which doesn't work either), and not even with gradient_checkpointing=True.

What am I supposed to do, create VRAM out of thin air? Can't there be something that limits VRAM usage and empties it? I find it astonishing that there seems to be no way to empty the VRAM other than a factory reset of the VM; it's VRAM, not disk storage. As it stands, a failed training attempt leaves the VRAM full, so you either can't train at all, lose half your progress, or have to factory-reset the VM before trying again.

Any potentially useful help is appreciated. Sorry if I was a little rude; I'm just tired of stuff on GitHub being more broken than a chair that has been set on fire.
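For the "empty the VRAM without a factory reset" part: in PyTorch this is possible in principle, as long as nothing in the session still references the GPU tensors. A minimal sketch (the `model` variable is a hypothetical stand-in for whatever the failed run left behind; this won't help if the training library itself is still holding references):

```python
import gc
import torch

def release_vram():
    """Release PyTorch's cached GPU blocks back to the driver without
    restarting the VM. Only frees memory that nothing references anymore."""
    gc.collect()                      # destroy dead Python-side references first
    if torch.cuda.is_available():
        torch.cuda.empty_cache()      # return cached allocations to CUDA

# Usage sketch: `model` stands in for the failed run's network/optimizer.
model = torch.nn.Linear(1024, 1024)
if torch.cuda.is_available():
    model = model.cuda()
del model          # drop the last reference; empty_cache() alone is not enough
release_vram()
```

Note this only reclaims memory between runs; it cannot prevent an OOM that happens mid-training because the model plus activations genuinely exceed the card.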

@Mennaruuk
Copy link

Mennaruuk commented Mar 14, 2022

This issue is most likely not due to the script; it has been reproduced in three different scripts this year. I reported it to Google and hope they can take a look at it. But yes, I've faced the same issue you're having. Personally, I don't see this getting fixed soon: it looks like Google is allocating fewer high-performance GPUs to free-tier users, which accounts for more crashes when training bigger models. For the time being, consider training the 124M model on GPU, or bigger models on TPU, which is slower but at least works. I don't recommend another service such as Gradient; their free tier won't get you anywhere with even the smallest model. Please consider wording this differently next time, though: there is a human on the other side.
