CUDA out of memory #180
I didn't see any issue in your code after a quick scan. Some suggestions for your consideration:
Hi @JingqingZ: If we have the configuration below, basically 8 vCPUs with 12 GB each, would it work, or would we still need to implement model parallelism? Or does each one need at least 16 GB?
The K80 can struggle with this, in my experience.
How can I reduce the max input length or max target length? By max input length, do you mean the number of articles in the training dataset?
@JingqingZ: Instead of the K80, is there a hardware configuration you would recommend that works well for fine-tuning pegasus-large?
You may truncate the input text (and target text) to a shorter length, for example 256 tokens for the input text instead of 512 or 1024 tokens.
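The truncation suggested above can be sketched in plain Python, independent of any tokenizer library. The `eos_id=1` default below is an assumption (it matches the PEGASUS tokenizer's convention of `eos_token_id == 1`, but check your own tokenizer's config):

```python
def truncate_ids(token_ids, max_len, eos_id=1):
    """Truncate a token-id sequence to max_len, keeping an EOS token at the end.

    eos_id=1 is an assumption based on PEGASUS's tokenizer; verify it
    against your tokenizer before using this for real fine-tuning.
    """
    if len(token_ids) <= max_len:
        return token_ids
    # Drop everything past max_len - 1, then re-append EOS so the model
    # still sees a properly terminated sequence.
    return token_ids[: max_len - 1] + [eos_id]

# Pretend a 1024-token article (arbitrary ids starting at 2).
ids = list(range(2, 1026))
short = truncate_ids(ids, 256)
print(len(short), short[-1])  # 256 tokens, ending in EOS
```

With a Hugging Face tokenizer you would normally get the same effect by passing `max_length=256, truncation=True` when encoding, rather than truncating by hand.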
A V100 with 16 GB (or 32 GB) works fine for me. Or you may try a TPU v2 or v3.
I am sorry, this particular issue is outside the scope of my knowledge.
Thank you for your previous tip about using 16 GB of memory.
I ran the fine-tuning script in a virtual environment and it worked. Later on, I created a new virtual environment, and when I ran the model again, the following error kept popping up:
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 2.00 GiB total capacity; 1.28 GiB already allocated; 4.55 MiB free; 1.28 GiB reserved in total by PyTorch)
Note: batch size is 1
The fine-tuning script: https://gist.github.com/jiahao87/50cec29725824da7ff6dd9314b53c4b3
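The error message itself points at the root cause: the GPU has only 2 GiB total. A back-of-envelope check shows a card that small cannot hold pegasus-large even before any activations or optimizer state are allocated (the ~568M parameter count used here is an approximation, not a figure from this thread):

```python
# Rough memory estimate for pegasus-large weights alone.
# Assumption: ~568 million parameters (approximate published size).
params = 568_000_000
bytes_fp32 = params * 4           # 4 bytes per fp32 parameter
gib = bytes_fp32 / 2**30
print(f"~{gib:.2f} GiB for fp32 weights alone")

# Adam-style optimizers keep two extra fp32 buffers per parameter,
# so training needs roughly 3x this before counting activations.
training_gib = 3 * gib
print(f"~{training_gib:.2f} GiB with optimizer state")
```

Since the weights alone exceed the 2 GiB reported in the traceback, reducing batch size (already 1 here) or input length cannot fix this; a larger GPU, or offloading/model-parallel techniques, would be needed.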