Can't run the code on Colab #1
Comments
Hi Kenny, both errors are related to the GPU used. I guess you ran the code on a T4 GPU, am I right? The T4 does not support the bf16 format, so the change you made is correct: fp16 instead of bf16 (or even both set to False). bf16 provides a wider dynamic range: FP16 has 5 bits for the exponent, meaning it can only encode numbers roughly between -65K and +65K, while BF16 has 8 exponent bits like FP32, meaning it can encode numbers approximately as large as FP32. Both consume the same memory.
The second one is an out-of-memory error. I ran the code on an A100, which is "bigger" than a T4 but not free. In some tests I could run the code on a T4 by changing some parameters (but it was very, very slow). I can't remember exactly right now, but probably: activating nested quantization for the 4-bit base model (double quantization), and reducing the maximum sequence length. I am not sure; try these changes and let me know. If it still fails, I can try to reproduce how to run it on a T4.
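The FP16-vs-BF16 range claim above can be checked with a short pure-Python sketch (format parameters are the standard IEEE 754 half-precision and bfloat16 layouts; no GPU or torch needed):

```python
# Largest finite value of a binary float format with the given number of
# exponent bits and explicit mantissa bits: (2 - 2**-m) * 2**(2**(e-1) - 1).
def max_finite(exp_bits: int, mantissa_bits: int) -> float:
    max_exp = 2 ** (exp_bits - 1) - 1  # maximum unbiased exponent
    return (2 - 2 ** -mantissa_bits) * 2.0 ** max_exp

fp16_max = max_finite(5, 10)  # FP16: 5 exponent bits, 10 mantissa bits
bf16_max = max_finite(8, 7)   # BF16: 8 exponent bits, 7 mantissa bits
fp32_max = max_finite(8, 23)  # FP32: 8 exponent bits, 23 mantissa bits

print(fp16_max)          # 65504.0 -- the "~65K" limit mentioned above
print(bf16_max > 1e38)   # True: same dynamic range ballpark as FP32
```

Note that BF16's wider range comes at the cost of fewer mantissa bits (7 vs FP16's 10), which is why it trades precision for range rather than being strictly better.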
Yeah, you are right, I ran it on a T4 GPU for free. Thanks, I will try it later! I also found that if I set the batch size to 2 it could run yesterday, but it is slow: after one hour it had not finished yet.
Hi Eduardo, it works now! Thanks for your help!
Hi Eduardo, I'm still encountering an issue related to the T4 environment. I followed your instructions and adjusted the code, which allowed it to work: it completed the training step and saved the model. However, when running the "# Merge LoRA and base model" section, Colab crashed because GPU RAM ran out of memory. If possible, could you please create a version that's compatible with the T4? Thank you.
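One common workaround for an OOM at the merge step is to reload the base model on the CPU in fp16 so that merging uses system RAM instead of the T4's ~15 GB of GPU memory. A sketch under assumptions, not the author's notebook: the model name and adapter path below are placeholders, and it assumes `transformers` and `peft` are installed.

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base_model_name = "base-model-name"       # placeholder: your base model
adapter_dir = "path/to/saved_adapter"     # placeholder: your saved LoRA adapter

# device_map={"": "cpu"} keeps every weight on the CPU, so the merge
# never allocates GPU memory at all.
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.float16,
    device_map={"": "cpu"},
)

# Attach the LoRA adapter and fold its weights into the base model.
model = PeftModel.from_pretrained(base_model, adapter_dir)
merged = model.merge_and_unload()
merged.save_pretrained("merged_model")
```

Merging on CPU is slower but only needs enough system RAM to hold the fp16 weights, which a standard Colab runtime usually has for 7B-scale models.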
Hi,
Thanks for such a comprehensive guide; it is really helpful for me to understand the state of the art in the field. However, I could not run the code you shared on Colab.
These are the errors:
#1 In the # Define the training arguments cell: ValueError: Your setup doesn't support bf16/gpu. You need torch>=1.10, using Ampere GPU with cuda>=11.0
So I changed it from "fp16 = False, bf16 = True" to "fp16 = True, bf16 = False" in the Setting Global Parameters cell. Is it OK to change it?
#2 After the # train cell, I got an OutOfMemoryError: "CUDA out of memory. Tried to allocate 2.00 GiB (GPU 0; 14.75 GiB total capacity; 10.23 GiB already allocated; 790.81 MiB free; 12.92 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF"
This time I have no idea how to fix it :(
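The error message's own hint refers to PyTorch's `PYTORCH_CUDA_ALLOC_CONF` environment variable, which caps the size of splittable blocks in the CUDA caching allocator to reduce fragmentation. A sketch of setting it in a Colab cell (128 MB is an assumed starting value, not a recommendation from this thread; tune it for your workload):

```python
import os

# Must be set before `import torch`, because the allocator reads it at import
# time; in Colab, put this in the first cell and restart the runtime if torch
# was already imported.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
```

This only mitigates fragmentation; if the model genuinely needs more memory than the T4 has, the batch-size, sequence-length, and quantization changes discussed above are still required.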