Can't run the code on Colab #1

Closed
kennyluke1023 opened this issue Sep 1, 2023 · 4 comments

Comments

@kennyluke1023

kennyluke1023 commented Sep 1, 2023

Hi,

Thanks for such a comprehensive guide; it has really helped me understand the state of the art in the field. However, I could not run the code you shared on Colab.

These are the errors:
#1 In the "# Define the training arguments" cell: ValueError: Your setup doesn't support bf16/gpu. You need torch>=1.10, using Ampere GPU with cuda>=11.0
So I changed "fp16 = False, bf16 = True" to "fp16 = True, bf16 = False" in the Setting Global Parameters cell. Is it OK to change it?

#2 After the "# train" cell I get an OutOfMemoryError: "CUDA out of memory. Tried to allocate 2.00 GiB (GPU 0; 14.75 GiB total capacity; 10.23 GiB already allocated; 790.81 MiB free; 12.92 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF"
This time I have no idea how to fix it :(

@edumunozsala
Owner

edumunozsala commented Sep 1, 2023

Hi Kenny,

Both errors are related to the GPU you used; I guess you ran the code on a T4 GPU. Am I right?

The T4 GPU does not support the bf16 format, so the right change is the one you made: fp16 instead of bf16 (or even both set to False). bf16 offers a wider dynamic range: FP16 has 5 exponent bits, so it can only encode numbers roughly between -65K and +65K, while BF16 has 8 exponent bits like FP32, so it can encode approximately as large numbers as FP32. Both formats consume the same amount of memory.
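For reference, a minimal sketch of how those precision flags look in code, assuming the notebook builds its training arguments with transformers' TrainingArguments (the output directory below is just a placeholder):

```python
from transformers import TrainingArguments

# On a T4 (pre-Ampere GPU) bf16 is not supported, so fall back to fp16.
# On an Ampere GPU such as the A100, bf16=True / fp16=False is the better choice.
training_arguments = TrainingArguments(
    output_dir="./results",  # placeholder output directory
    fp16=True,               # half precision, supported by the T4
    bf16=False,              # bfloat16 needs an Ampere GPU and CUDA >= 11.0
)
```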

The second one is an out-of-memory error. I ran the code on an A100, which is "bigger" than a T4 but is not free. In some tests I could run the code on a T4 by changing some parameters (but it was very, very slow). I can't remember exactly right now, but probably:

Activate nested quantization for 4-bit base models (double quantization)
use_nested_quant = False

Maximum sequence length to use
max_seq_length = None (or something smaller than 2048)

I am not sure; try these changes and let me know. If it still fails, I can try to reproduce how to run it on a T4.
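A minimal sketch of what those memory-saving settings could look like, assuming the notebook loads the base model in 4-bit with bitsandbytes via transformers' BitsAndBytesConfig and that the intent is to enable double quantization (use_nested_quant = True); the model id is only a placeholder:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

use_nested_quant = True  # double (nested) quantization saves some extra memory
max_seq_length = 512     # shorter sequences than 2048 reduce activation memory;
                         # max_seq_length is typically passed to the trainer (e.g. trl's SFTTrainer)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,        # fp16 compute on a T4
    bnb_4bit_use_double_quant=use_nested_quant,  # nested quantization
)

# Placeholder model id; the guide's actual model may differ.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)
```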

@kennyluke1023
Author

Yeah, you are right, I ran it on a free T4 GPU. Thanks, I will try it later! I also found yesterday that if I set the batch size to 2 it could run, but it is slow; after one hour it had not finished yet.
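As a rough illustration of that batch-size change (the parameter names follow the standard transformers TrainingArguments API and are assumptions about the notebook; gradient accumulation is included only as a common complementary trick, it is not mentioned in the thread):

```python
from transformers import TrainingArguments

training_arguments = TrainingArguments(
    output_dir="./results",          # placeholder
    per_device_train_batch_size=2,   # smaller batch to fit in the T4's ~15 GiB
    gradient_accumulation_steps=8,   # optional: keeps the effective batch size larger
    fp16=True,
    bf16=False,
)
```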

@kennyluke1023
Author

Hi Eduardo,

It works now! Thanks for your help!

@kennyluke1023
Author

Hi Eduardo,

I'm still encountering an issue related to the T4 environment. I followed your instructions and adjusted the code, which allowed it to work: it completed the training step and saved the model. However, when running the "# Merge LoRA and base model" section, Colab crashed because it ran out of GPU RAM.

If possible, could you please create a version that's compatible with the T4? Thank you.
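For what it's worth, one way to avoid the GPU out-of-memory crash at the merge step is to do the merge on CPU. A sketch using the peft API, with the base model id and adapter path as placeholders:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model on CPU in half precision so the merge does not use GPU memory.
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # placeholder base model id
    torch_dtype=torch.float16,
    device_map={"": "cpu"},
)

# Attach the trained LoRA adapter and fold its weights into the base model.
model = PeftModel.from_pretrained(base_model, "./results/final_checkpoint")  # placeholder adapter path
merged_model = model.merge_and_unload()

merged_model.save_pretrained("./merged_model")
```

Merging on CPU is slower and needs enough system RAM to hold the model in fp16, but it avoids the GPU memory limit entirely.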
