Can't run the code on Colab #1

Closed
kennyluke1023 opened this issue Sep 1, 2023 · 4 comments

Comments

@kennyluke1023

kennyluke1023 commented Sep 1, 2023

Hi,

Thanks for such a comprehensive guide; it has really helped me understand the state of the art in the field. However, I could not run the code you shared on Colab.

These are the errors:
#1 In the "# Define the training arguments" cell: ValueError: Your setup doesn't support bf16/gpu. You need torch>=1.10, using Ampere GPU with cuda>=11.0
So I changed "fp16 = False, bf16 = True" to "fp16 = True, bf16 = False" in the Setting Global Parameters cell. Is it OK to change it?

#2 After the "# train" cell I get an OutOfMemoryError: "CUDA out of memory. Tried to allocate 2.00 GiB (GPU 0; 14.75 GiB total capacity; 10.23 GiB already allocated; 790.81 MiB free; 12.92 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF"
This time I have no idea how to fix it :(

@edumunozsala
Owner

edumunozsala commented Sep 1, 2023

Hi Kenny,

Both errors are related to the GPU you used; I guess you ran the code on a T4 GPU. Am I right?

The T4 GPU does not support the bf16 format, so the right change is the one you made: fp16 instead of bf16 (or even both set to False). bf16 offers a wider dynamic range: FP16 has 5 exponent bits, so it can only encode numbers roughly between -65K and +65K, while BF16 has 8 exponent bits like FP32, so it can encode approximately as large numbers as FP32. Both formats consume the same amount of memory.
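For reference, a minimal sketch of how those precision flags look in code, assuming the notebook builds its training arguments with transformers' TrainingArguments (the output directory below is just a placeholder):

```python
from transformers import TrainingArguments

# On a T4 (pre-Ampere GPU) bf16 is not supported, so fall back to fp16.
# On an Ampere GPU such as the A100, bf16=True / fp16=False is the better choice.
training_arguments = TrainingArguments(
    output_dir="./results",  # placeholder output directory
    fp16=True,               # half precision, supported by the T4
    bf16=False,              # bfloat16 needs an Ampere GPU and CUDA >= 11.0
)
```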

The second one is an out-of-memory error. I ran the code on an A100, which is "bigger" than a T4 but is not free. In some tests I could run the code on a T4 by changing some parameters (but it was very, very slow). I can't remember exactly right now, but probably:

Activate nested quantization for 4-bit base models (double quantization)
use_nested_quant = False

Maximum sequence length to use
max_seq_length = None (or something smaller than 2048)

I am not sure; try these changes and let me know. If it still fails, I can try to reproduce how to run it on a T4.
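A minimal sketch of what those memory-saving settings could look like, assuming the notebook loads the base model in 4-bit with bitsandbytes via transformers' BitsAndBytesConfig and that the intent is to enable double quantization (use_nested_quant = True); the model id is only a placeholder:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

use_nested_quant = True  # double (nested) quantization saves some extra memory
max_seq_length = 512     # shorter sequences than 2048 reduce activation memory;
                         # max_seq_length is typically passed to the trainer (e.g. trl's SFTTrainer)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,        # fp16 compute on a T4
    bnb_4bit_use_double_quant=use_nested_quant,  # nested quantization
)

# Placeholder model id; the guide's actual model may differ.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)
```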

@kennyluke1023
Author

Yeah, you are right, I ran it on a free T4 GPU. Thanks, I will try it later! I also found yesterday that if I set the batch size to 2 it could run, but it is slow; after one hour it had not finished yet.
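As a rough illustration of that batch-size change (the parameter names follow the standard transformers TrainingArguments API and are assumptions about the notebook; gradient accumulation is included only as a common complementary trick, it is not mentioned in the thread):

```python
from transformers import TrainingArguments

training_arguments = TrainingArguments(
    output_dir="./results",          # placeholder
    per_device_train_batch_size=2,   # smaller batch to fit in the T4's ~15 GiB
    gradient_accumulation_steps=8,   # optional: keeps the effective batch size larger
    fp16=True,
    bf16=False,
)
```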

@kennyluke1023
Author

Hi Eduardo,

It works now! Thanks for your help!

@kennyluke1023
Author

Hi Eduardo,

I'm still encountering an issue related to the T4 environment. I followed your instructions and adjusted the code, which allowed it to work: it completed the training step and saved the model. However, when running the "# Merge LoRA and base model" section, Colab crashed because it ran out of GPU RAM.

If possible, could you please create a version that's compatible with the T4? Thank you.
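For what it's worth, one way to avoid the GPU out-of-memory crash at the merge step is to do the merge on CPU. A sketch using the peft API, with the base model id and adapter path as placeholders:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model on CPU in half precision so the merge does not use GPU memory.
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # placeholder base model id
    torch_dtype=torch.float16,
    device_map={"": "cpu"},
)

# Attach the trained LoRA adapter and fold its weights into the base model.
model = PeftModel.from_pretrained(base_model, "./results/final_checkpoint")  # placeholder adapter path
merged_model = model.merge_and_unload()

merged_model.save_pretrained("./merged_model")
```

Merging on CPU is slower and needs enough system RAM to hold the model in fp16, but it avoids the GPU memory limit entirely.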
