OPT Example #1356
-
Hi all,
What single GPU was used for the fine-tuning example of OPT? Before moving to a cloud instance, I am running the bash script provided in the repository locally on an RTX 3090 with the default 6.7b configuration, but I am getting a CUDA out-of-memory error. The 2.7b OPT model throws a CUDA out-of-memory error as well; the 1.3b model seems to work with a batch size of 8. Also, is there a specific reason for using
I appreciate the help. Thank you,
Enrico
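For context on why the 2.7b and 6.7b runs OOM on a 24 GB RTX 3090 while 1.3b squeaks by, here is a minimal back-of-envelope sketch. It assumes plain full fine-tuning with Adam (fp16 params and grads plus fp32 master weights and two fp32 moments, roughly 16 bytes per parameter, before activations and CUDA overhead); the actual script's memory use depends on its parallelism and offloading settings, which are not shown here.

```python
# Rough lower bound on GPU memory for full fine-tuning with Adam.
# Assumption: ~16 bytes/param = fp16 params (2) + fp16 grads (2)
# + fp32 master weights (4) + fp32 Adam m and v (4 + 4),
# not counting activations, buffers, or allocator overhead.
BYTES_PER_PARAM = 16

def training_mem_gb(n_params: float) -> float:
    """Estimated GB just to hold model weights + optimizer states."""
    return n_params * BYTES_PER_PARAM / 1024**3

for name, n in [("OPT-1.3b", 1.3e9), ("OPT-2.7b", 2.7e9), ("OPT-6.7b", 6.7e9)]:
    print(f"{name}: ~{training_mem_gb(n):.0f} GB (RTX 3090 has 24 GB)")
```

Under these assumptions, 6.7b needs on the order of 100 GB of optimizer state alone, which is consistent with the OOM you are seeing on a single 24 GB card.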
-
Hi, Enrico. First, could you share a detailed log or more information about your CUDA out-of-memory error? Second, we are going to update the OPT example to use our new ColoTensor API. For better performance and robustness, I suggest trying the example script, which will be released soon.