Loading data set, using only the first card, memory overflow #967

WillWillWong · 2024-03-13T01:24:20Z

WillWillWong
Mar 13, 2024

When I was trying to reason the 13b model, I loaded the data set using only one card to run. My environment is 2*3090, and there are the following problems.torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 136.00 MiB. GPU 0 has a total capacity of 23.48 GiB of which 124.81 MiB is free. Including non-PyTorch memory, this process has 23.31 GiB memory in use. Of the allocated memory 23.06 GiB is allocated by PyTorch, and 2.76 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables).Loading checkpoint shards: 0%| | 0/3 [00:00<?, ?it/s]
Loading checkpoint shards: 33%|███▎ | 1/3 [00:04<00:09, 4.91s/it]
Loading checkpoint shards: 67%|██████▋ | 2/3 [00:09<00:04, 4.95s/it]
Loading checkpoint shards: 67%|██████▋ | 2/3 [00:11<00:05, 5.97s/it]
In fact, he only used the first card,

bittersweet1999 · 2024-03-13T02:52:20Z

bittersweet1999
Mar 13, 2024
Collaborator

Besides --num-gpus, you also need to write 2gpus in your model cfg, just like below:

models = [
    # LLaMA 13B
    dict(
        type=HuggingFaceCausalLM,
        abbr='llama-13b-hf',
        path="huggyllama/llama-13b",
        tokenizer_path='huggyllama/llama-13b',
        tokenizer_kwargs=dict(padding_side='left',
                              truncation_side='left',
                              use_fast=False,
                              ),
        max_out_len=100,
        max_seq_len=2048,
        batch_size=8,
        model_kwargs=dict(device_map='auto'),
        batch_padding=False, # if false, inference with for-loop without batch padding
        run_cfg=dict(num_gpus=2, num_procs=1),
    )
]

see in the https://github.com/open-compass/opencompass/blob/main/configs/models/hf_llama/hf_llama2_13b_chat.py

4 replies

WillWillWong Mar 13, 2024
Author

I'm reasoning directly from the model address, as you can see in the second picture

WillWillWong Mar 13, 2024
Author

I also wrote the model configuration file, but the effect is the same when using it. The memory is still overflowing

bittersweet1999 Mar 13, 2024
Collaborator

Please try device_map='auto'

WillWillWong Mar 13, 2024
Author

You are right, I had to use cuda when testing on glm3 and didn't think it would work on llama. What is the reason.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Loading data set, using only the first card, memory overflow #967

{{title}}

Replies: 1 comment 4 replies

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

Select a reply

Loading data set, using only the first card, memory overflow #967

WillWillWong Mar 13, 2024

Replies: 1 comment · 4 replies

bittersweet1999 Mar 13, 2024 Collaborator

WillWillWong Mar 13, 2024 Author

WillWillWong Mar 13, 2024 Author

bittersweet1999 Mar 13, 2024 Collaborator

WillWillWong Mar 13, 2024 Author

WillWillWong
Mar 13, 2024

Replies: 1 comment 4 replies

bittersweet1999
Mar 13, 2024
Collaborator

WillWillWong Mar 13, 2024
Author

WillWillWong Mar 13, 2024
Author

bittersweet1999 Mar 13, 2024
Collaborator

WillWillWong Mar 13, 2024
Author