CUDA out of memory in CLI vicuna 7B #657

@mpetruc

I am running inference with Vicuna 7B on a 3080 with 16 GB of memory. Occasionally the script crashes with an error like this:
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 16.00 GiB total capacity; 13.69 GiB already allocated; 0 bytes free; 13.78 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF.
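The message suggests max_split_size_mb only when reserved memory is much larger than allocated memory, so it may help to check that gap first. The sketch below pulls the figures out of the error text quoted above and does the subtraction. The helper name parse_gib and the variable names are made up for illustration and are not part of PyTorch.

```python
import re

# Error text copied from the report above, joined onto one line.
OOM_MESSAGE = (
    "RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB "
    "(GPU 0; 16.00 GiB total capacity; 13.69 GiB already allocated; "
    "0 bytes free; 13.78 GiB reserved in total by PyTorch)"
)


def parse_gib(message, label):
    """Return the GiB figure that directly precedes `label` (hypothetical helper)."""
    match = re.search(r"([\d.]+) GiB " + re.escape(label), message)
    if match is None:
        raise ValueError(f"no '{label}' figure found in message")
    return float(match.group(1))


total = parse_gib(OOM_MESSAGE, "total capacity")
allocated = parse_gib(OOM_MESSAGE, "already allocated")
reserved = parse_gib(OOM_MESSAGE, "reserved in total")

# Memory the caching allocator holds but has not handed out to tensors.
cached_unused = round(reserved - allocated, 2)
# Memory on the card that PyTorch does not hold at all.
outside_pytorch = round(total - reserved, 2)

print(f"reserved - allocated: {cached_unused} GiB")
print(f"total - reserved:     {outside_pytorch} GiB")
```

Here reserved minus allocated comes to about 0.09 GiB, so reserved is not much larger than allocated, and fragmentation inside PyTorch's pool looks like an unlikely main cause. Roughly 2.22 GiB is not held by PyTorch at all, yet 0 bytes are reported free. One possible reading (an assumption, not something the error confirms) is that another process or the display is using that memory.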

I modified modelling_llama.py by adding these two lines:
import os
os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'max_split_size_mb:2000'
I also tried 'max_split_size_mb:4000'.
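One thing worth ruling out is where the variable gets set. As far as I know, PyTorch reads PYTORCH_CUDA_ALLOC_CONF when the CUDA caching allocator is first initialized, so the safest place to set it is at the very top of the entry script, before torch or any model code is imported. A minimal sketch, assuming that ordering is the concern; configure_allocator is a hypothetical helper and 128 is only a placeholder value, not a recommendation from the PyTorch docs:

```python
import os
import sys


def configure_allocator(max_split_size_mb=128):
    """Set PYTORCH_CUDA_ALLOC_CONF for this process (hypothetical helper).

    Returns False if torch was already imported, in which case the
    setting may be applied too late to affect the allocator.
    """
    set_early = "torch" not in sys.modules
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = f"max_split_size_mb:{max_split_size_mb}"
    return set_early


set_early = configure_allocator(128)
if not set_early:
    print("warning: torch was imported before the allocator setting was applied")

# Import torch and the model code only after this point.
```

The same setting can also be exported in the shell before launching the CLI, which avoids editing library files.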

Any suggestions for addressing this issue? Thank you.
