CUDA out of memory in CLI vicuna 7B

I am running inference with Vicuna 7B on a 16 GB 3080. Occasionally the script crashes with an error like:
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 16.00 GiB total capacity; 13.69 GiB already allocated; 0 bytes free; 13.78 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF.
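
To make the numbers in that message easier to compare, here is a minimal sketch that extracts the sizes and computes the gap between reserved and allocated memory. The helper name oom_summary and the regular expression are illustrative assumptions, not part of PyTorch or of the Vicuna CLI; the message string is copied from the error above.

```python
import re

# Hypothetical helper (not a PyTorch API): pull the GiB figures out of a
# CUDA out-of-memory message and compare reserved vs. allocated memory.
_GIB = re.compile(r"([\d.]+) GiB (total capacity|already allocated|reserved in total)")


def oom_summary(message: str) -> dict:
    """Return total, allocated, and reserved GiB plus the reserved-allocated gap."""
    found = {label: float(value) for value, label in _GIB.findall(message)}
    allocated = found["already allocated"]
    reserved = found["reserved in total"]
    return {
        "total_gib": found["total capacity"],
        "allocated_gib": allocated,
        "reserved_gib": reserved,
        # A large gap would hint at fragmentation; a small one would not.
        "gap_gib": round(reserved - allocated, 2),
    }


message = (
    "RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB "
    "(GPU 0; 16.00 GiB total capacity; 13.69 GiB already allocated; "
    "0 bytes free; 13.78 GiB reserved in total by PyTorch)"
)
summary = oom_summary(message)
print(summary)
```

For this message the gap is about 0.09 GiB, so reserved memory is not much larger than allocated memory. By the message's own criterion, that suggests fragmentation may not be the main cause here, and max_split_size_mb may have limited effect.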

I modified modeling_llama.py by adding
import os
os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'max_split_size_mb:2000'
I also tried 'max_split_size_mb:4000'.
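
One thing that may be worth ruling out is whether the variable is set early enough. The sketch below sets it at the very top of the launching script instead of inside the model file, on the assumption that the allocator reads PYTORCH_CUDA_ALLOC_CONF once, when it initializes. The function name configure_allocator is hypothetical, and the value 2000 is simply the one tried above.

```python
import os


def configure_allocator(max_split_size_mb: int) -> str:
    """Set PYTORCH_CUDA_ALLOC_CONF and return the value that was written.

    Hypothetical helper: call it before torch is imported, so the setting
    is already in the environment when the CUDA allocator starts up.
    """
    value = f"max_split_size_mb:{max_split_size_mb}"
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = value
    return value


# Call this first in the entry-point script.
configured = configure_allocator(2000)
print(configured)

# Only after this point would the script do `import torch` and load the model.
```

Exporting the variable in the shell before launching the script should have the same effect, without editing any Python file.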

Any suggestions for addressing this issue? Thank you.

CUDA out of memory in CLI vicuna 7B #657
