Load Hugging Face `transformers` models across multiple GPUs with a custom `device_map`.
First, explore how `accelerate` computes `max_memory` (a mapping from each device to the maximum memory available on it), following https://github.com/huggingface/accelerate/blob/v1.0.0rc1/src/accelerate/utils/modeling.py#L842C37-L842C63
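
As a rough illustration of the idea, the sketch below builds a `max_memory`-style mapping from the free memory reported for each device. It is not `accelerate`'s implementation: the helper names `build_max_memory` and `query_free_bytes` and the `headroom` fraction are hypothetical, and the only torch call it relies on is `torch.cuda.mem_get_info`.

```python
def build_max_memory(free_bytes_per_gpu, headroom=0.1):
    """Map each GPU index to the number of bytes a loader may use on it.

    free_bytes_per_gpu: list of free bytes, one entry per GPU index.
    headroom: hypothetical fraction of free memory kept in reserve.
    """
    if not 0.0 <= headroom < 1.0:
        raise ValueError("headroom must be in [0, 1)")
    return {
        i: int(free * (1.0 - headroom))
        for i, free in enumerate(free_bytes_per_gpu)
    }


def query_free_bytes():
    """Read free bytes per visible CUDA device; returns [] when CUDA is absent."""
    try:
        import torch
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []
    # mem_get_info returns (free_bytes, total_bytes); keep only the free part.
    return [torch.cuda.mem_get_info(i)[0] for i in range(torch.cuda.device_count())]


if __name__ == "__main__":
    print(build_max_memory(query_free_bytes()))
```

A dict of this shape (integer GPU indices as keys, byte counts as values) should be usable as the `max_memory` argument of `from_pretrained`, though that has not been checked against this exact version.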

In [7]:
import torch

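def print_gpu_memory():
    """Print free and total memory for every visible CUDA device."""
    gib = 1024 ** 3  # values are in GiB (2**30 bytes), labeled "GB" in the output
    print(f"The current device is {torch.cuda.current_device()}")
    for i in range(torch.cuda.device_count()):
        # mem_get_info returns (free_bytes, total_bytes) for device i
        free, total = torch.cuda.mem_get_info(i)
        print(f"GPU {i}: {free / gib:.2f} GB free, {total / gib:.2f} GB total")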

print_gpu_memory()

The current device is 0
GPU 0: 44.09 GB free, 44.35 GB total
GPU 1: 44.09 GB free, 44.35 GB total


In [8]:
from transformers import AutoModelForCausalLM
import gc


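def test_memory_leak(model_name="gpt2"):
    """Load a model, delete it, and report GPU memory after each step."""
    print_gpu_memory()
    print("Loading model...")
    # "balanced_low_0" spreads the model over the GPUs while keeping GPU 0 as empty as possible
    model = AutoModelForCausalLM.from_pretrained(model_name, device_map="balanced_low_0", cache_dir="/workspace/hf_cache")
    print_gpu_memory()
    print("Deleting model...")
    # del only drops the reference; the memory may stay reserved by PyTorch's
    # caching allocator (or be held by reference cycles) until the calls below
    del model
    print_gpu_memory()
    print("Collecting garbage...")
    gc.collect()
    torch.cuda.empty_cache()  # return cached blocks to the driver
    print_gpu_memory()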

test_memory_leak(model_name="gpt2")

The current device is 0
GPU 0: 44.09 GB free, 44.35 GB total
GPU 1: 44.09 GB free, 44.35 GB total
Loading model...
The current device is 0
GPU 0: 44.09 GB free, 44.35 GB total
GPU 1: 43.56 GB free, 44.35 GB total
Deleting model...
The current device is 0
GPU 0: 44.09 GB free, 44.35 GB total
GPU 1: 43.56 GB free, 44.35 GB total
Collecting garbage...
The current device is 0
GPU 0: 44.09 GB free, 44.35 GB total
GPU 1: 44.09 GB free, 44.35 GB total


In [9]:
test_memory_leak(model_name="lmsys/vicuna-7b-v1.3")

The current device is 0
GPU 0: 44.09 GB free, 44.35 GB total
GPU 1: 44.09 GB free, 44.35 GB total
Loading model...


Loading checkpoint shards: 100%|██████████| 2/2 [00:10<00:00,  5.50s/it]


The current device is 0
GPU 0: 44.09 GB free, 44.35 GB total
GPU 1: 18.98 GB free, 44.35 GB total
Deleting model...
The current device is 0
GPU 0: 44.09 GB free, 44.35 GB total
GPU 1: 18.98 GB free, 44.35 GB total
Collecting garbage...
The current device is 0
GPU 0: 44.09 GB free, 44.35 GB total
GPU 1: 44.09 GB free, 44.35 GB total
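
The drop in free memory on GPU 1 can be sanity-checked with simple arithmetic. The sketch below assumes the weights are held as 32-bit floats (4 bytes per parameter, since no `torch_dtype` is passed) and uses an approximate count of 6.74 billion parameters for the 7B model. The helper name `estimate_weights_gib` and the parameter count are assumptions for illustration, not values reported by the run above.

```python
GIB = 1024 ** 3  # bytes per GiB, matching the units printed by print_gpu_memory


def estimate_weights_gib(num_params, bytes_per_param=4):
    """Estimate the memory taken by model weights alone, in GiB."""
    return num_params * bytes_per_param / GIB


# Free memory on GPU 1 before and after loading, copied from the output above.
observed_gib = 44.09 - 18.98
# Assumed parameter count for a 7B LLaMA-style model (approximate).
estimated_gib = estimate_weights_gib(6.74e9)

print(f"observed drop:          {observed_gib:.2f} GiB")
print(f"estimated fp32 weights: {estimated_gib:.2f} GiB")
```

Under these assumptions both figures land close to 25.1 GiB, which suggests the drop is almost entirely model weights.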
