CUDA OOM error when using Llama converted weights, but not repository weights #31648

Closed

matthewclso opened this issue Jun 26, 2024 · 0 comments
matthewclso commented Jun 26, 2024

System Info

  • transformers version: 4.41.2
  • Platform: Linux-6.5.0-1020-aws-x86_64-with-glibc2.35
  • Python version: 3.11.5
  • Huggingface_hub version: 0.23.4
  • Safetensors version: 0.4.3
  • Accelerate version: 0.31.0
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.3.1+cu121 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: 8x L4 GPUs
  • Using distributed or parallel set-up in script?: DeepSpeed (ZeRO not enabled)

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

import pandas as pd

from transformers import AutoConfig, AutoTokenizer, AutoModelForCausalLM
from transformers.integrations.deepspeed import HfDeepSpeedConfig
from peft import get_peft_model, LoraConfig, TaskType
import torch
import deepspeed
from tqdm import tqdm

num_epochs = 100

deepspeed_config = {
    "train_batch_size": 8,
    "optimizer": {
        "type": "Adam",
        "params": {
            "lr": 2e-4,
            "betas": [0.9, 0.95],
            "weight_decay": 0,
        },
    },
    "scheduler": {
        "type": "WarmupLR",
        "params": {
            "warmup_min_lr": 0,
            "warmup_max_lr": 2e-4,
            "warmup_num_steps": 1000,
        },
    },
    "bf16": {
        "enabled": True,
    },
    "data_efficency": {
        "enabled": True,
        "data_sampling": {
            "enabled": True,
            "num_epochs": num_epochs,
        },
    },
}

dschf = HfDeepSpeedConfig(deepspeed_config)

model_name = "meta-llama/Meta-Llama-3-8B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.add_special_tokens(
    {
        "pad_token": "<|pad|>",
        "unk_token": "<|unk|>",
    }
)
tokenizer.padding_side = "left"
max_len = 1024

config = AutoConfig.from_pretrained(model_name)
config.vocab_size = len(tokenizer)
config.pad_token_id = tokenizer.pad_token_id
config.padding_idx = tokenizer.pad_token_id

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",
    torch_dtype=torch.bfloat16,
    config=config,
    ignore_mismatched_sizes=True,
)

model.resize_token_embeddings(len(tokenizer))

model, optimizer, train_dataloader, lr_scheduler = deepspeed.initialize(
    model=model,
    config=deepspeed_config,
)

device = torch.device("cuda")

test_tensor = torch.zeros((1, max_len), dtype=torch.int64).to(device)
output = model(test_tensor)

The above code runs without error on my L4 GPUs. However, when I download the weights from the Llama 3 repository and convert them using

./download.sh
...
python src/transformers/models/llama/convert_llama_weights_to_hf.py --input_dir Meta-Llama-3-8B --model_size 8B --llama_version 3 --output_dir weights/Meta-Llama-3-8B-HF

and then run this code:

model = AutoModelForCausalLM.from_pretrained(
    "weights/Meta-Llama-3-8B-HF",
    torch_dtype=torch.bfloat16,
    config=config,
    ignore_mismatched_sizes=True,
)
...
test_tensor = torch.zeros((1, max_len), dtype=torch.int64).to(device)
output = model(test_tensor)

I get a CUDA OOM error. I also tried the same code on an 8x V100 32 GB machine, with the same result. Enabling any stage of ZeRO optimization does not make a difference either.
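
For reference, here is a quick way to check whether the converted checkpoint on disk stores its tensors in a wider dtype than the Hub checkpoint, since a float32 checkpoint would roughly double memory during loading. This is only a sketch: it assumes the converter wrote safetensors shards to weights/Meta-Llama-3-8B-HF (adjust the glob if the output is .bin shards instead).

import glob
import json

from safetensors import safe_open

# Assumed output directory of convert_llama_weights_to_hf.py
ckpt_dir = "weights/Meta-Llama-3-8B-HF"

# Dtype declared in the saved config (may differ from the actual tensors)
with open(f"{ckpt_dir}/config.json") as f:
    print("config torch_dtype:", json.load(f).get("torch_dtype"))

# Inspect the actual dtype of one tensor per shard
for shard in sorted(glob.glob(f"{ckpt_dir}/*.safetensors")):
    with safe_open(shard, framework="pt", device="cpu") as sf:
        name = next(iter(sf.keys()))
        print(shard, name, sf.get_tensor(name).dtype)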

Expected behavior

Using downloaded and converted weights vs. using Hugging Face repository weights should not make a difference in terms of CUDA memory usage.
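
A minimal way to quantify the difference is to compare peak allocated CUDA memory for the two checkpoints. This sketch reuses the config object from the reproduction script above and runs each model standalone (no DeepSpeed); the helper name peak_memory_gib is just for illustration.

import torch
from transformers import AutoModelForCausalLM

def peak_memory_gib(repo_or_path, config, max_len=1024):
    """Load a checkpoint, run one forward pass, and report peak CUDA memory in GiB."""
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    model = AutoModelForCausalLM.from_pretrained(
        repo_or_path,
        torch_dtype=torch.bfloat16,
        config=config,
        ignore_mismatched_sizes=True,
    ).to("cuda")
    with torch.no_grad():
        model(torch.zeros((1, max_len), dtype=torch.int64, device="cuda"))
    peak = torch.cuda.max_memory_allocated() / 1024**3
    del model
    torch.cuda.empty_cache()
    return peak

print("hub:      ", peak_memory_gib("meta-llama/Meta-Llama-3-8B", config))
print("converted:", peak_memory_gib("weights/Meta-Llama-3-8B-HF", config))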

@matthewclso matthewclso closed this as not planned on Jun 26, 2024