chinese-llama-2-13b-hf, Baichuan2-13B-Chat and other 13B models run out of memory (OOM) when fine-tuning on an A100-40G #2908

liboaccn · 2024-03-20T11:28:07Z

Reminder

I have read the README and searched the existing issues.

Reproduction

CUDA_VISIBLE_DEVICES=$GPU_NO python /home/users/xxx/code/LLaMA-Factory/src/train_bash.py \
--stage sft \
--do_train \
--model_name_or_path $MODEL_PATH \
--dataset $DATASET \
--dataset_dir $DATASET_DIR \
--template $TEMP \
--finetuning_type lora \
--lora_target $TARGET \
--output_dir $SFT_CHECKPOINT \
--overwrite_cache \
--overwrite_output_dir \
--per_device_train_batch_size 4 \
--gradient_accumulation_steps 4 \
--lr_scheduler_type cosine \
--logging_steps 10 \
--save_steps 1000 \
--learning_rate 5e-5 \
--num_train_epochs 3.0 \
--plot_loss \
--fp16

Expected behavior

I tried fine-tuning several 13B models, and every one of them fails with an OOM error:

Calls into the C++ engine to run the backward pass
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 134.00 MiB. GPU 0 has a total capacity of 39.39 GiB of which 107.94 MiB is free. Including non-PyTorch memory, this process has 39.28 GiB memory in use. Of the allocated memory 37.86 GiB is allocated by PyTorch, and 93.47 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
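
The numbers in this message can be put in context with a rough estimate. The sketch below assumes about 13e9 parameters with the frozen base weights held in fp16 (2 bytes each); it deliberately ignores LoRA weights, optimizer state, gradients and activations. Under that assumption the weights alone take roughly 24.2 GiB, so the remaining ~13.6 GiB that PyTorch had allocated presumably went to training state and, most likely, activations for a per-device batch of 4. Because only 93.47 MiB is "reserved but unallocated", fragmentation does not look like the cause, and `expandable_segments:True` is unlikely to help much on its own.

```python
# Back-of-envelope GPU memory check for the OOM above.
# Assumptions: ~13e9 parameters, base weights stored in fp16 (2 bytes/param).
# LoRA weights, optimizer state, gradients and activations are NOT modelled.

GIB = 1024 ** 3


def weight_memory_gib(num_params, bytes_per_param=2):
    """Memory needed just to hold the model weights, in GiB."""
    return num_params * bytes_per_param / GIB


# Figures copied from the error message.
total_gib = 39.39                   # "total capacity of 39.39 GiB"
allocated_gib = 37.86               # "37.86 GiB is allocated by PyTorch"
reserved_unallocated_mib = 93.47    # "93.47 MiB is reserved by PyTorch but unallocated"

weights = weight_memory_gib(13e9)
headroom = total_gib - weights          # what is left for everything except weights
beyond_weights = allocated_gib - weights  # what training state/activations actually used

print(f"fp16 weights:                    {weights:.2f} GiB")
print(f"headroom after weights:          {headroom:.2f} GiB")
print(f"allocated beyond the weights:    {beyond_weights:.2f} GiB")
print(f"reserved but unallocated:        {reserved_unallocated_mib:.2f} MiB")
```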

System Info

accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True},
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_persistent_workers=False,
dataloader_pin_memory=True,
dataloader_prefetch_factor=None,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=False,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
dispatch_batches=None,
do_eval=False,
do_predict=False,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=no,
fp16=True,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
generation_config=None,
generation_max_length=None,
generation_num_beams=None,
gradient_accumulation_steps=4,
gradient_checkpointing=False,
gradient_checkpointing_kwargs=None,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_always_push=False,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
include_num_input_tokens_seen=False,
include_tokens_per_second=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=5e-05,
length_column_name=length,
load_best_model_at_end=False,
local_rank=0,
log_level=passive,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=/home/users/xxx/code/csaft/sft/../output/Baichuan2-13B-Chat/lawyer_llama_sh/,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=10,
logging_strategy=steps,
lr_scheduler_kwargs={},
lr_scheduler_type=cosine,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=None,
mp_parameters=,
neftune_noise_alpha=None,
no_cuda=False,
num_train_epochs=3.0,
optim=adamw_torch,
optim_args=None,
output_dir=/home/users/xxx/code/csaft/sft/../output/Baichuan2-13B-Chat/lawyer_llama_sh,
overwrite_output_dir=True,
past_index=-1,
per_device_eval_batch_size=8,
per_device_train_batch_size=4,
predict_with_generate=False,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
remove_unused_columns=True,
report_to=['tensorboard'],
resume_from_checkpoint=None,
run_name=/home/users/xxx/code/csaft/sft/../output/Baichuan2-13B-Chat/lawyer_llama_sh,
save_on_each_node=False,
save_only_model=False,
save_safetensors=True,
save_steps=1000,
save_strategy=steps,
save_total_limit=None,
seed=42,
skip_memory_metrics=True,
sortish_sampler=False,
split_batches=None,
tf32=None,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_cpu=False,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.0,
warmup_steps=0,
weight_decay=0.0,
)

Others

No response

hiyouga · 2024-03-20T11:29:06Z

--per_device_train_batch_size 1
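
Activation memory grows roughly in proportion to the per-device batch size, so dropping it from 4 to 1 frees most of that share. To keep the same number of samples per optimizer step, `--gradient_accumulation_steps` can be raised to compensate. The sketch below assumes the usual Hugging Face Trainer definition (samples per step = per-device batch × accumulation steps × number of GPUs) and a single GPU; the helper names are illustrative only.

```python
# Keep the effective batch size while shrinking the per-GPU micro-batch.
# Assumption: samples per optimizer step =
#   per_device_train_batch_size * gradient_accumulation_steps * world_size


def effective_batch_size(per_device, grad_accum, world_size=1):
    """Samples consumed per optimizer step."""
    return per_device * grad_accum * world_size


def accumulation_for(target, per_device, world_size=1):
    """Gradient accumulation steps needed to reach `target` samples per step."""
    steps, remainder = divmod(target, per_device * world_size)
    if remainder:
        raise ValueError("target must be divisible by per_device * world_size")
    return steps


original = effective_batch_size(4, 4)      # flags from the issue
new_accum = accumulation_for(original, 1)  # accumulation needed with batch size 1
print(f"effective batch {original}: use --per_device_train_batch_size 1 "
      f"--gradient_accumulation_steps {new_accum}")
```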

liboaccn · 2024-03-20T11:30:13Z

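Additional info: with an earlier version of [LLaMA-Factory], fine-tuning 13B models worked fine. The error appeared after I upgraded to the latest version and updated the dependencies:
torch 2.2.0
torchvision 0.17.1
accelerate 0.28.0
peft 0.9.0
transformers 4.38.0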

hiyouga · 2024-03-20T11:32:37Z

This doesn't look like a version problem.

liboaccn · 2024-03-20T11:33:39Z

--per_device_train_batch_size 1

I tried that and it still fails with the same error. Switching models doesn't help either: Qwen, LLaMA and Baichuan 13B/14B models all fail.

hiyouga · 2024-03-20T11:42:49Z

Try the versions recommended in the README.
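
One way to follow this advice is to print the installed versions of the packages listed above and compare them with the ones the README names. The sketch below uses only the standard library (`importlib.metadata`); the version parser is a deliberate simplification (it keeps numeric release components and drops local tags such as `+cu121` and suffixes such as `.dev0`), and the minimum versions passed to `check_versions` are up to the reader to copy from the README, none are assumed here.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """'2.2.0+cu121' -> (2, 2, 0). Simplified: stops at the first non-numeric piece."""
    release = text.split("+", 1)[0]  # drop local tags like '+cu121'
    parts = []
    for piece in release.split("."):
        match = re.match(r"\d+", piece)
        if match is None:  # e.g. 'dev0'
            break
        parts.append(int(match.group()))
        if match.end() != len(piece):  # e.g. '0rc1': keep 0, then stop
            break
    return tuple(parts)


def version_at_least(installed, required):
    """True if installed >= required, padding missing components with 0."""
    a, b = parse_version(installed), parse_version(required)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_version(package):
    """Installed distribution version, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_versions(minimums):
    """Map each package to (installed, required, ok)."""
    report = {}
    for package, required in minimums.items():
        installed = installed_version(package)
        ok = installed is not None and version_at_least(installed, required)
        report[package] = (installed, required, ok)
    return report


if __name__ == "__main__":
    # Packages mentioned in this thread; compare the output against the README.
    for name in ("torch", "torchvision", "accelerate", "peft", "transformers"):
        print(f"{name:14} {installed_version(name)}")
```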

hiyouga added the solved label Mar 20, 2024

hiyouga closed this as completed Mar 20, 2024

Cucunnber mentioned this issue May 27, 2024

Abnormal GPU memory distribution when pretraining codeqwen1.5-7b; OOM after training for a while #3908 (closed)