
Weird DPO loss #46

Open
ChenDRAG opened this issue Nov 24, 2023 · 1 comment

Comments


ChenDRAG commented Nov 24, 2023

Hi, I would like to draw some attention to issue #38.

It seems that the DPO-LoRA training loss (red line) drops abruptly at the beginning of each epoch, which seems odd. (I used a LoRA model with a global batch size of 64, multi-GPU acceleration on 8 GPUs, and a learning rate of 1e-4; everything else followed the suggested settings.)

Meanwhile, full-parameter fine-tuning (with the official settings) has no such problem.

[Figure: training loss curves; the LoRA run (red) drops sharply at the start of each epoch]

I don't know whether this is normal; I assume it is a bug associated with the LoRA model. Is there any explanation? Has anyone encountered the same issue? If your rerun loss is normal, could you share your configs?
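For reference, a rough sketch of the LoRA DPO setup described above. The concrete settings are only the global batch size of 64 across 8 GPUs and the 1e-4 learning rate; the LoRA hyperparameters and placeholder names (`model`, `tokenizer`, `dpo_dataset`) are illustrative, not the actual recipe:

from peft import LoraConfig
from transformers import TrainingArguments
from trl import DPOTrainer

# Illustrative LoRA config; the real run follows the repo's suggested recipe.
peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

training_args = TrainingArguments(
    output_dir="dpo-lora",
    per_device_train_batch_size=8,  # 8 GPUs x 8 per device = global batch size 64
    gradient_accumulation_steps=1,
    learning_rate=1e-4,
    bf16=True,
    logging_steps=10,
)

trainer = DPOTrainer(
    model=model,                # placeholder: the base causal LM
    ref_model=None,             # reference is handled via the (disabled) adapter
    args=training_args,
    beta=0.1,
    train_dataset=dpo_dataset,  # placeholder: preference dataset
    tokenizer=tokenizer,        # placeholder
    peft_config=peft_config,
)
trainer.train()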


JhonDan1999 commented May 28, 2024

I am experiencing similar behaviour: the training loss values show considerable fluctuations, as you can see below.

[Screenshot: training loss curve showing large fluctuations]

Here is my code. Is there something wrong with the training parameters that caused this behaviour?

from transformers import TrainingArguments
from trl import DPOTrainer

training_arguments = TrainingArguments(
    output_dir="results",
    num_train_epochs=5,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
    optim="paged_adamw_32bit",
    save_steps=10000000000,
    logging_steps=10,
    learning_rate=2e-4,
    weight_decay=2e-4,
    # fp16=False,  # fp16 left disabled in favour of bf16
    bf16=True,
    max_grad_norm=0.3,
    warmup_ratio=0.03,
    lr_scheduler_type="constant",
    save_strategy="no",
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={"use_reentrant": False},
    remove_unused_columns=False,
)

import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# peft_model, tokenizer, formatted_train_data, max_seq_length and peft_config
# are defined earlier in my script.
trainer = DPOTrainer(
    model=peft_model,
    ref_model=None,  # reference model is handled internally when a PEFT model is used
    model_init_kwargs=None,
    ref_model_init_kwargs=None,
    tokenizer=tokenizer,
    args=training_arguments,
    beta=0.1,
    loss_type="sigmoid",
    train_dataset=formatted_train_data,
    eval_dataset=None,  # provide an eval dataset if available
    max_length=max_seq_length,
    peft_config=peft_config,
)

trainer.train()
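For context, with loss_type="sigmoid" and beta=0.1 the value being logged should be (roughly) the standard DPO objective. A minimal sketch of that computation, assuming you already have policy and reference log-probabilities for the chosen and rejected completions (this is just an illustrative helper, not TRL's internal function):

import torch.nn.functional as F

def dpo_sigmoid_loss(policy_chosen_logps, policy_rejected_logps,
                     ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Implicit rewards: beta-scaled log-ratios of policy vs. reference.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # "Sigmoid" DPO loss: -log sigmoid(margin between chosen and rejected rewards).
    return -F.logsigmoid(chosen_rewards - rejected_rewards)

# At the start of training the policy equals the reference, so the margin is 0
# and the loss is about -log(0.5) ≈ 0.693; it should fall as the margin grows.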

@lewtun we really need your input here, please.
