
Weird DPO loss #46

Open
ChenDRAG opened this issue Nov 24, 2023 · 1 comment

Comments


ChenDRAG commented Nov 24, 2023

Hi, I would like to draw some attention to issue #38.

It seems that the DPO-LoRA training loss (red line) drops abruptly at the beginning of each epoch, which seems odd. (I used a LoRA model with a global batch size of 64, multi-GPU acceleration on 8 GPUs, and a learning rate of 1e-4; everything else followed the suggested settings.)

Meanwhile, full-parameter fine-tuning (with the official settings) has no such problem.

[Figure: training loss curves; the LoRA run (red) drops sharply at the start of each epoch]

I don't know whether this is normal; I assume it is a bug associated with the LoRA model. Is there any explanation? Has anyone encountered the same issue? If your rerun loss is normal, could you share your configs?
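For reference, a rough sketch of the LoRA DPO setup described above. The concrete settings are only the global batch size of 64 across 8 GPUs and the 1e-4 learning rate; the LoRA hyperparameters and placeholder names (`model`, `tokenizer`, `dpo_dataset`) are illustrative, not the actual recipe:

from peft import LoraConfig
from transformers import TrainingArguments
from trl import DPOTrainer

# Illustrative LoRA config; the real run follows the repo's suggested recipe.
peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

training_args = TrainingArguments(
    output_dir="dpo-lora",
    per_device_train_batch_size=8,  # 8 GPUs x 8 per device = global batch size 64
    gradient_accumulation_steps=1,
    learning_rate=1e-4,
    bf16=True,
    logging_steps=10,
)

trainer = DPOTrainer(
    model=model,                # placeholder: the base causal LM
    ref_model=None,             # reference is handled via the (disabled) adapter
    args=training_args,
    beta=0.1,
    train_dataset=dpo_dataset,  # placeholder: preference dataset
    tokenizer=tokenizer,        # placeholder
    peft_config=peft_config,
)
trainer.train()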


JhonDan1999 commented May 28, 2024

I am experiencing similar behaviour: the training loss values show considerable fluctuations, as you can see below.

[Screenshot: training loss curve showing large fluctuations]

Here is my code. Is there something wrong with the training parameters that caused this behaviour?

from transformers import TrainingArguments
from trl import DPOTrainer

training_arguments = TrainingArguments(
    output_dir="results",
    num_train_epochs=5,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
    optim="paged_adamw_32bit",
    save_steps=10000000000,
    logging_steps=10,
    learning_rate=2e-4,
    weight_decay=2e-4,
    # fp16=False,  # fp16 left disabled in favour of bf16
    bf16=True,
    max_grad_norm=0.3,
    warmup_ratio=0.03,
    lr_scheduler_type="constant",
    save_strategy="no",
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={"use_reentrant": False},
    remove_unused_columns=False,
)

import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# peft_model, tokenizer, formatted_train_data, max_seq_length and peft_config
# are defined earlier in my script.
trainer = DPOTrainer(
    model=peft_model,
    ref_model=None,  # reference model is handled internally when a PEFT model is used
    model_init_kwargs=None,
    ref_model_init_kwargs=None,
    tokenizer=tokenizer,
    args=training_arguments,
    beta=0.1,
    loss_type="sigmoid",
    train_dataset=formatted_train_data,
    eval_dataset=None,  # provide an eval dataset if available
    max_length=max_seq_length,
    peft_config=peft_config,
)

trainer.train()
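For context, with loss_type="sigmoid" and beta=0.1 the value being logged should be (roughly) the standard DPO objective. A minimal sketch of that computation, assuming you already have policy and reference log-probabilities for the chosen and rejected completions (this is just an illustrative helper, not TRL's internal function):

import torch.nn.functional as F

def dpo_sigmoid_loss(policy_chosen_logps, policy_rejected_logps,
                     ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Implicit rewards: beta-scaled log-ratios of policy vs. reference.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # "Sigmoid" DPO loss: -log sigmoid(margin between chosen and rejected rewards).
    return -F.logsigmoid(chosen_rewards - rejected_rewards)

# At the start of training the policy equals the reference, so the margin is 0
# and the loss is about -log(0.5) ≈ 0.693; it should fall as the margin grows.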

@lewtun we really need your input here, please.
