Add CPOTrainer #1382
Conversation
@fe1ixxu how close is the trainer in terms of code to the DPOTrainer? Can one subclass from it?
@kashif Thanks for the quick response! CPO is an approximation of DPO. The key difference between CPOTrainer and DPOTrainer is that CPO drops the reference model entirely, so the trainer never needs to initialize or query one.
I'm uncertain whether subclassing CPOTrainer from DPOTrainer is a good idea, as DPOTrainer introduces numerous reference-model features that are unnecessary for CPOTrainer.
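For readers comparing the two trainers, here is a minimal scalar sketch of the CPO objective as described in the paper: a DPO-style sigmoid preference loss with the reference-model log-probabilities removed, plus an NLL (behaviour-cloning) term on the chosen response. The function name, the `nll_weight` parameter, and the single-pair scalar formulation are illustrative only, not the PR's actual implementation, which operates on batched token-level log-probs.

```python
import math

def cpo_loss(chosen_logp, rejected_logp, beta=0.1, nll_weight=1.0):
    """CPO-style loss for a single preference pair (scalar sketch).

    chosen_logp / rejected_logp are the policy's total log-probabilities
    of the preferred and dispreferred responses. Unlike DPO, no
    reference-model log-probabilities appear in the margin.
    """
    margin = beta * (chosen_logp - rejected_logp)
    # -log sigmoid(margin), written stably as log(1 + exp(-margin))
    pref_loss = math.log1p(math.exp(-margin))
    # behaviour-cloning regularizer: NLL of the chosen response
    nll_loss = -chosen_logp
    return pref_loss + nll_weight * nll_loss
```

In DPO the margin would instead be `beta * ((chosen_logp - ref_chosen_logp) - (rejected_logp - ref_rejected_logp))`, which is exactly the reference-model machinery this PR avoids.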
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Hi @kashif, the CPO docs are finished now! Thanks!
Thank you for this very nice implementation of CPO @fe1ixxu 🔥 ! I left a few small comments and a suggestion to remove a deepspeed function I don't think we need. Apart from that LGTM!
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
Because CPO does not need to initialize a reference model.
Hi! This PR adds the CPOTrainer proposed in the paper Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation.
CPO is one of the algorithms used to build the state-of-the-art LLM-based translation model ALMA.