
Releases: huggingface/trl

v0.9.6 release

08 Jul 13:51
314e8eb

We are excited to introduce the new v0.9.6 release, packed with new features and algorithms. The highlights are as follows:

  • Support for SimPO by @fe1ixxu, a reference-free preference-optimization method that also regularizes output length. To use this loss, set loss_type="simpo" and cpo_alpha=0 in the CPOConfig and train with the CPOTrainer, as in the sketch below.
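A minimal sketch of enabling SimPO through the CPO trainer; the model and dataset names below are illustrative placeholders, not part of the release, and the dataset is assumed to be in the usual prompt/chosen/rejected preference format:

from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import CPOConfig, CPOTrainer

model_name = "facebook/opt-350m"  # placeholder model
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Placeholder preference dataset with "prompt", "chosen" and "rejected" columns.
train_dataset = load_dataset("trl-internal-testing/hh-rlhf-trl-style", split="train")

# SimPO = the CPO loss with loss_type="simpo" and the behaviour-cloning term disabled (cpo_alpha=0).
training_args = CPOConfig(
    output_dir="opt-simpo",
    loss_type="simpo",
    cpo_alpha=0.0,
)

trainer = CPOTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()

Per the description above, setting cpo_alpha=0 removes CPO's behaviour-cloning term, leaving the pure reference-free SimPO objective.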

We also included many important fixes and improvements, such as fixing CLI prints when running in GCP containers by @alvarobartt. Enjoy the release!

What's Changed

New Contributors

Full Changelog: v0.9.4...v0.9.6

v0.9.4

06 Jun 14:17
974b0d3

Mainly backward compatibility fixes with SFTTrainer.

What's Changed

New Contributors

Full Changelog: v0.9.3...v0.9.4

v0.9.3: RLOO / PPOv2 Trainer, RM Visualization

05 Jun 16:08
c0819ee

We are excited to introduce the new v0.9.3 release, packed with new features and algorithms. The highlights are as follows:

  1. RLOO Trainer: RLOO (REINFORCE Leave-One-Out) is a new online RL algorithm for RLHF, proposed by Ahmadian et al. from Cohere. Check out our docs here to get started (and see the sketch after this list).
  2. PPOv2 Trainer: we are introducing a new experimental PPOv2 trainer that is more closely aligned with OpenAI's PPO implementation, based on https://arxiv.org/abs/2403.17031. Check out our docs here to get started.
  3. Reward model visualization: reward model training now includes visualization on the eval dataset, as shown in the recording below.
     [Screen recording: reward model visualization on the eval dataset]
  4. New losses in the DPO Trainer: DPOTrainer now includes losses / support for Self-play Preference Optimization (SPPO), Robust DPO, TR-DPO, Iterative Reasoning Preference Optimization, and Pairwise Noise Contrastive Alignment (NCA).
  5. New losses in the KTO Trainer: KTOTrainer now includes the loss for Binary Classifier Optimization (BCO).
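To make item 1 concrete, here is a minimal, hedged sketch of an RLOO run. The model name, toy prompts, and hyperparameters are illustrative placeholders; it assumes the trainer takes a policy, a frozen reference policy, a reward model, and datasets of tokenized prompts under an input_ids column, as described in the RLOO docs:

from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
)
from trl import RLOOConfig, RLOOTrainer

base_model = "EleutherAI/pythia-160m"  # placeholder policy / reward backbone

tokenizer = AutoTokenizer.from_pretrained(base_model, padding_side="left")
tokenizer.pad_token = tokenizer.eos_token

policy = AutoModelForCausalLM.from_pretrained(base_model)
ref_policy = AutoModelForCausalLM.from_pretrained(base_model)
reward_model = AutoModelForSequenceClassification.from_pretrained(base_model, num_labels=1)
reward_model.config.pad_token_id = tokenizer.pad_token_id

# RLOO is prompt-only: completions are sampled online and scored by the reward model.
prompts = ["The movie was", "I thought the service"]  # toy prompts for illustration
dataset = Dataset.from_dict(
    {"input_ids": [tokenizer(p)["input_ids"] for p in prompts]}
)

trainer = RLOOTrainer(
    config=RLOOConfig(output_dir="pythia-rloo"),
    tokenizer=tokenizer,
    policy=policy,
    ref_policy=ref_policy,
    reward_model=reward_model,
    train_dataset=dataset,
    eval_dataset=dataset,
)
trainer.train()

The PPOv2 trainer is set up along the same lines (policy, reference policy, reward model, tokenized prompts), with the main difference that it additionally trains a value model for the PPO updates.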

What's Changed

New Contributors

Full Changelog: v0.8.6...v0.9.2

v0.8.6: Fixes for CLI

22 Apr 08:59
e90e8d9

What's Changed

Full Changelog: v0.8.5...v0.8.6

v0.8.5: Important fixes for CLIs

18 Apr 11:58
3595eb0

What's Changed

Full Changelog: v0.8.4...v0.8.5

v0.8.4: CLI / CPO / KTO important fixes

17 Apr 15:22
a5788ac

This patch release includes important fixes for the CLI and the KTO & CPO trainers.

What's Changed

New Contributors

Full Changelog: v0.8.3...v0.8.4

v0.8.3: Patch release for CLI

12 Apr 10:25
9822647

What's Changed

This is a patch release that includes an import fix for the CLIs.

Full Changelog: v0.8.2...v0.8.3

v0.8.2: ORPO & CPO Trainer / Vision LLMs support for `SFTTrainer`, KTO fixes

11 Apr 13:51
143e111

This release includes two new trainers: ORPO from KAIST and CPO. It also adds Vision LLM support (such as Llava) to SFTTrainer and ships several KTO fixes; see https://github.com/huggingface/trl/blob/main/examples/scripts/vsft_llava.py for more details.

ORPO Trainer
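A minimal, hedged sketch of the new ORPOTrainer; the model and dataset names are placeholders, the trainer expects the usual prompt/chosen/rejected preference columns, and, being reference-free, it needs no separate reference model:

from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_name = "facebook/opt-350m"  # placeholder model
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Placeholder preference dataset with "prompt", "chosen" and "rejected" columns.
train_dataset = load_dataset("trl-internal-testing/hh-rlhf-trl-style", split="train")

training_args = ORPOConfig(
    output_dir="opt-orpo",
    beta=0.1,  # weight of the odds-ratio penalty term (lambda in the ORPO paper)
)

trainer = ORPOTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()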

CPO Trainer

Vision LLMs (VLLMs) support for SFTTrainer

You can now use SFTTrainer to fine-tune Vision LLMs such as Llava!
See: https://github.com/huggingface/trl/blob/main/examples/scripts/vsft_llava.py for more details

KTO Fixes

Many fixes were introduced for the KTOTrainer:

  • Update KTO example to use better model and ChatML support by @lewtun in #1485
  • [KTO] Use batching to speed up data processing by @lewtun in #1470
  • Update KTO example with good dataset & chat format by @lewtun in #1481
  • [KTO] fix interleaving, reporting, and hanging bugs by @kawine and @claralp in #1499
  • [KTO] fix metric logging by @claralp in #1514

10x PPO!

Other fixes

New Contributors

Full Changelog: v0.8.1...v0.8.2

v0.8.1: Patch release for CLIs

20 Mar 10:39
8534f0e

This patch release includes some important fixes for the CLIs.

What's Changed

Full Changelog: v0.8.0...v0.8.1

v0.8.0: KTOTrainer, TRL CLIs, QLoRA + FSDP!

19 Mar 16:25
f2c7177

New Trainer: KTOTrainer

We are introducing the KTOTrainer to run the KTO (Kahneman-Tversky Optimization) algorithm on LLMs!
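A minimal, hedged sketch of what a KTOTrainer run looks like; the model name and the toy dataset below are placeholders. KTO works on unpaired feedback, so each row carries a prompt, a completion, and a boolean label marking the completion as desirable or not:

from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import KTOConfig, KTOTrainer

model_name = "facebook/opt-350m"  # placeholder model
model = AutoModelForCausalLM.from_pretrained(model_name)
ref_model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Toy unpaired-feedback dataset: each completion is labelled good (True) or bad (False).
train_dataset = Dataset.from_dict(
    {
        "prompt": ["What is the capital of France?", "What is the capital of France?"],
        "completion": ["Paris.", "I refuse to answer."],
        "label": [True, False],
    }
)

training_args = KTOConfig(
    output_dir="opt-kto",
    beta=0.1,                # strength of the KL penalty toward the reference model
    desirable_weight=1.0,    # relative weight on desirable examples
    undesirable_weight=1.0,  # relative weight on undesirable examples
)

trainer = KTOTrainer(
    model=model,
    ref_model=ref_model,
    args=training_args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()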

TRL Command Line Interfaces (CLIs):

Run SFT, DPO and chat with your aligned model directly from the terminal:

SFT:

trl sft --model_name_or_path facebook/opt-125m --dataset_name imdb --output_dir opt-sft-imdb

DPO:

trl dpo --model_name_or_path facebook/opt-125m --dataset_name trl-internal-testing/Anthropic-hh-rlhf-processed --output_dir opt-sft-hh-rlhf 

Chat:

trl chat --model_name_or_path Qwen/Qwen1.5-0.5B-Chat

Read more about the CLIs in the relevant documentation section, or pass --help for more details.

FSDP + QLoRA:

SFTTrainer now supports FSDP + QLoRA (a short sketch follows below).

  • Add support for FSDP+QLoRA and DeepSpeed ZeRO3+QLoRA by @pacman100 in #1416
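A hedged sketch of the QLoRA side of an FSDP + QLoRA run. The model, dataset, and hyperparameters are illustrative placeholders; the key new knob is bnb_4bit_quant_storage, which should match the dtype FSDP uses to shard the quantized weights, and the script is assumed to be launched with an accelerate FSDP config (e.g. accelerate launch --config_file fsdp_config.yaml train.py):

import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
)
from trl import SFTTrainer

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder model

# 4-bit NF4 quantization; bnb_4bit_quant_storage makes the quantized weights shardable by FSDP.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_storage=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

trainer = SFTTrainer(
    model=model,
    args=TrainingArguments(
        output_dir="llama-sft-qlora-fsdp",
        bf16=True,
        gradient_checkpointing=True,
        gradient_checkpointing_kwargs={"use_reentrant": False},
    ),
    train_dataset=load_dataset("imdb", split="train"),  # placeholder dataset
    dataset_text_field="text",
    max_seq_length=512,
    peft_config=peft_config,
    tokenizer=tokenizer,
)
trainer.train()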

Other fixes

New Contributors

Full Changelog: v0.7.11...v0.8.0