Implementation of "Decoding-time Realignment of Language Models", ICML 2024.


DeRa: Decoding-time Realignment of Language Models

Open In Colab

DeRa is a simple method to explore and evaluate different regularization strengths in RLHF-aligned models without retraining.

Two main use cases of DeRa are:

  • Tailoring the alignment strength of a language model to specific user preferences or downstream applications
  • Identifying promising regularization strengths for retraining a model, without expensive hyperparameter sweeps

In the Colab notebook linked above, you'll find a reference implementation of DeRa with HuggingFace transformers 🤗. Specifically, we demonstrate DeRa on the Zephyr-7b model, showing its ability to adjust the alignment level of a language model at decoding time.
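At its core, DeRa forms each next-token distribution as a geometric mixture of the aligned model and the SFT (reference) model, which in log space is just a λ-weighted interpolation of their logits: λ = 0 recovers the SFT model, λ = 1 recovers the aligned model, and other values sweep the effective regularization strength. A minimal pure-Python sketch of this interpolation (function and variable names are ours, not taken from the repository; see the notebook for the actual HuggingFace implementation):

```python
import math

def dera_next_token_logprobs(logits_sft, logits_aligned, lam):
    """Decoding-time realignment for one decoding step.

    Interpolates the logits of the SFT (reference) model and the
    aligned model with weight `lam`, then normalizes. lam = 0 gives
    the SFT distribution, lam = 1 the aligned distribution;
    intermediate (or larger) values adjust the alignment strength.
    """
    mixed = [(1.0 - lam) * s + lam * a
             for s, a in zip(logits_sft, logits_aligned)]
    # Log-softmax with the max-subtraction trick for numerical stability.
    m = max(mixed)
    log_z = m + math.log(sum(math.exp(x - m) for x in mixed))
    return [x - log_z for x in mixed]

# Toy next-token logits over a 4-token vocabulary (made-up numbers).
z_sft = [2.0, 0.5, -1.0, 0.0]
z_aligned = [0.0, 3.0, -2.0, 1.0]

p0 = [math.exp(x) for x in dera_next_token_logprobs(z_sft, z_aligned, 0.0)]  # softmax(z_sft)
p1 = [math.exp(x) for x in dera_next_token_logprobs(z_sft, z_aligned, 1.0)]  # softmax(z_aligned)
p_half = [math.exp(x) for x in dera_next_token_logprobs(z_sft, z_aligned, 0.5)]
```

In practice this interpolation is applied to the two models' logits at every generation step (e.g. via a logits processor), so a single pair of trained checkpoints can emulate a whole family of regularization strengths.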

Please find more details in our paper, accepted for a spotlight presentation at ICML 2024:

@inproceedings{Liu2024decoding,
  title = {Decoding-time Realignment of Language Models},
  author = {Liu, Tianlin and Guo, Shangmin and Bianco, Leonardo and Calandriello, Daniele and Berthet, Quentin and Llinares, Felipe and Hoffmann, Jessica and Dixon, Lucas and Valko, Michal and Blondel, Mathieu},
  booktitle = {Proceedings of the International Conference on Machine Learning},
  year = {2024}
}
