Implementation of "Decoding-time Realignment of Language Models", ICML 2024.


DeRa: Decoding-time Realignment of Language Models

Open In Colab

DeRa is a simple method to explore and evaluate different regularization strengths in RLHF-aligned models without retraining.

Two main use cases of DeRa are:

  • Tailoring the alignment strength of a language model to specific user preferences or downstream applications
  • Identifying promising regularization strengths for retraining a model, without expensive hyperparameter sweeps

In the Colab notebook linked above, you'll find a reference implementation of DeRa with HuggingFace transformers 🤗. Specifically, we demonstrate DeRa on the Zephyr-7b model, showing its ability to adjust the alignment level of a language model at decoding time.
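At its core, DeRa forms each next-token distribution as a geometric mixture of the aligned model and the SFT (reference) model, which in log space is just a λ-weighted interpolation of their logits: λ = 0 recovers the SFT model, λ = 1 recovers the aligned model, and other values sweep the effective regularization strength. A minimal pure-Python sketch of this interpolation (function and variable names are ours, not taken from the repository; see the notebook for the actual HuggingFace implementation):

```python
import math

def dera_next_token_logprobs(logits_sft, logits_aligned, lam):
    """Decoding-time realignment for one decoding step.

    Interpolates the logits of the SFT (reference) model and the
    aligned model with weight `lam`, then normalizes. lam = 0 gives
    the SFT distribution, lam = 1 the aligned distribution;
    intermediate (or larger) values adjust the alignment strength.
    """
    mixed = [(1.0 - lam) * s + lam * a
             for s, a in zip(logits_sft, logits_aligned)]
    # Log-softmax with the max-subtraction trick for numerical stability.
    m = max(mixed)
    log_z = m + math.log(sum(math.exp(x - m) for x in mixed))
    return [x - log_z for x in mixed]

# Toy next-token logits over a 4-token vocabulary (made-up numbers).
z_sft = [2.0, 0.5, -1.0, 0.0]
z_aligned = [0.0, 3.0, -2.0, 1.0]

p0 = [math.exp(x) for x in dera_next_token_logprobs(z_sft, z_aligned, 0.0)]  # softmax(z_sft)
p1 = [math.exp(x) for x in dera_next_token_logprobs(z_sft, z_aligned, 1.0)]  # softmax(z_aligned)
p_half = [math.exp(x) for x in dera_next_token_logprobs(z_sft, z_aligned, 0.5)]
```

In practice this interpolation is applied to the two models' logits at every generation step (e.g. via a logits processor), so a single pair of trained checkpoints can emulate a whole family of regularization strengths.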

Please find more details in our paper, accepted for a spotlight presentation at ICML 2024:

@inproceedings{Liu2024decoding,
  title = {Decoding-time Realignment of Language Models},
  author = {Liu, Tianlin and Guo, Shangmin and Bianco, Leonardo and Calandriello, Daniele and Berthet, Quentin and Llinares, Felipe and Hoffmann, Jessica and Dixon, Lucas and Valko, Michal and Blondel, Mathieu},
  booktitle = {Proceedings of the International Conference on Machine Learning},
  year = {2024}
}
