Aligning Language Models with Human Feedback without Reinforcement Learning

This repository contains the project and presentation for the 2nd Cycle Integrated Project in Computer Science and Engineering 2023/24 master's course @ IST.

Authors and Advisors

| Author | GitHub |
| --- | --- |
| Martim Santos | https://github.com/martimfasantos |

| Advisors | Website |
| --- | --- |
| André Martins | https://andre-martins.github.io/ |
| Francisco Melo | https://gaips.inesc-id.pt/~fmelo/ |

Final Grade: 18 / 20


Details

Project & Presentation


Abstract

Large language models (LLMs) are characterized by their remarkable ability to learn extensive world knowledge and generate human-like text across diverse applications. However, their outputs often contain misleading and toxic content, emphasizing the need to align them with human values and preferences to ensure more useful and secure AI systems. A widely employed strategy in numerous prominent models, including OpenAI's GPT-3.5 and GPT-4, is Reinforcement Learning from Human Feedback (RLHF). While this method has demonstrated impressive outcomes, RLHF's complexity, instability, and sensitivity to hyperparameters limit its empirical success and usability across real-life scenarios.

In this work, we study and compare in depth the novel RL-free approaches that aim to overcome the drawbacks of RLHF while demonstrating competitive performance on multiple language tasks.
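Direct Preference Optimization (DPO) is a representative example of such RL-free approaches: it replaces the reward-model-plus-RL pipeline with a single supervised objective over preference pairs. A minimal sketch of its per-example loss, assuming sequence log-probabilities have already been computed (the function and parameter names here are illustrative, not from this repository):

```python
import math

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-example DPO loss from sequence log-probabilities.

    `beta` controls how far the policy may deviate from the
    frozen reference model.
    """
    # Implicit reward margin: how much more the policy (relative to
    # the reference) prefers the chosen response over the rejected one.
    margin = ((policy_logp_chosen - ref_logp_chosen)
              - (policy_logp_rejected - ref_logp_rejected))
    # Negative log-sigmoid of the scaled margin: minimized as the
    # policy increasingly favors the chosen response.
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

When the policy matches the reference, the margin is zero and the loss equals log 2; as the policy learns to prefer the chosen response, the loss decreases, without any sampling or reward model in the loop.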

Finally, we propose a novel method that combines the strengths of existing approaches, aiming to outperform them on the language tasks of dialogue, summarization, and machine translation.
