Aligning Language Models with Human Feedback without Reinforcement Learning

This repository contains the project and presentation for the 2nd Cycle Integrated Project in Computer Science and Engineering 2023/24 master's course @ IST.

Authors and Advisors

| Author | GitHub |
| --- | --- |
| Martim Santos | https://github.com/martimfasantos |

| Advisors | Website |
| --- | --- |
| André Martins | https://andre-martins.github.io/ |
| Francisco Melo | https://gaips.inesc-id.pt/~fmelo/ |

Final Grade: 18 / 20


Details

Project & Presentation


Abstract

Large language models (LLMs) are characterized by their remarkable ability to learn extensive world knowledge and generate human-like text across diverse applications. However, their outputs often contain misleading and toxic content, emphasizing the need to align them with human values and preferences to ensure more useful and secure AI systems. A widely employed strategy in numerous prominent models, including OpenAI's GPT-3.5 and GPT-4, is Reinforcement Learning from Human Feedback (RLHF). While this method has demonstrated impressive outcomes, RLHF's complexity, instability, and sensitivity to hyperparameters limit its empirical success and usability across real-life scenarios.

In this work, we study and compare in depth the novel RL-free approaches that aim to overcome the drawbacks of RLHF while demonstrating competitive performance on multiple language tasks.
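Direct Preference Optimization (DPO) is a representative example of such RL-free approaches: it replaces the reward-model-plus-RL pipeline with a single supervised objective over preference pairs. A minimal sketch of its per-example loss, assuming sequence log-probabilities have already been computed (the function and parameter names here are illustrative, not from this repository):

```python
import math

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-example DPO loss from sequence log-probabilities.

    `beta` controls how far the policy may deviate from the
    frozen reference model.
    """
    # Implicit reward margin: how much more the policy (relative to
    # the reference) prefers the chosen response over the rejected one.
    margin = ((policy_logp_chosen - ref_logp_chosen)
              - (policy_logp_rejected - ref_logp_rejected))
    # Negative log-sigmoid of the scaled margin: minimized as the
    # policy increasingly favors the chosen response.
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

When the policy matches the reference, the margin is zero and the loss equals log 2; as the policy learns to prefer the chosen response, the loss decreases, without any sampling or reward model in the loop.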

Finally, we propose a novel method that combines the strengths of existing approaches, aiming to outperform them on the language tasks of dialogue, summarization, and machine translation.
