This repository contains the project and presentation for the 2nd Cycle Integrated Project in Computer Science and Engineering master's course (2023/24) @ IST.
| Author | GitHub |
| --- | --- |
| Martim Santos | https://github.com/martimfasantos |
| Advisors | Website |
| --- | --- |
| André Martins | https://andre-martins.github.io/ |
| Francisco Melo | https://gaips.inesc-id.pt/~fmelo/ |
Final Grade: 18 / 20
Large language models (LLMs) are characterized by their remarkable ability to learn extensive world knowledge and generate human-like text across diverse applications. However, their outputs often contain misleading or toxic content, emphasizing the need to align these models with human values and preferences to ensure more useful and safer AI systems. A strategy widely employed in prominent models, including OpenAI's GPT-3.5 and GPT-4, is Reinforcement Learning from Human Feedback (RLHF). While this method has demonstrated impressive results, RLHF's complexity, instability, and sensitivity to hyperparameters limit its reliability and usability across real-world scenarios.
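For context, RLHF typically fine-tunes a policy $\pi_\theta$ to maximize a learned reward model $r_\phi$ while a KL term keeps it close to a reference policy $\pi_{\mathrm{ref}}$. This is the standard formulation from the RLHF literature, stated here only for reference:

$$\max_{\pi_\theta}\ \mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi_\theta(\cdot \mid x)}\big[r_\phi(x, y)\big] \;-\; \beta\, \mathbb{D}_{\mathrm{KL}}\big[\pi_\theta(y \mid x)\ \|\ \pi_{\mathrm{ref}}(y \mid x)\big]$$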
In this work, we study and compare in depth the novel RL-free approaches that aim to overcome the drawbacks of RLHF while demonstrating competitive performance across multiple language tasks.
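A prominent example of such an RL-free method is Direct Preference Optimization (DPO), which optimizes the policy directly on preference pairs without training a separate reward model or running PPO-style optimization. Below is a minimal PyTorch sketch of the standard DPO loss; the function and argument names are illustrative and not taken from this repository:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO loss (Rafailov et al., 2023) over a batch of preference pairs.

    Each argument is the summed log-probability of a full response under the
    trainable policy or the frozen reference model; `beta` controls the
    strength of the implicit KL penalty.
    """
    # Implicit rewards: log-ratios of policy vs. reference model
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between preferred and dispreferred responses
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```

Because the loss only requires log-probabilities from the policy and a frozen reference model, methods of this kind sidestep reward-model training and the instabilities of reinforcement learning noted above.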
Finally, we propose a novel alternative method that combines existing approaches, leveraging their strengths with the aim of outperforming them on the language tasks of dialogue, summarization, and machine translation.