The code for the paper "What Is Missing: Interpretable Ratings for Large Language Model Outputs". The paper can be found here.
To work with the models in this project, ensure you're using git lfs (Git Large File Storage) for efficient model downloading. Here’s how to set up and retrieve the models:
- Install Git LFS
module load git-lfs/3.4.0
git lfs install
- Pull Models
Navigate into the desired directory and use the following commands to pull the models:
- Mixtral 8x7B Instruct (with Flash Attention)
- URL: Mixtral 8x7B Instruct v0.1
- Clone:
git clone https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1
- Llama3 8B Instruct
- URL: Llama3 8B Instruct
- Clone:
git clone https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct
- all-mpnet-base-v2 (Sentence Transformer)
- URL: all-mpnet-base-v2
- Clone:
git clone https://huggingface.co/sentence-transformers/all-mpnet-base-v2
All dependencies are listed in requirements.txt. The key libraries include:
- Hugging Face transformers
- PyTorch
- Flash Attention (only on A100 GPUs; not included in the current implementation)
- Hugging Face trl
- Sentence Transformers
- Comet.ml
To install the required packages:
pip install --no-index transformers
pip install --no-index torch
pip install --no-index bitsandbytes
pip install --no-index -U flash-attn --no-build-isolation (Only on A100 GPUs)
pip install --no-index -U sentence-transformers
pip install --no-index trl

On the Beluga cluster, each node provides 64GB of GPU RAM (16GB per GPU). With 4-bit quantization, the Mixtral model requires approximately 27GB of GPU memory, and Llama 3-8B is listed as requiring 16GB, so running both models requires about 43GB. In practice, I found that 3 GPUs provided just under the required amount of memory, so I requested 4 GPUs for a total of 64GB of GPU RAM. Memory is also required for the sentence transformer model, but 64GB should still be sufficient.
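The memory budget above can be sketched as a quick back-of-the-envelope calculation. The figures are the inference estimates quoted in the text, not measurements:

```python
# GPU memory budget for running both models, using the estimates above.
MIXTRAL_4BIT_GB = 27   # Mixtral 8x7B with 4-bit quantization
LLAMA3_8B_GB = 16      # Llama 3-8B as listed
GPU_MEMORY_GB = 16     # per-GPU memory on a Beluga node

def gpus_needed(required_gb, per_gpu_gb=GPU_MEMORY_GB):
    """Smallest number of GPUs whose combined memory covers required_gb."""
    return -(-required_gb // per_gpu_gb)  # ceiling division

total = MIXTRAL_4BIT_GB + LLAMA3_8B_GB
print(total)               # 43 GB for both models
print(gpus_needed(total))  # 3 GPUs on paper; 4 were requested for headroom
```

On paper 3 GPUs (48GB) cover the 43GB estimate, but runtime overhead (activations, KV cache) pushed actual usage past that, hence the 4-GPU request.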
NOTE: The numbers listed above are strictly for inference; they do not include any extra overhead for training the models.
To run the models on a cluster, use the following command:
sbatch llm-interaction-job.sh

To run the Online DPO trainer on a cluster, use the following command:
sbatch odpo-trainer-job.sh

Make sure all file paths in the script are correctly set for the cluster's file system.
I am using contrastive search, as described in this blog post. The parameters are set to defaults automatically but can be changed for tasks like topic generation. Alternatively, high-temperature sampling can produce better results for training.
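As a sketch, contrastive search in transformers' `generate()` is controlled by `penalty_alpha` and `top_k`; the values below are commonly used defaults for illustration, not necessarily this project's tuned settings:

```python
# Hedged sketch: generation kwargs that enable contrastive search in
# Hugging Face transformers' model.generate(). Contrastive search is
# active when penalty_alpha > 0 and top_k > 1; the defaults here are
# illustrative assumptions, not the project's exact values.
def contrastive_search_kwargs(penalty_alpha=0.6, top_k=4, max_new_tokens=256):
    return {
        "penalty_alpha": penalty_alpha,
        "top_k": top_k,
        "max_new_tokens": max_new_tokens,
    }

# usage, assuming `model` and `inputs` are already loaded:
# outputs = model.generate(**inputs, **contrastive_search_kwargs())
```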
To stay within the Hugging Face toolset, I will be using the TRL library found here.
For initial testing I am using the ultrafeedback-prompt dataset, a standard conversational dataset used in the TRL examples. Remember to use Git LFS.
- URL: ultrafeedback-prompt
- Clone:
git clone https://huggingface.co/datasets/trl-lib/ultrafeedback-prompt

Producing useful embeddings is important for actually telling the model what knowledge is missing. I am using Sentence Transformers to extract these embeddings. Such models are typically used for semantic search and are more useful here than the standard LLM tokenizers.
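A minimal sketch of comparing embeddings with all-mpnet-base-v2 (the model lines are commented out because they require the cloned model; the similarity helper is plain Python, and the example sentences are illustrative):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# With the model cloned above (all-mpnet-base-v2 produces 768-dim vectors):
# from sentence_transformers import SentenceTransformer
# model = SentenceTransformer("all-mpnet-base-v2")
# emb = model.encode(["What knowledge is missing?", "The missing information"])
# print(cosine_similarity(emb[0], emb[1]))

print(round(cosine_similarity([1.0, 0.0], [1.0, 1.0]), 4))  # 0.7071
```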
- RLAIF vs. RLHF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
- Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge
- Exchange-of-Thought: Enhancing Large Language Model Capabilities through Cross-Model Communication
- Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
- Self-Rewarding Language Models
- Weak-to-Strong Generalization: Eliciting Strong Capabilities with Weak Supervision