
what-is-missing

The code for the paper "What Is Missing: Interpretable Ratings For Large Language Model Outputs". The paper can be found here.

Models

To work with the models in this project, make sure Git LFS (Git Large File Storage) is installed so the model weights download correctly. Here's how to set up and retrieve the models:

  1. Install Git LFS

    module load git-lfs/3.4.0
    git lfs install
  2. Pull Models
    Navigate into the desired directory and use the following commands to pull the models:

    • Mixtral 8x7B Instruct (with Flash Attention)

    • Llama3 8B Instruct

      git clone https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct
    • all-mpnet-base-v2 (Sentence Transformer)

      git clone https://huggingface.co/sentence-transformers/all-mpnet-base-v2

Required Libraries

All dependencies are listed in requirements.txt. The key libraries include:

  • Hugging Face Transformers
  • PyTorch
  • Flash Attention (A100 GPUs only; not included in the current implementation)
  • Hugging Face TRL
  • Sentence Transformers
  • Comet.ml

To install the required packages:

pip install --no-index transformers
pip install --no-index torch
pip install --no-index bitsandbytes
pip install --no-index -U flash-attn --no-build-isolation  # Only on A100 GPUs
pip install --no-index -U sentence-transformers
pip install --no-index trl

GPU RAM Calculation

On the Beluga cluster, each node provides 64 GB of GPU RAM, 16 GB per GPU. With 4-bit quantization, the Mixtral model requires approximately 27 GB of GPU memory, and Llama 3 8B is listed as requiring 16 GB, so running both models needs about 43 GB. With 3 GPUs there was just under the required amount of memory, so I requested 4 GPUs for a total of 64 GB of GPU RAM. The sentence transformer model also needs memory, but 64 GB should still be sufficient.

NOTE: This does not include any extra overhead for the training models. The numbers listed above are strictly for inference.
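The estimates above follow from a simple bytes-per-parameter calculation. A sketch (parameter counts are the published model sizes; real usage adds overhead for activations and the KV cache, which is why the Mixtral figure above is ~27 GB rather than ~23 GB):

```python
def model_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Rough inference memory estimate: weights only, no overhead."""
    return n_params * bits_per_param / 8 / 1e9

# Mixtral 8x7B has ~46.7B total parameters; quantized to 4 bits.
mixtral_gb = model_memory_gb(46.7e9, 4)   # ~23.4 GB of weights alone
# Llama 3 8B in 16-bit precision.
llama_gb = model_memory_gb(8.0e9, 16)     # ~16 GB

print(round(mixtral_gb, 1), round(llama_gb, 1))
```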

Running on a Cluster

To run the models on a cluster, use the following command:

sbatch llm-interaction-job.sh

To run the Online DPO trainer on a cluster, use the following command:

sbatch odpo-trainer-job.sh

Make sure all file paths in the script are correctly set for the cluster's file system.
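For reference, a job script for the resources discussed above might start with a SLURM header like the following. This is a sketch only: the account, walltime, and entry-point name are placeholders, and the actual llm-interaction-job.sh may differ.

```shell
#!/bin/bash
#SBATCH --gres=gpu:4            # four GPUs (16 GB each), per the RAM calculation above
#SBATCH --mem=64G               # system RAM for the job
#SBATCH --time=03:00:00         # walltime (placeholder)
#SBATCH --account=def-someuser  # placeholder account name

module load git-lfs/3.4.0       # as in the setup steps above
python llm_interaction.py       # placeholder entry point
```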

Note on Generation

I am using contrastive search, as described in this blog post. The generation parameters default to reasonable values but can be changed for tasks like topic generation. Alternatively, high-temperature sampling can produce better results for training.
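Contrastive search scores each candidate token by trading model confidence against similarity to the already-generated context. A toy version of the selection rule, with made-up numbers (not the Transformers implementation):

```python
def contrastive_score(prob: float, max_sim: float, alpha: float = 0.6) -> float:
    """Contrastive search objective: confidence minus a degeneration penalty.

    prob: model probability of the candidate token
    max_sim: max cosine similarity between the candidate's hidden state
             and the hidden states of previously generated tokens
    alpha: weight of the degeneration penalty
    """
    return (1 - alpha) * prob - alpha * max_sim

# A confident but repetitive candidate can lose to a slightly less
# likely, more novel one (illustrative numbers only).
repetitive = contrastive_score(prob=0.50, max_sim=0.90)  # 0.4*0.50 - 0.6*0.90 = -0.34
novel      = contrastive_score(prob=0.35, max_sim=0.10)  # 0.4*0.35 - 0.6*0.10 =  0.08
print(novel > repetitive)  # True
```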

Online DPO

To stay within the Hugging Face toolset, I will be using the TRL library found here.
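Online DPO optimizes the standard DPO objective, scoring each chosen/rejected pair by the difference in policy-versus-reference log-ratios. A minimal sketch of the loss for one pair, using toy log-probabilities (not the TRL implementation):

```python
import math

def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """DPO loss for a single preference pair: -log sigmoid(beta * margin)."""
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# If the policy prefers the chosen response more than the reference does,
# the margin is positive and the loss drops below log(2) (toy numbers).
loss = dpo_loss(logp_chosen=-12.0, logp_rejected=-15.0,
                ref_logp_chosen=-14.0, ref_logp_rejected=-13.0)
print(loss < math.log(2))  # True
```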

Datasets

For initial testing, I am using the ultrafeedback-prompt dataset. It is a standard conversational dataset used in the TRL examples. Remember to use Git LFS when cloning it:

git clone https://huggingface.co/datasets/trl-lib/ultrafeedback-prompt
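TRL's conversational prompt-only datasets store each example as a list of role/content messages. A sketch of the expected shape — this is an assumption based on TRL's conversational convention, so verify it against the dataset card:

```python
# One example in TRL's conversational "prompt-only" format: a "prompt"
# column holding a list of chat messages (assumed shape; check the
# dataset card for trl-lib/ultrafeedback-prompt).
example = {
    "prompt": [
        {"role": "user", "content": "Explain what gradient descent does."},
    ],
}

# Basic shape checks of the assumed format.
assert isinstance(example["prompt"], list)
assert all({"role", "content"} <= set(msg) for msg in example["prompt"])
print("shape ok")
```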

Sentence Transformers

Producing useful embeddings is important for actually telling the model what knowledge is missing. I am using Sentence Transformers to extract them. These models are typically used for semantic search and produce more useful sentence-level representations than standard LLM tokenizers.
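Sentence embeddings are usually compared with cosine similarity. A self-contained sketch of that comparison on toy vectors (real embeddings would come from all-mpnet-base-v2 via `SentenceTransformer.encode`, which returns 768-dimensional vectors):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-d "embeddings"; all-mpnet-base-v2 actually outputs 768-d vectors.
similar   = cosine_similarity([0.9, 0.1, 0.0], [0.8, 0.2, 0.0])
different = cosine_similarity([0.9, 0.1, 0.0], [0.0, 0.1, 0.9])
print(similar > different)  # True
```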

Helpful and Similar Papers

About

Studying the interaction of LLMs when discussing different topics, using open-source models from the Hugging Face Transformers library.
