This is the repository for the GPT-3 baselines described in the RAFT benchmark paper.

Set up a virtual environment and install the dependencies listed in the requirements file.

conda create -n raft-baselines python=3.8 && conda activate raft-baselines
python -m pip install -r requirements.txt

Install raft-baselines.

python setup.py develop

If the above command fails with a permissions error, you may have to run it with sudo prepended.

Starter Kit

A starter kit notebook walks through the basics of making predictions using models from the HuggingFace Model Hub. There's also a Colab version.

RAFT Predict

Use the raft_predict script to run classifiers on the RAFT datasets. By default, the script will run on the first 5 test examples for each dataset. To use a random classifier on the first 10 examples from the ADE Corpus V2 dataset:

python -m raft_baselines.scripts.raft_predict with n_test=10 'configs=["ade_corpus_v2"]' classifier_name=RandomClassifier

The other classifiers available are:

  • GPT3Classifier: the one used for the GPT-3 baseline in the paper
  • TransformersCausalLMClassifier: takes as input a model_type string, and runs an arbitrary CausalLM from the HuggingFace Model Hub

For example, to generate predictions from DistilGPT-2 on the first 10 examples of the ADE Corpus V2 dataset, run:

python -m raft_baselines.scripts.raft_predict with n_test=10 'configs=["ade_corpus_v2"]' classifier_name=TransformersCausalLMClassifier 'classifier_kwargs={"model_type":"distilgpt2"}'
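When scripting several runs, it can be easier to build the `with` overrides programmatically than to hand-quote them in the shell. The sketch below assumes a hypothetical helper, `build_raft_predict_command`, which is not part of raft-baselines. It writes list and dict values as JSON literals, mirroring the quoting used in the commands above, and passes everything else as plain text.

```python
import json
import sys


def build_raft_predict_command(**overrides):
    """Return an argv list for raft_predict with `with`-style overrides.

    Hypothetical helper for illustration; not part of raft-baselines.
    """
    command = [sys.executable, "-m", "raft_baselines.scripts.raft_predict"]
    if overrides:
        command.append("with")
    for key, value in overrides.items():
        # Lists and dicts are rendered as compact JSON literals, as in the
        # shell examples; other values are passed through as plain text.
        if isinstance(value, (list, dict)):
            rendered = json.dumps(value, separators=(",", ":"))
        else:
            rendered = str(value)
        command.append(f"{key}={rendered}")
    return command


command = build_raft_predict_command(
    n_test=10,
    configs=["ade_corpus_v2"],
    classifier_name="TransformersCausalLMClassifier",
    classifier_kwargs={"model_type": "distilgpt2"},
)
print(command[3:])
# To launch the run: subprocess.run(command, check=True)
```

Because the command is an argument list rather than a shell string, no extra shell quoting is needed around the JSON values.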

To run experiments with GPT-3, you will need an OpenAI API key. Create a file called .env and put your API key there, following the format of .env-example.
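As a rough sketch of how a .env file like this can be read, the following parses KEY=VALUE lines using only the standard library. The variable name `OPENAI_API_KEY` and the helper `read_env_file` are assumptions for illustration; check .env-example for the exact name the repository expects.

```python
import tempfile
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Demonstration with a throwaway file. OPENAI_API_KEY is an assumed name,
# and the value is a placeholder rather than a real key.
with tempfile.TemporaryDirectory() as tmp:
    env_path = Path(tmp) / ".env"
    env_path.write_text("# OpenAI credentials\nOPENAI_API_KEY=sk-placeholder\n")
    env_values = read_env_file(env_path)
```

Keep the real .env file out of version control so the key is not committed by accident.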



We use Sacred to track our experiments and outputs. This has no overhead at runtime: simply run either of our two experiment scripts with python as usual. You can change where tracking files are saved by modifying the observer at the top of each experiment file, and you can change the details of an experiment via the configuration parameters specified in the configs block.

# For labeling the test set
python -m raft_baselines.scripts.raft_predict
# For tuning various dimensions on the train set with LOO validation
python -m raft_baselines.scripts.raft_train_experiment

Alternatively, you can modify the input variables to an experiment from the command line, as in the raft_predict examples above. Either way, some modification will be necessary if you want to run different experiments. See this tutorial for more information.
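The leave-one-out (LOO) validation mentioned for raft_train_experiment holds out each training example in turn and predicts it from the remaining ones. The sketch below shows the general idea only; it is not the repository's implementation, and `leave_one_out_splits`, `loo_accuracy`, `majority_label`, and the toy data are all hypothetical.

```python
def leave_one_out_splits(examples):
    """Yield (train, held_out) pairs, holding out each example exactly once."""
    for i, held_out in enumerate(examples):
        yield examples[:i] + examples[i + 1:], held_out


def loo_accuracy(examples, fit_predict):
    """Fraction of held-out examples whose label is predicted correctly.

    `fit_predict(train, text)` is a hypothetical callable returning a label.
    """
    correct = 0
    for train, (text, label) in leave_one_out_splits(examples):
        correct += fit_predict(train, text) == label
    return correct / len(examples)


def majority_label(train, text):
    # Toy stand-in for a classifier: ignore the text and predict the most
    # common label among the remaining training examples.
    labels = [label for _, label in train]
    return max(set(labels), key=labels.count)


# Made-up (text, label) pairs, purely for illustration.
toy_train = [("a", 1), ("b", 1), ("c", 1), ("d", 2)]
score = loo_accuracy(toy_train, majority_label)
```

With only a few dozen labeled examples per task, holding out one example at a time keeps nearly all of the data available for each prediction.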

Similarly, you can save metrics with raft_experiment.log_scalar(), or by using the Sacred observer directly. See this tutorial for more information.

To save out predictions and upload to the HuggingFace Hub (and the leaderboard), see the RAFT submission template.


This repository is licensed under the MIT License.