This repository contains the dataset and code for creating the USB Benchmark. ( Update: The benchmark datasets are now also available on Huggingface at this link! )
This repository also provides scripts to run and evaluate some models on the benchmark. The models include ChatGPT (via few-shot prompting), and Flan-T5-XL (finetuned on the training set of the benchmark, available at the links listed here).
Below we provide step-by-step instructions to create the benchmark datasets and run evaluation of the two models:
Clone the repository
git clone https://github.com/kukrishna/usb.git
Extract the raw document annotations by running the command
tar -xf raw_annotations.tar.gz
Install the required libraries (very few) by running
pip install -r requirements.txt
Run the single master script which will create the labeled datasets for all tasks and domains.
The labled datasets will be written in a folder named task_datasets
.
bash create_all_datasets.sh
To ensure that the labeled datasets were created properly, you can match the checksums for the generated
files which we have included in the checksums.txt
file.
We provide pipelines to evaluate models such as ChatGPT or FlanT5 on all tasks. When you run a pipeline script, it : (1) converts examples from the task datasets into seq2seq format with task instructions and optional fewshot examples in the input (2) runs inference using the corresponding model (3) runs the evaluation script to produce metrics for the model outputs for each task
cd experiments
# to run chatgpt pipeline
bash run_pipeline_chatgpt.sh
# to run finetuned flant5-xl pipeline
bash run_pipeline_flant5.sh
More details can be found in the paper: https://aclanthology.org/2023.findings-emnlp.592/
If you use this dataset, please cite it as below:
@inproceedings{krishna-etal-2023-usb,
title = "{USB}: A Unified Summarization Benchmark Across Tasks and Domains",
author = "Krishna, Kundan and
Gupta, Prakhar and
Ramprasad, Sanjana and
Wallace, Byron and
Bigham, Jeffrey and
Lipton, Zachary",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2023",
year = "2023",
pages = "8826--8845"
}