
tailor-cgo

This repo is for our NAACL Findings 2024 paper "Tailoring Vaccine Messaging with Common-Ground Opinions". We provide the data download scripts for both human-labeled and LLM-labeled responses, model weights for the BERT automatic evaluators, and inference scripts to assess how well-tailored a response of interest is.

We describe the dataset below (#Tailor-CGO Dataset), along with a tutorial on how to run automatic evaluation (#Automatic Evaluation). This work builds on prior work, cited in the last section (#Citations).

Tailor-CGO Dataset

The dataset contains both human- and LLM-annotated preferences/scores for how "well tailored" each written response is. Annotations are structured as either (1) a relative preference between two responses or (2) an absolute score given to each response individually.

To use the dataset:

from datasets import load_dataset

# get either absolute or relative datasets:
dataset = load_dataset("DukeNLP/tailor-cgo", "relative_preferences")
dataset = load_dataset("DukeNLP/tailor-cgo", "absolute_scores")

You can view the full dataset card on Hugging Face: https://huggingface.co/datasets/DukeNLP/tailor-cgo

Datapoints have the following structure:

// Example of absolute score annotation
{
 "response_id": 96, 
 "concern": {
             "concern_id": 606, 
             "text": "the harmful ingredients in the influenza vaccine could..."
             },
 "opinion": {
             "opinion_id": 1108,
             "text": "When advocating for a bigger government..."
             },
 "system": {
            "model": "vicuna-33b-v1.3", 
            "temperature": 0.31619653,
            "prompt": "prompt-cot-health_expert-unguided"
            }, 
 "response": "I understand ...", 
 "evaluation": {
                "model": "gpt-4-1106-preview",  // 'crowdsourced' for human evaluated responses
                "temperature": 1.0, // None for human evaluated responses
                "prompt": "modified-geval",  // None for human evaluated responses
                "n_scores": 100,
                "raw_outputs": ["2\n\nThe response attempts to", 
                                "Tailoring Score = 1", ...],  // None for human evaluated responses
                "scores": [2, 1, ...], 
                "mean_score": 1.32, 
                "mode_score": 1,  // None for human evaluated responses
                }
}
// Example of relative preference annotation
{
 "responseA": {
               "response_id": 0,
               "concern": {
                           "concern_id": 481,
                           "text": "we might be underestimating..."
                           },
               "opinion": {
                           "opinion_id": 56,
                           "text": "It is okay to..."
                           },
               "system": {
                          "model": "gpt-4-0613",
                          "temperature": 0.9046691,
                          "prompt": "prompt-cot-ai_assistant-unguided"
                          },
               "response": "I appreciate your..."
               },
 "responseB": {
               "response_id": 1,
               "concern": {
                           "concern_id": 481,
                           "text": "we might be underestimating..."
                           }, 
               "opinion": { // Note: opinion is not always the same as in A
                           "opinion_id": 56, "text": "It is okay to..."
                           },
               "system": { // Note: system is not always the same as in A
                          "model": "gpt-4-0613",
                          "temperature": 0.9046691,
                          "prompt": "prompt-cot-ai_assistant-unguided"
                          },
               "response": "I completely understand..."
               },
 "preferences": ["A", "A", "A"],
 "majority_vote": "A"
 }
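Once loaded through datasets, these fields can be accessed directly. Below is a small sketch that computes summary statistics from the fields shown above; split names are whatever the Hugging Face loader exposes, and ties or missing annotations may need extra handling:

from datasets import load_dataset

absolute = load_dataset("DukeNLP/tailor-cgo", "absolute_scores")
relative = load_dataset("DukeNLP/tailor-cgo", "relative_preferences")

# absolute scores: average of the per-response mean scores in each split
for split, rows in absolute.items():
    avg = sum(row["evaluation"]["mean_score"] for row in rows) / len(rows)
    print(f"{split}: {len(rows)} responses, average mean_score = {avg:.2f}")

# relative preferences: how often the majority vote prefers response A
for split, rows in relative.items():
    a_rate = sum(row["majority_vote"] == "A" for row in rows) / len(rows)
    print(f"{split}: {len(rows)} comparisons, A preferred {a_rate:.1%} of the time")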

On Hugging Face, the dataset has the structure below. A few additional files are not loaded through the Hugging Face dataset loader. *-relative_by_absolute indicates that absolute scores have been reformatted as relative preference annotations by directly comparing scores (a conversion sketch follows the file tree); *-relative_by_relative indicates that relative preferences were generated directly by prompting models for pairwise comparisons.

data/
├── human_labeled/
│   ├── absolute_scores/
│   │   └── dev-absolute.jsonl
│   └── relative_preferences/
│       ├── dev-relative.jsonl
│       ├── dev-relative_by_absolute.jsonl
│       ├── dev-relative-extra.jsonl
│       ├── test-relative.jsonl
│       ├── train-relative.jsonl
│       └── train-relative-extra.jsonl
└── llm_labeled/
    ├── dev-relative_by_absolute.jsonl
    ├── dev-relative_by_relative.jsonl
    ├── test-relative_by_absolute.jsonl
    ├── test-relative_by_relative.jsonl
    ├── train-relative_by_absolute.jsonl
    └── train-absolute.jsonl
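The *-relative_by_absolute files follow the conversion described above: responses scored on the absolute scale are compared directly to produce preference labels. A minimal sketch of that idea, assuming responses are paired by shared concern and compared on mean_score (the exact pairing and tie handling used for the released files may differ):

import json
from itertools import combinations

def absolute_to_relative(path):
    """Turn absolute-score records into relative preferences by comparing mean scores."""
    with open(path) as f:
        records = [json.loads(line) for line in f]

    # group responses that address the same concern
    by_concern = {}
    for rec in records:
        by_concern.setdefault(rec["concern"]["concern_id"], []).append(rec)

    pairs = []
    for recs in by_concern.values():
        for a, b in combinations(recs, 2):
            score_a = a["evaluation"]["mean_score"]
            score_b = b["evaluation"]["mean_score"]
            if score_a == score_b:
                continue  # this sketch simply skips ties
            pairs.append({
                "responseA": a,
                "responseB": b,
                "majority_vote": "A" if score_a > score_b else "B",
            })
    return pairs

# e.g. pairs = absolute_to_relative("data/human_labeled/absolute_scores/dev-absolute.jsonl")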

For each file, we list some relevant statistics under various definitions of its size $N$:

| file | unique responses | comparisons | annotations per sample |
|---|---|---|---|
| dev-absolute.jsonl | 400 | N/A | 3 |
| dev-relative_by_absolute.jsonl* | 400 | 200 | 3 |
| dev-relative-extra.jsonl** | 600 | 400 | 1 |
| dev-relative.jsonl | 400 | 200 | 3 |
| test-relative-extra.jsonl** | 1200 | 800 | 1 |
| test-relative.jsonl | 800 | 400 | 3 |
| train-absolute.jsonl | 20000 | N/A | 100 |
| train-relative.jsonl | 1200 | 600 | 1 |

NOTE:

*This file is translated from absolute scores to relative comparisons by comparing scores across responses in dev-absolute.jsonl.

**Unlike the rest of the files, these comparisons are not i.i.d. We do not use these files in our paper, but we make them available here.

For further explanation of how the data is collected, please see our paper.

Automatic Evaluation

We release the model weights for our automatic evaluators, both $BERT_{REL}$ and $BERT_{ABS}$. Below is a demonstration of how to use these models:

Download and extract our latest release, which places the model weights in the local/ directory. Then generate predictions following the demo below:

from models.regression import Finetuner
from transformers import BertTokenizer

# load model from our github release
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = Finetuner.load_from_checkpoint('local/regression-fine-2e-ckpt/last-release.ckpt')
model.eval()

# data to process
concern = "there seems to be a lack of proper blinding and control treatments in this vaccine research which could potentially skew the results"
opinion = "The idea that robots and computers will be able to do most of the jobs currently done by humans in the future seems extremely realistic."
message = "Certainly, it's understandable to be concerned about..."
# format for passing to model
data = 'SEP'.join((message, concern, opinion))

# predict
response = tokenizer(data, 
                     padding=True, 
                     truncation=True, 
                     max_length=500, 
                     return_tensors='pt')
scores = model.forward({
        'response': response,
        'labels': None,
    }, predict=True)['scores'].tolist()

# result
print(scores)
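The same forward call can also score several responses in one batch, since the tokenizer already pads its inputs. This sketch reuses the objects from the demo above with a hypothetical second message, and assumes the Finetuner checkpoint accepts batched tensors in the same way:

# score multiple (message, concern, opinion) triples at once
inputs = ['SEP'.join((m, concern, opinion)) for m in [
    message,
    "Clinical trials for this vaccine did include blinded, placebo-controlled arms...",
]]
batch = tokenizer(inputs,
                  padding=True,
                  truncation=True,
                  max_length=500,
                  return_tensors='pt')
batch_scores = model.forward({
        'response': batch,
        'labels': None,
    }, predict=True)['scores'].tolist()
print(batch_scores)  # one tailoring score per input response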

Citations

If you use this content in your work, please cite our paper:

@misc{stureborg2024tailoring,
      title={Tailoring Vaccine Messaging with Common-Ground Opinions}, 
      author={Rickard Stureborg and Sanxing Chen and Ruoyu Xie and Aayushi Patel and Christopher Li and Chloe Qinyu Zhu and Tingnan Hu and Jun Yang and Bhuwan Dhingra},
      year={2024},
      eprint={2405.10861},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

If you mention the taxonomy of vaccine concerns (VaxConcerns) on which the dataset is based, please cite this paper:

@article{VaxConcerns,
	title = {Development and validation of {VaxConcerns}: A taxonomy of vaccine concerns and misinformation with Crowdsource-Viability},
	issn = {0264-410X},
	url = {https://www.sciencedirect.com/science/article/pii/S0264410X2400255X},
	doi = {10.1016/j.vaccine.2024.02.081},
	journal = {Vaccine},
	author = {Stureborg, Rickard and Nichols, Jenna and Dhingra, Bhuwan and Yang, Jun and Orenstein, Walter and Bednarczyk, Robert A. and Vasudevan, Lavanya},
	year = {2024},
}
