Benchmark dataset for anti-queer bias in large language models (LLMs)
Our paper, Towards Winoqueer: Developing a Benchmark for Anti-Queer Bias in Large Language Models, was published in the QueerInAI workshop at NAACL 2022!
Finetuning data is currently unavailable because of licensing concerns - sorry for the outage. Expect the corrected data to be posted on or before 09/09.
Scripts used to preprocess the data (segment and normalize) and finetune the models are included. Tweets are normalized using TweetNormalizer from BERTweet.
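For a sense of what tweet normalization does, here is a minimal sketch of the two best-known BERTweet normalization steps: user mentions become `@USER` and links become `HTTPURL`. This is an illustration only, not the actual TweetNormalizer (which also handles emoji and tokenization); use the script from the BERTweet repo for real preprocessing.

```python
import re

def normalize_tweet(text: str) -> str:
    """Sketch of BERTweet-style normalization: mask mentions and URLs.

    The real TweetNormalizer does more (emoji translation, tokenization);
    this only shows the two substitutions most relevant to privacy.
    """
    text = re.sub(r"@\w+", "@USER", text)          # user mentions -> @USER
    text = re.sub(r"https?://\S+", "HTTPURL", text)  # links -> HTTPURL
    return " ".join(text.split())                  # collapse extra whitespace

print(normalize_tweet("@alice check this out https://example.com  #pride"))
```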
Model checkpoints are included for four models (BERT_base, BERT_large, SpanBERT_base, SpanBERT_large) under three finetuning conditions (none, LGBTQ+ news, LGBTQ+ Twitter).
winoqueer_benchmark.csv contains the benchmark data used in the experiments in our paper. Use this file to replicate our results!
Our data follows the CrowS-Pairs format, so you can use the CrowS-Pairs evaluation script to compute our metric.
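The CrowS-Pairs metric is the percentage of pairs for which a model assigns a higher score to the more-stereotyping sentence. The sketch below shows that aggregation step only, under assumptions: the column names `sent_more`/`sent_less` and the sample rows are hypothetical (check the real CSV header), and the `score` function is a placeholder for the masked-LM pseudo-log-likelihood that the actual CrowS-Pairs script computes.

```python
import csv
import io

# Hypothetical sample in an assumed CrowS-Pairs-style column layout.
SAMPLE = """sent_more,sent_less
Pair A stereotype,Pair A counterfactual
Pair B stereotype,Pair B counterfactual
"""

def score(sentence: str) -> float:
    # Placeholder: a real run scores each sentence with a masked language
    # model (pseudo-log-likelihood), as in the CrowS-Pairs evaluation script.
    return float(len(sentence))

def bias_score(csv_text: str) -> float:
    """Percentage of pairs where the stereotyping sentence scores higher."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    biased = sum(score(r["sent_more"]) > score(r["sent_less"]) for r in rows)
    return 100.0 * biased / len(rows)

print(bias_score(SAMPLE))
```

A score near 50% indicates no systematic preference; values well above 50% indicate the model favors the stereotyping sentences.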
Some files in this repo are large; you will probably need Git LFS to clone them.