Prompt-Tuned Conspiracy Theory Classification on Twitter using Large Language Models

This repository provides the data and code needed to replicate "Semantic proximity underlies progression in the endorsement of multiple conspiracy theories by individuals." Paul Vicinanza, Echo Zhou, Hayagreeva Rao, and Henrich Greve. 2024.

Theoretically, this paper seeks to understand the sequential process of conspiracy endorsements on Twitter. We find that users jump to semantically similar but more extreme conspiracy theories over time. However, prominent cultural events, in our case the murder of George Floyd and subsequent Black Lives Matter protests, may lead users to endorse semantically distant conspiracy theories.

The core methodological task of the paper is to classify tweets as endorsing, or not endorsing, a given conspiracy theory. We identify 16 conspiracy theories in the data through a combination of unsupervised topic modeling and manual annotation. We embed each tweet by fine-tuning COVID-Twitter-BERT with Simple Contrastive Learning of Sentence Embeddings (SimCSE) to reduce anisotropy and increase uniformity in the embedding space. We then reduce dimensionality with UMAP and cluster the low-dimensional embeddings with HDBSCAN. Manual annotation of these clusters reveals their conspiratorial content, and we compute semantic similarity between conspiracy theories as the pairwise distance between clusters.
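As a rough illustration of the last step, the sketch below computes pairwise distances between clusters from per-tweet embeddings and cluster labels. It rests on assumptions that are not stated in this README: each cluster is summarized by its centroid, the distance is cosine distance, and the function names are hypothetical. The notebook may use a different metric or a different embedding space.

```python
import math
from collections import defaultdict


def centroid(vectors):
    """Mean vector of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_distance(a, b):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def cluster_distances(embeddings, labels):
    """Pairwise centroid distances between clusters (assumed definition)."""
    groups = defaultdict(list)
    for vector, label in zip(embeddings, labels):
        if label != -1:  # HDBSCAN assigns -1 to noise points; skip them
            groups[label].append(vector)
    centroids = {label: centroid(vectors) for label, vectors in groups.items()}
    names = sorted(centroids)
    return {
        (a, b): cosine_distance(centroids[a], centroids[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Toy 2-D embeddings standing in for the reduced tweet embeddings.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.5, 0.5]]
labels = [0, 0, 1, 1, -1]
distances = cluster_distances(embeddings, labels)
```

A smaller distance between two clusters is then read as greater semantic similarity between the corresponding conspiracy theories.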

However, clustering alone cannot identify conspiracy theory endorsements at the tweet level. We therefore use a keyword match on the top-n terms of each conspiracy theory to construct a set of candidate endorsements for that theory. We then prompt-tune a frozen 1.7B-parameter BLOOMZ model separately for each conspiracy theory using a hand-labeled dataset of 500 tweets. With the resulting user-tweet level data, we identify semantic similarity as a meaningful predictor of future conspiratorial endorsements.
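The keyword-matching step can be sketched as follows. This is a minimal illustration, not the notebook's code: the function name, the example tweets, and the keyword list are hypothetical, and whole-word, case-insensitive matching is an assumption about how the match is done.

```python
import re


def candidate_tweets(tweets, keywords):
    """Return tweets containing at least one keyword as a whole word."""
    if not keywords:
        return []
    # Escape each keyword so punctuation is matched literally.
    alternatives = "|".join(re.escape(keyword) for keyword in keywords)
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    return [tweet for tweet in tweets if pattern.search(tweet)]


# Hypothetical tweets and top-n terms for a single conspiracy theory.
tweets = [
    "5G towers spread the virus",
    "Got my vaccine today",
    "They want to microchip everyone",
]
keywords = ["5g", "microchip"]
candidates = candidate_tweets(tweets, keywords)
```

The candidates are only possible endorsements; the prompt-tuned classifier decides whether each one actually endorses the theory.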

Prompt Tuning Benchmark

Despite using roughly 100x fewer parameters, this approach significantly outperforms a few-shot prompted GPT-3 model on accuracy, F1, and precision. GPT-3 attains slightly higher recall.

Metric	BLOOMZ-1.7B Prompt Tuned	GPT-3 175B Davinci-003
Accuracy	0.85	0.63
F1	0.904	0.697
Precision	0.887	0.52
Recall	0.903	0.929
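For reference, the four metrics in the table can be computed from binary labels as in the sketch below. It assumes that "endorsing" is the positive class, encoded as 1; the function name and the toy labels are hypothetical and are not the study's data.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels for illustration only.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```

High recall paired with low precision, as in the GPT-3 column, indicates a classifier that labels many non-endorsing tweets as endorsing.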

Replication code

The code needed to replicate the results of the study is provided as a Jupyter notebook, prompt_tuned_conspiracies.ipynb, designed to run in Google Colab. It is a single, self-contained file with built-in references to the data. All you need to do is open the notebook in Colab. No Drive mounting or git cloning is needed.
