This repository provides the data and code needed to replicate "Semantic proximity underlies progression in the endorsement of multiple conspiracy theories by individuals" by Paul Vicinanza, Echo Zhou, Hayagreeva Rao, and Henrich Greve (2024).
Theoretically, this paper seeks to understand the sequential process of conspiracy endorsements on Twitter. We find that users jump to semantically similar but more extreme conspiracy theories over time. However, prominent cultural events, in our case the murder of George Floyd and subsequent Black Lives Matter protests, may lead users to endorse semantically distant conspiracy theories.
The core methodological task of the paper is to classify tweets as endorsing, or not endorsing, a given conspiracy theory. We identify 16 conspiracies in the data through a combination of unsupervised topic modeling and manual annotation. We embed each tweet by fine-tuning COVID-Twitter-BERT with Simple Contrastive Learning of Sentence Embeddings (SimCSE) to reduce anisotropy and increase uniformity in the embedding space. We then reduce dimensionality with UMAP and cluster the low-dimensional embeddings with HDBSCAN. Manual annotation of these clusters reveals their conspiratorial content, and we compute semantic similarity between conspiracy theories as the pairwise distance between clusters.
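The final cluster-distance step can be sketched as follows. This is a minimal NumPy illustration, not the paper's exact implementation: the centroid-based cosine distance, the function names, and the toy embeddings are all assumptions made for demonstration.

```python
import numpy as np

def cluster_centroids(embeddings, labels):
    """Mean embedding per cluster label (HDBSCAN noise label -1 excluded)."""
    uniq = sorted(l for l in set(labels) if l != -1)
    labels = np.asarray(labels)
    cents = np.stack([embeddings[labels == l].mean(axis=0) for l in uniq])
    return cents, uniq

def pairwise_cosine_distance(centroids):
    """Pairwise cosine distance matrix between cluster centroids."""
    unit = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    return 1.0 - unit @ unit.T

# Toy example: two well-separated clusters in a 3-d embedding space.
emb = np.array([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0],
                [0.0, 1.0, 0.0], [0.1, 0.9, 0.0]])
labels = [0, 0, 1, 1]
cents, ids = cluster_centroids(emb, labels)
dist = pairwise_cosine_distance(cents)  # 2x2 symmetric distance matrix
```

In the paper this distance matrix is computed over the annotated conspiracy clusters; here the choice of centroids and cosine distance is illustrative only.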
However, this approach cannot identify conspiracy theory endorsements at the tweet level. Using a keyword match on the top-n terms of each conspiracy theory, we construct a set of candidate endorsements for each conspiracy. We then prompt-tune a frozen BLOOMZ-1.7B model separately for each conspiracy theory using a hand-labeled dataset of 500 tweets. With user-tweet level data, we identify semantic similarity as a meaningful predictor of future conspiratorial endorsements.
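The keyword-match candidate step might look like the sketch below. The helper name and the example terms are hypothetical; the paper's actual term lists come from the topic-model top-n terms.

```python
import re

def candidate_tweets(tweets, top_terms):
    """Return tweets containing at least one of the conspiracy's top terms
    (case-insensitive, whole-word match)."""
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, top_terms)) + r")\b",
        re.IGNORECASE,
    )
    return [t for t in tweets if pattern.search(t)]

# Hypothetical top terms for one conspiracy theory.
terms = ["5g", "towers"]
tweets = [
    "5G towers cause illness",
    "I love my new phone",
    "The TOWERS are everywhere",
]
matches = candidate_tweets(tweets, terms)  # keeps the first and third tweet
```

The prompt-tuned classifier is then run only on these candidates, which keeps inference tractable at corpus scale.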
Despite using 100x fewer parameters, this approach significantly outperforms a few-shot prompted GPT-3 model.
| Metric | BLOOMZ-1.7B Prompt Tuned | GPT-3 175B Davinci-003 |
|---|---|---|
| Accuracy | 0.85 | 0.63 |
| F1 | 0.904 | 0.697 |
| Precision | 0.887 | 0.52 |
| Recall | 0.903 | 0.929 |
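The scores above are standard binary-classification metrics. As a reference, a minimal sketch of how they are computed from gold labels and predictions (toy data, not the paper's evaluation code):

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = endorsement)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Toy example with 5 labeled tweets.
m = binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```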
Code needed to replicate the results of the study is provided as a Jupyter notebook designed to run in Google Colab. This is a single, self-contained file with built-in references to the data. All you need to do is open the notebook in Colab; no Drive mounting or git cloning is needed.