This repository provides replication materials for the paper: (final paper name)
For replication purposes, you will need to download the following large files from Google Drive:
- Download data/url_to_flag_updatedtoFeb24_obitfilter.csv from here
- Download data/theta_Feb27update_withdomtopic.csv from here
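Before running the R scripts, it can help to confirm that both downloads landed in the data/ folder. The following is a minimal sketch, not part of the repository: the helper name missing_files is hypothetical, and it assumes the script is run from the repository root.

```python
from pathlib import Path

# Paths as listed in this README, relative to the repository root.
REQUIRED_FILES = [
    "data/url_to_flag_updatedtoFeb24_obitfilter.csv",
    "data/theta_Feb27update_withdomtopic.csv",
]


def missing_files(root=".", required=REQUIRED_FILES):
    """Return the required paths that do not exist under root.

    Hypothetical helper for illustration; it is not in the repository.
    """
    root = Path(root)
    return [p for p in required if not (root / p).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing data files:")
        for p in missing:
            print("  " + p)
    else:
        print("All required data files are present.")
```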
Results for the main text of the paper can then be replicated using the R files figures_1_2_3.R and figures_4_5.R. If you would also like to replicate the supplement, you can do so with the file plots_for_supplement.R.
A reminder that the paper uses data from the following sources:
- The NYTimes COVID Case/Death Rate Repository. This is pulled directly from the repository in util.R.
- MEDSL's Election and County Data. This is pulled directly from the repository in util.R.
- Kieran Healy's 2020 Election Results. This is pulled directly from the repository in util.R.
- 2019 population estimates of U.S. counties from the Census Bureau. This is contained in the file data/co-est2019-alldata.csv.
- Community Resilience Estimates provided by the U.S. Census to estimate the percentage of individuals in each county that had 0 risk factors, 1-2 risk factors, or 3+ risk factors for COVID. This is contained in the file data/cre-2018-a11.csv.
Given the cleaned and preprocessed CSV data file (download it here), the STM model and the Theta output from the model can be replicated using code_for_stm/run_stm.R.
Given the Theta output from the STM model, some of our analyses map each article to a single topic by finding the dominant topic, that is, the topic with the largest proportion in that article's topic distribution. This code can be found here: code_for_stm/add_domtopic_to_theta.py.
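The dominant-topic step amounts to an argmax over each row of Theta. The sketch below illustrates the idea on made-up numbers; it is not the code in add_domtopic_to_theta.py, the function name dominant_topic is hypothetical, and the example proportions are invented for illustration. It reports 0-based topic indices and breaks ties in favor of the lowest index, which may differ from the conventions used in the repository.

```python
def dominant_topic(theta_row):
    """Return the 0-based index of the largest topic proportion in one row.

    Hypothetical helper; ties go to the lowest topic index.
    """
    return max(range(len(theta_row)), key=lambda k: theta_row[k])


# Toy Theta matrix (invented values): one row per article, one column per topic.
theta = [
    [0.10, 0.70, 0.20],
    [0.55, 0.05, 0.40],
    [0.30, 0.30, 0.40],
]

dominant = [dominant_topic(row) for row in theta]
print(dominant)  # [1, 0, 2]
```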
If you are interested in how we preprocessed the data and decided on the parameter k, the files code_for_stm/preprocess_text_for_topicmodel.py and code_for_stm/findK.py provide those details.