Data and code for "Characterizing the impacts of disease on behavior across scales: Policy, perception, and potential for infection"
Authors: Casey M Woika*, Juliana C Taube*, Vittoria Colizza, Shweta Bansal
All scripts to reproduce the analyses in the main and supplementary texts are provided.
Scripts should be run in numerical order, and most external data files needed to run them can be found in this repository. Exceptions are documented below.
- data/input/ contains data files of population sizes, urban/rural and HHS region classifications, and crosswalks from state to county FIPS codes that are used in data processing and figure construction.
- data/contact_by_worry_bin_week_county_trunc72.csv contains weighted mean estimates of non-household contact rates by county-week disaggregated by worry survey response (1 indicates somewhat or very worried, and 0 indicates not at all or not very worried).
- data/fitted_predictions.csv contains GAM fitted estimates of non-household contact rates by county-week regardless of worry status.
- processed_data/casey_regression_inputs_underreporting_2025-01-20.csv contains covariates for underreporting sensitivity analysis since the underreporting rates are not publicly available.
- processed_data/full_updated_covariate_data_20241025.csv contains covariates for main regression analysis related to policy, control variables, and connected counties.
- processed_data/prop_worried_obs_2026-02-27.csv contains mean estimates of proportion of survey respondents worried about COVID-19 by county-week since raw survey data cannot be shared.
- 01_gam_contact_by_worry.r: Produces smooth estimates of non-household contacts at the county-week scale disaggregated by worry status.
- 02_prep_worry_data.R: Estimates proportion worried at the county-week scale using raw survey responses.
- 03_prep_inla_covariates.R: Prepares covariates for INLA models, including measured risk, perceived risk, and policy data, for focal and linked counties.
- 04_prep_sir_contact_predictions.R: Estimates county-week contact using observed data, or predictions from simple regressions based on measured or perceived risk at different spatial scales.
- 05_run_inla_models.R: Runs main INLA models.
- 06_run_inla_models_sensitivity.R: Runs sensitivity INLA models, including different rolling averages, lags, responses, and predictor variables.
- 07_make_inla_figures.R: Makes figures based on INLA results.
- 08_run_isolated_sir.R: Runs county-specific SIR models based on contact predictions.
- 09_make_isolated_sir_figures.R: Makes figures based on SIR models.
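
Because the scripts are meant to be run in numerical order, a small driver can save some typing. The following is a minimal sketch, not part of the repository: the function names list_numbered_scripts and run_pipeline are hypothetical, and it assumes the scripts sit in a single directory (script_dir) and are launched from the working directory they expect.

```r
# Hypothetical helper: list the numbered analysis scripts in `script_dir`,
# ordered by their two-digit prefix.
list_numbered_scripts <- function(script_dir = ".") {
  # Matches names such as "01_gam_contact_by_worry.r" or "02_prep_worry_data.R"
  files <- list.files(script_dir, pattern = "^[0-9]{2}_.*\\.[rR]$")
  # Sort on the numeric prefix so the order does not depend on locale
  files[order(as.integer(substr(files, 1, 2)))]
}

# Hypothetical helper: run each script in its own R session via Rscript,
# stopping at the first script that exits with a non-zero status.
run_pipeline <- function(script_dir = ".") {
  for (f in list_numbered_scripts(script_dir)) {
    message("Running ", f)
    status <- system2("Rscript", shQuote(file.path(script_dir, f)))
    if (status != 0) stop("Script failed: ", f)
  }
  invisible(TRUE)
}
```

Calling run_pipeline() from the repository root would then attempt scripts 01 through 09 in sequence. Scripts that read data not included in the repository will presumably fail unless those files are obtained first, so running a subset by hand may be more practical.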
- New York Times case count files can be downloaded from https://github.com/nytimes/covid-19-data/tree/master/rolling-averages. County-, state-, and national-level data are needed for the years 2020 and 2021.
- The raw contact survey data used to estimate weekly non-household contacts and the proportion of worried respondents cannot be shared; pre-processed data are provided instead. Access to the raw survey data requires an agreement with the Delphi Research Group; more information is available at https://cmu-delphi.github.io/delphi-epidata/symptom-survey/data-access.html.
- SafeGraph social distancing data for supplementary sensitivity analyses are not provided; more information is available at https://docs.safegraph.com/docs/social-distancing-metrics.
- Underreporting estimates are not publicly available, so case data already corrected based on these estimates are provided.
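
For the New York Times files, the download can be scripted rather than done by hand. This is only a sketch under stated assumptions: the file names (us-counties-2020.csv, us-counties-2021.csv, us-states.csv, us.csv) are assumed from the layout of the NYT rolling-averages folder and should be checked against it, the function names are hypothetical, and dest_dir is a placeholder for whatever location the analysis scripts actually read from.

```r
# Hypothetical helper: build raw-download URLs for the NYT rolling-average files.
# The default file names are an assumption; verify them against the NYT folder.
nyt_urls <- function(files = c("us-counties-2020.csv", "us-counties-2021.csv",
                               "us-states.csv", "us.csv")) {
  base <- "https://raw.githubusercontent.com/nytimes/covid-19-data/master/rolling-averages"
  setNames(file.path(base, files), files)
}

# Hypothetical helper: download each file into `dest_dir`.
# `dest_dir` is a placeholder; use the path the scripts expect.
download_nyt <- function(dest_dir) {
  dir.create(dest_dir, recursive = TRUE, showWarnings = FALSE)
  urls <- nyt_urls()
  for (f in names(urls)) {
    download.file(urls[[f]], destfile = file.path(dest_dir, f), mode = "wb")
  }
  invisible(names(urls))
}
```

The state and national files are not split by year in this sketch, so they would need to be filtered to 2020 and 2021 after download if the scripts do not already do so.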