This repository contains a county-level dataset assembled from publicly available data that may be pertinent to COVID-19 analyses, along with the source code used to build it.
Comments and contributions welcome. Please acknowledge source if used.
The county-level dataset is stored in two formats:
- Stata: county_panel.dta
- CSV: county_panel.csv
These should be identical, except that the Stata-date-format variables are dropped from the CSV. The string versions of the date variables remain.
Data cover 3,109 counties, but exclude Alaska and Hawaii due to limited data availability.
All variables are 'time-invariant'. The dataset can easily be matched with other county-level data, such as COVID case or mobility data, using the FIPS code.
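Because `fips` is stored as a number, leading zeros are lost (for example, 01001 becomes 1001), which can break a match against files that keep FIPS codes as strings. The following is a minimal sketch of a FIPS-keyed left join using only the Python standard library. The helper names (`normalize_fips`, `merge_on_fips`), the `cases` column, and the toy rows are hypothetical and are not part of this repository.

```python
import csv
import io


def normalize_fips(value):
    """Return a county FIPS code as a zero-padded 5-character string."""
    return str(int(float(value))).zfill(5)


def merge_on_fips(panel_rows, other_rows, other_key="fips"):
    """Left-join other_rows onto panel_rows by county FIPS code."""
    lookup = {normalize_fips(r[other_key]): r for r in other_rows}
    merged = []
    for row in panel_rows:
        key = normalize_fips(row["fips"])
        # Counties with no match keep only their original columns.
        extra = {k: v for k, v in lookup.get(key, {}).items() if k != other_key}
        merged.append({**row, **extra, "fips": key})
    return merged


# Hypothetical toy rows standing in for county_panel.csv and an external
# county-level file; the case counts are made-up numbers.
panel_csv = "fips,county\n1001,Autauga\n6037,Los Angeles\n"
cases_csv = "fips,cases\n01001,12\n06037,345\n"

panel = list(csv.DictReader(io.StringIO(panel_csv)))
cases = list(csv.DictReader(io.StringIO(cases_csv)))
merged = merge_on_fips(panel, cases)
```

Normalizing both sides to a 5-character string before joining avoids silent mismatches between numeric and string FIPS columns.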
Variables (with descriptions where necessary; see the sections below for more details on source/construction):
- state: State Name
- county: County Name
- fips: 5-digit fips code, numeric
- county_population
- white_pct
- black_pct
- hispanic_pct
- nonwhite_pct
- female_pct
- age29andunder_pct
- age65andolder_pct
- median_hh_inc
- less_hs_pct
- lesscollege_pct
- rural_pct
- trumpshare: Share of voters voting for Trump in 2016 Presidential election
- voter_participation: share of voting population participating in 2016 general election vote
- state_fips: 2 digit fips code for state
- effectivedate_MDY: first date county is under any stay-at-home order (whether county or state declared), in MM/DD/YYYY format
- effectivedate: same as previous, but in Stata date format (only in .dta file)
- stateeffectivedate_MDY: first date under a STATE stay-at-home order, in MM/DD/YYYY format
- state_effectivedate: same as previous, but in Stata date format (only in .dta file)
- density_pop: county population per square mile
- density_housing_units
- vehicles: avg number of vehicles in household
- pubtrans: share of pop taking public transit to work
- teleworkable: share of occupations in county that are 'teleworkable'
- beds_licensed: number of hospital beds in county
- beds_staffed: number of 'staffed' hospital beds in county
- beds_icu: number of icu beds in county
- beds_icu_adult
- beds_icu_pediatric
- obesity: obesity rate in county
- sufficient_phys_activity: share of population getting 'sufficient physical activity'
- binge_drinker: share of pop binge-drinking at least once in last 30 days
- heavy_drinker: share of pop having on avg more than 2 drinks per day over past 30 days
- smoke_daily: share of pop who smoke daily
- diabetes: share of pop with diabetes
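Since the CSV keeps only the MM/DD/YYYY string versions of the stay-at-home dates, users outside Stata need to parse them. The sketch below parses such a string and converts to and from Stata's daily date encoding, which counts days since 01jan1960. It assumes that counties without an order have an empty date field, which this README does not document. The function names are hypothetical.

```python
from datetime import date, datetime, timedelta

STATA_EPOCH = date(1960, 1, 1)  # Stata daily dates count days from 01jan1960


def parse_mdy(text):
    """Parse an MM/DD/YYYY string; return None for an empty field.

    Treating an empty field as "no order" is an assumption.
    """
    text = (text or "").strip()
    if not text:
        return None
    return datetime.strptime(text, "%m/%d/%Y").date()


def to_stata_daily(d):
    """Convert a date to the integer used by Stata daily date variables."""
    return (d - STATA_EPOCH).days


def from_stata_daily(n):
    """Convert a Stata daily date integer back to a date."""
    return STATA_EPOCH + timedelta(days=int(n))


# Arbitrary date used only for illustration.
example = parse_mdy("03/19/2020")
example_stata = to_stata_daily(example)
```

Applying `parse_mdy` to `effectivedate_MDY` and then `to_stata_daily` should reproduce the corresponding Stata-format variable in the .dta file, if the two formats are in fact identical as stated.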
Source for variables 4–17 (county_population through voter_participation). Info on primary sources can be found here: https://github.com/MEDSL/2018-elections-unoffical/blob/master/election-context-2018.md
Source for variables 19–22 (the stay-at-home order dates). Methodology and references: https://github.com/jrstromme/covid-stay-at-home-orders
Source for variables 23 and 24.
Source for variables 25–26 (vehicles and pubtrans). Data were obtained from IPUMS USA; the source file is omitted here because it is too large. Variables used: STATEFIP, COUNTYFIP, PUMA, VEHICLES, PERNUM, PERWT, OCC, TRANWORK
A crosswalk from PUMA to county was obtained from the Missouri Census Data Center's Geocorr tool: http://mcdc.missouri.edu/applications/geocorr2018.html The crosswalk gives allocations from PUMAs to counties. It is saved as ACS/raw/geocorr2018.csv
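To illustrate how a PUMA-to-county crosswalk can be used, the sketch below spreads person-level records across counties in proportion to an allocation factor and takes a weighted mean. It is not the repository's actual code (which is in the .do and .R files). The column names (`state`, `puma`, `county_fips`, `afact`, `weight`), the function name, and the toy records are assumptions for illustration only.

```python
from collections import defaultdict


def puma_to_county_mean(person_rows, crosswalk_rows, value_key):
    """Allocate PUMA-level person records to counties and average value_key.

    person_rows: dicts with "state", "puma", "weight", and value_key.
    crosswalk_rows: dicts with "state", "puma", "county_fips", "afact",
        where afact is the share of the PUMA allocated to that county.
    """
    allocations = defaultdict(list)
    for cw in crosswalk_rows:
        allocations[(cw["state"], cw["puma"])].append(
            (cw["county_fips"], float(cw["afact"]))
        )

    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for person in person_rows:
        key = (person["state"], person["puma"])
        for county, afact in allocations.get(key, []):
            # Each record contributes to a county in proportion to afact.
            w = float(person["weight"]) * afact
            weighted_sum[county] += w * float(person[value_key])
            weight_total[county] += w

    return {
        county: weighted_sum[county] / weight_total[county]
        for county in weight_total
        if weight_total[county] > 0
    }


# Hypothetical toy data: state "99" and its PUMA/county codes are made up.
crosswalk = [
    {"state": "99", "puma": "00100", "county_fips": "99001", "afact": "0.25"},
    {"state": "99", "puma": "00100", "county_fips": "99003", "afact": "0.75"},
    {"state": "99", "puma": "00200", "county_fips": "99003", "afact": "1.0"},
]
persons = [
    {"state": "99", "puma": "00100", "weight": "100", "vehicles": "2"},
    {"state": "99", "puma": "00200", "weight": "50", "vehicles": "1"},
]
county_means = puma_to_county_mean(persons, crosswalk, "vehicles")
```

In this toy case, county 99003 receives three quarters of the first PUMA's weight plus all of the second PUMA's, so its mean lies between the two PUMA values.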
Source for variable 27. A teleworkable index was constructed from O*NET data based on Dingel and Neiman's methodology. https://github.com/jdingel/DingelNeiman-workathome
Source for variables 28–32. Hospital bed availability by county is available at: https://www.arcgis.com/home/item.html?id=1044bb19da8d4dbfb6a96eb1b4ebf629
Source for variables 33–38. Prevalence of various diseases is available at: http://ghdx.healthdata.org/us-data
All source data, except for the ACS data, are included in the repository. Please see the makefile for the order in which the .do and .R scripts should be run. Some directory paths in the .R and .do files will need to be adjusted if you run them yourself. 'make all' will run all code from start to finish, but the ACS data must be downloaded on your own and some paths changed first.