🌐 A collection of tidied, neighborhood-level public datasets

j-hagedorn/locals

Repository files navigation

Under active development; not ready for use

Purpose

This repository contains scripts for reading, transforming and combining datasets that are relevant for analysis of behavioral health and social determinants of health (SDoH) at the local neighborhood level (using ‘census tract’ as a proxy for ‘neighborhood’) and, where necessary, county level.

Datasets

This list includes datasets which are available at one or more of the following levels of aggregation:

  • Address (allows for geocoding and attribution of location to census tract)
  • Census Tract
  • County

These lower levels of aggregation can be rolled up to state level using FIPS codes.
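As a minimal sketch of such a roll-up, the Python snippet below sums tract-level counts to county and state level by grouping on the leading FIPS components. The rows and the `roll_up` helper are hypothetical and made up for illustration; the repository's own scripts may use a different language and approach. Summing is only appropriate for counts, not for means or rates.

```python
from collections import defaultdict

# Hypothetical tract-level counts: (state, county, tract, value).
# FIPS codes are kept as strings so leading zeros are preserved.
rows = [
    ("26", "081", "000100", 12),
    ("26", "081", "000200", 8),
    ("26", "077", "000100", 5),
    ("39", "049", "000100", 7),
]

def roll_up(rows, level):
    """Sum values by the FIPS prefix that identifies the requested level."""
    key_len = {"state": 1, "county": 2, "tract": 3}[level]
    totals = defaultdict(int)
    for row in rows:
        *fips, value = row
        # Concatenate state (+ county (+ tract)) into a single key.
        totals["".join(fips[:key_len])] += value
    return dict(totals)

county_totals = roll_up(rows, "county")
state_totals = roll_up(rows, "state")
```

Here `county_totals` is keyed by the five-digit state-plus-county code and `state_totals` by the two-digit state code.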

The list of datasets is tracked in the .csv file located in the data folder, with more specific documentation found below as issues are identified. When you finish processing a dataset, please push a commit marking its complete field in the .csv file as TRUE. There are currently 13 datasets completed for inclusion.

topic datalink publisher county tract address
Behavioral Behavioral Risk Factor Surveillance System CDC x
Commuting Longitudinal Employer-Household Dynamics US Census Bureau Center for Economic Studies x x
COVID-19 COVID-19 Cases/Deaths by US County New York Times x
COVID-19 COVID-19 County Projections Columbia University x
Density Rural-Urban Commuting Area Codes USDA x x
Economic Location Affordability Index Office of the Secretary of Transportation
Economic Low-Income Housing Tax Credit HUD x
Food Food Environment Atlas USDA x
Health County Health Rankings U Wisc x
Health WONDER Database CDC
Health 500 Cities: Census Tract-level Data CDC x
Healthcare Geographic Variation Public Use File CMS x
Opportunity Neighborhood Atlas U Wisc x x
Opportunity All Outcomes by Race, Gender and Parental Income Percentile Opportunity Insights x x
Opportunity Neighborhood Characteristics Opportunity Insights x x
Provider Adult Foster Care Homes LARA x
Provider Area Health Resources Files HRSA x
Provider Health Professional Shortage Areas (HPSA) Mental Health HRSA x x x
Provider Health Professional Shortage Areas (HPSA) Primary Care HRSA x x x
Provider Medically Underserved Areas/Populations (MUA/P) HRSA x x x
Provider NPPES CMS x
Provider Inpatient Psychiatric Facility Quality Measure Data by Facility CMS x
Provider Hospitals HIFLD x
Provider Prison Boundaries HIFLD x
Provider Local Law Enforcement Locations HIFLD x
Provider Nursing Homes HIFLD x
Provider Incarceration Trends by County VERA x
Provider Behavioral Health Treatment Facility Listing SAMHSA x x
Social Social Vulnerability Index CDC x x
Social Eviction Lab Princeton U x x
Various American Community Survey US Census Bureau x x
Various National Neighborhood Data Archive NANDA x x
Vital National Vital Statistics CDC x

Processing and Format

Each level of aggregation in the data has its own output file format.

Census Tract Dataset

The following fields must be included in all files:

  • dataset: A shortened name of the dataset, to allow for subsetting when datasets are combined.
  • state: Two-digit state 2010 FIPS code
  • county: Three-digit county 2010 FIPS code
  • tract: Six-digit tract 2010 FIPS code
  • year: The year of the published dataset.
  • race: Should be marked as pooled where data is not broken out by race. Should be marked as NA when the variable is not related to a population metric, such as in a count of facilities.
  • gender: Should be marked as pooled where data is not broken out by gender. Should be marked as NA when the variable is not related to a population metric, such as in a count of facilities.
  • age_range: Should be marked as pooled where data is not broken out by age range. Should be marked as NA when the variable is not related to a population metric, such as in a count of facilities.
  • var_name: The name of the variable/metric being reported.
  • value: The numeric value of the measure identified in var_name.
  • stat_type: The type of summary statistic being reported in value. For example: n, mean, se, median, etc.
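One way to check a record against the field list above is sketched below in Python. The `check_tract_record` helper, the example record, and its `var_name` are hypothetical; NA is represented as `None` for this illustration, and the check assumes FIPS codes are stored as zero-padded strings.

```python
REQUIRED_FIELDS = [
    "dataset", "state", "county", "tract", "year", "race",
    "gender", "age_range", "var_name", "value", "stat_type",
]
# Expected number of digits for each FIPS component.
FIPS_WIDTHS = {"state": 2, "county": 3, "tract": 6}

def check_tract_record(record):
    """Return a list of problems found in one tract-level record."""
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in record:
            problems.append(f"missing field: {field}")
    for field, width in FIPS_WIDTHS.items():
        code = record.get(field)
        if code is None:
            continue
        if not (isinstance(code, str) and code.isdigit() and len(code) == width):
            problems.append(f"{field} must be a {width}-digit string")
    value = record.get("value")
    if value is not None:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append("value must be numeric")
    return problems

# Hypothetical facility count, so race, gender and age_range are NA (None).
good = {
    "dataset": "example", "state": "26", "county": "081", "tract": "000100",
    "year": 2020, "race": None, "gender": None, "age_range": None,
    "var_name": "facility_count", "value": 3, "stat_type": "n",
}
bad = dict(good, tract="100")  # leading zeros were lost

good_problems = check_tract_record(good)
bad_problems = check_tract_record(bad)
```

Keeping the codes as strings matters because numeric types drop leading zeros, which breaks joins on FIPS codes.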

County-level Dataset

County-level files should include every field required for the census tract dataset except the tract variable.

Address Dataset

Address-level datasets should include the following fields:

  • dataset: A shortened name of the dataset, to allow for subsetting when datasets are combined.
  • state:
  • county:
  • tract: The census tract within which the address is located, obtained by using the TBDfun::census_tract function.
  • address:
  • lat, lon: Geocoded latitude and longitude coordinates of the address.
  • year: The year of the published dataset.
  • ...: Other variables specific to the dataset, which may be of value to retain, though these will not be aggregated in the tract or county-level data.
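As an illustration of how address-level rows could feed the tract-level format, the Python sketch below counts addresses per tract and emits rows with `stat_type` set to `n` and the demographic fields set to NA (`None` here). This is an assumption about one possible workflow, not the repository's documented method; the `count_by_tract` helper, the sample addresses, and the `var_name` are hypothetical, and the tract is assumed to be stored as a six-digit code.

```python
from collections import Counter

# Hypothetical geocoded address rows.
addresses = [
    {"dataset": "example_facilities", "state": "26", "county": "081",
     "tract": "000100", "address": "1 Example St", "lat": 42.96,
     "lon": -85.67, "year": 2020},
    {"dataset": "example_facilities", "state": "26", "county": "081",
     "tract": "000100", "address": "2 Example St", "lat": 42.97,
     "lon": -85.66, "year": 2020},
    {"dataset": "example_facilities", "state": "26", "county": "081",
     "tract": "000200", "address": "3 Example Ave", "lat": 42.95,
     "lon": -85.65, "year": 2020},
]

def count_by_tract(address_rows, var_name):
    """Count addresses per dataset, tract and year as tract-level rows."""
    counts = Counter(
        (r["dataset"], r["state"], r["county"], r["tract"], r["year"])
        for r in address_rows
    )
    return [
        {
            "dataset": dataset, "state": state, "county": county,
            "tract": tract, "year": year,
            # A facility count is not a population metric, so these are NA.
            "race": None, "gender": None, "age_range": None,
            "var_name": var_name, "value": n, "stat_type": "n",
        }
        for (dataset, state, county, tract, year), n in sorted(counts.items())
    ]

tract_rows = count_by_tract(addresses, "facility_count")
```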

Variables

A list of the variables available in the combined datasets can be found in the data dictionary.
