Purpose

Under active development; not ready for use

This repository contains scripts for reading, transforming and combining datasets relevant to the analysis of behavioral health and social determinants of health (SDoH). The analysis targets the local neighborhood level, using ‘census tract’ as a proxy for ‘neighborhood’, and, where necessary, the county level.

Datasets

This list includes datasets that are available at one or more of the following levels of aggregation:

Address (allows for geocoding and attribution of location to census tract)
Census Tract
County

These lower levels of aggregation can be rolled up to state level using FIPS codes.
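As an illustration of that roll-up, here is a minimal sketch in Python (an assumption made for illustration; the repository's own scripts appear to be written in R). It relies on the standard layout of an 11-digit census tract GEOID: two digits of state FIPS, three of county FIPS and six of tract code. The function names `split_geoid` and `roll_up_to_state` and the sample counts are hypothetical.

```python
from collections import defaultdict


def split_geoid(geoid: str) -> dict:
    """Split an 11-digit tract GEOID into state, county and tract codes."""
    if len(geoid) != 11 or not geoid.isdigit():
        raise ValueError(f"expected an 11-digit tract GEOID, got {geoid!r}")
    return {"state": geoid[:2], "county": geoid[2:5], "tract": geoid[5:]}


def roll_up_to_state(tract_counts: dict) -> dict:
    """Sum tract-level counts by the two-digit state FIPS prefix."""
    totals = defaultdict(int)
    for geoid, count in tract_counts.items():
        totals[split_geoid(geoid)["state"]] += count
    return dict(totals)


# Made-up counts keyed by tract GEOID (state FIPS 26 = Michigan, 39 = Ohio).
example_counts = {"26081000100": 3, "26081000200": 5, "39049000100": 2}
state_totals = roll_up_to_state(example_counts)
```

Summing is only appropriate for counts. Statistics such as means or medians would need population weights or the underlying records to be rolled up correctly.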

The list of datasets is tracked in the .csv file located in the data folder, with more specific documentation found below as issues are identified. Once a dataset has been completed, please push a commit marking its complete field in the .csv file as TRUE. There are currently 13 datasets completed for inclusion.

topic	datalink	publisher	county	tract	address
Behavioral	Behavioral Risk Factor Surveillance System	CDC	x
Commuting	Longitudinal Employer-Household Dynamics	US Census Bureau Center for Economic Studies	x	x
COVID-19	COVID-19 Cases/Deaths by US County	New York Times	x
COVID-19	COVID-19 County Projections	Columbia University	x
Density	Rural-Urban Commuting Area Codes	USDA	x	x
Economic	Location Affordability Index	Office of the Secretary of Transportation
Economic	Low-Income Housing Tax Credit	HUD		x
Food	Food Environment Atlas	USDA	x
Health	County Health Rankings	U Wisc	x
Health	WONDER Database	CDC
Health	500 Cities: Census Tract-level Data	CDC		x
Healthcare	Geographic Variation Public Use File	CMS	x
Opportunity	Neighborhood Atlas	U Wisc	x	x
Opportunity	All Outcomes by Race, Gender and Parental Income Percentile	Opportunity Insights	x	x
Opportunity	Neighborhood Characteristics	Opportunity Insights	x	x
Provider	Adult Foster Care Homes	LARA			x
Provider	Area Health Resources Files	HRSA	x
Provider	Health Professional Shortage Areas (HPSA) Mental Health	HRSA	x	x	x
Provider	Health Professional Shortage Areas (HPSA) Primary Care	HRSA	x	x	x
Provider	Medically Underserved Areas/Populations (MUA/P)	HRSA	x	x	x
Provider	NPPES	CMS			x
Provider	Inpatient Psychiatric Facility Quality Measure Data by Facility	CMS			x
Provider	Hospitals	HIFLD			x
Provider	Prison Boundaries	HIFLD			x
Provider	Local Law Enforcement Locations	HIFLD			x
Provider	Nursing Homes	HIFLD			x
Provider	Incarceration Trends by County	VERA	x
Provider	Behavioral Health Treatment Facility Listing	SAMHSA	x	x
Social	Social Vulnerability Index	CDC	x	x
Social	Eviction Lab	Princeton U	x	x
Various	American Community Survey	US Census Bureau	x	x
Various	National Neighborhood Data Archive	NANDA	x	x
Vital	National Vital Statistics System	CDC	x
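The complete field in the tracking file can be tallied with a few lines of code. The sketch below is in Python (an assumption; the repository's scripts appear to be R) and reads an in-memory sample instead of the real file, whose name is not given here. The column layout of the sample and the TRUE/FALSE values in it are made up for illustration and do not reflect the actual status of any dataset.

```python
import csv
import io

# Illustrative sample only: the complete values below are invented.
SAMPLE_CSV = """topic,datalink,publisher,complete
Food,Food Environment Atlas,USDA,TRUE
Health,County Health Rankings,U Wisc,FALSE
Social,Social Vulnerability Index,CDC,TRUE
"""


def completed_datasets(csv_text: str) -> list:
    """Return the datalink names of rows whose complete field is TRUE."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row["datalink"]
        for row in reader
        if row["complete"].strip().upper() == "TRUE"
    ]


done = completed_datasets(SAMPLE_CSV)
```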

Processing and Format

Each level of aggregation in the data has its own output file format.

Census Tract Dataset

The following fields must be included in all files:

dataset: A shortened name of the dataset, to allow for subsetting when datasets are combined.
state: Two-digit 2010 state FIPS code.
county: Three-digit 2010 county FIPS code.
tract: Six-digit 2010 tract FIPS code.
year: The year of the published dataset.
race: Should be marked as pooled where data is not broken out by race. Should be marked as NA when the variable is not related to a population metric, such as in a count of facilities.
gender: Should be marked as pooled where data is not broken out by gender. Should be marked as NA when the variable is not related to a population metric, such as in a count of facilities.
age_range: Should be marked as pooled where data is not broken out by age range. Should be marked as NA when the variable is not related to a population metric, such as in a count of facilities.
var_name: The name of the variable/metric being reported.
value: The numeric value of the measure identified in var_name.
stat_type: The type of summary statistic being reported in value. For example: n, mean, se, median, etc.
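The field list above describes a long format with one row per variable, statistic and demographic breakdown. As a rough sketch, written in Python as an assumption since the repository's scripts appear to be R, a row could be checked against these requirements as follows. The helper `check_tract_row`, the dataset short name `svi` and the variable name `example_metric` are hypothetical.

```python
REQUIRED_TRACT_FIELDS = [
    "dataset", "state", "county", "tract", "year", "race",
    "gender", "age_range", "var_name", "value", "stat_type",
]

# Expected number of digits in each geographic code.
FIPS_WIDTHS = {"state": 2, "county": 3, "tract": 6}


def check_tract_row(row: dict) -> list:
    """Return a list of problems found in one tract-level row."""
    problems = [
        f"missing field: {name}"
        for name in REQUIRED_TRACT_FIELDS
        if name not in row
    ]
    for name, width in FIPS_WIDTHS.items():
        code = row.get(name)
        if code is None:
            continue
        # Codes are kept as strings so that leading zeros survive.
        if not (isinstance(code, str) and code.isdigit() and len(code) == width):
            problems.append(f"{name} should be a {width}-digit string, got {code!r}")
    value = row.get("value")
    if value is not None and (
        isinstance(value, bool) or not isinstance(value, (int, float))
    ):
        problems.append(f"value should be numeric, got {value!r}")
    return problems


# A made-up row: data not broken out by race, gender or age range.
example_row = {
    "dataset": "svi", "state": "26", "county": "081", "tract": "000100",
    "year": 2018, "race": "pooled", "gender": "pooled",
    "age_range": "pooled", "var_name": "example_metric",
    "value": 0.42, "stat_type": "mean",
}
problems = check_tract_row(example_row)
```

Storing the codes as strings is a design choice of this sketch: numeric storage would drop leading zeros, turning state `06` into `6`.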

County-level Dataset

County-level files should include every field required for census tract files, except the tract variable.

Address Dataset

Address-level datasets should include the following fields:

dataset: A shortened name of the dataset, to allow for subsetting when datasets are combined.
state:
county:
tract: The census tract within which the address is located, obtained by using the TBDfun::census_tract function.
address:
lat, lon: Geocoded latitude and longitude coordinates of address
year: The year of the published dataset.
...: Other variables specific to the dataset, which may be of value to retain, though these will not be aggregated in the tract or county-level data.
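One way address-level records might feed the tract-level format is as a count of facilities per tract. The sketch below is in Python (an assumption; the repository's scripts appear to be R) and assumes that the state, county and tract fields of each address record hold the FIPS-style codes used in the tract-level files. The function name `count_facilities_by_tract`, the variable name `facility_count` and the sample records are hypothetical.

```python
from collections import Counter


def count_facilities_by_tract(address_rows: list, dataset: str, year: int) -> list:
    """Collapse address-level rows into tract-level facility counts."""
    counts = Counter(
        (row["state"], row["county"], row["tract"]) for row in address_rows
    )
    return [
        {
            "dataset": dataset,
            "state": state,
            "county": county,
            "tract": tract,
            "year": year,
            # A facility count is not a population metric, so these are NA.
            "race": "NA",
            "gender": "NA",
            "age_range": "NA",
            "var_name": "facility_count",
            "value": n,
            "stat_type": "n",
        }
        for (state, county, tract), n in sorted(counts.items())
    ]


# Made-up address records; only the geographic fields matter here.
addresses = [
    {"state": "26", "county": "081", "tract": "000100", "address": "Example address A"},
    {"state": "26", "county": "081", "tract": "000100", "address": "Example address B"},
    {"state": "26", "county": "081", "tract": "000200", "address": "Example address C"},
]
tract_rows = count_facilities_by_tract(addresses, dataset="hospitals", year=2020)
```

Dataset-specific variables carried on the address records are dropped in this step, which matches the note above that they are not aggregated.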

Variables

The variables available in the combined datasets are listed in the data dictionary.
