Excellent healthcare data sources

Electronic Health Records

MIMIC-III (Medical Information Mart for Intensive Care)

  • Cost: Free
  • Size: 49,785 hospital admissions of 38,597 adult patients
  • Requires: Application, human subjects research course, turnaround of several weeks
  • Includes: demographics, vitals, labs, procedures, medications, nursing notes, imaging reports, mortality events
  • Reference: https://doi.org/10.13026/C2XW26
  • Access: https://mimic.mit.edu/docs/gettingstarted
  • Temporal coverage: 2001 through 2012
  • Source: Single Institution - Beth Israel Deaconess Medical Center
  • Owner: MIT Laboratory for Computational Physiology
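
The Size line above reports admissions and patients separately because one patient can be admitted more than once. A minimal sketch of how both counts could be derived from an admissions table is shown here. It assumes a CSV with one row per admission and the columns `subject_id` (patient) and `hadm_id` (admission). Those column names and the sample rows are assumptions for illustration, so check the dataset documentation for the actual schema.

```python
import csv
import io


def summarize_admissions(csv_text):
    """Return (distinct admissions, distinct patients) from CSV text.

    Assumes one row per admission with 'subject_id' and 'hadm_id'
    columns; these names are an assumption, not a confirmed schema.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    admissions = set()
    patients = set()
    for row in reader:
        admissions.add(row["hadm_id"])
        patients.add(row["subject_id"])
    return len(admissions), len(patients)


# Tiny made-up sample, not real patient data.
sample = """subject_id,hadm_id
1,100
1,101
2,200
"""

n_admissions, n_patients = summarize_admissions(sample)
print(f"{n_admissions} admissions of {n_patients} patients")
```

Counting distinct identifiers rather than rows keeps the result correct even if an extract repeats an admission across rows.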

SYNTHEA

Medical Images

MIDRC (Medical Imaging and Data Resource Center)

  • Cost: Free
  • Size: 29,630 imaging studies of 10,835 patients (and growing)
  • Labels: COVID
  • Requires: No application
  • Includes: imaging, demographics
  • Reference: https://doi.org/10.1117/1.JMI.8.S1.010902
  • Access: https://data.midrc.org/DD
  • Temporal coverage: 2019 onwards
  • Source: Multiple Institutions
  • Owner: Center for Translational Data Science @ University of Chicago

CheXpert

National Institutes of Health Chest X-ray

  • Cost: Free
  • Size: 108,948 chest radiographs of 32,717 patients
  • Labels: 8 different pathologies
  • Requires: No application
  • Includes: imaging
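  • Reference: Wang, X., Peng, Y., Lu, L., Lu, Z., Bagheri, M., Summers, R. M. "ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 2097-2106.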
  • Access: https://stanfordmlgroup.github.io/competitions/chexpert/
  • Source: Multiple Institutions
  • Owner: National Institutes of Health

Medical Surveys

National Health Interview Survey

  • Cost: Free
  • Size: Varies, roughly 50,000 to 100,000 responses annually
  • Requires: No application
  • Includes: self-reported health outcomes, demographics, socioeconomic data
  • Reference: https://doi.org/10.18128/D070.V6.4
  • Access: https://nhis.ipums.org/
  • Temporal coverage: 1963 through present
  • Source: Census Microdata
  • Owner: IPUMS

Medical Claims Databases

Humana Synthetic data

  • Cost: Free
  • Requires: No application for the sample (but a company application for the entire extract)
  • Includes: synthetic medical and pharmacy billing data for 1.5 million patients
  • Access: https://developers.humana.com/syntheticdata
  • Owner: Humana

Medicare Claims Synthetic Public Use Files

  • Cost: Free
  • Requires: No application
  • Includes: synthetic billing data that mirrors the limited CMS datasets
  • Reference: Borton, J. M., et al. "Data Entrepreneurs’ Synthetic PUF: A Working PUF as an Alternative to Traditional Synthetic and Non-synthetic PUFs." JSM Proceedings, Survey Research Methods Section (2010).
  • Access: https://www.cms.gov/Research-Statistics-Data-and-Systems/Downloadable-Public-Use-Files/SynPUFs/DE_Syn_PUF
  • Temporal coverage: 2008 through 2010
  • Source: Multiple institutions
  • Owner: Centers for Medicare and Medicaid Services

IBM MarketScan

National Inpatient Sample

  • Cost: $150 to $1,000 per year, depending on your situation and the data selected
  • Size: Very large; approximates a 20-percent stratified sample of all hospital discharges
  • Requires: Application and a 15-minute training course
  • Includes: fine-grained billing details, hospital-reported health outcomes, demographics
  • Access: https://www.distributor.hcup-us.ahrq.gov/Databases.aspx
  • Temporal coverage: 2012 through present
  • Source: Multiple Institutions
  • Owner: Healthcare Cost and Utilization Project
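
Because the data approximate a 20-percent stratified sample, each sampled discharge stands in for roughly five discharges, and totals or averages are normally computed with per-record weights rather than raw row counts. The sketch below illustrates that idea on made-up records. The field names `discharge_weight` and `total_charges` are hypothetical placeholders, so consult the HCUP documentation for the actual weight variable and the recommended estimation procedure.

```python
def estimated_total_discharges(records, weight_key="discharge_weight"):
    """Estimate the population discharge count by summing sample weights."""
    return sum(r[weight_key] for r in records)


def weighted_mean(records, value_key, weight_key="discharge_weight"):
    """Weighted mean of value_key, using each record's sample weight."""
    total_weight = sum(r[weight_key] for r in records)
    if total_weight == 0:
        raise ValueError("weights sum to zero; cannot compute a mean")
    weighted_sum = sum(r[value_key] * r[weight_key] for r in records)
    return weighted_sum / total_weight


# Made-up records; field names are hypothetical, not the real schema.
sample = [
    {"total_charges": 10000.0, "discharge_weight": 5.0},
    {"total_charges": 20000.0, "discharge_weight": 5.0},
    {"total_charges": 30000.0, "discharge_weight": 4.0},
]

print(estimated_total_discharges(sample))
print(weighted_mean(sample, "total_charges"))
```

This sketch covers point estimates only. Standard errors for a stratified design need the stratum and cluster identifiers as well, which is outside what this example attempts.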

Summary Statistics

Dartmouth Atlas of Health Care

  • Cost: Free
  • Requires: No application or restrictions
  • Includes: medical reimbursement rates, access to care, mortality, hospital capacity
  • Access: https://data.dartmouthatlas.org/#rates
  • Temporal coverage: 2011 through 2019
  • Source: Multiple institutions
  • Owner: Dartmouth