
tracking medical datasets, with a focus on medical imaging


mateuszbuda/medical-datasets

 
 


List of Medical Datasets

I maintain this list mostly as a personal braindump of interesting medical datasets, with a focus on medical imaging.
Rather than trying to group or cluster datasets, I maintain a set of keywords for each.
See the commit log for a list of additions over time.
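Each entry carries a set of keywords instead of sitting in a fixed group, so the list can be queried by keyword intersection. This is a minimal sketch, not tooling that ships with this repository. `DatasetEntry` and `filter_by_keywords` are hypothetical names. The four entries are short descriptions transcribed from this list. The code requires Python 3.9+.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    # Hypothetical record: a short description plus its keyword set.
    description: str
    keywords: set[str] = field(default_factory=set)


def filter_by_keywords(entries: list[DatasetEntry],
                       required: set[str]) -> list[DatasetEntry]:
    """Return entries whose keywords include every required keyword.

    Matching is case-insensitive, so "X-ray" and "x-ray" are treated alike.
    """
    wanted = {k.lower() for k in required}
    return [e for e in entries
            if wanted <= {k.lower() for k in e.keywords}]


# A few entries transcribed from the list (descriptions shortened).
ENTRIES = [
    DatasetEntry("224,316 chest radiographs of 65,240 patients",
                 {"very-large", "X-ray", "labels"}),
    DatasetEntry("1,370 knee MRI exams with diagnoses",
                 {"large", "MRI", "labels"}),
    DatasetEntry("1500 fully sampled knee MRIs (k-space)",
                 {"large", "MRI", "k-space"}),
    DatasetEntry("~100 segmented CT slices",
                 {"medium", "CT", "segmentations", "covid"}),
]

if __name__ == "__main__":
    # Print every entry tagged with the MRI keyword.
    for entry in filter_by_keywords(ENTRIES, {"mri"}):
        print(entry.description)
```

Keywords are compared case-insensitively because the list mixes spellings such as `X-ray` and `x-ray`. Running the script prints the two knee MRI entries.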

Disclaimer: please remember to solve real clinical problems ☺

Main Medical Imaging List

224,316 chest radiographs of 65,240 patients, with labels from reports
Keywords: very-large, X-ray, labels

100000 radiographs
Keywords: very-large, X-ray, labels

371,920 chest X-rays associated with 227,943 imaging studies
3/16/2019: Not yet linked with MIMIC ICU data. See news article
v2: free-text radiology reports
Need to request access
Keywords: very-large, X-ray, labels

160,000 images from 67,000 patients that were interpreted and reported by radiologists
Labeled with 174 different radiographic findings, 19 differential diagnoses and 104 anatomic locations, organized as a hierarchical taxonomy mapped to the standard Unified Medical Language System (UMLS)
Keywords: very-large, X-ray, labels

Several collections
Many images of various kinds, including CT, MR, pathology and PT (PET), with diagnoses
Keywords: very-large, CT, MR, labels

Part of Cancer Imaging Archive
50000+ patients with CT data, some pathology, limited availability
Keywords: very-large, CT, labels

32000+ CT scans with annotations, meta-data, semantic labels from radiological reports
Keywords: very-large, CT, labels

10,000+ labeled echocardiogram videos with human expert tracings
Keywords: very-large, ultrasound, labels

MRI for 8500 young subjects aged 9-10 (about 4100 for training)
Keywords: large, MRI

Two large-scale neuroimaging datasets on reading and language development
Over 3000 MRI, fMRI
article | more resources
Keywords: large, MRI

414 T1 MRIs from the OASIS dataset, processed using FreeSurfer and SAMSEG
Includes original images, along with processed volumes and resulting anatomical segmentation maps
Keywords: large, MRI, segmentations, labels, annotations, processed

1,370 knee MRI exams with diagnoses (healthy/ACL tear/meniscal tear)
Keywords: large, MRI, labels

k-space data
1500 fully sampled knee MRIs, 10K clinical MRIs and 6.5K brain MRIs
Part of a challenge
Keywords: large, MRI, k-space

Open-Access Multi-Coil k-Space Dataset for Cardiovascular Magnetic Resonance Imaging
k-space data, roughly 250 volumes
Keywords: medium, MRI, k-space

1704 MRIs, 556 amyloid and tau CSF samples, blood markers, genetic info and longitudinal cognitive data on ~400 at-risk individuals
Keywords: medium, MRI, genetics, labels

10 medical image datasets with segmentations
2000+ CT & MR images of various organs from different sources
Keywords: medium, MRI, segmentations

Multiple Acquisitions for Standardization of Structural Imaging Validation and Evaluation
8000 diffusion-weighted volumes
10 3D FLAIR, T1-, and T2-weighted datasets of a single healthy subject
Keywords: large, MRI

1000+ subjects with fMRI and other modalities, with annotated event files; raw and preprocessed data
Keywords: medium, fMRI

List of MRI k-space datasets

Few subjects, but many modalities (T1, T2, SWI, angio, DWI, fMRI during Forrest Gump at 3T (audio+visual+eyetracking+physio) and 7T (audio+physio only), some audio tasks and other important visual tasks)
Keywords: small, multi-modal

LIDC-IDRI consists of diagnostic and lung cancer screening CTs.
1018 cases with some radiologist annotations/segmentations and nodule counts
Also available through LUng Nodule Analysis (LUNA) challenge
Keywords: large, CT, labels

All imaging
Fundus imaging
Keywords: very-large

4703 CXR of COVID19 patients, manually annotated Brixia score
Keywords: large, x-ray, covid

349 CT images collected from several COVID19-related papers
Image captions
Keywords: medium, CT, covid

~150 X-rays, ongoing collection, some hospital data
Keywords: medium, x-ray, covid

~5000 X-rays
Keywords: medium, x-ray, pneumonia

~100 segmented CT slices
Keywords: medium, CT, segmentations, covid

1350+ X-rays, 150+ CTs, 800 diagnoses
Keywords: medium, CT, covid

~250 chest CTs of patients with a positive RT-PCR test for SARS-CoV-2, with annotations of COVID-19 lesions
Keywords: medium, CT, covid, annotations, segmentations

Ongoing collection, about 60 patients at last check, CT
paper pdf
Keywords: medium, CT, covid

1000+ CTs of COVID19 patients
50 are annotated per pixel
Keywords: large, CT, covid, segmentations

1000 X-rays and 240 CTs with annotations (paper)
Keywords: large, CT, covid, segmentations

Various imaging (longitudinal MRI), Genetics, Clinical data
Several thousand patients
Keywords: large, MRI, genetics, clinical

~120 image volumes (whole body CT and MRI images)
more than 1900 annotated anatomical structures
Keywords: medium, MRI, CT, whole-body, manual-segmentation

Apparently 101 manually labelled brain MRIs
Keywords: medium, MRI, brain, manual-segmentation

3000 brain scans (T1w, bold, events)
Standardized tests, scores, demographics
Keywords: large, MRI, fMRI, tests

2600+ scanned film mammography studies
Keywords: large, x-ray

63 manually labelled brain scans. Not free (around $1500?). Discussion
Keywords: medium, MRI, brain, manual-segmentation, costly

A challenge for ISBI 2019

22 participants with cognitive and physiological measures, and 7T rs-fMRI

200+ subjects across several datasets (CTs, Xrays, MRIs)

20 cardiac MR images in Congenital Heart Disease

paper
~50 children (~10 years old) with a single follow-up, with MRI, fMRI and assessments
Keywords: medium, fMRI, longitudinal

paper
3T fMRI of 132 typically developing children, 2 time points, four tasks
Keywords: medium, fMRI, longitudinal

Aggregates auditory story-listening fMRI datasets acquired over roughly seven years
Keywords: medium, fMRI

229 T1-weighted MRI scans (n=220) with lesion segmentation
MNI152 standard-space T1-weighted average structural template image
A .csv file containing lesion metadata
paper
Keywords: medium, MRI, segmentations

21 canine mammary carcinoma whole slide images
Annotated by 2/3 experts
Keywords: small, 2D, whole slide imaging

48 manually annotated in utero fetal MRIs
Keywords: small, MRI, fetal, labels

Single volunteer, 73 sessions at multiple sites over ~17 years
MRI, at least T1 at each session, with other modalities varying by session.
Phenotype file provided
Keywords: small, MRI, longitudinal

Single volume (histological space, 100 micron) with GM/WM surfaces and cortical layers
ftp://bigbrain.loris.ca | interactive
Keywords: small, histology, high-resolution, segmentations

Single volume, ultra-high resolution MRI dataset (100-micron)
Keywords: small, MRI, brain

8 subjects, large-scale fMRI (40 sessions, high sampling, high resolution), plus T1w, T2w and T2*w MRI
Video description
Keywords: small, MRI, brain, fMRI

Ex vivo brain MRIs of different animals
Keywords: small, MRI, brain, animals

Diffusion MRI of three healthy traveling adults
Keywords: small, MRI, diffusion, brain

Prenatal brain MRI samples (apparently a single subject?)
Keywords: small, MRI, fetal

Non-imaging

Predict sepsis in an ICU population
5000 ICU patients in three separate hospital systems

Detailed information about critical care stays for over 200,000 admissions at 200+ hospitals across the US.
Researchers with MIMIC access can access eICU-CRD immediately after signing an updated DUA.
paper

Non-medical but useful / fun

Moments in Time

Other lists or pooling resources (relevant xkcd)
