kmdata: A Database of Reconstructed Individual Patient Level Data from 153 Oncology Clinical Trials

kmdata

This R[1] package contains a database of 304 reconstructed, patient-level clinical trial datasets on multiple survival endpoints, including overall and progression-free survival.

The data have been extracted from Kaplan-Meier (KM) curves reported in 153 oncology Phase III clinical trial publications.

Studies were identified through a PubMed search of clinical trials in breast, lung, prostate, and colorectal cancer published between 2014 and 2016.

For each trial that met our search criteria, we extracted and curated study-level information.

All reported KM survival curves were digitized with the software DigitizeIt.

The digitized KM survival curves were used to estimate (possibly censored) patient-level event times using the Guyot algorithm[2].
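
To give a feel for why patient-level data can be recovered from a curve at all, here is a minimal sketch of the idea behind the reconstruction. It is not the Guyot algorithm itself, which also estimates censoring from the reported numbers at risk. The sketch assumes no censoring and a known starting sample size, and the curve coordinates are made up for illustration. It inverts the KM product-limit step, S(t_i) = S(t_{i-1}) * (1 - d_i / n_i), to get the number of events d_i at each time.

```r
# Hypothetical digitized curve: survival probabilities read off at three times
time <- c(1, 2, 3)
surv <- c(0.9, 0.8, 0.7)
n0 <- 10  # assumed number at risk at time 0 (no censoring in this toy example)

n_risk <- n0
prev_surv <- 1
events <- numeric(length(time))

for (i in seq_along(time)) {
  # Invert the KM step: d_i = n_i * (1 - S_i / S_{i-1}), rounded to a whole count
  events[i] <- round(n_risk * (1 - surv[i] / prev_surv))
  n_risk <- n_risk - events[i]
  prev_surv <- surv[i]
}

# One row per reconstructed patient who had an event
toy_ipd <- data.frame(time = rep(time, events), event = 1)
events
```

With real published curves, censoring has to be inferred as well, which is what the at-risk table passed to the package's `ipd` function is used for.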

References

[1] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing; 2018. https://www.r-project.org

[2] Guyot P, Ades AE, Ouwens MJNM, Welton NJ. Enhanced secondary analysis of survival data: Reconstructing the data from published Kaplan-Meier survival curves. BMC Med Res Methodol. 2012;12:9. doi:10.1186/1471-2288-12-9


Installation

# install.packages('devtools')
devtools::install_github('raredd/kmdata', build_vignettes = TRUE)

Getting started

View a list of the data available in the kmdata package:

library('kmdata')
data(package = 'kmdata')
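
If a searchable table is more convenient than the listing window, the result of `data()` can be turned into a data frame. The sketch below uses only base R; `list_datasets` is a hypothetical helper name, not a function from kmdata, and the kmdata call is guarded in case the package is not installed.

```r
# Hypothetical helper: return the datasets shipped with a package as a data frame
list_datasets <- function(pkg) {
  res <- data(package = pkg)$results  # character matrix with Item and Title columns
  data.frame(name = res[, 'Item'], title = res[, 'Title'],
             stringsAsFactors = FALSE)
}

# Only run against kmdata when it is installed
if (requireNamespace('kmdata', quietly = TRUE)) {
  head(list_datasets('kmdata'))
}
```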

Or generate individual patient data (IPD) from digitized survival curve data using the method described by Guyot:

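library('survival')

# Fit a KM curve to the survival::cancer data; it stands in for a digitized curve
sf <- survfit(Surv(time, status) ~ 1, cancer)
# Numbers at risk every 100 days, as in a published at-risk table
aa <- summary(sf, times = 0:8 * 100)

# Reconstruct IPD from the curve coordinates and the at-risk table
pt <- ipd(sf$time, sf$surv, aa$time, aa$n.risk)
# Refit a KM curve to the reconstructed data
pf <- survfit(Surv(time, event) ~ 1, pt)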

Compare:

sf
# Call: survfit(formula = Surv(time, status) ~ 1, data = cancer)
# 
#       n  events  median 0.95LCL 0.95UCL 
#     228     165     310     285     363 
pf
# Call: survfit(formula = Surv(time, event) ~ 1, data = pt)
# 
#       n  events  median 0.95LCL 0.95UCL 
#     228     163     310     285     363 
summary(sf, times = 0:5 * 100)
# Call: survfit(formula = Surv(time, status) ~ 1, data = cancer)
# 
# time n.risk n.event survival std.err lower 95% CI upper 95% CI
#    0    228       0   1.0000  0.0000       1.0000        1.000
#  100    196      31   0.8640  0.0227       0.8206        0.910
#  200    144      41   0.6803  0.0311       0.6219        0.744
#  300     92      29   0.5306  0.0346       0.4669        0.603
#  400     57      25   0.3768  0.0358       0.3128        0.454
#  500     41      12   0.2933  0.0351       0.2320        0.371
summary(pf, times = 0:5 * 100)
# Call: survfit(formula = Surv(time, event) ~ 1, data = pt)
# 
# time n.risk n.event survival std.err lower 95% CI upper 95% CI
#    0    228       0    1.000  0.0000       1.0000        1.000
#  100    196      31    0.864  0.0228       0.8201        0.909
#  200    144      40    0.681  0.0314       0.6221        0.745
#  300     92      29    0.531  0.0348       0.4672        0.604
#  400     57      25    0.379  0.0358       0.3145        0.456
#  500     41      13    0.290  0.0349       0.2295        0.367
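
The two summaries can also be compared side by side. The following is a sketch, not part of the package: `compare_km` is a hypothetical helper that evaluates two `survfit` objects at the same time points and reports the absolute difference in estimated survival. The reconstruction step is guarded so the block still runs when kmdata is not installed.

```r
library(survival)

# Hypothetical helper: survival estimates from two fits at common time points
compare_km <- function(fit1, fit2, times) {
  s1 <- summary(fit1, times = times, extend = TRUE)$surv
  s2 <- summary(fit2, times = times, extend = TRUE)$surv
  data.frame(time = times, surv1 = s1, surv2 = s2, abs_diff = abs(s1 - s2))
}

# Original fit, as in the example above
sf <- survfit(Surv(time, status) ~ 1, data = cancer)

# Reconstructed fit; requires the kmdata package
if (requireNamespace('kmdata', quietly = TRUE)) {
  aa <- summary(sf, times = 0:8 * 100)
  pt <- kmdata::ipd(sf$time, sf$surv, aa$time, aa$n.risk)
  pf <- survfit(Surv(time, event) ~ 1, data = pt)
  print(compare_km(sf, pf, 0:5 * 100))
}
```

Small values in the `abs_diff` column indicate that the reconstructed data reproduce the original curve closely at those times.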

Additional resources

See the kmdata intro vignette for more information about generating IPD from digitized survival curves.
