Skip to content

serghiou/clinicaltrialr

Repository files navigation

clinicaltrialr

Overview

ClinicalTrials.gov provides a fantastic resource of data on all registered clinical trials. However, even though it also provides API access, it is not possible to easily create a readily analysable database with all study fields of interest. This package aims to make using data from ClinicalTrials.gov as easy as it could be, by providing the functions required to extract all data for all studies of interest. It does so by creating an interface to the old ClinicalTrials.gov API, which can be found here; more in the alternative sources below.

Usage

  1. Install package.

    # Get all packages necessary for this example
    pkgs <- c("devtools", "magrittr", "pbapply", "dplyr")
    missing_pkgs <- pkgs[!(pkgs %in% installed.packages()[,"Package"])]
    if(length(missing_pkgs)) install.packages(missing_pkgs)
    
    # Download the clinicaltrialr package and all dependencies
    devtools::install_github("serghiou/clinicaltrialr")
  2. Build a query using Advanced Search and copy the URL, e.g. http://www.clinicaltrials.gov/ct2/results?cond=Heart+Failure.

  3. Download the results table corresponding to the copied link.

    library(clinicaltrialr)
    results <- ct_read_results("http://www.clinicaltrials.gov/ct2/results?cond=Heart+Failure")
  4. Download all records and construct a dataframe.

    # Install and load pbapply to parallelize this step
    if (!('pbapply' %in% installed.packages()[,"Package"])) install.packages("pbapply")
    
    # Extract data from each trial (this is time-consuming)
    # (note that you may need to use a different cl number if your CPU has less than 8 cores)
    trials_list <- pbapply::pblapply(results$`NCT Number`, ct_read_trial, cl = 7)
    trials <- dplyr::bind_rows(trials_list)
  5. Re-extract values for which the algorithm was not allowed acccess to the website.

    missing_idx <- grep("Error in open", trials_list)
    missing_nct <- results$`NCT Number`[missing_idx]
    missing_doc <- pbapply::pblapply(missing_nct, read_trials, cl = 7)
    trials_list[missing_idx] <- missing_doc
    trials <- dplyr::bind_rows(trials_list)
  6. Save as CSV in a folder called "output".

    write_csv(trials, "../output/trial-records.csv")

Alternatives

Acknowledgements

  • This package was built using information provided by ClinicalTrials.gov here.

  • This package would not have been possible without the xml2 package.

  • All I know about building packages I owe to Hadley Wickham and Jennifer Bryan's book!

TODO

This release only contains the basic functions to download and extract popular fields of studies on ClinicalTrials.gov. The following functions would also be great to have and contributions by anyone with the time and interest to enhance this package are more than welcome!

  1. Migrate the current functions to the new API, rather than the old one.
  2. A function to download all records of interest in bulk and process that XML file, rather than downloading one at a time.
  3. A function to download all of ClinicalTrials.gov.
  4. A function to extract even more fields from each study record.
  5. Tidy up complex fields, such as primary and secondary outcomes, which can be placed within lists.
  6. Functions to help with data post-processing (e.g. identify cluster trials, overall survival outcomes, etc.)
  7. Additional functions (e.g. should this have already been published, etc.)

About

clinicaltrialr: Import study records from ClinicalTrials.gov

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages