clinicaltrialr

Overview

ClinicalTrials.gov is an excellent source of data on all registered clinical trials. However, even though it provides API access, it is not easy to build a readily analysable database containing all study fields of interest. This package aims to make working with ClinicalTrials.gov data as easy as possible by providing the functions needed to extract all data for all studies of interest. It does so by interfacing with the old ClinicalTrials.gov API; other sources are listed under Alternatives below.

Usage

Install package.

# Get all packages necessary for this example
pkgs <- c("devtools", "magrittr", "pbapply", "dplyr")
missing_pkgs <- pkgs[!(pkgs %in% installed.packages()[,"Package"])]
if(length(missing_pkgs)) install.packages(missing_pkgs)

# Download the clinicaltrialr package and all dependencies
devtools::install_github("serghiou/clinicaltrialr")

Build a query using Advanced Search and copy the URL, e.g. http://www.clinicaltrials.gov/ct2/results?cond=Heart+Failure.

Download the results table corresponding to the copied link.

library(clinicaltrialr)
results <- ct_read_results("http://www.clinicaltrials.gov/ct2/results?cond=Heart+Failure")
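
For a simple condition lookup, the URL can also be assembled in R instead of being copied from the browser. The sketch below assumes only the cond parameter that appears in the example URL above; build_results_url is a hypothetical helper, not a clinicaltrialr function. It only converts spaces to "+", so more complex Advanced Search queries are still best copied from the website.

```r
# Hypothetical helper (not part of clinicaltrialr): build a results URL
# for a single condition, mirroring the example URL used above.
build_results_url <- function(condition) {
  base_url <- "http://www.clinicaltrials.gov/ct2/results"
  # Encode spaces as "+", as in the copied example URL
  paste0(base_url, "?cond=", gsub(" ", "+", condition, fixed = TRUE))
}

query_url <- build_results_url("Heart Failure")
```

The resulting string could then be passed to ct_read_results() in place of the copied link.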

Download all records and construct a dataframe.

# Install pbapply if it is missing; it parallelizes this step and shows progress
if (!('pbapply' %in% installed.packages()[,"Package"])) install.packages("pbapply")

# Extract data from each trial (this is time-consuming)
# (note that you may need a smaller cl value if your CPU has fewer than 8 cores)
trials_list <- pbapply::pblapply(results$`NCT Number`, ct_read_trial, cl = 7)
trials <- dplyr::bind_rows(trials_list)
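
Rather than hard-coding cl = 7, the worker count can be derived from the machine. This is a minimal sketch using parallel::detectCores() from base R; leaving one core free is a convention assumed here, not something the package requires, and n_workers is just an illustrative name.

```r
# Sketch: pick a worker count from the available cores.
# detectCores() can return NA on some platforms, so fall back to 1 worker.
n_cores <- parallel::detectCores()
n_workers <- if (is.na(n_cores)) 1L else max(1L, n_cores - 1L)

# n_workers could then replace the literal 7 in the pblapply() call above:
# trials_list <- pbapply::pblapply(results$`NCT Number`, ct_read_trial, cl = n_workers)
```

Note that, according to the pbapply documentation, an integer cl is ignored on Windows, where a cluster object from parallel::makeCluster() is needed instead.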

Re-extract the records for which the request to the website was denied.

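# Find the list elements that hold a connection error instead of a record
missing_idx <- grep("Error in open", trials_list)
missing_nct <- results$`NCT Number`[missing_idx]
# Retry with the same extraction function used above
missing_doc <- pbapply::pblapply(missing_nct, ct_read_trial, cl = 7)
trials_list[missing_idx] <- missing_doc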
trials <- dplyr::bind_rows(trials_list)
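
If a single retry pass still leaves gaps, each request can be wrapped so that it is attempted several times before giving up. The following is a sketch only: retry_call and flaky_reader are hypothetical names introduced for illustration, and the placeholder ID is not a real NCT number. flaky_reader simulates a download that fails twice so that the example runs without network access.

```r
# Hypothetical helper (not part of clinicaltrialr): call f(x) up to `times`
# times, pausing `wait` seconds between failed attempts.
retry_call <- function(f, x, times = 3, wait = 1) {
  result <- NULL
  for (attempt in seq_len(times)) {
    result <- tryCatch(f(x), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    if (attempt < times) Sys.sleep(wait)
  }
  result  # still an error condition after the final attempt
}

# Stand-in for a flaky download: fails twice, then succeeds
attempts <- 0
flaky_reader <- function(id) {
  attempts <<- attempts + 1
  if (attempts < 3) stop("Error in open.connection: could not connect")
  data.frame(nct_id = id, stringsAsFactors = FALSE)
}

record <- retry_call(flaky_reader, "NCT-PLACEHOLDER", wait = 0)
```

In the workflow above, the same wrapper could be applied as function(id) retry_call(ct_read_trial, id) inside pblapply(), assuming ct_read_trial signals an error when the connection fails.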

Save as CSV in a folder called "output".

# write_csv() comes from the readr package (install it first if it is missing)
readr::write_csv(trials, "../output/trial-records.csv")
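
Writing the file fails if the output folder does not exist yet, so it can help to create it first. The sketch below uses only base R and writes a small placeholder data frame to a temporary directory so that it can be run safely; trials_demo and its contents are made up for illustration and stand in for the real trials data frame.

```r
# Create the output folder if needed (here inside a temporary directory)
out_dir <- file.path(tempdir(), "output")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# Placeholder data, not real trial records
trials_demo <- data.frame(nct_id = c("trial-1", "trial-2"),
                          stringsAsFactors = FALSE)

# Write without row names so the file opens cleanly in other tools
out_file <- file.path(out_dir, "trial-records.csv")
utils::write.csv(trials_demo, out_file, row.names = FALSE)
```

For the real workflow, out_dir would point at the "output" folder used above.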

Alternatives

ClinicalTrials.gov offers its own interface to its new API (this package currently uses the old API). It can produce an XML file of all records for the studies of interest, a CSV file with specific fields from those studies, or a CSV file with a single field for those studies. These return at most 100 records, at most 1000 records, or all records, respectively; by stepping through the min_rnk and max_rnk fields, all records of interest can be downloaded in chunks (as XML for the first option, as XML or CSV for the other two).
There is also the rclinicaltrials package. However, (a) it does not support the kind of complex queries that I wanted to run, for which I needed the website's Advanced Search function, and (b) it creates dataframes that are not easy to analyze or to share with others who may be using other platforms.
A list of clinical trial registry scrapers can be found in the opentrials/registers GitHub repository.
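
The chunked download mentioned for the new API comes down to stepping min_rnk and max_rnk through the result set. The sketch below only computes those rank windows; rank_chunks is a hypothetical helper, the record count is an arbitrary example, and the request itself is deliberately left out because the exact endpoint is not described here.

```r
# Hypothetical helper: compute min_rnk/max_rnk windows covering n_records
# results when each request may return at most chunk_size records.
rank_chunks <- function(n_records, chunk_size) {
  stopifnot(n_records >= 1, chunk_size >= 1)
  min_rnk <- seq(1, n_records, by = chunk_size)
  max_rnk <- pmin(min_rnk + chunk_size - 1, n_records)
  data.frame(min_rnk = min_rnk, max_rnk = max_rnk)
}

# Example: 2500 matching studies, 1000 records per request
chunks <- rank_chunks(2500, 1000)
# chunks has three rows: 1-1000, 1001-2000, 2001-2500
```

Each row would correspond to one request, with the results combined afterwards.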

Acknowledgements

This package was built using information provided by ClinicalTrials.gov here.
This package would not have been possible without the xml2 package.
All I know about building packages I owe to Hadley Wickham and Jennifer Bryan's book!

TODO

This release only contains the basic functions to download and extract popular fields of studies on ClinicalTrials.gov. The following functions would also be great to have and contributions by anyone with the time and interest to enhance this package are more than welcome!

Migrate the current functions to the new API, rather than the old one.
A function to download all records of interest in bulk and process that XML file, rather than downloading one at a time.
A function to download all of ClinicalTrials.gov.
A function to extract even more fields from each study record.
Tidy up complex fields, such as primary and secondary outcomes, which can be placed within lists.
Functions to help with data post-processing (e.g. identify cluster trials, overall survival outcomes, etc.)
Additional functions (e.g. should this have already been published, etc.)
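
As an illustration of the list idea for complex fields, a data frame column can hold one vector of outcomes per trial. This is a sketch of one possible layout in base R, not the package's current output; the column name and all values are placeholders.

```r
# Placeholder data, not real trial records
trials_demo <- data.frame(nct_id = c("trial-1", "trial-2"),
                          stringsAsFactors = FALSE)

# One list element per trial; each element can hold any number of outcomes
trials_demo$secondary_outcomes <- list(
  c("outcome A", "outcome B"),
  character(0)
)

# Number of secondary outcomes recorded for each trial
n_outcomes <- lengths(trials_demo$secondary_outcomes)
```

A list-column like this generally has to be flattened (for example by pasting the values together) before the data frame is written to CSV.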
