Title: biotmle: Targeted Learning for Biomarker Discovery

Tags: targeted learning, variable importance, causal inference

Authors: Nima S. Hejazi, Weixin Cai, Alan E. Hubbard

Affiliation: Division of Biostatistics, University of California, Berkeley

Date: 26 July 2017


The biotmle package implements a biomarker discovery methodology based on targeted minimum loss-based estimation (TMLE) [@vdl2011targeted] and a generalization of the moderated t-statistic of @smyth2004linear, designed for use with biological sequencing data (e.g., microarrays, RNA-seq). The approach uses TMLE to evaluate the association between a set of potential biomarkers and another variable of interest while adjusting for potential confounding from a set of user-specified covariates. The software is distributed as a package for the R language for statistical computing [@R].

There are two principal ways in which the biomarker discovery techniques in the biotmle R package can be used: to evaluate the association between (1) a phenotypic measure (say, environmental exposure) and a biomarker of interest, and (2) an outcome of interest (e.g., survival status at a given time) and a biomarker measurement, both while controlling for background covariates (e.g., BMI, age). By using an estimation procedure based on TMLE, the package produces results based on the Average Treatment Effect (ATE), a statistical parameter with a well-studied causal interpretation (see @vdl2011targeted for extended discussions), making the biotmle R package well-suited for applications in bioinformatics, epidemiology, and genomics.
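To make the target parameter concrete: writing $A$ for a binary exposure, $Y$ for the response, and $W$ for the background covariates (notation assumed here for illustration), the ATE is

$$\psi = E_W\big[E(Y \mid A = 1, W) - E(Y \mid A = 0, W)\big].$$

The sketch below, in base R on simulated data, contrasts an unadjusted difference in means with a simple plug-in estimate of $\psi$ obtained from a linear regression. It is not TMLE and does not call biotmle; every variable name and the data-generating model are hypothetical.

```r
# Simulated data; all names are hypothetical and not part of biotmle.
set.seed(1)
n <- 500
W <- rnorm(n)                           # background covariate (e.g., age)
A <- rbinom(n, 1, plogis(0.8 * W))      # binary exposure, more likely when W is large
Y <- 1 + 0.5 * A + 1.2 * W + rnorm(n)   # response; the simulated ATE is 0.5
dat <- data.frame(Y = Y, A = A, W = W)

# Unadjusted contrast: confounded, because W affects both A and Y.
naive <- mean(dat$Y[dat$A == 1]) - mean(dat$Y[dat$A == 0])

# Plug-in estimate: fit E(Y | A, W), predict for everyone under A = 1
# and under A = 0, then average the difference over the observed W.
fit <- lm(Y ~ A + W, data = dat)
pred1 <- predict(fit, newdata = transform(dat, A = 1))
pred0 <- predict(fit, newdata = transform(dat, A = 0))
ate_plugin <- mean(pred1 - pred0)

print(c(naive = naive, plugin = ate_plugin))
```

Because `W` drives both `A` and `Y` in this simulation, the unadjusted contrast overstates the effect, while the adjusted estimate should land near the simulated value of 0.5. TMLE targets the same parameter but adds a targeting step that updates the initial regression fit.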

After adjusting our data set to be consistent with the expected input format (please consult the vignette accompanying the R package for details), we would call the principal function of this R package: biomarkertmle.

We would then perform a moderated test on the output of biomarkertmle using the function modtest_ic.
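For reference, the classical moderated t-statistic of @smyth2004linear can be written as follows. The notation is Smyth's rather than this package's: for biomarker $g$, $\hat{\beta}_g$ is the estimated coefficient, $s_g^2$ the sample variance with $d_g$ residual degrees of freedom, $v_g$ the unscaled variance factor of the estimate, and $s_0^2$ and $d_0$ the prior variance and prior degrees of freedom estimated from all biomarkers jointly.

$$\tilde{s}_g^2 = \frac{d_0 s_0^2 + d_g s_g^2}{d_0 + d_g}, \qquad \tilde{t}_g = \frac{\hat{\beta}_g}{\tilde{s}_g \sqrt{v_g}}$$

Under the null hypothesis, $\tilde{t}_g$ follows a t-distribution with $d_0 + d_g$ degrees of freedom. Shrinking each variance toward a common value stabilizes inference when the number of samples is small relative to the number of biomarkers. This is the classical form only; the generalization applied by modtest_ic may differ in its details.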

While the principal table of results produced by this R package matches that produced by the well-known limma R package [@smyth2005limma], several plot methods are also available for the bioTMLE S4 class introduced by this package, which is subclassed from the popular SummarizedExperiment class [@huber2015orchestrating]. For illustrative purposes, we demonstrate the output of two such functions on anonymized experimental data below:

Heatmap visualizing the Average Treatment Effect contribution of a change in exposure to each biomarker of interest

Volcano plot visualizing the log fold change in the Average Treatment Effect against the raw p-value from the moderated t-test performed on each biomarker