Permalink
Fetching contributors…
Cannot retrieve contributors at this time
76 lines (64 sloc) 3.07 KB
title tags authors affiliations date bibliography
biotmle: Targeted Learning for Biomarker Discovery
targeted learning
variable importance
causal inference
bioinformatics
genomics
R
name orcid affiliation
Nima S. Hejazi
0000-0002-7127-2789
1
name orcid affiliation
Weixin Cai
0000-0003-2680-3066
1
name orcid affiliation
Alan E. Hubbard
0000-0002-3769-0127
1
name index
Division of Biostatistics, University of California, Berkeley
1
26 July 2017
paper.bib

Summary

The biotmle package provides an implementation of a biomarker discovery methodology based on targeted minimum loss-based estimation (TMLE) [@vdl2011targeted] and a generalization of the moderated t-statistic of [@smyth2004linear], designed for use with biological sequencing data (e.g., microarrays, RNA-seq). The statistical approach made available in this package relies on the use of TMLE to rigorously evaluate the association between a set of potential biomarkers and another variable of interest while adjusting for potential confounding from another set of user-specified covariates. The implementation is in the form of a package for the R language for statistical computing [@R].

There are two principal ways in which the biomarker discovery techniques in the biotmle R package can be used: to evaluate the association between (1) a phenotypic measure (say, environmental exposure) and a biomarker of interest, and (2) an outcome of interest (e.g., survival status at a given time) and a biomarker measurement, both while controlling for background covariates (e.g., BMI, age). By using an estimation procedure based on TMLE, the package produces results based on the Average Treatment Effect (ATE), a statistical parameter with a well-studied causal interpretation (see @vdl2011targeted for extended discussions), making the biotmle R package well-suited for applications in bioinformatics, epidemiology, and genomics.

After adjusting our data set to be consistent with the expect input format -- please consult the vignette accompanying the R package for details -- we would call the principal function of this R package: biomarkertmle.

We would perform a moderated test on the output of the biomarkertmle function using the function modtest_ic.

While the principal table of results produced by this R package matches those produced by the well-known limma R package [@smyth2005limma], there are also several plot methods made available for the bioTMLE S4 class -- subclassed from the popular SummarizedExperiment class -- introduced by this package [@huber2015orchestrating]. For illustrative purposes, we demonstrate the ouput of two such functions on anonymized experimental data below:

Heatmap visualizing the Average Treatment Effect contribution of a change in exposure to each biomarker of interest

Volcano plot visualizing the log fold change in the Average Treatment Effect against the raw p-value from the moderated t-test performed on each biomarker

\newpage

References