K2Taxonomer is an R package built around a "top-down" recursive partitioning framework to perform unsupervised learning of nested “taxonomy-like” subgroups from high-throughput -omics data. This framework was devised to be flexible to different data structures, enabling its applicability to analyze both bulk and single-cell data sets. In addition to implementing the algorithm, this package includes functionality to annotate estimated subgroups using gene- and pathway-level analyses.
The recursive partitioning approach utilized by K2Taxonomer
presents advantages over conventional unsupervised approaches, including:
- Identification of robust partitions of a set of observations by aggregating ensembles of partition estimates from repeated perturbations of the data.
- Performing partition-specific feature selection, preventing the need to perform feature selection on the whole data set prior to running the algorithm.
- Tailoring of analyses to specific data structures through the use of different clustering algorithms for performing partition estimation.
The documentation of this package describes how K2Taxonomer
can be applied to either bulk or single-cell gene expression data. For analyses of single-cell gene expression data K2Taxonomer
is designed to characterize nested subgroups of previously identified cell types, such as those previously estimated by scRNAseq clustering analysis.
A preprint of the manuscript describing K2Taxonomer
is publicly available here.
https://montilab.github.io/K2Taxonomer/
- R (>= 4.0)
You may install K2Taxonomer
from GitHub directly using the devtools
R package or clone the repository and download from source. Typical download time is around 1 minute.
install.packages("devtools")
devtools::install_github("montilab/K2Taxonomer")
Here we demonstrate the basic functionality of K2Taxonomer
, which is described in more detail in the vignette, Running K2Taxonomer.
An alternative workflow for running K2Taxonomer
for subgrouping cell type labels using single-cell expression data is described in the vignette, Running K2Taxonomer on single-cell RNA sequencing data.
## K2Taxonomer package
library(K2Taxonomer)
## For creating and manipulating ExpressionSet objects
library(Biobase)
## Read in ExpressionSet object
data(sample.ExpressionSet)
K2res <- K2preproc(sample.ExpressionSet)
K2res <- K2tax(K2res,
stabThresh=0.5)
K2res <- runDGEmods(K2res)
genesetsMadeUp <- list(
GS1=genes[1:50],
GS2=genes[51:100],
GS3=genes[101:150]
)
K2res <- runGSEmods(K2res,
genesets=genesetsMadeUp,
qthresh=0.1)
K2res <- runGSVAmods(K2res,
ssGSEAalg="gsva",
ssGSEAcores=1,
verbose=FALSE)
K2res <- runDSSEmods(K2res)
K2dashboard(K2res,
analysis_name="Example",
output_dir=".")