This package implements a Pearson nearest-centroid classifier that assigns class labels to single samples according to the four transcriptomic UROMOL2021 classes of non-muscle-invasive bladder cancer (NMIBC): class 1, class 2a, class 2b and class 3.

The classifier code was adapted from the consensusMIBC classifier: Kamoun A et al. A Consensus Molecular Classification of Muscle-invasive Bladder Cancer. Eur Urol (2019). Both classifiers are also available through our online web application.

A smaller, example data set is provided to run the classifier.


Please cite Lindskrog and Prip et al. An integrated multi-omics analysis identifies prognostic molecular subtypes of non-muscle-invasive bladder cancer. Nat Commun. 2021. PMID: 33863885. DOI: 10.1038/s41467-021-22465-w


You may install this package with devtools:

devtools::install_github("sialindskrog/classifyNMIBC", build_vignettes = TRUE)


classifyNMIBC(x, minCor = .2, gene_id = c("ensembl_gene_ID", "hgnc_symbol")[1])

'x': a data frame with unique genes in rows and samples to be classified in columns (or a single named vector of gene expression values). RNA-seq data must be log-transformed and microarray data should be normalized. Gene names may be supplied as Ensembl gene IDs or HUGO gene symbols.
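As a minimal sketch of preparing 'x', the snippet below builds a small expression data frame from raw counts. The gene names, sample names and count values are made up for illustration, and the choice of log2 with a pseudocount of 1 is an assumption; the package only states that RNA-seq data must be log-transformed.

```r
# Hypothetical raw RNA-seq counts: genes in rows, samples in columns.
# Gene names, sample names and values are invented for illustration only.
counts <- matrix(
  c(120,  0,  35,
    980, 15, 410,
     64,  7,   0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("GENE_A", "GENE_B", "GENE_C"), c("S1", "S2", "S3"))
)

# Assumption: log2 with a pseudocount of 1 to avoid log(0).
expr <- as.data.frame(log2(counts + 1))

# Gene identifiers are used as row names, so they must be unique.
stopifnot(!anyDuplicated(rownames(expr)))

# A single sample can instead be passed as a named numeric vector.
single_sample <- setNames(expr$S1, rownames(expr))
```

With real data, the row names would be Ensembl gene IDs or HUGO gene symbols, matching the value passed as 'gene_id'.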

'minCor': a numeric value specifying the minimum Pearson's correlation required between a sample's gene expression profile and its best-matching centroid profile. A sample with no correlation above this threshold remains unclassified, and its prediction results are set to NA. The default minCor value is 0.2.
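The role of 'minCor' can be illustrated with a generic Pearson nearest-centroid sketch in base R. This is not the package's internal code: the helper name `nearest_centroid`, the toy centroids and the toy samples are all hypothetical, and the real centroids cover many more genes.

```r
# Hypothetical centroids: 5 genes x 4 classes (values invented for illustration).
classes <- c("Class_1", "Class_2a", "Class_2b", "Class_3")
centroids <- matrix(
  c(1, 2, 3, 4, 5,
    5, 4, 3, 2, 1,
    1, 3, 2, 5, 4,
    2, 1, 4, 3, 5),
  ncol = 4,
  dimnames = list(paste0("g", 1:5), classes)
)

# Hypothetical samples: S1 follows the Class_1 pattern,
# S2 is built to be uncorrelated with every centroid.
x <- cbind(S1 = c(1.1, 2.0, 2.9, 4.2, 5.1),
           S2 = c(1, 6, 11, 6, 1))
rownames(x) <- rownames(centroids)

# Hypothetical helper: assign each sample to the centroid with the highest
# Pearson correlation, or NA when the best correlation is below minCor.
nearest_centroid <- function(x, centroids, minCor = 0.2) {
  genes <- intersect(rownames(x), rownames(centroids))
  cors <- cor(x[genes, , drop = FALSE], centroids[genes, , drop = FALSE])
  best <- apply(cors, 1, max)
  pred <- colnames(cors)[apply(cors, 1, which.max)]
  pred[best < minCor] <- NA
  setNames(pred, rownames(cors))
}

pred <- nearest_centroid(x, centroids)
```

In this toy case S1 is assigned to Class_1, while S2 stays NA because none of its correlations reaches the threshold.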

'gene_id': a character value specifying the type of gene identifiers used for the names/rownames of 'x': "ensembl_gene_ID" for Ensembl gene IDs (the default) or "hgnc_symbol" for HUGO gene symbols.



NMIBC_class <- classifyNMIBC(test_data)

#       NMIBC_class cor_pval separationLevel   Class_1  Class_2a  Class_2b   Class_3
# U0026     Class_1        0       0.4596168 0.8722377 0.7244603 0.8402362 0.7649863
# U1270     Class_1        0       0.8715953 0.9076109 0.7632527 0.7878480 0.8151044
# U0062     Class_1        0       0.7756813 0.8935798 0.7522703 0.7440653 0.8040516
# U0268     Class_1        0       0.1017993 0.8979415 0.7236878 0.8106683 0.8932611
# U1031     Class_1        0       0.8399565 0.8918952 0.7297127 0.8244154 0.8430349
# U2111     Class_3        0       0.6588400 0.6799160 0.7345242 0.6853025 0.7820521

The classifier returns a data frame with 7 columns:

'NMIBC_class': the predicted class labels of the samples.

'cor_pval': the p-value associated with the Pearson's correlation between the sample and the nearest centroid.
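For intuition, a p-value of this kind can be obtained with base R's `cor.test`. The sketch below assumes a standard test of Pearson's correlation and uses two made-up five-gene profiles; whether the package calls `cor.test` in exactly this way is not stated here.

```r
# Hypothetical five-gene profiles (values invented for illustration).
sample_profile   <- c(1.1, 2.0, 2.9, 4.2, 5.1)
centroid_profile <- c(1, 2, 3, 4, 5)

# Test of the null hypothesis that the true Pearson correlation is zero.
ct <- cor.test(sample_profile, centroid_profile, method = "pearson")

r_value <- unname(ct$estimate)  # Pearson correlation coefficient
p_value <- ct$p.value           # associated p-value
```

With many genes and a high correlation, such p-values can become too small to display as anything other than 0.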

'separationLevel': a measure (ranging from 0 to 1) of how representative a sample is of its assigned class. A value of 0 means the sample is too close to the other classes to be confidently assigned a single class label, and 1 means the sample is highly representative of its class. The separationLevel is computed as (correlation to nearest centroid - correlation to second-nearest centroid) / median difference of sample-to-centroid correlations.
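The formula can be checked against the example output with a short sketch. The helper name `separation_level` is hypothetical, and the reading of the denominator is an assumption: the median is taken over the differences between the nearest-centroid correlation and all four correlations (so the nearest centroid contributes a difference of 0). Under that reading, the values for U0026 and U0268 are reproduced to within rounding.

```r
# Hypothetical helper implementing one reading of the separationLevel formula.
separation_level <- function(cors) {
  delta <- max(cors) - cors        # gap of each centroid from the nearest one (0 for the nearest)
  delta_second <- sort(delta)[2]   # gap between nearest and second-nearest centroid
  unname(delta_second / median(delta))
}

# Correlations copied from the example output above.
u0026 <- c(Class_1 = 0.8722377, Class_2a = 0.7244603,
           Class_2b = 0.8402362, Class_3 = 0.7649863)
u0268 <- c(Class_1 = 0.8979415, Class_2a = 0.7236878,
           Class_2b = 0.8106683, Class_3 = 0.8932611)

sep_u0026 <- separation_level(u0026)  # approximately 0.4596, as reported for U0026
sep_u0268 <- separation_level(u0268)  # approximately 0.1018, as reported for U0268
```

U0268 illustrates a low separationLevel: its correlations to Class_1 and Class_3 are nearly equal, so the class call is less certain even though both correlations are high.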

The remaining four columns contain the Pearson's correlation between each sample and each class centroid.
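Once results are available, they can be summarized with ordinary data frame operations. The sketch below rebuilds three rows of the example output by hand rather than calling the classifier, and the cut-off `min_sep` is a hypothetical value chosen for illustration, not a recommendation from the package.

```r
# Three rows copied from the example output above (subset of columns).
res <- data.frame(
  NMIBC_class     = c("Class_1", "Class_1", "Class_3"),
  separationLevel = c(0.4596168, 0.1017993, 0.6588400),
  row.names       = c("U0026", "U0268", "U2111")
)

# Number of samples per predicted class.
class_counts <- table(res$NMIBC_class)

# Hypothetical cut-off for flagging samples that sit close to another class.
min_sep <- 0.2
low_separation <- rownames(res)[res$separationLevel < min_sep]

# Samples left unclassified by the minCor threshold would appear as NA.
unclassified <- rownames(res)[is.na(res$NMIBC_class)]
```

Here only U0268 falls below the illustrative cut-off, and no sample is unclassified.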

