Multi-view hierarchical clustering in R

Installation

install.packages("devtools")
library(devtools)

install_github("pievos101/HC-fused")
library(HCfused)

Approach

Basic usage

Loading some data views

data(view1)
data(view2)

Loading the outcome vector

data(target)

Multi-view clustering using HCfused. Let's cluster the views using the ward.D method.

k   = length(unique(target))
res = HCmv(list(view1, view2), k=k, method="ward.D")

The fused cluster solution can be obtained from

cl = res$cluster

Note: when k (the number of clusters) is not set, the optimal number of clusters is inferred from the silhouette coefficient.
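
To make the silhouette criterion concrete, here is a minimal sketch of choosing k by the mean silhouette width over cuts of a single dendrogram. It is not taken from the HCfused source; the toy data, the candidate range 2:6, and the use of silhouette() from the cluster package are assumptions made for illustration.

```r
library(cluster)  # provides silhouette(); ships with most R installations

set.seed(1)
# Toy data (illustrative only): three well-separated groups of 20 points in 2D
x <- rbind(
  matrix(rnorm(40, mean = 0,  sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 5,  sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 10, sd = 0.3), ncol = 2)
)
d    <- dist(x)
tree <- hclust(d, method = "ward.D")

# Mean silhouette width for each candidate number of clusters
candidates <- 2:6
avg_sil <- sapply(candidates, function(k) {
  cl <- cutree(tree, k = k)
  mean(silhouette(cl, d)[, "sil_width"])
})

# Keep the k with the largest mean silhouette width
best_k <- candidates[which.max(avg_sil)]
print(best_k)
```

The same idea applies to a fused distance matrix: cut the tree at several values of k and keep the cut with the highest mean silhouette width.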

Let's check the performance based on the Adjusted Rand Index (ARI) and the Normalized Mutual Information (NMI)

require(aricode)
ARI(cl, target)
NMI(cl, target)
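
For readers who want to see what the ARI measures, the following base R sketch computes it from the contingency table of two labelings. The function name ari_sketch is hypothetical and the code is not the aricode implementation; it follows the standard Hubert and Arabie formula and should agree with ARI() on non-degenerate inputs.

```r
# Adjusted Rand Index from the contingency table of two label vectors
ari_sketch <- function(a, b) {
  tab     <- table(a, b)
  choose2 <- function(n) n * (n - 1) / 2   # number of pairs among n items

  sum_ij <- sum(choose2(tab))              # pairs placed together in both labelings
  sum_a  <- sum(choose2(rowSums(tab)))     # pairs placed together in a
  sum_b  <- sum(choose2(colSums(tab)))     # pairs placed together in b

  expected  <- sum_a * sum_b / choose2(length(a))  # agreement expected by chance
  max_index <- (sum_a + sum_b) / 2                 # largest attainable agreement

  (sum_ij - expected) / (max_index - expected)
}

# Identical partitions with swapped label names score 1
truth <- c(1, 1, 1, 2, 2, 2)
pred  <- c(2, 2, 2, 1, 1, 1)
print(ari_sketch(truth, pred))
```

Because the index depends only on which samples are grouped together, the numeric cluster labels themselves do not need to match the labels in target.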

The fused affinity matrix can be accessed via

affinityMatrix = res$P

which can be clustered by any clustering algorithm

distanceMatrix = 1 - affinityMatrix
fused = hclust(as.dist(distanceMatrix), method="average")
cl = cutree(fused, k=k)

Let's check the performance based on the Adjusted Rand Index (ARI) and the Normalized Mutual Information (NMI)

require(aricode)
ARI(cl, target)
NMI(cl, target)

The fusion algorithm HCfuse

You may want to use your own clustering algorithm and apply late fusion using the hierarchical fusion algorithm HCfuse.

For instance, let's assume we have two cluster solutions cl1 and cl2.

cl1 = c(1,1,1,2,2,2,3,3,3)
cl2 = c(1,1,2,2,2,3,3,3,3)

Now, we need to create the co-association matrices

ass1 = association(cl1)
ass2 = association(cl2)
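
As an illustration of what a co-association matrix contains, the sketch below builds one in base R: entry (i, j) is 1 when samples i and j share a cluster and 0 otherwise. The helper coassoc_sketch is hypothetical; whether association() returns exactly this layout (for example, ones on the diagonal) is an assumption.

```r
# Binary co-association matrix: 1 if two samples carry the same cluster label
coassoc_sketch <- function(cl) {
  outer(cl, cl, "==") * 1   # logical matrix converted to 0/1
}

cl_demo <- c(1, 1, 2, 2, 3)
m <- coassoc_sketch(cl_demo)
print(m)
```

The matrix is symmetric and block-structured, with one block of ones per cluster when the samples are ordered by label.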

These two binary matrices can be fused.

res = HCfuse(list(ass1, ass2))
affinityMatrix = res$NETWORK

The resulting affinity matrix can then be clustered by any clustering algorithm.
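
A minimal sketch of this last step, using base R only. Note that the fusion here is a plain element-wise average of two co-association matrices, used as a stand-in so that the example runs without the package; it is not the HCfuse algorithm. Rescaling by the maximum before converting to a distance is also an assumption, since the value range of res$NETWORK is not stated here.

```r
cl1 <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
cl2 <- c(1, 1, 2, 2, 2, 3, 3, 3, 3)

# Co-association matrices built by hand (1 if two samples share a cluster)
A1 <- outer(cl1, cl1, "==") * 1
A2 <- outer(cl2, cl2, "==") * 1

# Stand-in fusion: element-wise mean (NOT the HCfuse algorithm)
A <- (A1 + A2) / 2

# Rescale to [0, 1] and turn affinity into distance
D <- 1 - A / max(A)

# Any clustering method that accepts a distance matrix can be used here
tree     <- hclust(as.dist(D), method = "average")
cl_fused <- cutree(tree, k = 3)
print(cl_fused)
```

Samples that both input solutions place together have distance 0 and therefore stay together in the fused result; samples on which the inputs disagree are assigned according to the linkage rule.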

Parea: multi-view hierarchical ensemble clustering

require(GA)

First, we need to define the fitness function for the genetic algorithm.

# Fitness function for genetic algorithm
check_ensemble <- function(x, methods=FALSE, omics_in=FALSE, fix.k=NaN){
	ens  <- round(x)
	ens  <- methods[ens]
	res  <- Parea(omics=omics_in, this_method=ens, fix.k=fix.k, type=1)
	return(res$SIL) # silhouette coefficient
} # end of fitness function

The following methods are available.

# Available hierarchical clustering methods
methods = c("single", "complete", "average", "mcquitty", "ward.D",
"ward.D2", "centroid", "median")

Starting the genetic algorithm

fix.k = 5

# Perform the genetic algorithm
res <- ga(
	type = "real-valued", 
	fitness = check_ensemble,
	# extra arguments are forwarded to check_ensemble
	methods = methods, omics_in = list(view1, view2), fix.k = fix.k,
	lower = c(1,1), upper = c(8,8),
	elitism = 20, maxiter = 20, popSize = 20, 
	run = 20, parallel=FALSE)

The inferred methods are within the 'solution' slot

sel       <- methods[round(res@solution)]
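
The decoding step can be tried without running the genetic algorithm. In the sketch below, the solution matrix is a made-up stand-in for res@solution (one column per view). The GA package may return several rows when solutions tie, so the sketch takes the first row explicitly; the clamping of indices is an extra safeguard and an assumption, not part of the original code.

```r
methods <- c("single", "complete", "average", "mcquitty", "ward.D",
             "ward.D2", "centroid", "median")

# Hypothetical GA result: real-valued genes in [1, 8], one per view
solution <- matrix(c(4.6, 1.2), nrow = 1)

# Round each gene to the nearest method index, using the first solution row
idx <- round(solution[1, ])

# Keep indices inside the valid range of the methods vector
idx <- pmin(pmax(idx, 1), length(methods))

sel_demo <- methods[idx]
print(sel_demo)
```

Each view thus gets its own linkage method, which is what this_method expects in the Parea call.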

Now, we can cluster the data with the inferred methods

res       <-  Parea(list(view1, view2), 
				fix.k=k,
				this_method=sel,
				HC.iter=30, 
				type=1)

cl_ensemble  <- res$cluster

Let's check the performance based on ARI and NMI.

require(aricode)
ARI(cl_ensemble, target)
NMI(cl_ensemble, target)

Miscellaneous

Logo made by Adobe Express Logo Maker: https://www.adobe.com/express/create/logo

References

Please cite the following work in case you find the package useful.

Pfeifer, Bastian, and Michael G. Schimek. "A hierarchical clustering and data fusion approach for disease subtype discovery." Journal of Biomedical Informatics 113 (2021): 103636.

https://www.sciencedirect.com/science/article/pii/S1532046420302641

Please see also

Pfeifer, Bastian, et al. "Integrative hierarchical ensemble clustering for improved disease subtype discovery." 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE, 2021.

https://ieeexplore.ieee.org/abstract/document/9669608

for integrative hierarchical ensemble clustering with HC-fused.
