I noticed that it is possible to extract an arbitrary number of clusters with hdbscan by using the cutree function on the hc component of the hdbscan output. But is there a simple way to get the membership probabilities for each element given a fixed number of clusters? I.e., a matrix that gives the cluster probabilities for each element and cluster (such as `fanny` in `cluster`)?
The clusters given by any 'flat' cut through the HDBSCAN hierarchy correspond to a DBSCAN* clustering with a non-normalized KNN density estimate given by 1/core_dist(x) for each point.
The so-called 'membership probabilities' are effectively just the difference between a given cluster's maximum core distance and the point's core distance, taken as a ratio of that maximum core distance.
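To make the density estimate concrete, here is a minimal base R sketch on toy data. It assumes the convention used in the code in this thread, where the core distance for a given minPts is the distance to the (minPts - 1)-th nearest neighbour. The helper name `core_distance` and the toy matrix `x` are hypothetical and not part of the dbscan package.

```r
## Sketch only: base R, toy data; `core_distance` and `x` are illustrative names.
core_distance <- function(x, minPts) {
  d <- as.matrix(dist(x))  # pairwise Euclidean distances
  # Each sorted row starts with the point's zero distance to itself,
  # so position minPts holds the (minPts - 1)-th nearest neighbour.
  apply(d, 1, function(row) sort(row)[minPts])
}

set.seed(1)
x <- rbind(matrix(rnorm(40, sd = 0.2), ncol = 2),       # 20 points, dense blob
           matrix(rnorm(40, sd = 2.0) + 10, ncol = 2))  # 20 points, sparse blob

cd <- core_distance(x, minPts = 5)
density_est <- 1 / cd  # non-normalized KNN density estimate
```

Points in the tight blob get small core distances and therefore large density estimates, while points in the spread-out blob get the opposite.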
So to get these values, all one needs is the core distance.
library(dbscan)
data("DS3", package = "dbscan")
minPts <- 25L
hcl <- hdbscan(DS3, minPts, gen_hdbscan_tree = TRUE)
# plot(DS3, col = hcl$cluster + 1L)
## Core distance is needed to calculate membership probabilities
## (the indexing below assumes kNNdist returns a matrix of distances)
core_dist <- kNNdist(DS3, k = minPts - 1)[, minPts - 1]
## Substitute k / h for whatever you want
cl <- cutree(hcl$hc, k = 5L)
## Drop the id 0, which is treated as noise
cluster_ids <- Filter(function(x) { x != 0L }, unique(cl))
prob <- rep(0, length(cl))
for (cid in unique(cluster_ids)) {
  max_f <- max(core_dist[cl == cid])
  pr <- (max_f - core_dist[cl == cid]) / max_f
  prob[cl == cid] <- pr
}
membership_prob <- prob / sum(prob) ## membership probabilities
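Since the question asks for a matrix with one column per cluster, the same ratio can be laid out that way. The following is only a sketch: `membership_matrix` is a hypothetical helper, the toy vectors stand in for `cl` and `core_dist` from the code above, and the id 0 is assumed to mark noise. Because the ratio is only defined relative to a point's own cluster, each row has at most one non-zero entry, so this is not a soft assignment in the sense of `fanny`.

```r
## Sketch only: `membership_matrix` is a hypothetical helper, not a dbscan function.
membership_matrix <- function(cl, core_dist) {
  ids <- sort(setdiff(unique(cl), 0L))  # assumption: 0 marks noise
  m <- matrix(0, nrow = length(cl), ncol = length(ids),
              dimnames = list(NULL, paste0("cluster_", ids)))
  for (j in seq_along(ids)) {
    in_cl <- cl == ids[j]
    max_f <- max(core_dist[in_cl])
    # Same ratio as in the loop above, written into the cluster's own column
    m[in_cl, j] <- (max_f - core_dist[in_cl]) / max_f
  }
  m
}

## Toy inputs standing in for `cl` and `core_dist`
cl_toy        <- c(1L, 1L, 1L, 2L, 2L, 0L)
core_dist_toy <- c(0.5, 1.0, 2.0, 1.0, 4.0, 9.0)
m <- membership_matrix(cl_toy, core_dist_toy)
```

The point with the largest core distance in each cluster gets a value of 0, and noise points get a row of zeros.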
Note that the KNN density estimate is not smooth, and the derived membership 'probabilities' are very coarse.
The values given by default for HDBSCAN are the probabilities for the salient clusters only, which are created by several non-global cuts to the hierarchy. The membership probabilities were more-or-less meant to describe the degree to which each point contributes to the stability of its corresponding cluster.