Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extracting hierarchy from DBSCAN #69

Closed
neilireson opened this issue Dec 12, 2019 · 2 comments
Closed

Extracting hierarchy from DBSCAN #69

neilireson opened this issue Dec 12, 2019 · 2 comments

Comments

@neilireson
Copy link

I use ELKI within my Java code and have been trying to export the cluster hierarchy generated with HDBSCAN, however this just results in a single root cluster with the child cluster all being leaves.

In order to "fix" this I changed the collectChildren method in the HDBSCANHierarchyExtraction class.
Replacing
collectChildren(temp, clustering, child, clus, flatten);
with
finalizeCluster(child, clustering, clus, flatten);

This does seem to result in a proper hierarchy, although does return all clusters (including those with fewer than minPts data points). However, my understanding of the code is not enough to know whether this is in any way sensible or correct.

I use the following code to output the hierarchy:

{
        ...create clustering code...

        Relation<NumberVector> coords = db.getRelation(TypeUtil.NUMBER_VECTOR_FIELD_2D);
        List<Cluster<DendrogramModel>> topClusters = clustering.getToplevelClusters();
        Hierarchy<Cluster<DendrogramModel>> hierarchy = clustering.getClusterHierarchy();

        for (Cluster<DendrogramModel> cluster : topClusters) {
            System.out.println("---------------------------------");
            outputHierarchy(cluster, hierarchy, coords, "");
        }
}

private static void outputHierarchy(Cluster<DendrogramModel> cluster,
                                    Hierarchy<Cluster<DendrogramModel>> hierarchy,
                                    Relation<NumberVector> coords,
                                    String indent) {
    final DBIDs ids = cluster.getIDs();
    DendrogramModel model = cluster.getModel();
    System.out.format("%s%s: %d : %.3f%n", indent, cluster.getName(), ids.size(), model.getDistance());
    if (!ids.isEmpty()) {
        System.out.print(indent);
        for (DBIDIter iter = ids.iter(); iter.valid(); iter.advance()) {
            System.out.print(Arrays.toString(coords.get(iter).toArray()));
        }
        System.out.println();
    }
    if (hierarchy != null) {
        if (hierarchy.numChildren(cluster) > 0) {
            for (It<Cluster<DendrogramModel>> iter = hierarchy.iterChildren(cluster); iter.valid();
                 iter.advance()) {
                outputHierarchy(iter.get(), hierarchy, coords, indent + "  ");
            }
        }
    }
}
@kno10
Copy link
Member

kno10 commented Dec 13, 2019

That fix is not correct - the method does need to switch between a "flattening" and a non-flattening strategy. The HDBSCAN logic is based on a rather complex notion of cluster stability; and there could of course be a bug in how this value is computed. But from a quick look, this is supposed to yield a flat clustering then. Because of this notion of cluster stability, I'm not sure if such a hierarchical result is well defined with the HDBSCAN extractor (maybe the clustering.hierarchical.extraction.SimplifiedHierarchyExtraction is more of what you were looking for?)

@neilireson
Copy link
Author

Thanks for that. It seems like extracting what I need will require a little more consideration. I have used the Python HDBSCAN code and that does prune the tree to produce reasonable clusters but still provide a hierarchy. Perhaps I'll try to compare the code, however my knowledge of Python is scant.

@kno10 kno10 closed this as completed Jul 25, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants