ward_tree fails when connectivity matrix is too sparse #3159

Closed
mvdoc opened this Issue May 18, 2014 · 8 comments

Comments

Projects
None yet
6 participants
Contributor

mvdoc commented May 18, 2014

Hi,
when running some simulations I found out that if the connectivity matrix is too sparse, ward_tree fails with an IndexError. It works fine with a different connectivity matrix.

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-44-fdfab3cceda7> in <module>()
----> 1 out = ward_tree(X, connectivity)

/usr/lib/python2.7/dist-packages/sklearn/cluster/hierarchical.pyc in ward_tree(X, connectivity, n_components, copy, n_clusters)
    167         # identify the merge
    168         while True:
--> 169             inert, i, j = heappop(inertia)
    170             if used_node[i] and used_node[j]:
    171                 break

IndexError: index out of range

See this notebook for the simulation code: http://nbviewer.ipython.org/github/mvdoc/notebooks/blob/master/ward_tree_failing.ipynb

Owner

agramfort commented May 18, 2014

can you check if the number of connected components is one?

Contributor

mvdoc commented May 18, 2014

the number of connected components in both cases (failing and not) is > 1

Owner

agramfort commented May 18, 2014

I suspect the code cannot work properly in this case. @GaelVaroquaux any
opinion?

Owner

GaelVaroquaux commented May 18, 2014

I suspect the code cannot work properly in this case. @GaelVaroquaux any
opinion?

It should: we try to complete the graph. However, it seems that it is not
working right.

larsmans added the Bug label May 19, 2014

Contributor

mvdoc commented May 20, 2014

@GaelVaroquaux, I was wondering what you think about the alternative of running the ward's algorithm within each disconnected component, and then reconstructing the tree by computing the distance between these.

This would avoid completing the graph.

Owner

GaelVaroquaux commented May 20, 2014

@GaelVaroquaux, I was wondering what you think about the alternative of
running the ward's algorithm within each disconnected component, and
then reconstructing the tree by computing the distance between these.

It's a change in behavior, and as such I am a bit cautious. But I am not
completely opposed to it.

It would require to be implemented similarly in the hc_tree code.

amueller added this to the 0.15.1 milestone Jul 18, 2014

Contributor

mvdoc commented Dec 17, 2014

With the latest code the bug is somehow fixed.

mvdoc closed this Dec 17, 2014

Owner

MechCoder commented Dec 17, 2014

great :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment