ValueError: Linkage 'Z' uses the same cluster more than once thrown when clustering #6785

Closed
lynxoid opened this Issue Nov 14, 2016 · 6 comments


lynxoid commented Nov 14, 2016

I compute the cophenetic correlation (cophenet) on the linkage matrix Z generated by scipy.cluster.hierarchy.linkage, but the computation fails with ValueError: Linkage 'Z' uses the same cluster more than once. Here is a minimal example:

import numpy as np

from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import cophenet
from scipy.cluster.hierarchy import dendrogram

metric = 'hamming'
M = np.loadtxt("z_bug_matrix.txt")
Z = linkage(M, method='average', metric=metric)
# next line throws a "Linkage 'Z' uses the same cluster more than once." error
c, coph_dists = cophenet(Z, pdist(M, metric))
# this also throws the same error:
# dendrogram(
#        Z,
#        truncate_mode='lastp',  # show only the last p merged clusters
#        p=20,                 # show the last 20 merges
#        leaf_rotation=90.,      # rotates the x axis labels
#        leaf_font_size=12.,     # font size for the x axis labels
#        show_contracted=True,    # show item counts in the contracted clusters
#        color_threshold=0.5 # color clusters
#    )

Error thrown in cophenet index:

Traceback (most recent call last):
  File "test_scipy_z_bug.py", line 10, in <module>
    c, coph_dists = cophenet(Z, pdist(M, metric))
  File "/home/users/filippod/.conda/envs/ccs_py35/lib/python3.5/site-packages/scipy/cluster/hierarchy.py", line 1096, in cophenet
    is_valid_linkage(Z, throw=True, name='Z')
  File "/home/users/filippod/.conda/envs/ccs_py35/lib/python3.5/site-packages/scipy/cluster/hierarchy.py", line 1421, in is_valid_linkage
    % name_str)
ValueError: Linkage 'Z' uses the same cluster more than once.

Error thrown in dendrogram:

Traceback (most recent call last):
  File "test_scipy_z_bug.py", line 20, in <module>
    color_threshold=0.5 # color clusters
  File "/home/users/filippod/.conda/envs/ccs_py35/lib/python3.5/site-packages/scipy/cluster/hierarchy.py", line 2227, in dendrogram
    is_valid_linkage(Z, throw=True, name='Z')
  File "/home/users/filippod/.conda/envs/ccs_py35/lib/python3.5/site-packages/scipy/cluster/hierarchy.py", line 1421, in is_valid_linkage
    % name_str)
ValueError: Linkage 'Z' uses the same cluster more than once.

When examining the matrix, I find that an internal node appears twice: once merged with a leaf and once merged with another internal node. Since a node can have only one parent, this seems to be a bug.
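
The duplicate-parent condition described above can be checked directly on a linkage matrix. The following is a minimal sketch, not SciPy's own validation code: the helper name `repeated_cluster_ids` and the two tiny matrices are made up for illustration. It assumes the usual linkage layout, in which the first two columns of each row hold the ids of the two clusters being merged.

```python
from collections import Counter


def repeated_cluster_ids(Z):
    """Return {cluster id: count} for ids used as a merge input more than once.

    Hypothetical helper: Z is any sequence of rows whose first two entries
    are the ids of the merged clusters (the layout linkage() produces).
    """
    counts = Counter()
    for row in Z:
        counts[int(row[0])] += 1
        counts[int(row[1])] += 1
    return {cid: n for cid, n in counts.items() if n > 1}


# Made-up 4-leaf example: leaves are 0..3, so internal nodes are 4, 5, 6.
# In Z_bad, node 4 is merged with a leaf (row 1) and again with node 5 (row 2).
Z_bad = [
    [0, 1, 0.1, 2],
    [4, 2, 0.2, 3],
    [4, 5, 0.3, 4],
]
# In Z_ok every cluster id is consumed by exactly one merge.
Z_ok = [
    [0, 1, 0.1, 2],
    [4, 2, 0.2, 3],
    [5, 3, 0.3, 4],
]

print(repeated_cluster_ids(Z_bad))  # ids that have two parents
print(repeated_cluster_ids(Z_ok))   # empty when each id is used once
```

Running the same helper on the Z produced from the attached matrix should list the offending internal node, if any.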

scipy version: 0.18.1
numpy version: 1.11.1

Matrix file is attached (25620 rows).
z_bug_matrix.txt

Contributor

nmayorov commented Nov 14, 2016

Hi, it looks like this was indeed a bug. I believe it was fixed in #6495.

My comment "the original version relies on the order of merges in nn_chain": I now think that was not true and was my mistake; I was misled by the paper. Sorry about that.

An example showing that Z is now a valid linkage:

import numpy as np
from scipy.cluster.hierarchy import linkage, is_valid_linkage

metric = 'hamming'
M = np.loadtxt("z_bug_matrix.txt")
Z = linkage(M, method='average', metric=metric)

print(is_valid_linkage(Z))
True
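
For anyone who cannot load the attached file, the same check can be sketched on synthetic data. This is only an illustration under stated assumptions: the random binary matrix below is made up and is not expected to reproduce the original failure. It shows the pattern of validating Z with `is_valid_linkage` before passing it on to `cophenet`.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, is_valid_linkage, linkage
from scipy.spatial.distance import pdist

# Made-up stand-in for z_bug_matrix.txt: 30 binary rows of length 8.
rng = np.random.default_rng(0)
M = rng.integers(0, 2, size=(30, 8))

metric = 'hamming'
Z = linkage(M, method='average', metric=metric)

# Check the linkage first instead of letting cophenet raise ValueError.
if is_valid_linkage(Z):
    c, coph_dists = cophenet(Z, pdist(M, metric))
    print("cophenetic correlation:", c)
else:
    print("Z is not a valid linkage; consider testing a newer SciPy build")
```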

lynxoid commented Nov 14, 2016

Great, thanks! I'll test against master.

Do you guys plan to push a new release soon since there are already 700 commits on top of that?

Contributor

nmayorov commented Nov 14, 2016

So yes, it was incorrect before that commit. It could still work when the distances are more or less unique, which is often the case.
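
To see why the reported data is far from the "unique distances" case: the Hamming distance is the fraction of positions at which two rows differ, so rows of length k can produce at most k + 1 distinct distance values, and ties become unavoidable once there are many rows. A small pure-Python sketch with made-up rows (the `hamming` helper here is written out by hand for illustration rather than taken from SciPy):

```python
from itertools import combinations


def hamming(u, v):
    """Fraction of positions at which u and v differ."""
    return sum(a != b for a, b in zip(u, v)) / len(u)


# Made-up binary rows of length 4, so at most 5 distinct distances exist.
rows = [
    (0, 0, 1, 1),
    (0, 1, 0, 1),
    (1, 1, 0, 0),
    (1, 0, 1, 0),
    (0, 0, 0, 0),
]

dists = [hamming(u, v) for u, v in combinations(rows, 2)]
n_pairs = len(dists)
n_unique = len(set(dists))
print(n_pairs, "pairwise distances,", n_unique, "distinct values")
```

With 25620 rows, as in the attached matrix, the number of pairs is vastly larger than the number of possible distance values, so tied distances are guaranteed.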

Not sure about release plans, @ev-br ?

Can we consider this issue resolved? I believe so. I have some related doubts, but I will open another issue for those.

Member

ev-br commented Nov 14, 2016

Milestone 0.19.0 is currently due in January, https://github.com/scipy/scipy/milestones
That date is simply six months after 0.18.0, nothing more.
It's a good idea to start thinking about the release indeed (and that includes a decision on 1.0). But at any rate, chances of anything coming out of the door in 2016 are slim :-).

Member

ev-br commented Nov 14, 2016

To be clear: help with reviewing and/or moving forward open PRs (whether or not they are marked for 0.19 or 1.0) is appreciated.

lucyantie commented Feb 28, 2017

Hi all,

I'm sorry to be asking the same question. I'm new to Python. I have the same issue when trying to use scipy.ward clustering. I've read the post and the earlier bug fix entry, but I couldn't understand any of it. Can someone please help me?
