Original ticket http://projects.scipy.org/scipy/ticket/1272 on 2010-09-07 by trac user cassio, assigned to unknown.
The methods ward, centroid, and median in scipy.cluster.hierarchy do not work on a proximity matrix.
Suppose I have 5 objects and a condensed distance matrix of their 10 pairwise distances. When I run any of the above methods on it, I get:
ValueError: Valid methods when the raw observations are omitted are 'single', 'complete', 'weighted', and 'average'.
I don't understand why I cannot use ward, centroid, or median when providing only the proximity matrix, since these algorithms can be computed from pairwise dissimilarities alone (they are relational methods).
In R, I can pass the dissimilarity matrix to the hclust function and generate the ward, centroid, and median hierarchies.
Sample code to reproduce the problem:
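One way to see why these methods can run on distances alone is the Lance-Williams recurrence: after clusters i and j merge, the dissimilarity from another cluster k to the merged cluster is a fixed combination of d(k,i), d(k,j), d(i,j) and the cluster sizes. The pure-Python sketch below is an illustration, not SciPy's implementation, and the helper name `lance_williams` is hypothetical. It assumes the inputs are squared Euclidean dissimilarities, because the centroid, median, and Ward coefficients are only geometrically meaningful in that case.

```python
def lance_williams(method, d_ki, d_kj, d_ij, n_i, n_j, n_k):
    """Dissimilarity from cluster k to the merge of clusters i and j.

    Hypothetical helper for illustration (not part of scipy.cluster.hierarchy).
    Assumes the d_* arguments are squared Euclidean dissimilarities and
    n_i, n_j, n_k are the cluster sizes.
    """
    if method == "centroid":
        a_i = n_i / (n_i + n_j)
        a_j = n_j / (n_i + n_j)
        beta = -a_i * a_j
    elif method == "median":
        # Median ("WPGMC") weights both merged clusters equally.
        a_i = a_j = 0.5
        beta = -0.25
    elif method == "ward":
        total = n_i + n_j + n_k
        a_i = (n_i + n_k) / total
        a_j = (n_j + n_k) / total
        beta = -n_k / total
    else:
        raise ValueError(f"unsupported method: {method!r}")
    # gamma is 0 for all three methods, so the |d_ki - d_kj| term drops out.
    return a_i * d_ki + a_j * d_kj + beta * d_ij


def sq_dist(p, q):
    """Squared Euclidean distance between two coordinate tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


# Sanity check for singletons: the centroid update computed from distances
# matches the squared distance from x_k to the actual centroid of x_i and x_j.
x_i, x_j, x_k = (0.0, 0.0), (4.0, 0.0), (1.0, 3.0)
centroid_ij = tuple((a + b) / 2 for a, b in zip(x_i, x_j))
lw = lance_williams("centroid", sq_dist(x_k, x_i), sq_dist(x_k, x_j),
                    sq_dist(x_i, x_j), 1, 1, 1)
print(lw, sq_dist(x_k, centroid_ij))  # prints 10.0 10.0
```

No coordinates are needed once the squared distances are known, which is the sense in which these methods are relational; the price is the Euclidean assumption on the input.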
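import numpy as np
from scipy.cluster.hierarchy import ward

# Condensed distance matrix for 5 objects: 5 * 4 / 2 = 10 pairwise distances
y = np.random.random_sample(10)
Z = ward(y)  # raised the ValueError quoted above in SciPy at the time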
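For reference, a condensed distance matrix for n objects stores the n(n-1)/2 upper-triangle entries (pairs i < j) row by row, so 5 objects give 10 values. The sketch below builds one from coordinates in that order, which should match the ordering of scipy.spatial.distance.pdist with the Euclidean metric, and maps a pair (i, j) to its position. Both helper names are hypothetical and are written in plain Python for illustration.

```python
import math
from itertools import combinations


def condensed_index(n, i, j):
    """Position of the pair (i, j) in a condensed matrix of n objects (hypothetical helper)."""
    if i == j:
        raise ValueError("the diagonal is not stored in condensed form")
    if i > j:
        i, j = j, i
    # Rows 0..i-1 hold (n-1) + (n-2) + ... + (n-i) = n*i - i*(i+1)/2 entries.
    return n * i - i * (i + 1) // 2 + (j - i - 1)


def condensed_euclidean(points):
    """Euclidean distances for all pairs i < j, in row-major upper-triangle order."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]


points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (1.0, 1.0), (2.0, 0.0)]
y = condensed_euclidean(points)
print(len(y))                       # 10: 5 objects -> 10 pairwise distances
print(y[condensed_index(5, 0, 1)])  # 5.0: distance between points 0 and 1
```

`math.dist` requires Python 3.8 or newer.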
Merge branch 'pull-391-mstats' into master. Closes #1798.
Reviewed at scipy#391
I'm running into the same problem with SciPy v0.14.0. I use scipy.spatial.distance.pdist to create a condensed distance matrix and pass it to ward, and I get the aforementioned error: Valid methods when the raw observations are omitted are 'single', 'complete', 'weighted', and 'average'.
I'm running into this problem as well while running hierarchical clustering on my dataset with sp.cluster.hierarchy.fclusterdata. It works fine with method='single' or method='complete', but with 'ward' or 'centroid' I get the following error: Valid methods when the raw observations are omitted are 'single', 'complete', 'weighted', and 'average'.
Any ideas on what to do next?
@tlnagy @chprabhu I think there are math tricks that let you compute quantities such as squared distances to cluster centroids from the squared distances between all pairs of points. That would sidestep scipy's current restriction on input format (distances vs. points) for the methods that are currently marked 'euclidean' and do not accept distance input. Maybe someone will implement this for ward clustering at a EuroSciPy sprint.
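A minimal sketch of one such identity, assuming the input holds squared Euclidean distances: the squared distance between the centroids of disjoint clusters A and B equals the mean squared distance over all cross pairs minus the within-cluster pair sums scaled by 1/|A|^2 and 1/|B|^2, and Ward's merge cost |A||B|/(|A|+|B|) times that quantity follows directly. The helper names are hypothetical and this is plain Python for illustration, not SciPy code.

```python
from itertools import combinations


def centroid_sq_dist(D, A, B):
    """||c_A - c_B||^2 computed from squared pairwise distances only.

    D is a full symmetric matrix (list of lists) of squared Euclidean
    distances; A and B are disjoint lists of row indices.
    """
    n_a, n_b = len(A), len(B)
    between = sum(D[a][b] for a in A for b in B) / (n_a * n_b)
    # Each unordered within-cluster pair is counted once here.
    within_a = sum(D[p][q] for p, q in combinations(A, 2)) / n_a**2
    within_b = sum(D[p][q] for p, q in combinations(B, 2)) / n_b**2
    return between - within_a - within_b


def ward_merge_cost(D, A, B):
    """Increase in the within-cluster sum of squares if A and B are merged."""
    n_a, n_b = len(A), len(B)
    return n_a * n_b / (n_a + n_b) * centroid_sq_dist(D, A, B)


# Coordinates are used only to build D; the functions above never see them.
points = [(0.0, 0.0), (2.0, 0.0), (5.0, 1.0), (7.0, 3.0), (6.0, 2.0)]
D = [[sum((u - v) ** 2 for u, v in zip(p, q)) for q in points] for p in points]
print(ward_merge_cost(D, [0, 1], [2, 3, 4]))  # approximately 34.8 (= 2*3/5 * 29)
```

Here the centroids are (1, 0) and (6, 2), so their squared distance is 29, and the merge cost agrees with the brute-force increase in the sum of squares (40.8 - 2 - 4 = 34.8).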