Issue 1453: MDS: fall back to SVD when the full similarity matrix is known #3141

Closed
wants to merge 2 commits into
from


4 participants

Contributor

The SVD method can now be used to compute metric MDS. To support this, a new argument, method, which defaults to "smacof" and can be set to "svd", was added to the MDS estimator.
This fixes issue #1453.

ogrisel commented on the diff Jul 30, 2014
sklearn/manifold/mds.py
+ """
+
+ similarities, = check_arrays(similarities, sparse_format='dense')
+ n_samples = similarities.shape[0]
+
+ if similarities.shape[0] != similarities.shape[1]:
+ raise ValueError("similarities must be a square array (shape=%d)" %
+ n_samples)
+ if not np.allclose(similarities, similarities.T):
+ raise ValueError("similarities must be symmetric")
+
+ H = np.eye(*similarities.shape) - 1./n_samples*np.ones(similarities.shape)
+ K = -0.5*np.dot(H, np.dot(similarities**2, H))
+ w, V = np.linalg.eig(K)
+ # Sort eigenvalues and eigenvectors in decreasing order
+ ix = np.argsort(w)[::-1]
ogrisel Jul 30, 2014 Owner

w is already sorted by default, no?

NelleV Apr 2, 2015 Owner

Addressed in my branch. To answer the question: no, np.linalg.eig does not sort them, but I've switched to sparse.linalg.eigs, which does.
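For reference, a minimal NumPy sketch of the ordering question: np.linalg.eig makes no promise about the order of its eigenvalues, whereas np.linalg.eigh, which is meant for symmetric matrices such as K, returns them in ascending order, so reversing gives the decreasing order the diff sorts into. The 3x3 matrix is an arbitrary symmetric stand-in for K, not data from the PR.

```python
import numpy as np

# Arbitrary symmetric matrix standing in for the double-centered matrix K.
K = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])

# eigh assumes symmetric input: real eigenvalues, returned in ascending order.
w, V = np.linalg.eigh(K)

# Reverse eigenvalues and the matching eigenvector columns to get decreasing order.
w_desc = w[::-1]
V_desc = V[:, ::-1]

# Each column is still an eigenvector for its eigenvalue after reordering.
assert np.allclose(K @ V_desc, V_desc * w_desc)
print(w_desc)
```

Because eigh exploits symmetry, it also never returns the complex-typed output that eig can produce for nearly symmetric input.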

ogrisel commented on the diff Jul 30, 2014
sklearn/manifold/mds.py
+ number of dimension in which to immerse the similarities
+ overridden if initial array is provided.
+ """
+
+ similarities, = check_arrays(similarities, sparse_format='dense')
+ n_samples = similarities.shape[0]
+
+ if similarities.shape[0] != similarities.shape[1]:
+ raise ValueError("similarities must be a square array (shape=%d)" %
+ n_samples)
+ if not np.allclose(similarities, similarities.T):
+ raise ValueError("similarities must be symmetric")
+
+ H = np.eye(*similarities.shape) - 1./n_samples*np.ones(similarities.shape)
+ K = -0.5*np.dot(H, np.dot(similarities**2, H))
+ w, V = np.linalg.eig(K)
ogrisel Jul 30, 2014 Owner

It is better to use the linalg package from scipy: scipy has a mandatory requirement on an optimized BLAS/LAPACK. This is not the case for numpy, which can therefore be much slower.
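A sketch of what the scipy-based call could look like, assuming a SciPy release recent enough to support the subset_by_index argument of scipy.linalg.eigh. Since only n_components eigenpairs are needed, it asks LAPACK for just the largest ones. The helper name top_eigenpairs is hypothetical and the random symmetric matrix is illustrative only.

```python
import numpy as np
import scipy.linalg


def top_eigenpairs(K, n_components):
    """Hypothetical helper: largest eigenpairs of symmetric K, decreasing order."""
    n = K.shape[0]
    # subset_by_index picks eigenvalues by ascending index, both ends inclusive.
    w, V = scipy.linalg.eigh(K, subset_by_index=[n - n_components, n - 1])
    # scipy.linalg.eigh returns ascending order; flip to decreasing.
    return w[::-1], V[:, ::-1]


rng = np.random.RandomState(0)
A = rng.randn(6, 6)
K = A + A.T  # any symmetric matrix will do for the illustration
w_top, V_top = top_eigenpairs(K, 2)
```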

ogrisel Jul 30, 2014 Owner

BTW: I don't understand why the method is called SVD if internally we use an eigendecomposition solver rather than an SVD solver. Do you have a reference that motivates this way of implementing the "SVD method for MDS"?
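A small NumPy check that may help with the naming question: for a symmetric positive semidefinite matrix such as a centered Gram matrix, the singular values are exactly the eigenvalues, so an SVD and an eigendecomposition give the same factorization (up to eigenvector signs). Equivalently, the eigenvalues of K are the squared singular values of the centered coordinates. The random points are illustrative only.

```python
import numpy as np

rng = np.random.RandomState(0)
Y = rng.randn(5, 2)            # arbitrary points in 2-D
Yc = Y - Y.mean(axis=0)        # center them
K = Yc @ Yc.T                  # centered Gram matrix: symmetric PSD, rank 2

s = np.linalg.svd(K, compute_uv=False)   # singular values, decreasing
w = np.linalg.eigvalsh(K)[::-1]          # eigenvalues, reversed to decreasing

# SVD of the centered coordinates themselves: squared singular values.
sv = np.linalg.svd(Yc, compute_uv=False)
```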

NelleV Apr 2, 2015 Owner

Classical MDS, where the similarity matrix is a centered inner-product matrix, can be solved using an eigenvalue decomposition. One reference is The Elements of Statistical Learning, eq. 14.100 (exercise 14.11). If the similarities are Euclidean distances, they can be converted into a centered inner product (section 18.10).
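Summarized in the notation of the diff (D is the matrix passed as similarities, n is n_samples, k is n_components), the conversion and embedding read as below. This is the standard classical MDS construction written out for convenience, not a quotation from the reference.

```latex
\begin{aligned}
H &= I_n - \tfrac{1}{n}\,\mathbf{1}\mathbf{1}^\top, &
K &= -\tfrac{1}{2}\, H\, D^{(2)} H, \quad D^{(2)}_{ij} = D_{ij}^2, \\
K &= V \Lambda V^\top, \quad \lambda_1 \ge \dots \ge \lambda_n, &
X &= V_{:,1:k}\, \Lambda_{1:k}^{1/2}.
\end{aligned}
```

When D holds Euclidean distances between points with centered coordinates Y_c, K equals Y_c Y_c^T and is therefore positive semidefinite, which is what the diff's "similarities must be euclidean" check on the eigenvalues tests.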

ogrisel commented on the diff Jul 30, 2014
sklearn/manifold/tests/test_mds.py
@@ -59,3 +61,63 @@ def test_MDS():
[4, 2, 1, 0]])
mds_clf = mds.MDS(metric=False, n_jobs=3, dissimilarity="precomputed")
mds_clf.fit(sim)
+
+
+def test_svd_mds():
+ # Generate 4 randomly chosen points
+ Y = np.array([[1, 0, 1],
+ [-1, 3, 2],
+ [1, -2, 3],
+ [2, -1, -3]])
+ sim = euclidean_distances(Y)
+ # calculate error of smacof-based solution
+ X_smacof, smacof_stress = mds.smacof(sim, n_components=2, random_state=42)
+ X_svd, svd_stress = mds.svd_mds(sim, n_components=2)
+ assert_less(svd_stress, smacof_stress)
ogrisel Jul 30, 2014 Owner

Can you please put an inline comment that explains why this should always be the case?

ogrisel Jul 30, 2014 Owner

Should X_smacof and X_svd be approximately equal up to a sign flip? If so, you could check the equality of the component-wise absolute values.
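A sketch of the suggested check, with one caveat: comparing np.abs values element-wise also accepts arrays that differ by isolated element signs. The hypothetical helper below instead picks one sign per column before comparing. It assumes the two embeddings differ only by per-column sign flips (as two eigendecomposition-based solutions with distinct eigenvalues do), not by a general rotation, which smacof solutions may also exhibit.

```python
import numpy as np


def allclose_up_to_column_sign(A, B, atol=1e-8):
    """Hypothetical helper: True if A equals B after flipping signs of B's columns."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # Per column, choose the sign that best aligns B with A.
    signs = np.sign(np.sum(A * B, axis=0))
    signs[signs == 0] = 1.0
    return np.allclose(A, B * signs, atol=atol)


A = np.array([[1.0, 2.0], [3.0, -1.0], [-2.0, 0.5], [0.5, 1.5]])
B = A * np.array([1.0, -1.0])   # second column flipped: same embedding
C = A.copy()
C[0, 0] = -C[0, 0]              # one entry flipped: a different embedding
```

Here np.abs(A) and np.abs(C) match even though C is not a sign-flipped copy of A, while the column-wise helper tells them apart.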

NelleV Apr 2, 2015 Owner

Actually, smacof tries to solve a non-convex problem using convex optimization techniques. There is no guarantee that it finds the optimum, whereas there is such a guarantee with the eigenvector strategy.

ogrisel commented on the diff Jul 30, 2014
sklearn/manifold/mds.py
+ n_samples)
+ if not np.allclose(similarities, similarities.T):
+ raise ValueError("similarities must be symmetric")
+
+ H = np.eye(*similarities.shape) - 1./n_samples*np.ones(similarities.shape)
+ K = -0.5*np.dot(H, np.dot(similarities**2, H))
+ w, V = np.linalg.eig(K)
+ # Sort eigenvalues and eigenvectors in decreasing order
+ ix = np.argsort(w)[::-1]
+ w = w[ix]
+ V = V[:, ix]
+ if not np.all(w >= -1e-12):
+ raise ValueError("similarities must be euclidean")
+ X = np.sqrt(w[:n_components])*V[:, :n_components]
+ dists = euclidean_distances(X)
+ stress = ((similarities.ravel() - dists.ravel()) ** 2).sum() / 2
NelleV Apr 2, 2015 Owner

Fixed in my branch.
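Putting the steps of this hunk together, here is a minimal self-contained sketch of classical MDS. It is not the code from the PR or from the follow-up branch: the name classical_mds is hypothetical, input validation uses plain NumPy checks instead of scikit-learn's check_array so the block runs on its own, np.linalg.eigh replaces np.linalg.eig because K is symmetric, and the non-negativity check uses a tolerance relative to the largest eigenvalue (an assumption; the diff uses an absolute 1e-12).

```python
import numpy as np


def classical_mds(dissimilarities, n_components=2, tol=1e-10):
    """Hypothetical sketch: classical (Torgerson) MDS plus raw stress."""
    D = np.asarray(dissimilarities, dtype=float)
    # Plain NumPy input checks, mirroring the ones in the diff.
    if D.ndim != 2 or D.shape[0] != D.shape[1]:
        raise ValueError("dissimilarities must be a square array")
    if not np.allclose(D, D.T):
        raise ValueError("dissimilarities must be symmetric")
    n_samples = D.shape[0]

    # Double centering of the squared distances: K = -1/2 * H D**2 H.
    H = np.eye(n_samples) - np.ones((n_samples, n_samples)) / n_samples
    K = -0.5 * H @ (D ** 2) @ H

    # eigh: real eigenvalues in ascending order; reverse to decreasing.
    w, V = np.linalg.eigh(K)
    w, V = w[::-1], V[:, ::-1]

    # Euclidean distances give a PSD K; tolerate round-off relative to
    # the largest eigenvalue magnitude (assumed tolerance).
    if np.any(w < -tol * np.abs(w).max()):
        raise ValueError("dissimilarities must be Euclidean distances")

    # Embedding: top eigenvectors scaled by the sqrt of their eigenvalues.
    X = V[:, :n_components] * np.sqrt(np.maximum(w[:n_components], 0.0))

    # Raw stress as in the diff: squared errors over the full symmetric
    # matrix, halved so each pair (i, j) is counted once.
    diff = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diff ** 2).sum(axis=-1))
    stress = ((D - dists) ** 2).sum() / 2
    return X, stress


# Points that genuinely live in 2-D, so n_components=2 loses nothing.
Y = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [1.0, 1.0], [2.0, -1.0]])
D = np.sqrt(((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1))
X, stress = classical_mds(D, n_components=2)
```

When the distances come from points that already lie in n_components dimensions, as here, the embedding reproduces them up to rotation, reflection and translation, so the stress is zero up to round-off, which gives a deterministic test target. For points that do not fit in n_components dimensions, the eigendecomposition is optimal for the classical (strain) objective, so it is not in general guaranteed to minimize stress.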

ogrisel commented on the diff Jul 30, 2014
sklearn/manifold/mds.py
@@ -257,6 +257,44 @@ def smacof(similarities, metric=True, n_components=2, init=None, n_init=8,
return best_pos, best_stress
+def svd_mds(similarities, n_components=2):
+ """
+ Computes multidimensional scaling using SVD algorithm
+
+ Parameters
+ ----------
+ similarities : symmetric ndarray, shape (n_samples, n_samples)
+ similarities between the points
+
+ n_components : int, optional, default: 2
+ number of dimension in which to immerse the similarities
+ overridden if initial array is provided.
+ """
+
+ similarities, = check_arrays(similarities, sparse_format='dense')
ogrisel Jul 30, 2014 Owner

This will have to be rebased on top of master and replaced by a call to check_array; see:

http://scikit-learn.org/stable/developers/utilities.html#validation-tools

NelleV Apr 2, 2015 Owner

Fixed in my branch.

Owner
NelleV commented Apr 2, 2015

Here is a new PR: #4485

Owner

Closing, as this is replaced by #4485.

amueller closed this Oct 25, 2016