Spectral initialization for 1D line #360
Comments
An error in the UMAP version of spectral embedding perhaps?
…On Thu, Feb 13, 2020 at 4:37 AM Dmitry Kobak ***@***.***> wrote:
Hi Leland, I was playing around with embedding a 1D line and noticed that
the spectral initialization does not behave like I think it should.
Here is a reproducible example:
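import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import SpectralEmbedding
from umap import UMAP

# 10000 evenly spaced points on a line embedded in 3D
n = 10000
X = np.zeros((n,3))
X[:,0] = np.arange(n)
S = SpectralEmbedding(n_components=2, n_neighbors=15).fit_transform(X)
Z = UMAP().fit_transform(X)
# Left: UMAP result, right: sklearn spectral embedding
plt.figure(figsize=(6,3))
plt.subplot(121)
plt.scatter(Z[:,0], Z[:,1], s=1, c=np.arange(n))
plt.subplot(122)
plt.scatter(S[:,0], S[:,1], s=1, c=np.arange(n))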
[image: index]
<https://user-images.githubusercontent.com/8970231/74421054-b5a21200-4e4c-11ea-9383-f57385dc82fc.png>
The spectral embedding looks like a parabola, without any overlaps. But
the UMAP result looks like a parabola folded in two. I ran it with small
n_epochs and then it's even clearer that it is initialized with the
parabola folded in two.
Why would that be?
|
That's what I am suspecting. Is this the relevant function? https://github.com/lmcinnes/umap/blob/master/umap/spectral.py#L213 |
@dkobak, yes that would be the function. For the example data you provide, it's using the This is probably an initialization issue (and specifically probably a convergence of the eigenvector calculation): the spectral initialization in uwot, which uses the same graph Laplacian but uses RSpectra to find the eigenvectors, correctly initializes the data to a parabola. Conversely, the Laplacian Eigenmap initialization does a pretty bad job, even though the only difference is the choice of the graph Laplacian. |
Hi @jlmelville.
Do you think it's possible? To be honest, I rather suspected some bug/problem in the UMAP code around the
Not sure what you mean here. My code snippet above uses https://scikit-learn.org/stable/modules/generated/sklearn.manifold.SpectralEmbedding.html which is Laplacian Eigenmaps, and it yields a parabola shape. |
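For context on why a parabola is the expected shape: in the simplest case of a plain path graph (each point linked only to its two immediate neighbours), the Laplacian eigenvectors are cosines, and since cos(2θ) = 2cos²(θ) − 1 the second non-trivial eigenvector is an exact quadratic function of the first. The sketch below checks this numerically. It is only an illustration under that simplifying assumption; the kNN graph with `n_neighbors=15` is a banded graph rather than a path, and the helper `path_laplacian` is a made-up name, not part of UMAP or scikit-learn.

```python
import numpy as np

def path_laplacian(n):
    """Unnormalized Laplacian L = D - A of a path graph on n nodes."""
    A = np.zeros((n, n))
    idx = np.arange(n - 1)
    A[idx, idx + 1] = 1.0
    A[idx + 1, idx] = 1.0
    return np.diag(A.sum(axis=1)) - A

n = 200
eigvals, eigvecs = np.linalg.eigh(path_laplacian(n))  # ascending eigenvalues

# Skip the constant eigenvector (eigenvalue 0). Unit-norm eigenvectors are
# sqrt(2/n) * cos(k * theta), so dividing by that factor leaves pure cosines.
scale = np.sqrt(2.0 / n)
a = eigvecs[:, 1] / scale          # +/- cos(theta)
b = eigvecs[:, 2] / scale          # +/- cos(2 * theta)
b = b * np.sign(b[0])              # eigenvectors are only defined up to sign

# Deviation from the parabola b = 2 * a**2 - 1
max_dev = np.max(np.abs(b - (2 * a**2 - 1)))
print(f"max deviation from parabola: {max_dev:.2e}")
```

Plotting `a` against `b` gives a single, non-overlapping parabola, which is the shape that an accurately converged spectral initialization of a line should resemble.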
There are a whole bunch of convergence options that are exposed in the routines we are discussing, so it seems feasible that the speed-vs-accuracy trade-off is set incorrectly for some datasets. In my Laplacian Eigenmaps example, I had |
Hmm. Sklearn SpectralEmbedding uses ARPACK Leland's code calls One could change the params of the |
Based on my experiences with RSpectra in uwot, I am sure that decreasing the |
@jlmelville Interestingly, adding a small amount of Gaussian noise to the simulated data makes the problem go away. I wonder if it somehow can make |
You are right. Pavlin and I have now found the same thing in the linked thread at openTSNE. Decreasing the tolerance to zero does indeed fix this issue, but it can make the runtime a lot slower, so it's not clear that it would be advantageous by default. I guess I will close this issue then. |
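One way to see the effect of the eigensolver tolerance is to compare residuals ‖Sv − λv‖ at two settings of the `tol` argument of `scipy.sparse.linalg.eigsh`. The sketch below is an illustration under stated assumptions, not UMAP's code: a banded graph stands in for the kNN graph of evenly spaced points, the problem is posed as the largest eigenvalues of the normalized adjacency S = D^{-1/2} A D^{-1/2} (equivalent to the smallest eigenvalues of the normalized Laplacian I − S), and `1e-4` is just an example of a loose tolerance. The helper names are hypothetical.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

def banded_normalized_adjacency(n, half_width=7):
    """S = D^{-1/2} A D^{-1/2} for a graph linking each node to its
    half_width nearest neighbours on either side (a stand-in for the
    kNN graph of evenly spaced points on a line)."""
    offsets = [k for k in range(-half_width, half_width + 1) if k != 0]
    A = sp.diags([np.ones(n - abs(k)) for k in offsets], offsets, format="csr")
    degrees = np.asarray(A.sum(axis=1)).ravel()
    d_inv_sqrt = sp.diags(1.0 / np.sqrt(degrees))
    return (d_inv_sqrt @ A @ d_inv_sqrt).tocsr()

def max_residual(S, tol, k=3, seed=0):
    """Largest ||S v - lambda v|| over the k leading eigenpairs."""
    # Fixed start vector so both runs begin from the same Krylov space.
    v0 = np.random.default_rng(seed).standard_normal(S.shape[0])
    vals, vecs = eigsh(S, k=k, which="LA", tol=tol, v0=v0)
    return np.max(np.linalg.norm(S @ vecs - vecs * vals, axis=0))

S = banded_normalized_adjacency(300)
res_loose = max_residual(S, tol=1e-4)  # illustrative loose tolerance
res_tight = max_residual(S, tol=0)     # tol=0 means machine precision in eigsh
print(f"tol=1e-4: {res_loose:.2e}   tol=0: {res_tight:.2e}")
```

Timing the two calls on a larger graph would show the other side of the trade-off mentioned above: the tighter tolerance generally needs more iterations.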