[FEA] TSNE PCA Initialization #1029
Comments
I’ve created a more up-to-date issue for this feature and referenced this issue so that we don’t lose the history trail. Now that we have a sparse PCA in the Python layer that can accept sparse inputs, I don’t think it should be too hard to port that back to C++, at least for an initial version. If for some reason we need more speed or find issues with numerical stability, we can always try using rsvd or another approach.
Currently, TSNE's embeddings are initialized by random sampling from a uniform(-0.0001, 0.0001) distribution.
It has been shown that using a Randomized SVD, PCA, or even a Spectral Embedding as the initial conditions helps keep TSNE stable during its gradient updates and preserves global structure more effectively.
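To make the difference concrete, here is a minimal NumPy sketch of the two initializations. It is not cuML code: `random_init` and `pca_init` are hypothetical helper names, and rescaling the PCA output so that its first component has a standard deviation of 1e-4 is an assumption, chosen so the starting magnitudes are comparable to the current uniform initialization.

```python
import numpy as np


def random_init(n_samples, n_components=2, scale=1e-4, seed=0):
    """Current behaviour as described above: uniform(-scale, scale) samples."""
    rng = np.random.default_rng(seed)
    return rng.uniform(-scale, scale, size=(n_samples, n_components))


def pca_init(X, n_components=2, target_std=1e-4):
    """Project X onto its top principal components, then shrink the result.

    Assumption: the embedding is rescaled so the first component has a
    standard deviation of `target_std`, keeping early gradient steps small.
    Assumes X is dense and not constant along its first principal axis.
    """
    X = np.asarray(X, dtype=np.float64)
    Xc = X - X.mean(axis=0)                    # centre each feature
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    Y = U[:, :n_components] * S[:n_components]  # principal component scores
    return Y / Y[:, 0].std() * target_std
```

Either array could then be handed to the optimizer as the starting embedding; only the PCA version carries information about the global layout of the data.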
Currently, cuML has excellent implementations for Truncated SVD and Spectral Embeddings, and so cuML's internal primitives can be used.
Likewise, since TSNE requires only 2 initial components, Halko and Martinsson's 2011 Randomized First Pass SVD could also be investigated, since it is fast yet accurate.
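For reference, a randomized SVD in the spirit of that work can be sketched in a few lines of NumPy. This is a generic randomized range finder with optional power iterations, not necessarily the exact first-pass variant mentioned above and not cuML's implementation. The function name `randomized_svd` and the defaults for `n_oversamples` and `n_power_iter` are assumptions made for illustration.

```python
import numpy as np


def randomized_svd(X, n_components=2, n_oversamples=10, n_power_iter=2, seed=0):
    """Approximate the top singular triplets of X with a random sketch."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=np.float64)
    # Sketch width: a few extra columns improve accuracy at little cost.
    k = min(n_components + n_oversamples, min(X.shape))

    # Random projection captures the dominant column space of X.
    omega = rng.standard_normal((X.shape[1], k))
    Q, _ = np.linalg.qr(X @ omega)

    # Optional power iterations sharpen the basis when the spectrum decays slowly.
    for _ in range(n_power_iter):
        Q, _ = np.linalg.qr(X.T @ Q)
        Q, _ = np.linalg.qr(X @ Q)

    # Exact SVD of the small projected matrix, lifted back to the original space.
    B = Q.T @ X
    Ub, S, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub
    return U[:, :n_components], S[:n_components], Vt[:n_components]
```

Setting `n_power_iter=0` reduces the number of passes over the data at some cost in accuracy. For an initialization, `U * S` computed on centered data gives the 2 starting components.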