[FEA] TSNE PCA Initialization #1029

Closed
danielhanchen opened this issue Aug 19, 2019 · 1 comment

Comments

@danielhanchen
Contributor

danielhanchen commented Aug 19, 2019

Currently, TSNE's embeddings are initialized by random sampling from a uniform(-0.0001, 0.0001) distribution.

It has been shown that using a Randomized SVD, PCA, or even a Spectral Embedding as the initial conditions helps TSNE stay stable during its gradient updates and preserve global structure more effectively.
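To make the proposal concrete, here is a minimal NumPy sketch that contrasts the current uniform initialization with a PCA-based one. It is not cuML code: the function names `uniform_init` and `pca_init` are hypothetical, and rescaling the PCA output so its first column has a standard deviation of 1e-4 is an assumption, chosen to keep the starting scale close to that of the uniform initialization.

```python
import numpy as np


def uniform_init(n_samples, n_components=2, scale=1e-4, seed=0):
    """Baseline: sample every coordinate from uniform(-scale, scale)."""
    rng = np.random.default_rng(seed)
    return rng.uniform(-scale, scale, size=(n_samples, n_components))


def pca_init(X, n_components=2, target_std=1e-4):
    """Project X onto its top principal components, then shrink the result.

    target_std is an assumed value; it keeps the initial embedding small
    so the early gradient updates behave like the uniform case.
    """
    X = np.asarray(X, dtype=np.float64)
    Xc = X - X.mean(axis=0)                      # center each feature
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    Y = U[:, :n_components] * S[:n_components]   # principal component scores
    std = Y[:, 0].std()
    if std > 0:
        Y = Y / std * target_std                 # rescale to a small spread
    return Y


rng = np.random.default_rng(42)
X = rng.normal(size=(200, 10))
Y_uniform = uniform_init(X.shape[0])
Y_pca = pca_init(X)
```

Unlike the uniform sample, `Y_pca` is deterministic for a given `X` (up to the sign of each component), so repeated runs start from the same layout.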

cuML already has excellent implementations of Truncated SVD and Spectral Embeddings, so its internal primitives can be reused.

Likewise, since TSNE only requires 2 initial components, Halko and Martinsson's 2011 Randomized First Pass SVD can also be investigated, since it is fast yet accurate.
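As a rough illustration of the randomized approach, the sketch below follows the general range-finder scheme: multiply by a random Gaussian matrix, orthonormalize, then take an exact SVD of the much smaller projected matrix. The function name `randomized_svd` and the defaults for `n_oversamples` and `n_power_iter` are assumptions made for this example; setting `n_power_iter=0` gives the basic scheme without refinement passes.

```python
import numpy as np


def randomized_svd(A, n_components=2, n_oversamples=10, n_power_iter=2, seed=0):
    """Approximate the top singular triplets of A with a random range finder."""
    rng = np.random.default_rng(seed)
    n_features = A.shape[1]
    k = min(n_components + n_oversamples, n_features)

    # Sample the range of A with a Gaussian test matrix.
    omega = rng.normal(size=(n_features, k))
    Q, _ = np.linalg.qr(A @ omega)

    # Optional power iterations sharpen the basis when the spectrum decays slowly.
    for _ in range(n_power_iter):
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)

    # Exact SVD of the small (k x n_features) projected matrix.
    B = Q.T @ A
    Ub, S, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub
    return U[:, :n_components], S[:n_components], Vt[:n_components]


# Rank-3 test data, centered so the SVD corresponds to PCA.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3)) @ rng.normal(size=(3, 20))
Xc = X - X.mean(axis=0)

U, S, Vt = randomized_svd(Xc, n_components=2)
Y_init = U * S  # two-column scores that could seed the embedding
S_exact = np.linalg.svd(Xc, compute_uv=False)[:2]
```

Only the thin `Q` and the small matrix `B` are ever decomposed, which is where the speed advantage over a full SVD comes from when just 2 components are needed.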

@danielhanchen danielhanchen added ? - Needs Triage Need team to review and classify feature request New feature or request labels Aug 19, 2019
@cjnolet cjnolet added this to Issue- Needs Prioritizing in v0.10 Release via automation Sep 15, 2019
@cjnolet cjnolet added Algorithm API Change For tracking changes to algorithms that might effect the API CUDA / C++ CUDA issue Cython / Python Cython or Python issue tests Unit testing for project and removed feature request New feature or request labels Sep 15, 2019
@JohnZed JohnZed moved this from Issue- Needs Prioritizing to Issue- P2 in v0.10 Release Sep 23, 2019
@JohnZed JohnZed moved this from Issue- P2 to Defer Post 0.10 in v0.10 Release Sep 26, 2019
@cjnolet cjnolet added this to Issue-Needs prioritizing in v0.11 Release via automation Oct 10, 2019
@cjnolet cjnolet removed this from Defer Post 0.10 in v0.10 Release Oct 10, 2019
@cjnolet cjnolet moved this from Issue-Needs prioritizing to Issue-P1 in v0.11 Release Oct 24, 2019
@JohnZed JohnZed moved this from Issue-P1 to Defer to post-0.11 in v0.11 Release Nov 21, 2019
@cjnolet cjnolet added this to Issue-Needs prioritizing in v0.12 Release via automation Nov 21, 2019
@cjnolet cjnolet removed this from Defer to post-0.11 in v0.11 Release Nov 21, 2019
@fondaing fondaing removed this from Issue-Needs prioritizing in v0.12 Release Mar 5, 2020
@fondaing fondaing added this to Issue-Needs prioritizing in v0.13 Release via automation Mar 5, 2020
@fondaing fondaing removed this from Issue-Needs prioritizing in v0.13 Release Apr 3, 2020
@fondaing fondaing added this to Issue-Needs prioritizing in v0.14 Release via automation Apr 3, 2020
@fondaing fondaing removed this from Issue-Needs prioritizing in v0.14 Release Jun 9, 2020
@fondaing fondaing added this to Issue-Needs prioritizing in v0.15 Release via automation Jun 9, 2020
@fondaing fondaing added this to Issue-Needs prioritizing in v0.16 Release via automation Sep 24, 2020
@fondaing fondaing removed this from Issue-Needs prioritizing in v0.15 Release Sep 24, 2020
@cjnolet cjnolet added this to Issue-Needs prioritizing in v0.17 Release via automation Oct 23, 2020
@cjnolet cjnolet removed this from Issue-Needs prioritizing in v0.16 Release Oct 23, 2020
@cjnolet cjnolet removed this from Issue-Needs prioritizing in v0.17 Release Oct 27, 2020
@wphicks wphicks added this to Issue-Needs prioritizing in v21.06 Release via automation Mar 22, 2021
@cjnolet cjnolet removed this from Issue-Needs prioritizing in v21.06 Release Apr 14, 2021
@cjnolet
Member

cjnolet commented Apr 14, 2021

I’ve created a more up-to-date issue for this feature and referenced this issue so that we don’t lose the history trail. Now that we have a sparse PCA in the Python layer that can accept sparse inputs, I don’t think it should be too hard to port that back to C++, at least for an initial version. If for some reason we need more speed or find issues with numerical stability, we can always try using rsvd or another approach.

@cjnolet cjnolet closed this as completed Apr 14, 2021