Question on initialization #252

Closed
sbembenek18 opened this issue Nov 10, 2023 · 4 comments

Comments

@sbembenek18

The default initialization is PCA -- is that correct? So, is it using the top 50 PCs for the t-SNE embedding?
If I wanted to just run my data as is, which initialization would allow for this?

thanks!

@pavlin-policar
Owner

That's right -- the default initialization is PCA. However, t-SNE embeds data into 2D, so here we take the top 2 principal components of the data matrix and use them as the initialization for the embedding. Note that this refers only to the starting positions of the points in the 2D embedding, not to the actual input to the t-SNE algorithm. openTSNE uses the full data matrix, so if you want to do any preprocessing, e.g., taking only the top 50 PCs and using those, you'll have to do this yourself.

So, to answer your question, if you want to construct a t-SNE embedding for your data as is, openTSNE does this by default.
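To make the distinction between the initialization and the input concrete, here is a minimal NumPy-only sketch. It does not call openTSNE; the random data, the variable names, and the `1e-4` target spread are illustrative assumptions, not values taken from this thread.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))  # hypothetical data: 200 samples, 30 features

# PCA via SVD: the rows of Vt are the principal axes, ordered by
# decreasing singular value.
X_centered = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(X_centered, full_matrices=False)

# The initialization is just the projection onto the top 2 axes,
# i.e. one 2D starting position per sample.
init = X_centered @ Vt[:2].T

# Shrink the starting layout so the first coordinate has a small spread
# (the 1e-4 target is an assumption made for this sketch).
init = init / np.std(init[:, 0]) * 1e-4

print(X.shape)     # (200, 30): what the algorithm would receive as input
print(init.shape)  # (200, 2): only the starting positions
```

The input matrix keeps all 30 features; only the starting coordinates are two-dimensional.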

@sbembenek18
Author

OK. So, given a data matrix with features, openTSNE, as its default initialization, calculates the PCs, then takes only the top 2 PCs for initialization. After initialization, the full data matrix with the original (non-PC) features is used to perform the embedding.

If I actually wanted to use e.g., the first 50 PCs as my features as input for the embedding, I would simply calculate this ahead of time and pass this to openTSNE. And to avoid having openTSNE calculate the PCs again, I would (as you showed in '04_large_data_sets') initialize with:

init = openTSNE.initialization.rescale(X[:, :2])

and then use:

openTSNE.TSNE(initialization=init, ...)

To be sure, the parameter n_components is the dimension of the embedding space for t-SNE, and the PCA initialization has to use the same number of PCs.

Is this correct?

Thanks!

@pavlin-policar
Owner

That's all spot on!
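The workflow confirmed above can be sketched end to end as follows. This is a rough sketch under stated assumptions: `top_pcs` and `tsne_on_pcs` are hypothetical helper names, the data is random, and PCA is done with a plain NumPy SVD rather than any particular library routine. The only openTSNE calls used are `openTSNE.initialization.rescale` and `openTSNE.TSNE(...)` as quoted in the thread, plus `fit`. The import is placed inside the function so the PCA part runs even without openTSNE installed.

```python
import numpy as np


def top_pcs(X, n_components=50):
    """Project X onto its top principal components (hypothetical helper)."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are principal axes, ordered by decreasing singular value,
    # so the resulting columns are ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T


def tsne_on_pcs(X_pca, n_components=2):
    """Embed precomputed PCs, reusing their leading columns as the init."""
    import openTSNE  # imported lazily; needed only for this step

    # The first n_components columns are already the top PCs, so PCA does
    # not need to be computed a second time for the initialization.
    init = openTSNE.initialization.rescale(X_pca[:, :n_components])
    tsne = openTSNE.TSNE(n_components=n_components, initialization=init)
    return tsne.fit(X_pca)


rng = np.random.default_rng(0)
X = rng.normal(size=(300, 100))   # hypothetical data matrix
X_pca = top_pcs(X, n_components=50)
print(X_pca.shape)  # (300, 50)

# embedding = tsne_on_pcs(X_pca)  # uncomment if openTSNE is installed
```

Passing the same `n_components` to both the column slice and `TSNE` keeps the initialization and the embedding space at the same dimension, which is the point raised in the last question.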

@sbembenek18
Author

Thanks!
