Question on initialization #252

sbembenek18 · 2023-11-10T04:18:49Z

The default initialization is PCA -- is that correct? So, is it using the top 50 PCs for the t-SNE embedding?
If I wanted to just run my data as is -- what initialization would allow for this?

thanks!

pavlin-policar · 2023-11-10T06:57:47Z

That's right: the default initialization is PCA. However, t-SNE embeds data into 2D by default, so here we take the top 2 principal components of the data matrix and use those as the initialization for the embedding. Note that this refers only to the starting positions of the points in the 2D embedding, not to the actual input of the t-SNE algorithm. openTSNE uses the full data matrix as input, so if you want to do any preprocessing, e.g., keeping only the top 50 PCs, you'll have to do this yourself.
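For illustration, here is a minimal NumPy sketch of what a PCA initialization computes. It is not openTSNE's implementation: the helper name `pca_init` is hypothetical, and rescaling the result so the first coordinate has a standard deviation of 1e-4 is an assumption based on my understanding of the default in `openTSNE.initialization.rescale`. The point it shows is that the PCs only determine the starting positions; `X` itself would still be the t-SNE input.

```python
import numpy as np


def pca_init(X, n_components=2, target_std=1e-4):
    """Hypothetical sketch of a PCA-based t-SNE initialization.

    Only the returned starting positions depend on the PCs; the t-SNE
    optimization itself would still run on the full matrix X.
    """
    X = np.asarray(X, dtype=float)
    # Center each feature so the SVD yields the principal directions.
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, sorted by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Project the data onto the top `n_components` principal axes.
    init = Xc @ Vt[:n_components].T
    # Shrink the layout so the first coordinate has a tiny standard deviation
    # (target_std=1e-4 is an assumed default, not taken from openTSNE's code).
    init = init / np.std(init[:, 0]) * target_std
    return init


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))  # toy data: 200 points, 30 features
init = pca_init(X)
print(init.shape)  # (200, 2)
```

The result has one 2D starting position per point; the 30 original features are untouched and would be what t-SNE actually embeds.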

So, to answer your question: if you want to construct a t-SNE embedding of your data as is, that's what openTSNE does by default.

sbembenek18 · 2023-11-11T04:04:07Z

OK. So, given a data matrix with features, openTSNE, as its default initialization, calculates the PCs, then takes only the top 2 PCs for initialization. After initialization, the full data matrix with the original (non-PC) features is used to compute the embedding.

If I actually wanted to use, e.g., the first 50 PCs as the input features for the embedding, I would simply compute them ahead of time and pass them to openTSNE. And to avoid having openTSNE compute the PCs again, I would (as you showed in '04_large_data_sets') initialize with:

init = openTSNE.initialization.rescale(X[:, :2])

and then use:

openTSNE.TSNE(initialization=init, ...)
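The workflow above can be sketched with NumPy alone, assuming `X` is the raw data matrix. Both helpers are hypothetical stand-ins: `top_pcs` plays the role of something like scikit-learn's `PCA(n_components=50).fit_transform(X)`, and `rescale` mimics what I assume `openTSNE.initialization.rescale` does (scale so the first column has standard deviation 1e-4). The key detail is that the first two columns of the precomputed PCs are exactly the top 2 PCs, so no second PCA is needed for the initialization.

```python
import numpy as np


def top_pcs(X, n_pcs=50):
    """Hypothetical helper: project X onto its top `n_pcs` principal components."""
    Xc = np.asarray(X, dtype=float)
    Xc = Xc - Xc.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    # U * S equals the centered data projected onto the principal axes,
    # with columns ordered from highest to lowest variance.
    return (U * S)[:, :n_pcs]


def rescale(x, target_std=1e-4):
    """Hypothetical stand-in for openTSNE.initialization.rescale (assumed behaviour)."""
    x = np.array(x, dtype=float, copy=True)
    return x / np.std(x[:, 0]) * target_std


rng = np.random.default_rng(42)
X = rng.normal(size=(500, 100))  # toy data: 500 samples, 100 features
X_pca = top_pcs(X, n_pcs=50)     # t-SNE input: the 50 leading PCs
init = rescale(X_pca[:, :2])     # starting positions: the first 2 of those PCs
print(X_pca.shape, init.shape)   # (500, 50) (500, 2)
```

With openTSNE installed, these two arrays would then be used as `openTSNE.TSNE(initialization=init).fit(X_pca)`, swapping in the library's own `rescale` for the stand-in.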

And to confirm: the parameter n_components is the dimension of the t-SNE embedding space, and the PCA initialization has to use this same number of PCs.
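The shape requirement can be made concrete with a small check. `check_initialization` is a hypothetical helper, not part of openTSNE, and I'm not asserting how the library itself reports a mismatch; it only encodes the rule stated above: one starting position per sample, with exactly `n_components` coordinates.

```python
import numpy as np


def check_initialization(init, n_samples, n_components=2):
    """Hypothetical helper: verify a precomputed initialization fits the embedding.

    The initialization must provide one starting position per sample, with
    exactly `n_components` coordinates (the dimension of the t-SNE embedding).
    """
    init = np.asarray(init)
    if init.shape != (n_samples, n_components):
        raise ValueError(
            f"initialization has shape {init.shape}, "
            f"expected ({n_samples}, {n_components})"
        )
    return init


X_pca = np.random.default_rng(1).normal(size=(100, 50))  # 50 PCs per sample
check_initialization(X_pca[:, :2], n_samples=100, n_components=2)  # OK for a 2D embedding
try:
    # Three PCs cannot serve as starting positions for a 2D embedding.
    check_initialization(X_pca[:, :3], n_samples=100, n_components=2)
except ValueError as err:
    print(err)  # initialization has shape (100, 3), expected (100, 2)
```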

Is this correct?

Thanks!

pavlin-policar · 2023-11-11T08:38:18Z

That's all spot on!

sbembenek18 · 2023-11-12T08:29:54Z

Thanks!

pavlin-policar closed this as completed Nov 10, 2023
