Occasional dramatic differences between tSNE and UMAP #319

Closed
jorvis opened this issue Oct 22, 2018 · 5 comments

Comments

@jorvis
Contributor

jorvis commented Oct 22, 2018

On a test dataset I compute neighbors and then immediately compute/plot both tSNE and UMAP and show them next to each other. Sometimes, we get pretty dramatic differences such as the one attached. Is this an algorithmic difference or something wrong with my approach?

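import scanpy as sc

# Build the neighbor graph; sc.tl.umap uses it, while sc.tl.tsne works from the PCA representation.
sc.pp.neighbors(adata, n_pcs=n_pcs, n_neighbors=n_neighbors)
sc.tl.tsne(adata, n_pcs=n_pcs, random_state=random_state)
sc.tl.umap(adata)

# Plot both embeddings, colored by the same genes.
sc.pl.tsne(adata, color=genes_to_color, color_map='RdBu_r', use_raw=False, save=".png")
sc.pl.umap(adata, color=genes_to_color, color_map='RdBu_r', use_raw=False, save=".png")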

[Attached image: screenshot from 2018-10-22 11-57-49]

@chlee-tabin

This is normal; it means that the far-away clusters are "globally" more different from the cells that are closer together. UMAP is one way of preserving global distance, whereas tSNE largely ignores global distance (so one should not use global distances to draw inferences from a tSNE plot). I frequently see this in UMAP when some very different contaminating cell types are in the sample.
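
One rough way to check whether a far-away cluster really is globally more different is to compare distances between cluster centroids in the space the embeddings were computed from. The sketch below is plain Python on made-up toy coordinates; the helper names and cluster labels are hypothetical. With scanpy, the coordinates would presumably come from `adata.obsm['X_pca']` and the labels from a clustering column in `adata.obs`.

```python
from itertools import combinations
from math import dist


def centroid(points):
    """Mean of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def centroid_distances(points_by_cluster):
    """Euclidean distance between every pair of cluster centroids."""
    centroids = {name: centroid(pts) for name, pts in points_by_cluster.items()}
    return {
        (a, b): dist(centroids[a], centroids[b])
        for a, b in combinations(sorted(centroids), 2)
    }


# Toy data (made up): clusters "A" and "B" are close, "C" is far from both.
toy = {
    "A": [(0.0, 0.0), (2.0, 0.0)],
    "B": [(1.0, 3.0), (1.0, 5.0)],
    "C": [(40.0, 0.0), (42.0, 0.0)],
}
for pair, d in centroid_distances(toy).items():
    print(pair, round(d, 2))
```

If the cluster that UMAP pushes far away also has by far the largest centroid distances here, the layout is probably reflecting a real global difference rather than an artifact.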

@falexwolf
Member

UMAP also has no meaning attached when clusters are completely disconnected (Supplemental Figure 10 of this, soon to be updated here on bioRxiv and finally in a journal...), and I'd tend to think that this is such a case. In that situation, UMAP's parameters have to be adjusted (mostly min_dist and spread).
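
As a rough sketch of the parameter adjustment mentioned above, one could recompute UMAP over a few min_dist/spread combinations and compare the plots. `sc.tl.umap` does accept `min_dist` and `spread`; the helper names `umap_param_grid` and `plot_umap_sweep` and the specific values are made up for illustration, and `sc.pp.neighbors` is assumed to have been run already.

```python
from itertools import product


def umap_param_grid(min_dists=(0.1, 0.5, 1.0), spreads=(1.0, 2.0)):
    """Return (min_dist, spread) pairs to try; the default values are illustrative only.

    Pairs with min_dist > spread are dropped, since umap-learn rejects them.
    """
    return [(md, sp) for md, sp in product(min_dists, spreads) if md <= sp]


def plot_umap_sweep(adata, color=None, grid=None):
    """Recompute and plot UMAP once per (min_dist, spread) pair.

    Assumes sc.pp.neighbors(adata, ...) has already been run on adata.
    """
    import scanpy as sc  # imported here so the grid helper works without scanpy

    for min_dist, spread in (grid or umap_param_grid()):
        print(f"min_dist={min_dist}, spread={spread}")
        sc.tl.umap(adata, min_dist=min_dist, spread=spread)
        sc.pl.umap(adata, color=color)
```

Calling `plot_umap_sweep(adata, color=genes_to_color)` would then show one plot per parameter pair, which makes it easier to see whether the gap between clusters is stable or just a product of the defaults.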

It's true that UMAP has less tendency to tear apart connected things than tSNE. Overall, it's more faithful to the global topology.
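
To check whether clusters really are completely disconnected in the neighbor graph, one option is to count its connected components. The sketch below does this with a small union-find on a made-up edge list; the function name and toy graph are hypothetical. On real data, `scipy.sparse.csgraph.connected_components` applied to the connectivities matrix that `sc.pp.neighbors` stores should give the same kind of answer.

```python
def count_components(n_nodes, edges):
    """Count connected components of an undirected graph with union-find."""
    parent = list(range(n_nodes))

    def find(i):
        # Walk up to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    components = n_nodes
    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Merging two separate groups reduces the component count by one.
            parent[root_a] = root_b
            components -= 1
    return components


# Toy graph (made up): cells 0-2 form one group, cells 3-4 another, with no edge between them.
toy_edges = [(0, 1), (1, 2), (3, 4)]
print(count_components(5, toy_edges))
```

More than one component means some groups of cells share no neighbor edges at all, which is the case where the relative placement of clusters in the UMAP plot should not be interpreted.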

@chlee-tabin

@falexwolf Just out of curiosity, have you compared your method with PHATE (https://www.biorxiv.org/content/early/2017/03/24/120378)? I have yet to try out PAGA, but I have found that PHATE works fairly well for trajectory inference. (I am just a biologist, so I don't know the specifics of comparing methodologies.)

@jorvis
Contributor Author

jorvis commented Oct 24, 2018

Thank you all for your feedback here - that was helpful. I'll close this so it doesn't look like an issue needs to be handled, but please, do continue any discussion.

jorvis closed this as completed Oct 24, 2018

@falexwolf
Member

@chlee-tabin Which method? PAGA? PAGA is for coarse-graining the data whereas PHATE is for embeddings, right?
