Occasional dramatic differences between tSNE and UMAP #319
Comments
This is normal. It means that the far-away clusters are "globally" more different from the cells that sit closer together. UMAP tends to preserve global distances, whereas tSNE is pretty much ignorant of global distance (so one should not use global distances in a tSNE plot to make inferences). I frequently see this in UMAP when some very different contaminating cell types are in the sample.
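To illustrate why tSNE carries so little global information, here is a minimal sketch of the Gaussian input affinities that SNE-style methods build from distances. It assumes a fixed bandwidth `sigma` (real tSNE tunes it per point from the perplexity), and the function name and toy distances are made up for illustration. The affinity to a distant cell is practically zero whether that cell is 10 or 100 units away, so the embedding has almost nothing to tell the two cases apart.

```python
import math


def conditional_affinities(distances, sigma=1.0):
    """Gaussian affinities from one cell to its candidate neighbours,
    normalised to sum to 1. Fixed sigma is a simplifying assumption."""
    weights = [math.exp(-(d * d) / (2.0 * sigma * sigma)) for d in distances]
    total = sum(weights)
    return [w / total for w in weights]


# Toy distances: two nearby cells plus one cell from a far-away cluster.
near = [1.0, 1.5]
far_at_10 = conditional_affinities(near + [10.0])
far_at_100 = conditional_affinities(near + [100.0])

# The far cell gets a vanishing affinity in both cases, and the
# affinities of the near cells are unchanged to numerical precision.
print(far_at_10)
print(far_at_100)
```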
UMAP also has no meaning attached when clusters are completely disconnected (Supplemental Figure 10 of this, soon updated here on bioRxiv and finally in a journal...); and I'd tend to think that this is such a case. Then, UMAP's parameters have to be adjusted (mostly It's true that UMAP has less tendency to tear apart connected things than tSNE. Overall, it's more faithful to the global topology.
@falexwolf Just out of curiosity, have you compared your method with PHATE (https://www.biorxiv.org/content/early/2017/03/24/120378)? I have yet to try PAGA, but I have found that PHATE works fairly well for showing trajectories. (I am just a biologist, so I don't know the specifics of comparing methodologies.)
Thank you all for your feedback here - that was helpful. I'll close this so it doesn't look like an issue needs to be handled, but please, do continue any discussion.
@chlee-tabin Which method? PAGA? PAGA is for coarse-graining the data whereas PHATE is for embeddings, right?
On a test dataset, I compute neighbors and then immediately compute and plot both tSNE and UMAP, showing them next to each other. Sometimes we get pretty dramatic differences, such as the one attached. Is this an algorithmic difference, or is something wrong with my approach?
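One way to put a number on such a difference is to compare the distances between cluster centroids in the two embeddings. The sketch below is only an illustration: the helper names and the toy 2D coordinates are invented, and with real data the coordinates and cluster labels would come from the computed embeddings. Each distance is divided by the smallest one so the two plots can be compared regardless of their axis scales.

```python
import math


def centroids(points, labels):
    """Mean 2D position of each labelled cluster."""
    sums = {}
    for (x, y), lab in zip(points, labels):
        sx, sy, n = sums.get(lab, (0.0, 0.0, 0))
        sums[lab] = (sx + x, sy + y, n + 1)
    return {lab: (sx / n, sy / n) for lab, (sx, sy, n) in sums.items()}


def relative_centroid_distances(points, labels):
    """Pairwise centroid distances, divided by the smallest one."""
    cents = centroids(points, labels)
    names = sorted(cents)
    dists = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            dists[(a, b)] = math.hypot(cents[a][0] - cents[b][0],
                                       cents[a][1] - cents[b][1])
    smallest = min(dists.values())
    return {pair: d / smallest for pair, d in dists.items()}


# Toy embeddings of the same six cells in clusters A, B and C (made-up values).
labels = ["A", "A", "B", "B", "C", "C"]
tsne_xy = [(-1, 0), (1, 0), (2, 4), (4, 4), (5, 8), (7, 8)]
umap_xy = [(-1, 0), (1, 0), (2, 4), (4, 4), (29, 40), (31, 40)]

tsne_rel = relative_centroid_distances(tsne_xy, labels)
umap_rel = relative_centroid_distances(umap_xy, labels)
print(tsne_rel)
print(umap_rel)
```

In this toy case cluster C sits far from A and B only in the second embedding, which is the kind of pattern described in the question.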