Failure to separate highly separable data #4

Open
s-andrews opened this issue May 31, 2017 · 4 comments

Comments

@s-andrews

I've been testing the latest CRAN version of this package on some data which should be highly separable and have been getting very poor results (basically no separation), so it looks like there's a bug which affects the ability to separate at least some datasets.

I've written up my tests at http://www.bioinformatics.babraham.ac.uk/tsne/ and have provided the data I used there too, so you can replicate this. Others have also reported similar findings (links at the end of my document), so I don't think it's just me.

@jdonaldson
Owner

Thanks for the data, I'll look into it.

@jdonaldson
Owner

jdonaldson commented May 31, 2017

It doesn't look like the tsne library is inferring the "type" of the transposed dataframe correctly for some reason. I'll pin that down.

One quick workaround is to pass in your transposed matrix with distances precalculated. This separates things as expected.

e.g.

tsne(dist(t(tsne.data)), perplexity = 5) -> tsne.result

[Image: rplot]
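For anyone wanting to try the workaround without the original data, here is a minimal sketch. The synthetic `tsne.data` below is a hypothetical stand-in (two well-separated groups of samples stored in columns), not the dataset from the write-up, and the group sizes, means, and plot colours are assumptions chosen for illustration. The `tsne()` call itself is the one shown above, and it only runs if the tsne package is installed.

```r
# Hypothetical stand-in for tsne.data: 50 measurements (rows) for
# 20 samples (columns), forming two well-separated groups of samples.
set.seed(1)
group_a <- matrix(rnorm(50 * 10, mean = 0), nrow = 50)
group_b <- matrix(rnorm(50 * 10, mean = 10), nrow = 50)
tsne.data <- as.data.frame(cbind(group_a, group_b))

# t() on a data frame returns a matrix, here with one row per sample.
samples <- t(tsne.data)

# dist() computes pairwise Euclidean distances between rows (samples)
# and returns an object of class "dist".
sample.dist <- dist(samples)
stopifnot(inherits(sample.dist, "dist"), attr(sample.dist, "Size") == 20)

# Pass the precalculated distances to tsne(), as in the workaround above.
if (requireNamespace("tsne", quietly = TRUE)) {
  tsne.result <- tsne::tsne(sample.dist, perplexity = 5)
  plot(tsne.result, col = rep(c("red", "blue"), each = 10), pch = 19)
}
```

Because the input is already a `dist` object, the function receives the distances directly rather than having to work out what kind of object the transposed data frame is.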

@jdonaldson
Owner

jdonaldson commented May 31, 2017

Also, regarding speed: this library was primarily intended as an educational resource, with features for reporting progress and for restarting convergence from pre-trained embeddings.

However, I'm currently working with another collaborator to implement the core logic in Rcpp and to add Barnes-Hut-style techniques. This should bring it to performance parity with the other libraries (which are typically wrappers around the same C++ runtime).

@s-andrews
Author

Thanks for the replies and the workaround for this data. For our immediate purposes I think we'll shift over to Rtsne (for the speed boost as much as anything), but it's good to know there's another viable alternative too.
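For comparison, a rough sketch of the equivalent call in Rtsne might look like the following. The synthetic `tsne.data` is again a hypothetical stand-in rather than the data from the write-up, and `perplexity = 5` is simply carried over from the workaround above. The embedding only runs if the Rtsne package is installed.

```r
# Hypothetical stand-in: 50 measurements (rows) x 20 samples (columns),
# forming two well-separated groups of samples.
set.seed(1)
tsne.data <- as.data.frame(cbind(
  matrix(rnorm(50 * 10, mean = 0), nrow = 50),
  matrix(rnorm(50 * 10, mean = 10), nrow = 50)
))

# Rtsne expects one row per observation, so transpose as before.
samples <- t(tsne.data)

if (requireNamespace("Rtsne", quietly = TRUE)) {
  # Rtsne requires 3 * perplexity < nrow(samples) - 1 (here 15 < 19).
  fit <- Rtsne::Rtsne(samples, perplexity = 5)
  # The embedding coordinates are returned in fit$Y, one row per sample.
  plot(fit$Y, col = rep(c("red", "blue"), each = 10), pch = 19)
}
```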
