Predicting on new data #6

zachmayer opened this Issue Dec 2, 2015 · 3 comments



zachmayer commented Dec 2, 2015

It looks like the scikit-learn developers are considering an implementation of Barnes-Hut t-SNE that allows predictions on new data. (They're implementing separate fit and transform methods, rather than a single fit_transform method.)

Would it be possible to do that here, and add a predict method to Rtsne?




jkrijthe commented Dec 8, 2015

Thanks for the suggestion! I haven't had enough time to look into how hard this would be to implement, but at first glance it seems they keep the locations of the training set fixed and try to find good locations in the embedding for the objects to be 'transformed'. I'm not sure this makes a lot of sense, and it may be worthwhile to see how their discussion plays out, since they currently do not seem to have a transform method in master either. But if people think this type of transform makes sense, it could be worthwhile to implement it.



zachmayer commented Dec 8, 2015

They HAD a transform method until very recently, but it looks like they just removed it. I'd really love to be able to try this out on new data, but it's probably best to see how their discussion plays out first.



dfalbel commented Nov 21, 2016

This is a question in the FAQ, and the answer is this:

Once I have a t-SNE map, how can I embed incoming test points in that map?

t-SNE learns a non-parametric mapping, which means that it does not learn an explicit function that maps data from the input space to the map. Therefore, it is not possible to embed test points in an existing map (although you could re-run t-SNE on the full dataset). A potential approach to deal with this would be to train a multivariate regressor to predict the map location from the input data. Alternatively, you could also make such a regressor minimize the t-SNE loss directly, which is what I did in this paper.
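The "multivariate regressor" idea from the FAQ answer can be sketched roughly as follows. This is only an illustration, not anything Rtsne or scikit-learn provides: the function name `knn_embed` is hypothetical, the regressor is a plain inverse-distance-weighted k-nearest-neighbour average (one of many possible choices), and the `train_Y` coordinates are made-up values standing in for an existing map, such as the `Y` matrix returned by a previous t-SNE run.

```python
import math

def knn_embed(train_X, train_Y, new_X, k=3, eps=1e-12):
    """Place each new point at the inverse-distance-weighted mean of the
    map coordinates of its k nearest training points.

    Hypothetical helper: a simple regressor from input space to map space.
    """
    placed = []
    for x in new_X:
        # Neighbours are found in the ORIGINAL input space, not in the map.
        nearest = sorted(
            range(len(train_X)), key=lambda i: math.dist(x, train_X[i])
        )[:k]
        weights = [1.0 / (math.dist(x, train_X[i]) + eps) for i in nearest]
        total = sum(weights)
        dims = len(train_Y[0])
        # Weighted average of the neighbours' map coordinates.
        placed.append(tuple(
            sum(w * train_Y[i][d] for w, i in zip(weights, nearest)) / total
            for d in range(dims)
        ))
    return placed

# Toy input data: two well-separated clusters in 4 dimensions.
train_X = [(0, 0, 0, 0), (0.1, 0, 0, 0), (0, 0.1, 0, 0),
           (5, 5, 5, 5), (5.1, 5, 5, 5), (5, 5.1, 5, 5)]
# Made-up 2-D map coordinates standing in for an existing t-SNE embedding.
train_Y = [(-10, -10), (-11, -10), (-10, -11),
           (10, 10), (11, 10), (10, 11)]

new_X = [(0.05, 0.05, 0, 0), (5.05, 5.05, 5, 5)]
for point, location in zip(new_X, knn_embed(train_X, train_Y, new_X)):
    print(point, "->", location)
```

Each new point lands among the map locations of the training points it resembles. Note that this only interpolates within the existing map; it does not optimize the t-SNE loss, so it is a different thing from the parametric approach mentioned at the end of the answer.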
