
[MRG] Cemoody/bhtsne Barnes-Hut t-SNE #4025

Closed: wants to merge 10 commits into base: master

Conversation

cemoody (Contributor) commented Dec 29, 2014

Introduction

This PR presents a Barnes-Hut implementation of t-SNE. t-SNE visualizes high-dimensional data by finding a low-dimensional embedding that attempts to preserve the pairwise similarities of the high-dimensional space. The Barnes-Hut algorithm, which astrophysicists use to perform N-body simulations, allows the t-SNE embedding to be computed in O(N log N) time instead of O(N^2).

This effectively allows us to learn embeddings of data sets with millions of elements instead of tens of thousands.
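
For reference, a minimal usage sketch (parameter names such as method='barnes_hut' and angle follow the API discussed later in this PR; the data here is random and purely illustrative):

    import numpy as np
    from sklearn.manifold import TSNE

    rng = np.random.RandomState(0)
    X = rng.randn(10000, 50).astype(np.float32)

    # Barnes-Hut gradient: O(N log N) per iteration instead of O(N^2)
    tsne = TSNE(n_components=2, method='barnes_hut', angle=0.5, random_state=0)
    X_embedded = tsne.fit_transform(X)
    print(X_embedded.shape)  # (10000, 2)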

Example

An IPython Notebook is available where we've put together simple install instructions & a demo of both methods:
http://nbviewer.ipython.org/urls/gist.githubusercontent.com/cemoody/01135ef2f26837548360/raw/76ce7f0bac916a516501ead719b513b22430cad0/Barnes-Hut%20t-SNE%20Demo.ipynb

[figure: bhtsne demo output]

Performance

This compares the timings for the Barnes-Hut approximation against the current method (note the log y-axis).

[figure: timing comparison, log y-axis]

The following diagram shows the relative speedup between both versions:
[figure: relative speedup]

TODO

  • Create fit & transform methods to update using new data
  • Add test for transform called before fit
  • Add test for skip_num_points
  • Raise error if method='barnes_hut' and using sparse data
  • Raise error if method='barnes_hut' and embedding dim > 3
  • Transform 64bit input floats to 32bit
  • Update the narrative documentation in doc/modules/manifold.rst
  • Add toy data to test perplexity
  • Fix failing tests
  • Expose tree consistency checks as unit tests
  • Fix memory leaks
  • Segfaults on data larger than 50k elements
  • PEP8 the code
  • Include usage documentation
  • Remove extra imports, print statements, pdb statements
  • Ensure python 3 works
  • Ensure output dimensionality works for 2D or 3D
  • Am I using memviews or returning full arrays appropriately?
  • Incorporate into SKLearn
  • Remove GIL as much as possible in Cython code
  • Add answer tests
  • Ensure old t-SNE tests work
  • Changed perplexity calculation to nearest neighbors
  • Changed positive gradient calc to nearest neighbors

Learn

  • You can read more about the technique in general here:

http://lvdmaaten.github.io/tsne/

  • The Barnes-Hut approximation is here:

http://lvdmaaten.github.io/publications/papers/JMLR_2014.pdf

cemoody changed the title from Cemoody/bhtsne to Cemoody/bhtsne Barnes-Hut t-SNE on Dec 29, 2014

# Implementation by Chris Moody & Nick Travers
# original code by Laurens van der Maaten
# for reference implemenations and papers describing the technique:
# http://homepage.tudelft.nl/19j49/t-SNE.html

ogrisel (Member), Dec 29, 2014

When you say: "original code" you mean that this implementation is a direct translation in Cython of the original C++ code?

If so, this is a violation of the legal statement of the original implementation, which says in the README:

You are free to use, modify, or redistribute this software in any way you want, but only for
non-commercial purposes. The use of the software is at your own risk; the authors are not
responsible for any damage as a result from errors in the software. 

scikit-learn is a BSD licensed project, it should only include compatibly licensed code. In particular the non-commercial exclusion is not compatible.

If the implementation in this PR is not a derivative work there is no problem. If it was based on the original implementation instead of reimplemented from the paper, then we need to ask the original author for authorization before being able to consider the inclusion of this contribution in scikit-learn.

cemoody (Contributor), Dec 30, 2014

Initially I wrote it from scratch, following the paper. I used to do astrocomputing, where I'd written oct-tree codes before (which are similar to this Barnes-Hut approximation). I then compared results with LvdM's code and used those answers to debug my own. Where his code uses Vantage-Point trees we use BallTree, and there are other major structural differences. But I also changed some of the names of my methods and variables to mirror his (for example compute_non_edge_forces or sum_Q) for easier comparison & debugging. This PR is not a 'direct translation': I did not take a chunk of C++ code and try to rewrite it in Cython. But at the same time, I certainly read & understood LvdM's code and it helped speed up progress on this one.

So I'm not sure if that means it's derivative or not. To be safe, I'll
contact LvdM and ask for authorization.

chris


cemoody (Contributor), Dec 30, 2014

I contacted LvdM and he's reminded everyone that all of his software actually is released under the FreeBSD license (as stated here)! It's not explicitly referenced in the source code, so I've asked him to make it a bit more clear, but I think this means we're OK to include this code from a legal standpoint.

ogrisel (Member), Dec 30, 2014

all of his software actually is released under the FreeBSD license

Indeed this is not clear for the Barnes-Hut t-SNE as the README of the distribution has a non-commercial clause that directly contradicts the BSD license.

But great then. Thanks for having checked with LvdM.

reduces the computational complexity from O(N^2) to O(uN). See the exact
method '_binary_search_perplexity' for more details.

ogrisel (Member), Dec 30, 2014

style: only one blank line here.

int verbose):
"""Binary search for sigmas of conditional Gaussians. This approximation
reduces the computational complexity from O(N^2) to O(uN). See the exact
method '_binary_search_perplexity' for more details.

ogrisel (Member), Dec 30, 2014

style: please follow PEP 257:

    """Binary search for sigmas of conditional Gaussians.

    This approximation reduces the computational complexity from O(N^2) to O(uN).
    See the exact method '_binary_search_perplexity' for more details.

    ...
    """

ogrisel (Member), Dec 30, 2014

Also the docstring of a public method should not point to a private method for information as the docstring of the private method is not extracted to build the online API documentation.

ogrisel (Member), Dec 30, 2014

My bad, this is a private method as well, so the reference is alright.

ogrisel changed the title from Cemoody/bhtsne Barnes-Hut t-SNE to [WIP] Cemoody/bhtsne Barnes-Hut t-SNE on Dec 30, 2014

ogrisel (Member) commented Dec 30, 2014

I have changed the title to [WIP] (work in progress). Please don't forget to change it to [MRG] (meaning ready for merge review) to attract the attention of more reviewers once you are done with your to-do list.

ogrisel (Member) commented Dec 30, 2014

The narrative documentation will also need an update: http://scikit-learn.org/dev/modules/manifold.html#t-distributed-stochastic-neighbor-embedding-t-sne . The source file is in doc/modules/manifold.rst. I will add a new todo item.

ogrisel (Member) commented Dec 30, 2014

The openmp flag cannot be enabled by default as it will not be buildable with clang (which is the default compiler under OSX) and probably not under the old versions of Microsoft Visual C++ required to build Python extensions for Python 2.7 under Windows. So if we really want to use OpenMP we should at least have something in setup.py that checks whether the compiler supports it and turns it off if it is not supported.

Also the OpenMP implementation in GCC (libgomp) will crash whenever it is used in conjunction with Python multiprocessing (which does fork without exec under POSIX by default). There is a patch to make libgomp robust to forks but it has not been reviewed yet. So unless OpenMP brings a significant speedup vs the serial execution, I would just remove parallel support altogether. If we decide to keep it, it should be turned off by default and only enabled with a specific parameter (such as n_jobs, commonly used in the rest of the library, despite the bad name).
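
For illustration, a rough sketch of the kind of compile-time probe setup.py could run before enabling -fopenmp (the helper name and structure are illustrative, not taken from this PR):

    import os
    import tempfile
    import textwrap
    from distutils.ccompiler import new_compiler
    from distutils.errors import CompileError, LinkError

    def compiler_supports_openmp():
        """Try to compile and link a trivial OpenMP program."""
        code = textwrap.dedent("""
            #include <omp.h>
            int main(void) { return omp_get_max_threads(); }
        """)
        with tempfile.TemporaryDirectory() as tmp:
            src = os.path.join(tmp, "check_openmp.c")
            with open(src, "w") as f:
                f.write(code)
            cc = new_compiler()
            try:
                objects = cc.compile([src], output_dir=tmp,
                                     extra_postargs=["-fopenmp"])
                cc.link_executable(objects, os.path.join(tmp, "check_openmp"),
                                   extra_postargs=["-fopenmp"])
            except (CompileError, LinkError):
                return False
        return True

    # setup.py would then add the -fopenmp compile/link flags only when
    # compiler_supports_openmp() returns True.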

ogrisel (Member) commented Dec 30, 2014

Do the tests pass on your box? On Travis there is a bunch of direct errors (e.g. missing attributes) that make me think that you have not pushed all the code you run locally (maybe the generated C files are not up to date?).

ogrisel (Member) commented Dec 30, 2014

I would also rename bhtsne.pyx to _barnes_hut_tsne.pyx for discoverability.

grad_output = np.array([[5.81128448e-05, -7.78033454e-06],
[-5.81526851e-05, 7.80976444e-06],
[4.24275173e-08, -3.69569698e-08],
[-2.58720939e-09, 7.52706374e-09]])

ogrisel (Member), Dec 30, 2014

How were those numerical values generated? Using the reference implementation? In any case it should be made explicit in a comment.

to be better than 3%. However, the exact method cannot scale to
millions of examples.
theta : float (default: 0.5)

ogrisel (Member), Dec 30, 2014

In general we try to avoid introducing new parameter names that are Greek variables from the paper's notation when we can find a more descriptive plain-English name. Maybe it could be named approximation instead. In the explanation you can add that this parameter is named "theta" in the original paper.

ogrisel (Member) commented Dec 30, 2014

Could you add a test on some toy data that compares the perplexity of the 2 methods and checks that they are approximately similar?
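
A hedged sketch of what such a test could look like (the method names follow the merged API, the import path for trustworthiness assumes current scikit-learn, and the threshold is illustrative):

    from sklearn.datasets import make_blobs
    from sklearn.manifold import TSNE, trustworthiness

    def test_barnes_hut_close_to_exact_on_toy_data():
        X, _ = make_blobs(n_samples=100, centers=3, n_features=5,
                          random_state=0)
        for method in ("exact", "barnes_hut"):
            X_embedded = TSNE(n_components=2, method=method,
                              random_state=0).fit_transform(X)
            # Both variants should preserve local structure on easy data.
            assert trustworthiness(X, X_embedded, n_neighbors=5) > 0.9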

GaelVaroquaux (Member) commented Jan 5, 2015

Hey Christopher,

Thanks a lot for contributing this. I haven't had time to review it, but it is exciting! I think that the Barnes-Hut t-SNE algorithm is very important for us to scale up.

cemoody (Contributor) commented Jan 5, 2015

Thanks @GaelVaroquaux and @ogrisel for your comments! @nicktrav and I are slowly working through our TODO list and all of @ogrisel's wonderful suggestions (thank you!) -- we're hoping to be ready for a fuller review in the coming week or two.

Uses Barnes-Hut tree methods to calculate the gradient that
runs in O(NlogN) instead of O(N^2)
Parameters

AlexanderFabisch (Member), Jan 5, 2015

The documentation of neighbors and verbose is missing here, too.

else:
distances = pairwise_distances(X, metric=self.metric)
assert np.all(distances >= 0), "All distances should be positive"

AlexanderFabisch (Member), Jan 5, 2015

You should probably raise a ValueError in this case: either the metric or the distances given as X are not correct.
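
Something along these lines, written against the snippet quoted above (the exact error message is illustrative):

    if np.any(distances < 0):
        raise ValueError("All distances should be positive, either "
                         "the metric or the precomputed distances "
                         "given as X are not correct")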

cemoody (Contributor) commented Jan 6, 2015

Hi @GaelVaroquaux, @ogrisel & @AlexanderFabisch thanks for the comments! 11 hours ago I thought this was going to take a week or two, but I believe we're now ready for review :) I'll try to respond to everyone's comments all at once.

  • We've removed all OpenMP requirements -- the overhead of launching threads outweighed the benefit, even for large datasets with 10k elements. Once we've figured out building scikit-learn with OpenMP we can revisit this question, but let's table the issue for the time being.
  • We've modified the narrative docs to reflect the new performance and the extra parameters. The change is minimal since the algorithm performs more or less the same, just faster.
  • Added toy data testing & comparing the perplexity of the 2 methods.

The rest of the changes we've incorporated immediately. Tests should now pass too, which I think means this is ready for merge review, so I've changed the title to [MRG].

cc @nicktrav

cemoody changed the title from [WIP] Cemoody/bhtsne Barnes-Hut t-SNE to [MRG] Cemoody/bhtsne Barnes-Hut t-SNE on Jan 6, 2015

@@ -0,0 +1,3 @@
#define DEBUGFLAG 0

ogrisel (Member), Jan 6, 2015

Is it really necessary to introduce a new file just for this? As _debugvars.h is only included in _barnes_hut_tsne.pyx I would rather put an inline definition of that constant. AFAIK there is no Cython keyword to declare a const variable, but maybe a cdef enum would do. It needs to be benchmarked to make sure that it does not cause a performance regression, but I think it should be fine.

cemoody (Contributor), Jan 6, 2015

Yeah, including an extra file just for compile-time debug flags is gnarly. Didn't think about enums before now, but using an enum is a much more elegant solution, and performance-wise seems to make no (significant) impact. Thanks -- I've made this change.
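
For reference, the trick amounts to an anonymous cdef enum in the .pyx file (a sketch in Cython; the constant name mirrors the removed header):

    # Anonymous cdef enum: a compile-time integer constant, which removes
    # the need for the separate _debugvars.h file.
    cdef enum:
        DEBUGFLAG = 0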

Standard t-SNE that has been implemented here is usually much slower than
The Barnes-Hut t-SNE that has been implemented here is usually much slower than

AlexanderFabisch (Member), Jan 6, 2015

Is it really still slower than other manifold learning algorithms?

cemoody (Contributor), Jan 6, 2015

Yeah, at least for the toy datasets: the speedup is only ~2-3x on the 2000-sample digits example, which still makes it the slowest. Perhaps it's competitive with the next-slowest, MDS?

ogrisel (Member), Jan 14, 2015

On my box, TSNE (with the default Barnes-Hut approximation) is always slightly faster than our implementation of MDS in all the examples/manifold scripts. Still, it is slower than the other manifold methods.

ogrisel (Member), Jan 14, 2015

I would replace "is usually much slower than" by "can be significantly slower than".

@@ -48,15 +52,55 @@ def _joint_probabilities(distances, desired_perplexity, verbose):
# Compute conditional probabilities such that they approximately match
# the desired perplexity
conditional_P = _utils._binary_search_perplexity(
distances, desired_perplexity, verbose)
distances.astype(np.float), desired_perplexity, verbose)

ogrisel (Member), Jan 6, 2015

This will trigger a memory copy even if distances already has the right type. Please use sklearn.utils.fixes.astype and call it as astype(distances, np.float, copy=False). Also I would rather use an explicit precision, either np.float32 or np.float64, or better a dtype that reflects the precision of the input data, instead of the arbitrary yet platform-dependent np.float dtype.

ogrisel (Member), Jan 6, 2015

Maybe sklearn.utils.validation.as_float_array might be what you want here actually.
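
A small sketch of that variant (copy=False skips the cast when the dtype already matches; as_float_array is an existing scikit-learn validation helper):

    from sklearn.utils.validation import as_float_array

    distances = as_float_array(distances, copy=False)
    conditional_P = _utils._binary_search_perplexity(
        distances, desired_perplexity, verbose)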

@@ -477,29 +758,54 @@ def _tsne(self, P, alpha, n_samples, random_state, X_embedded=None):
self.n_components)
params = X_embedded.ravel()
kwargs = {}

AlexanderFabisch (Member), Jan 6, 2015

I would write

opt_args = {"n_iter": 50, "momentum": 0.5, "learning_rate": self.learning_rate, "verbose": self.verbose,
            "it": 0, "n_iter_check": 25, "kwargs": dict(skip_num_points=skip_num_points)}
gradient function
alpha : float
Degrees of freedom of the Student's-t distribution.

ogrisel (Member), Jan 6, 2015

I think it should be named degrees_of_freedom to make the code more readable but as this is private API this is no big deal.

GaelVaroquaux (Member), Jan 6, 2015

I think it should be named degrees_of_freedom to make the code more readable

+1

but as this is private API this is no big deal.

Still, it would make the code more readable.

method : string (default: 'barnes_hut')
By default the gradient calculation algorithm uses Barnes-Hut
approximation running in O(NlogN) time. method='standard' will use
will run on the slower, but exact, algorithm in O(N^2) time. The

AlexanderFabisch (Member), Jan 6, 2015

will use or will run?

raise ValueError("'angle' must be between 0.0 - 1.0")
if self.method == 'barnes_hut' and self.n_components > 3:
raise ValueError("method='barnes_hut' only available for "
"n_components < 4.")

AlexanderFabisch (Member), Jan 6, 2015

You should mention that in the docstring of the constructor.

AlexanderFabisch (Member) commented Jan 6, 2015

Could you add the performance comparison to examples/manifold and make sure that it does not take much longer than 10^1 seconds (n_samples=1000-2000)?
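
A tiny sketch in the spirit of the requested example (dataset and sizes are illustrative; the method names follow the merged API):

    from time import time
    from sklearn.datasets import load_digits
    from sklearn.manifold import TSNE

    X = load_digits().data[:1000]
    for method in ("exact", "barnes_hut"):
        t0 = time()
        TSNE(n_components=2, method=method, random_state=0).fit_transform(X)
        print("%s: %.1f s" % (method, time() - t0))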

@@ -502,6 +502,7 @@ three parameters that control the optimization of t-SNE:
* early exaggeration factor
* learning rate
* maximum number of iterations
* angle

ogrisel (Member), Jan 6, 2015

The paragraph above needs an update: "there are three" => "there are four".

AlexanderFabisch (Member), Jan 6, 2015

Actually, angle does not really influence the optimization. It controls the approximation of the gradient. I would remove it completely and explain it in another paragraph.

ogrisel (Member), Jan 6, 2015

It is still an approximation of the true cost: while the default value of 0.5 or lower should not impact the results much, setting it to a high value should impact the quality of the embedding.

ogrisel (Member), Jan 6, 2015

Maybe "optimization of t-SNE" should be rephrased "optimization of t-SNE and therefore possibly the quality of the resulting embedding".

cemoody (Contributor), Jan 7, 2015

Yep, agreed that high values of angle should negatively impact the quality of the embedding, so it is as relevant as the learning rate or number of iterations.

[0]])
grad_output = np.array([[-2.37012478e-05, -6.29044398e-05],
[2.37012478e-05, 6.29044398e-05]])
yield _run_answer_test, pos_input, pos_output, neighbors, grad_output

AlexanderFabisch (Member), Jan 6, 2015

I think this is more difficult to understand than _run_answer_test(pos_input, pos_output, neighbors, grad_output) (in all three cases)

ogrisel (Member), Jan 6, 2015

+1 for unfolding this test and getting rid of the yield expression.

def test_no_4D_on_barnes_hut():
"""No sparse matrices allowed on Barnes-Hut"""

AlexanderFabisch (Member), Jan 6, 2015

This description is not correct

ogrisel (Member), Jan 6, 2015

The docstring does not reflect the test. As the test method name is explicit there is no need for a docstring anyway.

ogrisel (Member) commented Jun 24, 2015

I can reproduce your compile error with gcc 4.9.2-10ubuntu13 from Ubuntu 15.04. It looks like a Cython bug: when calling printf("some string\n") in Cython, it creates a local char* typed variable that is not const, and that causes -Werror=format-security to raise a compiler error with gcc 4.9.

When calling printf("some string with a formatted value: %d\n", i), the generated C code has a const char* as its first argument and everything is fine.

ogrisel (Member) commented Jun 24, 2015

The following hack seems to work:

diff --git a/sklearn/manifold/_barnes_hut_tsne.pyx b/sklearn/manifold/_barnes_hut_tsne.pyx
index a5579b5..3978c80 100644
--- a/sklearn/manifold/_barnes_hut_tsne.pyx
+++ b/sklearn/manifold/_barnes_hut_tsne.pyx
@@ -14,6 +14,8 @@ from libc.math cimport sqrt, log
 cimport numpy as np
 import numpy as np

+cdef char* EMPTY_STRING = ""
+
 cdef extern from "math.h":
     float fabsf(float x) nogil

@@ -751,13 +753,19 @@ def gradient(float[:,:] pij_input,
     err = insert_many(qt, pos_output)
     assert err == 0, "[t-SNE] Insertion failed"
     if verbose > 10:
-        printf("[t-SNE] Computing gradient\n")
+        # XXX: format hack to workaround lack of `const char *` type
+        # in the generated C code that triggers error with gcc 4.9
+        # and -Werror=format-security
+        printf("[t-SNE] Computing gradient\n%s", EMPTY_STRING)
     sum_Q = compute_gradient(pij_input, pos_output, neighbors, forces,
                              qt.root_node, theta, dof, skip_num_points, -1)
     C = compute_error(pij_input, pos_output, neighbors, sum_Q, n_dimensions,
                       verbose)
     if verbose > 10:
-        printf("[t-SNE] Checking tree consistency \n")
+        # XXX: format hack to workaround lack of `const char *` type
+        # in the generated C code
+        # and -Werror=format-security
+        printf("[t-SNE] Checking tree consistency\n%s", EMPTY_STRING)
     cdef long count = count_points(qt.root_node, 0)
     m = ("Tree consistency failed: unexpected number of points=%i "
          "at root node=%i" % (count, qt.root_node.cumulative_size))

ogrisel (Member) commented Jun 24, 2015

There are other occurrences of the same issue in the file. The above patch does not fix them all.

cemoody (Contributor) commented Jun 26, 2015

Alas, I was overly optimistic in how much time I'd have to work on this. Can we make a fresh TODO list of what this PR needs?

amueller (Member) commented Jul 1, 2015

@cemoody welcome to the club ;)
Maybe reset it to #4887 and fix the errors.

@ogrisel is the Cython bug fixed in newer Cython? Should we report it?

I think having a sparse matrix variant would be nice but I would be happy to merge without it.

amueller (Member) commented Aug 3, 2015

I added @ogrisel's fix to #4887. However, the transform method can't work the way it is implemented now. I haven't really looked at it, but it currently assumes that the input dimension is somehow the same as the output dimension. It concatenates the embedding with the new data. I think it should just concatenate the training data with the new data, right? But then the tests should be tighter.

amueller (Member) commented Aug 3, 2015

Fun API issue: calling transform will change _X_fit, so calling transform again afterwards will change the results...

amueller (Member) commented Aug 3, 2015

Also, calling transform changes the embedding of the training data. I thought that was supposed to be avoided using skip_num_points.
transform also returns the transform of the training data plus the additional data, which breaks the API.
Maybe we should try to merge this without the transform? Or does someone have time to look into it?

zachmayer commented Oct 29, 2015

Is this project still in progress? I'd really love to be able to use this new algorithm.

amueller (Member) commented Oct 29, 2015

@zachmayer the algorithm is in master and will be in the imminent release. It uses dense data structures, though, so the memory requirements will be the same as before. There is no one working on using sparse data structures at the moment, AFAIK.

zachmayer commented Oct 29, 2015

I can live with that! Awesome!

cemoody (Contributor) commented Nov 5, 2015

Oh, I didn't realize it was in master! That's super awesome & I'm incredibly excited!!! Thanks @amueller for all of your hard work :)

zachmayer commented Nov 5, 2015

@amueller Can you link to the PR/commit that puts it in master? Thanks!

cemoody (Contributor) commented Nov 5, 2015

@zachmayer I think it's at this commit: fe09a07

amueller (Member) commented Nov 5, 2015

This one: #4887

I left your PR open, as I removed the "transform" method. It changed the training data, which is not really allowed. I think that might not have been what you intended, was it?

jnothman (Member) commented Mar 11, 2016

I assume this should be closed?

jnothman (Member) commented Mar 11, 2016

It can be reopened if that's appropriate.

jnothman closed this on Mar 11, 2016

jamiis commented May 26, 2016

@cemoody @amueller is there still ongoing work to optimize the memory consumption of calculating the pairwise distances? I'm getting a MemoryError on a 75,000x50 dataset.

ogrisel (Member) commented May 27, 2016

@jamiis AFAIK nobody is working on it at the moment. Do you want to give it a try?

jlorince commented Jun 1, 2016

In principle, shouldn't there be a way to avoid calculating the full distance matrix? The original author of t-SNE says "Barnes-Hut algorithms are intended to operate on large datasets for which you cannot keep such distance matrices in memory" (see lvdmaaten/bhtsne#7), so I'm wondering if the current approach in the sklearn version (starting by calculating the full distance matrix) is inherently flawed. Though honestly I'm not sure what the correct alternative is....

AlexanderFabisch (Member) commented Jun 1, 2016

I think the best way to implement it is a sparse distance matrix where only the relevant entries are included. However, it is a little bit complicated to handle all special cases correctly, e.g. precomputed distance matrices.
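
To illustrate the direction (a sketch only, not the implementation that later landed in #9032; the helper name is hypothetical): keep only the distances to each point's k nearest neighbors in a sparse CSR matrix instead of the full dense matrix.

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def sparse_knn_distances(X, n_neighbors):
        """Sparse matrix of distances to the k nearest neighbors only."""
        nn = NearestNeighbors(n_neighbors=n_neighbors).fit(X)
        # CSR with n_samples * n_neighbors stored entries instead of a
        # dense n_samples x n_samples distance matrix.
        return nn.kneighbors_graph(mode='distance')

    X = np.random.RandomState(0).rand(5000, 50)
    # Barnes-Hut t-SNE only needs roughly 3 * perplexity neighbors per point.
    D = sparse_knn_distances(X, n_neighbors=90)
    print(D.shape, D.nnz)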

ghost commented Jul 26, 2016

Any progress? I'm unable to find an implementation that lets you both work on large datasets and vary the number of iterations.

amueller (Member) commented Jul 26, 2016

Moving the discussion to #7089. PR welcome ;)

tomMoral referenced this pull request on Jun 7, 2017: [MRG+1] Reducing t-SNE memory usage #9032 (merged; 5 of 5 tasks complete)