AssertionError: Tree consistency failed in TSNE #8992

yanshuaicao opened this issue Jun 6, 2017 · 10 comments


@yanshuaicao yanshuaicao commented Jun 6, 2017


TSNE learning on digit dataset throws error for perplexity=8

Steps/Code to Reproduce

from sklearn import manifold, datasets
data = datasets.load_digits()
X = data['data']
n_components = 2
tsne = manifold.TSNE(n_components=n_components, init='pca', perplexity=8, 
                         random_state=0, verbose=0)
Y = tsne.fit_transform(X)

Expected Results

No error.

Actual Results

AssertionError                            Traceback (most recent call last)
<ipython-input-9-553b1868b3c8> in <module>()
     11 tsne = manifold.TSNE(n_components=n_components, init='pca', perplexity=8, 
     12                          random_state=0, verbose=0)
---> 13 Y = tsne.fit_transform(X)

/home/owner/py27_env/local/lib/python2.7/site-packages/sklearn/manifold/t_sne.pyc in fit_transform(self, X, y)
    895             Embedding of the training data in low-dimensional space.
    896         """
--> 897         embedding = self._fit(X)
    898         self.embedding_ = embedding
    899         return self.embedding_

/home/owner/py27_env/local/lib/python2.7/site-packages/sklearn/manifold/t_sne.pyc in _fit(self, X, skip_num_points)
    792                           X_embedded=X_embedded,
    793                           neighbors=neighbors_nn,
--> 794                           skip_num_points=skip_num_points)
    796     @property

/home/owner/py27_env/local/lib/python2.7/site-packages/sklearn/manifold/t_sne.pyc in _tsne(self, P, degrees_of_freedom, n_samples, random_state, X_embedded, neighbors, skip_num_points)
    868         opt_args['it'] = it + 1
    869         params, kl_divergence, it = _gradient_descent(obj_func, params,
--> 870                                                       **opt_args)
    872         if self.verbose:

/home/owner/py27_env/local/lib/python2.7/site-packages/sklearn/manifold/t_sne.pyc in _gradient_descent(objective, p0, it, n_iter, objective_error, n_iter_check, n_iter_without_progress, momentum, learning_rate, min_gain, min_grad_norm, min_error_diff, verbose, args, kwargs)
    387     for i in range(it, n_iter):
--> 388         new_error, grad = objective(p, *args, **kwargs)
    389         grad_norm = linalg.norm(grad)

/home/owner/py27_env/local/lib/python2.7/site-packages/sklearn/manifold/t_sne.pyc in _kl_divergence_bh(params, P, neighbors, degrees_of_freedom, n_samples, n_components, angle, skip_num_points, verbose)
    289     error = _barnes_hut_tsne.gradient(sP, X_embedded, neighbors,
    290                                       grad, angle, n_components, verbose,
--> 291                                       dof=degrees_of_freedom)
    292     c = 2.0 * (degrees_of_freedom + 1.0) / degrees_of_freedom
    293     grad = grad.ravel()

sklearn/manifold/_barnes_hut_tsne.pyx in sklearn.manifold._barnes_hut_tsne.gradient (sklearn/manifold/_barnes_hut_tsne.c:8155)()

AssertionError: Tree consistency failed: unexpected number of points=1796 at root node=1797


('Python', '2.7.12 (default, Jul 1 2016, 15:12:24) \n[GCC 5.4.0 20160609]')
('NumPy', '1.12.1')
('SciPy', '0.18.1')
('Scikit-Learn', '0.19.dev0')


No error encountered for perplexity = 7 or 9, or any other values tried.

@amueller amueller commented Jun 6, 2017

Thanks for the report. That's... odd... to say the least.

@lesteve lesteve commented Jun 6, 2017

Hmmm I can reproduce on master. Note this works fine on 0.18.1.

@jnothman jnothman commented Jun 6, 2017

@lesteve lesteve added the Sprint label Jun 6, 2017
@lesteve lesteve added this to the 0.19 milestone Jun 7, 2017
@lesteve lesteve added the Blocker label Jun 7, 2017

@Sentient07 Sentient07 commented Jun 7, 2017

Hi, I am not sure what the "blocker" tag is, but can I work on this? If so, could someone please give me an idea of where to start looking?

@jnothman jnothman commented Jun 7, 2017

@amueller amueller commented Jun 20, 2017

@jnothman do you think this is realistic to tackle for 0.19?

@jnothman jnothman commented Jun 20, 2017

@ogrisel ogrisel commented Jun 23, 2017

Actually I am pretty confident that #9032 fixes this issue and that master is not behaving correctly.

@ogrisel ogrisel commented Jun 28, 2017

> seeing as it's not been diagnosed fully

@tomMoral and I diagnosed that this was caused by the implementation of the QuadTree data structure on master. Contiguous cells (or tiles) did not always have exactly matching boundaries because of floating-point rounding. For deep enough trees with small cells, there is a non-zero chance that a point lands in between two consecutive tiles of the QuadTree...

The reimplementation of the QuadTree data structure in #9032 no longer has this issue: the max boundary of a cell is exactly the min boundary of the following cell.
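
To make the rounding problem concrete, here is a minimal 1-D sketch in Python. It is not the scikit-learn QuadTree code: the helper names (`split_independent`, `split_shared`, `contains`, `covered`) and the interval [0.1, 1.5) are made up for illustration, and the half-open cell convention is an assumption. It only shows how two sibling cells whose edges are computed separately can leave a gap, and how sharing a single boundary value avoids it.

```python
def split_independent(lo, hi):
    """Split [lo, hi) in two, each child computing its own inner edge."""
    half = (hi - lo) / 2.0
    left = (lo, lo + half)    # upper edge derived from the lower bound
    right = (hi - half, hi)   # lower edge derived from the upper bound
    return left, right


def split_shared(lo, hi):
    """Split [lo, hi) in two, both children reusing the same midpoint."""
    mid = lo + (hi - lo) / 2.0
    return (lo, mid), (mid, hi)


def contains(cell, x):
    """Half-open membership test: cell[0] <= x < cell[1]."""
    return cell[0] <= x < cell[1]


def covered(cells, x):
    """True if at least one of the cells contains x."""
    return any(contains(cell, x) for cell in cells)


lo, hi = 0.1, 1.5
left, right = split_independent(lo, hi)

# With these inputs the two inner edges differ by one rounding step,
# so the point sitting on the left child's upper edge is in neither child.
x = left[1]
print("inner edges:", repr(left[1]), repr(right[0]))
print("independent edges cover x:", covered((left, right), x))
print("shared edge covers x:", covered(split_shared(lo, hi), x))
```

In this toy model the point `x` lies inside the parent interval but in neither child when the edges are computed independently, which is the kind of lost point the assertion on the root node count would detect. With a shared midpoint, every point of the parent belongs to exactly one child by construction.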

@jnothman jnothman commented Jul 13, 2017

Should be fixed in #9032

@jnothman jnothman closed this Jul 13, 2017