AssertionError: Tree consistency failed in TSNE #8992

Closed
yanshuaicao opened this issue Jun 6, 2017 · 10 comments

@yanshuaicao commented Jun 6, 2017

Description

Fitting TSNE on the digits dataset raises an AssertionError when perplexity=8.

Steps/Code to Reproduce

from sklearn import manifold, datasets

# Load the digits dataset (1797 samples).
data = datasets.load_digits()
X = data['data']
n_components = 2
# perplexity=8 triggers the failure; 7 and 9 do not.
tsne = manifold.TSNE(n_components=n_components, init='pca', perplexity=8,
                     random_state=0, verbose=0)
Y = tsne.fit_transform(X)

Expected Results

No error.

Actual Results

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-9-553b1868b3c8> in <module>()
     11 tsne = manifold.TSNE(n_components=n_components, init='pca', perplexity=8, 
     12                          random_state=0, verbose=0)
---> 13 Y = tsne.fit_transform(X)

/home/owner/py27_env/local/lib/python2.7/site-packages/sklearn/manifold/t_sne.pyc in fit_transform(self, X, y)
    895             Embedding of the training data in low-dimensional space.
    896         """
--> 897         embedding = self._fit(X)
    898         self.embedding_ = embedding
    899         return self.embedding_

/home/owner/py27_env/local/lib/python2.7/site-packages/sklearn/manifold/t_sne.pyc in _fit(self, X, skip_num_points)
    792                           X_embedded=X_embedded,
    793                           neighbors=neighbors_nn,
--> 794                           skip_num_points=skip_num_points)
    795 
    796     @property

/home/owner/py27_env/local/lib/python2.7/site-packages/sklearn/manifold/t_sne.pyc in _tsne(self, P, degrees_of_freedom, n_samples, random_state, X_embedded, neighbors, skip_num_points)
    868         opt_args['it'] = it + 1
    869         params, kl_divergence, it = _gradient_descent(obj_func, params,
--> 870                                                       **opt_args)
    871 
    872         if self.verbose:

/home/owner/py27_env/local/lib/python2.7/site-packages/sklearn/manifold/t_sne.pyc in _gradient_descent(objective, p0, it, n_iter, objective_error, n_iter_check, n_iter_without_progress, momentum, learning_rate, min_gain, min_grad_norm, min_error_diff, verbose, args, kwargs)
    386 
    387     for i in range(it, n_iter):
--> 388         new_error, grad = objective(p, *args, **kwargs)
    389         grad_norm = linalg.norm(grad)
    390 

/home/owner/py27_env/local/lib/python2.7/site-packages/sklearn/manifold/t_sne.pyc in _kl_divergence_bh(params, P, neighbors, degrees_of_freedom, n_samples, n_components, angle, skip_num_points, verbose)
    289     error = _barnes_hut_tsne.gradient(sP, X_embedded, neighbors,
    290                                       grad, angle, n_components, verbose,
--> 291                                       dof=degrees_of_freedom)
    292     c = 2.0 * (degrees_of_freedom + 1.0) / degrees_of_freedom
    293     grad = grad.ravel()

sklearn/manifold/_barnes_hut_tsne.pyx in sklearn.manifold._barnes_hut_tsne.gradient (sklearn/manifold/_barnes_hut_tsne.c:8155)()

AssertionError: Tree consistency failed: unexpected number of points=1796 at root node=1797

Versions

Linux-4.4.0-78-generic-x86_64-with-Ubuntu-16.04-xenial
('Python', '2.7.12 (default, Jul 1 2016, 15:12:24) \n[GCC 5.4.0 20160609]')
('NumPy', '1.12.1')
('SciPy', '0.18.1')
('Scikit-Learn', '0.19.dev0')

Note:

No error encountered for perplexity = 7 or 9, or any other values tried.

@amueller (Member) commented Jun 6, 2017

Thanks for the report. That's... odd... to say the least.

@lesteve (Member) commented Jun 6, 2017

Hmmm I can reproduce on master. Note this works fine on 0.18.1.

@jnothman (Member) commented Jun 6, 2017

@lesteve added the Sprint label Jun 6, 2017
@lesteve added this to the 0.19 milestone Jun 7, 2017
@lesteve added the Blocker label Jun 7, 2017

@Sentient07 (Contributor) commented Jun 7, 2017

Hi, I am not sure what the "blocker" tag is, but can I work on this? If so, could someone please give me an idea of where to start looking?
Thanks,

@jnothman (Member) commented Jun 7, 2017

@amueller (Member) commented Jun 20, 2017

@jnothman do you think this is realistic to tackle for 0.19?

@jnothman (Member) commented Jun 20, 2017

@ogrisel (Member) commented Jun 23, 2017

Actually I am pretty confident that #9032 fixes this issue and is not behaving correctly.

@ogrisel (Member) commented Jun 28, 2017

> seeing as it's not been diagnosed fully

@tomMoral and I diagnosed that this was caused by the implementation of the QuadTree data structure on master. Contiguous cells (or tiles) did not always have exactly matching boundaries because of floating-point rounding. For deep enough trees with small cells, there is a non-zero chance of inserting a point in between consecutive tiles in the QuadTree...

The reimplementation of the QuadTree data structure in #9032 no longer has this issue: the max boundary of a cell is exactly the min boundary of the following cell.
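
To illustrate the rounding problem described above, here is a minimal one-dimensional sketch. It is not the scikit-learn QuadTree code: the functions `independent_bounds`, `shared_bounds`, and `count_mismatches` are hypothetical names invented for this example, and the center/half-width layout is only an assumption about how two sibling cells might each derive their own boundary.

```python
import random


def independent_bounds(center, half_width):
    """Hypothetical layout: each child derives its bounds from its own center."""
    quarter = half_width / 2
    left_center = center - quarter
    right_center = center + quarter
    left_max = left_center + quarter    # mathematically equal to `center`
    right_min = right_center - quarter  # mathematically equal to `center`
    return left_max, right_min


def shared_bounds(lo, hi):
    """Both children reuse one midpoint, so the boundary is a single float."""
    mid = lo + (hi - lo) / 2
    return mid, mid


def count_mismatches(bounds_fn, n_trials=10000, seed=0):
    """Count trials where the left cell's max differs from the right cell's min."""
    rng = random.Random(seed)
    mismatches = 0
    for _ in range(n_trials):
        center = rng.uniform(-1.0, 1.0)
        half_width = rng.uniform(1e-6, 1.0)
        left_max, right_min = bounds_fn(center, half_width)
        if left_max != right_min:
            # With half-open cells [min, max), a point lying between the two
            # values would belong to neither sibling.
            mismatches += 1
    return mismatches


if __name__ == "__main__":
    print("independent:", count_mismatches(independent_bounds))
    print("shared:", count_mismatches(lambda c, h: shared_bounds(c - h, c + h)))
```

When the two siblings compute the boundary through different arithmetic paths, the results can differ by a rounding error, which leaves a sliver that neither cell covers (or that both do). Storing one boundary value and reusing it for both neighbours removes that possibility by construction.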

@jnothman (Member) commented Jul 13, 2017

Should be fixed in #9032

@jnothman closed this Jul 13, 2017