
AssertionError: Tree consistency failed in TSNE #8992

Closed
yanshuaicao opened this Issue Jun 6, 2017 · 10 comments

6 participants

yanshuaicao commented Jun 6, 2017

Description

Running TSNE on the digits dataset raises an AssertionError with perplexity=8.

Steps/Code to Reproduce

from sklearn import manifold, datasets
data = datasets.load_digits()
X = data['data']
n_components = 2
tsne = manifold.TSNE(n_components=n_components, init='pca', perplexity=8,
                     random_state=0, verbose=0)
Y = tsne.fit_transform(X)

Expected Results

No error.

Actual Results

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-9-553b1868b3c8> in <module>()
     11 tsne = manifold.TSNE(n_components=n_components, init='pca', perplexity=8, 
     12                          random_state=0, verbose=0)
---> 13 Y = tsne.fit_transform(X)

/home/owner/py27_env/local/lib/python2.7/site-packages/sklearn/manifold/t_sne.pyc in fit_transform(self, X, y)
    895             Embedding of the training data in low-dimensional space.
    896         """
--> 897         embedding = self._fit(X)
    898         self.embedding_ = embedding
    899         return self.embedding_

/home/owner/py27_env/local/lib/python2.7/site-packages/sklearn/manifold/t_sne.pyc in _fit(self, X, skip_num_points)
    792                           X_embedded=X_embedded,
    793                           neighbors=neighbors_nn,
--> 794                           skip_num_points=skip_num_points)
    795 
    796     @property

/home/owner/py27_env/local/lib/python2.7/site-packages/sklearn/manifold/t_sne.pyc in _tsne(self, P, degrees_of_freedom, n_samples, random_state, X_embedded, neighbors, skip_num_points)
    868         opt_args['it'] = it + 1
    869         params, kl_divergence, it = _gradient_descent(obj_func, params,
--> 870                                                       **opt_args)
    871 
    872         if self.verbose:

/home/owner/py27_env/local/lib/python2.7/site-packages/sklearn/manifold/t_sne.pyc in _gradient_descent(objective, p0, it, n_iter, objective_error, n_iter_check, n_iter_without_progress, momentum, learning_rate, min_gain, min_grad_norm, min_error_diff, verbose, args, kwargs)
    386 
    387     for i in range(it, n_iter):
--> 388         new_error, grad = objective(p, *args, **kwargs)
    389         grad_norm = linalg.norm(grad)
    390 

/home/owner/py27_env/local/lib/python2.7/site-packages/sklearn/manifold/t_sne.pyc in _kl_divergence_bh(params, P, neighbors, degrees_of_freedom, n_samples, n_components, angle, skip_num_points, verbose)
    289     error = _barnes_hut_tsne.gradient(sP, X_embedded, neighbors,
    290                                       grad, angle, n_components, verbose,
--> 291                                       dof=degrees_of_freedom)
    292     c = 2.0 * (degrees_of_freedom + 1.0) / degrees_of_freedom
    293     grad = grad.ravel()

sklearn/manifold/_barnes_hut_tsne.pyx in sklearn.manifold._barnes_hut_tsne.gradient (sklearn/manifold/_barnes_hut_tsne.c:8155)()

AssertionError: Tree consistency failed: unexpected number of points=1796 at root node=1797

Versions

Linux-4.4.0-78-generic-x86_64-with-Ubuntu-16.04-xenial
('Python', '2.7.12 (default, Jul 1 2016, 15:12:24) \n[GCC 5.4.0 20160609]')
('NumPy', '1.12.1')
('SciPy', '0.18.1')
('Scikit-Learn', '0.19.dev0')

Note:

No error occurs with perplexity=7 or perplexity=9, or with any other value tried.

amueller (Member) commented Jun 6, 2017

Thanks for the report. That's... odd... to say the least.

lesteve (Member) commented Jun 6, 2017

Hmmm, I can reproduce this on master. Note that this works fine on 0.18.1.

jnothman (Member) commented Jun 6, 2017

lesteve added the Sprint label Jun 6, 2017

lesteve added this to the 0.19 milestone Jun 7, 2017

lesteve added the Blocker label Jun 7, 2017

Sentient07 (Contributor) commented Jun 7, 2017

Hi, I am not sure what the "Blocker" label means, but can I work on this? If so, could someone give me an idea of where to start looking?
Thanks,

jnothman (Member) commented Jun 7, 2017

jnothman removed the Sprint label Jun 13, 2017

jnothman referenced this issue Jun 13, 2017 in [MRG+1] Reducing t-SNE memory usage #9032 (Merged)

amueller (Member) commented Jun 20, 2017

@jnothman do you think this is realistic to tackle for 0.19?

jnothman (Member) commented Jun 20, 2017

ogrisel (Member) commented Jun 23, 2017

Actually I am pretty confident that #9032 fixes this issue and is not behaving correctly.

ogrisel (Member) commented Jun 28, 2017

Quoting: "seeing as it's not been diagnosed fully"

@tomMoral and I diagnosed that this was caused by the QuadTree data structure implementation on master. Contiguous cells (or tiles) did not always have exactly matching boundaries due to floating-point rounding. For deep enough trees with small cells, there is a non-zero chance of inserting a point in between consecutive tiles of the QuadTree...
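
To make the rounding effect concrete, here is a minimal 1-D sketch. It assumes, hypothetically, that each cell stores a center and a half-width and derives its bounds from them; `split`, `bounds` and `contains` are illustrative names, not the actual Cython code in `_barnes_hut_tsne.pyx`. The numbers are picked by hand so that the very first split already shows the problem: the left child's upper bound and the right child's lower bound are both 0.1 in exact arithmetic but round to different doubles, so a point between them belongs to neither child. A lost point of this kind would be consistent with the assertion counting 1796 points below a root that holds 1797.

```python
from typing import Tuple

# Hypothetical 1-D cell representation: (center, half_width).
Cell = Tuple[float, float]


def split(cell: Cell) -> Tuple[Cell, Cell]:
    """Split a cell in two; each child center is computed (and rounded) on its own."""
    center, half = cell
    child_half = half / 2.0
    return (center - child_half, child_half), (center + child_half, child_half)


def bounds(cell: Cell) -> Tuple[float, float]:
    """Derive the half-open interval [lo, hi) from center and half-width."""
    center, half = cell
    return center - half, center + half


def contains(cell: Cell, x: float) -> bool:
    lo, hi = bounds(cell)
    return lo <= x < hi


parent: Cell = (0.1, 0.4)
left, right = split(parent)
left_hi = bounds(left)[1]    # equals 0.1 in exact arithmetic
right_lo = bounds(right)[0]  # also equals 0.1 in exact arithmetic

print(left_hi == right_lo)   # False: rounding opened a gap between the tiles

# A point strictly inside that gap: the parent holds it, neither child does.
x = left_hi + (right_lo - left_hi) / 2.0
print(contains(parent, x), contains(left, x), contains(right, x))  # True False False
```

The gap here is only two units in the last place wide, which is why such a failure shows up only for specific inputs (here, one perplexity value) rather than consistently.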

The reimplementation of the QuadTree data structure in #9032 no longer has this issue: the max boundary of a cell is exactly the min boundary of the following cell.
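
A matching sketch of the shared-boundary idea, again in 1-D as a stand-in for the 2-D QuadTree and with hypothetical names (`Cell`, `insert` and `leaf_count` are illustrative, not the #9032 code). Each split value is computed once and stored as both the left child's max and the right child's min, so every point inside a parent lands in exactly one child. The final assertion is a simplified analogue of the "Tree consistency" check: the points counted in the leaves must equal the number recorded at the root.

```python
import random
from typing import List, Optional


class Cell:
    """Hypothetical 1-D stand-in for a QuadTree tile, covering [lo, hi)."""

    def __init__(self, lo: float, hi: float, depth: int) -> None:
        self.lo = lo
        self.hi = hi
        self.count = 0  # points inserted into this cell or its descendants
        self.children: Optional[List["Cell"]] = None
        if depth > 0:
            # Compute the split once and reuse the same float for both children,
            # so left.hi and right.lo are identical and no gap can open up.
            mid = lo + (hi - lo) / 2.0
            self.children = [Cell(lo, mid, depth - 1), Cell(mid, hi, depth - 1)]

    def insert(self, x: float) -> None:
        if not (self.lo <= x < self.hi):
            raise ValueError(f"{x!r} is outside [{self.lo!r}, {self.hi!r})")
        self.count += 1
        if self.children is not None:
            left, right = self.children
            # Exactly one child accepts x because left.hi is right.lo.
            (left if x < left.hi else right).insert(x)

    def leaf_count(self) -> int:
        """Total number of points stored in the leaves below this cell."""
        if self.children is None:
            return self.count
        return sum(child.leaf_count() for child in self.children)


# Usage: a deep tree with many small cells, filled with random points.
rng = random.Random(0)
lo, hi = -0.3, 0.5
root = Cell(lo, hi, depth=12)
# rng.uniform can return hi itself after rounding, so keep only points in [lo, hi).
points = [p for p in (rng.uniform(lo, hi) for _ in range(1797)) if p < hi]
for p in points:
    root.insert(p)

assert root.leaf_count() == root.count == len(points)
```

The design choice is that boundaries are never recomputed from other stored quantities; a child inherits its edges as the very same float values its parent and sibling use, so rounding cannot make neighbouring tiles disagree.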

jnothman (Member) commented Jul 13, 2017

Should be fixed in #9032

jnothman closed this Jul 13, 2017
