AssertionError: Tree consistency failed in TSNE #8992
Fitting TSNE on the digits dataset raises an AssertionError for perplexity=8.
Steps/Code to Reproduce
    from sklearn import manifold, datasets

    data = datasets.load_digits()
    X = data['data']
    n_components = 2
    tsne = manifold.TSNE(n_components=n_components, init='pca',
                         perplexity=8, random_state=0, verbose=0)
    Y = tsne.fit_transform(X)
No error encountered for perplexity = 7 or 9, or any other values tried.
(Time to do randomised testing?)
On 6 June 2017 at 21:41, Loïc Estève wrote: Hmmm I can reproduce on master. Note this works fine on 0.18.1.
"blocker" means we can't release a version without fixing this. I think that's an exaggeration for this issue, but it's something that really needs attention (in part because we don't understand its cause).
Debugging this may be hard, but you've got at least two good clues:
* It only stopped working recently, so you may be able to find the commit that broke it.
* It breaks as a function of perplexity.
You could try other values of perplexity, e.g. 1 to 50, just to check if it fails on another.
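The perplexity sweep suggested above could be sketched roughly as follows. This is only an illustration: `scan_perplexities` and `fit_digits` are hypothetical helper names, not part of scikit-learn, and `fit_digits` simply wraps the reproduction snippet from this issue.

```python
def scan_perplexities(fit, perplexities):
    """Call fit(p) for each perplexity; collect the ones raising AssertionError.

    `fit` is any callable taking a perplexity value (hypothetical helper API).
    """
    failures = {}
    for p in perplexities:
        try:
            fit(p)
        except AssertionError as exc:
            # Keep the message so different failure modes can be told apart.
            failures[p] = str(exc)
    return failures


def fit_digits(perplexity):
    """Run the reproduction case from this issue for one perplexity value."""
    # Imported lazily so scan_perplexities stays usable without scikit-learn.
    from sklearn import datasets, manifold

    X = datasets.load_digits()["data"]
    tsne = manifold.TSNE(n_components=2, init="pca",
                         perplexity=perplexity, random_state=0, verbose=0)
    return tsne.fit_transform(X)
```

Calling `scan_perplexities(fit_digits, range(1, 51))` would cover the 1 to 50 range mentioned above. Expect it to be slow, since every value triggers a full fit.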
On 8 Jun 2017 8:44 am, Ramana Subramanyam wrote: I am not sure what the "blocker" tag is, but can I work on this? If so, could someone please give me an idea of where to start looking?
Seeing as it's not been diagnosed fully and we have basically a rewrite of TSNE in the works for 0.20, I think we have to say no. We do have the option of reverting any fixes since 0.17 so that at least it is consistently broken.
On 21 Jun 2017 2:19 am, Andreas Mueller wrote: @jnothman do you think this is realistic to tackle for 0.19?
@tomMoral and I diagnosed that this was caused by the master implementation of the QuadTree data structure. Contiguous cells (or tiles) did not always have exactly matching boundaries due to floating point rounding. For deep enough trees with small cells, you get a non-zero chance to insert a point in between consecutive tiles in the QuadTree...
The reimplementation of the QuadTree data structure in #9032 no longer has this issue. The max boundary of a cell is exactly the min boundary of the following cell.
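To make the rounding problem concrete, here is a minimal one-dimensional sketch. It is not the scikit-learn code: the center-plus-half-width layout in `children_from_centers` is an assumed stand-in for any scheme where neighbouring cells compute their edges independently, and all function names are hypothetical.

```python
import sys

EPS = sys.float_info.epsilon  # spacing between 1.0 and the next double


def children_from_centers(lo, width):
    """Assumed layout: each child derives its own edges from its center."""
    q = width / 4.0
    c0 = lo + q          # center of the left child
    c1 = lo + 3.0 * q    # center of the right child
    return [(c0 - q, c0 + q), (c1 - q, c1 + q)]


def children_shared_edge(lo, width):
    """Both children reuse one midpoint value as their common edge."""
    mid = lo + width / 2.0
    return [(lo, mid), (mid, lo + width)]


def find_owner(x, cells):
    """Index of the half-open cell [lo, hi) containing x, or None."""
    for i, (lo, hi) in enumerate(cells):
        if lo <= x < hi:
            return i
    return None


# A parent cell at 1.0 that is only one float spacing wide (a very deep tree).
broken = children_from_centers(1.0, EPS)
fixed = children_shared_edge(1.0, EPS)
print(broken[0][1] == broken[1][0], find_owner(1.0, broken))
print(fixed[0][1] == fixed[1][0], find_owner(1.0, fixed))
```

With IEEE 754 doubles, the first line should report mismatched edges and no owner for the point 1.0, while the shared-edge variant reports matching edges and finds an owner. The example is deliberately extreme, but it shows why taking the max boundary of one cell as the min boundary of the next removes the gap.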