
RecursionError #99

Closed
sametdumankaya opened this issue Aug 2, 2018 · 23 comments · Fixed by lmcinnes/pynndescent#24

Comments

@sametdumankaya

Hi, thanks for the great package.

I have a dataset with 200,000 rows and 15 columns. I tried to apply UMAP as follows:

embedding = umap.UMAP(n_neighbors=5, min_dist=0.3, metric='correlation').fit_transform(data)

After about 10 seconds, I got the following exceptions:

  • RecursionError: maximum recursion depth exceeded while calling a Python object
  • return make_angular_tree(data, indices, rng_state, leaf_size)
    SystemError: CPUDispatcher(<function angular_random_projection_split at 0x000001C8260D6378>) returned a result with an error set

I set the system recursion limit to 10000 as below and tried again, but Python then exited with a code like -143537645, indicating an error exit.

sys.setrecursionlimit(10000)

Is there any solution, workaround or anything I can do for this problem?

Thanks.

@lmcinnes
Owner

lmcinnes commented Aug 2, 2018 via email

@diego-vicente

I just ran into the same issue, and removing duplicate points did indeed solve the problem. I used a simple workaround, but I am not sure whether it is justified or whether it affects the method: can we safely remove the duplicates, embed the unique points, and then reconstruct the original array by re-duplicating the embeddings?

My intuition about the technique is that if N points are exactly the same, we could safely work with only one of them and map the result back to all of them, but I would like confirmation from someone who actually understands the math involved. If that assumption is correct, I can start working on a pull request adding the fix.
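The dedup-then-reduplicate workaround described above can be sketched like this; the helper name is hypothetical, and `embed_fn` stands in for something like `umap.UMAP(...).fit_transform`:

```python
import numpy as np

def embed_with_dedup(data, embed_fn):
    """Embed only the unique rows, then copy each unique row's embedding
    back to every duplicate of that row (hypothetical helper)."""
    unique_rows, inverse = np.unique(data, axis=0, return_inverse=True)
    # inverse[i] is the index of data[i]'s representative in unique_rows,
    # so indexing with it re-duplicates the embeddings in the original order.
    return embed_fn(unique_rows)[np.ravel(inverse)]

# Usage with UMAP would look like:
# embedding = embed_with_dedup(data, umap.UMAP(n_neighbors=5).fit_transform)
```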

@lmcinnes
Owner

lmcinnes commented Aug 27, 2018 via email

@madkoppa

madkoppa commented Sep 4, 2018

I'm also having this problem. Adding an uncomfortably large amount of noise as a hack also works for me.
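The noise hack mentioned in this thread amounts to jittering the data so no two rows are exactly identical. A minimal sketch, with a guessed noise scale that may need tuning per dataset (too much noise distorts the embedding):

```python
import numpy as np

def jitter(data, scale=1e-4, seed=0):
    """Add tiny Gaussian noise so exact duplicate rows become distinct.
    The scale is a guess; larger values degrade the embedding quality."""
    rng = np.random.default_rng(seed)
    return data + rng.normal(scale=scale, size=data.shape)
```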

@lmcinnes
Owner

lmcinnes commented Sep 4, 2018 via email

@madkoppa

madkoppa commented Sep 5, 2018

Just an update: it seems that when it complains about recursion depth but does not crash out, the result gets funky, which is apparent for visualisation purposes. I'm trying to track down which vectors are the problem, since the amount of noise I need to add for it to work ruins the end result too.

@lmcinnes
Owner

lmcinnes commented Sep 5, 2018 via email

@yizhouyan

I encountered the same problem with 17.5M data points.

I used Euclidean distance and scaled the data during preprocessing. Here is the code:

X_scaled = preprocessing.scale(X)
fit = umap.UMAP(n_neighbors=20, min_dist=0.01, metric='euclidean', init='random', n_epochs=500)

The same parameter settings work on a 100K-point sample of the data.

Does anyone have any ideas about this?

Thanks!

@lmcinnes
Owner

lmcinnes commented Nov 19, 2018 via email

@dsaxton

dsaxton commented Nov 20, 2018

I was receiving the same error and adding a small amount of noise did help. Thanks!

@scharron

@lmcinnes I attached a 5000x100 numpy array that always makes UMAP crash.
I'm reproducing the error with:
umap = UMAP(metric="cosine", n_components=2, min_dist=0.3).fit_transform(data)

data.npy.zip

@lmcinnes
Owner

Thanks @scharron, reproducing examples are very useful. I'll try to look at this when I get a little more time.

@thommiano

Experiencing the same issue. Removing duplicates solved my problem, and was fine for my visualization purposes.

@lucidyan

Same issue here

@lmcinnes
Owner

I think some fixes to other issues may actually resolve this, so I'll try to roll out a patch release in the next few days that will hopefully solve this.

@mkarbo

mkarbo commented Apr 10, 2019

Did you roll out your patch yet?

@lmcinnes
Owner

lmcinnes commented Apr 10, 2019 via email

@jlmelville
Collaborator

I have confirmed that @scharron's attached data causes a failure on the latest UMAP with metric="cosine". I don't know if it would eventually reach the recursion limit and raise an error, as I always terminate the calculation after about 30 minutes of nothing happening.

The culprit seems to be the @numba.njit(fastmath=True) decorating angular_random_projection_split at:

umap/umap/rp_tree.py

Lines 28 to 29 in 6a32f62

@numba.njit(fastmath=True)
def angular_random_projection_split(data, indices, rng_state):

Set fastmath=False and everything proceeds as normal (except slower). I have confirmed that this also affects the current master of https://github.com/lmcinnes/pynndescent.

I suppose this could also affect the Euclidean and sparse versions of the projection split, just waiting for a dataset to trigger them.

The downside of fastmath=False is a noticeable slowdown of between 65% and 120% for the nearest neighbor search with metric="euclidean" on the datasets I looked at (MNIST-sized and bigger in both rows and dimensionality), although most were at the lower end of that range. For metric="cosine" the slowdown is less severe, more like 20%, although I haven't tested that very much.

According to the numba docs, it's possible to opt in to a subset of the fast math settings, so that might be worth looking into.

@lmcinnes
Owner

Thanks @jlmelville! I will have to look into the details of what subset of fastmath I can opt in to, and what the actual offending aspect is. Presumably the issue is in the generation of the hyperplane (resulting in a plane that does not split the data at all, which then gets generated repeatedly). Perhaps an alternative would be a bail-out failsafe if no points are separated by a given node of the tree, since this does seem very much to be an odd corner case.

@jlmelville
Collaborator

I looked at trying out some subsets of fastmath, but they always caused the infinite loop.

Looking at this in more detail, the problem seems to arise from checking float values for equality with 0. There's a lot of that in rp_tree.py, but this line in particular seemed to cause the issue:

if margin == 0:

If that changes to if abs(margin) < EPS for some suitable EPS (1e-8?), the tree building succeeds, but I think we would want to change all such comparisons.
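A minimal sketch of the proposed comparison fix, with a hypothetical `choose_side` helper and the EPS value suggested above (the right tolerance is a judgment call); ties within the tolerance get a random side, which is what the original `margin == 0` branch intended:

```python
EPS = 1e-8  # tolerance floated in the thread; may need tuning

def choose_side(margin, coin_flip):
    """Treat a near-zero margin as a tie and assign the point to a random
    side, instead of testing margin == 0 exactly (hypothetical helper).
    coin_flip is a callable returning a float in [0, 1)."""
    if abs(margin) < EPS:
        return 0 if coin_flip() < 0.5 else 1  # tie: pick a side at random
    return 0 if margin > 0 else 1
```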

Alternatively, is it the case that make_angular_tree should never create nodes with zero membership, i.e. in:

umap/umap/rp_tree.py

Lines 465 to 466 in 6a32f62

left_node = make_angular_tree(data, left_indices, rng_state, leaf_size)
right_node = make_angular_tree(data, right_indices, rng_state, leaf_size)

left_node.shape[0] and right_node.shape[0] should always be > 0? The infinite loop seen with @scharron's data is due to one of those nodes having size 0.

You can prevent the loop by bailing out and raising a RecursionError, but that's obviously a drastic step: I think it causes the entire RP forest routine to abort, so you start the nearest neighbor descent with a random initialization? Would it be ok to just arbitrarily split the remaining points into two equally sized nodes at that point? This would allow the node sizes to eventually reach leaf_size and the algorithm to proceed even when the data set contains more than leaf_size exact duplicates, although I am not yet familiar enough with the implementation details to know whether that causes problems down the line. I think it would only affect the neighbor list for those points as it gets passed into the nearest neighbor descent routine. If so, that would definitely be a better initialization strategy than the RecursionError path.
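The arbitrary-split fallback suggested above could look something like this sketch (the function name and signature are hypothetical, not the actual rp_tree.py code):

```python
import numpy as np

def forced_split(indices, rng):
    """Fallback for a degenerate split: when the hyperplane fails to
    separate any points, partition the node's indices into two near-equal
    halves at random so the recursion can still bottom out at leaf_size."""
    shuffled = np.array(indices).copy()
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]
```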

I was hoping to try out some of these solutions and submit a PR through pynndescent first, but I am getting failing unit tests on that project at the moment. @lmcinnes, I am happy to work on a PR here if you have a sense of which of these strategies (or both) to pursue.

@lmcinnes
Owner

Yes, the tree should never create nodes of size zero, so that is certainly part of the problem. I guess the question is where the best place to catch this is. Your observation that the margin == 0 line is a source of the issue points at potentially the right place. Here is my suspicion as to what is happening:

The data contains a number of points that, while distinct and non-zero, are all scalar multiples of each other. Thus they all lie on a line through the origin, and an angular hyperplane cannot possibly separate them. The margin == 0 case was supposed to catch this: if points were identically aligned with the hyperplane (as would be the case here, since the generated hyperplane would be identically zero if the node contained only such points), they would produce a zero margin and we could randomly assign them one way or the other. I suspect fastmath is producing enough rounding error that, despite the dot product being taken with the all-zero vector, all the margins land slightly on one side or the other of zero, and the end result is infinite recursion.
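The degenerate-hyperplane scenario described above can be illustrated directly. This is a simplified sketch of how an angular split builds its hyperplane (roughly, the difference of two normalized points), not the actual rp_tree.py code:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = 4.0 * a  # distinct, non-zero, but the same direction from the origin

# Normalizing removes the scale difference, so the hyperplane vector
# comes out exactly zero for points that are scalar multiples of each other.
hyperplane = a / np.linalg.norm(a) - b / np.linalg.norm(b)

# Every margin is then a dot product with this all-zero vector, i.e. zero,
# which the margin == 0 branch was meant to catch.
margin = np.dot(np.array([5.0, -1.0, 2.0]), hyperplane)
```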

Now, there are a few ways to fix this: we can fix the margin comparison, as you have done, to handle float issues better (and make the similar required changes to the other splitting routines); we can go after the root of the problem (a bad hyperplane, all zeros in every case I believe) and fall back to a random split when it occurs; or we can take the other option you suggest and catch splits that leave zero points on one side, inserting a forced random partition into two nodes there.

The first way should work well enough and seems mostly clean, aside from the catch that we'll very occasionally mis-assign points that are very close to the hyperplane margin. The second approach gets to the heart of the actual issue, but still requires detecting an all-zero hyperplane (up to float tolerances). The third option is probably the most robust, in that if there are other reasons the split can go astray (and there may be) it will still catch them; it is a more awkward fix to implement, however.

To be honest I'm happiest with option 1 (fix the margin comparison). It will get the job done with the least amount of fuss. I'll look into the issues with pynndescent; the tests are all quite new, so it could well be errors in the testing more than anything.

@radames

radames commented Apr 28, 2019

The latest fixes merged on master seem to fix the problem in my tests on ~100k points (n_neighbors=20, metric='cosine', min_dist=0.4, init='random', verbose=2)!

@lmcinnes
Owner

lmcinnes commented Apr 28, 2019 via email
