get a RecursionError #17

Open
HanchenXiong opened this issue Nov 14, 2017 · 4 comments

@HanchenXiong

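```python
import umap

# `df` is an existing pandas DataFrame; columns 1..1000 hold the features.
working_umap = umap.UMAP(n_neighbors=5,
                         n_components=2,
                         min_dist=0.3,
                         metric='euclidean')

# DataFrame.as_matrix is deprecated (removed in pandas 1.0); select the
# feature columns and convert them to a NumPy array instead.
input_feat = df[df.columns[1:1001]].to_numpy()
embeddings = working_umap.fit_transform(input_feat)
```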

Error message:

```
make_tree(data, indices, leaf_size)
105 rng_state)
106 left_node = make_tree(data, left_indices, leaf_size)
--> 107 right_node = make_tree(data, right_indices, leaf_size)
108 node = RandomProjectionTreeNode(indices, False, left_node, right_node)
109 else:

RecursionError: maximum recursion depth exceeded
```

@lmcinnes
Owner

That would mean that the random projection trees are failing to split your data. I hadn't really anticipated such a thing happening, hence the weird error. Some possibilities: you have many points that are identical; the data has a very strange distribution; you have a lot of data. I'm guessing one of the first two.
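To see why identical points can break the tree build, here is a minimal sketch of a random projection tree. It is a simplified, hypothetical stand-in, not UMAP's actual `make_tree`: each split picks two random rows of the node and divides the node by the hyperplane equidistant from them. If every point in a node is identical, the hyperplane normal is zero, every margin is zero, and all points fall on one side, so an unguarded version keeps recursing on the same set until Python raises `RecursionError`. The two guards below (a degenerate-split check and a `max_depth` cap) are assumptions for illustration, not taken from this thread.

```python
import numpy as np

def make_tree(data, indices, leaf_size, rng, depth=0, max_depth=200):
    """Simplified random projection tree (hypothetical sketch, not UMAP's code).

    Internal nodes are dicts with "left"/"right" children; leaves hold an
    array of row indices into `data`.
    """
    if len(indices) <= leaf_size or depth >= max_depth:
        return {"leaf": True, "indices": indices}
    # Pick two distinct rows of this node and split on the hyperplane
    # that is equidistant from them.
    i, j = rng.choice(len(indices), size=2, replace=False)
    a, b = data[indices[i]], data[indices[j]]
    normal = a - b
    margins = (data[indices] - (a + b) / 2.0) @ normal
    go_left = margins > 0
    left_indices, right_indices = indices[go_left], indices[~go_left]
    # Guard (assumption): if the split separates nothing, e.g. because every
    # point in this node is identical, stop here instead of recursing forever.
    if len(left_indices) == 0 or len(right_indices) == 0:
        return {"leaf": True, "indices": indices}
    return {
        "leaf": False,
        "left": make_tree(data, left_indices, leaf_size, rng, depth + 1, max_depth),
        "right": make_tree(data, right_indices, leaf_size, rng, depth + 1, max_depth),
    }

def leaves(node):
    """Collect the index arrays stored in all leaves of a tree."""
    if node["leaf"]:
        return [node["indices"]]
    return leaves(node["left"]) + leaves(node["right"])

rng = np.random.default_rng(0)
identical = np.ones((500, 4))
tree = make_tree(identical, np.arange(500), leaf_size=30, rng=rng)
print(len(leaves(tree)))  # → 1 (a single leaf holding all 500 points)
```

When the two chosen points differ, the first point always has a positive margin and the second a negative one, so both children are non-empty; only exact duplicates produce the degenerate split.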

I'll see if I can manage to at least catch this and give a more informative error to start with. Then I'll have to see if I can provide an alternative approach (random initialisation for NN-descent would do, for example).
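Random initialisation for NN-descent, mentioned above as an alternative, can be sketched as follows. The helper `random_init_knn` is hypothetical, not UMAP's code: each point starts with `n_neighbors` distinct random candidates (never itself), with Euclidean distances sorted ascending, and NN-descent would then refine these lists. Because no tree is built, duplicate points cannot trigger the recursion; the trade-off is a poorer starting graph, which may mean more refinement work.

```python
import numpy as np

def random_init_knn(data, n_neighbors, rng):
    """Random initial neighbour lists for NN-descent (hypothetical sketch).

    Returns (indices, distances), both of shape (n_points, n_neighbors),
    with each row sorted by increasing Euclidean distance.
    """
    n = data.shape[0]
    indices = np.empty((n, n_neighbors), dtype=np.int64)
    dists = np.empty((n, n_neighbors), dtype=np.float64)
    for i in range(n):
        # Sample from 0..n-2, then shift values >= i up by one so that
        # point i is never chosen as its own neighbour.
        candidates = rng.choice(n - 1, size=n_neighbors, replace=False)
        candidates[candidates >= i] += 1
        d = np.linalg.norm(data[candidates] - data[i], axis=1)
        order = np.argsort(d)
        indices[i] = candidates[order]
        dists[i] = d[order]
    return indices, dists

rng = np.random.default_rng(0)
idx, dist = random_init_knn(np.ones((100, 3)), n_neighbors=5, rng=rng)
print(idx.shape, dist.max())  # → (100, 5) 0.0
```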

Thanks for the report!

@HanchenXiong
Author

My data is not huge, on the order of 100k points. There may be some identical points; I can't tell how many right now. But this can also happen with MNIST if you binarize it. Is there a quick workaround, e.g. changing n_neighbors or the metric?
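One quick way to find out how many identical rows the input contains is `numpy.unique` with `axis=0`. A minimal sketch, assuming the features are already a 2-D NumPy array (such as `input_feat` in the first snippet); the helper name `count_duplicate_rows` is hypothetical. The toy data also illustrates the binarization point: once values are thresholded to 0/1, many distinct rows collapse onto the same row.

```python
import numpy as np

def count_duplicate_rows(X):
    """Return (n_rows, n_unique_rows, largest_group_size) for a 2-D array X."""
    # Collapse identical rows; counts[k] is how many times unique row k occurs.
    uniq, counts = np.unique(X, axis=0, return_counts=True)
    return X.shape[0], uniq.shape[0], int(counts.max())

# Toy example: thresholding makes previously distinct rows identical.
rng = np.random.default_rng(0)
grey = rng.random((1000, 4))            # 1000 distinct rows with 4 "pixels"
binary = (grey > 0.5).astype(np.uint8)  # at most 2**4 = 16 distinct rows remain
print(count_duplicate_rows(grey))
print(count_duplicate_rows(binary))
```

A large `largest_group_size` relative to the leaf size is the situation in which an unguarded random projection split cannot separate the points.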

@lmcinnes
Owner

Increasing n_neighbors may help. Better would be to avoid the rp-tree initialisation, but I don't have code for that yet. If you really want to just get it to work now you can comment out lines 238 to 253 in umap/umap_.py in the current master and that will force it to fall back to random initialisation. It may be slower, but it should at least work (barring other errors later in the code that such a data distribution may trigger).

lmcinnes added a commit that referenced this issue Nov 14, 2017
@lmcinnes
Owner

I just committed what I hope is a fix that will at least allow the algorithm to continue. It will, unfortunately, be slower than it otherwise should be (something to be fixed later), but hopefully it will get you past this initial problem. If you have the opportunity, I would appreciate it if you could clone from master and verify that this at least gets you beyond the current error.
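The thread does not show how the committed fix works. Purely as an illustration, one way to let the algorithm continue is to catch `RecursionError` from the tree build and fall back to a cruder path. Everything below is hypothetical: `split_unguarded` is a random projection split with no degenerate-split check, and `random_leaves` simply shuffles indices into fixed-size groups, which gives worse starting candidates for the neighbour search.

```python
import numpy as np

def split_unguarded(data, indices, leaf_size, rng):
    """Recursive random projection split with no guard (hypothetical)."""
    if len(indices) <= leaf_size:
        return [indices]
    i, j = rng.choice(len(indices), size=2, replace=False)
    a, b = data[indices[i]], data[indices[j]]
    # Side of each point relative to the hyperplane equidistant from a and b.
    margins = (data[indices] - (a + b) / 2.0) @ (a - b)
    go_left = margins > 0
    # If all points are identical, go_left is all False and the right branch
    # recurses on the same indices until Python raises RecursionError.
    return (split_unguarded(data, indices[go_left], leaf_size, rng)
            + split_unguarded(data, indices[~go_left], leaf_size, rng))

def random_leaves(indices, leaf_size, rng):
    """Fallback: random groups of at most leaf_size points."""
    shuffled = rng.permutation(indices)
    return [shuffled[k:k + leaf_size] for k in range(0, len(shuffled), leaf_size)]

def build_leaves(data, leaf_size, rng):
    """Try the tree split first; fall back to random groups on RecursionError."""
    indices = np.arange(data.shape[0])
    try:
        return split_unguarded(data, indices, leaf_size, rng), "tree"
    except RecursionError:
        return random_leaves(indices, leaf_size, rng), "random"

rng = np.random.default_rng(0)
leaf_list, mode = build_leaves(np.ones((200, 3)), leaf_size=20, rng=rng)
print(mode, len(leaf_list))  # → random 10
```

Catching the error keeps the pipeline running, but a check for degenerate splits avoids hitting the recursion limit in the first place.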

sleighsoft added the bug label Sep 17, 2019
3 participants