Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC: add info on recursion limit to KDTree docstring. #281

Merged
merged 1 commit into from Aug 6, 2012

Conversation

rgommers
Copy link
Member

@rgommers rgommers commented Aug 4, 2012

No description provided.

@jakevdp
Copy link
Member

jakevdp commented Aug 5, 2012

I think you could also address a recursion error by increasing the leaf size.

Thanks to Oleksandr Huziy and David Parker.
@rgommers
Copy link
Member Author

rgommers commented Aug 5, 2012

Thanks @jakevdp, added that.

@jakevdp
Copy link
Member

jakevdp commented Aug 6, 2012

By the way, what was the motivation for adding this note? The default recursion limit is usually ~1000, meaning that the query algorithm should be fine for up to about 2^1000 points, unless I'm mistaken. Is there an algorithmic detail I'm forgetting?

@rgommers
Copy link
Member Author

rgommers commented Aug 6, 2012

A user who ran into it: http://thread.gmane.org/gmane.comp.python.scientific.user/32087

Don't think he had 2^1000 points. He provided his code and data, if you want to look into it.

@jakevdp
Copy link
Member

jakevdp commented Aug 6, 2012

I checked this out, and I understand the issue now: I didn't realize that our KD tree used the sliding midpoint rule. With this split criterion, there's no guarantee of O[log(n)] nodes. For some structured datasets, the majority of nodes may actually be empty! I have some opinions about this (I think the benefits of sliding midpoint are overstated, and the potential for these sorts of node proliferation problems outweighs any potential performance gain) but that's probably a discussion for another thread.

@rgommers
Copy link
Member Author

rgommers commented Aug 6, 2012

So merge this, and revise or add to the implementation later?

Have you seen #262 by the way? Should be of interest to you.

@jakevdp
Copy link
Member

jakevdp commented Aug 6, 2012

I'd say merge this as is for now. We can think about optimal splitting criteria at some other point. I've been following #262 as well -- it's looking like a good contribution.

rgommers added a commit that referenced this pull request Aug 6, 2012
DOC: add info on recursion limit to KDTree docstring.
@rgommers rgommers merged commit fd5ceda into scipy:master Aug 6, 2012
@rgommers
Copy link
Member Author

rgommers commented Aug 6, 2012

OK, thanks. Merged.

@sturlamolden
Copy link
Contributor

The midpoint rule is very easy to change or even control with a keyword argument. I believe I used the median computed with mergesort originally (on SciPy's cookbook), but Anne changed it to the sliding midpoint. It is usually more efficent, but not always.

@sturlamolden
Copy link
Contributor

I think #262 should be merged, but wait until after 0.11.0 is released.

@rgommers
Copy link
Member Author

@sturlamolden #262 indeed looks good (thanks for all your work on that!), once Patrick is back from holiday and adds your last PR and perhaps the benchmark, I agree it can be merged. No need to wait for 0.11.0, because that's already in a separate branch. We don't live in the svn age anymore:)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants