Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add k-d tree #447

Merged
merged 22 commits into from
Jul 20, 2021
Merged

add k-d tree #447

merged 22 commits into from
Jul 20, 2021

Conversation

Vindaar
Copy link
Collaborator

@Vindaar Vindaar commented May 3, 2020

This is almost a straight up port from scipy's KD tree (the Python implementation, not cKDTree, which is a C implementation).

In principle the implementation works and gives the same results as the scipy tree.

Things left to do:

  • change return value from seq[tuple] to something more reasonable, maybe tuple of Tensor[T], Tensor[int].
  • add specific query functions, like query ball
  • make use of real fancy indexing, now that it works
  • fix the < comparison of nodes
  • improve non-zero implementation
  • clean up code
  • add tests
  • add docs
  • maybe unify the metrics used here with a general concept of metrics in arraymancer (given that e.g. dbscan allows for different metrics, maybe kmeans does too, haven't checked). NOTE: this will be done after the DBSCAN PR is merged (I'll try to get that done over the weekend). Then we can have a general metrics in spatial that can be used in clustering and here and by itself.

We don't want to export `<` for tensors that does a element wise
comparison until something is smaller or larger in either argument.

As I couldn't make the compiler happy with a `bind `<`` in the code,
let's work around it by creating a `distinct` type that we only use in
the heap queue! 🥳
@Vindaar
Copy link
Collaborator Author

Vindaar commented Jul 7, 2021

Only have to turn a few scipy comparisons into a proper test now.

Seems to work fine now. An example of a 2D tree based on 1000 random points as a non balanced tree.

The color corresponds to the node ID in which the data is. The shape corresponds to the result of a ball query around the red point with a radius of 0.1. All found points in the search correspond are drawn as points. The rest as crosses.

kdtree

And the same input data to create a balanced tree (cutting at the median instead of the middle):

kdtree

Both are using the default leaf size of 16.

edit:

The plots and stuff

As is evident from these plots up there, I wrote some code to plot such trees. Of course the plotting part is only really possible in the 2D case. But the hyperrectangle + extracting the bounds of each node could come in handy for the k-d case too.

I wasn't sure whether to put it into this module directly. While looking around I saw that lonely tools directory we have. Should I put the code in there? Otherwise I can just put it into a gist of course.

@Vindaar Vindaar marked this pull request as ready for review July 7, 2021 18:42
@Vindaar Vindaar changed the title [WIP] add k-d tree add k-d tree Jul 7, 2021
@Vindaar
Copy link
Collaborator Author

Vindaar commented Jul 7, 2021

Ok, aside from the question about whether I should add the plotting / hyperrectangle stuff, this is done from my side. Since this is a bit bigger, I'd appreciate if someone could take a look. :)

@Vindaar
Copy link
Collaborator Author

Vindaar commented Jul 7, 2021

Ugh, I forgot about the annoying -d:nimLegacyRandomInitRand flag. Guess who's going to redo all the expected numbers now... 😞

@Vindaar
Copy link
Collaborator Author

Vindaar commented Jul 7, 2021

Let's hope it works now.

@Vindaar Vindaar mentioned this pull request Jul 9, 2021
5 tasks
@Vindaar Vindaar merged commit a1968d5 into mratsim:master Jul 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant