Add an initial interface for nearest neighbor queries with a simple implementation #213

Merged
merged 3 commits into from
Feb 5, 2022

geoffreydstewart
Member

I'm interested in a k-d tree implementation to get faster nearest neighbor queries, particularly to improve the Hdbscan training times. Before building it, though, I wanted to propose how it might fit in by providing an initial interface and a simple concrete implementation. The concrete implementation provided here returns brute-force query results.
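
As a rough illustration of the shape proposed here, the sketch below pairs a small query interface with a brute-force implementation that measures the distance from the query point to every stored point. All names (NeighbourQuerySketch, BruteForceQuerySketch, query) are hypothetical and not taken from this PR, and plain double[] arrays stand in for SGDVector so the example has no dependencies.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical query interface; the name and signature are illustrative only. */
interface NeighbourQuerySketch {
    /** Returns the indices of the k stored points closest to the query, nearest first. */
    int[] query(double[] point, int k);
}

/** Brute-force implementation: computes the distance to every stored point per query. */
final class BruteForceQuerySketch implements NeighbourQuerySketch {
    private final double[][] data;

    BruteForceQuerySketch(double[][] data) {
        this.data = data;
    }

    @Override
    public int[] query(double[] point, int k) {
        if (k < 1 || k > data.length) {
            throw new IllegalArgumentException("k must be between 1 and " + data.length);
        }
        // Compute the distance from the query point to every stored point.
        double[] distances = new double[data.length];
        Integer[] order = new Integer[data.length];
        for (int i = 0; i < data.length; i++) {
            distances[i] = euclidean(point, data[i]);
            order[i] = i;
        }
        // Sort the indices by distance, then keep the first k.
        Arrays.sort(order, Comparator.comparingDouble(i -> distances[i]));
        int[] nearest = new int[k];
        for (int i = 0; i < k; i++) {
            nearest[i] = order[i];
        }
        return nearest;
    }

    /** Euclidean distance; assumes both arrays have the same length. */
    private static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        double[][] points = {{0.0, 0.0}, {1.0, 1.0}, {5.0, 5.0}, {0.2, 0.1}};
        NeighbourQuerySketch queries = new BruteForceQuerySketch(points);
        // prints [0, 3]
        System.out.println(Arrays.toString(queries.query(new double[] {0.0, 0.0}, 2)));
    }
}
```

Because callers depend only on the interface, a k-d tree could later be swapped in behind the same method without changing them.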


private SGDVector[] getTestDataVectorArray() {
    List<SGDVector> vectorList = new ArrayList<>();

Member Author

This is the same set of points used in the Hdbscan tutorial. I didn't see an easy way to get points from a CSV into SGDVectors.

Member

There isn't an easy way, though I guess we could add one as a test-time helper because it could be useful elsewhere.

Member Author

Sure, we'll see if there is further need for this. Just wanted to make sure I hadn't missed it.
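
A test-time helper along these lines might look like the following minimal sketch. It assumes a headerless file of comma-separated numeric fields and returns plain double[] rows. CsvPointsSketch and parse are hypothetical names, and wrapping each row in an SGDVector is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical test helper; not part of this PR. */
final class CsvPointsSketch {
    private CsvPointsSketch() {}

    /** Parses lines of comma-separated numbers into one double[] per non-blank line. */
    static double[][] parse(List<String> lines) {
        List<double[]> rows = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines, e.g. a trailing newline
            }
            String[] fields = line.split(",");
            double[] row = new double[fields.length];
            for (int i = 0; i < fields.length; i++) {
                // Assumes every field is numeric; a header row would throw here.
                row[i] = Double.parseDouble(fields[i].trim());
            }
            rows.add(row);
        }
        return rows.toArray(new double[0][]);
    }
}
```

In a test, the lines could come from java.nio.file.Files.readAllLines, with each resulting row then converted to whichever vector type the test needs.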

@Craigacp added the Oracle employee and squash-commits labels Jan 27, 2022
@Craigacp self-assigned this Jan 27, 2022

Member

@Craigacp Craigacp left a comment

A few small things, but overall it looks good.

// Use an array to put the polled items from the queue into a sorted ascending order, by distance.
while (!queue.isEmpty()) {
    MutablePair mutablePair = queue.poll();
    indexDistanceArr[k - i++] = new Pair<>(mutablePair.index, mutablePair.value);

Member

Can you move the postfix increment out onto a separate line? It's the kind of thing that's easily missed when debugging or doing code reviews and so we try to make it more explicit.

Member Author

Good point, that part was done quickly. Making it more obvious certainly facilitates inspecting values during debugging.
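
To make the suggestion concrete, here is a minimal sketch of draining a queue into ascending order with the index update on its own line. It assumes the queue is a max-heap holding the k smallest distances, so the farthest item is polled first and written to the end of the array; the snippet under review does not show how its queue is ordered. HeapDrainSketch and kSmallestAscending are hypothetical names, and bare distances stand in for the index/distance pairs.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

/** Hypothetical illustration; not the code from this PR. */
final class HeapDrainSketch {
    private HeapDrainSketch() {}

    /** Returns the k smallest values (or all of them, if fewer than k) in ascending order. */
    static double[] kSmallestAscending(double[] distances, int k) {
        // Max-heap: the head is always the largest of the values kept so far.
        PriorityQueue<Double> queue = new PriorityQueue<>(Comparator.reverseOrder());
        for (double distance : distances) {
            queue.offer(distance);
            if (queue.size() > k) {
                queue.poll(); // drop the current farthest value
            }
        }

        // The heap yields the largest value first, so fill the array from the back.
        double[] sorted = new double[queue.size()];
        int position = sorted.length - 1;
        while (!queue.isEmpty()) {
            sorted[position] = queue.poll();
            position--; // explicit decrement on its own line, easy to spot in review
        }
        return sorted;
    }
}
```

Keeping the index update as a separate statement also gives a debugger a line to break on between the write and the update.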

}

@SuppressWarnings("unchecked")
Pair<Integer, Double>[] indexDistanceArr = (Pair<Integer, Double>[]) new Pair[k];

Member

This should be an ArrayList too.

Member Author

I've left this as an array for now too, for the same reasons mentioned above.
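
For readers weighing the two options in this thread, the sketch below shows them side by side: a generic array, which needs an unchecked cast because Java cannot directly create an array of a parameterized type, and an ArrayList, which needs neither the cast nor the suppression. java.util.Map.Entry stands in for the Pair type in the diff, and the class name, method names, and filler values are all hypothetical. It does not attempt to reproduce the reasons given for keeping the array, which are not part of this excerpt.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical comparison of the two container choices; not the code from this PR. */
final class GenericArraySketch {
    private GenericArraySketch() {}

    /** Array form: requires creating a raw array and an unchecked cast. */
    @SuppressWarnings("unchecked")
    static Map.Entry<Integer, Double>[] asArray(int k) {
        Map.Entry<Integer, Double>[] result = (Map.Entry<Integer, Double>[]) new Map.Entry[k];
        for (int i = 0; i < k; i++) {
            // Filler values: index i paired with a made-up distance.
            result[i] = new AbstractMap.SimpleImmutableEntry<>(i, i * 0.5);
        }
        return result;
    }

    /** List form: fully type-checked, no cast or warning suppression needed. */
    static List<Map.Entry<Integer, Double>> asList(int k) {
        List<Map.Entry<Integer, Double>> result = new ArrayList<>(k);
        for (int i = 0; i < k; i++) {
            result.add(new AbstractMap.SimpleImmutableEntry<>(i, i * 0.5));
        }
        return result;
    }
}
```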

Signed-off-by: Geoffrey Stewart <geoff.stewart@oracle.com>
@Craigacp Craigacp merged commit 407af05 into oracle:main Feb 5, 2022
@geoffreydstewart geoffreydstewart deleted the neighbors branch February 7, 2022 22:59
Labels
Oracle employee, squash-commits
2 participants