-
Notifications
You must be signed in to change notification settings - Fork 171
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add an initial interface for nearest neighbor queries with a simple implementation #213
Conversation
Math/src/main/java/org/tribuo/math/neighbor/NeighborsBruteForce.java
Outdated
Show resolved
Hide resolved
Math/src/main/java/org/tribuo/math/neighbor/NeighborsQuery.java
Outdated
Show resolved
Hide resolved
|
||
private SGDVector[] getTestDataVectorArray() { | ||
List<SGDVector> vectorList = new ArrayList<>(); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the same set of points used in Hdbscan tutorial. I didn't see an easy way to get points from a csv into SGDVector
s
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There isn't an easy way, though I guess we could add one as a test time helper because it could be useful elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, we'll see if there is further need for this. Just wanted to make sure I hadn't missed it.
Math/src/main/java/org/tribuo/math/neighbor/NeighborsQuery.java
Outdated
Show resolved
Hide resolved
Math/src/main/java/org/tribuo/math/neighbor/NeighborsQuery.java
Outdated
Show resolved
Hide resolved
|
||
private SGDVector[] getTestDataVectorArray() { | ||
List<SGDVector> vectorList = new ArrayList<>(); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There isn't an easy way, though I guess we could add one as a test time helper because it could be useful elsewhere.
Math/src/main/java/org/tribuo/math/neighbor/NeighborsBruteForce.java
Outdated
Show resolved
Hide resolved
Math/src/main/java/org/tribuo/math/neighbor/NeighborsBruteForce.java
Outdated
Show resolved
Hide resolved
Math/src/main/java/org/tribuo/math/neighbor/NeighborsBruteForce.java
Outdated
Show resolved
Hide resolved
Math/src/main/java/org/tribuo/math/neighbor/NeighborsBruteForce.java
Outdated
Show resolved
Hide resolved
Math/src/main/java/org/tribuo/math/neighbor/NeighborsBruteForce.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few small things, but overall it looks good.
Math/src/main/java/org/tribuo/math/neighbour/NeighboursQueryFactory.java
Show resolved
Hide resolved
// Use an array to put the polled items from the queue into a sorted ascending order, by distance. | ||
while (!queue.isEmpty()) { | ||
MutablePair mutablePair = queue.poll(); | ||
indexDistanceArr[k - i++] = new Pair<>(mutablePair.index, mutablePair.value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you move the postfix increment out onto a separate line? It's the kind of thing that's easily missed when debugging or doing code reviews and so we try to make it more explicit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, that part was done quickly. Making it more obvious certainly facilitates inspecting values during debugging.
Math/src/main/java/org/tribuo/math/neighbour/bruteforce/NeighboursBruteForce.java
Show resolved
Hide resolved
Math/src/main/java/org/tribuo/math/neighbour/bruteforce/NeighboursBruteForce.java
Show resolved
Hide resolved
} | ||
|
||
@SuppressWarnings("unchecked") | ||
Pair<Integer, Double>[] indexDistanceArr = (Pair<Integer, Double>[]) new Pair[k]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be an ArrayList
too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've left this as an array for now too, for the same reasons mentioned above.
Math/src/main/java/org/tribuo/math/neighbour/bruteforce/NeighboursBruteForceFactory.java
Show resolved
Hide resolved
Math/src/main/java/org/tribuo/math/neighbour/bruteforce/NeighboursBruteForceFactory.java
Show resolved
Hide resolved
Math/src/test/java/org/tribuo/math/neighbour/TestNeighborsBruteForce.java
Outdated
Show resolved
Hide resolved
Math/src/test/java/org/tribuo/math/neighbour/TestNeighborsBruteForce.java
Outdated
Show resolved
Hide resolved
Math/src/test/java/org/tribuo/math/neighbour/TestNeighborsBruteForce.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Geoffrey Stewart <geoff.stewart@oracle.com>
I'm interested in a k-d tree implementation to get faster nearest neighbor queries, particularly to improve the Hdbscan training times. Before building this though, I thought I would propose how this might fit in by providing an initial interface, and a simple concrete implementation. The concrete implementation provided here gives brute-force query results.