-
Notifications
You must be signed in to change notification settings - Fork 439
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add K Nearest Neighbor search into STR-Tree #75
Conversation
…t case Signed-off-by: Jia Yu <jiayu2@asu.edu>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From a legal/licensing review, two files need license headers.
| @@ -0,0 +1,53 @@ | |||
| package org.locationtech.jts.index.strtree; | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New files require a license header.
The form of the header is...
/*
- Copyright (c) 2016 Vivid Solutions.
- All rights reserved. This program and the accompanying materials
- are made available under the terms of the Eclipse Public License v1.0
- and Eclipse Distribution License v. 1.0 which accompanies this distribution.
- The Eclipse Public License is available at http://www.eclipse.org/legal/epl-v10.html
- and the Eclipse Distribution License is available at
- http://www.eclipse.org/org/documents/edl-v10.php.
*/
You can list yourself as having copyright.
| @@ -0,0 +1,58 @@ | |||
| package org.locationtech.jts.index; | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto about the license headers for this file.
Signed-off-by: Jia Yu <jiayu2@asu.edu>
| * to provide an efficient search. This method implements the KNN algorithm described in the following paper: | ||
| * Roussopoulos, Nick, Stephen Kelley, and Frédéric Vincent. "Nearest neighbor queries." ACM sigmod record. Vol. 24. No. 2. ACM, 1995. | ||
| * <p> | ||
| * The query <tt>object</tt> does <b>not</b> have to be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this say item rather than object?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, the query object actually is a Java Object class although it is used as "Item" in ItemBoundable. I am following the same name convention of the original NearestNeighbor() function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have a look at http://stackoverflow.com/questions/1667212/how-to-add-reference-to-a-method-parameter-in-javadoc
It should probably say {@code item}, or else maybe {@ link Object} (although I think the former is more specific)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| * @param k the K nearest items in kNearestNeighbour | ||
| * @return the K nearest items in this tree | ||
| */ | ||
| public Object[] kNearestNeighbour(Envelope env, Object item, ItemDistance itemDist,int k) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be better to call this nearestNeighbour (using overloading), or else nearestNeighbourK (so that the methods are adjacent in the Javadoc)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jiayuasu What do you think about this suggestion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dr-jts I agree with you. I think using overloading is better and easy for understanding. I will make it same with "nearestNeighbour" with an additional parameter "int k".
Signed-off-by: Jia Yu <jiayu2@asu.edu>
|
|
||
| Object[] result = new Object[kNearestNeighbors.size()]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this use the Collection.toArray() method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dr-jts No, it cannot. The K Nearest Neighbor Priority Queue needs to contain top K ItemBoundable (item+its bounding envelope). We have two options (1) Use the existing way (2) Maintain an additional list of top k nearest neighbors' bounding envelopes all the time. The (1) is OK because normally the K is much smaller than the original dataset. The (2) is time-consuming because we need to update the additional list all the time along with KNN Priority Queue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, right, I missed that. In that case might be nice to pull this code out into a private static method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. I will make it into a separate private static method.
| @@ -373,5 +402,91 @@ public Object nearestNeighbour(Envelope env, Object item, ItemDistance itemDist) | |||
| ((ItemBoundable) minPair.getBoundable(1)).getItem() | |||
| }; | |||
| } | |||
| private Object[] nearestNeighbour(BoundablePair initBndPair, double maxDistance, int k) | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible that the original NN(1) method can just delegate to this one with k = 1 ? That would be fewer code pathways to maintain. Unless it is slower?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dr-jts Yes, you are right. If k = 1, the KNN method has the same result and same performance with the original NN(1) method because the maintained KNN Queue just becomes 1NN. However, their APIs are a little bit different. The return of the original NN(1) method is a different array in the following format:[Nearest item, Query Point]. The new NN(K) method returns an array [Nearest item1, item2, item3,...., item k] without including Query Point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, good point!
You know, in that case these method probably really should have different names (since the return data is different). So how about calling the new method nearestNeighbourK ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dr-jts Oh, sorry. I missed this: the original NN(1) has an additional API wrapper to return the item only.
public Object nearestNeighbour(Envelope env, Object item, ItemDistance itemDist)
{
Boundable bnd = new ItemBoundable(env, item);
BoundablePair bp = new BoundablePair(this.getRoot(), bnd, itemDist);
return nearestNeighbour(bp)[0];
}
So, the NN(1) and the NN(K) have the same return format from the user's perspective. Do you think I still need to use "nearestNeighbourK"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess not. Although @bjornharrtell is going to complain I think! (because JSTS has problems with overloading...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great. Let's stick with the overloaded method name then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. So what should I do for this PR now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the code is good. Give us a bit of time to finalize the acceptance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you like you could add a function to the TestBuilder to expose the kNN functionality. I added a NN function just now (see last commit to master)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dr-jts I exposed the kNN functionality following your NN function. Hope it is OK. 😁
Signed-off-by: Jia Yu <jiayu2@asu.edu>
Add STRtree NN function (#76)
Signed-off-by: Jia Yu <jiayu2@asu.edu>
| @@ -117,6 +117,14 @@ public static Geometry strTreeNN(Geometry geoms, Geometry geom) | |||
| return (Geometry) result; | |||
| } | |||
|
|
|||
| public static Geometry[] strTreeNNk(Geometry geoms, Geometry geom) | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can add int k as a parameter to this method, and it will be exposed in the TestBuilder UI
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dr-jts I just try to add "int k" as a parameter. But it seems TestBuilder UI doesn't take more than 2 parameters. It only has two input areas. Not sure about this now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can add more parameters. An input box is provided on the Geometry Functions tab.
See this example:
| STRtree index = buildSTRtree(geoms); | ||
| int k = 10; | ||
| Object[] result = index.nearestNeighbour(geom.getEnvelopeInternal(), geom, new GeometryItemDistance(), k); | ||
| return (Geometry[]) result; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure TestBuilder handles Geometry[] results. I'll have to check this, or perhaps you can run it and see.
It should be enhanced to automatically convert them to GeometryCollections, but that's not in place yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dr-jts I converted them to GeometryCollection manually because the TestBuilder doesn't take Geometry[].
Signed-off-by: Jia Yu <jiayu2@asu.edu>
| return (Geometry[]) result; | ||
| int k = 2; | ||
| Object[] knnObjects = index.nearestNeighbour(geom.getEnvelopeInternal(), geom, new GeometryItemDistance(), k); | ||
| Geometry[] knnGeoms = new Geometry[k]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dr-jts OK. It fixed now, 😁
Signed-off-by: Jia Yu <jiayu2@asu.edu>
Add K Nearest Neighbor search into STR-Tree and corresponding unite test case
Signed-off-by: Jia Yu jiayu2@asu.edu