Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add K Nearest Neighbor search into STR-Tree #75

Merged
merged 9 commits into from
Feb 21, 2017
Merged

Add K Nearest Neighbor search into STR-Tree #75

merged 9 commits into from
Feb 21, 2017

Conversation

jiayuasu
Copy link
Contributor

Add K Nearest Neighbor search into STR-Tree and corresponding unite test case

Signed-off-by: Jia Yu jiayu2@asu.edu

…t case

Signed-off-by: Jia Yu <jiayu2@asu.edu>
Copy link
Contributor

@jnh5y jnh5y left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a legal/licensing review, two files need license headers.

@@ -0,0 +1,53 @@
package org.locationtech.jts.index.strtree;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New files require a license header.

The form of the header is...

/*

You can list yourself as having copyright.

@@ -0,0 +1,58 @@
package org.locationtech.jts.index;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto about the license headers for this file.

* to provide an efficient search. This method implements the KNN algorithm described in the following paper:
* Roussopoulos, Nick, Stephen Kelley, and Frédéric Vincent. "Nearest neighbor queries." ACM sigmod record. Vol. 24. No. 2. ACM, 1995.
* <p>
* The query <tt>object</tt> does <b>not</b> have to be
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this say item rather than object?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the query object actually is a Java Object class although it is used as "Item" in ItemBoundable. I am following the same name convention of the original NearestNeighbor() function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have a look at http://stackoverflow.com/questions/1667212/how-to-add-reference-to-a-method-parameter-in-javadoc

It should probably say {@code item}, or else maybe {@ link Object} (although I think the former is more specific)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dr-jts Sure. I understand your point. I will use {@code item} style. I will push them now.

* @param k the K nearest items in kNearestNeighbour
* @return the K nearest items in this tree
*/
public Object[] kNearestNeighbour(Envelope env, Object item, ItemDistance itemDist,int k)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be better to call this nearestNeighbour (using overloading), or else nearestNeighbourK (so that the methods are adjacent in the Javadoc)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jiayuasu What do you think about this suggestion?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dr-jts I agree with you. I think using overloading is better and easy for understanding. I will make it same with "nearestNeighbour" with an additional parameter "int k".

Signed-off-by: Jia Yu <jiayu2@asu.edu>

Object[] result = new Object[kNearestNeighbors.size()];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this use the Collection.toArray() method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dr-jts No, it cannot. The K Nearest Neighbor Priority Queue needs to contain top K ItemBoundable (item+its bounding envelope). We have two options (1) Use the existing way (2) Maintain an additional list of top k nearest neighbors' bounding envelopes all the time. The (1) is OK because normally the K is much smaller than the original dataset. The (2) is time-consuming because we need to update the additional list all the time along with KNN Priority Queue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, right, I missed that. In that case might be nice to pull this code out into a private static method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I will make it into a separate private static method.

@@ -373,5 +402,91 @@ public Object nearestNeighbour(Envelope env, Object item, ItemDistance itemDist)
((ItemBoundable) minPair.getBoundable(1)).getItem()
};
}
private Object[] nearestNeighbour(BoundablePair initBndPair, double maxDistance, int k)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible that the original NN(1) method can just delegate to this one with k = 1 ? That would be fewer code pathways to maintain. Unless it is slower?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dr-jts Yes, you are right. If k = 1, the KNN method has the same result and same performance with the original NN(1) method because the maintained KNN Queue just becomes 1NN. However, their APIs are a little bit different. The return of the original NN(1) method is a different array in the following format:[Nearest item, Query Point]. The new NN(K) method returns an array [Nearest item1, item2, item3,...., item k] without including Query Point.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, good point!

You know, in that case these method probably really should have different names (since the return data is different). So how about calling the new method nearestNeighbourK ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dr-jts Oh, sorry. I missed this: the original NN(1) has an additional API wrapper to return the item only.

  public Object nearestNeighbour(Envelope env, Object item, ItemDistance itemDist)
  {
    Boundable bnd = new ItemBoundable(env, item);
    BoundablePair bp = new BoundablePair(this.getRoot(), bnd, itemDist);
    return nearestNeighbour(bp)[0];
  }

So, the NN(1) and the NN(K) have the same return format from the user's perspective. Do you think I still need to use "nearestNeighbourK"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess not. Although @bjornharrtell is going to complain I think! (because JSTS has problems with overloading...)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great. Let's stick with the overloaded method name then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. So what should I do for this PR now?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the code is good. Give us a bit of time to finalize the acceptance.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you like you could add a function to the TestBuilder to expose the kNN functionality. I added a NN function just now (see last commit to master)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dr-jts I exposed the kNN functionality following your NN function. Hope it is OK. 😁

Signed-off-by: Jia Yu <jiayu2@asu.edu>
Signed-off-by: Jia Yu <jiayu2@asu.edu>
@@ -117,6 +117,14 @@ public static Geometry strTreeNN(Geometry geoms, Geometry geom)
return (Geometry) result;
}

public static Geometry[] strTreeNNk(Geometry geoms, Geometry geom)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can add int k as a parameter to this method, and it will be exposed in the TestBuilder UI

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dr-jts I just try to add "int k" as a parameter. But it seems TestBuilder UI doesn't take more than 2 parameters. It only has two input areas. Not sure about this now.

Copy link
Contributor

@dr-jts dr-jts Feb 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can add more parameters. An input box is provided on the Geometry Functions tab.

See this example:

https://github.com/locationtech/jts/blob/master/modules/app/src/main/java/org/locationtech/jtstest/function/CreateShapeFunctions.java#L73

STRtree index = buildSTRtree(geoms);
int k = 10;
Object[] result = index.nearestNeighbour(geom.getEnvelopeInternal(), geom, new GeometryItemDistance(), k);
return (Geometry[]) result;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure TestBuilder handles Geometry[] results. I'll have to check this, or perhaps you can run it and see.

It should be enhanced to automatically convert them to GeometryCollections, but that's not in place yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dr-jts I converted them to GeometryCollection manually because the TestBuilder doesn't take Geometry[].

Signed-off-by: Jia Yu <jiayu2@asu.edu>
return (Geometry[]) result;
int k = 2;
Object[] knnObjects = index.nearestNeighbour(geom.getEnvelopeInternal(), geom, new GeometryItemDistance(), k);
Geometry[] knnGeoms = new Geometry[k];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dr-jts OK. It fixed now, 😁

Signed-off-by: Jia Yu <jiayu2@asu.edu>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants