Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

arc distance in knnW #646

Closed
lanselin opened this issue Jun 24, 2015 · 8 comments · Fixed by #647
Closed

arc distance in knnW #646

lanselin opened this issue Jun 24, 2015 · 8 comments · Fixed by #647
Assignees
Labels

Comments

@lanselin
Copy link
Member

I may be missing something, but it seems to me that knnW in Distance.py is assuming that the distances are Euclidean and does not allow for arc distances. In practice, this will likely seldom come up as a problem, since, e.g., knnW_from_array (in user.py) takes a radius argument and passes a distance_metric argument to pysal.cg.KDTree before calling the worker bee knnW function, thus passing the correct KDTree.

However, when knnW is called directly (in Distance.py) there doesn't seem to be a way to pass a distance_metric argument and when only an array of coordinates is provided, the KDTree is called with only one argument (u) - on line 122.

Does this need to be addressed or is it taken care of in another way? One way to deal with this is to restrict knnW to always take a KDTree as an argument and remove the option to pass an array. This would then require a small helper function that creates the tree and deals with the distance_metric argument.

@jlaura
Copy link
Member

jlaura commented Jun 25, 2015

What about adding the arcdistance=False flag to the knnW function. If arc distance is passed as true, we can either (1) unpack the data from the current KDTree and load a ps.Arc_KDTree or (2) take the array and create a ps.Arc_KDTree? If this check occurred prior to the current if (line 111 in Distance.py) we could first check for arc, then for a subclass, and finally for datatype. This should maintain the logical ordering.

I can make this change if it looks like it will serve.

@jlaura jlaura self-assigned this Jun 26, 2015
@sjsrey
Copy link
Member

sjsrey commented Jun 27, 2015

+1 on the change. I think this would allow knnW to use arc distances if needed.

@jlaura
Copy link
Member

jlaura commented Jun 27, 2015

Now that I am digging in wanted to bounce some ideas around:

  1. In order to use the arc distance, we need to provide a radius. Right now this defaults to 1.0, but suspect that most people will want to specify ~6371.0(km). This would mean that supporting arc distance via this method would require the addition of arcdistance=False and radius=1.0. This is becoming less of a clean addition.

Some options:

  • Accept an optional **kwargs and pass through. I dislike this because it is significantly less clear to the user what is going on.
  • Include both kwargs. Not a huge fan of adding two flags. The user could pass arc distance and never specify radius. With r=1.0 the issues at the poles with KDTrees are likely to still be present.
  • Include just the radius option. If you pass a p argument, we use euclidean and if you pass a radius argument we use arc distance. I like this as it is only a single change, but it makes the API a little less clear.
  1. The coincident point check used for raw data makes a lot of sense. We can abstract that out into a private method so that it only exists in a single place. Right now, I see the addition of two if/else statements: (1) if radius and isinstance(data, pysal.cg.KDTree) we unpack the data and create a new ArcKDTree and (2) if radius and isinstance(data, (np.ndarray, np.generic) we simply create a new ArcKDTree.
  • Do we add the coincident point check in both instances? I think we should, but this is not something that we do if the user passes in an already created KDTree.

@sjsrey
Copy link
Member

sjsrey commented Jun 27, 2015

I think adding the optional kw parameter radius=None is the cleanest solution. If that is specified with a radius then arc distance is used.

I'm not sure about the second point you are raising. Presumably, if the user wants to use arc distances, the tree should be built with ArcKDTree - they wouldn't have constructed a regular KDTree and then passed that in as the first argument to knnW. Maybe I'm missing what you are getting at?

@jlaura
Copy link
Member

jlaura commented Jun 27, 2015

On the second point, we support passing in a KDTree. Should the logic be, if data is a KDTree (not an ArcKDTree) and radius is specified, we throw an error or should we try to handle that by unpacking the existing KDTree and repacking into an ArcKDTree?

Assuming we want to support that (for the confused user), do we run the coincident point check as well during the unpack, repack process?

@sjsrey
Copy link
Member

sjsrey commented Jun 27, 2015

Maybe Luc's original suggestion is cleaner?

  1. change the API for knnW to take only a KDTree as an argument. Internally we can then check if its arc based or euclidean and do the correct thing

  2. add small helper functions to build the trees.

@lanselin
Copy link
Member Author

Given the philosophy of PySAL as a library, I think it would be cleaner to let knnW just build the weights, given a KDTree as the argument. How that KDTree is constructed (Euclidean or Arc) is not considered by knnW. In fact, this is pretty much what happens in practice when one of the knnW_ user functions is invoked. They take care of the logic of array to KDTree conversion using the correct distance metric.

So I would argue for restricting knnW proper to only allow a KDTree as an argument, making it agnostic to how that tree is constructed and adding a couple of specific array to tree conversion function (to the extent that they are not there already).

I think right now there is a bit of tension between the objective of knnW being a core ("base") function and its design as more of a user function (with all the checks). What we've tried to do in spreg (although it's by no means perfect) is to limit the options and arguments for the base functions to the absolute minimum, and do all the checking and alternative entry points in the user function/class.

@sjsrey
Copy link
Member

sjsrey commented Jun 28, 2015

This example demonstrates how we can currently first build a KDTree using spherical coordinates and then pass it to knnW. If you load the linked files into qgis and follow along you will see that the calculations are correct in terms of distances.

So changing the api of knnW proper as @lanselin suggests seems a good way to go.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants