Skip to content

Conversation

@dlfajen
Copy link

@dlfajen dlfajen commented Jan 15, 2020

I found this existing library (https://github.com/AReallyGoodName/OfflineReverseGeocode) that:

  1. Reads a GeoNames place names file (http://download.geonames.org/export/dump/)
  2. Converts the lat/lon into a 3D point
  3. Stores the 3D point into a k-d tree (https://en.wikipedia.org/wiki/K-d_tree).
    This allows really fast lookups from lat/lon to a GeoNames place.

I added onto this library by:

  1. Also reading a GeoNames postal code file (https://download.geonames.org/export/zip/)
  2. Allowing the ability to lookup the postal code given a lat/lon with an optional maximum distance

This library will be used by the reference-data-sdk project to create a "ReversePostalCodeService" which will be used within pipeline-uphoria-processors to lookup the postal code after calculating the mobile location (lat/lon). The "allCountries.txt" file will be loaded into memory (takes ~ 3 seconds locally) within the pipeline-uphoria-processors server once and then used for lookups until the server is restarted. I am still finalizing the code for the other projects.

setPoint();
}

private void setPoint() {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is how the 3D point is created

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I presume this is some kind of well-established vector algorithm... but I couldn't find the source with a cursory search. What's the objective of this point conversion?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah is it just trying to normalize our 2d points into 3d vectors?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part of my code is identical to the GeoName.java file, which is already in the library. I couldn't find this exact math anywhere either so I was hoping that @caleboverman can look it over and we can tweak it if we want. I found some distance calculations that were somewhat similar (https://www.movable-type.co.uk/scripts/latlong.html).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My java is a little rusty, but it looks like they're using the squaredDistance to find the nearest postal code centroid? The haversine method is the preferred geo distance calculation (linked by @dlfajen) and is what we use in the distance_to_venue derived attribute.

There's also the distinct possibility that the nearest centroid is not the actual postal code that your lat/lon indicate. A better solution would be using a true reverse geocode API that relies on shapefiles to determine a postal code. That might be a future iteration...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, yeah... I'm not finding where I found that distance equation. I'll change it to match what you've posted because that link is exactly what this other code is doing.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I changed the distance in the max distance check to use the haversine distance instead. I'm wondering if I should do anything with the other distance checks that are in the KD tree lookups. I think he coded it like this as more of a quick approximation.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah my impression was it used a quick check to narrow the results. So it’s probably ok to only use haversine once the search space has been narrowed.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a variation of the haversine algorithm for vectors? The (haversine) links above always just use coordinates directly so if you wanted to use that you could skip the vectorization right?

Otherwise, given the converted vectors you can do the more simple calculations based on the distance between two vectors on a sphere (given the radius) which is the function I quoted above with arctan and the cross product of the vectors.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to use the distance calculation using arctan(cross product, dot product). Also added a junit test for it.

@dlfajen
Copy link
Author

dlfajen commented Jan 17, 2020

I still need to add README and CONTRIBUTING files too, once I figure all that out.

@jameswyly
Copy link

This is my most up-to-date CONTRIBUTING.md for reference if you want https://github.com/sporting-innovations/pipeline-uphoria-processors/blob/master/CONTRIBUTING.md

This was the only library-ish README that I've made so far... most of them have been for applications which require a lot of configurations. https://github.com/sporting-innovations/bigquery-utils/blob/master/README.md .... for this project is the input file configurable?

And the CHANGELOG is straightforward enough, just need to copy the headers. https://github.com/sporting-innovations/bigquery-utils/blob/master/CHANGELOG.md

It might make sense to use the original forked version with all of these metadata pieces added as maybe version 0.0.1, then you can add all of your updates here as part of 1.0.0 release.... though I guess I don't know if you can do that reasonably with the direct fork. If not, since you're changing the namespace I think your first release here can still be version 1.0.0

@dlfajen
Copy link
Author

dlfajen commented Jan 17, 2020

So I didn't realize that forked repositories can't be private, but @jbkc85 pointed out that this sporting-innovations/OfflineReverseGeocode repository that I created from a public repository is now also public. Do we think that will be an issue? There isn't anything in the code that is specific to FTS, it's all just math that you can get off the internet. The only thing I can see that I might want to change is the unit test specifying the FTS location. I could call it Kansas City instead.

@nicolaskline
Copy link

@dlfajen I have no issues with it being a public repository. In fact I still suggest you PR back to the original project.

@dlfajen
Copy link
Author

dlfajen commented Jan 17, 2020

Thanks! I plan on doing that once we get it all finalized here.

@dlfajen
Copy link
Author

dlfajen commented Jan 17, 2020

Merging this to master to work on getting a jar file built. I'll still work on any changes that anyone suggests.

@dlfajen dlfajen merged commit ed697f7 into master Jan 17, 2020
return 2 * asin(sqrt(a)) * radius;
protected double distance(GeoName other, double radius) {
// distance = atan2(cross product, dot product) * radius
return atan2(cross(other), dot(other)) * radius;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, as do the cross and dot product functions 👍

Took me a second to figure out where the magnitude calculation came from but it was right too it looks like

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants