-
Notifications
You must be signed in to change notification settings - Fork 0
FTS-2266 Added reverse postal code lookup with max distance #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| setPoint(); | ||
| } | ||
|
|
||
| private void setPoint() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is how the 3D point is created
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I presume this is some kind of well-established vector algorithm... but I couldn't find the source with a cursory search. What's the objective of this point conversion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah is it just trying to normalize our 2d points into 3d vectors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part of my code is identical to the GeoName.java file, which is already in the library. I couldn't find this exact math anywhere either so I was hoping that @caleboverman can look it over and we can tweak it if we want. I found some distance calculations that were somewhat similar (https://www.movable-type.co.uk/scripts/latlong.html).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My java is a little rusty, but it looks like they're using the squaredDistance to find the nearest postal code centroid? The haversine method is the preferred geo distance calculation (linked by @dlfajen) and is what we use in the distance_to_venue derived attribute.
There's also the distinct possibility that the nearest centroid is not the actual postal code that your lat/lon indicate. A better solution would be using a true reverse geocode API that relies on shapefiles to determine a postal code. That might be a future iteration...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, yeah... I'm not finding where I found that distance equation. I'll change it to match what you've posted because that link is exactly what this other code is doing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I changed the distance in the max distance check to use the haversine distance instead. I'm wondering if I should do anything with the other distance checks that are in the KD tree lookups. I think he coded it like this as more of a quick approximation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah my impression was it used a quick check to narrow the results. So it’s probably ok to only use haversine once the search space has been narrowed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a variation of the haversine algorithm for vectors? The (haversine) links above always just use coordinates directly so if you wanted to use that you could skip the vectorization right?
Otherwise, given the converted vectors you can do the more simple calculations based on the distance between two vectors on a sphere (given the radius) which is the function I quoted above with arctan and the cross product of the vectors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched to use the distance calculation using arctan(cross product, dot product). Also added a junit test for it.
|
I still need to add README and CONTRIBUTING files too, once I figure all that out. |
|
This is my most up-to-date CONTRIBUTING.md for reference if you want https://github.com/sporting-innovations/pipeline-uphoria-processors/blob/master/CONTRIBUTING.md This was the only library-ish README that I've made so far... most of them have been for applications which require a lot of configurations. https://github.com/sporting-innovations/bigquery-utils/blob/master/README.md .... for this project is the input file configurable? And the CHANGELOG is straightforward enough, just need to copy the headers. https://github.com/sporting-innovations/bigquery-utils/blob/master/CHANGELOG.md It might make sense to use the original forked version with all of these metadata pieces added as maybe version 0.0.1, then you can add all of your updates here as part of 1.0.0 release.... though I guess I don't know if you can do that reasonably with the direct fork. If not, since you're changing the namespace I think your first release here can still be version 1.0.0 |
|
So I didn't realize that forked repositories can't be private, but @jbkc85 pointed out that this sporting-innovations/OfflineReverseGeocode repository that I created from a public repository is now also public. Do we think that will be an issue? There isn't anything in the code that is specific to FTS, it's all just math that you can get off the internet. The only thing I can see that I might want to change is the unit test specifying the FTS location. I could call it Kansas City instead. |
|
@dlfajen I have no issues with it being a public repository. In fact I still suggest you PR back to the original project. |
|
Thanks! I plan on doing that once we get it all finalized here. |
|
Merging this to master to work on getting a jar file built. I'll still work on any changes that anyone suggests. |
| return 2 * asin(sqrt(a)) * radius; | ||
| protected double distance(GeoName other, double radius) { | ||
| // distance = atan2(cross product, dot product) * radius | ||
| return atan2(cross(other), dot(other)) * radius; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good, as do the cross and dot product functions 👍
Took me a second to figure out where the magnitude calculation came from but it was right too it looks like
I found this existing library (https://github.com/AReallyGoodName/OfflineReverseGeocode) that:
This allows really fast lookups from lat/lon to a GeoNames place.
I added onto this library by:
This library will be used by the reference-data-sdk project to create a "ReversePostalCodeService" which will be used within pipeline-uphoria-processors to lookup the postal code after calculating the mobile location (lat/lon). The "allCountries.txt" file will be loaded into memory (takes ~ 3 seconds locally) within the pipeline-uphoria-processors server once and then used for lookups until the server is restarted. I am still finalizing the code for the other projects.