Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible solution for issue #154 (Geolocation to Country transformer) #196

Conversation

ybh6336
Copy link

@ybh6336 ybh6336 commented Dec 16, 2018

Uses reverse geolocation lookup code at https://github.com/AReallyGoodName/OfflineReverseGeocode

Related issues
#154

Describe the proposed solution
The proposed solution, implemented in com.salesforce.op.stages.impl.feature.GeolocationToCountryTransformer, uses latitude and longitude of Geolocation to look up the country code. It uses an offline reverse geocode lookup code at https://github.com/AReallyGoodName/OfflineReverseGeocode. The database for offline lookup is in file core/src/main/resources/geolocation/cities1000.txt. This file has been downloaded from http://download.geonames.org/export/dump/.
The companion object, GeolocationToCountryTransformer, loads the file once, which is then referenced as a static field.

Describe alternatives you've considered
An alternative approach is to perform a reverse lookup using a remote service, like the ones listed below:
http://www.geonames.org/export/web-services.html#countrysubdiv
https://developers.google.com/maps/documentation/geocoding/start?csw=1#ReverseGeocoding
Drawbacks:

  • needs API license
  • latency
  • would need a fallback offline approach

Additional context
None

@salesforce-cla
Copy link

Thanks for the contribution! Unfortunately we can't verify the commit author(s): Yogesh <y***@s***.com>. One possible solution is to add that email to your GitHub account. Alternatively you can change your commits to another email and force push the change. After getting your commits associated with your GitHub account, refresh the status of this Pull Request.

@salesforce-cla
Copy link

Thanks for the contribution! It looks like @ybh6336 is an internal user so signing the CLA is not required. However, we need to confirm this.

@ybh6336
Copy link
Author

ybh6336 commented Dec 16, 2018

Unfortunately, https://github.com/AReallyGoodName/OfflineReverseGeocode has not been published to maven central, so had to copy code here (with copyright header and link to the original repository).

@codecov
Copy link

codecov bot commented Dec 16, 2018

Codecov Report

Merging #196 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #196      +/-   ##
==========================================
- Coverage   86.36%   86.35%   -0.02%     
==========================================
  Files         311      311              
  Lines       10137    10141       +4     
  Branches      565      564       -1     
==========================================
+ Hits         8755     8757       +2     
- Misses       1382     1384       +2
Impacted Files Coverage Δ
...in/scala/com/salesforce/op/cli/CommandParser.scala 98.11% <100%> (+0.03%) ⬆️
...cala/com/salesforce/op/cli/gen/ProblemSchema.scala 96.55% <100%> (+0.06%) ⬆️
...src/main/scala/com/salesforce/op/cli/CliExec.scala 80% <100%> (+2.22%) ⬆️
...es/src/main/scala/com/salesforce/op/OpParams.scala 85.71% <0%> (-4.09%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update dfc5eb6...5947729. Read the comment docs.

@tovbinm
Copy link
Collaborator

tovbinm commented Dec 19, 2018

@ybh6336 thank you for your contribution.

The dataset cities1000.txt is under Creative Commons Attribution 4.0 License (https://creativecommons.org/licenses/by/4.0). I need to verify with our legal department if it's ok to include or not.

@tovbinm
Copy link
Collaborator

tovbinm commented Dec 19, 2018

Another thought that I had, is that we already have lucene-spatial3d library which we can use (instead of copying the code) to build an index and then query it by lat/lon coordinates. A similar discussion was found here.

It's also worth looking into https://github.com/elastic/elasticsearch for the inspiration or ideas. They have built a ton of code around lucene and spatial coordinates. For example - https://github.com/elastic/elasticsearch/search?q=geopoint&unscoped_q=geopoint

@ybh6336
Copy link
Author

ybh6336 commented Dec 19, 2018

Thanks @tovbinm. I will look into lucene-spatial3d.

@tovbinm
Copy link
Collaborator

tovbinm commented Dec 19, 2018

Also have a look on this one - https://s2geometry.io/. They might have a java interface that we could use. This is assuming we find a dataset with country borders and then we can check polygon containment for lat/lon coordinates.

@tovbinm
Copy link
Collaborator

tovbinm commented Dec 19, 2018

@ybh6336 try this one using lucene. It's as simple as this: 1. index all cities 2. query then with radius around lat/lon - https://medium.com/@janakachathuranga/indexing-geographical-data-with-lucene-6-4ade78eac1ab

@tovbinm
Copy link
Collaborator

tovbinm commented Jan 12, 2019

@ybh6336 How are you doing?

I just got back an answer from our legal department. Unfortunately we wont be able to include this dataset in our repository. Unless we find another dataset with BSD-3 or Apache 2.0 license, create our own instead.

Alternatively we can add instructions for users on how they can download and include the dataset themselves. Based on my previous experience, it would be cumbersome, error prone and would create bad user experience. So this is the least preferred option.

@vpatryshev
Copy link
Contributor

vpatryshev commented Jan 12, 2019 via email

@tovbinm
Copy link
Collaborator

tovbinm commented Feb 15, 2019

#227

@tovbinm tovbinm closed this Feb 15, 2019
@tovbinm tovbinm mentioned this pull request Jul 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants