enable reverse geocoding #19

Closed
christophlingg opened this issue May 1, 2014 · 19 comments

Comments

@christophlingg
Member

... and let other people know your feature on the project website

@christophlingg christophlingg added this to the sprint #2 milestone May 1, 2014
@g4vroche

g4vroche commented May 7, 2014

Hi, I am currently trying to implement this in my photon-powered app.
I get really decent results in terms of quality, but the speed disappoints me a little: it takes about 750ms to get results from Solr.

I was even wondering whether querying Postgres directly would be a better option here, although it adds a tier to the production environment.

I will dig more into this and the Solr docs this week, and I will post an update if I get better results.

My approach so far has been to use geofilt + geodist. Any advice on that?

@christophlingg
Member Author

Since PostGIS 2.0 you can run very fast queries of that kind using these operators: http://workshops.boundlessgeo.com/postgis-intro/knn.html

750 ms for Solr is really surprising; there should be ways to make it faster. We'll implement that in the next version of photon, which will be based on Elasticsearch. This happens in our next sprint, in a week. We'll let you know about our findings.

@yohanboniface
Collaborator

A geodistance sort on lat,lng should do the job on ElasticSearch.

@yohanboniface
Collaborator

Example using Java API:

MatchAllQueryBuilder query = QueryBuilders.matchAllQuery();
SearchRequestBuilder searchRequest = client.prepareSearch(INDEX);
searchRequest.setQuery(query).addSort(SortBuilders.geoDistanceSort("latlon").point(lat, lon).order(SortOrder.ASC));
searchRequest.setSize(1);

@g4vroche

g4vroche commented May 7, 2014

I eventually found out the problem with my query and I now get results under 100ms.

Not sure the story will be relevant for ES, since it's not clear to me what is provided by Lucene and what is provided by Solr / ES.

The thing is, you can't just sort a *:* query using geodist(): it takes far too long, at least when you have the whole OSM planet in your index, which is my case.
I tried it despite the warnings in the Solr docs, and it took about 25 seconds...

The solution is to filter results first, to keep only the items in a given area around the coordinate to reverse geocode.
I first used the geofilt function, which is precise but not that fast.

  • Response time > 650ms

I then switched to the bbox function, which is way faster at the cost of precise area boundaries, but we don't care about that in this context since we only want the nearest point(s).

  • Response time < 100ms

So the relevant parts of my final query are:

&fq={!bbox sfield=coordinate}
&d=3 // Km "radius"
&q={!func}geodist()&sfield=coordinate
&pt=<lat,lon here>
&sort=score+asc
&rows=5

Note that the d parameter is very important here. Response time on my stack starts exceeding 100ms at about 20km.

Also note that I return more than the first result, since from a UI point of view I find it better to suggest nearby places.
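
For anyone who wants to assemble these parameters programmatically, here is a minimal Java sketch. It only builds the URL-encoded query string from the parameters listed above; the class and method names are made up for illustration, it assumes Java 10+ for the `URLEncoder.encode(String, Charset)` overload, and the host, core and request handler are deliberately left out.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of photon or Solr.
class SolrReverseQuery {

    // Builds the query string for a bbox-filtered, distance-sorted request.
    // The field name "coordinate" is taken from the query in this comment.
    static String build(double lat, double lon, double radiusKm, int rows) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("fq", "{!bbox sfield=coordinate}"); // cheap pre-filter
        params.put("d", Double.toString(radiusKm));    // "radius" in km
        params.put("q", "{!func}geodist()");           // score = distance
        params.put("sfield", "coordinate");
        params.put("pt", lat + "," + lon);             // point to reverse geocode
        params.put("sort", "score asc");               // nearest first
        params.put("rows", Integer.toString(rows));

        StringJoiner joined = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joined.add(e.getKey() + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        System.out.println("select?" + build(49.598095, 11.003661, 3, 5));
    }
}
```

Keeping the radius as an explicit argument makes it easy to experiment with the d value, which is the parameter that dominates response time here.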


Relevant Solr documentation

https://cwiki.apache.org/confluence/display/solr/Spatial+Search

Full Solr query:

select?fl=name+coordinate+street+city+country+osm_id+osm_key+osm_value&fq=name:["" TO *]&fq=-osm_key:boundary&fq={!bbox sfield=coordinate}&d=3&q={!func}geodist()&sfield=coordinate&pt=`<lat,lon>`&sort=score+asc&rows=5

Test stack, to put the results in context

  • Node.js app on my laptop querying a remote Solr server (via FTTH).
  • Server is the same one I used to import Planet: Intel Xeon W3565 2.3 GHz+ (4 cores / 8 threads), 48 GB RAM, 2 x 2 SATA software RAID

@karussell
Collaborator

This is the geo query necessary for elasticsearch:

{
  "sort": [
    {
      "_geo_distance": {
        "coordinate": [
          11.003661,
          49.598095
        ],
        "order": "asc"
      }
    }
  ],
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "geo_distance": {
          "distance": "5km",
          "coordinate": [
            11.003661,
            49.598095
          ]
        }
      }
    }
  }
}

The point has to be changed of course :). It executes in 530ms for 100km and 460ms for 1km, so not much difference for my dataset (worldwide feed). But dropping the filter leads to response times of over 3000ms.

I think edge ngram could again be a pain point here: it produces too many entries and could therefore lead to slower response times.

@Svantulden
Contributor

I am experimenting a bit with enabling reverse geocoding for my Photon installation. Is @karussell's query above still valid for the current version of ES? I am getting a SearchParseException with a nested No query registered for [sort] exception when I QUERY_AND_FETCH. I do not have much experience with ES, but maybe one of you can help me?

@christophlingg
Member Author

Hm, we updated Elasticsearch from 1.1 to 1.3.1 and dynamic scripting isn't allowed anymore. For the sake of security you need to place the script in the script folder and add your file here to be copied to the right place.

I am not totally sure that is your problem, but it would make sense.

There is also good documentation on sorting by distance, btw: http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/sorting-by-distance.html

@karussell
Collaborator

@christophlingg There should be no need for dynamic scripting for this. @Svantulden are you using some external library or sending the JSON request directly? If you use an external lib, make sure you use it correctly: http://stackoverflow.com/a/20175407/194609

@Svantulden
Contributor

Thank you both for your very quick reply! @karussell you were indeed right that the ES library Photon uses already adds query = { } to the json query which resulted in the parse exception.

I got reverse geocoding to work in my Photon installation with the following code:

The query:

{
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "geo_distance": {
          "distance": "5km",
          "coordinate": [
            ${lon},
            ${lat}
          ]
        }
      }
    }
}

The searching & sorting code:

SearchResponse response = client.prepareSearch("photon").setSearchType(SearchType.QUERY_AND_FETCH)
                        .setQuery(query)
                        .addSort(SortBuilders.geoDistanceSort("coordinate").point(lat, lon).order(SortOrder.ASC))
                        .setSize(1)
                        .setTimeout(TimeValue.timeValueSeconds(7))
                        .execute()
                        .actionGet();

I didn't get the sorting to work as a JSON string, but I may be doing something wrong there. The query executes in 35ms on average, but that number is not very meaningful because my index is pretty small (only NL/BE/DE/LUX and filtered tags).

The data seems pretty accurate at first glance, though I need to test some more. Would you be interested if I made my (finished) reverse geocoding work public via a PR?

@karussell
Collaborator

I think PR is always appreciated, @christophlingg can veto, of course ;)

For the Java code: this is not working and your phyton query works? Did you replace the lat,lon parameters before passing the string to setQuery, as done here, and use the correct template?

@christophlingg
Member Author

reverse geocoding is something many users will be very happy about!

If i recall right, an optimization step was necessary. It is too expensive to calculate the distance between the location and all photon documents (currently more than 100 million). The distance calculation is not trivial, and doing it that often causes long queries, so a preselection was necessary: take all documents in a certain bbox (the geo index makes this step fast) and pick the closest one within this bbox.
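
To illustrate the order of operations only, here is a small in-memory Java sketch: a cheap bounding-box test first, then the exact (haversine) distance for the few survivors. This is not how photon or Elasticsearch implement it; in a real index the bbox test is answered by the geo index rather than a linear scan. The class, the `Place` type and the spherical-Earth radius are assumptions for the example, and the bbox maths ignores the poles and the antimeridian.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration, not photon code.
class BboxNearest {
    static final double EARTH_RADIUS_KM = 6371.0; // mean radius, spherical model

    static final class Place {
        final String name;
        final double lat;
        final double lon;

        Place(String name, double lat, double lon) {
            this.name = name;
            this.lat = lat;
            this.lon = lon;
        }
    }

    // Great-circle distance in km between two points given in degrees.
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // Returns the closest place inside the bbox around (lat, lon), or null.
    static Place nearest(List<Place> places, double lat, double lon, double radiusKm) {
        double latSpan = Math.toDegrees(radiusKm / EARTH_RADIUS_KM);
        double lonSpan = latSpan / Math.cos(Math.toRadians(lat));
        Place best = null;
        double bestKm = Double.MAX_VALUE;
        for (Place p : places) {
            // Preselection: two comparisons, no trigonometry.
            if (Math.abs(p.lat - lat) > latSpan || Math.abs(p.lon - lon) > lonSpan) {
                continue;
            }
            // Exact distance only for documents that passed the bbox test.
            double km = haversineKm(lat, lon, p.lat, p.lon);
            if (km < bestKm) {
                bestKm = km;
                best = p;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Place> places = Arrays.asList(
                new Place("A", 49.60, 11.00),
                new Place("B", 49.70, 11.00));
        Place hit = nearest(places, 49.598095, 11.003661, 5);
        System.out.println(hit == null ? "nothing within bbox" : hit.name);
    }
}
```

Note that a point in a corner of the bbox can be slightly farther away than the nominal radius; for picking the nearest document that imprecision does not matter.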

btw: when it comes to performance we cannot beat nominatim here. PostGIS has better geoindex support than Elasticsearch. I think they do it via geohashes.

@karussell
Collaborator

If i recall right, an optimization step was necessary

I think that's why I added the filter, where most of the calculation is done via a quadtree rather than a normal distance calculation.

we cannot beat nominatim here.
I think they do it via geohashes

I don't think ElasticSearch should be slower because of the algorithm used. And a read from a quadtree (or similar index) should be about as fast as a fetch from a large hashmap. If they really are slower, then some devs from ES will help.

@Svantulden
Contributor

For the Java code: this is not working and your phyton query works?

Sorry, I phrased that badly. I meant that having the sort as part of the query JSON string did not work (probably because the ES library expects just a query and not a sort). So I used addSort(SortBuilders.geoDistanceSort("coordinate").point(lat, lon).order(SortOrder.ASC)) to build the sort in Java instead of JSON. The query is just a JSON query template with replaceable ${lat} & ${lon}.
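
A minimal sketch of that template step in plain Java, in case it helps someone: the template text mirrors the query posted earlier in this thread, while the class and method names are made up for illustration and are not photon's actual code. Keep in mind that the JSON array is ordered [lon, lat], whereas the Java sort builder above takes point(lat, lon).

```java
// Hypothetical helper, not part of photon.
class QueryTemplate {

    // Same structure as the filtered query above, written on one line.
    static final String TEMPLATE =
            "{\"filtered\":{\"query\":{\"match_all\":{}},"
            + "\"filter\":{\"geo_distance\":{\"distance\":\"5km\","
            + "\"coordinate\":[${lon},${lat}]}}}}";

    // Replaces the placeholders literally; String.replace does not use regex,
    // so "$", "{" and "}" need no escaping.
    static String fill(double lat, double lon) {
        return TEMPLATE
                .replace("${lon}", Double.toString(lon))
                .replace("${lat}", Double.toString(lat));
    }

    public static void main(String[] args) {
        System.out.println(fill(49.598095, 11.003661));
    }
}
```

The resulting string contains only the query part, so the sort still has to be added separately through the Java API.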

I will first do some timings on the global dataset and some refactoring (working here) before I submit a PR.

@mantesat

Any news on this? This is a much needed feature :)

@Svantulden
Contributor

In a week I'll have enough time to submit a PR for this on the recently refactored Photon version. If anyone wants it before that, you can use the query above and write a small modification to the RequestHandler to do it yourself.

@avently

avently commented Apr 8, 2015

Any news on this? :)

@Svantulden
Contributor

I've made a PR with my Reverse Geocoding here: #164

@christophlingg
Member Author

was closed by #164
