
Increase offset for decay function #296

Closed
karussell opened this issue Jan 16, 2018 · 28 comments

@karussell
Collaborator

As discussed on the mailing list, we need a location bias that behaves a bit more like the old functionality. One possibility would be to increase the offset so that results "far away" are still considered when the location bias is enabled.

@karussell
Collaborator Author

karussell commented Jan 17, 2018

I think the main problem is the dist*1000 in commit 6e89723.

The new method planeDistance returns metres, so we need dist/1000.0! See the breaking changes for Elasticsearch 5.5 here.
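For illustration, a minimal sketch of what the corrected script construction could look like (not necessarily the exact photon code; the decay shape is the one used later in this thread). planeDistance returns metres, so the value is scaled by 0.001 rather than by 1000:

String strCode = "double dist = doc['coordinate'].planeDistance(params.lat, params.lon); " + // metres
                "0.1 + 0.5 / (1.0 + dist * 0.001 / 10.0)";                                   // decay works on km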

@hbruch
Collaborator

hbruch commented Jan 17, 2018

Good catch! Perhaps the newly introduced distance_sort=true can then be reverted, as suggested by @lonvia in #294?

However, introducing a zoom-level-dependent offset or decay might still improve the user experience: if I am zoomed in at city level, I'd prefer nearby POIs and addresses; zoomed out with the same center coordinate, probably not...

@karussell
Collaborator Author

karussell commented Jan 17, 2018

> Good catch!

Hmmh, it still does not fix the location bias. I'm currently experimenting with a better script like:

0.9 + 1.0 / (1.0 + dist * 0.001 / 10.0)
or
0.05 + 0.5 / (1.0 + (dist*0.001-5) / 10.0)

but it is far from easy to find a good balance; a zoom-level-dependent solution where the normalization factor 10.0 can be changed might be good :)
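To get a feel for the balance, a quick throwaway comparison (plain Java, nothing photon-specific) of the two candidates at a few distances:

double[] distsInMetres = {0, 1_000, 10_000, 100_000, 1_000_000};
for (double dist : distsInMetres) {
    double a = 0.9 + 1.0 / (1.0 + dist * 0.001 / 10.0);         // first candidate
    double b = 0.05 + 0.5 / (1.0 + (dist * 0.001 - 5) / 10.0);  // second candidate
    System.out.printf("dist=%9.0f m  a=%.3f  b=%.3f%n", dist, a, b);
}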

> Perhaps the newly introduced distance_sort=true can then be reverted, as suggested by @lonvia in #294?

The default behaviour is already fixed via #297.

@karussell
Collaborator Author

We really need better testing here. Is this possible with the geocoder-tester tool?

@hbruch
Collaborator

hbruch commented Jan 17, 2018

For Île-de-France, there seem to be a few test cases.

@karussell
Collaborator Author

Thanks, cool that this is already possible! We should add a few negative test cases (where the location bias has to be ignored) and positive ones (where the location bias can be used for a better search).

@lonvia
Collaborator

lonvia commented Jan 17, 2018

The location bias should automatically be applied as soon as the lat/lon parameters are added; I don't think we need an additional parameter there. A zoom parameter is a good idea, as it is something users can understand (and it can be applied automatically from the map that is shown). Nominatim uses a viewbox parameter and then has a binary inside/outside bias for that box, but that is not ideal either. Zoom sounds better.

What do you think about setting up an instance with an explicit location bias parameter that allows experimenting a bit?

@karussell
Collaborator Author

> What do you think about setting up an instance with an explicit location bias parameter that allows experimenting a bit?

I can do that, yes. Should I make the whole formula changeable via a URL parameter for the time being?

@lonvia
Collaborator

lonvia commented Jan 17, 2018

> I can do that, yes. Should I make the whole formula changeable via a URL parameter for the time being?

Yes, I think just setting the overall number for the location bias would be good. That would allow us to collect a few data points and then fit a formula to them.

@karussell
Collaborator Author

Now I am confused :) ... so, complete access to the formula, or just a distance as the location bias?

@lonvia
Collaborator

lonvia commented Jan 17, 2018

The formula computes some number, right? I would suggest setting this number directly via a parameter.

@karussell
Collaborator Author

> The formula computes some number, right? I would suggest setting this number directly via a parameter.

This is not how it works, as the score depends on the distance to each document's coordinate. Also, Elasticsearch scores are relative and cannot be used in an absolute manner, i.e. you cannot compare the scores of two different searches.

@lonvia
Collaborator

lonvia commented Jan 17, 2018

Right, now I see what you mean by "whole formula". Would be cool, but only if it is not too much work.

@karussell
Collaborator Author

Ok, see this branch, and you can test it here:

http://46.4.67.134:2322/api?q=dresden

E.g. the query erlange with a location in Berlin can now have different results, depending on the formula.

Please make sure you use the proper URL encoding %2B for +; everything else should be handled correctly by the browser.
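For example (the formula parameter name here is only a placeholder, the actual name is defined in the linked branch), a request with the formula URL-encoded could look like:

http://46.4.67.134:2322/api?q=erlange&lat=52.52&lon=13.40&formula=0.1%2B10/(1.0%2Bdist*0.001/10.0)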

@karussell
Collaborator Author

karussell commented Jan 17, 2018

For me the formula 0.1 + 10 / (1.0 + dist * 0.001 / 10.0) seems to work best. Maybe we can keep some of the parameters for debugging purposes, like

bias_offset + bias_scale / (1.0 + dist * 0.001 / bias_radius)
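A hedged sketch of how these debug parameters could be wired into the script (parameter names as above, everything else illustrative only):

// illustrative only: bias_offset, bias_scale and bias_radius passed via script params
String strCode = "double dist = doc['coordinate'].planeDistance(params.lat, params.lon); " +
                "params.bias_offset + params.bias_scale / (1.0 + dist * 0.001 / params.bias_radius)";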

@hbruch
Collaborator

hbruch commented Jan 18, 2018

Additionally, I'd propose introducing something like the 'offset' of Elasticsearch's decay functions,

bias_offset + bias_scale / (1.0 + Math.max(0, distance_offset - dist) *0.001 / bias_radius)

where distance_offset is zoom level dependent, e.g. 0.1 + 10 / (1.0 + Math.max(0, dist-1000.0) *0.001 / 10.0)

As a consequence, all results closer than distance_offset would get the same (maximum) location boost. OSM-Wiki/Zoom_levels offers a formula which could be used to calculate distance_offset internally:

distance_offset = w * C * cos(lat) / 2^(z+8)

where
w is half the viewport width in pixels (not necessarily parameterized), e.g. 500,
C is the (equatorial) circumference of the Earth (for photon: in metres),
z is the zoom level,
lat is the latitude of the request coordinate.

This would boost all results within a radius of e.g. 500 pixels around the map center with the same location bias; the decay would only start for coordinates farther away.

bias_radius should then be zoom level dependent as well, I think...
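A sketch of that computation in Java (the method name and the 500 px default are illustrative):

// distance covered by half a viewport of w pixels at this zoom level and latitude,
// based on the metres-per-pixel formula from OSM-Wiki/Zoom_levels
static double distanceOffsetInMetres(double lat, int zoom, double halfViewportWidthPx) {
    double C = 40_075_016.686; // equatorial circumference of the Earth in metres
    return halfViewportWidthPx * C * Math.cos(Math.toRadians(lat)) / Math.pow(2, zoom + 8);
}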

@karussell
Collaborator Author

If we have a distance_offset we do not need a bias_radius.

> distance_offset = w * C * cos(lat) / 2^(z+8)

We would need to performance-test this because of the two expensive operations (cos and the exponentiation). Our current formula only uses multiplication.

@karussell
Collaborator Author

BTW: did you mean Math.max(0, dist - distance_offset)? Otherwise the function would be completely different.

@hbruch
Collaborator

hbruch commented Jan 19, 2018

> If we have a distance_offset we do not need a bias_radius.

bias_radius would influence the decay. It roughly corresponds to the 'scale' parameter of the ES decay functions.

> BTW: did you mean Math.max(0, dist - distance_offset)? Otherwise the function would be completely different.

Oh, yes, you are right!

> We would need to performance-test this because of the two expensive operations (cos and the exponentiation). Our current formula only uses multiplication.

The distance_offset would be calculated once per query and then passed into the formula as a parameter, so performance should not be an issue, I think.
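For illustration, a hedged sketch of that flow, assuming the distanceOffsetInMetres helper sketched above and a params map for the script (names hypothetical): the cos/pow work happens once in Java, and only the resulting number travels into the per-document script.

double distanceOffset = distanceOffsetInMetres(lat, zoom, 500); // computed once per query, in metres
params.put("distance_offset", distanceOffset);
String strCode = "double dist = doc['coordinate'].planeDistance(params.lat, params.lon); " +
                "0.1 + 0.5 / (1.0 + Math.max(dist - params.distance_offset, 0) * 0.001 / 10.0)";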

@karussell
Collaborator Author

karussell commented Jan 19, 2018

> bias_radius would influence the decay.

Indeed, it influences the function differently. But 4 variables just for the location bias is a bit ugly and potentially too flexible. Maybe we tune all the parameters during development and expose only distance_offset? And maybe we should name it radius, like the similar parameter for reverse search, or zoom_level?

> The distance_offset would be calculated once per query and then passed into the formula as a parameter, so performance should not be an issue, I think.

Ah, got it - makes sense :) !

@karussell
Collaborator Author

I have created some tests for this feature here: geocoders/geocoder-tester#36

And so we have been using this new photon release, including the proposed function with dist_offset=5, in production at GH for 2 days without complaints :D

The interesting thing is that dist_offset does not have much influence on the results in my manual tests; the bias_* parameters do a bit more, but also not that much. Probably we just need bias_offset, or we could even settle on a fixed function for now until we have a better reason to offer more flexibility.

@hbruch
Collaborator

hbruch commented Jan 24, 2018

Sounds good!

Concerning dist_offset: a value of 5 (m) would treat coordinates within 5 m of the given coordinate as having the maximum boost, right?

This yields a good result when zoomed in at street level. E.g. for a map centre in Kassel, a search for München returns a good POI match.
If zoomed out to e.g. zoom level 6, I'd rather expect the city of Munich as the best result, WDYT?

> The interesting thing is that dist_offset does not have much influence on the results in my manual tests

Which range did you use for testing? As the values need to be in metres, they probably need to be rather large to show significant effects.

@karussell
Collaborator Author

> Concerning dist_offset: a value of 5 (m) would treat coordinates within 5 m of the given coordinate as having the maximum boost, right?

Ah, no, I convert the distance into km beforehand, so 5 km:

String strCode = "double dist = doc['coordinate'].planeDistance(params.lat, params.lon); " +
                "double score = 0.1 + 0.5 / (1.0 + ( dist * 0.001 - " + radius + ") / 10.0); " +
                "score";

> Which range did you use for testing? As the values need to be in metres, they probably need to be rather large to show significant effects.

I also tested hundreds of km.

@hbruch
Collaborator

hbruch commented Jan 26, 2018

Could you try it with the Math.max call included:


String strCode = "double dist = doc['coordinate'].planeDistance(params.lat, params.lon); " +
                "double score = 0.1 + 0.5 / (1.0 + Math.max( dist * 0.001 - " + radius + ", 0) / 10.0); " +
                "score";

@karussell
Collaborator Author

Ah, probably even this:

double score = 0.1 + 0.5 / (1.0 + Math.max( dist * 0.001, " + radius + ") / 10.0);

would be better, as otherwise the scores for close points are 0 or worse, in my case even negative?

@hbruch
Collaborator

hbruch commented Jan 26, 2018

> double score = 0.1 + 0.5 / (1.0 + Math.max( dist * 0.001, " + radius + ") / 10.0);

In this case the maximum score would depend on the externally provided radius, which IMHO is hard to understand. Points closer than radius should not get a negative score, but simply evaluate to the maximum score:

0.1 + 0.5 / (1.0 + Math.max(-radius, 0) / 10.0) = 0.1 + 0.5 / 1.0 = 0.6
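A quick throwaway check (plain Java, dist in metres, radius in km) makes the difference visible: with the max-on-dist variant the best possible score shrinks as radius grows, while the max-with-0 variant always tops out at 0.6 for points closer than radius.

double radius = 5.0; // km
for (double dist : new double[]{0, 2_000, 5_000, 50_000}) {
    double maxOnDist   = 0.1 + 0.5 / (1.0 + Math.max(dist * 0.001, radius) / 10.0);      // variant from the previous comment
    double maxWithZero = 0.1 + 0.5 / (1.0 + Math.max(dist * 0.001 - radius, 0) / 10.0);  // variant from this comment
    System.out.printf("dist=%6.0f m  maxOnDist=%.3f  maxWithZero=%.3f%n", dist, maxOnDist, maxWithZero);
}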

@karussell
Collaborator Author

karussell commented Jan 26, 2018

Now I understand. Thanks!

The problem now is that my new location bias tests fail for radius=5:

Bautzener;51.440661;14.260983;Bautzener Allee;Germany;02977;1 => was 'Bautzener Berg' in Kamenz, so relatively far away
Bahnhof;53.094488;8.806057;Lloyd-Bahnhof;Germany;28215;1 => was 'Ottersberg Bahnhof' in Ottersberg, so also a few km away

To fix this, the location bias scale needed an increase from 0.5 to 1.8. Even then, the Hoyerswerda test fails, as there is a "Bautzener Brücke" close to the "Bautzener Allee", but the Allee is the correct result; the problem behind this is the radius parameter, which makes all results within 5 km score more or less the same. IMO radius should be 0; please see my PR #306 for this proposal.
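If the scale goes from 0.5 to 1.8 and the radius is fixed at 0 as proposed, the script would presumably collapse to something like the following sketch (the actual change is in #306):

String strCode = "double dist = doc['coordinate'].planeDistance(params.lat, params.lon); " +
                "0.1 + 1.8 / (1.0 + dist * 0.001 / 10.0)";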

@karussell
Collaborator Author

Fixed for now via #306.
