deterministic score tiebreaker #130

Open
wants to merge 1 commit into
base: master

missinglink
Member

The current scoring algorithm sorts documents with the exact same score in a non-deterministic way.
This makes the tests brittle and jittery. This PR aims to resolve that by adding a second sorting condition to 'break the tie'.
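
To illustrate the tie-breaking idea outside of Elasticsearch, here is a minimal Python sketch. The documents, their `_id` values, and the two "builds" are made up for illustration; the only point is that sorting on score alone leaves tied documents in arrival order, while a second sort key makes the result independent of that order.

```python
# Two hypothetical "builds" hold the same documents in a different internal order.
build_a = [
    {"_id": "doc:3", "score": 1.0},
    {"_id": "doc:1", "score": 1.0},
    {"_id": "doc:2", "score": 2.5},
]
build_b = list(reversed(build_a))


def rank_by_score(hits):
    # Score only: tied documents keep whatever order they arrived in.
    return [h["_id"] for h in sorted(hits, key=lambda h: -h["score"])]


def rank_with_tiebreak(hits):
    # Score first, then _id ascending to break exact ties.
    return [h["_id"] for h in sorted(hits, key=lambda h: (-h["score"], h["_id"]))]


print(rank_by_score(build_a), rank_by_score(build_b))
print(rank_with_tiebreak(build_a), rank_with_tiebreak(build_b))
```

With score alone the two builds disagree on the order of `doc:1` and `doc:3`; with the tie-breaker both builds return the same list, which is what a stable test fixture needs.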

@emacgillavry

@missinglink I'm hitting this wall when addresses along a street are returned. I'd expect these to come back in numeric order (Dorpstraat 1, Dorpstraat 2, Dorpstraat 3, Dorpstraat 4...), but somehow Dorpstraat 3 only shows up later in the results. Would the _id field depend on the insert order?

@missinglink
Member Author

missinglink commented Nov 23, 2022

Hi @emacgillavry, in the case where results have the exact same score then the order of results is non-deterministic.

It seems that the order is consistent for the same build but inconsistent between builds. I believe this is because of the internal segment sequence assigned to each document rather than the _id.

The linked PR adds _id as a second sorting condition with the aim of making scoring deterministic between builds, but the problem is that any field used for sorting would need the doc_values option enabled.
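
For readers unfamiliar with how a second sorting condition is expressed, the following is a rough sketch of a search request body with a `sort` array, built as a plain Python dict. The PR's actual diff is not shown in this thread, so the helper name `with_tiebreaker` and the example query (including the `name.default` field) are assumptions for illustration only.

```python
import json


def with_tiebreaker(query_body, field="_id"):
    """Return a copy of query_body sorted by _score, then by `field`."""
    body = dict(query_body)
    body["sort"] = [
        {"_score": {"order": "desc"}},  # primary: relevance score
        {field: {"order": "asc"}},      # secondary: breaks exact ties
    ]
    return body


# Hypothetical query, not taken from the PR.
body = with_tiebreaker({"query": {"match": {"name.default": "Dorpstraat"}}})
print(json.dumps(body, indent=2))
```

The secondary field is a parameter so that a cheaper field than `_id` could be swapped in if the doc_values cost described in this comment turns out to be prohibitive.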

Doc values take up a fair bit of RAM, and since the source-id field would have few duplicates, it wouldn't lend itself to compression and would therefore take a lot of RAM.

Using _id also wouldn't solve your specific issue, but using the address house number field in DESC sorting should work.
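
As a sketch of what a house-number tie-breaker could look like, the Python below sorts tied hits by a numeric reading of the house number. The `housenumber` and `score` keys and both function names are hypothetical, and the sort happens in memory here rather than inside the search engine. The direction is left as a parameter since it is a matter of preference.

```python
import re


def house_number_key(value):
    """Split '12B' into (12, 'B') so that '2' compares below '10'."""
    match = re.match(r"\s*(\d+)\s*(.*)", value)
    if match is None:
        # Values without a leading number compare above every number.
        return (float("inf"), value)
    return (int(match.group(1)), match.group(2).upper())


def sort_street_hits(hits, descending=False):
    # Two stable passes: secondary key (house number) first, then score.
    by_number = sorted(
        hits, key=lambda h: house_number_key(h["housenumber"]), reverse=descending
    )
    return sorted(by_number, key=lambda h: h["score"], reverse=True)


hits = [
    {"housenumber": "1", "score": 1.0},
    {"housenumber": "10", "score": 1.0},
    {"housenumber": "3", "score": 1.0},
    {"housenumber": "2", "score": 1.0},
]
print([h["housenumber"] for h in sort_street_hits(hits)])
```

Parsing the number matters because house numbers are often stored as strings, and a plain string sort would place "10" before "2".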

I don't have the bandwidth right now to do the memory and performance testing required to change this, but hopefully that helps to explain what's going on.

@missinglink
Member Author

I'd be interested to see the query you're using to test and what other geocoding engines do. I'm not sure sorting DESC is actually the best idea; some engines seem to show them in order of importance, so prominent addresses on the street (such as businesses) come first in results.

@emacgillavry

Thanks @missinglink for your explanation! Sorry for having hijacked this issue. We're simply searching (autocomplete and search) addresses within a locality (&boundary.gid=whosonfirst:locality:) that we've imported using the OpenAddresses importer. Boosting some business addresses based on popularity would be an added benefit. In case these are just residential addresses, we'd like to show these in descending order.
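
The ordering described here (popular businesses first, remaining addresses in descending house-number order) could be sketched in Python as follows. The `popularity` and `number` keys, the function name, and the sample data are all assumptions made for illustration; this is not how any existing importer or API represents these values.

```python
def order_locality_addresses(hits):
    """Order hits by popularity (highest first), then house number (descending)."""
    return sorted(
        hits,
        # Missing popularity counts as 0, so residential addresses tie
        # and fall through to the house-number comparison.
        key=lambda h: (-h.get("popularity", 0), -h["number"]),
    )


hits = [
    {"label": "Dorpstraat 1", "number": 1},
    {"label": "Dorpstraat 2", "number": 2, "popularity": 50},
    {"label": "Dorpstraat 3", "number": 3},
    {"label": "Dorpstraat 4", "number": 4},
]
print([h["label"] for h in order_locality_addresses(hits)])
```

Dropping the minus sign in front of `h["number"]` would give ascending house numbers instead.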
