MBS-12154: Add a default sort to all cores to break equal score tiebreakers #62
MBS-12154: In some kinds of searches (such as "release-group arid:[mbid]"), every resulting document has the same score. If more documents match than fit on one page, it becomes difficult to page through the results consistently. To fix this, add an arbitrary secondary sort after the score to break ties when scores are equal. I chose the mbid or id for each core, since it is also the primary key and is guaranteed to be present and set in every document.
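To illustrate why a unique secondary sort key makes paging consistent, here is a minimal sketch in plain Python. It does not touch Solr; the `Doc` type, the `page` helper, and the `mbid-NN` values are hypothetical stand-ins, and the reversed list only simulates the index returning equal-score documents in a different order between requests.

```python
from typing import NamedTuple


class Doc(NamedTuple):
    mbid: str     # primary key, unique per document
    score: float  # relevance score; constant for queries like arid:[mbid]


def page(docs, start, rows, tiebreak=True):
    """Return one page of results, ordered by score and optionally by mbid."""
    if tiebreak:
        # Score descending, then mbid ascending: a total order, since mbid is unique.
        key = lambda d: (-d.score, d.mbid)
    else:
        # Score only: documents with equal scores keep whatever order they arrive in.
        key = lambda d: -d.score
    return sorted(docs, key=key)[start:start + rows]


# Hypothetical documents that all share the same score.
docs = [Doc(mbid=f"mbid-{i:02d}", score=1.0) for i in range(10)]
shuffled = list(reversed(docs))  # stands in for a different internal ordering

# With the tiebreaker, the pages are identical regardless of input order.
first = page(docs, 0, 5) + page(docs, 5, 5)
second = page(shuffled, 0, 5) + page(shuffled, 5, 5)
stable = first == second

# Without it, combining pages from the two orderings repeats some documents
# and never returns others.
mixed = page(docs, 0, 5, tiebreak=False) + page(shuffled, 5, 5, tiebreak=False)
mixed_ids = {d.mbid for d in mixed}
```

In this sketch `stable` is true, while `mixed_ids` contains only half of the ten documents, which is the kind of inconsistency the bug report describes.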
According to https://stackoverflow.com/questions/10310194/how-are-results-ordered-in-solr-in-a-match-all-docs-query,
which is interesting: in that case I would have expected results to be "random, but consistent", but according to the bug report this isn't the case. I'm unsure why we don't see that behaviour.
Hi @alastair, thank you very very very much! Could you use Sorting an incremental But in any case you are ruling out the randomness so it's really really really great for the query web services!
Hi @jesus2099, in fact this doesn't appear to be possible: the search index for entities that have an mbid does not include the database id. Personally, I'm not sure the ordering of the database id has any inherent meaning here, because it would reflect when an item was added to the MusicBrainz database, not any date related to the item itself (birth date, release date, etc.). Some items don't have a relevant date anyway (area? url?)
"Earliest-created" is a valid ordering option, but I agree that if that's not available, a consistent mbid sort is a lot better than the current situation anyway.
MBS-5636 would not be a solution here because we need a sort order for which there are no equal values, ever.
Do you know why no PRs opened after May 2021 have been merged?
We're working on updating the search server (which still uses the deprecated Python 2) and making a release of that, but we've been struggling with issues related to it, so no releases have happened in the meantime, I think. @yvanzo could give you more details :)
Fixes MBS-12154
Testing
I inspected schema.xml for each core to see if its primary key was id or mbid. I think I got them right. I performed a re-import of the whole search engine and the import completed successfully, but I still need to perform a search on each core to ensure that I got these values correct.
Deployment steps
We should be able to add this config to the cores without reindexing everything. It would involve shutting down the core, editing the relevant file inside the data directory, and then starting it back up again.
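Before editing each core's config, the primary key can be read from that core's schema.xml rather than checked by eye. The following is a minimal sketch, assuming the schema declares its key in a top-level `<uniqueKey>` element as Solr schemas normally do. The helper names, the example schema, and the exact sort string are assumptions for illustration, not the values used in this PR.

```python
import xml.etree.ElementTree as ET


def unique_key(schema_xml: str) -> str:
    """Return the field named in <uniqueKey> of a Solr schema.xml document."""
    root = ET.fromstring(schema_xml)
    node = root.find("uniqueKey")  # direct child of the <schema> root
    if node is None or not (node.text or "").strip():
        raise ValueError("schema has no <uniqueKey> element")
    return node.text.strip()


def default_sort(schema_xml: str) -> str:
    """Build a sort string that breaks score ties on the core's primary key."""
    return f"score desc, {unique_key(schema_xml)} asc"


# Hypothetical, heavily trimmed schema for a core keyed on mbid.
EXAMPLE_SCHEMA = """<schema name="example" version="1.6">
  <uniqueKey>mbid</uniqueKey>
</schema>"""

sort_value = default_sort(EXAMPLE_SCHEMA)
```

Running `default_sort` over every core's schema.xml would give one sort string per core to compare against the values chosen by hand.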