MBS-12154: Add a default sort to all cores to break equal score tiebreakers #62

Open
wants to merge 1 commit into
base: master

alastair
Contributor

@alastair alastair commented Sep 22, 2022

Fixes MBS-12154

In some kinds of searches (such as "release-group arid:[mbid]"), every resulting document has the same score. If more documents match than fit on one page, the order of the tied documents is not fixed, so it becomes difficult to page through the results consistently.
To fix this, add a secondary sort after the score to break ties when scores are equal. I chose the mbid or id for each core, as it is the primary key and is therefore guaranteed to be present and set in every document.
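
The effect of the tiebreaker can be illustrated with a small Python sketch. It assumes, hypothetically, that equal-score documents may come back in a different internal order from one request to the next; the thread does not establish why that happens. The document fields and the `fetch_page` and `all_pages` helpers are made up for illustration and are not part of the search server.

```python
def fetch_page(index_order, start, rows, tiebreak=False):
    """Return one page of MBIDs from documents listed in index order."""
    if tiebreak:
        # Score descending, then mbid ascending: a total order, no ties left.
        key = lambda d: (-d["score"], d["mbid"])
    else:
        # Score only: tied documents keep whatever order the index has.
        key = lambda d: -d["score"]
    ranked = sorted(index_order, key=key)
    return [d["mbid"] for d in ranked[start:start + rows]]


def all_pages(index_orders, rows, tiebreak):
    """Page through all results, serving each page from the next index order."""
    seen = []
    total = len(index_orders[0])
    for i, start in enumerate(range(0, total, rows)):
        source = index_orders[i % len(index_orders)]
        seen.extend(fetch_page(source, start, rows, tiebreak))
    return seen


# Five documents with identical scores, as in an arid: query.
docs_a = [{"mbid": m, "score": 1.0} for m in ["c3", "a1", "e5", "b2", "d4"]]
# Same documents, different internal order (hypothetical).
docs_b = list(reversed(docs_a))

without_tiebreak = all_pages([docs_a, docs_b], rows=2, tiebreak=False)
with_tiebreak = all_pages([docs_a, docs_b], rows=2, tiebreak=True)

print(without_tiebreak)  # some documents repeat, others never appear
print(with_tiebreak)     # every document appears exactly once
```

With the unique key as a secondary sort, each page is a slice of the same total order, no matter which internal order served the request.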

Testing

I inspected schema.xml for each core to see whether its primary key was id or mbid. I think I got them right. I re-imported the whole search engine and the import completed successfully, but I still need to perform a search on each core to confirm that these values are correct.
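
The remaining per-core check could be scripted along these lines. This is only a sketch: the base URL, the core names, and the core-to-key mapping below are placeholders rather than values taken from this PR, and it only builds the request URLs without sending them. It relies on Solr's standard `q`, `sort`, `start`, and `rows` parameters; Solr is expected to reject a sort on a field that the schema does not define, so a failing request would point at a wrong key.

```python
from urllib.parse import urlencode


def select_url(base_url, core, query, sort, start=0, rows=25):
    """Build a Solr /select URL with an explicit sort clause."""
    params = {"q": query, "sort": sort, "start": start, "rows": rows}
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


# Hypothetical mapping of core name to unique key; check schema.xml for the real one.
CORE_KEYS = {
    "release-group": "mbid",
    "annotation": "id",
}

urls = {
    core: select_url(
        "http://localhost:8983/solr",  # placeholder host and port
        core,
        "*:*",
        f"score desc, {key} asc",
    )
    for core, key in CORE_KEYS.items()
}

for core, url in urls.items():
    print(core, url)
```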

Deployment steps

We should be able to add this config to the cores without reindexing everything. It would involve shutting down the core, editing the relevant file inside the data directory, and then starting it back up again.
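
The file edit could be sketched as follows. This assumes the default sort lives in the `defaults` list of a request handler in a solrconfig.xml-style file, which this thread does not confirm; the handler name `/select` and the sample configuration are hypothetical. The `add_default_sort` helper is made up for illustration.

```python
import xml.etree.ElementTree as ET


def add_default_sort(config_xml, sort_spec, handler_name="/select"):
    """Set a default `sort` parameter on one request handler (idempotent).

    Note: ElementTree drops XML comments and the XML declaration, so this is
    only suitable as an illustration, not for rewriting a production file.
    """
    root = ET.fromstring(config_xml)
    for handler in root.iter("requestHandler"):
        if handler.get("name") != handler_name:
            continue
        # Find the <lst name="defaults"> block, creating it if it is missing.
        defaults = next(
            (lst for lst in handler.findall("lst") if lst.get("name") == "defaults"),
            None,
        )
        if defaults is None:
            defaults = ET.SubElement(handler, "lst", {"name": "defaults"})
        # Reuse an existing sort entry instead of adding a second one.
        node = next(
            (s for s in defaults.findall("str") if s.get("name") == "sort"),
            None,
        )
        if node is None:
            node = ET.SubElement(defaults, "str", {"name": "sort"})
        node.text = sort_spec
    return ET.tostring(root, encoding="unicode")


# Hypothetical minimal configuration for demonstration.
sample = (
    '<config><requestHandler name="/select" class="solr.SearchHandler">'
    '<lst name="defaults"><str name="df">name</str></lst>'
    "</requestHandler></config>"
)
updated = add_default_sort(sample, "score desc, mbid asc")
print(updated)
```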

@alastair
Contributor Author

According to https://stackoverflow.com/questions/10310194/how-are-results-ordered-in-solr-in-a-match-all-docs-query:

"When two documents have the same score, Lucene sorts them by index order (the first which has been indexed first) so that running a query twice returns documents in the same order."

This is interesting. In that case I would have expected results to be "random, but consistent", but according to the bug report this isn't the case. I'm unsure why we don't see this behaviour.

@jesus2099

jesus2099 commented Sep 23, 2022

Hi @alastair, thank you very very very much!

Could you use id asc consistently, even on the entities that have an mbid?

Sorting by an incremental id has more meaning (oldest to newest) and could be useful, whereas an mbid is just random letters and digits, so its order does not represent anything.

But in any case you are ruling out the randomness so it's really really really great for the query web services!

@alastair
Contributor Author

Hi @jesus2099, in fact this doesn't appear to be possible - the search index for entities that have an mbid does not include the database id.

Personally, I'm not sure the ordering of the database id has any inherent meaning here, because it would represent the date that an item was added to the MusicBrainz database, not any date related to the item itself (birth date, release date, etc.). Some items don't have a relevant date anyway (area? url?).
In this case, I think it would be better to wait for MBS-5636 to be able to identify a real, relevant sort order if you require it for some reason.

@reosarevok
Member

"Earliest-created" is a valid ordering option, but I agree that if that's not available, consistent mbid sort is a lot better than the current situation anyway.

@jesus2099

MBS-5636 would not be a solution here, because we need a sort order in which there are never any equal values.
So the id would be ideal for me, but since, as you say, it is not available, take the gid (MBID); it is supposed to be unique, too. :)

@jesus2099

Do you know why no post-May 2021 PRs have been merged?

@reosarevok
Member

We're working on updating the search server (which still uses the deprecated Python 2) and making a release of that, but we've been struggling with issues related to it, so I think no releases have happened in the meantime. @yvanzo could give you more details :)
