pmpailis
Contributor

The expected scores specified in the YAML test match only when the index has a single shard. If randomization gives us more than one shard (e.g. when using -Dtests.seed=1449F1BCAD7A1D83), the scores differ and the test fails. So, in this PR we ensure that the index has exactly 1 shard in all cases.

Closes #111999

@pmpailis pmpailis added >test Issues or PRs that are addressing/adding tests :SearchOrg/Relevance Label for the Search (solution/org) Relevance team v9.0.0 labels Oct 16, 2024
@pmpailis pmpailis requested a review from kderusso October 16, 2024 07:09
@elasticsearchmachine
Collaborator

Pinging @elastic/search-eng (Team:SearchOrg)

@elasticsearchmachine
Collaborator

Pinging @elastic/search-relevance (Team:Search - Relevance)

@benwtrent benwtrent added :Search Relevance/Ranking Scoring, rescoring, rank evaluation. and removed :SearchOrg/Relevance Label for the Search (solution/org) Relevance team labels Oct 16, 2024
@elasticsearchmachine elasticsearchmachine added the Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch label Oct 16, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

Comment on lines +11 to +12
settings:
  number_of_shards: 1
Member

@pmpailis just so you know, in the CCS test cases it is possible to end up with effectively 2 shards: one remote and one local.

What makes this sensitive to shard count?

Contributor Author

The main issue is with the idf component of the query_string query. Looking at the explain output for the first doc, we have:

2 shards case:

{
    "value": 0.13353139,
    "description": "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
    "details": [
        {
            "value": 3,
            "description": "n, number of documents containing term",
            "details": []
        },
        {
            "value": 3,
            "description": "N, total number of documents with field",
            "details": []
        }
    ]
},

1 shard case:

{
    "value": 0.087011375,
    "description": "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
    "details": [
        {
            "value": 5,
            "description": "n, number of documents containing term",
            "details": []
        },
        {
            "value": 5,
            "description": "N, total number of documents with field",
            "details": []
        }
    ]
},

This is why the final computed score is not the same: n and N, and therefore the idf, change with the number of shards.
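
As a quick sanity check, the two idf values in the explain outputs above can be reproduced from the formula given in their description. This is only a minimal sketch: it assumes `log` is the natural logarithm, and the helper name `bm25_idf` is made up for illustration, not taken from the Elasticsearch codebase.

```python
import math


def bm25_idf(n: int, N: int) -> float:
    """idf as described in the explain output: log(1 + (N - n + 0.5) / (n + 0.5)).

    n: number of documents containing the term
    N: total number of documents with the field
    """
    return math.log(1 + (N - n + 0.5) / (n + 0.5))


# n and N copied from the two explain outputs quoted in this thread.
idf_two_shards = bm25_idf(n=3, N=3)  # values reported in the 2 shards case
idf_one_shard = bm25_idf(n=5, N=5)   # values reported in the 1 shard case

print(f"2 shards: {idf_two_shards:.6f}")
print(f"1 shard:  {idf_one_shard:.6f}")
```

Both results should agree with the reported values (roughly 0.1335 and 0.0870) up to float precision, which suggests that the difference in n and N alone accounts for the idf gap between the two cases.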

Member

@kderusso kderusso left a comment

Thanks for fixing this. My suggestion is to merge this fix and, if we believe there are improvements we can make to this type of testing strategy because of CCS, apply them in a follow-up PR.

@pmpailis pmpailis merged commit d7aa33e into elastic:main Oct 16, 2024
16 checks passed
georgewallace pushed a commit to georgewallace/elasticsearch that referenced this pull request Oct 25, 2024
jfreden pushed a commit to jfreden/elasticsearch that referenced this pull request Nov 4, 2024
@pmpailis pmpailis deleted the fixing_random_retriever_expected_scores branch May 27, 2025 03:49
Labels

:Search Relevance/Ranking Scoring, rescoring, rank evaluation. Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch >test Issues or PRs that are addressing/adding tests v9.0.0

Development

Successfully merging this pull request may close these issues.

[CI] InferenceRestIT test {p0=inference/80_random_rerank_retriever/Random rerank retriever predictably shuffles results} failing

4 participants