Fixing randomization issue in random rank retriever yaml tests #114877

pmpailis · 2024-10-16T07:09:26Z

It seems that the expected scores specified in the yaml test only match when we have exactly 1 shard. If, due to randomization, we end up with more than one shard (e.g. when using -Dtests.seed=1449F1BCAD7A1D83), then the scores differ and the test fails. So, in this PR we just ensure that we have only 1 shard in all cases.

Closes #111999

elasticsearchmachine · 2024-10-16T07:09:50Z

Pinging @elastic/search-eng (Team:SearchOrg)

elasticsearchmachine · 2024-10-16T07:09:51Z

Pinging @elastic/search-relevance (Team:Search - Relevance)

elasticsearchmachine · 2024-10-16T11:26:30Z

Pinging @elastic/es-search-relevance (Team:Search Relevance)

benwtrent · 2024-10-16T11:40:46Z

...rence/src/yamlRestTest/resources/rest-api-spec/test/inference/80_random_rerank_retriever.yml

+          settings:
+            number_of_shards: 1


@pmpailis just so you know, it is possible in the CCS test cases that you end up with effectively 2 shards: one remote and one local.

What makes this sensitive to shard count?

pmpailis

The main issue is with the idf component of the query_string score. Looking at the explain output for the first doc, we have:

2 shards case:

{
  "value": 0.13353139,
  "description": "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
  "details": [
    { "value": 3, "description": "n, number of documents containing term", "details": [] },
    { "value": 3, "description": "N, total number of documents with field", "details": [] }
  ]
},

1 shard case:

{
  "value": 0.087011375,
  "description": "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
  "details": [
    { "value": 5, "description": "n, number of documents containing term", "details": [] },
    { "value": 5, "description": "N, total number of documents with field", "details": [] }
  ]
},

The idf differs because n and N differ between the two cases, which is why the final computed score is not the same.
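
The two idf values quoted from the explain output can be reproduced directly from the formula in the description field. The following is a minimal sketch; the function name `bm25_idf` is hypothetical and used only for illustration. It computes in double precision, so it should agree with the explain values only to roughly seven significant digits.

```python
import math


def bm25_idf(n: int, N: int) -> float:
    """idf as described in the explain output: log(1 + (N - n + 0.5) / (n + 0.5)).

    n: number of documents containing the term
    N: total number of documents with the field
    """
    return math.log(1 + (N - n + 0.5) / (n + 0.5))


# Term statistics taken from the two explain outputs above.
idf_two_shards = bm25_idf(n=3, N=3)  # stats reported in the 2-shard run
idf_one_shard = bm25_idf(n=5, N=5)   # stats reported in the 1-shard run

print(f"2 shards: {idf_two_shards:.8f}")
print(f"1 shard:  {idf_one_shard:.8f}")
```

Since the same term matches every counted document in both runs, the only thing that changes is how many documents are counted (3 versus 5), and that alone shifts the idf and therefore the final score.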

kderusso

Thanks for fixing this. My suggestion is to merge this fix and, if we believe there are improvements we can make to this type of testing strategy because of CCS, apply them in a followup PR.

Setting 1 shard for random_rerank_retriever tests to ensure score val…

63e48c0

…idation

pmpailis added the >test, :SearchOrg/Relevance, and v9.0.0 labels Oct 16, 2024

pmpailis requested a review from kderusso October 16, 2024 07:09

benwtrent added the :Search Relevance/Ranking label and removed the :SearchOrg/Relevance label Oct 16, 2024

elasticsearchmachine added the Team:Search Relevance label Oct 16, 2024

benwtrent reviewed Oct 16, 2024

kderusso approved these changes Oct 16, 2024

pmpailis merged commit d7aa33e into elastic:main Oct 16, 2024
16 checks passed

georgewallace pushed a commit to georgewallace/elasticsearch that referenced this pull request Oct 25, 2024

Fixing number of shards for random_rerank_retriever tests to ensure s…

94f3f92

…core validation (elastic#114877)

jfreden pushed a commit to jfreden/elasticsearch that referenced this pull request Nov 4, 2024

Fixing number of shards for random_rerank_retriever tests to ensure s…

49df844

…core validation (elastic#114877)

pmpailis deleted the fixing_random_retriever_expected_scores branch May 27, 2025 03:49
