OldRepositoryAccessIT.testOldRepoAccess fails on Lucene Snapshot #107168

Closed
benwtrent opened this issue Apr 5, 2024 · 3 comments · Fixed by #107194
Labels
low-risk, :Search/Search, Team:Search, >test-failure

Comments

@benwtrent
Member

CI Link

https://gradle-enterprise.elastic.co/s/5pageamepci42

Repro line

./gradlew ':x-pack:qa:repository-old-versions:javaRestTestBeforeRestart#5_0_0' --tests "org.elasticsearch.oldrepos.OldRepositoryAccessIT.testOldRepoAccess" -Dtests.seed=73644E41C915C560 -Dtests.locale=fi-FI -Dtests.timezone=America/Matamoros -Druntime.java=22

Does it reproduce?

Yes

Applicable branches

lucene_snapshot

Failure history

No response

Failure excerpt

org.elasticsearch.oldrepos.OldRepositoryAccessIT > testOldRepoAccess FAILED
    org.elasticsearch.client.ResponseException: method [POST], host [http://127.0.0.1:36787], URI [/restored_test_index/_search], status line [HTTP/1.1 500 Internal Server Error]
    {"error":{"root_cause":[{"type":"unsupported_operation_exception","reason":"only metadata operations allowed"}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"restored_test_index","node":"I2lFbGJFQhefLBNLukLBDA","reason":{"type":"unsupported_operation_exception","reason":"only metadata operations allowed"}}],"caused_by":{"type":"unsupported_operation_exception","reason":"only metadata operations allowed","caused_by":{"type":"unsupported_operation_exception","reason":"only metadata operations allowed"}}},"status":500}
        at __randomizedtesting.SeedInfo.seed([73644E41C915C560:81133B078C8C3B55]:0)
        at app//org.elasticsearch.client.RestClient.convertResponse(RestClient.java:351)
        at app//org.elasticsearch.client.RestClient.performRequest(RestClient.java:317)
        at app//org.elasticsearch.client.RestClient.performRequest(RestClient.java:292)
        at app//org.elasticsearch.oldrepos.OldRepositoryAccessIT.search(OldRepositoryAccessIT.java:500)
        at app//org.elasticsearch.oldrepos.OldRepositoryAccessIT.assertDocs(OldRepositoryAccessIT.java:415)
        at app//org.elasticsearch.oldrepos.OldRepositoryAccessIT.restoreMountAndVerify(OldRepositoryAccessIT.java:324)
        at app//org.elasticsearch.oldrepos.OldRepositoryAccessIT.beforeRestart(OldRepositoryAccessIT.java:250)
        at app//org.elasticsearch.oldrepos.OldRepositoryAccessIT.runTest(OldRepositoryAccessIT.java:109)
        at app//org.elasticsearch.oldrepos.OldRepositoryAccessIT.testOldRepoAccess(OldRepositoryAccessIT.java:77)

But there are some interesting exceptions in the logs:

[2024-04-05T16:13:23,706][WARN ][r.suppressed             ] [5-0-0-1] path: /restored_test_index/_search, params: {index=restored_test_index}, status: 500 Failed to execute phase [query], all shards failed; shardFailures {[I2lFbGJFQhefLBNLukLBDA][restored_test_index][0]: org.elasticsearch.transport.RemoteTransportException: [5-0-0-1][127.0.0.1:36661][indices:data/read/search[phase/query]]
»  Caused by: org.elasticsearch.search.query.QueryPhaseExecutionException: Query Failed [Failed to execute main query]
»       at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.search.query.QueryPhase.addCollectorsAndSearch(QueryPhase.java:229)
»       at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.search.query.QueryPhase.executeQuery(QueryPhase.java:135)
»       at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:63)
»       at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:522)
»       at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:680)
»       at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.search.SearchService.lambda$executeQueryPhase$2(SearchService.java:549)
»       at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:78)
»       at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:75)
»       at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.action.ActionRunnable$4.doRun(ActionRunnable.java:100)
»       at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
»       at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:33)
»       at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:984)
»       at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
»       at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
»       at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
»       at java.base/java.lang.Thread.run(Thread.java:1570)
»  Caused by: java.lang.UnsupportedOperationException: only metadata operations allowed
»       at org.elasticsearch.xpack.lucene.bwc.codecs.lucene60.MetadataOnlyBKDReader.getPointTree(MetadataOnlyBKDReader.java:92)
»       at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.search.internal.ExitableDirectoryReader$ExitablePointValues.getPointTree(ExitableDirectoryReader.java:272)
»       at org.apache.lucene.core@9.11.0-snapshot-b9844481e3a/org.apache.lucene.search.comparators.NumericComparator$NumericLeafComparator.<init>(NumericComparator.java:133)
»       at org.apache.lucene.core@9.11.0-snapshot-b9844481e3a/org.apache.lucene.search.comparators.LongComparator$LongLeafComparator.<init>(LongComparator.java:66)
»       at org.apache.lucene.core@9.11.0-snapshot-b9844481e3a/org.apache.lucene.search.SortedNumericSortField$3$1.<init>(SortedNumericSortField.java:306)
»       at org.apache.lucene.core@9.11.0-snapshot-b9844481e3a/org.apache.lucene.search.SortedNumericSortField$3.getLeafComparator(SortedNumericSortField.java:306)
»       at org.apache.lucene.core@9.11.0-snapshot-b9844481e3a/org.apache.lucene.search.FieldValueHitQueue.getComparators(FieldValueHitQueue.java:181)
»       at org.apache.lucene.core@9.11.0-snapshot-b9844481e3a/org.apache.lucene.search.TopFieldCollector$TopFieldLeafCollector.<init>(TopFieldCollector.java:63)
»       at org.apache.lucene.core@9.11.0-snapshot-b9844481e3a/org.apache.lucene.search.TopFieldCollector$SimpleFieldCollector$1.<init>(TopFieldCollector.java:197)
»       at org.apache.lucene.core@9.11.0-snapshot-b9844481e3a/org.apache.lucene.search.TopFieldCollector$SimpleFieldCollector.getLeafCollector(TopFieldCollector.java:197)
»       at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.search.query.QueryPhaseCollector.getLeafCollector(QueryPhaseCollector.java:146)
»       at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.search.internal.ContextIndexSearcher.searchLeaf(ContextIndexSearcher.java:415)
»       at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:360)
»       at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.search.internal.ContextIndexSearcher.lambda$search$4(ContextIndexSearcher.java:345)
»       at org.apache.lucene.core@9.11.0-snapshot-b9844481e3a/org.apache.lucene.search.TaskExecutor$TaskGroup.lambda$createTask$0(TaskExecutor.java:117)
»       at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
»       ... 6 more
»  }
»       at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:712)
»       at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:404)
»       at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:744)
»       at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:497)
»       at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.action.search.AbstractSearchAsyncAction$1.onFailure(AbstractSearchAsyncAction.java:335)
»       at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.action.ActionListenerImplementations.safeAcceptException(ActionListenerImplementations.java:62)
»       at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.action.ActionListenerImplementations.safeOnFailure(ActionListenerImplementations.java:73)
»       at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.action.DelegatingActionListener.onFailure(DelegatingActionListener.java:31)
»       at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:53)
»       at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.action.search.SearchTransportService$ConnectionCountingHandler.handleException(SearchTransportService.java:634)
»       at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.transport.TransportService$UnregisterChildTransportResponseHandler.handleException(TransportService.java:1752)
»       at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1476)
»       at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.transport.TransportService$DirectResponseChannel.processException(TransportService.java:1610)
»       at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:1585)
»       at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.transport.TaskTransportChannel.sendResponse(TaskTransportChannel.java:44)
»       at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.action.support.ChannelActionListener.onFailure(ChannelActionListener.java:44)
»       at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.action.ActionRunnable.onFailure(ActionRunnable.java:151)
»       at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:28)
»       at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:33)
»       at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:984)
»       at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
»       at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
»       at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
»       at java.base/java.lang.Thread.run(Thread.java:1570)
»  Caused by: org.elasticsearch.ElasticsearchException$1: only metadata operations allowed
»       at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.ElasticsearchException.guessRootCauses(ElasticsearchException.java:704)
»       at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:402)
»       ... 22 more
»  Caused by: java.lang.UnsupportedOperationException: only metadata operations allowed
»       at org.elasticsearch.xpack.lucene.bwc.codecs.lucene60.MetadataOnlyBKDReader.getPointTree(MetadataOnlyBKDReader.java:92)
»       at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.search.internal.ExitableDirectoryReader$ExitablePointValues.getPointTree(ExitableDirectoryReader.java:272)
»       at org.apache.lucene.core@9.11.0-snapshot-b9844481e3a/org.apache.lucene.search.comparators.NumericComparator$NumericLeafComparator.<init>(NumericComparator.java:133)
»       at org.apache.lucene.core@9.11.0-snapshot-b9844481e3a/org.apache.lucene.search.comparators.LongComparator$LongLeafComparator.<init>(LongComparator.java:66)
»       at org.apache.lucene.core@9.11.0-snapshot-b9844481e3a/org.apache.lucene.search.SortedNumericSortField$3$1.<init>(SortedNumericSortField.java:306)
»       at org.apache.lucene.core@9.11.0-snapshot-b9844481e3a/org.apache.lucene.search.SortedNumericSortField$3.getLeafComparator(SortedNumericSortField.java:306)
»       at org.apache.lucene.core@9.11.0-snapshot-b9844481e3a/org.apache.lucene.search.FieldValueHitQueue.getComparators(FieldValueHitQueue.java:181)
»       at org.apache.lucene.core@9.11.0-snapshot-b9844481e3a/org.apache.lucene.search.TopFieldCollector$TopFieldLeafCollector.<init>(TopFieldCollector.java:63)
»       at org.apache.lucene.core@9.11.0-snapshot-b9844481e3a/org.apache.lucene.search.TopFieldCollector$SimpleFieldCollector$1.<init>(TopFieldCollector.java:197)
»       at org.apache.lucene.core@9.11.0-snapshot-b9844481e3a/org.apache.lucene.search.TopFieldCollector$SimpleFieldCollector.getLeafCollector(TopFieldCollector.java:197)
»       at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.search.query.QueryPhaseCollector.getLeafCollector(QueryPhaseCollector.java:146)
»       at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.search.internal.ContextIndexSearcher.searchLeaf(ContextIndexSearcher.java:415)
»       at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:360)
»       at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.search.internal.ContextIndexSearcher.lambda$search$4(ContextIndexSearcher.java:345)
»       at org.apache.lucene.core@9.11.0-snapshot-b9844481e3a/org.apache.lucene.search.TaskExecutor$TaskGroup.lambda$createTask$0(TaskExecutor.java:117)
»       at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
»       ... 6 more
»
@benwtrent added the :Search/Search and >test-failure labels on Apr 5, 2024
@elasticsearchmachine added the blocker and Team:Search labels on Apr 5, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@benwtrent added the low-risk label and removed the blocker label on Apr 5, 2024
@benwtrent
Member Author

It's a blocker for Lucene Snapshot, but I don't know how to communicate that via labels. And I don't want folks to think it's a blocker for anything else.

So, low-risk for now.

@jpountz
Contributor

jpountz commented Apr 5, 2024

This is due to a recent Lucene change. However, I don't think the Lucene change itself is the problem. It just makes an existing problem visible, now that the numeric comparator pulls a point tree eagerly in its constructor rather than lazily. We were not seeing the problem before because we're searching over fewer than IndexSearcher#TOTAL_HITS_THRESHOLD documents, so the dynamic pruning logic never kicks in.

Lucene's sorting logic assumes that if there are points, then they can be used for dynamic pruning. But this doesn't work since the old file format only supports metadata operations. A fix could consist of hiding points, e.g. by disabling them in field infos, but then the index wouldn't be able to use them for can_match phases.
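
To make the eager-versus-lazy distinction concrete, here is a minimal, self-contained sketch. It does not use the real Lucene or Elasticsearch classes: `MetadataOnlyPoints`, `EagerComparator`, `LazyComparator`, and the `pruningThreshold` parameter are hypothetical stand-ins, and the real comparator logic is more involved. The sketch only shows why a reader that rejects tree access was harmless while the tree was requested lazily, and fails immediately once it is requested in the constructor.

```java
// Hypothetical stand-in for a points reader that exposes only metadata.
final class MetadataOnlyPoints {
    private final long minValue;
    private final long maxValue;
    private final int docCount;

    MetadataOnlyPoints(long minValue, long maxValue, int docCount) {
        this.minValue = minValue;
        this.maxValue = maxValue;
        this.docCount = docCount;
    }

    // Metadata operations are supported.
    long minValue() { return minValue; }
    long maxValue() { return maxValue; }
    int docCount() { return docCount; }

    // Tree access is rejected, mirroring the message seen in the logs.
    Object getPointTree() {
        throw new UnsupportedOperationException("only metadata operations allowed");
    }
}

// Eager variant: asks for the tree as soon as the comparator is built.
final class EagerComparator {
    private final Object tree;

    EagerComparator(MetadataOnlyPoints points) {
        this.tree = points.getPointTree();
    }
}

// Lazy variant: asks for the tree only once enough hits have been collected.
final class LazyComparator {
    private final MetadataOnlyPoints points;
    private final int pruningThreshold; // assumed knob, for illustration only
    private int collected;
    private Object tree;

    LazyComparator(MetadataOnlyPoints points, int pruningThreshold) {
        this.points = points;
        this.pruningThreshold = pruningThreshold;
    }

    void collect() {
        collected++;
        if (tree == null && collected > pruningThreshold) {
            tree = points.getPointTree();
        }
    }
}

final class EagerVsLazyDemo {
    // Returns true if the action fails with UnsupportedOperationException.
    static boolean throwsUnsupported(Runnable action) {
        try {
            action.run();
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    static void collectHits(LazyComparator comparator, int hits) {
        for (int i = 0; i < hits; i++) {
            comparator.collect();
        }
    }

    public static void main(String[] args) {
        MetadataOnlyPoints points = new MetadataOnlyPoints(0L, 100L, 5);
        System.out.println("eager fails: "
            + throwsUnsupported(() -> new EagerComparator(points)));
        System.out.println("lazy, below threshold, fails: "
            + throwsUnsupported(() -> collectHits(new LazyComparator(points, 1000), 5)));
        System.out.println("lazy, above threshold, fails: "
            + throwsUnsupported(() -> collectHits(new LazyComparator(points, 3), 5)));
    }
}
```

In this model the lazy variant only hides the problem for small result sets, which matches the point above that the Lucene change exposed an existing issue rather than introducing one.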

jpountz added a commit to jpountz/elasticsearch that referenced this issue Apr 8, 2024
In order to know whether it can apply dynamic pruning using the points index,
Lucene simply looks at whether a field has points. Unfortunately, this doesn't
work well with our support for archive indexes, where numeric/date fields
report that they have points, but they only support metadata operations on
these points (min/max values, doc count), with the goal of quickly filtering
out such archive indexes during the `can_match` phase.

In order to address this discrepancy, dynamic pruning is now disabled when
mappings report that a field is not indexed. This works because archive indexes
automatically set `index: false` to make sure that filters run on doc values
and not points. However, this is not a great fix as this increases our reliance
on disabling dynamic pruning, which is currently marked as deprecated and
scheduled for removal in the next Lucene major. So we'll need to either add it
back to Lucene or find another approach.

Closes elastic#107168
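
The gating described in this commit message can be modelled roughly as follows. This is a sketch under stated assumptions, not the code from the actual change: `NumericFieldInfo` and `PruningPolicy` are hypothetical names, and the real fix goes through Elasticsearch mappings and Lucene's sort APIs rather than plain booleans. It shows the two roles points play for an archive index: the metadata (min/max) remains usable for a `can_match`-style check, while dynamic pruning is switched off whenever the mapping reports `index: false`.

```java
// Hypothetical view of a numeric field, combining mapping and segment information.
final class NumericFieldInfo {
    final String name;
    final boolean indexed;   // what the mapping reports (index: true/false)
    final boolean hasPoints; // what the segment reports
    final long minValue;     // points metadata
    final long maxValue;     // points metadata

    NumericFieldInfo(String name, boolean indexed, boolean hasPoints, long minValue, long maxValue) {
        this.name = name;
        this.indexed = indexed;
        this.hasPoints = hasPoints;
        this.minValue = minValue;
        this.maxValue = maxValue;
    }
}

final class PruningPolicy {
    // Decision based on points alone: wrong for archive indexes.
    static boolean pruneByPointsOnly(NumericFieldInfo field) {
        return field.hasPoints;
    }

    // Decision that also consults the mapping, as the commit message describes.
    static boolean pruneWithMappingCheck(NumericFieldInfo field) {
        return field.hasPoints && field.indexed;
    }

    // can_match-style check: uses only min/max metadata, never the point tree.
    // Without points nothing can be ruled out, so the shard is kept.
    static boolean canMatch(NumericFieldInfo field, long from, long to) {
        if (!field.hasPoints) {
            return true;
        }
        return from <= field.maxValue && to >= field.minValue;
    }
}

final class PruningPolicyDemo {
    public static void main(String[] args) {
        // Archive index: points are reported, but the mapping sets index: false.
        NumericFieldInfo archived = new NumericFieldInfo("timestamp", false, true, 100L, 200L);
        // Regular index: points are reported and the field is indexed.
        NumericFieldInfo regular = new NumericFieldInfo("timestamp", true, true, 100L, 200L);

        System.out.println("archive, points only: " + PruningPolicy.pruneByPointsOnly(archived));
        System.out.println("archive, mapping check: " + PruningPolicy.pruneWithMappingCheck(archived));
        System.out.println("regular, mapping check: " + PruningPolicy.pruneWithMappingCheck(regular));
        System.out.println("archive can match [300, 400]: " + PruningPolicy.canMatch(archived, 300L, 400L));
        System.out.println("archive can match [150, 400]: " + PruningPolicy.canMatch(archived, 150L, 400L));
    }
}
```

The design trade-off is the one the message already names: the mapping check keeps the metadata available for filtering out archive indexes, at the cost of relying on a switch for disabling dynamic pruning.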
jpountz added a commit that referenced this issue Apr 9, 2024
Closes #107168