FrozenIndexShardTests.testRecoverFromFrozenPrimary fails with new Lucene snapshot #110898

Open
benwtrent opened this issue Jul 15, 2024 · 5 comments
Labels
low-risk: An open issue or test failure that is a low risk to future releases
:Search/Search: Search-related issues that do not fall into other categories
Team:Search: Meta label for search team
>test: Issues or PRs that are addressing/adding tests
>test-failure: Triaged test failures from CI

Comments

@benwtrent
Member

CI Link

https://gradle-enterprise.elastic.co/s/b77e65ulskhjg

Repro line

./gradlew ":x-pack:plugin:frozen-indices:test" --tests "org.elasticsearch.index.engine.frozen.FrozenIndexShardTests.testRecoverFromFrozenPrimary" -Dtests.seed=2B3BFA46A5FBD920 -Dtests.locale=es-BO -Dtests.timezone=America/North_Dakota/New_Salem -Druntime.java=22

Does it reproduce?

Yes

Applicable branches

lucene_snapshot

Failure history

No response

Failure excerpt

java.lang.AssertionError: org.elasticsearch.indices.recovery.RecoveryFailedException: [index][0]: Recovery failed from {ZGkPqnuVVU}{ZGkPqnuVVU}{KiQWN1HJSqa90E12uMaXeQ}{ZGkPqnuVVU}{0.0.0.0}{0.0.0.0:3}{IScdfhilmrstvw}{8.16.0}{7000099-8600000} into {bqiDOJnrrE}{bqiDOJnrrE}{7RUGdJ3vTLKGlr2T5ziOuA}{bqiDOJnrrE}{0.0.0.0}{0.0.0.0:4}{IScdfhilmrstvw}{8.16.0}{7000099-8600000}
        at __randomizedtesting.SeedInfo.seed([2B3BFA46A5FBD920:450D0E6510631AC8]:0)
        at org.elasticsearch.index.shard.IndexShardTestCase$2.onRecoveryFailure(IndexShardTestCase.java:149)
        at org.elasticsearch.indices.recovery.RecoveryTarget.notifyListener(RecoveryTarget.java:316)
        at org.elasticsearch.indices.recovery.RecoveryTarget.fail(RecoveryTarget.java:303)
        at org.elasticsearch.index.shard.IndexShardTestCase.recoverUnstartedReplica(IndexShardTestCase.java:883)
        at org.elasticsearch.index.shard.IndexShardTestCase.recoverReplica(IndexShardTestCase.java:812)
        at org.elasticsearch.index.shard.IndexShardTestCase.recoverReplica(IndexShardTestCase.java:788)
        at org.elasticsearch.index.engine.frozen.FrozenIndexShardTests.testRecoverFromFrozenPrimary(FrozenIndexShardTests.java:46)

        Caused by:
        org.elasticsearch.indices.recovery.RecoveryFailedException: [index][0]: Recovery failed from {ZGkPqnuVVU}{ZGkPqnuVVU}{KiQWN1HJSqa90E12uMaXeQ}{ZGkPqnuVVU}{0.0.0.0}{0.0.0.0:3}{IScdfhilmrstvw}{8.16.0}{7000099-8600000} into {bqiDOJnrrE}{bqiDOJnrrE}{7RUGdJ3vTLKGlr2T5ziOuA}{bqiDOJnrrE}{0.0.0.0}{0.0.0.0:4}{IScdfhilmrstvw}{8.16.0}{7000099-8600000}
            at app//org.elasticsearch.index.shard.IndexShardTestCase.recoverUnstartedReplica(IndexShardTestCase.java:883)
            ... 3 more

            Caused by:
            java.lang.WrongThreadException: Attempted access outside owning thread
                at java.base/jdk.internal.foreign.MemorySessionImpl.wrongThread(MemorySessionImpl.java:314)
                at java.base/jdk.internal.misc.ScopedMemoryAccess$ScopedAccessError.newRuntimeException(ScopedMemoryAccess.java:113)
                at java.base/jdk.internal.foreign.MemorySessionImpl.checkValidState(MemorySessionImpl.java:209)
                at java.base/jdk.internal.foreign.ConfinedSession.justClose(ConfinedSession.java:82)
                at java.base/jdk.internal.foreign.MemorySessionImpl.close(MemorySessionImpl.java:232)
                at java.base/jdk.internal.foreign.ArenaImpl.close(ArenaImpl.java:50)
                at org.apache.lucene.store.MemorySegmentIndexInput.close(MemorySegmentIndexInput.java:514)
                at org.apache.lucene.tests.store.MockIndexInputWrapper.close(MockIndexInputWrapper.java:81)
                at org.elasticsearch.core.IOUtils.close(IOUtils.java:71)
                at org.elasticsearch.core.IOUtils.close(IOUtils.java:87)
                at org.elasticsearch.core.IOUtils.close(IOUtils.java:63)
                at org.elasticsearch.indices.recovery.RecoverySourceHandler$2.close(RecoverySourceHandler.java:1426)
                at org.elasticsearch.core.IOUtils.close(IOUtils.java:71)
                at org.elasticsearch.core.IOUtils.close(IOUtils.java:87)
                at org.elasticsearch.indices.recovery.MultiChunkTransfer.onCompleted(MultiChunkTransfer.java:144)
                at org.elasticsearch.indices.recovery.MultiChunkTransfer.handleItems(MultiChunkTransfer.java:113)
                at org.elasticsearch.indices.recovery.MultiChunkTransfer$1.write(MultiChunkTransfer.java:72)
                at org.elasticsearch.common.util.concurrent.AsyncIOProcessor.processList(AsyncIOProcessor.java:97)
                at org.elasticsearch.common.util.concurrent.AsyncIOProcessor.drainAndProcessAndRelease(AsyncIOProcessor.java:85)
                at org.elasticsearch.common.util.concurrent.AsyncIOProcessor.put(AsyncIOProcessor.java:73)
                at org.elasticsearch.indices.recovery.MultiChunkTransfer.addItem(MultiChunkTransfer.java:83)
                at org.elasticsearch.indices.recovery.MultiChunkTransfer.lambda$handleItems$4(MultiChunkTransfer.java:120)
                at org.elasticsearch.action.ActionListener$2.onResponse(ActionListener.java:249)
                at org.elasticsearch.action.ActionListenerImplementations$RunBeforeActionListener.onResponse(ActionListenerImplementations.java:307)
                at org.elasticsearch.action.ActionListener$3.onResponse(ActionListener.java:392)
                at org.elasticsearch.action.ActionListenerImplementations$RunBeforeActionListener.onResponse(ActionListenerImplementations.java:307)
                at org.elasticsearch.action.ActionListener$3.onResponse(ActionListener.java:392)
                at org.elasticsearch.indices.recovery.RecoveryTarget.writeFileChunk(RecoveryTarget.java:583)
                at org.elasticsearch.indices.recovery.AsyncRecoveryTarget.lambda$writeFileChunk$6(AsyncRecoveryTarget.java:118)
                at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:917)
                at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
                at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
                at java.base/java.lang.Thread.run(Thread.java:1570)
@benwtrent added the >test, >test-failure, :Distributed/Engine, and needs:risk labels on Jul 15, 2024
@elasticsearchmachine added the Team:Distributed label on Jul 15, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@benwtrent
Member Author

I wanted to mark this as a blocker, but it only occurs in the lucene_snapshot branch. My concern is that since that branch is long-lived, we will forget about this :/

@benwtrent
Member Author

Actually, this "access from the wrong thread" is causing other tests to fail as well.

@henningandersen added the low-risk label and removed the needs:risk label on Jul 30, 2024
@arteam self-assigned this on Aug 1, 2024
@arteam added the :Search/Search label and removed the :Distributed/Engine label on Aug 6, 2024
@elasticsearchmachine added the Team:Search label and removed the Team:Distributed label on Aug 6, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@arteam
Contributor

arteam commented Aug 6, 2024

I've looked at the test, and it doesn't seem that we do anything non-standard in it: RecoverySourceHandler just closes opened resources, including MemorySegmentIndexInput. I feel that the issue is on Lucene's side, in the implementation of MemorySegmentIndexInput and how it uses the foreign-memaccess API.
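
For readers unfamiliar with the exception in the trace: the `ConfinedSession.justClose` frame suggests that a thread-confined `Arena` was closed from a thread other than the one that created it. The following is a minimal standalone sketch of that JDK behavior, not Lucene or Elasticsearch code. The class name `ConfinedArenaCloseDemo` and the helper `closeOnOtherThread` are hypothetical, and it assumes Java 22 or later, where `java.lang.foreign` is a final API.

```java
import java.lang.foreign.Arena;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class: shows how closing an Arena from a non-owner thread behaves.
public class ConfinedArenaCloseDemo {

    // Closes the given arena on a freshly started thread and returns
    // whatever was thrown there, or null if the close succeeded.
    static Throwable closeOnOtherThread(Arena arena) throws InterruptedException {
        AtomicReference<Throwable> failure = new AtomicReference<>();
        Thread closer = new Thread(() -> {
            try {
                arena.close();
            } catch (Throwable t) {
                failure.set(t);
            }
        });
        closer.start();
        closer.join();
        return failure.get();
    }

    public static void main(String[] args) throws Exception {
        // A confined arena is owned by the thread that created it (here: main).
        Arena confined = Arena.ofConfined();
        System.out.println("confined arena: " + closeOnOtherThread(confined));
        // The failed close leaves the arena alive, so the owner can still close it.
        confined.close();

        // A shared arena may be closed by any thread.
        Arena shared = Arena.ofShared();
        System.out.println("shared arena: " + closeOnOtherThread(shared));
    }
}
```

In this sketch the confined arena rejects the cross-thread close with a `java.lang.WrongThreadException`, while the shared arena closes without error. Whether the arena behind `MemorySegmentIndexInput` in the failing test was meant to be confined is not established in this thread.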

@arteam arteam removed their assignment Aug 7, 2024