[CI] SearchableSnapshotRecoveryStateIntegrationTests testRecoveryStateRecoveredBytesMatchPhysicalCacheState failing #95994

Closed
iverase opened this issue May 10, 2023 · 4 comments · Fixed by #95987 or #97278
Labels
:Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs Team:Distributed Meta label for distributed team >test-failure Triaged test failures from CI

Comments

@iverase
Contributor

iverase commented May 10, 2023

Build scan:
https://gradle-enterprise.elastic.co/s/w3y6276h3es4g/tests/:x-pack:plugin:searchable-snapshots:internalClusterTest/org.elasticsearch.xpack.searchablesnapshots.recovery.SearchableSnapshotRecoveryStateIntegrationTests/testRecoveryStateRecoveredBytesMatchPhysicalCacheState

Reproduction line:

./gradlew ':x-pack:plugin:searchable-snapshots:internalClusterTest' --tests "org.elasticsearch.xpack.searchablesnapshots.recovery.SearchableSnapshotRecoveryStateIntegrationTests.testRecoveryStateRecoveredBytesMatchPhysicalCacheState" -Dtests.seed=37CF1F75CEB7D168 -Dtests.locale=el-CY -Dtests.timezone=America/Indiana/Petersburg -Druntime.java=20

Applicable branches:
8.8

Reproduces locally?:
No

Failure history:
https://gradle-enterprise.elastic.co/scans/tests?tests.container=org.elasticsearch.xpack.searchablesnapshots.recovery.SearchableSnapshotRecoveryStateIntegrationTests&tests.test=testRecoveryStateRecoveredBytesMatchPhysicalCacheState

Failure excerpt:

java.lang.AssertionError: Physical cache size doesn't match with recovery state data
Expected: <9744L>
     but: was <10768L>

  at __randomizedtesting.SeedInfo.seed([37CF1F75CEB7D168:792DC5E477402C60]:0)
  at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
  at org.junit.Assert.assertThat(Assert.java:956)
  at org.elasticsearch.xpack.searchablesnapshots.recovery.SearchableSnapshotRecoveryStateIntegrationTests.testRecoveryStateRecoveredBytesMatchPhysicalCacheState(SearchableSnapshotRecoveryStateIntegrationTests.java:106)
  at jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
  at java.lang.reflect.Method.invoke(Method.java:578)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
  at java.lang.Thread.run(Thread.java:1623)

@iverase iverase added :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >test-failure Triaged test failures from CI labels May 10, 2023
@elasticsearchmachine elasticsearchmachine added the Team:Distributed Meta label for distributed team label May 10, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@tlrx
Member

tlrx commented May 10, 2023

Probably also fixed by #95987

@kingherc
Contributor

Another one at https://gradle-enterprise.elastic.co/s/pau6gphccegsq/tests/:x-pack:plugin:searchable-snapshots:internalClusterTest/org.elasticsearch.xpack.searchablesnapshots.recovery.SearchableSnapshotRecoveryStateIntegrationTests/testRecoveryStateRecoveredBytesMatchPhysicalCacheState?top-execution=1

java.lang.AssertionError: Physical cache size doesn't match with recovery state data
Expected: <17925L>
     but: was <18859L>

...

REPRODUCE WITH: ./gradlew ':x-pack:plugin:searchable-snapshots:internalClusterTest' --tests "org.elasticsearch.xpack.searchablesnapshots.recovery.SearchableSnapshotRecoveryStateIntegrationTests.testRecoveryStateRecoveredBytesMatchPhysicalCacheState" -Dtests.seed=1C63BBE242E3FCBC -Dtests.locale=sr-ME -Dtests.timezone=Pacific/Fiji -Druntime.java=20

I am muting the test on main and 8.8; a PR will follow and be linked here. @tlrx, if your PR fixes this, please consider assigning this ticket to yourself, unmuting the test in main and 8.8.x, and referencing this ticket in your PR so that merging it closes the ticket.

kingherc added a commit to kingherc/elasticsearch that referenced this issue May 16, 2023
kingherc added a commit to kingherc/elasticsearch that referenced this issue May 16, 2023
@tlrx tlrx self-assigned this Jun 22, 2023
tlrx added a commit that referenced this issue Jun 26, 2023
…d from cache (#95987)

Before #95891, a file was considered reused in recovery if
it was fully cached in the cold persistent cache. Otherwise,
the full file length was reported as recovered from the
blob store during prewarming.

While working on #95891, I changed the
CachedBlobContainerIndexInput.prefetchPart()
method to report more precisely the number of bytes that were
actually read from the blob store during prewarming.

This broke some tests (#95970), because a file cannot be
recovered partly from disk and partly from the remote blob store.
This change restores the previous behavior with one adjustment:
a file is considered reused if the prewarm method
actually prefetched 0 bytes.

Closes #95970
Closes #95994
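The all-or-nothing accounting rule described in the commit message above can be sketched as follows. This is a minimal, hypothetical illustration: the class `PrewarmAccountingSketch`, the records `PrewarmedFile` and `RecoveryReport`, the method `account`, and the example file names are assumptions, not the Elasticsearch API around `CachedBlobContainerIndexInput.prefetchPart()`. It only shows that a file is never split between disk and remote: it counts as reused when prewarming fetched 0 bytes, and otherwise its full length is reported as recovered from the blob store. It does not model the additional behavior that the follow-up fix #97278 restored.

```java
import java.util.List;

/**
 * Hypothetical sketch of the per-file recovery accounting rule from #95987.
 * All names here are illustrative assumptions, not Elasticsearch classes.
 */
public class PrewarmAccountingSketch {

    /** One prewarmed file: its length and the bytes actually fetched from the blob store. */
    record PrewarmedFile(String name, long length, long bytesPrefetched) {}

    /** Totals as they would be reported in the recovery state. */
    record RecoveryReport(long reusedBytes, long recoveredFromBlobStoreBytes) {}

    static RecoveryReport account(List<PrewarmedFile> files) {
        long reused = 0;
        long recovered = 0;
        for (PrewarmedFile file : files) {
            if (file.bytesPrefetched() == 0) {
                // Nothing was fetched, so the whole file counts as reused from the cache.
                reused += file.length();
            } else {
                // Any fetch, even a partial one, reports the full file length as recovered.
                recovered += file.length();
            }
        }
        return new RecoveryReport(reused, recovered);
    }

    public static void main(String[] args) {
        RecoveryReport report = account(List.of(
                new PrewarmedFile("_0.cfs", 4096, 0),      // fully cached: reused
                new PrewarmedFile("_0.si", 1024, 512),     // partially cached: whole file recovered
                new PrewarmedFile("_1.cfs", 2048, 2048))); // not cached: recovered
        // prints reused=4096 recoveredFromBlobStore=3072
        System.out.println("reused=" + report.reusedBytes()
                + " recoveredFromBlobStore=" + report.recoveredFromBlobStoreBytes());
    }
}
```

Counting a partially fetched file entirely on the blob-store side keeps each file in exactly one bucket, which is the constraint the commit message says the more precise byte counting from #95891 violated.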
tlrx added a commit to tlrx/elasticsearch that referenced this issue Jun 26, 2023
…d from cache (elastic#95987)
tlrx added a commit to tlrx/elasticsearch that referenced this issue Jun 26, 2023
…d from cache (elastic#95987)
elasticsearchmachine pushed a commit that referenced this issue Jun 26, 2023
…d from cache (#95987) (#97103)
elasticsearchmachine pushed a commit that referenced this issue Jun 26, 2023
…d from cache (#95987) (#97102)
@ywangd ywangd reopened this Jun 30, 2023
tlrx added a commit to tlrx/elasticsearch that referenced this issue Jun 30, 2023
tlrx added a commit that referenced this issue Jul 4, 2023
…97278)

The fix in #95987 wasn't complete: I did not restore all of the previous behavior.

Closes #95994
tlrx added a commit to tlrx/elasticsearch that referenced this issue Jul 4, 2023
…lastic#97278)
tlrx added a commit to tlrx/elasticsearch that referenced this issue Jul 4, 2023
…lastic#97278)
elasticsearchmachine pushed a commit that referenced this issue Jul 4, 2023
…97278) (#97348)
elasticsearchmachine pushed a commit that referenced this issue Jul 4, 2023
…97278) (#97347)
6 participants