Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CI] SearchableSnapshotsIntegTests testCreateAndRestoreSearchableSnapshot failing #95970

Closed
ywangd opened this issue May 10, 2023 · 2 comments · Fixed by #95987
Closed

[CI] SearchableSnapshotsIntegTests testCreateAndRestoreSearchableSnapshot failing #95970

ywangd opened this issue May 10, 2023 · 2 comments · Fixed by #95987
Assignees
Labels
:Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs Team:Distributed Meta label for distributed team >test-failure Triaged test failures from CI

Comments

@ywangd
Copy link
Member

ywangd commented May 10, 2023

This failed on my PR but it also failed on main
https://gradle-enterprise.elastic.co/s/f7u3hmmpeq4ea/tests/:x-pack:plugin:searchable-snapshots:internalClusterTest/org.elasticsearch.xpack.searchablesnapshots.SearchableSnapshotsIntegTests/testCreateAndRestoreSearchableSnapshot?top-execution=1

Build scan:
https://gradle-enterprise.elastic.co/s/tmgmqv3w6wqbi/tests/:x-pack:plugin:searchable-snapshots:internalClusterTest/org.elasticsearch.xpack.searchablesnapshots.SearchableSnapshotsIntegTests/testCreateAndRestoreSearchableSnapshot

Reproduction line:

./gradlew ':x-pack:plugin:searchable-snapshots:internalClusterTest' --tests "org.elasticsearch.xpack.searchablesnapshots.SearchableSnapshotsIntegTests.testCreateAndRestoreSearchableSnapshot" -Dtests.seed=F254F4CE380F344 -Dtests.locale=ko -Dtests.timezone=Pacific/Tahiti -Druntime.java=20

Applicable branches:
main

Reproduces locally?:
No

Failure history:
https://gradle-enterprise.elastic.co/scans/tests?tests.container=org.elasticsearch.xpack.searchablesnapshots.SearchableSnapshotsIntegTests&tests.test=testCreateAndRestoreSearchableSnapshot

Failure excerpt:

java.lang.AssertionError: {
  "id" : 0,
  "type" : "SNAPSHOT",
  "stage" : "DONE",
  "primary" : true,
  "start_time" : "2023-05-10T02:44:59.350Z",
  "start_time_in_millis" : 1683686699350,
  "stop_time" : "2023-05-10T02:44:59.494Z",
  "stop_time_in_millis" : 1683686699494,
  "total_time" : "144ms",
  "total_time_in_millis" : 144,
  "source" : {
    "repository" : "WXIUlVPqla",
    "snapshot" : "ztjlngbijj",
    "version" : "8.9.0",
    "index" : "bzgoilgdco",
    "restoreUUID" : "_no_api_"
  },
  "target" : {
    "id" : "j9x9FzydQdGKjmyyhRU_AA",
    "host" : "127.0.0.1",
    "transport_address" : "127.0.0.1:26622",
    "ip" : "127.0.0.1",
    "name" : "node_s1"
  },
  "index" : {
    "size" : {
      "total" : "56.9kb",
      "total_in_bytes" : 58266,
      "reused" : "1kb",
      "reused_in_bytes" : 1080,
      "recovered" : "55.4kb",
      "recovered_in_bytes" : 56798,
      "recovered_from_snapshot" : "55.4kb",
      "recovered_from_snapshot_in_bytes" : 56798,
      "percent" : "99.3%"
    },
    "files" : {
      "total" : 7,
      "reused" : 3,
      "recovered" : 3,
      "percent" : "75.0%"
    },
    "total_time" : "122ms",
    "total_time_in_millis" : 122,
    "source_throttle_time" : "-1",
    "source_throttle_time_in_millis" : 0,
    "target_throttle_time" : "-1",
    "target_throttle_time_in_millis" : 0
  },
  "translog" : {
    "recovered" : 0,
    "total" : 0,
    "percent" : "100.0%",
    "total_on_start" : 0,
    "total_time" : "33ms",
    "total_time_in_millis" : 33
  },
  "verify_index" : {
    "check_index_time" : "0s",
    "check_index_time_in_millis" : 0,
    "total_time" : "0s",
    "total_time_in_millis" : 0
  }
}
Expected: <4>
     but: was <3>

  at __randomizedtesting.SeedInfo.seed([F254F4CE380F344:854638377017A2F7]:0)
  at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
  at org.junit.Assert.assertThat(Assert.java:956)
  at org.elasticsearch.xpack.searchablesnapshots.BaseSearchableSnapshotsIntegTestCase.lambda$assertRecoveryStats$2(BaseSearchableSnapshotsIntegTestCase.java:330)
  at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:1158)
  at org.elasticsearch.xpack.searchablesnapshots.BaseSearchableSnapshotsIntegTestCase.assertRecoveryStats(BaseSearchableSnapshotsIntegTestCase.java:323)
  at org.elasticsearch.xpack.searchablesnapshots.SearchableSnapshotsIntegTests.testCreateAndRestoreSearchableSnapshot(SearchableSnapshotsIntegTests.java:293)
  at jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
  at java.lang.reflect.Method.invoke(Method.java:578)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
  at java.lang.Thread.run(Thread.java:1623)

@ywangd ywangd added :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >test-failure Triaged test failures from CI labels May 10, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@elasticsearchmachine elasticsearchmachine added the Team:Distributed Meta label for distributed team label May 10, 2023
@idegtiarenko idegtiarenko self-assigned this May 10, 2023
@idegtiarenko
Copy link
Contributor

I was able to reproduce it locally with -Dtests.seed=E34FB3ED80F62A05.
git bisect points to this pr: #95891

tlrx added a commit to tlrx/elasticsearch that referenced this issue May 10, 2023
…d from cache

Before elastic#95891 a file was considered as reused in recovery if
it was fully cached in the cold persistent cache. Otherwise
the full file length was reported as recovered from the
blob store during prewarming.

While working on elastic#95891 I changed the
CachedBlobContainerIndexInput.prefetchPart()
method to be more precise on the number of bytes that were
effectively read from the blob store during prewarming.

But this obviously broke some tests (elastic#95970) because a file
cannot be partially recovered from disk and from remote. This
change restores the previous behavior with one adjustment:
the file is considered as reused if less than 50% of the file
is prewarmed from the blob store.

Relates elastic#95891
@tlrx tlrx assigned tlrx and unassigned idegtiarenko Jun 22, 2023
tlrx added a commit that referenced this issue Jun 26, 2023
…d from cache (#95987)

Before #95891 a file was considered as reused in recovery if
it was fully cached in the cold persistent cache. Otherwise
the full file length was reported as recovered from the
blob store during prewarming.

While working on #95891 I changed the
CachedBlobContainerIndexInput.prefetchPart()
method to be more precise on the number of bytes that were
effectively read from the blob store during prewarming.

But this obviously broke some tests (#95970) because a file
cannot be partially recovered from disk and from remote. This
change restores the previous behavior with one adjustment:
the file is considered as reused if the prewarm method
effectively prefetched 0 bytes.

Closes #95970
Closes #95994
tlrx added a commit to tlrx/elasticsearch that referenced this issue Jun 26, 2023
…d from cache (elastic#95987)

Before elastic#95891 a file was considered as reused in recovery if
it was fully cached in the cold persistent cache. Otherwise
the full file length was reported as recovered from the
blob store during prewarming.

While working on elastic#95891 I changed the
CachedBlobContainerIndexInput.prefetchPart()
method to be more precise on the number of bytes that were
effectively read from the blob store during prewarming.

But this obviously broke some tests (elastic#95970) because a file
cannot be partially recovered from disk and from remote. This
change restores the previous behavior with one adjustment:
the file is considered as reused if the prewarm method
effectively prefetched 0 bytes.

Closes elastic#95970
Closes elastic#95994
tlrx added a commit to tlrx/elasticsearch that referenced this issue Jun 26, 2023
…d from cache (elastic#95987)

Before elastic#95891 a file was considered as reused in recovery if
it was fully cached in the cold persistent cache. Otherwise
the full file length was reported as recovered from the
blob store during prewarming.

While working on elastic#95891 I changed the
CachedBlobContainerIndexInput.prefetchPart()
method to be more precise on the number of bytes that were
effectively read from the blob store during prewarming.

But this obviously broke some tests (elastic#95970) because a file
cannot be partially recovered from disk and from remote. This
change restores the previous behavior with one adjustment:
the file is considered as reused if the prewarm method
effectively prefetched 0 bytes.

Closes elastic#95970
Closes elastic#95994
elasticsearchmachine pushed a commit that referenced this issue Jun 26, 2023
…d from cache (#95987) (#97103)

Before #95891 a file was considered as reused in recovery if
it was fully cached in the cold persistent cache. Otherwise
the full file length was reported as recovered from the
blob store during prewarming.

While working on #95891 I changed the
CachedBlobContainerIndexInput.prefetchPart()
method to be more precise on the number of bytes that were
effectively read from the blob store during prewarming.

But this obviously broke some tests (#95970) because a file
cannot be partially recovered from disk and from remote. This
change restores the previous behavior with one adjustment:
the file is considered as reused if the prewarm method
effectively prefetched 0 bytes.

Closes #95970
Closes #95994
elasticsearchmachine pushed a commit that referenced this issue Jun 26, 2023
…d from cache (#95987) (#97102)

Before #95891 a file was considered as reused in recovery if
it was fully cached in the cold persistent cache. Otherwise
the full file length was reported as recovered from the
blob store during prewarming.

While working on #95891 I changed the
CachedBlobContainerIndexInput.prefetchPart()
method to be more precise on the number of bytes that were
effectively read from the blob store during prewarming.

But this obviously broke some tests (#95970) because a file
cannot be partially recovered from disk and from remote. This
change restores the previous behavior with one adjustment:
the file is considered as reused if the prewarm method
effectively prefetched 0 bytes.

Closes #95970
Closes #95994
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs Team:Distributed Meta label for distributed team >test-failure Triaged test failures from CI
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants