[CI] DedicatedClusterSnapshotRestoreIT testSnapshotWithStuckNode failing #101573

Closed
maxhniebergall opened this issue Oct 30, 2023 · 4 comments · Fixed by #102398
Assignees
Labels
:Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs low-risk An open issue or test failure that is a low risk to future releases Team:Distributed Meta label for distributed team >test-failure Triaged test failures from CI

Comments

@maxhniebergall
Member

Build scan:
https://gradle-enterprise.elastic.co/s/n64nkpz4phazq/tests/:server:internalClusterTest/org.elasticsearch.snapshots.DedicatedClusterSnapshotRestoreIT/testSnapshotWithStuckNode
Reproduction line:

./gradlew ':server:internalClusterTest' --tests "org.elasticsearch.snapshots.DedicatedClusterSnapshotRestoreIT.testSnapshotWithStuckNode" -Dtests.seed=DB70314A781ECF7A -Dtests.locale=vi-VN -Dtests.timezone=Turkey -Druntime.java=21

Applicable branches:
main

Reproduces locally?:
No

Failure history:
https://gradle-enterprise.elastic.co/scans/tests?tests.container=org.elasticsearch.snapshots.DedicatedClusterSnapshotRestoreIT&tests.test=testSnapshotWithStuckNode

Failure excerpt:

java.lang.AssertionError: Unexpected file count, found: [[/opt/local-ssd/buildkite/builds/bk-agent-prod-gcp-1698696497056340972/elastic/elasticsearch-periodic-platform-support/server/build/testrun/internalClusterTest/temp/org.elasticsearch.snapshots.DedicatedClusterSnapshotRestoreIT_DB70314A781ECF7A-001/tempDir-015/repos/RvhRbGNjiU/index.latest, /opt/local-ssd/buildkite/builds/bk-agent-prod-gcp-1698696497056340972/elastic/elasticsearch-periodic-platform-support/server/build/testrun/internalClusterTest/temp/org.elasticsearch.snapshots.DedicatedClusterSnapshotRestoreIT_DB70314A781ECF7A-001/tempDir-015/repos/RvhRbGNjiU/index-2, /opt/local-ssd/buildkite/builds/bk-agent-prod-gcp-1698696497056340972/elastic/elasticsearch-periodic-platform-support/server/build/testrun/internalClusterTest/temp/org.elasticsearch.snapshots.DedicatedClusterSnapshotRestoreIT_DB70314A781ECF7A-001/tempDir-015/repos/RvhRbGNjiU/index-1]]. expected:<2> but was:<3>

  at __randomizedtesting.SeedInfo.seed([DB70314A781ECF7A:1095F63B45A88B15]:0)
  at org.junit.Assert.fail(Assert.java:88)
  at org.junit.Assert.failNotEquals(Assert.java:834)
  at org.junit.Assert.assertEquals(Assert.java:645)
  at org.elasticsearch.snapshots.AbstractSnapshotIntegTestCase.assertFileCount(AbstractSnapshotIntegTestCase.java:195)
  at org.elasticsearch.snapshots.DedicatedClusterSnapshotRestoreIT.testSnapshotWithStuckNode(DedicatedClusterSnapshotRestoreIT.java:208)
  at jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
  at java.lang.reflect.Method.invoke(Method.java:580)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
  at java.lang.Thread.run(Thread.java:1583)

@maxhniebergall maxhniebergall added the >test-failure Triaged test failures from CI label Oct 30, 2023
@elasticsearchmachine elasticsearchmachine added blocker needs:triage Requires assignment of a team area label labels Oct 30, 2023
@kingherc kingherc added the :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs label Nov 1, 2023
@elasticsearchmachine elasticsearchmachine added Team:Distributed Meta label for distributed team and removed needs:triage Requires assignment of a team area label labels Nov 1, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@arteam arteam self-assigned this Nov 1, 2023
@arteam
Contributor

arteam commented Nov 1, 2023

#43537 seems to be related

@arteam arteam added low-risk An open issue or test failure that is a low risk to future releases and removed blocker labels Nov 2, 2023
@arteam
Contributor

arteam commented Nov 2, 2023

Lowering the priority since it's just a transient issue during the cleanup.

@arteam arteam removed their assignment Nov 20, 2023
@DaveCTurner DaveCTurner self-assigned this Nov 21, 2023
@DaveCTurner
Contributor

It's not a transient issue, but it is known (and ok) that repo cleanup leaves an extra index-N blob behind. I opened #102398 to account for that.

elasticmachine pushed a commit to gmarouli/elasticsearch that referenced this issue Nov 21, 2023
This test sometimes relies on repository cleanup to remove all but the
`index.latest` and `index-N` blobs, but in fact repo cleanup will leave
behind the `index-(N-1)` blob too. This commit relaxes the test to
account for this, but then strengthens it to assert that the blobs left
in the repo are exactly the ones we expect.

Closes elastic#101573
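
The check described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code from #102398: the class `ExpectedRootBlobs` and both helper methods are hypothetical names, and it assumes the test already has the list of blob names found at the repository root and knows the latest generation `N`. Applied to the failure excerpt above (`index.latest`, `index-2`, `index-1`), it passes with `N = 2`.

```java
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class ExpectedRootBlobs {

    // Hypothetical helper: the root-level blobs we expect after repo cleanup
    // when the latest repository generation is N.
    static Set<String> expectedRootBlobs(long generation) {
        Set<String> expected = new TreeSet<>();
        expected.add("index.latest");
        expected.add("index-" + generation);
        // Cleanup is known to leave the previous index-(N-1) blob behind.
        // Assumption: there is no previous blob when N is 0.
        if (generation > 0) {
            expected.add("index-" + (generation - 1));
        }
        return expected;
    }

    // Hypothetical assertion: compare the exact set of names rather than a
    // bare file count, so any unexpected leftover blob is reported by name.
    static void assertExactRootBlobs(Collection<String> found, long generation) {
        Set<String> expected = expectedRootBlobs(generation);
        Set<String> actual = new TreeSet<>(found);
        if (!expected.equals(actual)) {
            throw new AssertionError(
                "Unexpected root blobs, expected: " + expected + " but found: " + actual);
        }
    }

    public static void main(String[] args) {
        // Blob names taken from the failure excerpt in this issue (N = 2).
        assertExactRootBlobs(List.of("index.latest", "index-2", "index-1"), 2);
        System.out.println("root blobs match");
    }
}
```

Comparing sets instead of counts both tolerates the extra `index-(N-1)` blob and fails on anything else, such as a stale `index-0`.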
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this issue Jun 17, 2024