[CI] DiskThresholdDeciderIT testRestoreSnapshotAllocationDoesNotExceedWatermarkWithMultipleShards failing #105331

Open
iverase opened this issue Feb 9, 2024 · 9 comments
Labels
:Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs medium-risk An open issue or test failure that is a medium risk to future releases Team:Distributed Meta label for distributed team >test-failure Triaged test failures from CI

Comments

@iverase
Contributor

iverase commented Feb 9, 2024

Build scan:
https://gradle-enterprise.elastic.co/s/kxdqmnytyenuq/tests/:server:internalClusterTest/org.elasticsearch.cluster.routing.allocation.decider.DiskThresholdDeciderIT/testRestoreSnapshotAllocationDoesNotExceedWatermarkWithMultipleShards

Reproduction line:

./gradlew ':server:internalClusterTest' --tests "org.elasticsearch.cluster.routing.allocation.decider.DiskThresholdDeciderIT.testRestoreSnapshotAllocationDoesNotExceedWatermarkWithMultipleShards" -Dtests.seed=ED5C019BB52E4B53 -Dtests.locale=is -Dtests.timezone=America/Anchorage -Druntime.java=21

Applicable branches:
main

Reproduces locally?:
No

Failure history:
Failure dashboard for org.elasticsearch.cluster.routing.allocation.decider.DiskThresholdDeciderIT#testRestoreSnapshotAllocationDoesNotExceedWatermarkWithMultipleShards

Failure excerpt:

java.lang.Exception: Test abandoned because suite timeout was reached.

  at __randomizedtesting.SeedInfo.seed([ED5C019BB52E4B53]:0)

@iverase iverase added :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >test-failure Triaged test failures from CI labels Feb 9, 2024
@elasticsearchmachine elasticsearchmachine added blocker Team:Distributed Meta label for distributed team labels Feb 9, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@idegtiarenko
Contributor

Oh, interesting, according to:

"TEST-DiskThresholdDeciderIT.testRestoreSnapshotAllocationDoesNotExceedWatermarkWithMultipleShards-seed#[ED5C019BB52E4B53]" ID=2402 WAITING on org.elasticsearch.action.support.PlainActionFuture$Sync@dcadc57
	at java.base@21.0.2/jdk.internal.misc.Unsafe.park(Native Method)
	- waiting on org.elasticsearch.action.support.PlainActionFuture$Sync@dcadc57
	at java.base@21.0.2/java.util.concurrent.locks.LockSupport.park(LockSupport.java:221)
	at java.base@21.0.2/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:754)
	at java.base@21.0.2/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1099)
	at app//org.elasticsearch.action.support.PlainActionFuture$Sync.get(PlainActionFuture.java:273)
	at app//org.elasticsearch.action.support.PlainActionFuture.get(PlainActionFuture.java:93)
	at app//org.elasticsearch.client.internal.support.AbstractClient$RefCountedFuture.get(AbstractClient.java:1534)
	at app//org.elasticsearch.client.internal.support.AbstractClient$RefCountedFuture.get(AbstractClient.java:1514)
	at app//org.elasticsearch.common.util.concurrent.FutureUtils.get(FutureUtils.java:45)
	at app//org.elasticsearch.action.support.PlainActionFuture.actionGet(PlainActionFuture.java:152)
	at app//org.elasticsearch.action.ActionRequestBuilder.get(ActionRequestBuilder.java:43)
	at app//org.elasticsearch.cluster.routing.allocation.decider.DiskThresholdDeciderIT.testRestoreSnapshotAllocationDoesNotExceedWatermarkWithMultipleShards(DiskThresholdDeciderIT.java:204)
	at java.base@21.0.2/java.lang.invoke.LambdaForm$DMH/0x00007f2a3c300000.invokeVirtual(LambdaForm$DMH)
	at java.base@21.0.2/java.lang.invoke.LambdaForm$MH/0x00007f2a3c0a0800.invoke(LambdaForm$MH)

it was stuck waiting at this line:

final RestoreSnapshotResponse restoreSnapshotResponse = clusterAdmin().prepareRestoreSnapshot("repo", "snap")
    .setWaitForCompletion(true)
    .get();

@idegtiarenko
Contributor

Output also contains:

WARNING: Uncaught exception in thread: Thread[#2574,elasticsearch[node_t0][clusterApplierService#updateTask][T#1],5,TGRP-DiskThresholdDeciderIT]
java.lang.AssertionError: 
Expected: <0>
     but: was <1>
	at __randomizedtesting.SeedInfo.seed([ED5C019BB52E4B53]:0)
	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:6)
	at org.elasticsearch.test.ESTestCase.assertThat(ESTestCase.java:2119)
	at org.elasticsearch.cluster.routing.allocation.decider.DiskThresholdDeciderIT.lambda$testRestoreSnapshotAllocationDoesNotExceedWatermarkWithMultipleShards$2(DiskThresholdDeciderIT.java:176)
	at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateListener(ClusterApplierService.java:560)
	at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateListeners(ClusterApplierService.java:546)
	at org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:505)
	at org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:429)
	at org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:154)
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:917)
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:217)
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:183)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1583)

so the actual failure happened at:

internalCluster().getCurrentMasterNodeInstance(ClusterService.class).addListener(event -> {
    ClusterInfoServiceUtils.refresh(clusterInfoService);
    if (allowRelocations.get() == false) {
        assertThat(numberOfShardsWithState(event.state().getRoutingNodes(), ShardRoutingState.RELOCATING), equalTo(0));
    }
});
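For context, a minimal, self-contained sketch (hypothetical code, not the actual test or framework wiring) of how the two observations above can fit together, assuming the assertion fires on the cluster-applier thread: the AssertionError surfaces only as an uncaught-exception WARNING on that thread, while the test thread stays blocked on the restore future until the suite timeout abandons it.

import java.util.concurrent.CompletableFuture;

// Hypothetical illustration only; names are simplified stand-ins.
public class ApplierAssertionSketch {
    public static void main(String[] args) throws Exception {
        CompletableFuture<String> restoreResponse = new CompletableFuture<>();

        Thread applier = new Thread(() -> {
            int relocatingShards = 1; // what the cluster-state listener observed
            if (relocatingShards != 0) {
                // Thrown on the applier thread: reported by the uncaught-exception handler,
                // never delivered to the thread blocked on restoreResponse.get().
                throw new AssertionError("Expected: <0> but: was <1>");
            }
            restoreResponse.complete("restored");
        }, "clusterApplierService#updateTask");

        applier.setUncaughtExceptionHandler((t, e) ->
            System.err.println("WARNING: Uncaught exception in thread: " + t.getName() + ": " + e));
        applier.start();
        applier.join();

        // In the real test this is prepareRestoreSnapshot(...).setWaitForCompletion(true).get(),
        // which keeps waiting until the suite timeout is reached.
        // restoreResponse.get();
    }
}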

@arteam arteam self-assigned this Feb 9, 2024
DaveCTurner added a commit that referenced this issue Feb 12, 2024
@arteam arteam assigned idegtiarenko and unassigned arteam Feb 13, 2024
@arteam arteam added medium-risk An open issue or test failure that is a medium risk to future releases and removed blocker labels Feb 15, 2024
@idegtiarenko
Contributor

idegtiarenko commented Mar 4, 2024

According to the recent failure info, the test created the following shards:

ShardSize[shardId=[btydusdduwrc][3], size=16919]
ShardSize[shardId=[btydusdduwrc][0], size=17886]
ShardSize[shardId=[btydusdduwrc][5], size=17972]
ShardSize[shardId=[btydusdduwrc][1], size=18195]
ShardSize[shardId=[btydusdduwrc][4], size=18198]
ShardSize[shardId=[btydusdduwrc][2], size=18472]

During the first allocation round, the desired balance was computed for only 5 shards, which were allocated accordingly:

[2024-03-03T17:22:30,358][TRACE][o.e.c.r.a.a.DesiredBalanceShardsAllocator] [node_t0] Reconciling desired balance: DesiredBalance[lastConvergedIndex=13, assignments={
[btydusdduwrc][0]=ShardAssignment[nodeIds=[CTSLnDyTQj2xDmatjmYTtQ], total=1, unassigned=0, ignored=0],
[btydusdduwrc][1]=ShardAssignment[nodeIds=[suT6FRFCTBiMDfiBCAU_nQ], total=1, unassigned=0, ignored=0],
[btydusdduwrc][4]=ShardAssignment[nodeIds=[suT6FRFCTBiMDfiBCAU_nQ], total=1, unassigned=0, ignored=0],
[btydusdduwrc][5]=ShardAssignment[nodeIds=[suT6FRFCTBiMDfiBCAU_nQ], total=1, unassigned=0, ignored=0],
[btydusdduwrc][2]=ShardAssignment[nodeIds=[suT6FRFCTBiMDfiBCAU_nQ], total=1, unassigned=0, ignored=0],
[btydusdduwrc][3]=ShardAssignment[nodeIds=[], total=1, unassigned=1, ignored=1]
}]

Notice that the smallest one is ignored, possibly because its size had not yet been fetched from the repository.
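As a rough sketch of that hypothesis (hypothetical code, not the actual desired-balance allocator logic): a shard whose snapshot size has not yet been fetched from the repository is ignored for the round, which matches the unassigned=1, ignored=1 entry above.

import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only; the shard IDs and the one known size are taken from the logs above.
public class UnknownSnapshotSizeSketch {
    public static void main(String[] args) {
        Map<String, Long> snapshotShardSizes = new HashMap<>();
        snapshotShardSizes.put("[btydusdduwrc][0]", 17_886L);
        // "[btydusdduwrc][3]" has no entry yet: its size is still being fetched from the repository

        for (String shardId : new String[] { "[btydusdduwrc][0]", "[btydusdduwrc][3]" }) {
            Long size = snapshotShardSizes.get(shardId);
            if (size == null) {
                System.out.println(shardId + " -> ignored this round (total=1, unassigned=1, ignored=1)");
            } else {
                System.out.println(shardId + " -> allocated with expected size " + size);
            }
        }
    }
}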

Later the shard balance was computed as follows:

[2024-03-03T17:22:30,400][TRACE][o.e.c.r.a.a.DesiredBalanceShardsAllocator] [node_t0] Desired balance updated: DesiredBalance[lastConvergedIndex=14, assignments={[btydusdduwrc][0]=ShardAssignment[nodeIds=[CTSLnDyTQj2xDmatjmYTtQ], total=1, unassigned=0, ignored=0], [btydusdduwrc][1]=ShardAssignment[nodeIds=[suT6FRFCTBiMDfiBCAU_nQ], total=1, unassigned=0, ignored=0], [btydusdduwrc][4]=ShardAssignment[nodeIds=[suT6FRFCTBiMDfiBCAU_nQ], total=1, unassigned=0, ignored=0], [btydusdduwrc][5]=ShardAssignment[nodeIds=[suT6FRFCTBiMDfiBCAU_nQ], total=1, unassigned=0, ignored=0], [btydusdduwrc][2]=ShardAssignment[nodeIds=[suT6FRFCTBiMDfiBCAU_nQ], total=1, unassigned=0, ignored=0], [btydusdduwrc][3]=ShardAssignment[nodeIds=[CTSLnDyTQj2xDmatjmYTtQ], total=1, unassigned=0, ignored=0]}]. Diff: 
[btydusdduwrc][3]: ShardAssignment[nodeIds=[], total=1, unassigned=1, ignored=1] -> ShardAssignment[nodeIds=[CTSLnDyTQj2xDmatjmYTtQ], total=1, unassigned=0, ignored=0]

This suggests that the computation failed to take into account another non-empty shard that was (still?) initializing on the same node.

@idegtiarenko
Contributor

I wonder if in this case the shard had started, but the corresponding information was not yet available in ClusterInfo, so the second allocation round happened as if CTSLnDyTQj2xDmatjmYTtQ were empty.

@idegtiarenko
Contributor

I checked the latest failure logs and they contain:

[2024-03-11T18:25:16,539][TRACE][o.e.c.r.a.a.DesiredBalanceComputer] [node_t0] Recomputing desired balance for [15]: DesiredBalance[lastConvergedIndex=14, assignments={[wggatuinx][2]=ShardAssignment[nodeIds=[], total=1, unassigned=1, ignored=1], [wggatuinx][3]=ShardAssignment[nodeIds=[], total=1, unassigned=1, ignored=1], [wggatuinx][4]=ShardAssignment[nodeIds=[], total=1, unassigned=1, ignored=1], [wggatuinx][5]=ShardAssignment[nodeIds=[], total=1, unassigned=1, ignored=1], [wggatuinx][0]=ShardAssignment[nodeIds=[CwlFGBJYSXywcA9Qqt25xg], total=1, unassigned=0, ignored=0], [wggatuinx][1]=ShardAssignment[nodeIds=[], total=1, unassigned=1, ignored=1]}], routing_nodes:
-----node_id[CwlFGBJYSXywcA9Qqt25xg][V]
--------[wggatuinx][0], node[CwlFGBJYSXywcA9Qqt25xg], [P], recovery_source[snapshot recovery [Us0WnflFQ-eNYAoK_hOaBQ] from repo:snap/gLsztBgmR1G2j8MufCwrSQ], s[INITIALIZING], a[id=lHmaALapQ1qElJRtpQAgug], unassigned_info[[reason=NEW_INDEX_RESTORED], at[2024-03-11T12:25:16.245Z], delayed=false, details[restore_source[repo/snap]], allocation_status[no_attempt]], failed_attempts[0], expected_shard_size[11386]
-----node_id[B-MnLxupT2qkXpWstJb6mA][V]
---- unassigned
--------[wggatuinx][1], node[null], [P], recovery_source[snapshot recovery [Us0WnflFQ-eNYAoK_hOaBQ] from repo:snap/gLsztBgmR1G2j8MufCwrSQ], s[UNASSIGNED], unassigned_info[[reason=NEW_INDEX_RESTORED], at[2024-03-11T12:25:16.245Z], delayed=false, details[restore_source[repo/snap]], allocation_status[no_attempt]], failed_attempts[0]
--------[wggatuinx][2], node[null], [P], recovery_source[snapshot recovery [Us0WnflFQ-eNYAoK_hOaBQ] from repo:snap/gLsztBgmR1G2j8MufCwrSQ], s[UNASSIGNED], unassigned_info[[reason=NEW_INDEX_RESTORED], at[2024-03-11T12:25:16.245Z], delayed=false, details[restore_source[repo/snap]], allocation_status[no_attempt]], failed_attempts[0]
--------[wggatuinx][3], node[null], [P], recovery_source[snapshot recovery [Us0WnflFQ-eNYAoK_hOaBQ] from repo:snap/gLsztBgmR1G2j8MufCwrSQ], s[UNASSIGNED], unassigned_info[[reason=NEW_INDEX_RESTORED], at[2024-03-11T12:25:16.245Z], delayed=false, details[restore_source[repo/snap]], allocation_status[no_attempt]], failed_attempts[0]
--------[wggatuinx][4], node[null], [P], recovery_source[snapshot recovery [Us0WnflFQ-eNYAoK_hOaBQ] from repo:snap/gLsztBgmR1G2j8MufCwrSQ], s[UNASSIGNED], unassigned_info[[reason=NEW_INDEX_RESTORED], at[2024-03-11T12:25:16.245Z], delayed=false, details[restore_source[repo/snap]], allocation_status[no_attempt]], failed_attempts[0]
--------[wggatuinx][5], node[null], [P], recovery_source[snapshot recovery [Us0WnflFQ-eNYAoK_hOaBQ] from repo:snap/gLsztBgmR1G2j8MufCwrSQ], s[UNASSIGNED], unassigned_info[[reason=NEW_INDEX_RESTORED], at[2024-03-11T12:25:16.245Z], delayed=false, details[restore_source[repo/snap]], allocation_status[no_attempt]], failed_attempts[0]
, 


{
  "nodes" : {
    "CwlFGBJYSXywcA9Qqt25xg" : {
      "node_name" : "node_t2",
      "least_available" : {
        "path" : "/dev/shm/bk/bk-agent-prod-gcp-1710158680974142795/elastic/elasticsearch-periodic/server/build/testrun/internalClusterTest/temp/org.elasticsearch.cluster.routing.allocation.decider.DiskThresholdDeciderIT_7FA2BD7AB6A11F7E-001/tempDir-003/node-2",
        "total_bytes" : 21632,
        "used_bytes" : 0,
        "free_bytes" : 21632,
        "free_disk_percent" : 100.0,
        "used_disk_percent" : 0.0
      },
      "most_available" : {
        "path" : "/dev/shm/bk/bk-agent-prod-gcp-1710158680974142795/elastic/elasticsearch-periodic/server/build/testrun/internalClusterTest/temp/org.elasticsearch.cluster.routing.allocation.decider.DiskThresholdDeciderIT_7FA2BD7AB6A11F7E-001/tempDir-003/node-2",
        "total_bytes" : 21632,
        "used_bytes" : 0,
        "free_bytes" : 21632,
        "free_disk_percent" : 100.0,
        "used_disk_percent" : 0.0
      }
    },
    "B-MnLxupT2qkXpWstJb6mA" : {
      "node_name" : "node_t1",
      "least_available" : {
        "path" : "/dev/shm/bk/bk-agent-prod-gcp-1710158680974142795/elastic/elasticsearch-periodic/server/build/testrun/internalClusterTest/temp/org.elasticsearch.cluster.routing.allocation.decider.DiskThresholdDeciderIT_7FA2BD7AB6A11F7E-001/tempDir-003/node-1",
        "total_bytes" : 63302156288,
        "used_bytes" : 9738104832,
        "free_bytes" : 53564051456,
        "free_disk_percent" : 84.6,
        "used_disk_percent" : 15.4
      },
      "most_available" : {
        "path" : "/dev/shm/bk/bk-agent-prod-gcp-1710158680974142795/elastic/elasticsearch-periodic/server/build/testrun/internalClusterTest/temp/org.elasticsearch.cluster.routing.allocation.decider.DiskThresholdDeciderIT_7FA2BD7AB6A11F7E-001/tempDir-003/node-1",
        "total_bytes" : 63302156288,
        "used_bytes" : 9738104832,
        "free_bytes" : 53564051456,
        "free_disk_percent" : 84.6,
        "used_disk_percent" : 15.4
      }
    }
  },
  "shard_sizes" : {
    "[wggatuinx][0][p]_bytes" : 11386
  },
  "shard_data_set_sizes" : {
    "[wggatuinx][0]_bytes" : 11386
  },
  "shard_paths" : {
    "NodeAndShard[nodeId=CwlFGBJYSXywcA9Qqt25xg, shardId=[wggatuinx][0]]" : "/dev/shm/bk/bk-agent-prod-gcp-1710158680974142795/elastic/elasticsearch-periodic/server/build/testrun/internalClusterTest/temp/org.elasticsearch.cluster.routing.allocation.decider.DiskThresholdDeciderIT_7FA2BD7AB6A11F7E-001/tempDir-003/node-2"
  },
  "reserved_sizes" : [
    {
      "node_id" : "CwlFGBJYSXywcA9Qqt25xg",
      "path" : "/dev/shm/bk/bk-agent-prod-gcp-1710158680974142795/elastic/elasticsearch-periodic/server/build/testrun/internalClusterTest/temp/org.elasticsearch.cluster.routing.allocation.decider.DiskThresholdDeciderIT_7FA2BD7AB6A11F7E-001/tempDir-003/node-2",
      "total" : 0,
      "shards" : [
        "[wggatuinx][0]"
      ]
    }
  ]
}, 
^^^ cluster info is complete and correct

SnapshotShardSizeInfo{snapshotShardSizes=[[snapshot=repo:snap/gLsztBgmR1G2j8MufCwrSQ, index=[wggatuinx/neUKsrfBR2iP0GAWlUpV2g], shard=[wggatuinx][2]]=>11469, [snapshot=repo:snap/gLsztBgmR1G2j8MufCwrSQ, index=[wggatuinx/neUKsrfBR2iP0GAWlUpV2g], shard=[wggatuinx][3]]=>11869, [snapshot=repo:snap/gLsztBgmR1G2j8MufCwrSQ, index=[wggatuinx/neUKsrfBR2iP0GAWlUpV2g], shard=[wggatuinx][4]]=>11392, [snapshot=repo:snap/gLsztBgmR1G2j8MufCwrSQ, index=[wggatuinx/neUKsrfBR2iP0GAWlUpV2g], shard=[wggatuinx][5]]=>12075, [snapshot=repo:snap/gLsztBgmR1G2j8MufCwrSQ, index=[wggatuinx/neUKsrfBR2iP0GAWlUpV2g], shard=[wggatuinx][1]]=>12389]}
^^^ all snapshot sizes are known and available. They were also available during the prior allocation round

[2024-03-11T18:25:16,540][TRACE][o.e.c.r.a.a.DesiredBalanceComputer] [node_t0] [wggatuinx][0] marked shard as started (routing: [wggatuinx][0], node[CwlFGBJYSXywcA9Qqt25xg], [P], recovery_source[snapshot recovery [Us0WnflFQ-eNYAoK_hOaBQ] from repo:snap/gLsztBgmR1G2j8MufCwrSQ], s[INITIALIZING], a[id=lHmaALapQ1qElJRtpQAgug], unassigned_info[[reason=NEW_INDEX_RESTORED], at[2024-03-11T12:25:16.245Z], delayed=false, details[restore_source[repo/snap]], allocation_status[no_attempt]], failed_attempts[0], expected_shard_size[11386])
^^^ allocated correctly in the prior round

[2024-03-11T18:25:16,540][TRACE][o.e.c.r.a.a.DesiredBalanceComputer] [node_t0] running delegate allocator
[2024-03-11T18:25:16,542][TRACE][o.e.c.r.a.a.DesiredBalanceComputer] [node_t0] [wggatuinx][4] marked shard as started (routing: [wggatuinx][4], node[CwlFGBJYSXywcA9Qqt25xg], [P], recovery_source[snapshot recovery [Us0WnflFQ-eNYAoK_hOaBQ] from repo:snap/gLsztBgmR1G2j8MufCwrSQ], s[INITIALIZING], a[id=-gFVMSNgRoeo1Fv53EIBhQ], unassigned_info[[reason=NEW_INDEX_RESTORED], at[2024-03-11T12:25:16.245Z], delayed=false, details[restore_source[repo/snap]], allocation_status[no_attempt]], failed_attempts[0], expected_shard_size[11392])
^^^ allocated incorrectly by the balanced allocator, despite all available information being present

[2024-03-11T18:25:16,542][TRACE][o.e.c.r.a.a.DesiredBalanceComputer] [node_t0] [wggatuinx][1] marked shard as started (routing: [wggatuinx][1], node[B-MnLxupT2qkXpWstJb6mA], [P], recovery_source[snapshot recovery [Us0WnflFQ-eNYAoK_hOaBQ] from repo:snap/gLsztBgmR1G2j8MufCwrSQ], s[INITIALIZING], a[id=Ehs2RXxxTaaYu-puiIjWOg], unassigned_info[[reason=NEW_INDEX_RESTORED], at[2024-03-11T12:25:16.245Z], delayed=false, details[restore_source[repo/snap]], allocation_status[no_attempt]], failed_attempts[0], expected_shard_size[12389])
[2024-03-11T18:25:16,543][TRACE][o.e.c.r.a.a.DesiredBalanceComputer] [node_t0] [wggatuinx][2] marked shard as started (routing: [wggatuinx][2], node[B-MnLxupT2qkXpWstJb6mA], [P], recovery_source[snapshot recovery [Us0WnflFQ-eNYAoK_hOaBQ] from repo:snap/gLsztBgmR1G2j8MufCwrSQ], s[INITIALIZING], a[id=v3Yh-Ie1SEWjezhmdIiwfg], unassigned_info[[reason=NEW_INDEX_RESTORED], at[2024-03-11T12:25:16.245Z], delayed=false, details[restore_source[repo/snap]], allocation_status[no_attempt]], failed_attempts[0], expected_shard_size[11469])
[2024-03-11T18:25:16,543][TRACE][o.e.c.r.a.a.DesiredBalanceComputer] [node_t0] [wggatuinx][3] marked shard as started (routing: [wggatuinx][3], node[B-MnLxupT2qkXpWstJb6mA], [P], recovery_source[snapshot recovery [Us0WnflFQ-eNYAoK_hOaBQ] from repo:snap/gLsztBgmR1G2j8MufCwrSQ], s[INITIALIZING], a[id=wz8am3sZSGm7LteXF6GN2Q], unassigned_info[[reason=NEW_INDEX_RESTORED], at[2024-03-11T12:25:16.245Z], delayed=false, details[restore_source[repo/snap]], allocation_status[no_attempt]], failed_attempts[0], expected_shard_size[11869])
[2024-03-11T18:25:16,543][TRACE][o.e.c.r.a.a.DesiredBalanceComputer] [node_t0] [wggatuinx][5] marked shard as started (routing: [wggatuinx][5], node[B-MnLxupT2qkXpWstJb6mA], [P], recovery_source[snapshot recovery [Us0WnflFQ-eNYAoK_hOaBQ] from repo:snap/gLsztBgmR1G2j8MufCwrSQ], s[INITIALIZING], a[id=BJ7H_lu6QKyV8WvVZsaHRA], unassigned_info[[reason=NEW_INDEX_RESTORED], at[2024-03-11T12:25:16.245Z], delayed=false, details[restore_source[repo/snap]], allocation_status[no_attempt]], failed_attempts[0], expected_shard_size[12075])
[2024-03-11T18:25:16,543][TRACE][o.e.c.r.a.a.DesiredBalanceComputer] [node_t0] Desired balance computation for [15] is still not converged after [0s] and [1] iterations
[2024-03-11T18:25:16,543][TRACE][o.e.c.r.a.a.DesiredBalanceComputer] [node_t0] running delegate allocator
[2024-03-11T18:25:16,549][DEBUG][o.e.c.r.a.a.DesiredBalanceComputer] [node_t0] Desired balance computation for [15] converged after [0s] and [2] iterations
[2024-03-11T18:25:16,550][TRACE][o.e.c.r.a.a.DesiredBalanceShardsAllocator] [node_t0] Desired balance updated: DesiredBalance[lastConvergedIndex=15, assignments={[wggatuinx][2]=ShardAssignment[nodeIds=[B-MnLxupT2qkXpWstJb6mA], total=1, unassigned=0, ignored=0], [wggatuinx][3]=ShardAssignment[nodeIds=[B-MnLxupT2qkXpWstJb6mA], total=1, unassigned=0, ignored=0], [wggatuinx][4]=ShardAssignment[nodeIds=[CwlFGBJYSXywcA9Qqt25xg], total=1, unassigned=0, ignored=0], [wggatuinx][5]=ShardAssignment[nodeIds=[B-MnLxupT2qkXpWstJb6mA], total=1, unassigned=0, ignored=0], [wggatuinx][0]=ShardAssignment[nodeIds=[CwlFGBJYSXywcA9Qqt25xg], total=1, unassigned=0, ignored=0], [wggatuinx][1]=ShardAssignment[nodeIds=[B-MnLxupT2qkXpWstJb6mA], total=1, unassigned=0, ignored=0]}]. Diff: 
[wggatuinx][2]: ShardAssignment[nodeIds=[], total=1, unassigned=1, ignored=1] -> ShardAssignment[nodeIds=[B-MnLxupT2qkXpWstJb6mA], total=1, unassigned=0, ignored=0]
[wggatuinx][3]: ShardAssignment[nodeIds=[], total=1, unassigned=1, ignored=1] -> ShardAssignment[nodeIds=[B-MnLxupT2qkXpWstJb6mA], total=1, unassigned=0, ignored=0]
[wggatuinx][4]: ShardAssignment[nodeIds=[], total=1, unassigned=1, ignored=1] -> ShardAssignment[nodeIds=[CwlFGBJYSXywcA9Qqt25xg], total=1, unassigned=0, ignored=0]
[wggatuinx][5]: ShardAssignment[nodeIds=[], total=1, unassigned=1, ignored=1] -> ShardAssignment[nodeIds=[B-MnLxupT2qkXpWstJb6mA], total=1, unassigned=0, ignored=0]
[wggatuinx][1]: ShardAssignment[nodeIds=[], total=1, unassigned=1, ignored=1] -> ShardAssignment[nodeIds=[B-MnLxupT2qkXpWstJb6mA], total=1, unassigned=0, ignored=0]

@idegtiarenko
Contributor

According to the logs from the latest failure:

Shard sizes
shard=[zoxdkcqpuouh][0]=>12166
shard=[zoxdkcqpuouh][3]=>12451
shard=[zoxdkcqpuouh][2]=>12535
shard=[zoxdkcqpuouh][4]=>12594
shard=[zoxdkcqpuouh][5]=>13006

Nodes
"r4ludfGvTh2wZ7UGP-SjWA" / "node_t1" <-- big node
"zIF6uDFXRYKPVDXFSMTRQQ" / "node_t2" <-- small node

Allocation of all shards happens in 2 rounds.
Round 1:

routing_nodes:
-----node_id[zIF6uDFXRYKPVDXFSMTRQQ][V]
-----node_id[r4ludfGvTh2wZ7UGP-SjWA][V]
---- unassigned
--------[zoxdkcqpuouh][0], node[null], [P], recovery_source[snapshot recovery [ArTTnq9aTsel6jxW7feZ6Q] from repo:snap/qM6kXprETqqT0R_Qb0Wx-g], s[UNASSIGNED], unassigned_info[[reason=NEW_INDEX_RESTORED], at[2024-04-06T20:17:35.294Z], delayed=false, details[restore_source[repo/snap]], allocation_status[fetching_shard_data]], failed_attempts[0]
--------[zoxdkcqpuouh][1], node[null], [P], recovery_source[snapshot recovery [ArTTnq9aTsel6jxW7feZ6Q] from repo:snap/qM6kXprETqqT0R_Qb0Wx-g], s[UNASSIGNED], unassigned_info[[reason=NEW_INDEX_RESTORED], at[2024-04-06T20:17:35.294Z], delayed=false, details[restore_source[repo/snap]], allocation_status[fetching_shard_data]], failed_attempts[0]
--------[zoxdkcqpuouh][2], node[null], [P], recovery_source[snapshot recovery [ArTTnq9aTsel6jxW7feZ6Q] from repo:snap/qM6kXprETqqT0R_Qb0Wx-g], s[UNASSIGNED], unassigned_info[[reason=NEW_INDEX_RESTORED], at[2024-04-06T20:17:35.294Z], delayed=false, details[restore_source[repo/snap]], allocation_status[fetching_shard_data]], failed_attempts[0]
--------[zoxdkcqpuouh][3], node[null], [P], recovery_source[snapshot recovery [ArTTnq9aTsel6jxW7feZ6Q] from repo:snap/qM6kXprETqqT0R_Qb0Wx-g], s[UNASSIGNED], unassigned_info[[reason=NEW_INDEX_RESTORED], at[2024-04-06T20:17:35.294Z], delayed=false, details[restore_source[repo/snap]], allocation_status[fetching_shard_data]], failed_attempts[0]
--------[zoxdkcqpuouh][4], node[null], [P], recovery_source[snapshot recovery [ArTTnq9aTsel6jxW7feZ6Q] from repo:snap/qM6kXprETqqT0R_Qb0Wx-g], s[UNASSIGNED], unassigned_info[[reason=NEW_INDEX_RESTORED], at[2024-04-06T20:17:35.294Z], delayed=false, details[restore_source[repo/snap]], allocation_status[fetching_shard_data]], failed_attempts[0]
--------[zoxdkcqpuouh][5], node[null], [P], recovery_source[snapshot recovery [ArTTnq9aTsel6jxW7feZ6Q] from repo:snap/qM6kXprETqqT0R_Qb0Wx-g], s[UNASSIGNED], unassigned_info[[reason=NEW_INDEX_RESTORED], at[2024-04-06T20:17:35.294Z], delayed=false, details[restore_source[repo/snap]], allocation_status[fetching_shard_data]], failed_attempts[0]

  "shard_sizes" : { },
  "reserved_sizes" : [ ]

SnapshotShardSizeInfo{snapshotShardSizes=[
[snapshot=repo:snap/qM6kXprETqqT0R_Qb0Wx-g, index=[zoxdkcqpuouh/ZsO0w7BsQRy3hAFwsIpZ2g], shard=[zoxdkcqpuouh][0]]=>12166, 
[snapshot=repo:snap/qM6kXprETqqT0R_Qb0Wx-g, index=[zoxdkcqpuouh/ZsO0w7BsQRy3hAFwsIpZ2g], shard=[zoxdkcqpuouh][1]]=>13187, 
[snapshot=repo:snap/qM6kXprETqqT0R_Qb0Wx-g, index=[zoxdkcqpuouh/ZsO0w7BsQRy3hAFwsIpZ2g], shard=[zoxdkcqpuouh][2]]=>12535, 
[snapshot=repo:snap/qM6kXprETqqT0R_Qb0Wx-g, index=[zoxdkcqpuouh/ZsO0w7BsQRy3hAFwsIpZ2g], shard=[zoxdkcqpuouh][5]]=>13006, 
[snapshot=repo:snap/qM6kXprETqqT0R_Qb0Wx-g, index=[zoxdkcqpuouh/ZsO0w7BsQRy3hAFwsIpZ2g], shard=[zoxdkcqpuouh][4]]=>12594, 
[snapshot=repo:snap/qM6kXprETqqT0R_Qb0Wx-g, index=[zoxdkcqpuouh/ZsO0w7BsQRy3hAFwsIpZ2g], shard=[zoxdkcqpuouh][3]]=>12451
]}

[2024-04-06T16:17:35,503][TRACE][o.e.c.r.a.a.DesiredBalanceShardsAllocator] [node_t0] Desired balance updated: DesiredBalance[lastConvergedIndex=16, assignments={
[zoxdkcqpuouh][1]=ShardAssignment[nodeIds=[], total=1, unassigned=1, ignored=1],
[zoxdkcqpuouh][0]=ShardAssignment[nodeIds=[], total=1, unassigned=1, ignored=1],
[zoxdkcqpuouh][5]=ShardAssignment[nodeIds=[r4ludfGvTh2wZ7UGP-SjWA], total=1, unassigned=0, ignored=0],
[zoxdkcqpuouh][4]=ShardAssignment[nodeIds=[], total=1, unassigned=1, ignored=1],
[zoxdkcqpuouh][3]=ShardAssignment[nodeIds=[zIF6uDFXRYKPVDXFSMTRQQ], total=1, unassigned=0, ignored=0],
[zoxdkcqpuouh][2]=ShardAssignment[nodeIds=[r4ludfGvTh2wZ7UGP-SjWA], total=1, unassigned=0, ignored=0]
}]. Diff:
[zoxdkcqpuouh][5]: ShardAssignment[nodeIds=[], total=1, unassigned=1, ignored=1] -> ShardAssignment[nodeIds=[r4ludfGvTh2wZ7UGP-SjWA], total=1, unassigned=0, ignored=0]
[zoxdkcqpuouh][3]: ShardAssignment[nodeIds=[], total=1, unassigned=1, ignored=1] -> ShardAssignment[nodeIds=[zIF6uDFXRYKPVDXFSMTRQQ], total=1, unassigned=0, ignored=0]
[zoxdkcqpuouh][2]: ShardAssignment[nodeIds=[], total=1, unassigned=1, ignored=1] -> ShardAssignment[nodeIds=[r4ludfGvTh2wZ7UGP-SjWA], total=1, unassigned=0, ignored=0]

Nothing out of the ordinary here.

Round 2:

routing_nodes:
-----node_id[zIF6uDFXRYKPVDXFSMTRQQ][V]
--------[zoxdkcqpuouh][3], node[zIF6uDFXRYKPVDXFSMTRQQ], [P], recovery_source[snapshot recovery [ArTTnq9aTsel6jxW7feZ6Q] from repo:snap/qM6kXprETqqT0R_Qb0Wx-g], s[INITIALIZING], a[id=092Hbr2cSEakrjwApnkZow], unassigned_info[[reason=NEW_INDEX_RESTORED], at[2024-04-06T20:17:35.294Z], delayed=false, details[restore_source[repo/snap]], allocation_status[no_attempt]], failed_attempts[0], expected_shard_size[12451]
-----node_id[r4ludfGvTh2wZ7UGP-SjWA][V]
--------[zoxdkcqpuouh][2], node[r4ludfGvTh2wZ7UGP-SjWA], [P], recovery_source[snapshot recovery [ArTTnq9aTsel6jxW7feZ6Q] from repo:snap/qM6kXprETqqT0R_Qb0Wx-g], s[INITIALIZING], a[id=TpE9Bw8VTAy1CJz-pDXquQ], unassigned_info[[reason=NEW_INDEX_RESTORED], at[2024-04-06T20:17:35.294Z], delayed=false, details[restore_source[repo/snap]], allocation_status[no_attempt]], failed_attempts[0], expected_shard_size[12535]
--------[zoxdkcqpuouh][5], node[r4ludfGvTh2wZ7UGP-SjWA], [P], recovery_source[snapshot recovery [ArTTnq9aTsel6jxW7feZ6Q] from repo:snap/qM6kXprETqqT0R_Qb0Wx-g], s[INITIALIZING], a[id=2n_rRR9XQteq_585F1FliQ], unassigned_info[[reason=NEW_INDEX_RESTORED], at[2024-04-06T20:17:35.294Z], delayed=false, details[restore_source[repo/snap]], allocation_status[no_attempt]], failed_attempts[0], expected_shard_size[13006]
---- unassigned
--------[zoxdkcqpuouh][0], node[null], [P], recovery_source[snapshot recovery [ArTTnq9aTsel6jxW7feZ6Q] from repo:snap/qM6kXprETqqT0R_Qb0Wx-g], s[UNASSIGNED], unassigned_info[[reason=NEW_INDEX_RESTORED], at[2024-04-06T20:17:35.294Z], delayed=false, details[restore_source[repo/snap]], allocation_status[no_attempt]], failed_attempts[0]
--------[zoxdkcqpuouh][1], node[null], [P], recovery_source[snapshot recovery [ArTTnq9aTsel6jxW7feZ6Q] from repo:snap/qM6kXprETqqT0R_Qb0Wx-g], s[UNASSIGNED], unassigned_info[[reason=NEW_INDEX_RESTORED], at[2024-04-06T20:17:35.294Z], delayed=false, details[restore_source[repo/snap]], allocation_status[no_attempt]], failed_attempts[0]
--------[zoxdkcqpuouh][4], node[null], [P], recovery_source[snapshot recovery [ArTTnq9aTsel6jxW7feZ6Q] from repo:snap/qM6kXprETqqT0R_Qb0Wx-g], s[UNASSIGNED], unassigned_info[[reason=NEW_INDEX_RESTORED], at[2024-04-06T20:17:35.294Z], delayed=false, details[restore_source[repo/snap]], allocation_status[no_attempt]], failed_attempts[0]

  "shard_sizes" : {
    "[zoxdkcqpuouh][2][p]_bytes" : 0, <=== has reserved 12535 bytes
    "[zoxdkcqpuouh][3][p]_bytes" : 12451, <=== has reserved 0 bytes
    "[zoxdkcqpuouh][5][p]_bytes" : 0 <=== has no reserved entry
  },
  "reserved_sizes" : [
    {
      "node_id" : "zIF6uDFXRYKPVDXFSMTRQQ",
      "path" : "/dev/shm/bk/bk-agent-prod-gcp-1712433772034280571/elastic/elasticsearch-periodic/server/build/testrun/internalClusterTest/temp/org.elasticsearch.cluster.routing.allocation.decider.DiskThresholdDeciderIT_E34CC9AAB7634065-001/tempDir-005/node-2",
      "total" : 0,
      "shards" : [
        "[zoxdkcqpuouh][3]"
      ]
    },
    {
      "node_id" : "r4ludfGvTh2wZ7UGP-SjWA",
      "path" : "/dev/shm/bk/bk-agent-prod-gcp-1712433772034280571/elastic/elasticsearch-periodic/server/build/testrun/internalClusterTest/temp/org.elasticsearch.cluster.routing.allocation.decider.DiskThresholdDeciderIT_E34CC9AAB7634065-001/tempDir-005/node-1",
      "total" : 12535,
      "shards" : [
        "[zoxdkcqpuouh][2]"
      ]
    }
  ]
    "zIF6uDFXRYKPVDXFSMTRQQ" : {
      "node_name" : "node_t2",
      "least_available" : {
        "path" : "/dev/shm/bk/bk-agent-prod-gcp-1712433772034280571/elastic/elasticsearch-periodic/server/build/testrun/internalClusterTest/temp/org.elasticsearch.cluster.routing.allocation.decider.DiskThresholdDeciderIT_E34CC9AAB7634065-001/tempDir-005/node-2",
        "total_bytes" : 22691,
        "used_bytes" : 0,
        "free_bytes" : 22691,
        "free_disk_percent" : 100.0,
        "used_disk_percent" : 0.0
      }
    },
    "r4ludfGvTh2wZ7UGP-SjWA" : {
      "node_name" : "node_t1",
      "least_available" : {
        "path" : "/dev/shm/bk/bk-agent-prod-gcp-1712433772034280571/elastic/elasticsearch-periodic/server/build/testrun/internalClusterTest/temp/org.elasticsearch.cluster.routing.allocation.decider.DiskThresholdDeciderIT_E34CC9AAB7634065-001/tempDir-005/node-1",
        "total_bytes" : 63302148096,
        "used_bytes" : 9228197888,
        "free_bytes" : 54073950208,
        "free_disk_percent" : 85.4,
        "used_disk_percent" : 14.6
      },

SnapshotShardSizeInfo{snapshotShardSizes=[
[snapshot=repo:snap/qM6kXprETqqT0R_Qb0Wx-g, index=[zoxdkcqpuouh/ZsO0w7BsQRy3hAFwsIpZ2g], shard=[zoxdkcqpuouh][1]]=>13187, 
[snapshot=repo:snap/qM6kXprETqqT0R_Qb0Wx-g, index=[zoxdkcqpuouh/ZsO0w7BsQRy3hAFwsIpZ2g], shard=[zoxdkcqpuouh][4]]=>12594,
[snapshot=repo:snap/qM6kXprETqqT0R_Qb0Wx-g, index=[zoxdkcqpuouh/ZsO0w7BsQRy3hAFwsIpZ2g], shard=[zoxdkcqpuouh][0]]=>12166
]}

Desired balance updated: DesiredBalance[lastConvergedIndex=17, assignments={
[zoxdkcqpuouh][5]=ShardAssignment[nodeIds=[r4ludfGvTh2wZ7UGP-SjWA], total=1, unassigned=0, ignored=0], 
[zoxdkcqpuouh][4]=ShardAssignment[nodeIds=[r4ludfGvTh2wZ7UGP-SjWA], total=1, unassigned=0, ignored=0], 
[zoxdkcqpuouh][3]=ShardAssignment[nodeIds=[zIF6uDFXRYKPVDXFSMTRQQ], total=1, unassigned=0, ignored=0], 
[zoxdkcqpuouh][2]=ShardAssignment[nodeIds=[r4ludfGvTh2wZ7UGP-SjWA], total=1, unassigned=0, ignored=0], 
[zoxdkcqpuouh][1]=ShardAssignment[nodeIds=[r4ludfGvTh2wZ7UGP-SjWA], total=1, unassigned=0, ignored=0], 
[zoxdkcqpuouh][0]=ShardAssignment[nodeIds=[zIF6uDFXRYKPVDXFSMTRQQ], total=1, unassigned=0, ignored=0]
}]. Diff: 
[zoxdkcqpuouh][4]: ShardAssignment[nodeIds=[], total=1, unassigned=1, ignored=1] -> ShardAssignment[nodeIds=[r4ludfGvTh2wZ7UGP-SjWA], total=1, unassigned=0, ignored=0]
[zoxdkcqpuouh][1]: ShardAssignment[nodeIds=[], total=1, unassigned=1, ignored=1] -> ShardAssignment[nodeIds=[r4ludfGvTh2wZ7UGP-SjWA], total=1, unassigned=0, ignored=0]
[zoxdkcqpuouh][0]: ShardAssignment[nodeIds=[], total=1, unassigned=1, ignored=1] -> ShardAssignment[nodeIds=[zIF6uDFXRYKPVDXFSMTRQQ], total=1, unassigned=0, ignored=0]

The round-2 balance is calculated incorrectly because it is based on incorrect cluster info (see the sketch below):

  • [zoxdkcqpuouh][5][p]_bytes has neither a size nor a reserved entry
  • zIF6uDFXRYKPVDXFSMTRQQ reports "used_bytes" : 0 even though one shard is already on it (possibly related to TestFileStore and how we mock it?)
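A back-of-the-envelope sketch (hypothetical code, using the numbers from the cluster info above, not the DiskThresholdDecider implementation) of the accounting that appears to be missing in round 2: with only 22691 bytes on the small node and [zoxdkcqpuouh][3] already initializing with an expected size of 12451 bytes, a second ~12 KB shard cannot fit, but with used_bytes reported as 0 and no reserved entry it still looks like it does.

// Hypothetical illustration with values taken from the logs above.
public class WatermarkAccountingSketch {
    public static void main(String[] args) {
        long totalBytes = 22_691;              // node_t2 (zIF6uDFXRYKPVDXFSMTRQQ) total_bytes
        long reportedUsedBytes = 0;            // "used_bytes" : 0 as reported in the cluster info
        long initializingShardBytes = 12_451;  // expected_shard_size of [zoxdkcqpuouh][3]
        long candidateShardBytes = 12_166;     // smallest remaining shard, [zoxdkcqpuouh][0]

        long naiveFree = totalBytes - reportedUsedBytes;                             // 22691
        long adjustedFree = totalBytes - reportedUsedBytes - initializingShardBytes; // 10240

        System.out.println("naive free:    " + naiveFree + " bytes, candidate fits: " + (candidateShardBytes <= naiveFree));
        System.out.println("adjusted free: " + adjustedFree + " bytes, candidate fits: " + (candidateShardBytes <= adjustedFree));
    }
}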

@idegtiarenko
Contributor

I have not found any particular reason why the balance was calculated in 2 rounds here; however, I consider this a valid scenario.
I also do not think we have any guarantee of a point-in-time snapshot of ClusterInfo that would explain the value the test observed.
