[CI] DiskThresholdDeciderIT testRestoreSnapshotAllocationDoesNotExceedWatermarkWithMultipleShards failing #105331
Comments
Pinging @elastic/es-distributed (Team:Distributed) |
Oh, interesting, according to:
it was stuck waiting at this line: Lines 202 to 204 in a7f2e2d
|
Output also contains:
so the actual failure happened at: Lines 173 to 178 in a7f2e2d
|
According to the recent failure info, the test created the following shards:
During the first allocation, only 5 shards had a computed balance and were allocated accordingly:
Notice that the smallest one is ignored, possibly because its size had not yet been computed from the repository. Later the shard balance was computed as follows:
This suggests that the computation failed to take into account another non-empty shard (still?) initializing on the same node. |
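To make the suspected accounting gap concrete, here is a minimal sketch of the arithmetic, assuming the free space reported for a node does not yet reflect shards that are still initializing on it. The class `WatermarkSketch` and its methods are hypothetical names used only for illustration; this is not the actual `DiskThresholdDecider` or `ClusterInfo` API.

```java
import java.util.List;

/** Hypothetical illustration only; not the Elasticsearch disk threshold decider. */
final class WatermarkSketch {

    /**
     * Free bytes on a node once the expected sizes of shards that are still
     * initializing there (and so not yet visible in the reported usage) are reserved.
     */
    static long projectedFreeBytes(long reportedFreeBytes, List<Long> initializingShardSizes) {
        long reserved = 0;
        for (long size : initializingShardSizes) {
            reserved += size;
        }
        return reportedFreeBytes - reserved;
    }

    /** True if placing a shard of the given size keeps free space at or above the watermark. */
    static boolean canAllocate(long reportedFreeBytes, List<Long> initializingShardSizes,
                               long shardSize, long minFreeBytes) {
        long freeAfter = projectedFreeBytes(reportedFreeBytes, initializingShardSizes) - shardSize;
        return freeAfter >= minFreeBytes;
    }
}
```

With 100 bytes reported free, a 10-byte watermark, and a 60-byte shard to place, `canAllocate` returns true if the list of initializing shards is empty and false if a 50-byte shard is already initializing on the node. If that initializing shard is left out of the list, the second placement is wrongly accepted, which would match the watermark breach the test detects.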
I wonder if in this case the shard had started, but the corresponding information was still not available in ClusterInfo and the second allocation round was happening as if |
I checked the latest failure logs and they contain:
|
According to the logs from the latest failure:
Allocation of all shards happens in 2 rounds.
Nothing out of the ordinary here. Round 2:
The round 2 balance is calculated incorrectly because it is based on incorrect cluster info:
|
I have not found any particular reason why the balance was calculated in 2 rounds here; however, I consider this to be a valid scenario. |
Build scan:
https://gradle-enterprise.elastic.co/s/kxdqmnytyenuq/tests/:server:internalClusterTest/org.elasticsearch.cluster.routing.allocation.decider.DiskThresholdDeciderIT/testRestoreSnapshotAllocationDoesNotExceedWatermarkWithMultipleShards
Reproduction line:
Applicable branches:
main
Reproduces locally?:
No
Failure history:
Failure dashboard for
org.elasticsearch.cluster.routing.allocation.decider.DiskThresholdDeciderIT#testRestoreSnapshotAllocationDoesNotExceedWatermarkWithMultipleShards
Failure excerpt: