Strict validation for cluster.routing.allocation.balance.threshold can lead to snapshot restore failure #116558

@ywangd

We tightened the validation for cluster.routing.allocation.balance.threshold in #115831 so that it no longer accepts values lower than 1.0 in v9.0+. If a snapshot is taken in an old cluster where the setting has a now-invalid value, i.e. in the range [0.0, 1.0), the snapshot cannot be restored in a new cluster, and the restore fails with the following exception:

[2024-11-08T04:54:27,351][WARN ][o.e.s.RestoreService     ] [test-cluster-0] [repo:old_snap/ZwUuFTIlQ1-qeTsp8Cehcg] failed to restore snapshot
java.lang.IllegalArgumentException: illegal value can't update [cluster.routing.allocation.balance.threshold] from [1.0] to [0.999]
	at org.elasticsearch.server@9.0.0-SNAPSHOT/org.elasticsearch.common.settings.Setting$Updater.getValue(Setting.java:1304)
	at org.elasticsearch.server@9.0.0-SNAPSHOT/org.elasticsearch.common.settings.AbstractScopedSettings.validateUpdate(AbstractScopedSettings.java:139)
	at org.elasticsearch.server@9.0.0-SNAPSHOT/org.elasticsearch.snapshots.RestoreService$RestoreSnapshotStateTask.applyGlobalStateRestore(RestoreService.java:1546)
	at org.elasticsearch.server@9.0.0-SNAPSHOT/org.elasticsearch.snapshots.RestoreService$RestoreSnapshotStateTask.execute(RestoreService.java:1477)
	at org.elasticsearch.server@9.0.0-SNAPSHOT/org.elasticsearch.cluster.service.MasterService$UnbatchedExecutor.execute(MasterService.java:573)

We should reconsider the strict validation or make it possible for restore to ignore invalid cluster settings.
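
To make the second option concrete, here is a minimal, self-contained sketch of what "ignore invalid cluster settings on restore" could look like. It is not Elasticsearch code: the class `LenientSettingsRestore`, the `VALIDATORS` map, and `filterValid` are hypothetical names invented for illustration, and the validator only assumes the rule described above (threshold must be at least 1.0). The idea is to validate each persistent setting from the snapshot individually and skip the ones that fail, instead of rejecting the whole update.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical illustration only: none of these names exist in Elasticsearch.
final class LenientSettingsRestore {

    // Assumed validator mirroring the stricter rule: the threshold must be >= 1.0.
    static final Map<String, Predicate<String>> VALIDATORS = Map.of(
        "cluster.routing.allocation.balance.threshold",
        value -> parsesAsFloatAtLeast(value, 1.0f)
    );

    static boolean parsesAsFloatAtLeast(String value, float min) {
        try {
            // NaN compares false against any bound, so it is rejected as well.
            return Float.parseFloat(value) >= min;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Returns the settings that pass validation; rejected entries are
    // recorded in `skipped` so the caller can log a warning for each one.
    static Map<String, String> filterValid(Map<String, String> fromSnapshot,
                                           Map<String, String> skipped) {
        Map<String, String> accepted = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : fromSnapshot.entrySet()) {
            // Settings without a registered validator are accepted as-is.
            Predicate<String> validator =
                VALIDATORS.getOrDefault(entry.getKey(), v -> true);
            if (validator.test(entry.getValue())) {
                accepted.put(entry.getKey(), entry.getValue());
            } else {
                skipped.put(entry.getKey(), entry.getValue());
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        Map<String, String> snapshot = new LinkedHashMap<>();
        snapshot.put("cluster.routing.allocation.balance.threshold", "0.999");
        snapshot.put("cluster.routing.allocation.balance.shard", "0.45");

        Map<String, String> skipped = new LinkedHashMap<>();
        Map<String, String> accepted = filterValid(snapshot, skipped);

        System.out.println("restored: " + accepted);
        System.out.println("skipped:  " + skipped);
    }
}
```

With this approach the restore would proceed and the invalid threshold would fall back to whatever the new cluster already has, at the cost of silently diverging from the snapshot unless the skipped entries are surfaced to the user.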

Relates: #115831
Relates: #116460
