Add more desired balance stats #102065
This change exposes the number of total and desired allocations reconciled during the last reroute.
Pinging @elastic/es-distributed (Team:Distributed)
Hi @idegtiarenko, I've created a changelog YAML for you.
@@ -50,7 +55,10 @@ public static DesiredBalanceStats readFrom(StreamInput in) throws IOException {
             in.readVLong(),
             in.getTransportVersion().onOrAfter(COMPUTED_SHARD_MOVEMENTS_VERSION) ? in.readVLong() : -1,
             in.readVLong(),
-            in.readVLong()
+            in.readVLong(),
+            in.getTransportVersion().onOrAfter(ADDITIONAL_DESIRED_BALANCE_RECONCILIATION_STATS) ? in.readVInt() : 0,
Why not -1 like the computedShardMovements above?
No strong reason
nit: In that case I'd recommend -1 like above, just so the two defaults do not differ for no apparent reason.
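To make the point of this thread concrete, here is a minimal, self-contained sketch of a version-gated read that falls back to a single shared sentinel. It is not the Elasticsearch code: the class `VersionGatedStats`, the integer version id `RECONCILIATION_STATS_ADDED`, and the constant `UNKNOWN` are hypothetical names, and it uses fixed-width `java.io` streams instead of `StreamInput` and variable-length encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a stats object whose wire format grew two fields.
final class VersionGatedStats {
    // Assumed version id at which the two reconciliation fields were added.
    static final int RECONCILIATION_STATS_ADDED = 2;
    // One shared sentinel for "the sender is too old to provide this value".
    static final int UNKNOWN = -1;

    final long computedShardMovements;
    final int undesiredAllocations;
    final int totalAllocations;

    VersionGatedStats(long computedShardMovements, int undesiredAllocations, int totalAllocations) {
        this.computedShardMovements = computedShardMovements;
        this.undesiredAllocations = undesiredAllocations;
        this.totalAllocations = totalAllocations;
    }

    // Serializes only the fields that a peer on the given version understands.
    byte[] writeTo(int peerVersion) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeLong(computedShardMovements);
            if (peerVersion >= RECONCILIATION_STATS_ADDED) {
                out.writeInt(undesiredAllocations);
                out.writeInt(totalAllocations);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the new fields only when the sender wrote them; otherwise uses the sentinel.
    static VersionGatedStats readFrom(byte[] payload, int peerVersion) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(payload));
            long movements = in.readLong();
            int undesired = peerVersion >= RECONCILIATION_STATS_ADDED ? in.readInt() : UNKNOWN;
            int total = peerVersion >= RECONCILIATION_STATS_ADDED ? in.readInt() : UNKNOWN;
            return new VersionGatedStats(movements, undesired, total);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Using the same sentinel for every field that may be absent lets a consumer tell "not reported by an older node" apart from a genuine zero with a single check.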
LGTM. Only some minor comments.
public DesiredBalanceReconciler(ClusterSettings clusterSettings, ThreadPool threadPool) {
    this.logReconciliationMetrics = new FrequencyCappedAction(threadPool);
    this.logReconciliationMetrics.setMinInterval(TimeValue.timeValueMinutes(30));
nit: Would it make sense to reuse UNDESIRED_ALLOCATIONS_LOG_INTERVAL_SETTING, or to define a new setting for this?
This one is intended for internal consumption; the other one is there to warn users. So I would prefer not to share implementation details.
I am also hoping to replace log-based collection with APM as soon as it is available.
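For readers unfamiliar with the pattern under discussion, the following is a minimal sketch of an action that runs at most once per interval. It does not reproduce the `FrequencyCappedAction` used in the PR: the class `RateLimitedAction`, its constructor, and the injected millisecond clock are assumptions made for illustration.

```java
import java.util.function.LongSupplier;

// Hypothetical helper: runs an action at most once per minimum interval.
final class RateLimitedAction {
    private final LongSupplier clockMillis;   // injected clock, which keeps the class testable
    private final long minIntervalMillis;
    private long nextAllowedMillis = Long.MIN_VALUE; // the first call is always allowed

    RateLimitedAction(LongSupplier clockMillis, long minIntervalMillis) {
        this.clockMillis = clockMillis;
        this.minIntervalMillis = minIntervalMillis;
    }

    // Runs the action if the interval has elapsed since the last run; reports whether it ran.
    synchronized boolean maybeExecute(Runnable action) {
        long now = clockMillis.getAsLong();
        if (now < nextAllowedMillis) {
            return false;
        }
        nextAllowedMillis = now + minIntervalMillis;
        action.run();
        return true;
    }
}
```

With a 30-minute interval, as in the constructor above, a call made one second after a successful run is skipped, and the next call at or after the 30-minute mark runs again. Keeping the interval hard-coded, rather than tied to a user-facing setting, matches the stated intent that this logging is an internal detail.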
DesiredBalanceReconciler.this.totalAllocations.set(totalAllocations);

logReconciliationMetrics.maybeExecute(
    () -> logger.debug(
You mentioned that the periodic logging has the purpose of creating a dashboard. Is the debug level good then? Meaning, we'd need to enable it for specific clusters then.
Correct, we need to explicitly enable it. It is going to be disabled by default for users (to avoid adding more noise to the logs).
        DesiredBalanceReconciler.this.undesiredAllocations.get()
    )
    .field(
        "allocator.desired_balance.reconciliation.total_allocations",
Would it make sense to add the percentage as well? It may be easier to show in a dashboard.
# Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java
LGTM
    builder.endObject();
    return builder;
}

public double undesiredAllocationsFraction() {
    return (double) undesiredAllocations / totalAllocations;
nit: For versions before ADDITIONAL_DESIRED_BALANCE_RECONCILIATION_STATS, this will be -1/-1 = 1 = 100%. I do not think it is a problem, but if we do not want that, we could extend the logic to return 0% in case of negative values. The same applies to logUndesiredAllocationsMetrics().
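A guarded version of the fraction might look like the sketch below. This is only an illustration of the reviewer's suggestion, not the code that was merged: the class `AllocationFractions` and the static method signature are hypothetical, and treating a zero total as 0% is an extra assumption (in Java, `(double) 0 / 0` evaluates to NaN).

```java
// Hypothetical helper illustrating the suggested guard.
final class AllocationFractions {
    // Returns undesired / total, or 0.0 when either value is an "unknown" sentinel
    // (negative) or when there are no allocations at all.
    static double undesiredAllocationsFraction(long undesiredAllocations, long totalAllocations) {
        if (undesiredAllocations < 0 || totalAllocations <= 0) {
            return 0.0;
        }
        return (double) undesiredAllocations / totalAllocations;
    }
}
```

With this guard, stats received from a node that predates the new fields (-1 and -1) yield 0.0 rather than 1.0, so a dashboard would not show a spurious 100% during a rolling upgrade.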
@@ -164,6 +164,7 @@ static TransportVersion def(int id) {
     public static final TransportVersion UPDATE_NON_DYNAMIC_SETTINGS_ADDED = def(8_533_00_0);
     public static final TransportVersion REPO_ANALYSIS_REGISTER_OP_COUNT_ADDED = def(8_534_00_0);
     public static final TransportVersion ML_TRAINED_MODEL_PREFIX_STRINGS_ADDED = def(8_535_00_0);
+    public static final TransportVersion ADDITIONAL_DESIRED_BALANCE_RECONCILIATION_STATS = def(8_536_00_0);
There's now another 8_536 in main, so please re-adjust before merging.
# Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java
… a future apm integration)
LGTM
# Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java
Related to: ES-7295