Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add more desired balance stats #102065

Merged
merged 13 commits into from
Nov 15, 2023

Conversation

idegtiarenko
Copy link
Contributor

This change expose amount of total and desired allocations reconciled during last reroute.

Related to: ES-7295

This change expose amount of total and desired allocations reconciled during
last reroute.
@idegtiarenko idegtiarenko added >enhancement :Distributed/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) Team:Distributed Meta label for distributed team v8.12.0 labels Nov 13, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@elasticsearchmachine
Copy link
Collaborator

Hi @idegtiarenko, I've created a changelog YAML for you.

@@ -50,7 +55,10 @@ public static DesiredBalanceStats readFrom(StreamInput in) throws IOException {
in.readVLong(),
in.getTransportVersion().onOrAfter(COMPUTED_SHARD_MOVEMENTS_VERSION) ? in.readVLong() : -1,
in.readVLong(),
in.readVLong()
in.readVLong(),
in.getTransportVersion().onOrAfter(ADDITIONAL_DESIRED_BALANCE_RECONCILIATION_STATS) ? in.readVInt() : 0,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not -1 like the computedShardMovements above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No strong reason

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit I'd recommend then -1 like above, just so they are not different for no apparent reason.

Copy link
Contributor

@kingherc kingherc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Only some minor comments.

@@ -50,7 +55,10 @@ public static DesiredBalanceStats readFrom(StreamInput in) throws IOException {
in.readVLong(),
in.getTransportVersion().onOrAfter(COMPUTED_SHARD_MOVEMENTS_VERSION) ? in.readVLong() : -1,
in.readVLong(),
in.readVLong()
in.readVLong(),
in.getTransportVersion().onOrAfter(ADDITIONAL_DESIRED_BALANCE_RECONCILIATION_STATS) ? in.readVInt() : 0,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit I'd recommend then -1 like above, just so they are not different for no apparent reason.

public DesiredBalanceReconciler(ClusterSettings clusterSettings, ThreadPool threadPool) {
this.logReconciliationMetrics = new FrequencyCappedAction(threadPool);
this.logReconciliationMetrics.setMinInterval(TimeValue.timeValueMinutes(30));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit Would it make sense to re-use UNDESIRED_ALLOCATIONS_LOG_INTERVAL_SETTING or define a new setting for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is intended for internal consumption, the other one is to warn users. So I would prefer not to share implementation details.
I am also hoping to replace log based collection with APM as soon as it is available

DesiredBalanceReconciler.this.totalAllocations.set(totalAllocations);

logReconciliationMetrics.maybeExecute(
() -> logger.debug(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mentioned that the periodic logging has the purpose of creating a dashboard. Is the debug level good then? Meaning, we'd need to enable it for specific clusters then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, we need to explicitly enable it. It is going to be disabled by default for users (to avoid adding more noise to the logs).

DesiredBalanceReconciler.this.undesiredAllocations.get()
)
.field(
"allocator.desired_balance.reconciliation.total_allocations",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to add the percentage as well? because it may be easier to show in a dashboard

Copy link
Contributor

@kingherc kingherc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

builder.endObject();
return builder;
}

public double undesiredAllocationsFraction() {
return (double) undesiredAllocations / totalAllocations;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit for versions before ADDITIONAL_DESIRED_BALANCE_RECONCILIATION_STATS, this will be -1/-1 = 1 = 100%. I do not think it is a problem, but if we do not want that, we could extend the logic to return 0% in case of negative values. Similar comment for logUndesiredAllocationsMetrics().

@@ -164,6 +164,7 @@ static TransportVersion def(int id) {
public static final TransportVersion UPDATE_NON_DYNAMIC_SETTINGS_ADDED = def(8_533_00_0);
public static final TransportVersion REPO_ANALYSIS_REGISTER_OP_COUNT_ADDED = def(8_534_00_0);
public static final TransportVersion ML_TRAINED_MODEL_PREFIX_STRINGS_ADDED = def(8_535_00_0);
public static final TransportVersion ADDITIONAL_DESIRED_BALANCE_RECONCILIATION_STATS = def(8_536_00_0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's now another 8_536 in main, so please re-adjust before merging.

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

# Conflicts:
#	server/src/main/java/org/elasticsearch/TransportVersions.java
@idegtiarenko idegtiarenko merged commit 842e563 into elastic:main Nov 15, 2023
14 checks passed
@idegtiarenko idegtiarenko deleted the reconciliation_metrics branch November 15, 2023 15:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) >enhancement Team:Distributed Meta label for distributed team v8.12.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants