Expose reconciliation metrics via APM #102244

Merged
merged 11 commits into from Nov 16, 2023

Conversation

idegtiarenko
Contributor

This change exposes new reconciliation metrics to APM.
Related to: ES-7295

@idegtiarenko idegtiarenko added >enhancement :Distributed/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) Team:Distributed Meta label for distributed team v8.12.0 labels Nov 15, 2023
@elasticsearchmachine
Collaborator

Hi @idegtiarenko, I've created a changelog YAML for you.

@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

Contributor

@pgomulka pgomulka left a comment

I left one comment about naming.


unassignedShards = LongGaugeMetric.create(
meterRegistry,
"elasticsearch.allocator.desired_balance_reconciliation.unassigned_shards",
Contributor

Let's use the "es." prefix instead of "elasticsearch."; some metrics already do this. Naming is still under discussion; if we settle on a different prefix, we will change it.

Contributor

@kingherc kingherc left a comment

LGTM, only nits

  /**
   * Number of assigned shards during last reconciliation that are not allocated on desired node and need to be moved
   */
- protected final AtomicLong undesiredAllocations = new AtomicLong();
+ protected final LongGaugeMetric undesiredAllocations;
+ private final DoubleGauge undesiredAllocationsFraction;
Contributor

Nit: why is this private and not protected?

Contributor Author

It is not accessed from outside; in fact, it is only initialized in the constructor.
I am keeping the reference to make it clearer that we collect that metric.

/**
* This wrapper allow us to record metric with APM (via LongGauge) while also access its current state via AtomicLong
*/
public record LongGaugeMetric(AtomicLong value, LongGauge gauge) {
Contributor

I wonder if we should move this to some utility folder, because it may be useful in other places in ES?

Contributor Author

Moved that into org.elasticsearch.telemetry.metric. @pgomulka, please let us know if there is a better place for that

Contributor

Yes, I think this is a good place. Thank you @idegtiarenko
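
For readers following this thread, the wrapper idea discussed above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: `PolledGauge`, `registerLongGauge`, and `GaugeMetric` are hypothetical stand-ins written for this example, and the real `LongGaugeMetric` and meter registry in Elasticsearch may have different signatures. The point is only that one `AtomicLong` backs both the in-process read and the value the gauge reports.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

public class LongGaugeMetricSketch {

    /** Hypothetical stand-in for an APM gauge: an exporter would poll it on its own schedule. */
    public interface PolledGauge {
        String name();

        long poll();
    }

    /** Hypothetical stand-in for a registry call; NOT the Elasticsearch MeterRegistry API. */
    public static PolledGauge registerLongGauge(String name, LongSupplier observer) {
        return new PolledGauge() {
            @Override
            public String name() {
                return name;
            }

            @Override
            public long poll() {
                return observer.getAsLong();
            }
        };
    }

    /** Pairs the mutable value with the gauge that reports it, so both views stay in sync. */
    public record GaugeMetric(AtomicLong value, PolledGauge gauge) {

        public static GaugeMetric create(String name) {
            AtomicLong value = new AtomicLong();
            // The gauge reads the same AtomicLong that callers update through set().
            return new GaugeMetric(value, registerLongGauge(name, value::get));
        }

        public void set(long newValue) {
            value.set(newValue);
        }

        public long get() {
            return value.get();
        }
    }

    public static void main(String[] args) {
        GaugeMetric total = GaugeMetric.create("es.allocator.desired_balance.total_allocations");
        total.set(42);
        // What a metrics exporter would see when it polls the gauge.
        System.out.println(total.gauge().name() + " = " + total.gauge().poll());
        // What in-process callers (for example, tests) would see.
        System.out.println("in-process read = " + total.get());
    }
}
```

Keeping the gauge reference in the record, even if nothing reads it later, mirrors the author's point earlier in the review that holding the reference makes it clear which metrics are collected.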

import java.util.concurrent.atomic.AtomicLong;

/**
* This wrapper allow us to record metric with APM (via LongGauge) while also access its current state via AtomicLong
Contributor

Suggested change
-  * This wrapper allow us to record metric with APM (via LongGauge) while also access its current state via AtomicLong
+  * This wrapper allow us to record metric with APM (via {@link LongGauge}) while also access its current state via {@link AtomicLong}

Contributor

@ldematte ldematte left a comment

Some minor comments on naming

);
totalAllocations = LongGaugeMetric.create(
meterRegistry,
"es.allocator.desired_balance.total_allocations",
Contributor

@ldematte ldematte Nov 16, 2023

If this is a cumulative sum (i.e. it is never reset), I would suggest flipping it: es.allocator.desired_balance.allocations.total
Similarly, es.allocator.desired_balance.allocations.undesired.count

Contributor Author

It is not a cumulative sum. It is the last observed value.

Contributor

Or es.allocator.desired_balance.allocations.undesired.total if (as I suppose, since you have a ratio/fraction) they are both cumulative sums
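
The distinction this thread turns on (last observed value versus cumulative sum) can be illustrated with a small sketch. The class names and the per-round numbers below are hypothetical and chosen only for the example; they are not taken from the PR.

```java
import java.util.concurrent.atomic.AtomicLong;

public class GaugeVersusCounter {

    /** Last observed value: each reconciliation overwrites the previous reading. */
    public static final class LastObserved {
        private final AtomicLong value = new AtomicLong();

        public void observe(long observed) {
            value.set(observed);
        }

        public long read() {
            return value.get();
        }
    }

    /** Cumulative sum: each reconciliation adds to a total that is never reset. */
    public static final class Cumulative {
        private final AtomicLong total = new AtomicLong();

        public void observe(long observed) {
            total.addAndGet(observed);
        }

        public long read() {
            return total.get();
        }
    }

    public static void main(String[] args) {
        LastObserved gauge = new LastObserved();
        Cumulative counter = new Cumulative();
        // Hypothetical undesired-allocation counts from three reconciliation rounds.
        long[] rounds = {5, 3, 0};
        for (long undesired : rounds) {
            gauge.observe(undesired);
            counter.observe(undesired);
        }
        System.out.println("last observed = " + gauge.read()); // prints "last observed = 0"
        System.out.println("cumulative    = " + counter.read()); // prints "cumulative    = 8"
    }
}
```

With the same inputs the two readings diverge, which is why a ".total" suffix would be misleading for a metric that only reports the most recent reconciliation.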

"count"
);
undesiredAllocationsFraction = meterRegistry.registerDoubleGauge(
"es.allocator.desired_balance.undesired_allocations_fraction",
Contributor

This is an interesting example that I will add to the guidelines, so we have a suffix for that. I think "fraction" is OK, but I would suggest "ratio" as it's more compact. So either es.allocator.desired_balance.undesired_allocations.fraction or es.allocator.desired_balance.undesired_allocations.ratio (or es.allocator.desired_balance.allocations.undesired.ratio if you go with the suggestion above)
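
As a rough sketch of what a fraction/ratio metric like this reports, assuming it is derived from the two last-observed values discussed above (undesired and total allocations): the helper name `ratio` and the choice to report 0.0 when there are no allocations are assumptions made for this example, not something confirmed by the PR.

```java
public class UndesiredAllocationsRatio {

    /**
     * Returns undesired / total as a value in [0, 1] for valid inputs.
     * Reporting 0.0 when total is not positive is an assumption of this sketch;
     * it avoids emitting NaN from a 0 / 0 division.
     */
    public static double ratio(long undesired, long total) {
        if (total <= 0) {
            return 0.0;
        }
        return (double) undesired / total;
    }

    public static void main(String[] args) {
        // Hypothetical readings: 25 of 100 assigned shards are not on their desired node.
        System.out.println(ratio(25, 100)); // prints 0.25
        // No allocations observed yet.
        System.out.println(ratio(0, 0)); // prints 0.0
    }
}
```

Because the value is dimensionless and bounded, a dedicated suffix such as ".ratio" tells consumers how to interpret it without looking at the unit.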

# Conflicts:
#	server/src/main/java/org/elasticsearch/cluster/routing/allocation/allocator/DesiredBalanceShardsAllocator.java
Contributor

@kingherc kingherc left a comment

LGTM

@idegtiarenko
Contributor Author

@elasticsearchmachine please run elasticsearch-ci/bwc-snapshots

@idegtiarenko
Contributor Author

@elasticsearchmachine please run elasticsearch-ci/part-1

@idegtiarenko idegtiarenko merged commit 4474bbd into main Nov 16, 2023
15 checks passed
@idegtiarenko idegtiarenko deleted the apm_reconciliation_metrics branch November 16, 2023 16:58
andreidan pushed a commit to andreidan/elasticsearch that referenced this pull request Nov 22, 2023
Labels
:Distributed/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) >enhancement Team:Distributed Meta label for distributed team v8.12.0

5 participants