New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expose reconciliation metrics via APM #102244
Conversation
Hi @idegtiarenko, I've created a changelog YAML for you. |
Pinging @elastic/es-distributed (Team:Distributed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left one comment about name
|
||
unassignedShards = LongGaugeMetric.create( | ||
meterRegistry, | ||
"elasticsearch.allocator.desired_balance_reconciliation.unassigned_shards", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's use es.
prefix instead of elasticsearch. some metrics already do this. naming is under discussion, if the conclusion will be to use a different prefix we will change it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, only nits
/** | ||
* Number of assigned shards during last reconciliation that are not allocated on desired node and need to be moved | ||
*/ | ||
protected final AtomicLong undesiredAllocations = new AtomicLong(); | ||
protected final LongGaugeMetric undesiredAllocations; | ||
private final DoubleGauge undesiredAllocationsFraction; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit why is this private and not protected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not accessed from outside. In fact it is only initialized in constructor.
I am keeping the reference to make it cleaner that we collect that metric
/** | ||
* This wrapper allow us to record metric with APM (via LongGauge) while also access its current state via AtomicLong | ||
*/ | ||
public record LongGaugeMetric(AtomicLong value, LongGauge gauge) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should move this to some utility folder, because it may be useful in other places in ES?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved that into org.elasticsearch.telemetry.metric
. @pgomulka, please let us know if there is a better place for that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I think this is good place. Thank you @idegtiarenko
import java.util.concurrent.atomic.AtomicLong; | ||
|
||
/** | ||
* This wrapper allow us to record metric with APM (via LongGauge) while also access its current state via AtomicLong |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* This wrapper allow us to record metric with APM (via LongGauge) while also access its current state via AtomicLong | |
* This wrapper allow us to record metric with APM (via {@link LongGauge}) while also access its current state via {@link AtomicLong} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor comments on naming
); | ||
totalAllocations = LongGaugeMetric.create( | ||
meterRegistry, | ||
"es.allocator.desired_balance.total_allocations", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is a cumulative sum (i.e. it's never resetted), I would suggest to flip it: es.allocator.desired_balance.allocations.total
Similarly, es.allocator.desired_balance.allocations.undesired.count
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not a cumulative sum. It is a last observed value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or es.allocator.desired_balance.allocations.undesired.total
if (as I suppose, since you have a ratio/fraction) they are both cumulative sums
"count" | ||
); | ||
undesiredAllocationsFraction = meterRegistry.registerDoubleGauge( | ||
"es.allocator.desired_balance.undesired_allocations_fraction", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an interesting example that I will add to the guidelines, so we have a suffix for that. I think "fraction" is OK, but I would suggest "ratio" as it's more compact. So either es.allocator.desired_balance.undesired_allocations.fraction
or es.allocator.desired_balance.undesired_allocations.ratio
(or es.allocator.desired_balance.allocations.undesired.ratio
if you go with the suggestion above)
# Conflicts: # server/src/main/java/org/elasticsearch/cluster/routing/allocation/allocator/DesiredBalanceShardsAllocator.java
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@elasticsearchmachine please run elasticsearch-ci/bwc-snapshots |
@elasticsearchmachine please run elasticsearch-ci/part-1 |
This change exposes new reconciliation metrics to APM.
Related to: ES-7295