Track node and service counts in the state store and emit them periodically as metrics #8603

Merged: 9 commits, Sep 2, 2020

@crhino crhino commented Sep 2, 2020

OSS PR usage metrics.

This PR emits 3 new metrics: consul.state.nodes, consul.state.services, and consul.state.service_instances. All three are gauges that track the number of registered entities in the Consul state store.

As part of this PR, we add functionality to the state store to keep track of the number of elements in a table. This is done by using the change tracking functionality of memdb and a new usage table. As a result, we do not need to iterate through the entire table just to retrieve the number of nodes or service instances.
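The counting approach described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code in this PR: `Change`, `UsageTable`, and `Apply` are hypothetical stand-ins for memdb's tracked changes and the new usage table.

```go
package main

import "fmt"

// Change is a hypothetical stand-in for one tracked change in a write
// transaction. Before is nil for an insert and After is nil for a delete.
type Change struct {
	Table  string
	Before interface{}
	After  interface{}
}

// UsageTable is a hypothetical stand-in for the usage table: it maps a
// table name to the number of entries currently stored in that table.
type UsageTable map[string]int

// Apply folds the changes of one committed transaction into the counts.
// Updates (Before and After both set) leave the count unchanged.
func (u UsageTable) Apply(changes []Change) {
	for _, c := range changes {
		switch {
		case c.Before == nil && c.After != nil:
			u[c.Table]++ // insert
		case c.Before != nil && c.After == nil:
			u[c.Table]-- // delete
		}
	}
}

func main() {
	usage := UsageTable{}

	// First transaction: two inserts and one in-place update.
	usage.Apply([]Change{
		{Table: "nodes", After: "node-1"},
		{Table: "nodes", After: "node-2"},
		{Table: "nodes", Before: "node-1", After: "node-1-updated"},
	})

	// Second transaction: one delete.
	usage.Apply([]Change{{Table: "nodes", Before: "node-2"}})

	// Reading the count is a single lookup, not a table scan.
	fmt.Println("nodes:", usage["nodes"])
}
```

The point of the design is that the cost of keeping the count is proportional to the size of each transaction, while reading it stays constant regardless of table size.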

The new functionality is located in agent/consul/state/usage*.go and agent/consul/usagemetrics/usagemetrics*.go.

In addition, a number of state store methods were modified to accept interfaces, either ReadTxn or WriteTxn depending on what each function requires.
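To illustrate why accepting interfaces helps, here is a small sketch. The `ReadTxn` and `WriteTxn` definitions below are simplified assumptions made for this example and do not reproduce the method sets in the state store; `mapTxn`, `nodeCount`, and `addNodes` are likewise hypothetical.

```go
package main

import "fmt"

// ReadTxn is a simplified, hypothetical read-only transaction interface.
type ReadTxn interface {
	Get(table, key string) (int, bool)
}

// WriteTxn is a simplified, hypothetical interface that adds writes on
// top of the read operations.
type WriteTxn interface {
	ReadTxn
	Insert(table, key string, value int)
}

// mapTxn is a toy in-memory transaction that satisfies both interfaces.
type mapTxn struct {
	data map[string]map[string]int
}

func newMapTxn() *mapTxn {
	return &mapTxn{data: map[string]map[string]int{}}
}

func (t *mapTxn) Get(table, key string) (int, bool) {
	v, ok := t.data[table][key]
	return v, ok
}

func (t *mapTxn) Insert(table, key string, value int) {
	if t.data[table] == nil {
		t.data[table] = map[string]int{}
	}
	t.data[table][key] = value
}

// nodeCount only reads, so it asks for a ReadTxn.
func nodeCount(tx ReadTxn) int {
	n, _ := tx.Get("usage", "nodes")
	return n
}

// addNodes mutates the usage entry, so it asks for a WriteTxn.
func addNodes(tx WriteTxn, delta int) {
	n, _ := tx.Get("usage", "nodes")
	tx.Insert("usage", "nodes", n+delta)
}

func main() {
	tx := newMapTxn()
	addNodes(tx, 3)
	addNodes(tx, -1)
	fmt.Println("nodes:", nodeCount(tx))
}
```

Each function's signature states the minimum capability it needs, and any transaction wrapper that implements the methods can be passed in.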

We update the usage table on Commit() by using the TrackedChanges() API
of memdb.

Track memdb changes on restore so that usage data can be compiled.

Using the newly provided state store methods, we periodically emit usage
metrics from the servers.
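A rough sketch of such a periodic reporter is shown below. It is an illustration under assumptions, not the code in agent/consul/usagemetrics: the `Usage` and `Reporter` types, the `Read` and `SetGauge` callbacks, and the string-keyed gauge sink are all hypothetical, and the short interval is chosen only so the demo finishes quickly.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Usage is a hypothetical snapshot of counts read from the state store.
type Usage struct {
	Nodes            int
	Services         int
	ServiceInstances int
}

// Reporter periodically reads usage counts and emits them as gauges.
// Read and SetGauge are injected so the sketch stays self-contained.
type Reporter struct {
	Interval time.Duration
	Read     func() Usage
	SetGauge func(name string, value float32)
}

// emit reads the current counts once and reports all three gauges.
func (r *Reporter) emit() {
	u := r.Read()
	r.SetGauge("consul.state.nodes", float32(u.Nodes))
	r.SetGauge("consul.state.services", float32(u.Services))
	r.SetGauge("consul.state.service_instances", float32(u.ServiceInstances))
}

// Run emits on every tick until ctx is cancelled.
func (r *Reporter) Run(ctx context.Context) {
	ticker := time.NewTicker(r.Interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			r.emit()
		}
	}
}

func main() {
	r := &Reporter{
		Interval: 10 * time.Millisecond, // demo value only
		Read: func() Usage {
			return Usage{Nodes: 3, Services: 2, ServiceInstances: 5}
		},
		SetGauge: func(name string, value float32) {
			fmt.Println(name, value)
		},
	}

	// Stop the reporter after a short while so the example terminates.
	ctx, cancel := context.WithTimeout(context.Background(), 35*time.Millisecond)
	defer cancel()
	r.Run(ctx)
}
```

Because each tick only reads precomputed counts, the reporter's cost does not grow with the number of registered nodes or services.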

We decided to emit these metrics from all servers, not just the leader,
because that means we do not have to care about leader election flapping
causing metrics turbulence, and it seems reasonable for each server to
emit its own view of the state, even if they should always converge
rapidly.

We add a WriteTxn interface for use in updating the usage memdb table,
with the forward-looking prospect of incrementally converting other
functions to accept interfaces.

As well, we use the ReadTxn in new usage code, and as a side effect
convert a couple of existing functions to use that interface as well.

This commit refactors the state store usage code to track unique service
name changes on transaction commit. This means we only need to look up
usage entries when reading the information, as opposed to iterating over
a large number of service indices.
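One way to picture the commit-time bookkeeping is the sketch below. It is a simplified model rather than the implementation in this PR: `InstanceChange` and `ServiceUsage` are hypothetical, and the `perName` map stands in for whatever index lookup the state store uses to tell whether a service name has zero or some instances.

```go
package main

import "fmt"

// InstanceChange is a hypothetical record of one service instance change
// within a transaction. Before and After hold the instance's service
// name; an empty string means the instance did not exist on that side.
type InstanceChange struct {
	Before string
	After  string
}

// ServiceUsage holds the totals that would live in the usage table.
// perName is a stand-in for an index lookup on the services table.
type ServiceUsage struct {
	Services  int
	Instances int
	perName   map[string]int
}

func NewServiceUsage() *ServiceUsage {
	return &ServiceUsage{perName: map[string]int{}}
}

// Commit folds one transaction's changes into the totals. A rename is
// treated as a removal from the old name plus an addition to the new one.
func (u *ServiceUsage) Commit(changes []InstanceChange) {
	deltas := map[string]int{}
	for _, c := range changes {
		if c.Before == c.After {
			continue // update that keeps the same service name
		}
		if c.Before != "" {
			deltas[c.Before]--
		}
		if c.After != "" {
			deltas[c.After]++
		}
	}

	for name, d := range deltas {
		before := u.perName[name]
		after := before + d
		u.Instances += d

		// The unique service count only changes when a name crosses
		// the boundary between zero and at least one instance.
		switch {
		case before == 0 && after > 0:
			u.Services++
		case before > 0 && after == 0:
			u.Services--
		}

		if after == 0 {
			delete(u.perName, name)
		} else {
			u.perName[name] = after
		}
	}
}

func main() {
	u := NewServiceUsage()
	u.Commit([]InstanceChange{{After: "web"}, {After: "web"}, {After: "db"}})
	u.Commit([]InstanceChange{{Before: "db", After: "cache"}}) // rename
	u.Commit([]InstanceChange{{Before: "web"}})                // one of two removed
	fmt.Println("services:", u.Services, "instances:", u.Instances)
}
```

Readers of the usage data then need a single lookup of the stored totals instead of a pass over every service index.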

- Take into account a service instance's name being changed
- Do not iterate through the entire list of service instances; we only care
  about whether there are 0, 1, or more than 1.

This reporting interval is below the 10 second interval that lib/telemetry.go implements as
its aggregation interval, ensuring that we always report these metrics.

@mkeeler mkeeler left a comment

:shipit:


3 participants