proposal to new tags/labels natively metrics system #44433
Draft
wbpcode wants to merge 1 commit into
Signed-off-by: wbpcode/wangbaiping <wbphub@gmail.com>
DON'T try to review this PR. It's dirty and has lots of concepts that haven't been verified. I will split it into reasonable pieces, then test and verify them so that a reasonable review is possible. For now, only check the description to see whether the proposal makes sense to you.
cc @zirain
This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!
Background
See also #20289 for the broader discussion.
Envoy's current stats model starts from a flattened string name such as cluster.foo.upstream_rq_total, and later tries to recover structure from that string by running tag extraction rules. That model gave good compatibility with StatsD and workable Prometheus output, but it has two fundamental problems:
.. A cluster name such as payments.v1 should be one logical value, but it is embedded in a dotted stat path and must later be recovered heuristically.
share the same structure.
We already have *WithTags APIs, but they are additive. They do not solve the main problem for the bulk of Envoy stats, which are still created from flattened names.
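The recovery problem can be illustrated with a deliberately naive extraction rule. This is only a sketch: the regular expression and the function name `extractClusterNameNaively` are hypothetical and are not Envoy's actual tag extraction rules. It shows why a dotted value is hard to recover once it has been flattened into the stat name.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical, simplified rule: the first dotted token after "cluster." is
// taken to be the cluster name. This is not Envoy's actual extraction rule.
inline std::string extractClusterNameNaively(const std::string& flat_name) {
  static const std::regex rule(R"(^cluster\.([^.]+)\.)");
  std::smatch match;
  if (std::regex_search(flat_name, match, rule)) {
    return match[1].str();
  }
  return "";  // No cluster prefix found.
}

// Usage sketch: call this from a test or from main().
inline void naiveExtractionExample() {
  // Works when the cluster name contains no dots.
  assert(extractClusterNameNaively("cluster.foo.upstream_rq_total") == "foo");
  // A dotted cluster name is cut short: the flat string no longer says that
  // "payments.v1" was meant to be one value.
  assert(extractClusterNameNaively("cluster.payments.v1.upstream_rq_total") == "payments");
}
```

A smarter rule can handle this particular case, but it still has to guess where the value ends, which is the heuristic step the proposal wants to avoid.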
Goal
time instead of recovering them later.
Non-goal
Core idea
Instead of creating a stat from one opaque string and then reverse-engineering meaning from that
string, we create a stat from a structured sequence of elements.
Each metric is built from one source of truth and yields three related outputs: the legacy full name, the canonical name, and the explicit tag set.
The key change is that the canonical name and tags are produced directly from the structured input,
not inferred later from the full name.
StatElement
StatElement is the basic building block for the new API. It can be read as:
value_: the value of the element.
name_: the tag key, when the element is a tag.
ignore_name_: if true, keep the tag semantically, but do not emit the tag key into the legacy flattened name.
This gives three useful forms:
Plain path element
{.value_ = "upstream_rq_total"}
This contributes to both the full name and the canonical name.
Named tag
{.value_ = route_name, .name_ = well_known.route_}
This contributes a tag route=<route_name>, and also contributes route.<route_name> to the legacy full name.
Compatibility tag
{.value_ = cluster_name, .name_ = well_known.cluster_name_, .ignore_name_ = true}
This contributes a tag cluster_name=<cluster_name>, but the legacy name stays cluster.<cluster_name>... instead of becoming cluster.cluster_name.<cluster_name>....
That last form is what lets us preserve existing flattened names while still making the tag explicit.
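The three forms can be sketched as a small per-element rule. This is an illustration under assumptions, not the code in this PR: the `StatElement` below is a hypothetical stand-in that uses `std::string` where the real type is expected to hold interned names, and `legacyTokens` and `isTag` are made-up helper names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the proposal's StatElement. The real type is
// expected to hold interned StatName values, not std::string.
struct StatElement {
  std::string value_;
  std::string name_;          // Empty: plain path element. Non-empty: tag key.
  bool ignore_name_ = false;  // True: keep the tag, hide the key in the legacy name.
};

// Tokens that one element contributes to the legacy flattened name.
inline std::vector<std::string> legacyTokens(const StatElement& e) {
  if (e.name_.empty() || e.ignore_name_) {
    return {e.value_};  // Plain path element or compatibility tag.
  }
  return {e.name_, e.value_};  // Named tag: key, then value.
}

// Whether the element also yields an explicit tag name_=value_.
inline bool isTag(const StatElement& e) { return !e.name_.empty(); }

// Usage sketch: one element per form.
inline void threeFormsExample() {
  const StatElement plain{"upstream_rq_total", "", false};
  const StatElement named{"r1", "route", false};
  const StatElement compat{"payments.v1", "cluster_name", true};

  assert(!isTag(plain) && legacyTokens(plain).size() == 1);
  assert(isTag(named) && legacyTokens(named).size() == 2);    // "route", "r1"
  assert(isTag(compat) && legacyTokens(compat).size() == 1);  // "payments.v1" only
}
```

The compatibility form is the only one where the tag and the legacy name disagree about whether the key is visible, which is what `ignore_name_` encodes.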
New scope API
The new API lets a scope carry structured prefix information instead of only a flat string prefix.
The design intent is:
createScope(std::string) remains the legacy entry point.
createScope(StatElementViewSpan) is convenient for configuration-time code that starts from string views.
createScope(StatElementSpan) and getOrCreate* are the structured API for code that already has interned stat names and wants to avoid flatten-then-recover behavior.
A scope created from structured elements keeps those elements as its prefix. Child scopes and child
metrics append more structured elements to that prefix. The final metric name, canonical name, and
tag set are all derived from the combined sequence.
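The prefix behaviour described above can be sketched as plain sequence concatenation. Everything here is hypothetical: `SketchScope`, `elementsFor`, and the `std::string`-based `StatElement` are illustration names, and the real scope interface in this PR may look quite different.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the proposal's StatElement.
struct StatElement {
  std::string value_;
  std::string name_;
  bool ignore_name_ = false;
};

// Hypothetical scope that stores structured elements as its prefix.
class SketchScope {
public:
  explicit SketchScope(std::vector<StatElement> prefix) : prefix_(std::move(prefix)) {}

  // A child scope keeps the parent's prefix and appends its own elements.
  SketchScope createScope(const std::vector<StatElement>& extra) const {
    return SketchScope(combine(extra));
  }

  // The full element sequence for a metric created inside this scope.
  std::vector<StatElement> elementsFor(const std::vector<StatElement>& suffix) const {
    return combine(suffix);
  }

private:
  std::vector<StatElement> combine(const std::vector<StatElement>& tail) const {
    std::vector<StatElement> all = prefix_;
    all.insert(all.end(), tail.begin(), tail.end());
    return all;
  }

  std::vector<StatElement> prefix_;
};

// Usage sketch: the scope prefix and the metric suffix form one sequence.
inline void scopePrefixExample() {
  const SketchScope cluster_scope(std::vector<StatElement>{
      {"cluster", "", false}, {"payments.v1", "cluster_name", true}});
  const std::vector<StatElement> elements =
      cluster_scope.elementsFor({{"upstream_rq_total", "", false}});
  assert(elements.size() == 3);
  assert(elements[1].name_ == "cluster_name");
  assert(elements[2].value_ == "upstream_rq_total");
}
```

Because nothing is flattened at scope creation, the tag carried by the prefix is still available when the metric is finally created.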
So the new model is:
Well-known tag names
For hot-path code, the tag keys should not be raw strings. Stats::Context provides interned, well-known tag names:
This keeps common tag keys centralized and avoids repeatedly constructing the same tag-name symbols.
StatElementView still has value for config-time or cold-path code, where string input is natural. But the long-term model is that frequently used stats should be assembled from StatName-backed elements, not raw strings.
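The idea of interning tag keys once and reusing them can be sketched as follows. This is a toy model with hypothetical names (`SketchSymbolTable`, `SketchWellKnownTags`); Envoy's real symbol table and the actual members of Stats::Context are not shown here and may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical minimal interner: each distinct string gets one small id.
class SketchSymbolTable {
public:
  using Symbol = std::uint32_t;

  Symbol intern(const std::string& text) {
    const auto it = ids_.find(text);
    if (it != ids_.end()) {
      return it->second;  // Already interned: reuse the id.
    }
    const Symbol id = static_cast<Symbol>(names_.size());
    names_.push_back(text);
    ids_.emplace(text, id);
    return id;
  }

  const std::string& toString(Symbol symbol) const { return names_.at(symbol); }

private:
  std::unordered_map<std::string, Symbol> ids_;
  std::vector<std::string> names_;
};

// Hypothetical counterpart of the well-known tag names: built once, then
// shared by hot-path code so that no tag key is re-created per request.
struct SketchWellKnownTags {
  explicit SketchWellKnownTags(SketchSymbolTable& table)
      : cluster_name_(table.intern("cluster_name")), route_(table.intern("route")) {}

  const SketchSymbolTable::Symbol cluster_name_;
  const SketchSymbolTable::Symbol route_;
};

// Usage sketch: hot-path code compares ids instead of building strings.
inline void wellKnownTagsExample() {
  SketchSymbolTable table;
  const SketchWellKnownTags well_known(table);
  assert(table.intern("cluster_name") == well_known.cluster_name_);
  assert(well_known.cluster_name_ != well_known.route_);
  assert(table.toString(well_known.route_) == "route");
}
```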
Cluster scope example
The cluster scope is a good example because it shows the compatibility requirement clearly.
Assume the cluster name is payments.v1.
Step 1: create the cluster scope
Semantically, that means:
"cluster" is a normal path element.
"payments.v1" is the value of the cluster_name tag.
ignore_name_ = true says: keep the tag, but preserve the legacy flattened path shape.
At this point, the scope prefix represents:
legacy name prefix: cluster.payments.v1
canonical name prefix: cluster
tags: cluster_name="payments.v1"
Step 2: create a metric inside that scope
Now the combined structured input is effectively:
[
  {.value_ = "cluster"},
  {.value_ = "payments.v1", .name_ = "cluster_name", .ignore_name_ = true},
  {.value_ = "upstream_rq_total"},
]
From that, the metric becomes:
full name: cluster.payments.v1.upstream_rq_total
canonical name: cluster.upstream_rq_total
tags: cluster_name="payments.v1"
This is the important property of the design:
The legacy full name is still cluster.payments.v1.upstream_rq_total.
The tag is explicit: cluster_name="payments.v1".
payments.v1 is carried as one logical tagged value from the start, rather than discovered later by inspecting the string name.
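The walk-through above can be checked with a small derivation sketch. The types and the `derive` function are hypothetical and use `std::string` instead of interned names. It also assumes that a tagged element contributes nothing to the canonical name, which matches the cluster example but is not spelled out for every form in this description.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the proposal's StatElement.
struct StatElement {
  std::string value_;
  std::string name_;
  bool ignore_name_ = false;
};

// The three outputs derived from one element sequence.
struct DerivedStat {
  std::string full_name_;
  std::string canonical_name_;
  std::map<std::string, std::string> tags_;
};

inline void appendToken(std::string& out, const std::string& token) {
  if (!out.empty()) {
    out += '.';
  }
  out += token;
}

inline DerivedStat derive(const std::vector<StatElement>& elements) {
  DerivedStat result;
  for (const StatElement& e : elements) {
    if (e.name_.empty()) {
      // Plain path element: part of both names.
      appendToken(result.full_name_, e.value_);
      appendToken(result.canonical_name_, e.value_);
      continue;
    }
    // Tagged element: explicit tag; assumed to stay out of the canonical name.
    result.tags_[e.name_] = e.value_;
    if (!e.ignore_name_) {
      appendToken(result.full_name_, e.name_);
    }
    appendToken(result.full_name_, e.value_);
  }
  return result;
}

// Usage sketch: the cluster scope prefix plus one metric element.
inline void clusterScopeExample() {
  std::vector<StatElement> elements = {
      {"cluster", "", false},
      {"payments.v1", "cluster_name", true},
  };
  elements.push_back({"upstream_rq_total", "", false});

  const DerivedStat stat = derive(elements);
  assert(stat.full_name_ == "cluster.payments.v1.upstream_rq_total");
  assert(stat.canonical_name_ == "cluster.upstream_rq_total");
  assert(stat.tags_.at("cluster_name") == "payments.v1");
}
```

The dotted cluster name never needs to be split or re-joined: it enters as one value and leaves as one tag value.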
Summary
The proposal is to make structured stat construction the primary API.
StatElement describes metric structure explicitly. The new scope APIs let that structure be carried in scope prefixes and metric suffixes. The store then derives the legacy full name, canonical metric identity, and explicit tags from one source of truth.
The cluster scope example shows the intended outcome: keep cluster.<name>.* exactly as it exists today for compatibility, while also making cluster_name=<name> explicit and reliable without any post-hoc extraction.
Graceful migration
The new API is additive. I will add a new CLI parameter or environment variable option to enable the new API for specific scopes, and then migrate call sites in those scopes at a reasonable pace. The old API will continue to work until we remove it (which will be a very long time in the future). The cluster scope will be the first migration candidate, and I will use it as a test case to validate the design and implementation of the new API before migrating other scopes.