hot restart: propagate programmatic stat tags across restart#45674
hot restart: propagate programmatic stat tags across restart#45674bpalermo wants to merge 3 commits into
Conversation
|
Hi @bpalermo, welcome and thank you for your contribution. We will try to review your Pull Request as quickly as possible. In the meantime, please take a look at the contribution guidelines if you have not done so already. |
Stats created via *FromStatNameWithTags inline their tag values into the
stat name. The hot restart stat merge re-created such stats from the name
alone with no tags, so after a restart the tag values collapsed into the
metric name and the labels went empty (e.g. as exported to Prometheus).
This resolves the long-standing "TODO(snowp): Propagate tag values during
hot restarts".
The parent now transmits the tag-extracted name and tag values for stats
that carry programmatic tags; the child re-creates them with identical
labels via new Scope::{counter,gauge}FromMergedStatName methods (default
implementations fall back to the prior name-derived behavior, so other
Scope implementations are unaffected). Guarded by
envoy.reloadable_features.hot_restart_propagate_stat_tags.
Signed-off-by: Bruno Palermo <b@palermo.dev>
0bdc37d to
481e3ea
Compare
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request addresses a bug where stats created with programmatic tags lost their labels across hot restarts. It introduces mechanisms to transmit tag-extracted names and tag values from the parent process to the child, allowing the child to correctly re-create the stats with identical labels. The review feedback focuses on minor performance and readability optimizations, including reserving space in maps and protobuf repeated fields to prevent unnecessary reallocations, and simplifying absl::Span construction by utilizing implicit conversions.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
|
please fix format and address ai-gen comments. |
|
/wait |
Resolves conflicts with the new tag-aware scope API (envoyproxy#45146): counterFromMergedStatName/gaugeFromMergedStatName now default to the tag-aware counterFromTaggedName/gaugeFromTaggedName (which retain tag metadata on tag-aware scopes and isolated stores), and the legacy ThreadLocalStore ScopeImpl overrides them via the tag-aware TagStatNameJoiner, so the getOrCreate*Base refactor is no longer needed. Signed-off-by: Bruno Palermo <b@palermo.dev>
…add tests - Export tag metadata from the hot restart parent only for stats whose tags the child cannot re-derive from the name via tag extraction (regex-extracted tags, including the default tag regexes, are reproduced by the child during the merge), keeping the stats transfer payload unchanged for stats without programmatic tags. Adds a nullable Store::tagProducer() accessor to support the comparison. - Reserve map/repeated-field capacity in the tag conversion paths (review feedback). - Fix spelling flagged by CI (inlines -> embeds). - New tests: parent-side gating, end-to-end tag propagation through export/merge, runtime-flag-off legacy behavior, gauge merge path, and direct coverage of the merged-stat creation methods on legacy and tag-aware scopes. Signed-off-by: Bruno Palermo <b@palermo.dev>
|
Updated — changes since the review:
cc @paul-r-gall |
Commit Message: hot restart: propagate programmatic stat tags across restart
Additional Description:
Stats created with programmatic tags (via
Scope::*FromStatNameWithTags/ the tag-aware*FromTaggedNameAPI) embed their tag values into the flat stat name while keeping a cleantagExtractedNameand a set of tags. This works on a fresh process, but breaks across a hot restart:HotRestartingParent::Internal::exportStatsToChild);tags()/tagExtractedName()are dropped.StatMerger::mergeCountersre-creates the stat from the name alone, which runs only regex tag extraction. With no matchingstats_tagsregex, the merged stat gets empty tags andtagExtractedName == the full mangled name.This is the gap behind the long-standing
// TODO(snowp): Propagate tag values during hot restartsinstat_merger.cc. Regex-extracted tags survive only because extraction re-derives them from the name; programmatic tags have nothing to re-derive from.This change actually propagates the tags:
hot_restart.proto: newTaggedMetric(tag-extracted name + repeatedTag) andcounter_tags/gauge_tagsmaps onReply.Stats.TagProduceron the flat name and skips the stat when extraction reproduces the stored tags (the common case — including all default tag regexes). The stats transfer payload is therefore unchanged except for stats that actually carry programmatic tags. A nullableStore::tagProducer()accessor (defaultnullptr; onlyThreadLocalStoreImploverrides) supports the comparison.StatMergerre-creates the stat with the original tags via newScope::counterFromMergedStatName/gaugeFromMergedStatName. Following stats: new tags friendly scope based API #45146 these have non-virtual-breaking defaults that delegate to the tag-awarecounterFromTaggedName/gaugeFromTaggedName(which retain the metadata on tag-aware scopes andIsolatedStoreImpl); the legacyThreadLocalStoreScopeImpl— which intentionally drops tag metadata on the*FromTaggedNamepath because it cannot compose tag-extracted names with its prefix in general — overrides them to honor the components, which arrive fully resolved. The full name is still reconstructed viaDynamicContext(dynamic spans preserved), so the cache key matches the stat the child independently creates — the fix removes the first-write-wins poisoning rather than working around it.Risk Level: Medium — changes stats labels after a hot restart for tag-bearing stats. Runtime guarded.
Testing: New tests:
stat_merger_test(ProgrammaticTagsSurviveMerge,ProgrammaticGaugeTagsSurviveMerge,ProgrammaticTagsLostWithoutMetadatapinning the pre-fix symptom),hot_restarting_parent_test(ExportsTagMetadataOnlyWhenNotRederivable,TagMetadataAppliedToMergedStatsend-to-end,TagMetadataIgnoredWhenRuntimeFlagDisabled), andthread_local_store_test(MergedStatNameHonorsSuppliedTagson both the legacy and tag-aware scopes). Built/run via RBE.Docs Changes: N/A
Release Notes: Added (
changelogs/current/bug_fixes/stats__hot-restart-propagate-programmatic-stat-tags.rst).Runtime guard:
envoy.reloadable_features.hot_restart_propagate_stat_tags(revert to legacy name-derived behavior).