
[Task]: Migrate the go.d Prometheus collector to Framework V2 (compatibility-only) #22635

@ilyam8

Problem / root cause

The go.d prometheus collector is still on Framework V1: it registers CreateCollectorV1 and exposes Collect(ctx) map[string]int64
(src/go/plugin/go.d/collector/prometheus/init.go:29, collector.go:128). V2 (metrix store, chart templates, host scopes) is required for relabeling and profiles. The durable
compatibility contract is the chart context (prometheus.<metric> / prometheus.<app>.<metric>, charts.go:262-267); chart-ID strings are not part of that contract. Two framework gaps
block a clean V2 migration:

  • FW-1: autogen emits a bare <metric> context; context_namespace is honored only by the template compiler, not autogen (chartengine/autogen.go:604, compiler.go:35).
  • FW-2: strict metrix rejects/panics on NaN/sparse summary/histogram input (pkg/metrix/summary.go:188-230, histogram.go:200), but scraped summaries legitimately have NaN quantiles.
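
To make the context contract concrete, here is a minimal Go sketch of the naming rule that FW-1 has to reproduce in autogen. The function name contextFor and its signature are hypothetical; only the two context shapes (prometheus.<metric> and prometheus.<app>.<metric>) come from this issue.

```go
package main

import "fmt"

// contextFor builds a chart context following the contract described in this
// issue: prometheus.<metric> when no app is configured, and
// prometheus.<app>.<metric> otherwise.
// The name and signature are illustrative, not the collector's actual API.
func contextFor(app, metric string) string {
	if app == "" {
		return "prometheus." + metric
	}
	return "prometheus." + app + "." + metric
}

func main() {
	fmt.Println(contextFor("", "http_requests_total"))      // prometheus.http_requests_total
	fmt.Println(contextFor("myapp", "http_requests_total")) // prometheus.myapp.http_requests_total
}
```

A bare <metric> context, which autogen emits today, would fail a golden test written against this rule.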

Clean end state

A V2 collector (CreateV2, metrix.CollectorStore, Collect(ctx) error, MetricStore(), ChartTemplateYAML(); no reachable V1 path) that:

  • gauges/counters → flat + autogen; histograms/summaries → native metrix.Histogram/Summary; float dimensions;
  • preserves V1 contexts (via FW-1), values, dimensions, algorithms, _info skip, selector/limit ordering, fallback types, update_every:10, lifecycle (expire_after_cycles:10), and
    label_prefix/app;
  • native hist/summary done robustly: pre-validate before instrument declaration and ObservePoint (metrix panics at both; cache accepted schema by name); schema drift → skip +
    rate-limited log; absent/NaN component → gap (FW-2);
  • chart-ID drift accepted; src/health/REFERENCE.md examples updated with real post-migration IDs.
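
The "absent/NaN component → gap" rule can be sketched as follows. This is only an illustration of the intended behavior, not the metrix API: the quantile type and splitQuantiles function are hypothetical, and how a gap is actually represented is left to the FW-2 design note.

```go
package main

import (
	"fmt"
	"math"
)

// quantile is a hypothetical stand-in for one scraped summary quantile.
type quantile struct {
	Q     float64 // quantile rank, e.g. 0.5 or 0.99
	Value float64 // scraped value; NaN when the target has no observations
}

// splitQuantiles separates quantiles with usable values from those that
// should be rendered as a dimension gap. A NaN value is treated as a gap
// rather than an error, so one empty quantile does not discard the summary.
func splitQuantiles(qs []quantile) (present []quantile, gaps []float64) {
	for _, q := range qs {
		if math.IsNaN(q.Value) {
			gaps = append(gaps, q.Q)
			continue
		}
		present = append(present, q)
	}
	return present, gaps
}

func main() {
	scraped := []quantile{
		{Q: 0.5, Value: 0.012},
		{Q: 0.9, Value: math.NaN()},
		{Q: 0.99, Value: 0.250},
	}
	present, gaps := splitQuantiles(scraped)
	fmt.Println("present:", len(present), "gaps:", gaps)
}
```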

Acceptance criteria

  • PR2 — compat manifest + spike (baseline, no runtime change; first PR of this issue, after the parser). Merges durable golden tests capturing current V1 behavior (contexts
    primary; values float-tolerant; dims/divisors/labels+label_prefix/lifecycle/_info/fallback/selector+limit order/config defaults incl. update_every:10). Enumerates ALL current config
    drift — autodetection_retry (schema 60 / metadata 0 / runtime unset), expected_prefix+app (in config_schema.json:39,49, omitted from metadata.yaml), Counter vs counter casing
    (config_schema.json:130 vs collector.go:63). The spike is investigation (design notes for the autogen-context mechanism + the groups-less question); it does NOT merge an
    emitted-context proof — that lands in PR3.
  • PR3 — FW-1 (autogen context-namespace), Full Design Gate. Autogen emits prometheus[.app].<metric> via a per-job override; a test asserts the emitted context.
  • PR4 — FW-2 (metrix gap-on-absent), Full Design Gate. Native Summary/Histogram emit a dimension gap for an absent/NaN quantile/bucket; partial + all-empty tests; strict validation
    intact for other collectors.
  • PR5 — V2 migration. PR2 manifest passes against V2; native hist/summary with pre-validation (NaN/Inf count/sum, duplicate quantiles, non-monotonic buckets, reserved
    le/quantile user labels → drop+log) and schema-drift skip+log; flat gauge/counter; float; label_prefix/app preserved; raw label values accepted (confirm cosmetic); update_every
    stays 10; no V1 path; REFERENCE.md updated; Counter/counter casing fixed; per-integration consistency artifacts.
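
The PR5 pre-validation checks could look roughly like the sketch below. All types and function names are hypothetical. It also assumes that "non-monotonic buckets" covers both upper bounds that are not strictly increasing and cumulative counts that decrease; the issue does not spell this out.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// bucket and histogramSample are hypothetical stand-ins for a parsed histogram.
type bucket struct {
	UpperBound      float64
	CumulativeCount float64
}

type histogramSample struct {
	Count, Sum float64
	Buckets    []bucket          // assumed to be in scrape order
	Labels     map[string]string // user labels, without the synthetic "le"
}

func finite(v float64) bool { return !math.IsNaN(v) && !math.IsInf(v, 0) }

// validateHistogram returns an error for input that should be dropped and
// logged instead of being handed to a strict instrument.
func validateHistogram(h histogramSample) error {
	if !finite(h.Count) || !finite(h.Sum) {
		return errors.New("non-finite count or sum")
	}
	if _, ok := h.Labels["le"]; ok {
		return errors.New(`reserved label "le" used as a user label`)
	}
	for i := 1; i < len(h.Buckets); i++ {
		prev, cur := h.Buckets[i-1], h.Buckets[i]
		if cur.UpperBound <= prev.UpperBound {
			return errors.New("bucket upper bounds are not strictly increasing")
		}
		// A NaN count compares false here and is not rejected; the issue
		// treats an absent/NaN bucket as a gap (FW-2), not as malformed input.
		if cur.CumulativeCount < prev.CumulativeCount {
			return errors.New("cumulative bucket counts decrease")
		}
	}
	return nil
}

// validateSummaryQuantiles rejects a summary that repeats a quantile rank.
func validateSummaryQuantiles(ranks []float64) error {
	seen := make(map[float64]struct{}, len(ranks))
	for _, q := range ranks {
		if _, dup := seen[q]; dup {
			return fmt.Errorf("duplicate quantile %v", q)
		}
		seen[q] = struct{}{}
	}
	return nil
}

func main() {
	h := histogramSample{
		Count: 3, Sum: 1.2,
		Buckets: []bucket{
			{UpperBound: 0.1, CumulativeCount: 2},
			{UpperBound: 0.5, CumulativeCount: 1}, // decreases: malformed
		},
	}
	fmt.Println(validateHistogram(h))
	fmt.Println(validateSummaryQuantiles([]float64{0.5, 0.9, 0.5}))
}
```

Returning an error rather than panicking keeps the drop+log decision in the collector, which is the point of validating before instrument declaration and ObservePoint.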

Category

refactor

Scope boundaries

IN: compat-only V2 migration + FW-1 + FW-2 + the manifest/spike baseline. OUT: relabeling, profiles, the parser rewrite (Issue 1), new labels/host-scopes/Functions. FW-1/FW-2 are Full
Design Gate framework changes (separate gated PRs; design note + approval); they are general-purpose and may be split into standalone framework issues.

Validation

Golden manifest (contexts primary, values float-tolerant); emitted-context tests (FW-1); gap tests (FW-2); pre-validation + drift tests; real-node run; consistency/CI.
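
As an illustration of "values float-tolerant", a golden comparison might use a relative tolerance like the sketch below. The helper name approxEqual and the tolerance value are assumptions; the issue does not specify how tolerance is defined.

```go
package main

import (
	"fmt"
	"math"
)

// approxEqual reports whether got and want agree within a relative tolerance.
// Exact equality is checked first so that identical values (including zero
// and infinities of the same sign) always match. NaN never matches anything
// here; a golden test that expects gaps would need to handle that separately.
func approxEqual(got, want, relTol float64) bool {
	if got == want {
		return true
	}
	diff := math.Abs(got - want)
	scale := math.Max(math.Abs(got), math.Abs(want))
	return diff <= relTol*scale
}

func main() {
	a, b := 0.1, 0.2
	// a+b differs from 0.3 only by float64 rounding, so it is accepted.
	fmt.Println(approxEqual(a+b, 0.3, 1e-9))
	// A one percent difference is well outside the tolerance.
	fmt.Println(approxEqual(100, 101, 1e-9))
}
```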

Risks / compatibility

Chart-ID drift breaks chart-ID references (contexts preserved; docs updated). Native hist/summary panics if fed malformed input → pre-validation mandatory. Schema drift drops+logs the
series (rare; accepted minor data loss). FW-1/FW-2 are high-blast-radius (all collectors) → design note + approval first.
