Skip to content

fix: prevent stale exemplars leaking to histogram _sum/_count#18056

Merged
aknuds1 merged 1 commit into
prometheus:mainfrom
aknuds1:fix/histogram-stale-exemplar-leak
Feb 15, 2026
Merged

fix: prevent stale exemplars leaking to histogram _sum/_count#18056
aknuds1 merged 1 commit into
prometheus:mainfrom
aknuds1:fix/histogram-stale-exemplar-leak

Conversation

@aknuds1
Copy link
Copy Markdown
Contributor

@aknuds1 aknuds1 commented Feb 10, 2026

Which issue(s) does the PR fix:

NONE

Does this PR introduce a user-facing change?

[BUGFIX] OTLP: Fix exemplars getting mixed between incorrect parts of a histogram

Summary

In addHistogramDataPoints, exemplars assigned to the +Inf bucket of one data point were carried over into the _sum and _count Append calls of the next data point via the shared appOpts. This happened because the AppenderV2 migration replaced per-call nil exemplar arguments with a shared appOpts struct that wasn't cleared between iterations.

The fix clears appOpts.Exemplars at the start of each loop iteration, restoring the nil-exemplar semantics that existed before the migration.

A regression test with two histogram data points verifies exemplars don't leak across data point boundaries.

…data points

In addHistogramDataPoints, exemplars assigned to the +Inf bucket of one
data point were carried over into the _sum and _count Append calls of
the next data point via the shared appOpts. Clear appOpts.Exemplars at
the start of each loop iteration to restore the nil-exemplar semantics
that existed before the AppenderV2 migration.

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Copy link
Copy Markdown
Member

@ArthurSens ArthurSens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice find, LGTM.

A suggestion for an easier-to-understand release note:

[BUGFIX] OTLP: Fix exemplars getting mixed between incorrect parts of a histogram.

I have a feeling that "histogram data point leaking" or "stale exemplars" are hard to understand for the usual Prometheus user. Folks who want to go deeper are free to take a look at the linked PR. That said, this is a nitpicking, feel free to ignore!

@aknuds1
Copy link
Copy Markdown
Contributor Author

aknuds1 commented Feb 15, 2026

Thanks, revised the release note @ArthurSens.

@aknuds1 aknuds1 merged commit 4fb6ce4 into prometheus:main Feb 15, 2026
55 of 57 checks passed
@aknuds1 aknuds1 deleted the fix/histogram-stale-exemplar-leak branch February 15, 2026 09:47
bwplotka added a commit that referenced this pull request Feb 23, 2026
* Isolate fix: Remove 5s sleep for 99% speedup. Discarded unwanted code.

Signed-off-by: 3Juhwan <13selfesteem91@naver.com>
Signed-off-by: Sammy Tran <sammyqtran@gmail.com>

* Add regex optimization for simple contains alternations

Signed-off-by: Casie Chen <casie.chen@grafana.com>

* enable experimental functions in promql benchmarks

Signed-off-by: Dan Cech <dcech@grafana.com>

* Fix a couple of broken links in configuration.md (#18045)

Signed-off-by: kakabisht <kakabisht07@gmail.com>

* promql: info function: support multiple name matchers (#17968)

* Add new test cases for multiple name matchers in PromQL info function
* Fix handling of multiple name matchers in PromQL info function

---------

Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>

* tsdb: Optimize LabelValues for sparse intersections (Fixes #14551)

Signed-off-by: Divyansh Mishra <divyanshmishra@Divyanshs-MacBook-Air-3.local>

* fix: handle ErrTooOldSample as 400 Bad Request in OTLP and v2 histogram write paths

The OTLP write handler and the PRW v2 histogram append path were missing
ErrTooOldSample from their error type checks, causing these errors to
fall through to the default case and return HTTP 500 Internal Server Error.
This triggered unnecessary retries in OTLP clients like the Python SDK.

The PRW v1 write handler (line 115) and the PRW v2 sample append path
(line 377) already correctly handle ErrTooOldSample as a 400, and this
change makes the remaining paths consistent.

Also adds ErrTooOldSample to the v1 sample/histogram log checks so
these errors are properly logged instead of silently returned.

Fixes #16645

Signed-off-by: Varun Chawla <varun_6april@hotmail.com>

* fix: prevent stale exemplars leaking to histogram _sum/_count across data points (#18056)

In addHistogramDataPoints, exemplars assigned to the +Inf bucket of one
data point were carried over into the _sum and _count Append calls of
the next data point via the shared appOpts. Clear appOpts.Exemplars at
the start of each loop iteration to restore the nil-exemplar semantics
that existed before the AppenderV2 migration.

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>

* tsdb: fix flaky TestBlockRanges by using explicit compaction

Replace polling loops (for range 100 { time.Sleep }) with explicit
db.Compact() calls after disabling background compaction, eliminating
CI flakiness on slow machines. Also fix incorrect overlap assertions
that were checking the wrong direction (LessOrEqual -> GreaterOrEqual).

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>

* rules: fix flaky TestAsyncRuleEvaluation on Windows (#17965)

Convert all timing-sensitive subtests of TestAsyncRuleEvaluation to use
synctest for deterministic testing. This fixes flakiness on Windows
caused by timer granularity and scheduling variance.

The timing assertions are preserved using synctest's fake time, which
allows accurate verification of sequential vs concurrent execution
timing without relying on wall-clock time.

Fixes #17961

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>

* Move krajorama to general maintainer (#18095)

He's been participating in the bug scrub for a year and provides
reviews all over the code base. Also fix name spelling.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

* promtool: fix --enable-feature flags ignored in check config and test rules (#18097)

Both are regressions from the parser refactoring in #17977.

- Fixes #18092
- Fixes #18093

Signed-off-by: Martin Valiente Ainz <64830185+tinitiuset@users.noreply.github.com>

* tsdb/wlog: Remove any temproary checkpoints when creating a Checkpoint (#17598)

* RemoveTmpDirs function to tsdbutil
* Refactor db to use RemoveTmpDirs and no longer cleanup checkpoint tmp dirs
* Use RemoveTmpDirs in wlog checkpoint to cleanup all checkpoint tmp folders
* Add tests for RemoveTmpDirs
* Ensure db.Open will still cleanup extra temporary checkpoints

Signed-off-by: Kyle Eckhart <kgeckhart@users.noreply.github.com>

* chore(lint): enable wg.Go

Since our minimum supported go version is now go 1.25, we can use wg.Go.

Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>

* chore: enable staticcheck linter and update golangci-lint to 2.10.1

Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>

* PromQL: Add experimental histogram_quantiles variadic function (#17285)

Signed-off-by: Linas Medziunas <linas.medziunas@gmail.com>
Signed-off-by: Björn Rabenstein <github@rabenste.in>
Signed-off-by: beorn7 <beorn@grafana.com>
Co-authored-by: Björn Rabenstein <github@rabenste.in>
Co-authored-by: beorn7 <beorn@grafana.com>

* Upgrade `mongo-driver` to v1.17.9 (#18077)

Signed-off-by: Sayuru <71478576+samaras3@users.noreply.github.com>

* tests: add CI job for ompliance testing (#18121)

Signed-off-by: bwplotka <bwplotka@gmail.com>

* chore: Add consistent closing logging (#18119)

Signed-off-by: bwplotka <bwplotka@gmail.com>

* fix(deps): update github.com/hashicorp/nomad/api digest to daca79d (#18128)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* fix(deps): update aws go dependencies (#18135)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* Fix renovate PR body (#18154)

* chore(deps): update actions/stale action to v10.2.0 (#18144)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* chore(deps): update actions/setup-node action to v6.2.0 (#18143)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* chore(deps): update github/codeql-action action to v4.32.4 (#18147)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* fix(deps): update kubernetes go dependencies to v0.35.1 (#18136)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* chore(deps): update dependabot/fetch-metadata action to v2.5.0 (#18145)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* fix(deps): update module github.com/prometheus/alertmanager to v0.31.1 (#18142)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* fix(deps): update module github.com/klauspost/compress to v1.18.4 (#18138)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* fix(deps): update github.com/nsf/jsondiff digest to 8e8d90c (#18129)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* fix(deps): update google.golang.org/genproto/googleapis/api digest to 42d3e9b (#18132)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* Fixup renovate PR note (#18156)

* fix(deps): update module golang.org/x/sys to v0.41.0 (#18163)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* fix(deps): update module github.com/grpc-ecosystem/grpc-gateway/v2 to v2.28.0 (#18159)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* fix(deps): update module github.com/envoyproxy/go-control-plane/envoy to v1.37.0 (#18158)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* fix(deps): update module github.com/digitalocean/godo to v1.175.0 (#18157)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* fix(deps): update module google.golang.org/api to v0.267.0 (#18165)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* [FEATURE] AWS SD:  Add Elasticache Role (#18099)

* AWS SD: Elasticache

This change adds Elasticache to the AWS SD.

Co-authored-by: Ben Kochie <superq@gmail.com>
Signed-off-by: Matt <small_minority@hotmail.com>

---------

Signed-off-by: Matt <small_minority@hotmail.com>
Co-authored-by: Ben Kochie <superq@gmail.com>

* fix(deps): update module golang.org/x/text to v0.34.0 (#18164)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* fix(deps): update module github.com/envoyproxy/protoc-gen-validate to v1.3.3 (#18137)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* fix test after merge

Signed-off-by: bwplotka <bwplotka@gmail.com>

---------

Signed-off-by: 3Juhwan <13selfesteem91@naver.com>
Signed-off-by: Sammy Tran <sammyqtran@gmail.com>
Signed-off-by: Casie Chen <casie.chen@grafana.com>
Signed-off-by: Dan Cech <dcech@grafana.com>
Signed-off-by: kakabisht <kakabisht07@gmail.com>
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
Signed-off-by: Divyansh Mishra <divyanshmishra@Divyanshs-MacBook-Air-3.local>
Signed-off-by: Varun Chawla <varun_6april@hotmail.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: Martin Valiente Ainz <64830185+tinitiuset@users.noreply.github.com>
Signed-off-by: Kyle Eckhart <kgeckhart@users.noreply.github.com>
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
Signed-off-by: Linas Medziunas <linas.medziunas@gmail.com>
Signed-off-by: Björn Rabenstein <github@rabenste.in>
Signed-off-by: beorn7 <beorn@grafana.com>
Signed-off-by: Sayuru <71478576+samaras3@users.noreply.github.com>
Signed-off-by: bwplotka <bwplotka@gmail.com>
Signed-off-by: Matt <small_minority@hotmail.com>
Co-authored-by: 3Juhwan <13selfesteem91@naver.com>
Co-authored-by: Casie Chen <casie.chen@grafana.com>
Co-authored-by: Dan Cech <dcech@grafana.com>
Co-authored-by: hridyesh bisht <41201308+kakabisht@users.noreply.github.com>
Co-authored-by: Julien <291750+roidelapluie@users.noreply.github.com>
Co-authored-by: zenador <zenador@users.noreply.github.com>
Co-authored-by: Divyansh Mishra <divyanshmishra@Divyanshs-MacBook-Air-3.local>
Co-authored-by: Varun Chawla <varun_6april@hotmail.com>
Co-authored-by: Arve Knudsen <arve.knudsen@gmail.com>
Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>
Co-authored-by: Martin Valiente Ainz <64830185+tinitiuset@users.noreply.github.com>
Co-authored-by: Bryan Boreham <bjboreham@gmail.com>
Co-authored-by: Kyle Eckhart <kgeckhart@users.noreply.github.com>
Co-authored-by: Matthieu MOREL <matthieu.morel35@gmail.com>
Co-authored-by: Linas Medžiūnas <linasm@users.noreply.github.com>
Co-authored-by: Björn Rabenstein <github@rabenste.in>
Co-authored-by: beorn7 <beorn@grafana.com>
Co-authored-by: Sayuru <71478576+samaras3@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: Ben Kochie <superq@gmail.com>
Co-authored-by: Matt <small_minority@hotmail.com>
renovate Bot added a commit to sdwilsh/ansible-playbooks that referenced this pull request Apr 8, 2026
##### [\`v3.11.0\`](https://github.com/prometheus/prometheus/releases/tag/v3.11.0)

- \[CHANGE] Hetzner SD: The `__meta_hetzner_datacenter` label is deprecated for the role `robot` but kept for backward compatibility, use the `__meta_hetzner_robot_datacenter` label instead. For the role `hcloud`, the label is deprecated and will stop working after the 1 July 2026. [#17850](prometheus/prometheus#17850)
- \[CHANGE] Hetzner SD: The `__meta_hetzner_hcloud_datacenter_location` and `__meta_hetzner_hcloud_datacenter_location_network_zone` labels are deprecated, use the `__meta_hetzner_hcloud_location` and `__meta_hetzner_hcloud_location_network_zone` labels instead. [#17850](prometheus/prometheus#17850)
- \[CHANGE] Promtool: Redirect debug output to stderr to avoid interfering with stdout-based tool output. [#18346](prometheus/prometheus#18346)
- \[FEATURE] AWS SD: Add Elasticache Role. [#18099](prometheus/prometheus#18099)
- \[FEATURE] AWS SD: Add RDS Role. [#18206](prometheus/prometheus#18206)
- \[FEATURE] Azure SD: Add support for Azure Workload Identity authentication method. [#17207](prometheus/prometheus#17207)
- \[FEATURE] Discovery: Introduce `prometheus_sd_last_update_timestamp_seconds` metric to track the last time a service discovery update was sent to consumers. [#18194](prometheus/prometheus#18194)
- \[FEATURE] Kubernetes SD: Add support for node role selectors for pod roles. [#18006](prometheus/prometheus#18006)
- \[FEATURE] Kubernetes SD: Introduce pod-based labels for deployment, cronjob, and job controller names: `__meta_kubernetes_pod_deployment_name`, `__meta_kubernetes_pod_cronjob_name` and `__meta_kubernetes_pod_job_name`, respectively. [#17774](prometheus/prometheus#17774)
- \[FEATURE] PromQL: Add `</` and `>/` operators for trimming observations from native histograms. [#17904](prometheus/prometheus#17904)
- \[FEATURE] PromQL: Add experimental `histogram_quantiles` variadic function for computing multiple quantiles at once. [#17285](prometheus/prometheus#17285)
- \[FEATURE] TSDB: Add `storage.tsdb.retention.percentage` configuration to configure the maximum percent of disk usable for TSDB storage. [#18080](prometheus/prometheus#18080)
- \[FEATURE] TSDB: Add an experimental `fast-startup` feature flag that writes a `series_state.json` file to the WAL directory to track active series state across restarts. [#18303](prometheus/prometheus#18303)
- \[FEATURE] TSDB: Add an experimental `st-storage` feature flag. When enabled, Prometheus stores ingested start timestamps (ST, previously called Created Timestamp) from scrape or OTLP in the TSDB and Agent WAL, and exposes them via Remote Write 2. [#18062](prometheus/prometheus#18062)
- \[FEATURE] TSDB: Add an experimental `xor2-encoding` feature flag for the new TSDB block float sample chunk encoding that is optimized for scraped data and allows encoding start timestamps. [#18062](prometheus/prometheus#18062)
- \[ENHANCEMENT] HTTP client: Add AWS `external_id` support for sigv4. [#17916](prometheus/prometheus#17916)
- \[ENHANCEMENT] Kubernetes SD: Deduplicate deprecation warning logs from the Kubernetes API to reduce noise. [#17829](prometheus/prometheus#17829)
- \[ENHANCEMENT] TSDB: Remove old temporary checkpoints when creating a Checkpoint. [#17598](prometheus/prometheus#17598)
- \[ENHANCEMENT] UI: Add autocomplete support for experimental `first_over_time` and `ts_of_first_over_time` PromQL functions. [#18318](prometheus/prometheus#18318)
- \[ENHANCEMENT] Vultr SD: Upgrade govultr library from v2 to v3 for continued security patches and maintenance. [#18347](prometheus/prometheus#18347)
- \[PERF] PromQL: Improve performance and reduce heap allocations in joins (VectorBinop)/And/Or/Unless. [#17159](prometheus/prometheus#17159)
- \[PERF] PromQL: Partially address performance regression in native histogram aggregations due to using `KahanAdd`. [#18252](prometheus/prometheus#18252)
- \[PERF] Remote write: Optimize WAL watching used for RW sending to reuse internal buffers. [#18250](prometheus/prometheus#18250)
- \[PERF] TSDB: Optimize LabelValues intersection performance for matchers. [#18069](prometheus/prometheus#18069)
- \[PERF] UI: Skip restacking on hover in stacked series charts. [#18230](prometheus/prometheus#18230)
- \[BUGFIX] AWS SD: Fix EC2 SD ignoring the configured `endpoint` option, a regression from the AWS SDK v2 migration. [#18133](prometheus/prometheus#18133)
- \[BUGFIX] AWS SD: Fix panic in EC2 SD when DescribeAvailabilityZones returns nil ZoneName or ZoneId. [#18133](prometheus/prometheus#18133)
- \[BUGFIX] Agent: Fix memory leak caused by duplicate SeriesRefs being loaded as active series. [#17538](prometheus/prometheus#17538)
- \[BUGFIX] Alerting: Fix alert state incorrectly resetting to pending when the FOR period is increased in the config file. [#18244](prometheus/prometheus#18244)
- \[BUGFIX] Azure SD: Fix system-assigned managed identity not working when `client_id` is empty. [#18323](prometheus/prometheus#18323)
- \[BUGFIX] Consul SD: Fix filter parameter not being applied to health service endpoint, causing Node and Node.Meta filters to be ignored. [#17349](prometheus/prometheus#17349)
- \[BUGFIX] Kubernetes SD: Fix duplicate targets generated by `*DualStack` EndpointSlices policies. [#18192](prometheus/prometheus#18192)
- \[BUGFIX] OTLP: Fix ErrTooOldSample being returned as HTTP 500 instead of 400 in PRW v2 histogram write paths, preventing infinite client retry loops. [#18084](prometheus/prometheus#18084)
- \[BUGFIX] OTLP: Fix exemplars getting mixed between incorrect parts of a histogram. [#18056](prometheus/prometheus#18056)
- \[BUGFIX] PromQL: Do not skip histogram buckets in queries where histogram trimming is used. [#18263](prometheus/prometheus#18263)
- \[BUGFIX] Remote write: Fix `prometheus_remote_storage_sent_batch_duration_seconds` measuring before the request was sent. [#18214](prometheus/prometheus#18214)
- \[BUGFIX] Rules: Fix alert state restoration when rule labels contain Go template expressions. [#18375](prometheus/prometheus#18375)
- \[BUGFIX] Scrape: Fix panic when parsing bare label names without an equal sign in brace-only metric notation. [#18229](prometheus/prometheus#18229)
- \[BUGFIX] TSDB: Fail early when `use-uncached-io` feature flag is set on unsupported environments. [#18219](prometheus/prometheus#18219)
- \[BUGFIX] TSDB: Fall back to CLI flag values when retention is removed from config file. [#18200](prometheus/prometheus#18200)
- \[BUGFIX] TSDB: Fix memory leaks in buffer pools by clearing reference fields before returning buffers to pools. [#17895](prometheus/prometheus#17895)
- \[BUGFIX] TSDB: Fix missing mmap of histogram chunks during WAL replay. [#18306](prometheus/prometheus#18306)
- \[BUGFIX] TSDB: Fix storage.tsdb.retention.time unit mismatch in file causing retention to be 1e6 times longer than configured. [#18200](prometheus/prometheus#18200)
- \[BUGFIX] Tracing: Fix missing traceID in query log when tracing is enabled, previously only spanID was emitted. [#18189](prometheus/prometheus#18189)
- \[BUGFIX] UI: Fix tooltip Y-offset drift when using multiple graph panels. [#18228](prometheus/prometheus#18228)
- \[BUGFIX] UI: Update retention display in runtime info when config is reloaded. [#18200](prometheus/prometheus#18200)
renovate Bot added a commit to sdwilsh/ansible-playbooks that referenced this pull request Apr 8, 2026
##### [\`v3.11.0\`](https://github.com/prometheus/prometheus/releases/tag/v3.11.0)

- \[CHANGE] Hetzner SD: The `__meta_hetzner_datacenter` label is deprecated for the role `robot` but kept for backward compatibility, use the `__meta_hetzner_robot_datacenter` label instead. For the role `hcloud`, the label is deprecated and will stop working after the 1 July 2026. [#17850](prometheus/prometheus#17850)
- \[CHANGE] Hetzner SD: The `__meta_hetzner_hcloud_datacenter_location` and `__meta_hetzner_hcloud_datacenter_location_network_zone` labels are deprecated, use the `__meta_hetzner_hcloud_location` and `__meta_hetzner_hcloud_location_network_zone` labels instead. [#17850](prometheus/prometheus#17850)
- \[CHANGE] Promtool: Redirect debug output to stderr to avoid interfering with stdout-based tool output. [#18346](prometheus/prometheus#18346)
- \[FEATURE] AWS SD: Add Elasticache Role. [#18099](prometheus/prometheus#18099)
- \[FEATURE] AWS SD: Add RDS Role. [#18206](prometheus/prometheus#18206)
- \[FEATURE] Azure SD: Add support for Azure Workload Identity authentication method. [#17207](prometheus/prometheus#17207)
- \[FEATURE] Discovery: Introduce `prometheus_sd_last_update_timestamp_seconds` metric to track the last time a service discovery update was sent to consumers. [#18194](prometheus/prometheus#18194)
- \[FEATURE] Kubernetes SD: Add support for node role selectors for pod roles. [#18006](prometheus/prometheus#18006)
- \[FEATURE] Kubernetes SD: Introduce pod-based labels for deployment, cronjob, and job controller names: `__meta_kubernetes_pod_deployment_name`, `__meta_kubernetes_pod_cronjob_name` and `__meta_kubernetes_pod_job_name`, respectively. [#17774](prometheus/prometheus#17774)
- \[FEATURE] PromQL: Add `</` and `>/` operators for trimming observations from native histograms. [#17904](prometheus/prometheus#17904)
- \[FEATURE] PromQL: Add experimental `histogram_quantiles` variadic function for computing multiple quantiles at once. [#17285](prometheus/prometheus#17285)
- \[FEATURE] TSDB: Add `storage.tsdb.retention.percentage` configuration to configure the maximum percent of disk usable for TSDB storage. [#18080](prometheus/prometheus#18080)
- \[FEATURE] TSDB: Add an experimental `fast-startup` feature flag that writes a `series_state.json` file to the WAL directory to track active series state across restarts. [#18303](prometheus/prometheus#18303)
- \[FEATURE] TSDB: Add an experimental `st-storage` feature flag. When enabled, Prometheus stores ingested start timestamps (ST, previously called Created Timestamp) from scrape or OTLP in the TSDB and Agent WAL, and exposes them via Remote Write 2. [#18062](prometheus/prometheus#18062)
- \[FEATURE] TSDB: Add an experimental `xor2-encoding` feature flag for the new TSDB block float sample chunk encoding that is optimized for scraped data and allows encoding start timestamps. [#18062](prometheus/prometheus#18062)
- \[ENHANCEMENT] HTTP client: Add AWS `external_id` support for sigv4. [#17916](prometheus/prometheus#17916)
- \[ENHANCEMENT] Kubernetes SD: Deduplicate deprecation warning logs from the Kubernetes API to reduce noise. [#17829](prometheus/prometheus#17829)
- \[ENHANCEMENT] TSDB: Remove old temporary checkpoints when creating a Checkpoint. [#17598](prometheus/prometheus#17598)
- \[ENHANCEMENT] UI: Add autocomplete support for experimental `first_over_time` and `ts_of_first_over_time` PromQL functions. [#18318](prometheus/prometheus#18318)
- \[ENHANCEMENT] Vultr SD: Upgrade govultr library from v2 to v3 for continued security patches and maintenance. [#18347](prometheus/prometheus#18347)
- \[PERF] PromQL: Improve performance and reduce heap allocations in joins (VectorBinop)/And/Or/Unless. [#17159](prometheus/prometheus#17159)
- \[PERF] PromQL: Partially address performance regression in native histogram aggregations due to using `KahanAdd`. [#18252](prometheus/prometheus#18252)
- \[PERF] Remote write: Optimize WAL watching used for RW sending to reuse internal buffers. [#18250](prometheus/prometheus#18250)
- \[PERF] TSDB: Optimize LabelValues intersection performance for matchers. [#18069](prometheus/prometheus#18069)
- \[PERF] UI: Skip restacking on hover in stacked series charts. [#18230](prometheus/prometheus#18230)
- \[BUGFIX] AWS SD: Fix EC2 SD ignoring the configured `endpoint` option, a regression from the AWS SDK v2 migration. [#18133](prometheus/prometheus#18133)
- \[BUGFIX] AWS SD: Fix panic in EC2 SD when DescribeAvailabilityZones returns nil ZoneName or ZoneId. [#18133](prometheus/prometheus#18133)
- \[BUGFIX] Agent: Fix memory leak caused by duplicate SeriesRefs being loaded as active series. [#17538](prometheus/prometheus#17538)
- \[BUGFIX] Alerting: Fix alert state incorrectly resetting to pending when the FOR period is increased in the config file. [#18244](prometheus/prometheus#18244)
- \[BUGFIX] Azure SD: Fix system-assigned managed identity not working when `client_id` is empty. [#18323](prometheus/prometheus#18323)
- \[BUGFIX] Consul SD: Fix filter parameter not being applied to health service endpoint, causing Node and Node.Meta filters to be ignored. [#17349](prometheus/prometheus#17349)
- \[BUGFIX] Kubernetes SD: Fix duplicate targets generated by `*DualStack` EndpointSlices policies. [#18192](prometheus/prometheus#18192)
- \[BUGFIX] OTLP: Fix ErrTooOldSample being returned as HTTP 500 instead of 400 in PRW v2 histogram write paths, preventing infinite client retry loops. [#18084](prometheus/prometheus#18084)
- \[BUGFIX] OTLP: Fix exemplars getting mixed between incorrect parts of a histogram. [#18056](prometheus/prometheus#18056)
- \[BUGFIX] PromQL: Do not skip histogram buckets in queries where histogram trimming is used. [#18263](prometheus/prometheus#18263)
- \[BUGFIX] Remote write: Fix `prometheus_remote_storage_sent_batch_duration_seconds` measuring before the request was sent. [#18214](prometheus/prometheus#18214)
- \[BUGFIX] Rules: Fix alert state restoration when rule labels contain Go template expressions. [#18375](prometheus/prometheus#18375)
- \[BUGFIX] Scrape: Fix panic when parsing bare label names without an equal sign in brace-only metric notation. [#18229](prometheus/prometheus#18229)
- \[BUGFIX] TSDB: Fail early when `use-uncached-io` feature flag is set on unsupported environments. [#18219](prometheus/prometheus#18219)
- \[BUGFIX] TSDB: Fall back to CLI flag values when retention is removed from config file. [#18200](prometheus/prometheus#18200)
- \[BUGFIX] TSDB: Fix memory leaks in buffer pools by clearing reference fields before returning buffers to pools. [#17895](prometheus/prometheus#17895)
- \[BUGFIX] TSDB: Fix missing mmap of histogram chunks during WAL replay. [#18306](prometheus/prometheus#18306)
- \[BUGFIX] TSDB: Fix storage.tsdb.retention.time unit mismatch in file causing retention to be 1e6 times longer than configured. [#18200](prometheus/prometheus#18200)
- \[BUGFIX] Tracing: Fix missing traceID in query log when tracing is enabled, previously only spanID was emitted. [#18189](prometheus/prometheus#18189)
- \[BUGFIX] UI: Fix tooltip Y-offset drift when using multiple graph panels. [#18228](prometheus/prometheus#18228)
- \[BUGFIX] UI: Update retention display in runtime info when config is reloaded. [#18200](prometheus/prometheus#18200)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants