
2.8.0

@lamida released this 03 May 12:58
Tag: mimir-2.8.0 · Commit: f917e08

This release contains 223 PRs from 53 authors, including new contributors Abdurrahman J. Allawala, Ashray Jain, Cyrill N, Daniel Barnes, Dave, David van der Spek, day4me, Devin Trejo, Dmitriy Okladin, Gabriel Santos, inbarpatashnik, Johannes Tandler, Julien Girard, KingJ, Miller, Rafał Boniecki, Raphael Ferreira, Raúl Marín, Ruslan Kovalov, Shagit Ziganshin, shanmugara, Wilfried ROSET. Thank you!

Grafana Mimir version 2.8.0 release notes

Grafana Labs is excited to announce version 2.8 of Grafana Mimir.

The highlights that follow include the top features, enhancements, and bug fixes in this release. For the complete list of changes, see the changelog.

Features and enhancements

  • Experimental support for using Redis as a cache. Mimir can now use Redis for the results, chunks, index, and metadata caches.
  • Experimental support for fetching secrets from Vault for TLS configuration.
  • Experimental support for querying native histograms. This support is not finalized because the related Prometheus API is also experimental, so the exact behavior might change in future releases.
  • Query-frontend and ruler now use the protobuf internal query result payload format by default. This reduces the CPU and memory utilization of the querier, query-frontend, and ruler, and also reduces the network bandwidth consumed between these components.
  • Query-frontend cached results now contain a timestamp. This allows Mimir to check whether cached results are still valid based on the current TTL configured for the tenant. Results cached by a previous Mimir version are used until they expire from the cache, which can take up to 7 days. If you need per-tenant TTLs to take effect sooner, flush the results cache manually.
  • Optimized regular expression label matchers. This reduces CPU utilization in ingesters and store-gateways when running queries that contain regular expression label matchers.
  • Store-gateway now uses streaming for the LabelNames RPC. This improves memory utilization in the store-gateway when serving LabelNames RPC calls.

Helm chart improvements

The Grafana Mimir and Grafana Enterprise Metrics Helm chart is now released independently. See the Grafana Mimir Helm chart documentation.

Important changes

In Grafana Mimir 2.8 we have removed the following previously deprecated or experimental metrics:

  • cortex_bucket_store_series_get_all_duration_seconds
  • cortex_bucket_store_series_merge_duration_seconds
  • cortex_ingester_tsdb_wal_replay_duration_seconds

The following configuration options are deprecated and will be removed in Grafana Mimir 2.10:

  • The CLI flag -blocks-storage.tsdb.max-tsdb-opening-concurrency-on-startup and its respective YAML configuration option tsdb.max_tsdb_opening_concurrency_on_startup.

The following configuration options that were deprecated in 2.6 are removed:

  • The CLI flag -store.max-query-length and its respective YAML configuration option limits.max_query_length.

The following configuration options that were deprecated in 2.5 are removed:

  • The CLI flag -azure.msi-resource.

The following experimental options and features are now stable:

  • The protobuf internal query result payload format, which is now enabled by default

We changed the default value of the block storage retention period: -blocks-storage.tsdb.retention-period now defaults to 13h instead of 24h.

Bug fixes

  • Querier: Streaming remote read will now continue to return multiple chunks per frame after the first frame. PR 4423
  • Query-frontend: don't retry queries which error inside PromQL. PR 4643
  • Store-gateway & query-frontend: report more consistent statistics for fetched index bytes. PR 4671

Changelog

2.8.0

Grafana Mimir

  • [CHANGE] Ingester: changed experimental CLI flag from -out-of-order-blocks-external-label-enabled to -ingester.out-of-order-blocks-external-label-enabled #4440
  • [CHANGE] Store-gateway: The following metrics have been removed: #4332
    • cortex_bucket_store_series_get_all_duration_seconds
    • cortex_bucket_store_series_merge_duration_seconds
  • [CHANGE] Ingester: changed default value of -blocks-storage.tsdb.retention-period from 24h to 13h. If you're running Mimir with a custom configuration and you're overriding -querier.query-store-after to a value greater than the default 12h, then you should increase -blocks-storage.tsdb.retention-period accordingly. #4382
  • [CHANGE] Ingester: the configuration parameter -blocks-storage.tsdb.max-tsdb-opening-concurrency-on-startup has been deprecated and will be removed in Mimir 2.10. #4445
  • [CHANGE] Query-frontend: Cached results now contain a timestamp, which allows Mimir to check whether cached results are still valid based on the current TTL configured for the tenant. Results cached by a previous Mimir version are used until they expire from the cache, which can take up to 7 days. If you need per-tenant TTLs to take effect sooner, flush the results cache manually. #4439
  • [CHANGE] Ingester: the cortex_ingester_tsdb_wal_replay_duration_seconds metric has been removed. #4465
  • [CHANGE] Query-frontend and ruler: use protobuf internal query result payload format by default. This feature is no longer considered experimental. #4557 #4709
  • [CHANGE] Ruler: reject creating federated rule groups while tenant federation is disabled. Previously the rule groups would be silently dropped during bucket sync. #4555
  • [CHANGE] Compactor: the /api/v1/upload/block/{block}/finish endpoint now returns a 429 status code when the compactor has reached the limit specified by -compactor.max-block-upload-validation-concurrency. #4598
  • [CHANGE] Compactor: when starting a block upload, the maximum byte size of the block metadata provided in the request body is now limited to 1 MiB. If this limit is exceeded, a 413 status code is returned. #4683
  • [CHANGE] Store-gateway: cache key format for expanded postings has changed. This will invalidate the expanded postings in the index cache when deployed. #4667
  • [FEATURE] Cache: Introduce experimental support for using Redis for results, chunks, index, and metadata caches. #4371
  • [FEATURE] Vault: Introduce experimental integration with Vault to fetch secrets used to configure TLS for clients. Server TLS secrets will still be read from a file. tls-ca-path, tls-cert-path and tls-key-path will denote the path in Vault for the following CLI flags when -vault.enabled is true: #4446.
    • -distributor.ha-tracker.etcd.*
    • -distributor.ring.etcd.*
    • -distributor.forwarding.grpc-client.*
    • -querier.store-gateway-client.*
    • -ingester.client.*
    • -ingester.ring.etcd.*
    • -querier.frontend-client.*
    • -query-frontend.grpc-client-config.*
    • -query-frontend.results-cache.redis.*
    • -blocks-storage.bucket-store.index-cache.redis.*
    • -blocks-storage.bucket-store.chunks-cache.redis.*
    • -blocks-storage.bucket-store.metadata-cache.redis.*
    • -compactor.ring.etcd.*
    • -store-gateway.sharding-ring.etcd.*
    • -ruler.client.*
    • -ruler.alertmanager-client.*
    • -ruler.ring.etcd.*
    • -ruler.query-frontend.grpc-client-config.*
    • -alertmanager.sharding-ring.etcd.*
    • -alertmanager.alertmanager-client.*
    • -memberlist.*
    • -query-scheduler.grpc-client-config.*
    • -query-scheduler.ring.etcd.*
    • -overrides-exporter.ring.etcd.*
  • [FEATURE] Distributor, ingester, querier, query-frontend, store-gateway: add experimental support for native histograms. Requires that the experimental protobuf query result response format is enabled by -query-frontend.query-result-response-format=protobuf on the query frontend. #4286 #4352 #4354 #4376 #4377 #4387 #4396 #4425 #4442 #4494 #4512 #4513 #4526
  • [FEATURE] Added -<prefix>.s3.storage-class flag to configure the S3 storage class for objects written to S3 buckets. #4300
  • [FEATURE] Add FreeBSD to the target operating systems when generating binaries for a Mimir release. #4654
  • [FEATURE] Ingester: Add prepare-shutdown endpoint which can be used as part of Kubernetes scale down automations. #4718
  • [ENHANCEMENT] Add timezone information to Alpine Docker images. #4583
  • [ENHANCEMENT] Ruler: Sync rules when the ruler is JOINING the ring instead of ACTIVE, to reduce missed rule iterations during ruler restarts. #4451
  • [ENHANCEMENT] Allow defining the service name used for tracing via the JAEGER_SERVICE_NAME environment variable. #4394
  • [ENHANCEMENT] Querier and query-frontend: add experimental, more performant protobuf query result response format enabled with -query-frontend.query-result-response-format=protobuf. #4304 #4318 #4375
  • [ENHANCEMENT] Compactor: added experimental configuration parameter -compactor.first-level-compaction-wait-period, to configure how long the compactor should wait before compacting 1st level blocks (uploaded by ingesters). This option reduces the chance that the compactor begins compacting blocks before all ingesters have uploaded their blocks to the storage. #4401
  • [ENHANCEMENT] Store-gateway: use more efficient chunks fetching and caching. #4255
  • [ENHANCEMENT] Query-frontend and ruler: add experimental, more performant protobuf internal query result response format enabled with -ruler.query-frontend.query-result-response-format=protobuf. #4331
  • [ENHANCEMENT] Ruler: increased tolerance for missed iterations on alerts, reducing the chances of flapping firing alerts during ruler restarts. #4432
  • [ENHANCEMENT] Optimized .* and .+ regular expression label matchers. #4432
  • [ENHANCEMENT] Optimized regular expression label matchers with alternates (e.g. a|b|c). #4647
  • [ENHANCEMENT] Added an in-memory cache for regular expression matchers to avoid parsing and compiling the same expression multiple times when it is used in recurring queries. #4633
  • [ENHANCEMENT] Query-frontend: results cache TTL is now configurable by using -query-frontend.results-cache-ttl and -query-frontend.results-cache-ttl-for-out-of-order-time-window options. These values can also be specified per tenant. Default values are unchanged (7 days and 10 minutes respectively). #4385
  • [ENHANCEMENT] Ingester: added advanced configuration parameter -blocks-storage.tsdb.wal-replay-concurrency representing the maximum number of CPUs used during WAL replay. #4445
  • [ENHANCEMENT] Ingester: added metric cortex_ingester_tsdb_open_duration_seconds_total to measure the total time it takes to open all existing TSDBs. The time tracked by this metric also includes the TSDB WAL replay duration. #4465
  • [ENHANCEMENT] Store-gateway: use streaming implementation for LabelNames RPC. The batch size for streaming is controlled by -blocks-storage.bucket-store.batch-series-size. #4464
  • [ENHANCEMENT] Memcached: Add support for TLS or mTLS connections to cache servers. #4535
  • [ENHANCEMENT] Compactor: blocks index files are now validated for correctness for blocks uploaded via the TSDB block upload feature. #4503
  • [ENHANCEMENT] Compactor: block chunks and segment files are now validated for correctness for blocks uploaded via the TSDB block upload feature. #4549
  • [ENHANCEMENT] Ingester: added configuration options to configure the "postings for matchers" cache of each compacted block queried from ingesters: #4561
    • -blocks-storage.tsdb.block-postings-for-matchers-cache-ttl
    • -blocks-storage.tsdb.block-postings-for-matchers-cache-size
    • -blocks-storage.tsdb.block-postings-for-matchers-cache-force
  • [ENHANCEMENT] Compactor: validation of blocks uploaded via the TSDB block upload feature is now configurable on a per tenant basis: #4585
    • -compactor.block-upload-validation-enabled has been added, compactor_block_upload_validation_enabled can be used to override per tenant
    • -compactor.block-upload.block-validation-enabled was the previous global flag and has been removed
  • [ENHANCEMENT] TSDB Block Upload: block upload validation concurrency can now be limited with -compactor.max-block-upload-validation-concurrency. #4598
  • [ENHANCEMENT] OTLP: Add support for converting OTel exponential histograms to Prometheus native histograms. Native histogram ingestion must be enabled by setting -ingester.native-histograms-ingestion-enabled to true. #4063 #4639
  • [ENHANCEMENT] Query-frontend: add metric cortex_query_fetched_index_bytes_total to measure TSDB index bytes fetched to execute a query. #4597
  • [ENHANCEMENT] Query-frontend: add experimental limit to enforce a max query expression size in bytes via -query-frontend.max-query-expression-size-bytes or max_query_expression_size_bytes. #4604
  • [ENHANCEMENT] Query-tee: improve message logged when comparing responses and one response contains a non-JSON payload. #4588
  • [ENHANCEMENT] Distributor: add ability to set per-distributor limits via distributor_limits block in runtime configuration in addition to the existing configuration. #4619
  • [ENHANCEMENT] Querier: reduce peak memory consumption for queries that touch a large number of chunks. #4625
  • [ENHANCEMENT] Query-frontend: added experimental -query-frontend.query-sharding-max-regexp-size-bytes limit. When set to a value greater than 0, the query-frontend disables query sharding for any query with a regexp matcher longer than the configured limit. #4632
  • [ENHANCEMENT] Store-gateway: include statistics from LabelValues and LabelNames calls in cortex_bucket_store_series* metrics. #4673
  • [ENHANCEMENT] Query-frontend: improve readability of distributed tracing spans. #4656
  • [ENHANCEMENT] Update Docker base images from alpine:3.17.2 to alpine:3.17.3. #4685
  • [ENHANCEMENT] Querier: improve performance when shuffle sharding is enabled and the shard size is large. #4711
  • [ENHANCEMENT] Ingester: improve performance when Active Series Tracker is in use. #4717
  • [ENHANCEMENT] Store-gateway: optionally select -blocks-storage.bucket-store.series-selection-strategy, which can limit the impact of large posting lists (when many series share the same label name and value). #4667 #4695 #4698
  • [ENHANCEMENT] Querier: Cache the float histogram converted by the chunk iterator, so the chunk no longer has to be looked up every time the converted float histogram is needed. #4684
  • [ENHANCEMENT] Ruler: Improve rule upload performance when not enforcing per-tenant rule group limits. #4828
  • [ENHANCEMENT] Improved memory limit on the in-memory cache used for regular expression matchers. #4751
  • [BUGFIX] Querier: Streaming remote read will now continue to return multiple chunks per frame after the first frame. #4423
  • [BUGFIX] Store-gateway: the values for stage="processed" of the metrics cortex_bucket_store_series_data_touched and cortex_bucket_store_series_data_size_touched_bytes now report the correct values of chunks held in memory when fine-grained chunks caching is used. #4449
  • [BUGFIX] Compactor: fixed a compaction error being reported when the compactor is correctly shut down while populating blocks. #4580
  • [BUGFIX] OTLP: Do not drop exemplars of the OTLP Monotonic Sum metric. #4063
  • [BUGFIX] Packaging: flag /etc/default/mimir and /etc/sysconfig/mimir as config to prevent overwrite. #4587
  • [BUGFIX] Query-frontend: don't retry queries which error inside PromQL. #4643
  • [BUGFIX] Store-gateway & query-frontend: report more consistent statistics for fetched index bytes. #4671
  • [BUGFIX] Native histograms: fix how IsFloatHistogram determines if mimirpb.Histogram is a float histogram. #4706
  • [BUGFIX] Query-frontend: fix query sharding for native histograms. #4666
  • [BUGFIX] Ring status page: fixed the owned tokens percentage value displayed. #4730
  • [BUGFIX] Querier: fixed a chunk iterator that could return a sample with the wrong timestamp. #4450
  • [BUGFIX] Packaging: fix preremove script preventing upgrades. #4801
  • [BUGFIX] Security: updates Go to version 1.20.4 to fix CVE-2023-24539, CVE-2023-24540, CVE-2023-29400. #4903
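The regular expression entries above (#4432, #4647, #4633) do not describe how the optimizations are implemented. As an illustration only, here is a minimal Go sketch of two general techniques in that spirit: matching a pure literal alternation such as a|b|c with a set lookup instead of the regex engine, and memoizing compiled matchers so recurring queries do not recompile them. The matcher type, newMatcher, and the package-level cache are hypothetical and are not Mimir's code; the fallback anchors the pattern because Prometheus regex label matchers are fully anchored.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"sync"
)

// matcher is a hypothetical label-value matcher. A plain alternation of
// literals (e.g. "a|b|c") uses a set lookup; anything else falls back to a
// compiled, fully anchored regular expression.
type matcher struct {
	set map[string]struct{}
	re  *regexp.Regexp
}

var (
	cacheMu sync.Mutex
	cache   = map[string]*matcher{} // unbounded here; a real cache needs a size limit
)

// newMatcher returns a cached matcher for pattern, building it on first use.
func newMatcher(pattern string) (*matcher, error) {
	cacheMu.Lock()
	defer cacheMu.Unlock()
	if m, ok := cache[pattern]; ok {
		return m, nil
	}
	m := &matcher{}
	alts := strings.Split(pattern, "|")
	allLiteral := true
	for _, a := range alts {
		// QuoteMeta leaves a string unchanged only if it has no regex metacharacters.
		if regexp.QuoteMeta(a) != a {
			allLiteral = false
			break
		}
	}
	if allLiteral {
		m.set = make(map[string]struct{}, len(alts))
		for _, a := range alts {
			m.set[a] = struct{}{}
		}
	} else {
		re, err := regexp.Compile("^(?:" + pattern + ")$")
		if err != nil {
			return nil, err // invalid patterns are not cached
		}
		m.re = re
	}
	cache[pattern] = m
	return m, nil
}

// Matches reports whether v fully matches the pattern.
func (m *matcher) Matches(v string) bool {
	if m.set != nil {
		_, ok := m.set[v]
		return ok
	}
	return m.re.MatchString(v)
}

func main() {
	lit, _ := newMatcher("api|web|worker")
	fmt.Println(lit.Matches("web"), lit.Matches("db")) // true false

	re, _ := newMatcher("web-[0-9]+")
	fmt.Println(re.Matches("web-12"), re.Matches("web-")) // true false

	again, _ := newMatcher("api|web|worker")
	fmt.Println(again == lit) // true (served from the cache)
}
```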

Mixin

  • [ENHANCEMENT] Queries: Display data touched per sec in bytes instead of number of items. #4492
  • [ENHANCEMENT] _config.job_names.<job> values can now be arrays of regular expressions in addition to a single string. Strings are still supported and behave as before. #4543
  • [ENHANCEMENT] Queries dashboard: remove mention of store-gateway "streaming enabled" in panels because the store-gateway only supports streaming series since Mimir 2.7. #4569
  • [ENHANCEMENT] Ruler: Add panel description for Read QPS panel in Ruler dashboard to explain values when in remote ruler mode. #4675
  • [BUGFIX] Ruler dashboard: show data for reads from ingesters. #4543
  • [BUGFIX] Pod selector regex for deployments: change (.*-mimir-) to (.*mimir-). #4603

Jsonnet

  • [CHANGE] Ruler: changed ruler deployment max surge from 0 to 50%, and max unavailable from 1 to 0. #4381
  • [CHANGE] Memcached connections parameters -blocks-storage.bucket-store.index-cache.memcached.max-idle-connections, -blocks-storage.bucket-store.chunks-cache.memcached.max-idle-connections and -blocks-storage.bucket-store.metadata-cache.memcached.max-idle-connections settings are now configured based on max-get-multi-concurrency and max-async-concurrency. #4591
  • [CHANGE] Add support to use external Redis as cache. Following are some changes in the jsonnet config: #4386 #4640
    • Renamed memcached_*_enabled config options to cache_*_enabled
    • Renamed memcached_*_max_item_size_mb config options to cache_*_max_item_size_mb
    • Added cache_*_backend config options
  • [CHANGE] Store-gateway StatefulSets with disabled multi-zone deployment are also unregistered from the ring on shutdown. This eliminates resharding during rollouts, at the cost of extra effort when scaling down store-gateways. For more information, see Scaling down store-gateways. #4713
  • [ENHANCEMENT] Alertmanager: add alertmanager_data_disk_size and alertmanager_data_disk_class configuration options, by default no storage class is set. #4389
  • [ENHANCEMENT] Update rollout-operator to v0.4.0. #4524
  • [ENHANCEMENT] Update memcached to memcached:1.6.19-alpine. #4581
  • [ENHANCEMENT] Add support for mTLS connections to Memcached servers. #4553
  • [ENHANCEMENT] Update the memcached-exporter to v0.11.2. #4570
  • [ENHANCEMENT] Autoscaling: Add autoscaling_query_frontend_memory_target_utilization, autoscaling_ruler_query_frontend_memory_target_utilization, and autoscaling_ruler_memory_target_utilization configuration options, for controlling the corresponding autoscaler memory thresholds. Each has a default of 1, i.e. 100%. #4612
  • [ENHANCEMENT] Distributor: add ability to set per-distributor limits via distributor_instance_limits using runtime configuration. #4627
  • [BUGFIX] Add missing query sharding settings for user_24M and user_32M plans. #4374

Mimirtool

  • [ENHANCEMENT] Backfill: mimirtool will now sleep and retry if it receives a 429 response while trying to finish an upload due to validation concurrency limits. #4598
  • [ENHANCEMENT] The gauge panel type is now supported in mimirtool analyze dashboard. #4679
  • [ENHANCEMENT] Set a User-Agent header on requests to Mimir or Prometheus servers. #4700

Mimir Continuous Test

  • [FEATURE] Allow continuous testing of native histograms by enabling the flag -tests.write-read-series-test.histogram-samples-enabled. The metrics exposed by the tool now have a new label called type, with possible values float, histogram_float_counter, histogram_float_gauge, histogram_int_counter, and histogram_int_gauge. The impacted metrics are: #4457
    • mimir_continuous_test_writes_total
    • mimir_continuous_test_writes_failed_total
    • mimir_continuous_test_queries_total
    • mimir_continuous_test_queries_failed_total
    • mimir_continuous_test_query_result_checks_total
    • mimir_continuous_test_query_result_checks_failed_total
  • [ENHANCEMENT] Added a new metric mimir_continuous_test_build_info that reports version information, similar to the existing cortex_build_info metric exposed by other Mimir components. #4712
  • [ENHANCEMENT] Add coherency for the selected ranges and instants of test queries. #4704

Query-tee

Documentation

  • [CHANGE] Clarify what deprecation means in the lifecycle of configuration parameters. #4499
  • [CHANGE] Update compactor split-groups and split-and-merge-shards recommendation on component page. #4623
  • [FEATURE] Add instructions about how to configure native histograms. #4527
  • [ENHANCEMENT] Runbook for MimirCompactorHasNotSuccessfullyRunCompaction extended to include scenario where compaction has fallen behind. #4609
  • [ENHANCEMENT] Add explanation for QPS values for reads in remote ruler mode and writes generally, to the Ruler dashboard page. #4629
  • [ENHANCEMENT] Expand zone-aware replication page to cover single physical availability zone deployments. #4631
  • [FEATURE] Add instructions to use puppet module. #4610

Tools

  • [ENHANCEMENT] tsdb-index: iteration over index is now faster when any equal matcher is supplied. #4515

All changes in this release: mimir-2.7.3...mimir-2.8.0