2.7.1

@aldernero aldernero released this 16 Mar 16:39
· 3178 commits to main since this release
dbe4ccd

This release contains 177 PRs from 43 authors, including new contributors Bartosz Cisek, dggmsa, gmintoco, Ihor Urazov, James Ross, Jean-Philippe Quéméner, Jon Gutschon, l3ioo, lpugoy, Nicolás Pazos, Oscar, Reto Kupferschmid, ying-jeanne. Thank you!

Grafana Mimir version 2.7.1 release notes

Grafana Labs is excited to announce version 2.7.1 of Grafana Mimir.

The highlights that follow include the top features, enhancements, and bugfixes in this release. For the complete list of changes, see the changelog.

Note: During the release process, version 2.7.0 was tagged too early, before completing the release checklist and production testing. Release 2.7.1 doesn't include any code changes since 2.7.0, but now has proper release notes, published documentation, and has been fully tested in our production environment.

Features and enhancements

  • Store-gateway streaming enabled by default: The new default value of 5000 for -blocks-storage.bucket-store.batch-series-size enables store-gateway streaming in the default configuration. This means that series are loaded from object storage in batches rather than buffering them all in memory before returning to the querier. Enabling streaming can reduce memory utilization peaks in the store-gateway.
  • Store-gateway index header reader no longer uses mmap by default: Along with streaming enabled in the store-gateway, this change contributes to more efficient memory usage. See the Important changes section for more details.
  • Support for the keep_firing_for option in ruler configuration: This new option determines the amount of time an alert should keep firing while the ruler expression doesn't return results.
  • More efficient chunks fetching and caching: Enable with the new experimental feature flag -blocks-storage.bucket-store.chunks-cache.fine-grained-chunks-caching-enabled=true. This should reduce CPU, memory utilization, and receive bandwidth of a store-gateway.
  • Experimental query sharding improvements:
    A new configuration parameter, -query-frontend.query-sharding-target-series-per-shard, allows query sharding to take into account cardinality of similar requests executed previously when computing the maximum number of shards to use. If you want to try it out, we recommend starting with a value of 2500.
  • Experimental support for native histogram ingestion:
    Native histograms can now be ingested. The new per-tenant limit -ingester.native-histograms-ingestion-enabled controls whether native histograms are stored or ignored. The support for querying native histograms is not complete yet and it's expected to be available in the next release.
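
As a rough illustration of how a target series-per-shard value could bound the number of shards, the following Python sketch divides an estimated cardinality by the target and caps the result. The function name, the configured_max_shards parameter, and the rounding rule are assumptions made for illustration; these notes don't describe Mimir's actual computation.

```python
import math


def max_shards_for_query(estimated_series: int,
                         target_series_per_shard: int = 2500,
                         configured_max_shards: int = 16) -> int:
    """Hypothetical helper: bound the shard count by estimated cardinality."""
    if target_series_per_shard <= 0:
        # Assumption: a non-positive target disables the cardinality-based bound.
        return configured_max_shards
    # Use enough shards so that each one handles about the target number of series.
    by_cardinality = max(1, math.ceil(estimated_series / target_series_per_shard))
    return min(configured_max_shards, by_cardinality)


print(max_shards_for_query(10_000))   # 4
print(max_shards_for_query(100_000))  # 16
print(max_shards_for_query(300))      # 1
```

In this model a low-cardinality query is not split into more shards than it can use, while a high-cardinality query is still capped by the configured maximum.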

Alertmanager improvements

  • New metrics: the following upstream metrics are now exposed:
    • cortex_alertmanager_dispatcher_aggregation_groups
    • cortex_alertmanager_dispatcher_alert_processing_duration_seconds

Helm chart improvements

The Grafana Mimir and Grafana Enterprise Metrics Helm chart is now released independently. See the Grafana Mimir Helm chart documentation.

Important changes

In Grafana Mimir 2.7, the default values of the following configuration options have changed:

  • -blocks-storage.bucket-store.batch-series-size is now enabled by default with a value of 5000.
  • -ruler.evaluation-delay-duration has changed from 0 to 1m.
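
To picture what loading series in batches means for the batch-series-size default, the following Python sketch yields fixed-size batches from a series source instead of building one full list. It is a conceptual model only: stream_in_batches and the series names are hypothetical, and the store-gateway's real implementation isn't shown in these notes.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def stream_in_batches(series: Iterable[str], batch_size: int = 5000) -> Iterator[List[str]]:
    """Yield series in batches so that only one batch is held in memory at a time."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    iterator = iter(series)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical series source: a generator, so nothing is materialized up front.
all_series = (f"series_{i}" for i in range(12_000))
batch_sizes = [len(batch) for batch in stream_in_batches(all_series)]
print(batch_sizes)  # [5000, 5000, 2000]
```

The peak memory held by the consumer is bounded by the batch size rather than by the total number of series matched by the query.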

In Grafana Mimir 2.7, the following configuration options are now deprecated:

  • -blocks-storage.bucket-store.chunks-cache.subrange-size has been deprecated since there's no benefit to changing the default of 16000.
  • -blocks-storage.bucket-store.consistency-delay has been deprecated and will be removed in Mimir 2.9.
  • -compactor.consistency-delay has been deprecated and will be removed in Mimir 2.9.
  • -ingester.ring.readiness-check-ring-health has been deprecated and will be removed in Mimir 2.9.

In Grafana Mimir 2.7, the following options, metrics, and labels have been removed:

  • Experimental support for ephemeral storage introduced in Mimir 2.6.0 has been removed.
    • The following options are no longer available:
      • -blocks-storage.ephemeral-tsdb.*
      • -distributor.ephemeral-series-enabled
      • -distributor.ephemeral-series-matchers
      • -ingester.max-ephemeral-series-per-user
      • -ingester.instance-limits.max-ephemeral-series
    • The following metrics have been removed:
      • cortex_ingester_ephemeral_series
      • cortex_ingester_ephemeral_series_created_total
      • cortex_ingester_ephemeral_series_removed_total
      • cortex_ingester_ingested_ephemeral_samples_total
      • cortex_ingester_ingested_ephemeral_samples_failures_total
      • cortex_ingester_memory_ephemeral_users
      • cortex_ingester_queries_ephemeral_total
      • cortex_ingester_queried_ephemeral_samples
      • cortex_ingester_queried_ephemeral_series
    • Additionally, querying using the {__mimir_storage__="ephemeral"} selector no longer works. All label values with the ephemeral- prefix within the reason label of the cortex_discarded_samples_total metric are no longer available.
  • The store-gateway default index header reader no longer uses mmap and the mmap-based index header reader has been removed. The following flags have been changed:
    • -blocks-storage.bucket-store.index-header.map-populate-enabled has been removed
    • -blocks-storage.bucket-store.index-header.stream-reader-enabled has been removed
    • -blocks-storage.bucket-store.index-header.stream-reader-max-idle-file-handles has been renamed to -blocks-storage.bucket-store.index-header.max-idle-file-handles, and the corresponding configuration file option has been renamed from stream_reader_max_idle_file_handles to max_idle_file_handles

Bug fixes

  • Store-gateway: return Canceled rather than Aborted or Internal error when the calling querier cancels a label names or values request, and return Internal if processing the request fails for another reason. PR 4061
  • Querier: track canceled requests with status code 499 in the metrics instead of 503 or 422. PR 4099
  • Ingester: compact out-of-order data during /ingester/flush or when TSDB is idle. PR 4180
  • Ingester: conversion of global limits max-series-per-user, max-series-per-metric, max-metadata-per-user and max-metadata-per-metric into corresponding local limits now takes into account the number of ingesters in each zone. PR 4238
  • Ingester: track cortex_ingester_memory_series metric consistently with cortex_ingester_memory_series_created_total and cortex_ingester_memory_series_removed_total. PR 4312
  • Querier: fixed a bug which was incorrectly matching series with regular expression label matchers with begin/end anchors in the middle of the regular expression. PR 4340
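
The ingester limits fix above concerns how a per-tenant global limit is turned into a per-ingester local limit. The Python sketch below shows one plausible form of such a conversion that accounts for the number of ingesters in a zone. The formula, function name, and parameters are assumptions for illustration; the release notes don't state the exact calculation Mimir uses.

```python
def global_to_local_limit(global_limit: int,
                          replication_factor: int,
                          zones: int,
                          ingesters_per_zone: int) -> int:
    """Hypothetical conversion of a per-tenant global limit into a per-ingester limit."""
    if global_limit == 0 or ingesters_per_zone <= 0 or zones <= 0:
        # Assumption: 0 means "no limit", and an empty ring yields no local limit.
        return 0
    # Each series is written replication_factor times, spread across the zones,
    # and then across the ingesters inside one zone.
    return int(global_limit * replication_factor / zones / ingesters_per_zone)


# 3 zones with 10 ingesters each, replication factor 3:
print(global_to_local_limit(150_000, 3, 3, 10))  # 15000
# The same limit in a zone that only has 5 ingesters:
print(global_to_local_limit(150_000, 3, 3, 5))   # 30000
```

Under this model, an ingester in a smaller zone receives a larger share of the global limit, which is why the per-zone ingester count matters.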

Changelog

2.7.1

Grafana Mimir

  • [CHANGE] Ingester: the configuration parameter -ingester.ring.readiness-check-ring-health has been deprecated and will be removed in Mimir 2.9. #4422
  • [CHANGE] Ruler: changed default value of -ruler.evaluation-delay-duration option from 0 to 1m. #4250
  • [CHANGE] Querier: Errors with status code 422 coming from the store-gateway are propagated and not converted to the consistency check error anymore. #4100
  • [CHANGE] Store-gateway: When a query hits max_fetched_chunks_per_query and max_fetched_series_per_query limits, an error with the status code 422 is created and returned. #4056
  • [CHANGE] Packaging: Migrate FPM packaging solution to NFPM. Rationalize package dependencies and add packages for all binaries. #3911
  • [CHANGE] Store-gateway: Deprecate flag -blocks-storage.bucket-store.chunks-cache.subrange-size since there's no benefit to changing the default of 16000. #4135
  • [CHANGE] Experimental support for ephemeral storage introduced in Mimir 2.6.0 has been removed. The following options are no longer available: #4252
    • -blocks-storage.ephemeral-tsdb.*
    • -distributor.ephemeral-series-enabled
    • -distributor.ephemeral-series-matchers
    • -ingester.max-ephemeral-series-per-user
    • -ingester.instance-limits.max-ephemeral-series
      Querying using the {__mimir_storage__="ephemeral"} selector no longer works. All label values with the ephemeral- prefix in the reason label of the cortex_discarded_samples_total metric are no longer available. The following metrics have been removed:
    • cortex_ingester_ephemeral_series
    • cortex_ingester_ephemeral_series_created_total
    • cortex_ingester_ephemeral_series_removed_total
    • cortex_ingester_ingested_ephemeral_samples_total
    • cortex_ingester_ingested_ephemeral_samples_failures_total
    • cortex_ingester_memory_ephemeral_users
    • cortex_ingester_queries_ephemeral_total
    • cortex_ingester_queried_ephemeral_samples
    • cortex_ingester_queried_ephemeral_series
  • [CHANGE] Store-gateway: use mmap-less index-header reader by default and remove mmap-based index header reader. The following flags have changed: #4280
    • -blocks-storage.bucket-store.index-header.map-populate-enabled has been removed
    • -blocks-storage.bucket-store.index-header.stream-reader-enabled has been removed
    • -blocks-storage.bucket-store.index-header.stream-reader-max-idle-file-handles has been renamed to -blocks-storage.bucket-store.index-header.max-idle-file-handles, and the corresponding configuration file option has been renamed from stream_reader_max_idle_file_handles to max_idle_file_handles
  • [CHANGE] Store-gateway: the streaming store-gateway is now enabled by default. The new default setting for -blocks-storage.bucket-store.batch-series-size is 5000. #4330
  • [CHANGE] Compactor: the configuration parameter -compactor.consistency-delay has been deprecated and will be removed in Mimir 2.9. #4409
  • [CHANGE] Store-gateway: the configuration parameter -blocks-storage.bucket-store.consistency-delay has been deprecated and will be removed in Mimir 2.9. #4409
  • [FEATURE] Ruler: added keep_firing_for support to alerting rules. #4099
  • [FEATURE] Distributor, ingester: ingestion of native histograms. The new per-tenant limit -ingester.native-histograms-ingestion-enabled controls whether native histograms are stored or ignored. #4159
  • [FEATURE] Query-frontend: Introduce experimental -query-frontend.query-sharding-target-series-per-shard to allow query sharding to take into account cardinality of similar requests executed previously. This feature uses the same cache that's used for results caching. #4121 #4177 #4188 #4254
  • [ENHANCEMENT] Go: update go to 1.20.1. #4266
  • [ENHANCEMENT] Ingester: added out_of_order_blocks_external_label_enabled shipper option to label out-of-order blocks before shipping them to cloud storage. #4182 #4297
  • [ENHANCEMENT] Ruler: introduced concurrency when loading per-tenant rules configuration. This improvement is expected to speed up the ruler start up time in a Mimir cluster with a large number of tenants. #4258
  • [ENHANCEMENT] Compactor: Add reason label to cortex_compactor_runs_failed_total. The value can be shutdown or error. #4012
  • [ENHANCEMENT] Store-gateway: enforce max_fetched_series_per_query. #4056
  • [ENHANCEMENT] Query-frontend: Disambiguate logs for failed queries. #4067
  • [ENHANCEMENT] Query-frontend: log caller user agent in query stats logs. #4093
  • [ENHANCEMENT] Store-gateway: add a data_type label with the values postings, series, and chunks to cortex_bucket_store_partitioner_extended_ranges_total, cortex_bucket_store_partitioner_expanded_ranges_total, cortex_bucket_store_partitioner_requested_ranges_total, cortex_bucket_store_partitioner_expanded_bytes_total, and cortex_bucket_store_partitioner_requested_bytes_total. #4095
  • [ENHANCEMENT] Store-gateway: Reduce memory allocation rate when loading TSDB chunks from Memcached. #4074
  • [ENHANCEMENT] Query-frontend: track cortex_frontend_query_response_codec_duration_seconds and cortex_frontend_query_response_codec_payload_bytes metrics to measure the time taken and bytes read / written while encoding and decoding query result payloads. #4110
  • [ENHANCEMENT] Alertmanager: expose additional upstream metrics cortex_alertmanager_dispatcher_aggregation_groups, cortex_alertmanager_dispatcher_alert_processing_duration_seconds. #4151
  • [ENHANCEMENT] Querier and query-frontend: add experimental, more performant protobuf internal query result response format enabled with -query-frontend.query-result-response-format=protobuf. #4153
  • [ENHANCEMENT] Store-gateway: use more efficient chunks fetching and caching. This should reduce CPU, memory utilization, and receive bandwidth of a store-gateway. Enable with -blocks-storage.bucket-store.chunks-cache.fine-grained-chunks-caching-enabled=true. #4163 #4174 #4227
  • [ENHANCEMENT] Query-frontend: Wait for in-flight queries to finish before shutting down. #4073 #4170
  • [ENHANCEMENT] Store-gateway: added the encode and other stages to the cortex_bucket_store_series_request_stage_duration_seconds metric. #4179
  • [ENHANCEMENT] Ingester: log state of TSDB when shipping or forced compaction can't be done due to unexpected state of TSDB. #4211
  • [ENHANCEMENT] Update Docker base images from alpine:3.17.1 to alpine:3.17.2. #4240
  • [ENHANCEMENT] Store-gateway: add a stage label to the metrics cortex_bucket_store_series_data_fetched, cortex_bucket_store_series_data_size_fetched_bytes, cortex_bucket_store_series_data_touched, cortex_bucket_store_series_data_size_touched_bytes. This label only applies to data_type="chunks". #4227 #4316
    • For fetched metrics with data_type="chunks", the stage label has 2 values:
      • fetched: the chunks or bytes that were fetched from the cache or the object store
      • refetched: the chunks or bytes that had to be refetched from the cache or the object store because their size was underestimated during the first fetch
    • For touched metrics with data_type="chunks", the stage label has 2 values:
      • processed: the chunks or bytes that were read from the fetched chunks or bytes and were processed in memory
      • returned: the chunks or bytes that were selected from the processed bytes to satisfy the query
  • [ENHANCEMENT] Compactor: improve the partial block check related to compactor.partial-block-deletion-delay to potentially issue fewer requests to object storage. #4246
  • [ENHANCEMENT] Memcached: added -*.memcached.min-idle-connections-headroom-percentage support to configure the minimum number of idle connections to keep open as a percentage (0-100) of the number of recently used idle connections. This feature is disabled when set to a negative value (default), which means idle connections are kept open indefinitely. #4249
  • [ENHANCEMENT] Querier and store-gateway: optimized regular expression label matchers with case insensitive alternate operator. #4340 #4357
  • [ENHANCEMENT] Compactor: added the experimental flag -compactor.block-upload.block-validation-enabled, which defaults to true, to configure whether block validation occurs on backfilled blocks. #3411
  • [ENHANCEMENT] Ingester: apply a jitter to the first TSDB head compaction interval configured via -blocks-storage.tsdb.head-compaction-interval. Subsequent checks will happen at the configured interval. This should help to spread the TSDB head compaction among different ingesters over the configured interval. #4364
  • [ENHANCEMENT] Ingester: the maximum accepted value for -blocks-storage.tsdb.head-compaction-interval has been increased from 5m to 15m. #4364
  • [BUGFIX] Store-gateway: return Canceled rather than Aborted or Internal error when the calling querier cancels a label names or values request, and return Internal if processing the request fails for another reason. #4061
  • [BUGFIX] Querier: track canceled requests with status code 499 in the metrics instead of 503 or 422. #4099
  • [BUGFIX] Ingester: compact out-of-order data during /ingester/flush or when TSDB is idle. #4180
  • [BUGFIX] Ingester: conversion of global limits max-series-per-user, max-series-per-metric, max-metadata-per-user and max-metadata-per-metric into corresponding local limits now takes into account the number of ingesters in each zone. #4238
  • [BUGFIX] Ingester: track cortex_ingester_memory_series metric consistently with cortex_ingester_memory_series_created_total and cortex_ingester_memory_series_removed_total. #4312
  • [BUGFIX] Querier: fixed a bug which was incorrectly matching series with regular expression label matchers with begin/end anchors in the middle of the regular expression. #4340
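
The jitter applied to the first TSDB head compaction interval, listed among the ingester enhancements above, can be modeled as a random first delay followed by checks at the fixed interval. The Python sketch below is a minimal model of that idea; the function names and the uniform distribution are assumptions, not a description of Mimir's code.

```python
import random
from typing import List, Optional


def first_compaction_delay(interval_seconds: float,
                           rng: Optional[random.Random] = None) -> float:
    """Hypothetical: pick a random delay in [0, interval) for the first check."""
    rng = rng or random.Random()
    return rng.random() * interval_seconds


def compaction_schedule(interval_seconds: float, checks: int,
                        rng: Optional[random.Random] = None) -> List[float]:
    """Return the times of the first `checks` compaction checks, in seconds."""
    first = first_compaction_delay(interval_seconds, rng)
    # After the jittered first check, every later check uses the configured interval.
    return [first + i * interval_seconds for i in range(checks)]


# Two ingesters started at the same moment get different offsets (seeds are arbitrary).
for seed in (1, 2):
    schedule = compaction_schedule(60.0, 3, random.Random(seed))
    print([round(t, 1) for t in schedule])
```

Because each ingester draws its own offset, checks that would otherwise line up across ingesters are spread over the interval.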

Mixin

  • [CHANGE] Move auto-scaling panel rows down beneath logical network path in Reads and Writes dashboards. #4049
  • [CHANGE] Make distributor auto-scaling metric panels show desired number of replicas. #4218
  • [CHANGE] Alerts: The alert MimirMemcachedRequestErrors has been renamed to MimirCacheRequestErrors. #4242
  • [ENHANCEMENT] Alerts: Added MimirAutoscalerKedaFailing alert firing when a KEDA scaler is failing. #4045
  • [ENHANCEMENT] Add auto-scaling panels to ruler dashboard. #4046
  • [ENHANCEMENT] Add gateway auto-scaling panels to Reads and Writes dashboards. #4049 #4216
  • [ENHANCEMENT] Dashboards: distinguish between label names and label values queries. #4065
  • [ENHANCEMENT] Add query-frontend and ruler-query-frontend auto-scaling panels to Reads and Ruler dashboards. #4199
  • [BUGFIX] Alerts: Fixed MimirAutoscalerNotActive to not fire if scaling metric does not exist, to avoid false positives on scaled objects with 0 min replicas. #4045
  • [BUGFIX] Alerts: MimirCompactorHasNotSuccessfullyRunCompaction is no longer triggered by frequent compactor restarts. #4012
  • [BUGFIX] Tenants dashboard: Correctly show the ruler-query-scheduler queue size. #4152

Jsonnet

  • [CHANGE] Create the query-frontend-discovery service only when Mimir is deployed in microservice mode without query-scheduler. #4353
  • [CHANGE] Add results cache backend config to ruler-query-frontend configuration to allow cache reuse for cardinality-estimation based sharding. #4257
  • [ENHANCEMENT] Add support for ruler auto-scaling. #4046
  • [ENHANCEMENT] Add optional weight param to newQuerierScaledObject and newRulerQuerierScaledObject to allow running multiple querier deployments on different node types. #4141
  • [ENHANCEMENT] Add support for query-frontend and ruler-query-frontend auto-scaling. #4199
  • [BUGFIX] Shuffle sharding: when applying user class limits, honor the minimum shard size configured in $._config.shuffle_sharding.*. #4363

Mimirtool

  • [FEATURE] Added keep_firing_for support to rules configuration. #4099
  • [ENHANCEMENT] Add -tls-insecure-skip-verify to rules, alertmanager and backfill commands. #4162
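
The keep_firing_for option keeps an alert firing for a set time while the rule expression returns no results. The Python sketch below simulates that behavior over a series of evaluations. It is a simplified model under stated assumptions: it ignores the pending state, and it measures the window from the first evaluation that returns no results.

```python
from typing import List, Optional, Tuple


def firing_states(evaluations: List[Tuple[int, bool]], keep_firing_for: int) -> List[bool]:
    """Return whether the alert fires at each (timestamp_seconds, has_result) evaluation."""
    states = []
    firing = False
    missing_since: Optional[int] = None  # first evaluation without results while firing
    for timestamp, has_result in evaluations:
        if has_result:
            firing = True
            missing_since = None
        elif firing and keep_firing_for > 0:
            if missing_since is None:
                missing_since = timestamp
            # Keep firing until the expression has been empty for keep_firing_for seconds.
            firing = timestamp - missing_since < keep_firing_for
        else:
            firing = False
        states.append(firing)
    return states


evals = [(0, True), (60, False), (120, False), (180, False), (240, True)]
print(firing_states(evals, keep_firing_for=120))  # [True, True, True, False, True]
print(firing_states(evals, keep_firing_for=0))    # [True, False, False, False, True]
```

With a window of 120 seconds the alert survives two empty evaluations before resolving; with the option unset it resolves on the first empty evaluation.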

Query-tee

  • [CHANGE] Increase default value of -backend.read-timeout to 150s, to accommodate default querier and query frontend timeout of 120s. #4262
  • [ENHANCEMENT] Log errors that occur while performing requests to compare two endpoints. #4262
  • [ENHANCEMENT] When comparing two responses that both contain an error, only consider the comparison failed if the errors differ. Previously, if either response contained an error, the comparison always failed, even if both responses contained the same error. #4262
  • [ENHANCEMENT] Include the value of the X-Scope-OrgID header when logging a comparison failure. #4262
  • [BUGFIX] Parameters (expression, time range etc.) for a query request where the parameters are in the HTTP request body rather than in the URL are now logged correctly when responses differ. #4265
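
The changed comparison rule for responses that contain errors can be summarized in a few lines. The Python sketch below models it with a hypothetical Response type and comparison_failed function; it illustrates the rule as described in the notes above and is not query-tee's actual code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Response:
    """Hypothetical, simplified view of a backend response."""
    body: Optional[str] = None
    error: Optional[str] = None


def comparison_failed(first: Response, second: Response) -> bool:
    """Return True if the two responses should be reported as different."""
    if first.error is not None and second.error is not None:
        # Both backends failed: only a difference in the errors counts as a failure.
        return first.error != second.error
    if first.error is not None or second.error is not None:
        # Exactly one backend failed.
        return True
    return first.body != second.body


same_error = comparison_failed(Response(error="timeout"), Response(error="timeout"))
print(same_error)  # False
```

Under the previous behavior described in the notes, the same pair of responses would have been reported as a failed comparison.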

Documentation

  • [ENHANCEMENT] Add guide on alternative migration method for Thanos to Mimir. #3554
  • [ENHANCEMENT] Restore "Migrate from Cortex" for Jsonnet. #3929
  • [ENHANCEMENT] Document migration from microservices to read-write deployment mode. #3951
  • [ENHANCEMENT] Do not error when there is nothing to commit as part of a publish. #4058
  • [ENHANCEMENT] Explain how to run Mimir locally using docker-compose. #4079
  • [ENHANCEMENT] Docs: use long flag names in runbook commands. #4088
  • [ENHANCEMENT] Clarify how ingester replication happens. #4101
  • [ENHANCEMENT] Improvements to the Get Started guide. #4315
  • [BUGFIX] Added indentation to Azure and SWIFT backend definition. #4263

Tools

  • [ENHANCEMENT] Adapt tsdb-print-chunk for native histograms. #4186
  • [ENHANCEMENT] Adapt tsdb-index-health for blocks containing native histograms. #4186
  • [ENHANCEMENT] Adapt tsdb-chunks tool to handle native histograms. #4186

All changes in this release: mimir-2.6.0...mimir-2.7.1