Commit

Merge branch 'main' into helm_prometheusrule
* main: (63 commits)
  Add new section on website for links to blog posts, podcasts and talks. (grafana#2216)
  Rename codified errors to errors catalog (grafana#2256)
  Helm: add a step to contributing doc (grafana#2257)
  Signal that 2.2 release is now in progress. (grafana#2254)
  Removed migration of alertmanager local state files from old hierarchy (Cortex 1.8 and earlier) (grafana#2253)
  operations/mimir: Change multi_zone_ingester_max_unavailable to 25 (grafana#2251)
  Helm: weekly release (grafana#2252)
  Jsonnet: Configure ingester max global metadata per user and per metric (grafana#2250)
  Helm: metamonitor naming (grafana#2236)
  Mimir documentation about out-of-order (grafana#2183)
  Vendor latest mimir-prometheus/main (grafana#2243)
  Set CODEOWNERS to primary technical writer (grafana#2242)
  Use BasicLifecycler for distributors and auto-forget (grafana#2154)
  Docs: Basic documentation for deploying the ruler using jsonnet. (grafana#2127)
  Fix post merge reviews on 2187 (grafana#2230)
  Add tests for user metadata in the ingester (grafana#2184)
  Change the error message template for per-tenant limits (grafana#2234)
  helm: meta-monitoring (grafana#2068)
  Article about migrating from Consul to memberlist. Added documentation for /memberlist endpoint. (grafana#2166)
  Update runbooks to mention possibility to investigate memberlist KV store in various alerts (grafana#2158)
  ...
rlex committed Jun 28, 2022
2 parents 419b251 + 8fabc83 commit 208d9c8
Showing 471 changed files with 17,696 additions and 6,980 deletions.
1 change: 1 addition & 0 deletions .github/workflows/helm-ci.yml
@@ -12,3 +12,4 @@ jobs:
with:
ct_configfile: operations/helm/ct.yaml
ct_check_version_increment: false
helm_version: v3.8.2
2 changes: 1 addition & 1 deletion .github/workflows/test-build-deploy.yml
@@ -87,7 +87,7 @@ jobs:
- name: Set up Helm
uses: azure/setup-helm@v1
with:
version: v3.5.2
version: v3.8.2
- name: Check Helm Tests
run: make BUILD_IN_CONTAINER=false check-helm-tests

57 changes: 50 additions & 7 deletions CHANGELOG.md
@@ -4,51 +4,78 @@

### Grafana Mimir

### Mixin

### Jsonnet

### Mimirtool

### Mimir Continuous Test

### Documentation


## 2.2.0-rc.0

### Grafana Mimir

* [CHANGE] Increased default configuration for `-server.grpc-max-recv-msg-size-bytes` and `-server.grpc-max-send-msg-size-bytes` from 4MB to 100MB. #1883
* [CHANGE] Default values have changed for the following settings. This improves query performance for recent data (within 12h) by only reading from ingesters: #1909 #1921
- `-blocks-storage.bucket-store.ignore-blocks-within` now defaults to `10h` (previously `0`)
- `-querier.query-store-after` now defaults to `12h` (previously `0`)
- `-querier.shuffle-sharding-ingesters-lookback-period` now defaults to `13h` (previously `0`)
* [CHANGE] Alertmanager: removed support for migrating local files from Cortex 1.8 or earlier. Related to original Cortex PR https://github.com/cortexproject/cortex/pull/3910. #2253
* [CHANGE] The following settings are now classified as advanced because the defaults should work for most users and tuning them requires in-depth knowledge of how the read path works: #1929
- `-querier.query-ingesters-within`
- `-querier.query-store-after`
* [CHANGE] Config flag category overrides can be set dynamically at runtime. #1934
* [CHANGE] Ingester: deprecated `-ingester.ring.join-after`. Mimir now behaves as if this setting is always set to 0s. This configuration option will be removed in Mimir 2.4.0. #1965
* [CHANGE] Blocks uploaded by ingester no longer contain `__org_id__` label. Compactor now ignores this label and will compact blocks with and without this label together. `mimirconvert` tool will remove the label from blocks as "unknown" label. #1972
* [CHANGE] Querier: deprecated `-querier.shuffle-sharding-ingesters-lookback-period`, instead adding `-querier.shuffle-sharding-ingesters-enabled` to enable or disable shuffle sharding on the read path. The value of `-querier.query-ingesters-within` is now used internally for shuffle sharding lookback. #2110
* [CHANGE] Memberlist: `-memberlist.abort-if-join-fails` now defaults to false. Previously it defaulted to true. #2168
* [CHANGE] Ruler: `/api/v1/rules*` and `/prometheus/rules*` configuration endpoints are removed. Use `/prometheus/config/v1/rules*`. #2182
* [CHANGE] Ingester: `-ingester.exemplars-update-period` has been renamed to `-ingester.tsdb-config-update-period`. You can use it to update multiple, per-tenant TSDB configurations. #2187
* [FEATURE] Ingester: (Experimental) Add the ability to ingest out-of-order samples up to an allowed limit. If you enable this feature, it requires additional memory and disk space. This feature also enables a write-behind log, which might lead to longer ingester-start replays. When this feature is disabled, there is no overhead on memory, disk space, or startup times. #2187
* `-ingester.out-of-order-time-window`, as a duration string, allows you to set how far back in time a sample can be. The default is `0s`, where `s` is seconds.
* `cortex_ingester_tsdb_out_of_order_samples_appended_total` metric tracks the total number of out-of-order samples ingested by the ingester.
* `cortex_discarded_samples_total` has a new label `reason="sample-too-old"`, when the `-ingester.out-of-order-time-window` flag is greater than zero. The label tracks the number of samples that were discarded for being too old; they were out of order, but beyond the time window allowed.
* [ENHANCEMENT] Distributor: Added a limit to prevent tenants from sending an excessive number of requests: #1843
* The following CLI flags (and their respective YAML config options) have been added:
* `-distributor.request-rate-limit`
* `-distributor.request-burst-limit`
* The following metric is exposed to tell how many requests have been rejected:
* `cortex_discarded_requests_total`
* [ENHANCEMENT] Store-gateway: Add the experimental ability to run requests in a dedicated OS thread pool. This feature can be configured using `-store-gateway.thread-pool-size` and is disabled by default. Replaces the ability to run index header operations in a dedicated thread pool. #1660 #1812
* [ENHANCEMENT] Improved error messages to make them easier to understand; each now has a unique, global identifier that you can look up in the runbooks for more information. #1907 #1919 #1888 #1939 #1984 #2009 #2066 #2104
* [ENHANCEMENT] Improved error messages to make them easier to understand; each now has a unique, global identifier that you can look up in the runbooks for more information. #1907 #1919 #1888 #1939 #1984 #2009 #2066 #2104 #2150
* [ENHANCEMENT] Memberlist KV: incoming messages are now processed on a per-key goroutine. This may reduce loss of "maintenance" packets in busy memberlist installations, but use more CPU. New `memberlist_client_received_broadcasts_dropped_total` counter tracks number of dropped per-key messages. #1912
* [ENHANCEMENT] Blocks Storage, Alertmanager, Ruler: add support for a prefix to the bucket store (`*_storage.storage_prefix`). This enables using the same bucket for the three components. #1686 #1951
* [ENHANCEMENT] Upgrade Docker base images to `alpine:3.16.0`. #2028
* [ENHANCEMENT] Store-gateway: Add experimental configuration option for the store-gateway to attempt to pre-populate the file system cache when memory-mapping index-header files. Enabled with `-blocks-storage.bucket-store.index-header.map-populate-enabled=true`. Note this flag only has an effect when running on Linux. #2019 #2054
* [ENHANCEMENT] Chunk Mapper: reduce memory usage of async chunk mapper. #2043
* [ENHANCEMENT] Ingesters: Added new configuration option that makes it possible for mimir ingesters to perform queries on overlapping blocks in the filesystem. Enabled with `-blocks-storage.tsdb.allow-overlapping-queries`. #2091
* [ENHANCEMENT] Ingester: reduce sleep time when reading WAL. #2098
* [ENHANCEMENT] Compactor: Add HTTP API for uploading TSDB blocks. #1694
* [ENHANCEMENT] Compactor: Run sanity check on blocks storage configuration at startup. #2143
* [ENHANCEMENT] Compactor: Add HTTP API for uploading TSDB blocks. Enabled with `-compactor.block-upload-enabled`. #1694 #2126
* [ENHANCEMENT] Ingester: Enable querying overlapping blocks by default. #2187
* [ENHANCEMENT] Distributor: Auto-forget unhealthy distributors after ten failed ring heartbeats. #2154
* [BUGFIX] Fix regexp parsing panic for regexp label matchers with start/end quantifiers. #1883
* [BUGFIX] Ingester: fixed deceiving error log "failed to update cached shipped blocks after shipper initialisation", occurring for each new tenant in the ingester. #1893
* [BUGFIX] Ring: fix bug where instances may appear unhealthy in the hash ring web UI even though they are not. #1933
* [BUGFIX] API: gzip is now enforced when identity encoding is explicitly rejected. #1864
* [BUGFIX] Fix panic at startup when Mimir is running in monolithic mode and query sharding is enabled. #2036
* [BUGFIX] Ruler: report failed evaluation metric for any 5xx status code returned by the query-frontend when remote operational mode is enabled. #2053
* [BUGFIX] Ruler: report `cortex_ruler_queries_failed_total` metric for any remote query error except 4xx when remote operational mode is enabled. #2053 #2143
* [BUGFIX] Ingester: fix slow rollout when using `-ingester.ring.unregister-on-shutdown=false` with long `-ingester.ring.heartbeat-period`. #2085
* [BUGFIX] Ruler: add timeout for remote rule evaluation queries to prevent rule group evaluations getting stuck indefinitely. The duration is configurable with `-ruler.query-frontend.timeout` (default `2m`). #2090
* [BUGFIX] Ruler: add timeout for remote rule evaluation queries to prevent rule group evaluations getting stuck indefinitely. The duration is configurable with `-querier.timeout` (default `2m`). #2090 #2222
* [BUGFIX] Limits: Active series custom tracker configuration has been renamed back from `active_series_custom_trackers_config` to `active_series_custom_trackers`. For backwards compatibility, both versions are supported until Mimir v2.4. When both fields are specified, `active_series_custom_trackers_config` takes precedence over `active_series_custom_trackers`. #2101
* [BUGFIX] Ingester: fixed the order of labels applied when incrementing the `cortex_discarded_metadata_total` metric. #2096
* [BUGFIX] Ingester: fixed bug where retrieving metadata for a metric with multiple metadata entries would return multiple copies of a single metadata entry rather than all available entries. #2096
* [BUGFIX] Distributor: canceled requests are no longer accounted as internal errors. #2157
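
As a rough illustration of how the new read-path defaults listed above fit together, the following Python sketch models whether a query would need long-term storage. It is a simplified model under stated assumptions, not Mimir's implementation: the helper names `needs_store_gateway` and `defaults_are_consistent` and the comparison logic are hypothetical. Only the `12h` and `10h` values come from the entries above.

```python
from datetime import datetime, timedelta, timezone

# Defaults quoted in the changelog entries above.
QUERY_STORE_AFTER = timedelta(hours=12)     # -querier.query-store-after
IGNORE_BLOCKS_WITHIN = timedelta(hours=10)  # -blocks-storage.bucket-store.ignore-blocks-within


def needs_store_gateway(query_start: datetime, now: datetime) -> bool:
    """Hypothetical helper: in this simplified model, long-term storage is
    consulted only when the query reaches back past now - QUERY_STORE_AFTER."""
    return query_start <= now - QUERY_STORE_AFTER


def defaults_are_consistent() -> bool:
    """Blocks that store-gateways ignore (newer than 10h) should never be
    needed, because storage is only asked about data older than 12h."""
    return IGNORE_BLOCKS_WITHIN < QUERY_STORE_AFTER


if __name__ == "__main__":
    now = datetime(2022, 6, 28, 12, 0, tzinfo=timezone.utc)
    print(needs_store_gateway(now - timedelta(hours=6), now))
    print(needs_store_gateway(now - timedelta(hours=24), now))
    print(defaults_are_consistent())
```

In this model, a query covering only the last six hours is served without long-term storage, which matches the changelog's statement that recent data (within 12h) is read only from ingesters.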

### Mixin

* [CHANGE] Split `mimir_queries` rules group into `mimir_queries` and `mimir_ingester_queries` to keep number of rules per group within the default per-tenant limit. #1885
* [CHANGE] Dashboards: Expose full image tag in "Mimir / Rollout progress" dashboard's "Pod per version panel." #1932
* [CHANGE] Dashboards: Disabled gateway panels by default, because most users don't have a gateway exposing the metrics expected by Mimir dashboards. You can re-enable them by setting `gateway_enabled: true` in the mixin config and recompiling the mixin by running `make build-mixin`. #1954
* [CHANGE] Alerts: adapt `MimirFrontendQueriesStuck` and `MimirSchedulerQueriesStuck` to consider ruler query path components. #1949
* [CHANGE] Alerts: Change `MimirRulerTooManyFailedQueries` severity to `critical`. #2165
* [ENHANCEMENT] Dashboards: Add config option `datasource_regex` to customise the regular expression used to select valid datasources for Mimir dashboards. #1802
* [ENHANCEMENT] Dashboards: Added "Mimir / Remote ruler reads" and "Mimir / Remote ruler reads resources" dashboards. #1911 #1937
* [ENHANCEMENT] Dashboards: Make networking panels work for pods created by the mimir-distributed helm chart. #1927
@@ -62,22 +89,31 @@
* [BUGFIX] Do not trigger `MimirAllocatingTooMuchMemory` alerts if no container limits are supplied. #1905
* [BUGFIX] Dashboards: Remove empty "Chunks per query" panel from `Mimir / Queries` dashboard. #1928
* [BUGFIX] Dashboards: Use Grafana's `$__rate_interval` for rate queries in dashboards to support scrape intervals of >15s. #2011
* [BUGFIX] Alerts: Make each version of `MimirCompactorHasNotUploadedBlocks` distinct to avoid rule evaluation failures due to duplicate series being generated. #2197

### Jsonnet

* [CHANGE] Remove use of `-querier.query-store-after`, `-querier.shuffle-sharding-ingesters-lookback-period`, `-blocks-storage.bucket-store.ignore-blocks-within`, and `-blocks-storage.tsdb.close-idle-tsdb-timeout` CLI flags since the values now match defaults. #1915 #1921
* [CHANGE] Change default value for `-blocks-storage.bucket-store.chunks-cache.memcached.timeout` to `450ms` to increase use of cached data. #2035
* [CHANGE] The `memberlist_ring_enabled` configuration now applies to Alertmanager. #2102
* [CHANGE] Default value for `memberlist_ring_enabled` is now true. It means that all hash rings use Memberlist as default KV store instead of Consul (previous default). #2161
* [CHANGE] Configure `-ingester.max-global-metadata-per-user` to correspond to 20% of the configured max number of series per tenant. #2250
* [CHANGE] Configure `-ingester.max-global-metadata-per-metric` to be 10. #2250
* [CHANGE] Change `_config.multi_zone_ingester_max_unavailable` to 25. #2251
* [FEATURE] Added querier autoscaling support. It requires [KEDA](https://keda.sh) installed in the Kubernetes cluster and query-scheduler enabled in the Mimir cluster. Querier autoscaler can be enabled and configured through the following options in the jsonnet config: #2013 #2023
* `autoscaling_querier_enabled`: `true` to enable autoscaling.
* `autoscaling_querier_min_replicas`: minimum number of querier replicas.
* `autoscaling_querier_max_replicas`: maximum number of querier replicas.
* `autoscaling_prometheus_url`: Prometheus base URL from which to scrape Mimir metrics (e.g. `http://prometheus.default:9090/prometheus`).
* [FEATURE] Jsonnet: Add support for ruler remote evaluation mode (`ruler_remote_evaluation_enabled`), which deploys and uses a dedicated query path for rule evaluation. This enables the benefits of the query-frontend for rule evaluation, such as query sharding. #2073
* [ENHANCEMENT] Added `compactor` service, that can be used to route requests directly to compactor (e.g. admin UI). #2063
* [ENHANCEMENT] Added a `consul_enabled` configuration option that defaults to true (matching previous behavior) to provide the ability to disable consul. #2093
* [ENHANCEMENT] Added a `consul_enabled` configuration option to provide the ability to disable consul. It is automatically set to false when `memberlist_ring_enabled` is true and `multikv_migration_enabled` (used for migration from Consul to memberlist) is not set. #2093 #2152
* [BUGFIX] Querier: Fix disabling shuffle sharding on the read path whilst keeping it enabled on write path. #2164

### Mimirtool

* [CHANGE] mimirtool rules: `--use-legacy-routes` now toggles between using `/prometheus/config/v1/rules` (default) and `/api/v1/rules` (legacy) endpoints. #2182
* [FEATURE] Added bearer token support for when Mimir is behind a gateway authenticating by bearer token. #2146
* [BUGFIX] mimirtool analyze: Fix dashboard JSON unmarshalling errors (#1840). #1973
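
The `--use-legacy-routes` toggle described above can be sketched as a simple path selection. This is a minimal Python illustration, not mimirtool's own code: the function name `rules_path` and the constant names are hypothetical, and only the two endpoint paths are taken from the changelog entry.

```python
# Endpoint paths quoted in the changelog entry above.
DEFAULT_RULES_PATH = "/prometheus/config/v1/rules"
LEGACY_RULES_PATH = "/api/v1/rules"


def rules_path(use_legacy_routes: bool = False) -> str:
    """Hypothetical helper: pick the rules endpoint based on the toggle."""
    return LEGACY_RULES_PATH if use_legacy_routes else DEFAULT_RULES_PATH


if __name__ == "__main__":
    print(rules_path())
    print(rules_path(use_legacy_routes=True))
```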

### Mimir Continuous Test
@@ -92,11 +128,18 @@
* [ENHANCEMENT] Explain the runtime override of active series matchers. #1868
* [ENHANCEMENT] Clarify "Set rule group" API specification. #1869
* [ENHANCEMENT] Published Mimir jsonnet documentation. #2024
* [ENHANCEMENT] Documented required scrape interval for using alerting and recording rules from Mimir jsonnet. #2147
* [ENHANCEMENT] Runbooks: Mention memberlist as possible source of problems for various alerts. #2158
* [ENHANCEMENT] Added step-by-step article about migrating from Consul to Memberlist KV store using jsonnet without downtime. #2166
* [ENHANCEMENT] Documented `/memberlist` admin page. #2166
* [ENHANCEMENT] Documented how to configure queriers’ autoscaling with Jsonnet. #2128
* [BUGFIX] Fixed ruler configuration used in the getting started guide. #2052
* [BUGFIX] Fixed Mimir Alertmanager datasource in Grafana used by "Play with Grafana Mimir" tutorial. #2115

## 2.1.0

### Grafana Mimir

* [CHANGE] Compactor: No longer upload debug meta files to object storage. #1257
* [CHANGE] Default values have changed for the following settings: #1547
- `-alertmanager.alertmanager-client.grpc-max-recv-msg-size` now defaults to 100 MiB (previously was not configurable and set to 16 MiB)
2 changes: 1 addition & 1 deletion CODEOWNERS
@@ -1 +1 @@
/docs/ @grafana/docs-squad @jdbaldry
/docs/ @osg-grafana
2 changes: 1 addition & 1 deletion Makefile
@@ -476,7 +476,7 @@ mixin-serve: ## Runs Grafana (listening on port 3000) loading the mixin dashboar
@./operations/mimir-mixin-tools/serve/run.sh

mixin-screenshots: ## Generates mixin dashboards screenshots.
@find docs/sources/operators-guide/visualizing-metrics/dashboards -name '*.png' -delete
@find docs/sources/operators-guide/monitoring-grafana-mimir/dashboards -name '*.png' -delete
@./operations/mimir-mixin-tools/screenshots/run.sh

check-jsonnet-manifests: format-jsonnet-manifests