Add out-of-order sample support #2187

Merged
merged 9 commits into from
Jun 24, 2022
Changes from 2 commits
6 changes: 5 additions & 1 deletion CHANGELOG.md
@@ -17,6 +17,10 @@
* [CHANGE] Blocks uploaded by ingester no longer contain `__org_id__` label. Compactor now ignores this label and will compact blocks with and without this label together. `mimirconvert` tool will remove the label from blocks as "unknown" label. #1972
* [CHANGE] Querier: deprecated `-querier.shuffle-sharding-ingesters-lookback-period`, instead adding `-querier.shuffle-sharding-ingesters-enabled` to enable or disable shuffle sharding on the read path. The value of `-querier.query-ingesters-within` is now used internally for shuffle sharding lookback. #2110
* [CHANGE] Memberlist: `-memberlist.abort-if-join-fails` now defaults to false. Previously it defaulted to true. #2168
* [CHANGE] Ingester: `-ingester.exemplars-update-period` has been renamed to `-ingester.tsdb-config-update-period`. You can use it to update multiple, per-tenant TSDB configurations. #2187
* [FEATURE] Ingester: Add experimental ability to ingest out-of-order samples up to an allowed limit. Enabling this feature requires additional memory and disk space. It also enables a write-behind log, which might lead to longer replays when an ingester starts. When the feature is disabled, there is no overhead on memory, disk space, or startup time. #2187
* `-ingester.out-of-order-time-window` sets how far back in time a sample can be, as a duration string. Defaults to `0s`.
* `cortex_ingester_tsdb_out_of_order_samples_appended_total` metric tracks the total number of out-of-order samples ingested by the ingester.
* [ENHANCEMENT] Distributor: Added limit to prevent tenants from sending excessive number of requests: #1843
* The following CLI flags (and their respective YAML config options) have been added:
* `-distributor.request-rate-limit`
@@ -30,10 +34,10 @@
* [ENHANCEMENT] Upgrade Docker base images to `alpine:3.16.0`. #2028
* [ENHANCEMENT] Store-gateway: Add experimental configuration option for the store-gateway to attempt to pre-populate the file system cache when memory-mapping index-header files. Enabled with `-blocks-storage.bucket-store.index-header.map-populate-enabled=true`. Note this flag only has an effect when running on Linux. #2019 #2054
* [ENHANCEMENT] Chunk Mapper: reduce memory usage of async chunk mapper. #2043
* [ENHANCEMENT] Ingesters: Added new configuration option that makes it possible for mimir ingesters to perform queries on overlapping blocks in the filesystem. Enabled with `-blocks-storage.tsdb.allow-overlapping-queries`. #2091
* [ENHANCEMENT] Ingester: reduce sleep time when reading WAL. #2098
* [ENHANCEMENT] Compactor: Run sanity check on blocks storage configuration at startup. #2143
* [ENHANCEMENT] Compactor: Add HTTP API for uploading TSDB blocks. Enabled with `-compactor.block-upload-enabled`. #1694 #2126
* [ENHANCEMENT] Ingester: Enable querying overlapping blocks by default. #2187
* [BUGFIX] Fix regexp parsing panic for regexp label matchers with start/end quantifiers. #1883
* [BUGFIX] Ingester: fixed deceiving error log "failed to update cached shipped blocks after shipper initialisation", occurring for each new tenant in the ingester. #1893
* [BUGFIX] Ring: fix bug where instances may appear unhealthy in the hash ring web UI even though they are not. #1933
50 changes: 36 additions & 14 deletions cmd/mimir/config-descriptor.json
@@ -2306,12 +2306,12 @@
},
{
"kind": "field",
"name": "exemplars_update_period",
"name": "tsdb_config_update_period",
"required": false,
"desc": "Period with which to update per-tenant max exemplar limit.",
"desc": "Period with which to update per-tenant TSDB config.",
"fieldValue": null,
"fieldDefaultValue": 15000000000,
"fieldFlag": "ingester.exemplars-update-period",
"fieldFlag": "ingester.tsdb-config-update-period",
"fieldType": "duration",
"fieldCategory": "experimental"
},
@@ -2648,6 +2648,17 @@
"fieldType": "map of tracker name (string) to matcher (string)",
"fieldCategory": "advanced"
},
{
"kind": "field",
"name": "out_of_order_time_window",
"required": false,
"desc": "A non-zero value enables out-of-order support for the most recent samples within this time window. The ingester needs additional memory that grows with the rate of out-of-order samples ingested and the number of series receiving them. It can be configured per-tenant.",
"fieldValue": null,
"fieldDefaultValue": 0,
"fieldFlag": "ingester.out-of-order-time-window",
"fieldType": "duration",
"fieldCategory": "experimental"
},
{
"kind": "field",
"name": "max_fetched_chunks_per_query",
@@ -5380,17 +5391,6 @@
"fieldType": "boolean",
"fieldCategory": "advanced"
},
{
"kind": "field",
"name": "allow_overlapping_queries",
"required": false,
"desc": "Enable querying overlapping blocks. If there are going to be overlapping blocks in the ingesters this should be enabled.",
"fieldValue": null,
"fieldDefaultValue": false,
"fieldFlag": "blocks-storage.tsdb.allow-overlapping-queries",
"fieldType": "boolean",
"fieldCategory": "experimental"
},
{
"kind": "field",
"name": "series_hash_cache_max_size_bytes",
@@ -5412,6 +5412,28 @@
"fieldFlag": "blocks-storage.tsdb.max-tsdb-opening-concurrency-on-startup",
"fieldType": "int",
"fieldCategory": "advanced"
},
{
"kind": "field",
"name": "out_of_order_cap_min",
"required": false,
"desc": "Minimum capacity for out-of-order chunks, in samples. Must be between 0 and 255.",
"fieldValue": null,
"fieldDefaultValue": 4,
"fieldFlag": "blocks-storage.tsdb.out-of-order-cap-min",
"fieldType": "int",
"fieldCategory": "experimental"
},
{
"kind": "field",
"name": "out_of_order_cap_max",
"required": false,
"desc": "Maximum capacity for out-of-order chunks, in samples. Must be between 1 and 255.",
"fieldValue": null,
"fieldDefaultValue": 32,
"fieldFlag": "blocks-storage.tsdb.out-of-order-cap-max",
"fieldType": "int",
"fieldCategory": "experimental"
}
],
"fieldValue": null,
12 changes: 8 additions & 4 deletions cmd/mimir/help-all.txt.tmpl
@@ -477,8 +477,6 @@ Usage of ./cmd/mimir/mimir:
OpenStack Swift user ID.
-blocks-storage.swift.username string
OpenStack Swift username.
-blocks-storage.tsdb.allow-overlapping-queries
[experimental] Enable querying overlapping blocks. If there are going to be overlapping blocks in the ingesters this should be enabled.
-blocks-storage.tsdb.block-ranges-period value
TSDB blocks range period. (default 2h0m0s)
-blocks-storage.tsdb.close-idle-tsdb-timeout duration
@@ -507,6 +505,10 @@ Usage of ./cmd/mimir/mimir:
[experimental] True to enable snapshotting of in-memory TSDB data on disk when shutting down.
-blocks-storage.tsdb.new-chunk-disk-mapper
[experimental] Temporary flag to select whether to use the new (used in upstream Prometheus) or the old (legacy) chunk disk mapper.
-blocks-storage.tsdb.out-of-order-cap-max int
[experimental] Maximum capacity for out-of-order chunks, in samples. Must be between 1 and 255. (default 32)
-blocks-storage.tsdb.out-of-order-cap-min int
[experimental] Minimum capacity for out-of-order chunks, in samples. Must be between 0 and 255. (default 4)
-blocks-storage.tsdb.retention-period duration
TSDB blocks retention in the ingester before a block is removed, relative to the newest block written for the tenant. This should be larger than the -blocks-storage.tsdb.block-ranges-period, -querier.query-store-after and large enough to give store-gateways and queriers enough time to discover newly uploaded blocks. (default 24h0m0s)
-blocks-storage.tsdb.series-hash-cache-max-size-bytes uint
@@ -841,8 +843,6 @@ Usage of ./cmd/mimir/mimir:
Path to the key file for the client certificate. Also requires the client certificate to be configured.
-ingester.client.tls-server-name string
Override the expected name on the server certificate.
-ingester.exemplars-update-period duration
[experimental] Period with which to update per-tenant max exemplar limit. (default 15s)
-ingester.ignore-series-limit-for-metric-names string
Comma-separated list of metric names, for which the -ingester.max-global-series-per-metric limit will be ignored. Does not affect the -ingester.max-global-series-per-user limit.
-ingester.instance-limits.max-inflight-push-requests int
@@ -865,6 +865,8 @@ Usage of ./cmd/mimir/mimir:
The maximum number of active series per tenant, across the cluster before replication. 0 to disable. (default 150000)
-ingester.metadata-retain-period duration
Period at which metadata we have not seen will remain in memory before being deleted. (default 10m0s)
-ingester.out-of-order-time-window value
[experimental] A non-zero value enables out-of-order support for the most recent samples within this time window. The ingester needs additional memory that grows with the rate of out-of-order samples ingested and the number of series receiving them. It can be configured per-tenant.
-ingester.rate-update-period duration
Period with which to update the per-tenant ingestion rates. (default 15s)
-ingester.ring.consul.acl-token string
@@ -949,6 +951,8 @@ Usage of ./cmd/mimir/mimir:
True to enable the zone-awareness and replicate ingested samples across different availability zones. This option needs be set on ingesters, distributors, queriers and rulers when running in microservices mode.
-ingester.stream-chunks-when-using-blocks
Stream chunks from ingesters to queriers. (default true)
-ingester.tsdb-config-update-period duration
[experimental] Period with which to update per-tenant TSDB config. (default 15s)
-log.format value
Output log messages in the given format. Valid formats: [logfmt, json] (default logfmt)
-log.level value
@@ -741,9 +741,9 @@ ring:
# prod: '{namespace=~"prod-.*"}'
[active_series_custom_trackers: <map of tracker name (string) to matcher (string)> | default = ]

# (experimental) Period with which to update per-tenant max exemplar limit.
# CLI flag: -ingester.exemplars-update-period
[exemplars_update_period: <duration> | default = 15s]
# (experimental) Period with which to update per-tenant TSDB config.
# CLI flag: -ingester.tsdb-config-update-period
[tsdb_config_update_period: <duration> | default = 15s]

instance_limits:
# (advanced) Max ingestion rate (samples/sec) that ingester will accept. This
@@ -2720,6 +2720,13 @@ The `limits` block configures default and per-tenant limits imposed by component
# CLI flag: -ingester.active-series-custom-trackers
[active_series_custom_trackers: <map of tracker name (string) to matcher (string)> | default = ]

# (experimental) A non-zero value enables out-of-order support for the most
# recent samples within this time window. The ingester needs additional memory
# that grows with the rate of out-of-order samples ingested and the number of
# series receiving them. It can be configured per-tenant.
# CLI flag: -ingester.out-of-order-time-window
[out_of_order_time_window: <duration> | default = 0s]

# Maximum number of chunks that can be fetched in a single query from ingesters
# and long-term storage. This limit is enforced in the querier, ruler and
# store-gateway. 0 to disable.
@@ -3518,11 +3525,6 @@ tsdb:
# CLI flag: -blocks-storage.tsdb.isolation-enabled
[isolation_enabled: <boolean> | default = false]

# (experimental) Enable querying overlapping blocks. If there are going to be
# overlapping blocks in the ingesters this should be enabled.
# CLI flag: -blocks-storage.tsdb.allow-overlapping-queries
[allow_overlapping_queries: <boolean> | default = false]

# (advanced) Max size - in bytes - of the in-memory series hash cache. The
# cache is shared across all tenants and it's used only when query sharding is
# enabled.
@@ -3532,6 +3534,16 @@ tsdb:
# (advanced) limit the number of concurrently opening TSDB's on startup
# CLI flag: -blocks-storage.tsdb.max-tsdb-opening-concurrency-on-startup
[max_tsdb_opening_concurrency_on_startup: <int> | default = 10]

# (experimental) Minimum capacity for out-of-order chunks, in samples. Must be
# between 0 and 255.
# CLI flag: -blocks-storage.tsdb.out-of-order-cap-min
[out_of_order_cap_min: <int> | default = 4]

# (experimental) Maximum capacity for out-of-order chunks, in samples. Must be
# between 1 and 255.
# CLI flag: -blocks-storage.tsdb.out-of-order-cap-max
[out_of_order_cap_max: <int> | default = 32]
```
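
The documented ranges for `out_of_order_cap_min` (0 to 255) and `out_of_order_cap_max` (1 to 255) can be illustrated with a small range check. This is a minimal sketch, not Mimir's validation code: the function name `validateOutOfOrderCaps` is hypothetical, and the rule that the minimum must not exceed the maximum is an assumption that the text above does not state.

```go
package main

import (
	"errors"
	"fmt"
)

// validateOutOfOrderCaps is a hypothetical helper that checks the two
// capacities against the ranges given in the reference text above.
func validateOutOfOrderCaps(capMin, capMax int) error {
	if capMin < 0 || capMin > 255 {
		return fmt.Errorf("out_of_order_cap_min must be between 0 and 255, got %d", capMin)
	}
	if capMax < 1 || capMax > 255 {
		return fmt.Errorf("out_of_order_cap_max must be between 1 and 255, got %d", capMax)
	}
	// Assumption (not stated in the reference text): min must not exceed max.
	if capMin > capMax {
		return errors.New("out_of_order_cap_min must not exceed out_of_order_cap_max")
	}
	return nil
}

func main() {
	fmt.Println(validateOutOfOrderCaps(4, 32))  // documented defaults
	fmt.Println(validateOutOfOrderCaps(4, 300)) // maximum out of range
}
```

With the documented defaults (4 and 32) the check passes, while a maximum of 300 is rejected.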

### compactor
5 changes: 5 additions & 0 deletions docs/sources/operators-guide/mimir-runbooks/_index.md
@@ -1412,6 +1412,11 @@ Common **causes**:

> **Note**: You can learn more about out of order samples in Prometheus, in the blog post [Debugging out of order samples](https://www.robustperception.io/debugging-out-of-order-samples/).

### err-mimir-sample-too-old

This error is similar to `err-mimir-sample-out-of-order` above. The difference is that out-of-order support was enabled, but the sample was
older than the out-of-order time window allows, relative to the latest sample for that time series or for the TSDB.

### err-mimir-sample-duplicate-timestamp

This error occurs when the ingester rejects a sample because it is a duplicate of a previously received sample with the same timestamp but different value in the same time series.