Add tenant string in per-tenant error labels
zenador committed Jun 8, 2022
1 parent ad505db commit 758ef72
Showing 5 changed files with 45 additions and 43 deletions.
32 changes: 16 additions & 16 deletions docs/sources/operators-guide/mimir-runbooks/_index.md
@@ -1049,7 +1049,7 @@ A metric name can only contain characters as defined by Prometheus’ [Metric na

> **Note**: Invalid series are skipped during the ingestion, and valid series within the same request are ingested.
### err-mimir-max-label-names-per-series
### err-mimir-tenant-max-label-names-per-series

This non-critical error occurs when Mimir receives a write request that contains a series whose number of labels exceeds the configured limit.
The limit protects the system’s stability from potential abuse or mistakes. To configure the limit on a per-tenant basis, use the `-validation.max-label-names-per-series` option.
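
A minimal sketch of the label-count check described above, assuming the limit set by `-validation.max-label-names-per-series` is compared against the total number of labels in a series, including `__name__`. The `Label` type and `exceedsMaxLabelNames` function are hypothetical and only illustrate the comparison; they are not Mimir's validation code.

```go
package main

import "fmt"

// Label is a hypothetical name/value pair standing in for a Prometheus label.
type Label struct {
	Name, Value string
}

// exceedsMaxLabelNames reports whether a series has more labels than the
// per-tenant limit set by -validation.max-label-names-per-series.
// Assumption: the __name__ label counts toward the limit.
func exceedsMaxLabelNames(series []Label, maxLabelNames int) bool {
	return len(series) > maxLabelNames
}

func main() {
	series := []Label{
		{"__name__", "http_requests_total"},
		{"job", "api"},
		{"instance", "10.0.0.1:8080"},
	}
	fmt.Println(exceedsMaxLabelNames(series, 2)) // true: 3 labels > limit of 2
	fmt.Println(exceedsMaxLabelNames(series, 3)) // false: 3 labels is within a limit of 3
}
```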
@@ -1063,14 +1063,14 @@ A label name can only contain characters as defined by Prometheus’ [Metri

> **Note**: Invalid series are skipped during the ingestion, and valid series within the same request are ingested.
### err-mimir-label-name-too-long
### err-mimir-tenant-label-name-too-long

This non-critical error occurs when Mimir receives a write request that contains a series with a label name whose length exceeds the configured limit.
The limit protects the system’s stability from potential abuse or mistakes. To configure the limit on a per-tenant basis, use the `-validation.max-length-label-name` option.

> **Note**: Invalid series are skipped during the ingestion, and valid series within the same request are ingested.
### err-mimir-label-value-too-long
### err-mimir-tenant-label-value-too-long

This non-critical error occurs when Mimir receives a write request that contains a series with a label value whose length exceeds the configured limit.
The limit protects the system’s stability from potential abuse or mistakes. To configure the limit on a per-tenant basis, use the `-validation.max-length-label-value` option.
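
The two length limits above can be sketched as a single pass over a series' labels. This assumes lengths are counted in bytes (Go's `len` on a string) and that the first offending label decides the error; `checkLabelLengths` is a hypothetical helper that returns the renamed error IDs from this commit rather than Mimir's real error values.

```go
package main

import "fmt"

// Label is a hypothetical name/value pair standing in for a Prometheus label.
type Label struct {
	Name, Value string
}

// checkLabelLengths returns the error ID for the first label whose name or
// value is too long, or "" if every label is within its limit.
// maxNameLen and maxValueLen play the role of -validation.max-length-label-name
// and -validation.max-length-label-value; lengths are counted in bytes.
func checkLabelLengths(series []Label, maxNameLen, maxValueLen int) string {
	for _, l := range series {
		if len(l.Name) > maxNameLen {
			return "tenant-label-name-too-long"
		}
		if len(l.Value) > maxValueLen {
			return "tenant-label-value-too-long"
		}
	}
	return ""
}

func main() {
	series := []Label{{"__name__", "up"}, {"job", "a-very-long-job-name"}}
	fmt.Println(checkLabelLengths(series, 16, 8)) // tenant-label-value-too-long
}
```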
@@ -1092,7 +1092,7 @@ If you experience this error, [open an issue in the Mimir repository](https://gi

> **Note**: Invalid series are skipped during the ingestion, and valid series within the same request are ingested.
### err-mimir-too-far-in-future
### err-mimir-tenant-too-far-in-future

This non-critical error occurs when Mimir receives a write request that contains a sample whose timestamp is in the future compared to the current "real world" time.
Mimir accepts timestamps that are slightly in the future, due to skewed clocks for example. It rejects timestamps that are too far in the future, based on the grace period that you can set via the `-validation.create-grace-period` option.
@@ -1128,21 +1128,21 @@ Each metric metadata must have a metric name. Rarely it does not, in which case

> **Note**: Invalid metrics metadata are skipped during the ingestion, and valid metadata within the same request are ingested.
### err-mimir-metric-name-too-long
### err-mimir-tenant-metric-name-too-long

This non-critical error occurs when Mimir receives a write request that contains a metric metadata with a metric name whose length exceeds the configured limit.
The limit protects the system’s stability from potential abuse or mistakes. To configure the limit on a per-tenant basis, use the `-validation.max-metadata-length` option.

> **Note**: Invalid metrics metadata are skipped during the ingestion, and valid metadata within the same request are ingested.
### err-mimir-help-too-long
### err-mimir-tenant-help-too-long

This non-critical error occurs when Mimir receives a write request that contains a metric metadata with a help description whose length exceeds the configured limit.
The limit protects the system’s stability from potential abuse or mistakes. To configure the limit on a per-tenant basis, use the `-validation.max-metadata-length` option.

> **Note**: Invalid metrics metadata are skipped during the ingestion, and valid metadata within the same request are ingested.
### err-mimir-unit-too-long
### err-mimir-tenant-unit-too-long

This non-critical error occurs when Mimir receives a write request that contains a metric metadata with a unit name whose length exceeds the configured limit.
The limit protects the system’s stability from potential abuse or mistakes. To configure the limit on a per-tenant basis, use the `-validation.max-metadata-length` option.
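
All three metadata checks above share the `-validation.max-metadata-length` limit, so they can be sketched as one function. `MetricMetadata` here is a hypothetical local struct that borrows the field names used in this commit's tests (`MetricFamilyName`, `Help`, `Unit`), and the order in which the fields are checked is an assumption.

```go
package main

import "fmt"

// MetricMetadata is a hypothetical local struct; its field names follow the
// ones used in this commit's tests.
type MetricMetadata struct {
	MetricFamilyName string
	Help             string
	Unit             string
}

// validateMetadataLengths returns the error ID of the first field longer than
// maxLen (the -validation.max-metadata-length limit), or "" if all fit.
// The order in which fields are checked is an assumption.
func validateMetadataLengths(m MetricMetadata, maxLen int) string {
	switch {
	case len(m.MetricFamilyName) > maxLen:
		return "tenant-metric-name-too-long"
	case len(m.Help) > maxLen:
		return "tenant-help-too-long"
	case len(m.Unit) > maxLen:
		return "tenant-unit-too-long"
	}
	return ""
}

func main() {
	m := MetricMetadata{MetricFamilyName: "test_metric", Help: "This is a test metric.", Unit: "counter"}
	fmt.Printf("%q\n", validateMetadataLengths(m, 64)) // ""
	fmt.Printf("%q\n", validateMetadataLengths(m, 15)) // "tenant-help-too-long"
}
```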
@@ -1231,7 +1231,7 @@ How to **fix** it:
- Check the write requests latency through the `Mimir / Writes` dashboard and come back to investigate the root cause of high latency (the higher the latency, the higher the number of in-flight write requests).
- Consider scaling out the ingesters.

### err-mimir-max-series-per-user
### err-mimir-tenant-max-series-per-user

This error occurs when the number of in-memory series for a given tenant exceeds the configured limit.

@@ -1243,7 +1243,7 @@ How to **fix** it:
- Ensure the actual number of series written by the affected tenant is legitimate.
- Consider increasing the per-tenant limit by using the `-ingester.max-global-series-per-user` option (or `max_global_series_per_user` in the runtime configuration).

### err-mimir-max-series-per-metric
### err-mimir-tenant-max-series-per-metric

This error occurs when the number of in-memory series for a given tenant and metric name exceeds the configured limit.

@@ -1260,7 +1260,7 @@ How to **fix** it:
- Consider increasing the per-tenant limit by using the `-ingester.max-global-series-per-metric` option (or `max_global_series_per_metric` in the runtime configuration).
- Consider excluding specific metric names from this limit's check by using the `-ingester.ignore-series-limit-for-metric-names` option.
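
The two series limits above can be modeled as counters keyed by tenant and by tenant plus metric name. The sketch below is single-process and simplified: it ignores how the global limits are applied across ingesters and never decrements counts when series leave memory. `seriesLimiter` and its methods are hypothetical; the `ignored` set mirrors the intent of `-ingester.ignore-series-limit-for-metric-names` by skipping only the per-metric check.

```go
package main

import "fmt"

// seriesLimiter is a hypothetical, single-process model of the per-tenant
// (max-series-per-user) and per-metric (max-series-per-metric) limits.
type seriesLimiter struct {
	maxPerUser   int
	maxPerMetric int
	ignored      map[string]bool           // metric names exempt from the per-metric limit
	perUser      map[string]int            // tenant -> in-memory series
	perMetric    map[string]map[string]int // tenant -> metric name -> in-memory series
}

func newSeriesLimiter(maxPerUser, maxPerMetric int, ignoredMetrics []string) *seriesLimiter {
	ignored := make(map[string]bool, len(ignoredMetrics))
	for _, name := range ignoredMetrics {
		ignored[name] = true
	}
	return &seriesLimiter{
		maxPerUser:   maxPerUser,
		maxPerMetric: maxPerMetric,
		ignored:      ignored,
		perUser:      map[string]int{},
		perMetric:    map[string]map[string]int{},
	}
}

// admit tries to create one new series. It returns "" on success, or the error
// ID of the limit that would be exceeded (in which case nothing is recorded).
func (l *seriesLimiter) admit(tenant, metric string) string {
	if l.perUser[tenant] >= l.maxPerUser {
		return "tenant-max-series-per-user"
	}
	if !l.ignored[metric] && l.perMetric[tenant][metric] >= l.maxPerMetric {
		return "tenant-max-series-per-metric"
	}
	if l.perMetric[tenant] == nil {
		l.perMetric[tenant] = map[string]int{}
	}
	l.perUser[tenant]++
	l.perMetric[tenant][metric]++
	return ""
}

func main() {
	l := newSeriesLimiter(3, 2, []string{"build_info"})
	fmt.Printf("%q\n", l.admit("team-a", "up"))         // ""
	fmt.Printf("%q\n", l.admit("team-a", "up"))         // ""
	fmt.Printf("%q\n", l.admit("team-a", "up"))         // "tenant-max-series-per-metric"
	fmt.Printf("%q\n", l.admit("team-a", "build_info")) // "" (exempt from per-metric limit)
	fmt.Printf("%q\n", l.admit("team-a", "build_info")) // "tenant-max-series-per-user"
}
```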

### err-mimir-max-metadata-per-user
### err-mimir-tenant-max-metadata-per-user

This non-critical error occurs when the number of in-memory metrics with metadata for a given tenant exceeds the configured limit.

@@ -1277,7 +1277,7 @@ How to **fix** it:
- Check the current number of metric names for the affected tenant by running the instant query `count(count by(__name__) ({__name__=~".+"}))`. Alternatively, you can get the cardinality of the `__name__` label by calling the API endpoint `/api/v1/cardinality/label_names`.
- Consider increasing the per-tenant limit setting to a value greater than the number of unique metric names returned by the previous query.

### err-mimir-max-metadata-per-metric
### err-mimir-tenant-max-metadata-per-metric

This non-critical error occurs when the number of different metadata for a given metric name exceeds the configured limit.

@@ -1295,7 +1295,7 @@ How to **fix** it:
- If the different metadata is unexpected, consider fixing the discrepancy in the instrumented applications.
- If the different metadata is expected, consider increasing the per-tenant limit by using the `-ingester.max-global-metadata-per-metric` option (or `max_global_metadata_per_metric` in the runtime configuration).

### err-mimir-max-chunks-per-query
### err-mimir-tenant-max-chunks-per-query

This error occurs when a query execution exceeds the limit on the number of series chunks fetched.

@@ -1307,7 +1307,7 @@ How to **fix** it:
- Consider reducing the time range and/or cardinality of the query. To reduce the cardinality of the query, you can add more label matchers to the query, restricting the set of matching series.
- Consider increasing the per-tenant limit by using the `-querier.max-fetched-chunks-per-query` option (or `max_fetched_chunks_per_query` in the runtime configuration).

### err-mimir-max-series-per-query
### err-mimir-tenant-max-series-per-query

This error occurs when a query execution exceeds the limit on the maximum number of series.

@@ -1319,7 +1319,7 @@ How to **fix** it:
- Consider reducing the time range and/or cardinality of the query. To reduce the cardinality of the query, you can add more label matchers to the query, restricting the set of matching series.
- Consider increasing the per-tenant limit by using the `-querier.max-fetched-series-per-query` option (or `max_fetched_series_per_query` in the runtime configuration).

### err-mimir-max-chunks-bytes-per-query
### err-mimir-tenant-max-chunks-bytes-per-query

This error occurs when a query execution exceeds the limit on aggregated size (in bytes) of fetched chunks.

@@ -1331,7 +1331,7 @@ How to **fix** it:
- Consider reducing the time range and/or cardinality of the query. To reduce the cardinality of the query, you can add more label matchers to the query, restricting the set of matching series.
- Consider increasing the per-tenant limit by using the `-querier.max-fetched-chunk-bytes-per-query` option (or `max_fetched_chunk_bytes_per_query` in the runtime configuration).
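
The three query limits above behave like running totals that are checked as data is fetched. A minimal sketch, assuming a zero limit means "unlimited" and that exceeding any limit aborts the query; the `queryLimiter` type is hypothetical, and its fields only mirror `-querier.max-fetched-series-per-query`, `-querier.max-fetched-chunks-per-query`, and `-querier.max-fetched-chunk-bytes-per-query`.

```go
package main

import "fmt"

// queryLimiter is a hypothetical per-query accumulator; a zero limit disables that check.
type queryLimiter struct {
	maxSeries, maxChunks, maxChunkBytes int
	series, chunks, chunkBytes          int
}

// addSeries records n newly fetched series and returns the violated error ID, if any.
func (q *queryLimiter) addSeries(n int) string {
	q.series += n
	if q.maxSeries > 0 && q.series > q.maxSeries {
		return "tenant-max-series-per-query"
	}
	return ""
}

// addChunks records fetched chunks and their total size in bytes.
func (q *queryLimiter) addChunks(count, bytes int) string {
	q.chunks += count
	q.chunkBytes += bytes
	if q.maxChunks > 0 && q.chunks > q.maxChunks {
		return "tenant-max-chunks-per-query"
	}
	if q.maxChunkBytes > 0 && q.chunkBytes > q.maxChunkBytes {
		return "tenant-max-chunks-bytes-per-query"
	}
	return ""
}

func main() {
	q := &queryLimiter{maxSeries: 100, maxChunks: 10, maxChunkBytes: 4096}
	fmt.Printf("%q\n", q.addSeries(50))      // ""
	fmt.Printf("%q\n", q.addChunks(5, 2048)) // ""
	fmt.Printf("%q\n", q.addChunks(3, 3000)) // "tenant-max-chunks-bytes-per-query"
}
```

Narrowing the time range or adding label matchers, as suggested above, reduces the totals that such counters accumulate.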

### err-mimir-max-query-length
### err-mimir-tenant-max-query-length

This error occurs when the time range of a query exceeds the configured maximum length.

@@ -1372,7 +1372,7 @@ How to **fix** it:

- Increase the per-tenant limit by using the `-distributor.ingestion-rate-limit` (samples per second) and `-distributor.ingestion-burst-size` (number of samples) options (or `ingestion_rate` and `ingestion_burst_size` in the runtime configuration). The configurable burst represents how many samples, exemplars and metadata can temporarily exceed the limit, in case of short traffic peaks. The configured burst size must be greater than or equal to the configured limit.

### err-mimir-too-many-ha-clusters
### err-mimir-tenant-too-many-ha-clusters

This error occurs when a distributor rejects a write request because the number of [high-availability (HA) clusters]({{< relref "../configuring/configuring-high-availability-deduplication.md" >}}) has hit the configured limit for this tenant.
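
A minimal sketch of the per-tenant HA cluster limit, assuming the distributor remembers which cluster names it has seen for each tenant and rejects writes from a new cluster once the limit is reached, while still accepting known clusters. `haClusterLimiter` is hypothetical and omits replica election and any cleanup of clusters that stop sending data.

```go
package main

import "fmt"

// haClusterLimiter is a hypothetical model of the per-tenant limit on HA clusters.
type haClusterLimiter struct {
	maxClusters int
	seen        map[string]map[string]struct{} // tenant -> set of cluster names
}

func newHAClusterLimiter(maxClusters int) *haClusterLimiter {
	return &haClusterLimiter{maxClusters: maxClusters, seen: map[string]map[string]struct{}{}}
}

// accept reports whether a write from the given cluster is allowed. Known
// clusters are always accepted; a new cluster is accepted only while the
// tenant is below the limit.
func (h *haClusterLimiter) accept(tenant, cluster string) bool {
	clusters := h.seen[tenant]
	if _, ok := clusters[cluster]; ok {
		return true
	}
	if len(clusters) >= h.maxClusters {
		return false // corresponds to err-mimir-tenant-too-many-ha-clusters
	}
	if clusters == nil {
		clusters = map[string]struct{}{}
		h.seen[tenant] = clusters
	}
	clusters[cluster] = struct{}{}
	return true
}

func main() {
	h := newHAClusterLimiter(2)
	fmt.Println(h.accept("tenant-1", "prom-eu")) // true
	fmt.Println(h.accept("tenant-1", "prom-us")) // true
	fmt.Println(h.accept("tenant-1", "prom-ap")) // false: third cluster
	fmt.Println(h.accept("tenant-1", "prom-eu")) // true: already known
}
```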

2 changes: 1 addition & 1 deletion pkg/distributor/distributor_test.go
@@ -3502,7 +3502,7 @@ func TestDistributorValidation(t *testing.T) {
Value: 4,
}},
expectedStatusCode: http.StatusBadRequest,
expectedErr: fmt.Sprintf(`received a sample whose timestamp is too far in the future, timestamp: %d series: 'testmetric' (err-mimir-too-far-in-future)`, future),
expectedErr: fmt.Sprintf(`received a sample whose timestamp is too far in the future, timestamp: %d series: 'testmetric' (err-mimir-tenant-too-far-in-future)`, future),
},

// Test maximum labels names per series.
34 changes: 18 additions & 16 deletions pkg/util/globalerror/errors.go
@@ -16,20 +16,21 @@ const (

MissingMetricName ID = "missing-metric-name"
InvalidMetricName ID = "metric-name-invalid"
MaxLabelNamesPerSeries ID = "max-label-names-per-series"
SeriesInvalidLabel ID = "label-invalid"
SeriesLabelNameTooLong ID = "label-name-too-long"
SeriesLabelValueTooLong ID = "label-value-too-long"
SeriesWithDuplicateLabelNames ID = "duplicate-label-names"
SeriesLabelsNotSorted ID = "labels-not-sorted"
SampleTooFarInFuture ID = "too-far-in-future"
MaxSeriesPerMetric ID = "max-series-per-metric"
MaxMetadataPerMetric ID = "max-metadata-per-metric"
MaxSeriesPerUser ID = "max-series-per-user"
MaxMetadataPerUser ID = "max-metadata-per-user"
MaxChunksPerQuery ID = "max-chunks-per-query"
MaxSeriesPerQuery ID = "max-series-per-query"
MaxChunkBytesPerQuery ID = "max-chunks-bytes-per-query"

MaxLabelNamesPerSeries ID = "tenant-max-label-names-per-series"
SeriesLabelNameTooLong ID = "tenant-label-name-too-long"
SeriesLabelValueTooLong ID = "tenant-label-value-too-long"
SampleTooFarInFuture ID = "tenant-too-far-in-future"
MaxSeriesPerMetric ID = "tenant-max-series-per-metric"
MaxMetadataPerMetric ID = "tenant-max-metadata-per-metric"
MaxSeriesPerUser ID = "tenant-max-series-per-user"
MaxMetadataPerUser ID = "tenant-max-metadata-per-user"
MaxChunksPerQuery ID = "tenant-max-chunks-per-query"
MaxSeriesPerQuery ID = "tenant-max-series-per-query"
MaxChunkBytesPerQuery ID = "tenant-max-chunks-bytes-per-query"

DistributorMaxIngestionRate ID = "distributor-max-ingestion-rate"
DistributorMaxInflightPushRequests ID = "distributor-max-inflight-push-requests"
@@ -44,14 +45,15 @@ const (
ExemplarTimestampInvalid ID = "exemplar-timestamp-invalid"

MetricMetadataMissingMetricName ID = "metadata-missing-metric-name"
MetricMetadataMetricNameTooLong ID = "metric-name-too-long"
MetricMetadataHelpTooLong ID = "help-too-long"
MetricMetadataUnitTooLong ID = "unit-too-long"

MaxQueryLength ID = "max-query-length"
MetricMetadataMetricNameTooLong ID = "tenant-metric-name-too-long"
MetricMetadataHelpTooLong ID = "tenant-help-too-long"
MetricMetadataUnitTooLong ID = "tenant-unit-too-long"

MaxQueryLength ID = "tenant-max-query-length"
RequestRateLimited ID = "tenant-max-request-rate"
IngestionRateLimited ID = "tenant-max-ingestion-rate"
TooManyHAClusters ID = "too-many-ha-clusters"
TooManyHAClusters ID = "tenant-too-many-ha-clusters"
)

// Message returns the provided msg, appending the error id.
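
Based on the comment above and the expected strings in this commit's tests, an ID is rendered as `err-mimir-<id>` at the end of a message, optionally followed by a hint naming the flag for the per-tenant limit. A sketch that reproduces those strings, assuming an `err-mimir-` prefix constant; `messageWithLimitFlag` is a hypothetical name and signature, not necessarily the helper Mimir defines.

```go
package main

import "fmt"

// ID mirrors the string-typed error IDs declared in the const block above.
type ID string

// errPrefix is inferred from strings such as "(err-mimir-tenant-max-query-length)".
const errPrefix = "err-mimir-"

const MaxQueryLength ID = "tenant-max-query-length"

// Message returns msg with the prefixed error ID appended in parentheses.
func (id ID) Message(msg string) string {
	return fmt.Sprintf("%s (%s%s)", msg, errPrefix, id)
}

// messageWithLimitFlag (hypothetical) also tells the user which flag controls the limit.
func (id ID) messageWithLimitFlag(flag, msg string) string {
	return fmt.Sprintf("%s. You can adjust the related per-tenant limit by configuring -%s, or by contacting your service administrator.", id.Message(msg), flag)
}

func main() {
	msg := "the query time range exceeds the limit (query length: 1h0m0s, limit: 1m0s)"
	fmt.Println(MaxQueryLength.messageWithLimitFlag("store.max-query-length", msg))
}
```

Because the prefix is added at formatting time, renaming an ID in the const block, as this commit does, changes every user-facing error string and runbook anchor that embeds it.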
8 changes: 4 additions & 4 deletions pkg/util/validation/errors_test.go
@@ -18,20 +18,20 @@ func TestNewMetadataMetricNameMissingError(t *testing.T) {

func TestNewMetadataMetricNameTooLongError(t *testing.T) {
err := newMetadataMetricNameTooLongError(&mimirpb.MetricMetadata{MetricFamilyName: "test_metric", Unit: "counter", Help: "This is a test metric."})
assert.Equal(t, "received a metric metadata whose metric name length exceeds the limit, metric name: 'test_metric' (err-mimir-metric-name-too-long). You can adjust the related per-tenant limit by configuring -validation.max-metadata-length, or by contacting your service administrator.", err.Error())
assert.Equal(t, "received a metric metadata whose metric name length exceeds the limit, metric name: 'test_metric' (err-mimir-tenant-metric-name-too-long). You can adjust the related per-tenant limit by configuring -validation.max-metadata-length, or by contacting your service administrator.", err.Error())
}

func TestNewMetadataHelpTooLongError(t *testing.T) {
err := newMetadataHelpTooLongError(&mimirpb.MetricMetadata{MetricFamilyName: "test_metric", Unit: "counter", Help: "This is a test metric."})
assert.Equal(t, "received a metric metadata whose help description length exceeds the limit, help: 'This is a test metric.' metric name: 'test_metric' (err-mimir-help-too-long). You can adjust the related per-tenant limit by configuring -validation.max-metadata-length, or by contacting your service administrator.", err.Error())
assert.Equal(t, "received a metric metadata whose help description length exceeds the limit, help: 'This is a test metric.' metric name: 'test_metric' (err-mimir-tenant-help-too-long). You can adjust the related per-tenant limit by configuring -validation.max-metadata-length, or by contacting your service administrator.", err.Error())
}

func TestNewMetadataUnitTooLongError(t *testing.T) {
err := newMetadataUnitTooLongError(&mimirpb.MetricMetadata{MetricFamilyName: "test_metric", Unit: "counter", Help: "This is a test metric."})
assert.Equal(t, "received a metric metadata whose unit name length exceeds the limit, unit: 'counter' metric name: 'test_metric' (err-mimir-unit-too-long). You can adjust the related per-tenant limit by configuring -validation.max-metadata-length, or by contacting your service administrator.", err.Error())
assert.Equal(t, "received a metric metadata whose unit name length exceeds the limit, unit: 'counter' metric name: 'test_metric' (err-mimir-tenant-unit-too-long). You can adjust the related per-tenant limit by configuring -validation.max-metadata-length, or by contacting your service administrator.", err.Error())
}

func TestNewMaxQueryLengthError(t *testing.T) {
err := NewMaxQueryLengthError(time.Hour, time.Minute)
assert.Equal(t, "the query time range exceeds the limit (query length: 1h0m0s, limit: 1m0s) (err-mimir-max-query-length). You can adjust the related per-tenant limit by configuring -store.max-query-length, or by contacting your service administrator.", err.Error())
assert.Equal(t, "the query time range exceeds the limit (query length: 1h0m0s, limit: 1m0s) (err-mimir-tenant-max-query-length). You can adjust the related per-tenant limit by configuring -store.max-query-length, or by contacting your service administrator.", err.Error())
}
12 changes: 6 additions & 6 deletions pkg/util/validation/validate_test.go
@@ -127,9 +127,9 @@ func TestValidateLabels(t *testing.T) {
# HELP cortex_discarded_samples_total The total number of samples that were discarded.
# TYPE cortex_discarded_samples_total counter
cortex_discarded_samples_total{reason="label_invalid",user="testUser"} 1
cortex_discarded_samples_total{reason="label_name_too_long",user="testUser"} 1
cortex_discarded_samples_total{reason="label_value_too_long",user="testUser"} 1
cortex_discarded_samples_total{reason="max_label_names_per_series",user="testUser"} 1
cortex_discarded_samples_total{reason="tenant_label_name_too_long",user="testUser"} 1
cortex_discarded_samples_total{reason="tenant_label_value_too_long",user="testUser"} 1
cortex_discarded_samples_total{reason="tenant_max_label_names_per_series",user="testUser"} 1
cortex_discarded_samples_total{reason="metric_name_invalid",user="testUser"} 1
cortex_discarded_samples_total{reason="missing_metric_name",user="testUser"} 1
@@ -265,10 +265,10 @@ func TestValidateMetadata(t *testing.T) {
require.NoError(t, testutil.GatherAndCompare(prometheus.DefaultGatherer, strings.NewReader(`
# HELP cortex_discarded_metadata_total The total number of metadata that were discarded.
# TYPE cortex_discarded_metadata_total counter
cortex_discarded_metadata_total{reason="help_too_long",user="testUser"} 1
cortex_discarded_metadata_total{reason="metric_name_too_long",user="testUser"} 1
cortex_discarded_metadata_total{reason="tenant_help_too_long",user="testUser"} 1
cortex_discarded_metadata_total{reason="tenant_metric_name_too_long",user="testUser"} 1
cortex_discarded_metadata_total{reason="missing_metric_name",user="testUser"} 1
cortex_discarded_metadata_total{reason="unit_too_long",user="testUser"} 1
cortex_discarded_metadata_total{reason="tenant_unit_too_long",user="testUser"} 1
cortex_discarded_metadata_total{reason="random reason",user="different user"} 1
`), "cortex_discarded_metadata_total"))
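
The expected metrics above suggest that the `reason` label is the error ID with hyphens replaced by underscores, for example `tenant-label-name-too-long` becoming `tenant_label_name_too_long`. Assuming that mapping, it can be sketched as below; `labelValue` is a hypothetical method name.

```go
package main

import (
	"fmt"
	"strings"
)

// ID mirrors the string-typed error IDs from pkg/util/globalerror.
type ID string

// labelValue converts an ID into a label value by replacing hyphens with
// underscores (an assumed mapping, inferred from the expected metrics).
func (id ID) labelValue() string {
	return strings.ReplaceAll(string(id), "-", "_")
}

func main() {
	for _, id := range []ID{"tenant-label-name-too-long", "label-invalid", "tenant-help-too-long"} {
		fmt.Printf("%s -> %s\n", id, id.labelValue())
	}
}
```

Under this assumption, dashboards or alerts that filter `cortex_discarded_samples_total` or `cortex_discarded_metadata_total` on the old `reason` values would need to switch to the new `tenant_`-prefixed values.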
