Distributor: add ingester append timeouts error #10456
Merged
What this PR does / why we need it:
Failing to send samples to ingesters because the request exceeded its timeout is a very clear signal that ingesters are unable to keep up with demand. In an incident today, we saw ingester push latencies increase sharply because an expensive regex query was starving other goroutines of CPU time.
The new `loki_distributor_ingester_append_timeouts_total` metric gives us a high-signal measure that we can use for alerting.

Which issue(s) this PR fixes:
N/A
Special notes for your reviewer:
This replaces the `loki_distributor_ingester_append_failures_total` metric, which was never a strong signal because it also counted samples that failed to append due to user-related errors such as stream limits or entries that were too old.

Checklist
- `CONTRIBUTING.md` guide (required)
- `CHANGELOG.md` updated
- `add-to-release-notes` label
- `docs/sources/setup/upgrade/_index.md`
- `production/helm/loki/Chart.yaml` and update `production/helm/loki/CHANGELOG.md` and `production/helm/loki/README.md`. Example PR