
Conversation

@pracucci
Collaborator

@pracucci pracucci commented Jul 5, 2021

What this PR does:
In this PR I've tried to improve the CortexRequestLatency playbook, both updating it (e.g. query-scheduler, store-gateway, ...) and expanding on the investigation procedure.

Which issue(s) this PR fixes:
N/A

Checklist

  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Signed-off-by: Marco Pracucci <marco@pracucci.com>
@pracucci pracucci requested a review from a team as a code owner July 5, 2021 08:58
Contributor

@gouthamve gouthamve left a comment


LGTM with some suggestions. Overall quite good!


#### Read Latency
Query performance is a known problem. When you get this alert, you need to work out whether: (a) it is an operational / configuration issue, (b) it is an inherent limitation of the algorithms, or (c) it is a bug.
The alert message includes both the Cortex service and the route experiencing the high latency. Based on that, establish whether the alert is about the read path or the write path.
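
For example, a per-route quantile query is a quick way to see which routes on a given service are driving the latency. This is only a sketch: it assumes the standard `cortex_request_duration_seconds` histogram is exposed, and the `job` matcher is a placeholder to adapt to your deployment's naming.

```promql
# p99 latency broken down by route for a given Cortex service.
# Assumption: the service exposes the standard cortex_request_duration_seconds
# histogram; replace the job matcher with your deployment's naming.
histogram_quantile(0.99, sum by (route, le) (
  rate(cortex_request_duration_seconds_bucket{job=~".*query-frontend"}[5m])
))
```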
Contributor


Do we want to specify the paths that are read and those that are write?

Collaborator Author


Good idea 👍

- **`distributor`**
- Typically, distributor p99 latency is in the range 50-100ms. If the distributor latency is higher than this, you may need to scale up the distributors.
- **`ingester`**
- Typically, ingester p99 latency is in the range 5-50ms. If the ingester latency is higher than this, you should investigate the root cause before scaling up ingesters.
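
As a concrete check, the same request-duration histogram can be used to compare the measured p99 against the reference ranges above. A sketch only; the `job` matcher assumes a `<namespace>/<service>` naming scheme and should be adapted to your deployment.

```promql
# p99 latency per service and route for distributors and ingesters, to compare
# against the 50-100ms (distributor) and 5-50ms (ingester) reference ranges above.
# Assumption: job labels follow a "<namespace>/<service>" naming scheme.
histogram_quantile(0.99, sum by (job, route, le) (
  rate(cortex_request_duration_seconds_bucket{job=~".*/(distributor|ingester)"}[5m])
))
```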
Contributor


Do we want to add the scaling dashboard to check if the ingesters are running more than 2M series per pod?

Collaborator Author


Yes, good idea. However, since this is already covered by another playbook (for the alert that should fire when series / ingester > 1.6M), I've mentioned that alert here too.
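
For reference, a hedged way to check the series-per-ingester figure discussed above, assuming the standard `cortex_ingester_memory_series` gauge is scraped; the grouping label depends on your scrape config.

```promql
# In-memory series per ingester; the alert referenced above is expected to fire
# when this exceeds roughly 1.6M series per ingester.
# Assumption: each ingester is identified by the "pod" label in your scrape config.
max by (pod) (cortex_ingester_memory_series)
```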

- High CPU utilization in ingesters
- Scale up ingesters
- Low cache hit ratio in the store-gateways
- If the memcached eviction rate is high, you should scale up the memcached replicas. Check the recommendations on the `Cortex / Scaling` dashboard and make reasonable adjustments as necessary.
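
A sketch of queries to check both signals, assuming memcached_exporter metrics are scraped and the store-gateway exposes the Thanos index cache metrics; treat the exact metric names as assumptions for your setup.

```promql
# Memcached eviction rate: a sustained non-zero rate suggests the cache is too small.
sum by (instance) (rate(memcached_items_evicted_total[5m]))

# Store-gateway index cache hit ratio (assumed Thanos index cache metric names).
sum(rate(thanos_store_index_cache_hits_total[5m]))
  /
sum(rate(thanos_store_index_cache_requests_total[5m]))
```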
Contributor


Suggest the memcached dashboard here?

Collaborator Author


Definitely, done

Signed-off-by: Marco Pracucci <marco@pracucci.com>
@pracucci
Collaborator Author

pracucci commented Jul 5, 2021

Thanks a lot @gouthamve for your thoughtful review!

@pracucci
Collaborator Author

pracucci commented Jul 5, 2021

I'm going to merge it, but if you have any further comments I will promptly address them 🙏

@pracucci pracucci merged commit 27078c6 into main Jul 5, 2021
@pracucci pracucci deleted the improve-request-failure-and-latency-playbooks branch July 5, 2021 12:02
simonswine pushed a commit to grafana/mimir that referenced this pull request Oct 18, 2021
…quest-failure-and-latency-playbooks

Improved CortexRequestLatency playbook