
Prometheus Request Queue Diagnostics #889

Merged
merged 105 commits into develop from bolt/prom-diagnostics on Aug 30, 2021
Conversation

@mbolt35 (Collaborator) commented Aug 5, 2021

The idea behind these changes is to gain insight into the state of outgoing Prometheus/Thanos requests. This diagnostic tooling will help us determine future optimization strategies.

Goals

  • Add visualization/diagnostics for the queue of Prometheus/Thanos queries so that, at any given point, we can check (a rough sketch of such a diagnostic entry follows this list):
    • How many requests are queued?
    • Which requests are queued?
    • How long has each request been queued?
    • What issued the query (context naming, "categories")?
  • To ensure we correctly monitor every query we issue, move the Prometheus/Thanos query endpoints from /api/ to the cost-model so that frontend queries can be queued alongside the backend queries.
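To make the queue-state goals concrete, here is a minimal, hypothetical sketch of what a queued-request diagnostic entry could look like; the field and type names are illustrative assumptions, not the structures added in this PR.

```go
package diag

import "time"

// QueuedRequest is a hypothetical diagnostic view of one outgoing
// Prometheus/Thanos query sitting in the request queue.
type QueuedRequest struct {
	Query    string    // the PromQL string being issued
	QueuedAt time.Time // when the request entered the queue
	Category string    // context name describing what issued the query
}

// Age reports how long the request has been waiting in the queue.
func (q QueuedRequest) Age() time.Duration {
	return time.Since(q.QueuedAt)
}
```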

@mbolt35 mbolt35 self-assigned this Aug 5, 2021
@mbolt35 mbolt35 added the enhancement New feature or request label Aug 5, 2021
Cluster: cluster,
Namespace: namespace,
},
Pod: pod,
}
mbolt35 (Collaborator, Author) commented:

There is an optimization here that showed a fairly significant reduction in allocations under some loose observation via pprof. The benefits of non-string keys are still hard to pin down, since the keys escape at some point and count toward heap allocations. I did not expect embedding namespaceKey to have that dramatic an effect, but it appears to have helped appendLabels().
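For context, a minimal sketch of the embedded-key pattern being discussed; the type and field names are assumptions inferred from the diff fragment above, not necessarily the exact definitions in the cost-model.

```go
// Hypothetical key types illustrating the embedding discussed above.
type namespaceKey struct {
	Cluster   string
	Namespace string
}

type podKey struct {
	namespaceKey // embedded, so a pod key reuses the cluster/namespace fields
	Pod string
}

// Building a pod key mirrors the fragment shown in the diff.
func newPodKey(cluster, namespace, pod string) podKey {
	return podKey{
		namespaceKey: namespaceKey{
			Cluster:   cluster,
			Namespace: namespace,
		},
		Pod: pod,
	}
}
```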

Collaborator commented:

Wow, I never would have thought. 😲

mbolt35 (Collaborator, Author) replied:

I'll be honest, I'm not completely sold, but I did test this in isolation and there were noticeable changes. I still need to use some better benchmarks for measurement.
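As an aside, the kind of measurement being described could look roughly like the following: a hypothetical Go benchmark comparing concatenated string map keys against comparable struct map keys. This is an illustrative sketch, not the benchmark used for this PR.

```go
package keys

import (
	"strconv"
	"testing"
)

type podKey struct {
	Cluster, Namespace, Pod string
}

var pods = func() []string {
	p := make([]string, 1000)
	for i := range p {
		p[i] = "pod-" + strconv.Itoa(i)
	}
	return p
}()

// BenchmarkStringKeys builds composite map keys by concatenating strings,
// allocating a new key string on every insertion.
func BenchmarkStringKeys(b *testing.B) {
	m := make(map[string]int)
	for i := 0; i < b.N; i++ {
		m["c1/ns1/"+pods[i%len(pods)]]++
	}
}

// BenchmarkStructKeys uses a comparable struct as the map key,
// avoiding the composite-string allocation.
func BenchmarkStructKeys(b *testing.B) {
	m := make(map[podKey]int)
	for i := 0; i < b.N; i++ {
		m[podKey{"c1", "ns1", pods[i%len(pods)]}]++
	}
}
```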

…his will allow better visibility on frontend queries made through our product.
w.Write(body)
}

func (a *Accesses) PrometheusQueryRange(w http.ResponseWriter, r *http.Request, _ httprouter.Params) {
mbolt35 (Collaborator, Author) commented:

For some reason, the expected payload differs between QueryRange and Query on the frontend. For now, I'm prioritizing "ease of migration" over attempting to refactor all of the frontend queries 😄

@@ -157,46 +166,68 @@ func runQuery(query string, ctx *Context, resCh QueryResultsChan, profileLabel s
resCh <- results
}

func (ctx *Context) query(query string) (interface{}, prometheus.Warnings, error) {
// RawQuery is a direct query to the prometheus client and returns the body of the response
func (ctx *Context) RawQuery(query string) ([]byte, error) {
mbolt35 (Collaborator, Author) commented:

RawQuery and RawQueryRange were required for proxying queries from the frontend while also piping the queries through our request queue.
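A rough sketch of how such a proxy endpoint could be wired up, assuming a RawQuery-style call that returns the raw response body after passing through the request queue; the handler and parameter names here are illustrative, not the cost-model's actual handlers.

```go
package main

import "net/http"

// prometheusQueryHandler is a hypothetical proxy handler: it pulls the query
// string from the request, runs it through rawQuery (assumed to go through
// the request queue, like Context.RawQuery in the diff), and writes the raw
// Prometheus response body back unchanged.
func prometheusQueryHandler(rawQuery func(string) ([]byte, error)) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		qry := r.URL.Query().Get("query")
		if qry == "" {
			http.Error(w, "missing query parameter", http.StatusBadRequest)
			return
		}

		body, err := rawQuery(qry)
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}

		w.Header().Set("Content-Type", "application/json")
		w.Write(body)
	}
}
```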


// Note that the warnings return value from client.Do() is always nil using this
// version of the prometheus client library. We parse the warnings out of the response
// body after JSON decoding completes.
mbolt35 (Collaborator, Author) commented:

Hopefully this comment sheds light on the current scenario. The Prometheus warnings are actually part of the response payload. Since the HTTP request will never return warnings directly, we needed to implement our own warning parsing, which happens after unmarshalling.
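For illustration, a minimal sketch of pulling warnings out of the Prometheus response envelope after decoding; the type below mirrors the public Prometheus HTTP API response shape and is an assumption, not the exact cost-model implementation.

```go
package promclient

import "encoding/json"

// promResponse mirrors the Prometheus HTTP API envelope, which carries
// warnings in the JSON body rather than via the client library.
type promResponse struct {
	Status   string          `json:"status"`
	Data     json.RawMessage `json:"data"`
	Warnings []string        `json:"warnings,omitempty"`
	Error    string          `json:"error,omitempty"`
}

// parseWarnings decodes the raw response body and returns the untouched data
// payload alongside any warnings reported by Prometheus.
func parseWarnings(body []byte) (json.RawMessage, []string, error) {
	var resp promResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, nil, err
	}
	return resp.Data, resp.Warnings, nil
}
```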

AjayTripathy and others added 28 commits August 20, 2021 15:18
Merge Master into develop
Create PULL_REQUEST_TEMPLATE.md
…h a TryDequeue() method for non-blocking dequeue and Clear() for resetting the queue contents.
…g-quote

Add missing quote to CloudStatus object json rule
optimize query, fix for no cadvisor relabels
Add helm config parameter for shared overhead costs
…string. Refactor specific utilities into respective _util package.
…his will allow better visibility on frontend queries made through our product.
…h a TryDequeue() method for non-blocking dequeue and Clear() for resetting the queue contents.
@mbolt35 mbolt35 merged commit a65c111 into develop Aug 30, 2021
@mbolt35 mbolt35 deleted the bolt/prom-diagnostics branch August 30, 2021 17:57