New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prometheus Request Queue Diagnostics #889
Conversation
changed "Reservied" to "Reserved"
Merge develop into master
…string. Refactor specific utilities into respecitive _util package.
Cluster: cluster, | ||
Namespace: namespace, | ||
}, | ||
Pod: pod, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optimization here which seemed to have a fairly significant reduction in allocation upon very loose observation via pprof. The benefits from non-string keys are still very hard to pin down, as the keys are escaped at some point and count towards heap allocations. I did not expect embedding namespaceKey to have that dramatic of an effect, but it appears to have helped for appendLabels()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, I never would have thought. 😲
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll be honest, I'm not completely sold, but I did test this in isolation and there were noticeable changes. I still need to use some better benchmarks for measurement.
…his will allow better visibility on frontend queries made through our product.
w.Write(body) | ||
} | ||
|
||
func (a *Accesses) PrometheusQueryRange(w http.ResponseWriter, r *http.Request, _ httprouter.Params) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some reason, the expected payload changes between QueryRange and Query on the frontend. For now, I'm sticking to "ease of migration" versus attempting to refactor all the queries on the frontend 😄
@@ -157,46 +166,68 @@ func runQuery(query string, ctx *Context, resCh QueryResultsChan, profileLabel s | |||
resCh <- results | |||
} | |||
|
|||
func (ctx *Context) query(query string) (interface{}, prometheus.Warnings, error) { | |||
// RawQuery is a direct query to the prometheus client and returns the body of the response | |||
func (ctx *Context) RawQuery(query string) ([]byte, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RawQuery
and RawQueryRange
were required for proxying queries from the frontend while also piping the queries through our request queue.
|
||
// Note that the warnings return value from client.Do() is always nil using this | ||
// version of the prometheus client library. We parse the warnings out of the response | ||
// body after json decodidng completes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hopefully this comment sheds light on the current scenario. The prometheus Warnings are actually part of the response payload. Since the http request will never return warnings, we needed to implement our own warning parsing, which happens after the unmarshalling.
Merge Master into develop
Create PULL_REQUEST_TEMPLATE.md
…h a TryDequeue() method for non-blocking dequeue and Clear() for resetting the queue contents.
…g-quote Add missing quote to CloudStatus object json rule
Update version
fix panic on nil storageclasses
optimize query, fix for no cadvisor relabels
Add helm config parameter for shared overhead costs
…string. Refactor specific utilities into respecitive _util package.
…his will allow better visibility on frontend queries made through our product.
…h a TryDequeue() method for non-blocking dequeue and Clear() for resetting the queue contents.
…l into bolt/prom-diagnostics
The idea behind these changes is gaining insight into the state of the outgoing prometheus/thanos requests. This diagnostic tool will assist us in determining future optimization strategies.
Goals