Skip to content
This repository has been archived by the owner on Aug 23, 2023. It is now read-only.

use pointslicepool for /getdata cluster-fan out requests #1921

Merged
merged 3 commits into from Oct 21, 2020

Conversation

Dieterbe
Copy link
Contributor

@Dieterbe Dieterbe commented Oct 14, 2020

Note: this builds on top of #1923, review/merge that first.

when handling a user query, fanning out said query, and reading back the /getdata responses from peers, this will use the pointslicepool to decode incoming point slices into and should reduce memory usage. see #962

for review purposes, the generated code is identical except for the calls to pointslicepool.GetMin instead of make

@Dieterbe
Copy link
Contributor Author

NOTE: i'm still working on some tests to confirm improvements

@Dieterbe
Copy link
Contributor Author

Dieterbe commented Oct 15, 2020

i spun up docker-cluster-query with 4 query pods.
all mt read/write processes and q0 (query node 0) run master
q1 runs master + stats (#1922)
q2 runs master + stats + nudge fix (#1923)
q3 runs master + stats + nudge fix + /getdata pointslicepool (this branch)

I ran mt-fakemetrics backfill --kafka-mdm-addr localhost:9092 --offset 48h --period 10s --speedup 100 --mpo 1000 and then

cat ./even-load.sh
#!/bin/bash

q0() {
echo 'GET http://localhost:6060/render?target=some.id.of.a.metric.123*&from=-48h' | vegeta attack -rate 25 | vegeta report
}
q1() {
echo 'GET http://localhost:6061/render?target=some.id.of.a.metric.123*&from=-48h' | vegeta attack -rate 25 | vegeta report
}
q2() {
echo 'GET http://localhost:6062/render?target=some.id.of.a.metric.123*&from=-48h' | vegeta attack -rate 25 | vegeta report
}
q3() {
echo 'GET http://localhost:6063/render?target=some.id.of.a.metric.123*&from=-48h' | vegeta attack -rate 25 | vegeta report
}

q0 &
q1 &
q2 &
q3

I ran it for more than an hour.
clearly q3 wins in terms of memory usage. and q2 also does better than q1 and q0.
note the candidate hit stats of the pointslicepool lower left, showing the effect of these PR's in action.
comparison

dashboard is here

@Dieterbe
Copy link
Contributor Author

here are the 3 different PSP behaviors
first one suffers from the nudging bug, which the 2nd one fixes. but we only do half the gets on the pool as puts because upon decoding we always allocate fresh.
only for q3 do we pull the same rate in and out of the pool. with the occassional miss which i presume is due to GC
psp

Copy link
Contributor

@robert-milan robert-milan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to manually maintain Series msgp encoding / decoding etc... ?

@Dieterbe
Copy link
Contributor Author

because we use custom code to use the slice pool. perhaps it can be done cleaner with the msgp extension system but i didn't want to spend too much time on it.

Copy link
Contributor

@robert-milan robert-milan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@robert-milan robert-milan merged commit 8f037dd into master Oct 21, 2020
@robert-milan robert-milan deleted the getdata-use-pointslicepool branch October 21, 2020 13:08
@Dieterbe Dieterbe mentioned this pull request Oct 26, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants