-
Notifications
You must be signed in to change notification settings - Fork 106
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. pinging @robert-milan / @colega / @jesusvazquez as they have an interest in the expr library now too.
@@ -57,6 +58,7 @@ func ConfigSetup() { | |||
apiCfg.BoolVar(&optimizations.PreNormalization, "pre-normalization", true, "enable pre-normalization optimization") | |||
apiCfg.BoolVar(&optimizations.MDP, "mdp-optimization", false, "enable MaxDataPoints optimization (experimental)") | |||
apiCfg.BoolVar(&middleware.LogHeaders, "log-headers", false, "output query headers in logs") | |||
apiCfg.UintVar(&minSliceSize, "min-slice-pool-size", 2000, "Minimum (and default) length of slice to allocate from pool") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add this to metrictank-sample.ini and then run scripts/dev/sync-configs.sh
api/init.go
Outdated
@@ -9,7 +9,7 @@ import ( | |||
var pointSlicePool *pointslicepool.PointSlicePool | |||
|
|||
func init() { | |||
pointSlicePool = pointslicepool.New(pointslicepool.DefaultPointSliceSize) | |||
pointSlicePool = pointslicepool.New(int(minSliceSize)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this means we can now delete DefaultPointSliceSize
from the code
pointslicepool/pointslicepool.go
Outdated
p.putLarge.Inc() | ||
} else { | ||
} else if cap(s) < p.defaultSize { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we no longer track puts and get-makes if they equal defaultSize.
I can see how that helps with troubleshooting the kind of efficiencies you're looking into. OTOH only having metrics for a subset of the puts and get-makes is a bit weird, especially because the sum no longer relates to the get-candidate metrics. should we introduce get-make-default and put-default (or get-make-min and put-min) to track these? or are all these metrics getting out of hand (seems we rarely use them. but then again, that goes for most of the metrics i guess)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I debated this back and forth with myself. I don't want to over-instrument the code and these metrics aren't generally interesting except when trying to determine the efficacy of the slice pool in reducing allocations (which is exactly what I was just doing).
Ultimately, it would be nice to know:
- Allocations/bytes saved using the pool
- Allocation/bytes required through the pool (and maybe additional bytes due to default size)
- Reason the allocation is required (miss or unfit)
I'm not sure I find the "put" metrics that interesting, except maybe in determining that we are "losing" slices that should be returned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I lean towards adding in get-make-min and put-min, to make the set of metrics coherent.
if we decide the metrics are not (or no longer) useful, we can take them out all at once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
just want to keep this open for a while to give @robert-milan / @colega / @jesusvazquez a chance to check this out
Thanks for the heads up @Dieterbe, reviewed and makes sense to me. Offtopic: we're importing this in a codebase that is instrumented with Prometheus so I'd like to make the instrumentation configurable in a separate PR (should be straightforward since both Prometheus and Graphite counters are |
These changes are to try to improve the SlicePool hit rate. Our query nodes have a very disappointing hit rate of ~80%.
I believe the reason is that most queries handle raw series that aren't 2000 datapoints long, yet will be added to the pool. Then most functions will be asking for 2000 (because they don't specify what they need). Because unfit slices are added back to the pool, we could (unverified) also be repeatedly grabbing the same low capacity slice over and over again.
So these changes:
GetMin
everywhere we have a good idea of how many points we needlarge
with default, making it difficult to know how often the default size was insufficiently large.