
Limit number of data points returned in a query #4414

Closed
tomwilkie opened this Issue Jul 25, 2018 · 24 comments

tomwilkie commented Jul 25, 2018

We currently limit the number of steps to 11k, but if you query a very high cardinality metric we seem to be able to OOM even a very large Prometheus. I propose we limit the number of points we return (step * timeseries).

Plumbing this through might be a little tricky though; I thought @gouthamve had some ideas?
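
For illustration, a minimal Go sketch of the kind of "step * timeseries" budget being proposed here; `maxQueryPoints`, `checkPointBudget`, and the chosen limit are hypothetical, not anything in the Prometheus codebase.

```go
// A minimal sketch of the proposed "step * timeseries" budget; the names
// and the limit value are hypothetical, not actual Prometheus code.
package main

import (
	"fmt"
	"time"
)

const maxQueryPoints = 11000 * 100 // hypothetical budget: steps x series

// checkPointBudget rejects a range query whose total number of returned
// points (steps x matched series) would exceed the budget.
func checkPointBudget(start, end time.Time, step time.Duration, numSeries int) error {
	steps := int(end.Sub(start)/step) + 1
	if points := steps * numSeries; points > maxQueryPoints {
		return fmt.Errorf("query would return %d points, exceeding the limit of %d", points, maxQueryPoints)
	}
	return nil
}

func main() {
	end := time.Now()
	start := end.Add(-24 * time.Hour)
	// ~5761 steps x 1000 series is well over the hypothetical budget.
	fmt.Println(checkPointBudget(start, end, 15*time.Second, 1000))
}
```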

brian-brazil commented Jul 25, 2018

The plan is to limit the number of points that a PromQL node can return. Limiting the ultimate output could be too late.

tomwilkie commented Jul 25, 2018

That sounds reasonable. Is this discussed anywhere?

brian-brazil commented Jul 25, 2018

It came up on the developers list a while back.

tomwilkie commented Jul 25, 2018

Found it: https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/prometheus-developers/bwcBD-JBXck/BiFO_eKbCgAJ

I'm planning on adding a returned sample limit for each node in a promql evaluation, which we can do with the new promql design and would be good enough for most purposes.

When you said "the plan" you meant "my plan", right? :-)

Is this something you have started working on or plan on working on soon? If not I'll take a stab.

brian-brazil commented Jul 25, 2018

I've not started working on it yet, there's also #4384 which is taking a more complicated approach.

tomwilkie commented Jul 25, 2018

Roger, thanks for the pointer.

Do you have any thoughts on reasonable limits? IIRC we consider 100k to be a reasonable cardinality limit for a metric, and therefore each aggregation should be able to consider 100k samples for a given step. I guess a limit would also be appropriate on the output of aggregations (specifically, less than 100k - maybe 1k?) and on the number of samples in a vector selector (maybe a relatively small limit - 6k would be 1 day at 15s).

Even after we do all this, it still sounds like queries that return lots of time series over a long range could be a problem, even if they are very simple - I'm thinking of something as basic as up over 1k targets and 11k data points.

Hence why I think we need to limit step * #timeseries. WDYT?
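
Written out as hypothetical constants, a sketch of the values floated in this comment; none of these names or numbers exist as defaults in Prometheus.

```go
package main

import "fmt"

const (
	maxAggregationInputSamples = 100000 // per step: the "reasonable" cardinality limit for one metric
	maxAggregationOutputSeries = 1000   // per step: aggregations should reduce cardinality well below 100k
	maxVectorSelectorSamples   = 6000   // roughly one day of one series' samples at a 15s interval
)

func main() {
	fmt.Println(maxAggregationInputSamples, maxAggregationOutputSeries, maxVectorSelectorSamples)
}
```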

brian-brazil commented Jul 25, 2018

My main thought was that sum by job({__name__=~".+"}) needs to keep working in the vast majority of setups, as we don't currently have anything else to help debug cardinality issues. That'd imply at least 20M which works out as ~1GB in the worst case (3 nodes worth of data can be in memory at once for a binary operator, at 16B per sample).

Hence why I think we need to limit step * #timeseries.

That's why I'm proposing limiting output samples of a node, as it's more accurate than that.

tomwilkie commented Jul 25, 2018

Basically limits:

  1. around case *VectorSelector
  2. in matrixSelector
  3. in vectorSelector

(maybe 1. and 3. could be unified?)
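
For illustration, a rough sketch of how a running sample count might be threaded through those code points; the types and names below are hypothetical, not the real promql engine code.

```go
package main

import (
	"errors"
	"fmt"
)

var errTooManySamples = errors.New("query processing would load too many samples into memory")

// evaluator is a stand-in for the promql evaluator; it carries a sample
// budget and a running count across the evaluation of one query.
type evaluator struct {
	maxSamples     int
	currentSamples int
}

// addSamples would be called from each of the code points listed above
// (case *VectorSelector, matrixSelector, vectorSelector, plus range vector
// function output and rangeEval) whenever new points are materialised.
func (ev *evaluator) addSamples(n int) error {
	ev.currentSamples += n
	if ev.currentSamples > ev.maxSamples {
		return errTooManySamples
	}
	return nil
}

func main() {
	ev := &evaluator{maxSamples: 1000}
	fmt.Println(ev.addSamples(600)) // <nil>
	fmt.Println(ev.addSamples(600)) // over budget: returns the error
}
```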

brian-brazil commented Jul 25, 2018

We need range vector function output plus rangeEval as well; those functions only cover raw data dumps.

I'm not sure we can unify those functions.

cstyan commented Aug 13, 2018

Picking this up.

Just having a read over the discussion (here, the other PR, mailing list) and trying to get up to speed with the code in /promql.

@brian-brazil are there any docs for the promql internals, similar to what you wrote for discovery or Julius's internal architecture doc?

brian-brazil commented Aug 13, 2018

There are not; Tom's post has the code points you need to touch.

cstyan commented Aug 13, 2018

👍 I'll start there.

cstyan commented Aug 20, 2018

Running some tests to confirm which node types would have to be limited by the number of input samples to expressions/functions, but it may just be NumberLiteral or other types that are already covered by the max number of steps limit.

If we're going to set a default value for the max number of samples and/or make that value configurable, we should provide an explanation of how we arrived at the number and how people should go about calculating what they might want to set it to. Unfortunately my operational experience with Prometheus is still limited.

@brian-brazil would you mind explaining this comment:

My main thought was that sum by job({__name__=~".+"}) needs to keep working in the vast majority of setups, as we don't currently have anything else to help debug cardinality issues. That'd imply at least 20M which works out as ~1GB in the worst case (3 nodes worth of data can be in memory at once for a binary operator, at 16B per sample).

Are you saying 20M samples? I'm missing something in how you're calculating these numbers.

brian-brazil commented Aug 21, 2018

It's 16 bytes per sample, multiplied by 3 giving ~1GB with 20M points.

cstyan commented Aug 21, 2018

Okay so 16 bytes per sample, multiplied by a max of 3 nodes, and again by 20M points is ~1GB. I guess I'm curious if you're saying 20M samples should be the max, or 1GB per query? 20M is in some cases not very many samples.

If, for example, some query could return 100k series, that's only 200 samples per series; at a 15s interval that's only 50 minutes of data.

Does that sound reasonable to you? What would a typical query return in terms of number of series?

I think the numbers we use to decide on a sane default for the max samples per query should be in the docs, for people who may want to configure that value themselves.
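
Working through the arithmetic from the last few comments as a plain calculation, using the 16 bytes/sample, 3 nodes, 20M samples, 100k series, and 15s figures quoted above:

```go
package main

import "fmt"

func main() {
	const (
		bytesPerSample = 16   // 8-byte timestamp + 8-byte value
		nodesInMemory  = 3    // worst case for a binary operator
		totalSamples   = 20e6 // the 20M figure discussed above
		seriesCount    = 100e3
		scrapeInterval = 15.0 // seconds
	)

	fmt.Printf("worst-case memory: ~%.2f GB\n",
		totalSamples*bytesPerSample*nodesInMemory/1e9) // ~0.96 GB, i.e. roughly 1GB
	samplesPerSeries := totalSamples / seriesCount // 200 samples per series
	fmt.Printf("samples per series: %.0f\n", samplesPerSeries)
	fmt.Printf("covered range: %.0f minutes at a 15s interval\n",
		samplesPerSeries*scrapeInterval/60) // 50 minutes
}
```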

brian-brazil commented Aug 21, 2018

When using range vector functions, we don't load in all the data at once - just one window at a time for one series.

cstyan commented Aug 21, 2018

brian-brazil commented Aug 22, 2018

That was the old design. In the new design things like rate() work time series by time series, and within that iterate over the data. So a 5m rate only needs 5m worth of samples from one time series in memory at a time - plus space for the output.

The space taken by the series themselves and their labels is not being taken into consideration here; we're just focusing on samples, as that should be good enough.
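
A hedged sketch of that "one window at a time" shape (the types and names are illustrative, not the real promql engine): for each output step, only the samples of the current series that fall inside the window are held in memory, plus one output point per step.

```go
package main

import "fmt"

type sample struct {
	t int64   // timestamp in milliseconds
	v float64 // value
}

// evalRangeFunc walks one series at a time; for each output step it keeps
// only the samples inside (ts-window, ts] in memory and feeds them to fn.
func evalRangeFunc(series [][]sample, start, end, step, window int64,
	fn func(in []sample) float64) [][]float64 {

	out := make([][]float64, len(series))
	for i, s := range series {
		for ts := start; ts <= end; ts += step {
			var buf []sample // the only samples held for this step
			for _, p := range s {
				if p.t > ts-window && p.t <= ts {
					buf = append(buf, p)
				}
			}
			out[i] = append(out[i], fn(buf))
		}
	}
	return out
}

func main() {
	series := [][]sample{{{t: 0, v: 1}, {t: 15000, v: 2}, {t: 30000, v: 4}}}
	sum := func(in []sample) float64 {
		var total float64
		for _, p := range in {
			total += p.v
		}
		return total
	}
	// 30s windows evaluated every 15s over one series.
	fmt.Println(evalRangeFunc(series, 0, 30000, 15000, 30000, sum)) // [[1 3 6]]
}
```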

cstyan commented Aug 22, 2018

So in the case of a range vector, were you referring to a single timestamp step when you said 'one window'?

brian-brazil commented Aug 22, 2018

Yes.

cstyan commented Aug 22, 2018

Okay, that makes sense, thanks. So then I guess I'm just back to the question of what seems reasonable for a default maxSamples value. You're saying the default should be 20M, which results in ~1GB of memory?

brian-brazil commented Aug 22, 2018

I'm saying that we should have a default setting that allows sum by job({__name__=~".+"}) to work on a Prometheus with 20M active series, whatever that setting is.

cstyan commented Aug 22, 2018

Okay, that makes sense. I'd like to have a test for this in my PR. I'll have a look around, but I'm not really seeing any examples of queries for __name__=~".+" in engine_test or parse_test.

gouthamve commented Oct 18, 2018

Closed in #4532

gouthamve closed this Oct 18, 2018

lock bot locked and limited conversation to collaborators Apr 16, 2019
