Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PromQL bug with query_range and metrics name change #4562

Closed
roidelapluie opened this issue Aug 29, 2018 · 8 comments

Comments

Projects
None yet
3 participants
@roidelapluie
Copy link
Contributor

commented Aug 29, 2018

Bug Report

What did you do?

  • Upgrade node_exporter
  • run with query_range: sum(rate({__name__=~'node_network_receive_drop_total|node_network_receive_drop',env='prod',instance="nip-beprd51"}[5m]) > 0) by(instance,cluster)

What did you expect to see?

a constant result

What did you see instead? Under which circumstances?

when the query range overlaps the metric name change, I git no result when we switch to the new metric

Environment

  • Prometheus version:

prometheus, version 2.3.2 (branch: HEAD, revision: 71af5e2)
build user: root@5258e0bd9cc1
build date: 20180712-14:02:52
go version: go1.10.3

Raw data

{"status":"success","data":{"resultType":"matrix","result":[{"metric":{"__name__":"node_network_receive_drop","cluster":"default","device":"bond0","env":"prod","instance":"nip-beprd51","job":"node_exporters"},"values":[[1535541130.395,"107315172"],[1535541160.395,"107315217"]]},{"metric":{"__name__":"node_network_receive_drop_total","cluster":"default","device":"bond0","env":"prod","instance":"nip-beprd51","job":"node_exporters"},"values":[[1535541190.396,"107315254"],[1535541220.395,"107315281"],[1535541250.395,"107315334"],[1535541280.395,"107315405"],[1535541310.395,"107315457"],[1535541340.395,"107315529"],[1535541370.395,"107315598"],[1535541400.395,"107315649"],[1535541430.395,"107315697"]]}]}}

Query range queries with sum() that cross 1535541430-1535541431 are stopped in the middle

query3
query4

@roidelapluie roidelapluie changed the title PromQL bug with query_range and staleness PromQL bug with query_range and metrics name change Aug 29, 2018

@brian-brazil

This comment has been minimized.

Copy link
Member

commented Aug 29, 2018

For anyone looking at this later, note that this type of query is non-advised due to a) using a regex on name rather than specifying names directly which is inefficient and indicates data model issues (use or for this sort of use case) and b) filtering in a graph, as this may return no results where a 0 was expected.

However an instant query and a range query step should always return the same result for a given timestamp, so this is a bug.

@brian-brazil

This comment has been minimized.

Copy link
Member

commented Aug 29, 2018

The issue here is the duplicate timestamp that happens when the > is applied to the duplicately named vector coming back from rate due to using a regex against name, which throws off the input-collecting logic in rangeEval.

The best course of action here is likely to forbid such expressions, that is to make it so that a given instant/range vector cannot contain two metrics with the same labelset and to return an error to the user. This will likely break some users doing inadvisable things (like the above query), but I don't see any other way that's semantically sane.

@roidelapluie

This comment has been minimized.

Copy link
Contributor Author

commented Aug 30, 2018

So https://user-images.githubusercontent.com/291750/44791137-3f024d80-aba1-11e8-9e08-f138cd6755e9.png would render an error? I can imagine that this is something that is between the bug fix and the breaking change.

@brian-brazil

This comment has been minimized.

Copy link
Member

commented Aug 30, 2018

Yes, that should render an error. There may be users depending on this, which is why it's potentially breaking, however it is not a sane query as it's semantics are undefined. Using regexes on metric names is very strongly discouraged, outside of debugging Prometheus performance issues.

@roidelapluie

This comment has been minimized.

Copy link
Contributor Author

commented Aug 30, 2018

Strangely enough I have not seen that in the doc:

Label matchers can also be applied to metric names by matching against the internal __name__ label. For example, the expression http_requests_total is equivalent to {__name__="http_requests_total"}. Matchers other than = (!=, =~, !~) may also be used. The following expression selects all metrics that have a name starting with job::

{__name__=~"job:.*"}

Maybe we should add a warning?

In our case the metric names change of the exporter were converted by name=~"old|new" on every dashboard.

@sipian

This comment has been minimized.

Copy link
Contributor

commented Sep 7, 2018

@brian-brazil
I'm a bit confused on where to add this check. Could you tell where should this test be added in promql?
instant/range vector cannot contain two metrics with the same labelset
I'm guessing somewhere in eval and rangeEval.

@brian-brazil

This comment has been minimized.

Copy link
Member

commented Sep 7, 2018

There's three places: range vector functions, range Eval, and the unary minus. We may also be able to remove the existing check from label_replace.

@lock

This comment has been minimized.

Copy link

commented Mar 22, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Mar 22, 2019

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
You can’t perform that action at this time.