Incorrect regex matching when querying using Prometheus 2 #2616

Closed
checketts opened this Issue Apr 13, 2017 · 15 comments

6 participants

checketts commented Apr 13, 2017

What did you do?
I ran the query container_cpu_system_seconds_total{container_name=~"prometheus.*"}

What did you expect to see?
I expected the results to include both container_name="prometheus" and container_name="prometheus2".

What did you see instead? Under which circumstances?
I have 2 Prometheus servers running side by side, monitoring the same targets (reproducing/tracking the metrics @fabxc listed in https://fabxc.org/blog/2017-04-10-writing-a-tsdb/).

My 1.5.2 server returns the expected results (6 in this case), but the v2.0.0-alpha.0 server returns only 4 of them. It doesn't match prometheus or prometheus2, but it does match prometheus-load1, prometheus-dev3, etc.

I verified that the missing series are present and that I can filter for them directly with queries like container_name=~"prometheus" (i.e. when I leave the .* off the end).

Environment
Prometheus v2.0.0-alpha.0 running on Kubernetes 1.5

  • System information:

Linux 4.4.0-53-generic x86_64

  • Prometheus version:
prometheus, version 2.0.0-alpha.0 (branch: master, revision: ece483c0c1f2ad11e7dc749ea958669b0914c1c3)
  build user:       root@cd2fcfcce982
  build date:       20170410-11:14:31
  go version:       go1.8
Member

gouthamve commented Apr 17, 2017

Hmm, this is a little weird. So I am running the dev-2.0 branch and have two jobs prometheus and prometheus2, both scraping the Prometheus server itself. When I run {job=~"prometheus.*"}, both of them show up.

We don't handle lookups on job separately, so I think Prometheus is matching container_name against prometheus too, but for some reason (staleness?) it is not returning the values. @checketts, can you run a range query and see whether you are still missing the values?

Author

checketts commented Apr 19, 2017

@gouthamve Sorry for the delay, I was traveling. The data comes from Kubernetes service discovery, which means it is collected from each target. I verified that each target is scraped within 30 seconds (my scrape interval). I haven't set a staleness parameter, so I assume the default staleness of 5 minutes applies.

I ran the range query below, and it behaved exactly the same as when I left increase off.

increase(container_cpu_system_seconds_total{container_name=~"prometheus"}[5m])

Some more query datapoints:

container_name=~"prometheus|prometheus2" returns 2 of the 6 results (prometheus and prometheus2).

But container_name=~"prometheus|prometheus2|prometheus.*" returns 2 other results (prometheus-rig and prometheus2-rig).

One last datapoint: container_name=~"prometheus|prometheus2|prometheus-rig" returns 3 results (exactly the names listed). But if I add one more manually, like so: container_name=~"prometheus|prometheus2|prometheus-rig|prometheus2-rig" (note that I just appended prometheus2-rig to the OR), the results leave out prometheus and keep the other 3. The result changes based on how I order the OR clause.
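Under fully anchored matching, reordering the alternatives cannot change which values a matcher accepts; order only affects which substring an unanchored, leftmost-first search reports. A minimal Go sketch of both cases with the standard regexp package (forward, reversed and leftmostFirst are illustrative names, not Prometheus code):

```go
package main

import (
	"fmt"
	"regexp"
)

// The alternation from the report in two orders, fully anchored as PromQL does.
var (
	forward  = regexp.MustCompile("^(?:prometheus|prometheus2|prometheus-rig|prometheus2-rig)$")
	reversed = regexp.MustCompile("^(?:prometheus2-rig|prometheus-rig|prometheus2|prometheus)$")
)

// leftmostFirst returns the text an unanchored search reports; here the order
// of alternatives does matter, because Go's regexp prefers the earlier branch.
func leftmostFirst(expr, s string) string {
	return regexp.MustCompile(expr).FindString(s)
}

func main() {
	for _, n := range []string{"prometheus", "prometheus2", "prometheus-rig", "prometheus2-rig"} {
		fmt.Println(n, forward.MatchString(n), reversed.MatchString(n)) // true true
	}
	fmt.Println(leftmostFirst("prometheus|prometheus2", "prometheus2")) // prometheus
	fmt.Println(leftmostFirst("prometheus2|prometheus", "prometheus2")) // prometheus2
}
```

An order-dependent result set like the one above therefore suggests the problem lies after matching, in how the selected series are combined.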

Member

beorn7 commented May 5, 2017

Regexp matching must be broken in some basic way.

I just tried node_network_transmit_bytes{job="node", device!~"bond[0-9]|cbr[0-9]|veth.*"}, resulting in far fewer results than I expected.

If I query node_network_transmit_bytes{job="node", device="eth3"}, I get loads of results, each of which should also be returned by the query above, but none does.

Member

fabxc commented May 6, 2017

It's not just regexp matching but potentially any non-equal matcher. From my debugging, the matching itself works correctly, but the final postings list of series IDs is missing some entries.
Most likely it is mergePostings.Seek that sometimes advances the underlying postings too far and causes some entries to be skipped.

Seek is called when the postings intersection wants to advance an underlying merge postings to a certain value or higher. If I instead expand a merge postings completely and intersect that result (the non-lazy way), everything seems to work fine.

Member

beorn7 commented May 10, 2017

I'm currently testing v2 by letting it do the same thing as a number of our v1 servers. However, because of this bug, I cannot really compare the results: many rules don't give the same result, and most dashboards are half-empty.

Contributor

mwitkow commented May 16, 2017

@fabxc can we get prometheus/tsdb#77 vendored into dev-2.0? Would love to try it out :)

Author

checketts commented May 16, 2017

I would also be eager to get a new alpha build that I can hammer and report on.

Member

fabxc commented May 16, 2017

@checketts we'll probably wait just a bit longer; there are some TODOs left before alpha.1.
However, you can build from the dev-2.0 branch, which basically reflects whatever would go into the next alpha as well.

#2728 syncs in some recent fixes in the TSDB

Author

checketts commented May 16, 2017

Would checking out the dev-2.0 branch and running make docker be sufficient? (I apologize I'm still pretty new to Go)

Member

gouthamve commented May 16, 2017

make promu; CGO_ENABLED=0 promu build; make docker would do it after checking out dev-2.0.

But @fabxc used to push the latest dev-2.0 images to Quay somewhere; I'm not able to find the link now.

Member

fabxc commented May 16, 2017

Yeah, basically this. If you're not on Linux, GOOS=linux must additionally be set.
GOOS=linux make build && make docker should work anywhere with a recent Go compiler.

Member

gouthamve commented Jun 8, 2017

I think this can be closed. Are you still seeing some lost series?

Author

checketts commented Jun 8, 2017

@gouthamve Yes, in my case you can close it. Not certain about @beorn7's case, though.

Member

fabxc commented Jun 8, 2017

I think we have pretty clearly identified the source of this and fixed it.

fabxc closed this Jun 8, 2017


lock bot commented Mar 23, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Mar 23, 2019
