
regex-match not working in 2.1.0 #3815

Closed
dharrigan opened this Issue Feb 8, 2018 · 27 comments

dharrigan commented Feb 8, 2018

What did you do?

Installed Prometheus 2.1.0 and executed this query:

sum by (topic)(rate(kafka_server_brokertopicmetrics_messagesin_total{topic=~'users'}[1d]))

What did you expect to see?

Filtered metrics based upon a Prometheus Query, with a non-zero value.

What did you see instead? Under which circumstances?

Wrong data

{topic="foo_users"} | 0

Environment

Linux

  • System information:

Linux 4.13.0-32-generic x86_64

  • Prometheus version:
prometheus, version 2.1.0 (branch: HEAD, revision: 85f23d82a045d103ea7f3c89a91fba4a93e6367a)
  build user:       root@6e784304d3ff
  build date:       20180119-12:01:23
  go version:       go1.9.2
  • Prometheus config:
# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  - job_name: 'local-kafka'
    static_configs:
      - targets: ['localhost:7071']

Other information:

Here is the result of the raw metric from kafka:

kafka_server_brokertopicmetrics_messagesin_total{topic="foo_users",} 733.0

If I rerun the query, with the full topic name, it works:

sum by (topic)(rate(kafka_server_brokertopicmetrics_messagesin_total{topic=~'foo_users'}[1d]))

{topic="foo_users"} | 0.0009844983934935792

So it appears that the regex matcher =~ doesn't work correctly in 2.1.0; it only works when the pattern is an exact match against the topic name.

dharrigan changed the title from "regex-matches not working in 2.1.0" to "regex-match not working in 2.1.0" on Feb 8, 2018

gouthamve (Member) commented Feb 8, 2018

From what I understand, it is selecting the right series but returning the wrong data (0)?

simonpasquier (Member) commented Feb 8, 2018

Regexp matches are fully anchored. Have you tried sum by (topic)(rate(kafka_server_brokertopicmetrics_messagesin_total{topic=~'.*users.*'}[1d]))?
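
For reference, because label-matcher regexes are fully anchored, topic=~'users' is effectively treated as ^(?:users)$, so a substring pattern needs explicit wildcards. An illustrative pair of selectors (reusing the metric name from this issue, not queries taken from the reporter's setup):

# effectively ^(?:users)$ – matches only topic="users"
kafka_server_brokertopicmetrics_messagesin_total{topic=~'users'}

# effectively ^(?:.*users.*)$ – also matches topic="foo_users"
kafka_server_brokertopicmetrics_messagesin_total{topic=~'.*users.*'}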

dharrigan (Author) commented Feb 8, 2018

Hi,

I will try. What I would say is that this query was working prior to the upgrade to 2.1.0, i.e., with the earlier 2.0 and 1.8.2 versions.

-=david=-

dharrigan (Author) commented Feb 8, 2018

@gouthamve it returns the wrong data: since it can't regex-match on the topic name, it returns a default of 0.

gouthamve (Member) commented Feb 8, 2018

I'm not sure I follow: as {topic="foo_users"} | 0 is being returned, the series is being selected?

@simonpasquier The original query as given should work too.

dharrigan (Author) commented Feb 8, 2018

@gouthamve that's actually the result copied and pasted from the Prometheus UI on 9090. I can't report on how the UI displays the data, or why it chooses to do so; all I can report is that prior to 2.1.0 the same query worked without modification, and now it doesn't. Only by changing the query's topic pattern to an exact match for the Kafka topic name is the result returned, i.e., topic=~'users' fails, whereas topic=~'foo_users' does not. I'm about to try the suggestion from @simonpasquier and see if the original functionality can be restored (otherwise, it means rolling out exact matches for topic names across a lot of topics in a lot of environments - or rolling back to 2.0/1.8.2 - if that is even possible!)

dharrigan (Author) commented Feb 8, 2018

@simonpasquier your suggestion works (it's a bit too expansive for my use case, but this works):

sum by (topic)(rate(kafka_server_brokertopicmetrics_messagesin_total{topic=~'.*users'}[1d]))

whereas this does not:

sum by (topic)(rate(kafka_server_brokertopicmetrics_messagesin_total{topic=~'users'}[1d]))

Thank you for your suggestion. Is this still a bug then, or has this behaviour changed from 2.0 to 2.1.0 (regexes now being fully anchored)?

-=david=-

gouthamve (Member) commented Feb 8, 2018

This is still a bug. As the UI reports the series, I'm pretty sure it's getting selected, but I'm not sure why the value is 0.

@brian-brazil Your thoughts on this?

brian-brazil (Member) commented Feb 8, 2018

There are no defaults in PromQL, this sounds like incorrect regex matching down in local storage.

brian-brazil added the priority/P0 label and removed the priority/P2 label on Feb 8, 2018

dharrigan (Author) commented Feb 8, 2018

If you require any more information, I would be very happy to help out - I'm just glad that I can work around the issue atm :) Top marks to @simonpasquier 👍

brian-brazil (Member) commented Feb 8, 2018

What does kafka_server_brokertopicmetrics_messagesin_total{topic=~'users'} return?

dharrigan (Author) commented Feb 8, 2018

Hi,

With this query (original, working in 2.0 and 1.8.2)

kafka_server_brokertopicmetrics_messagesin_total{topic=~'users'}

kafka_server_brokertopicmetrics_messagesin_total{instance="10.1.1.70:7071",job="int-kafka-clst01-kafka",topic="foo_users"} 0
kafka_server_brokertopicmetrics_messagesin_total{instance="10.1.1.71:7071",job="int-kafka-clst02-kafka",topic="foo_users"} 0
kafka_server_brokertopicmetrics_messagesin_total{instance="10.1.1.72:7071",job="int-kafka-clst03-kafka",topic="foo_users"} 0

then, with the above workaround (to work in 2.1.0)

kafka_server_brokertopicmetrics_messagesin_total{topic=~'.*_users'}

kafka_server_brokertopicmetrics_messagesin_total{instance="10.1.1.70:7071",job="int-kafka-clst01-kafka",topic="foo_users"} 733
kafka_server_brokertopicmetrics_messagesin_total{instance="10.1.1.71:7071",job="int-kafka-clst02-kafka",topic="foo_users"} 733
kafka_server_brokertopicmetrics_messagesin_total{instance="10.1.1.72:7071",job="int-kafka-clst03-kafka",topic="foo_users"} 1149

then, as an exact match

kafka_server_brokertopicmetrics_messagesin_total{topic=~'foo_users'}

kafka_server_brokertopicmetrics_messagesin_total{instance="10.1.1.70:7071",job="int-kafka-clst01-kafka",topic="foo_users"} 733
kafka_server_brokertopicmetrics_messagesin_total{instance="10.1.1.71:7071",job="int-kafka-clst02-kafka",topic="foo_users"} 733
kafka_server_brokertopicmetrics_messagesin_total{instance="10.1.1.72:7071",job="int-kafka-clst03-kafka",topic="foo_users"} 1149

Hope that helps!

-=david=-

brian-brazil (Member) commented Feb 8, 2018

Uhm, that's not the way regexes work in Prometheus at all; in the original example, kafka_server_brokertopicmetrics_messagesin_total{topic=~'users'} should be matching only topic="users".
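
In other words, under the anchored semantics the two selectors below should behave identically (an illustrative pair, not output from the reporter's setup), and neither should select the topic="foo_users" series:

# both should select only series with topic="users"
kafka_server_brokertopicmetrics_messagesin_total{topic=~'users'}
kafka_server_brokertopicmetrics_messagesin_total{topic="users"}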

brian-brazil (Member) commented Feb 8, 2018

And I cannot reproduce in 2.0.0. I suspect this may be an issue with this particular Prometheus rather than a bug.

If you bring up another Prometheus with an empty database, does it have the same issue?

dharrigan (Author) commented Feb 8, 2018

brian-brazil (Member) commented Feb 8, 2018

For this problem the values shouldn't matter, only the time series which are present.

dharrigan (Author) commented Feb 8, 2018

Hi,

Yes, I spun up a new instance of Prometheus and ran the same queries as above (#3815 (comment)). Only with the anchored query .*_ and the full name query foo_users is data returned; with the partial match query users, nothing (0) is returned.

Helpful?

brian-brazil (Member) commented Feb 8, 2018

Okay, sounds like just that Prometheus is broken then. Probably bad data in its storage.

@gouthamve Do you think there's a potential bug in storage here?

nothing (0) is returned

To be clear, nothing should be returned. Nothing is not the same as a single time series with no labels and the value 0.

brian-brazil added the priority/P2 label and removed the priority/P0 label on Feb 8, 2018

dharrigan (Author) commented Feb 8, 2018

Hi,

I'm a bit confused. With a brand new Prometheus install and a brand new database, I get the same problem. How can it be a problem with my Prometheus having a broken database when it has been destroyed and recreated (in fact, I deleted the entire Prometheus directory and extracted the tarball fresh)? I pointed it at the metric on the remote Kafka server, gave it a bit of time to scrape a few data points, and I observe the same behaviour: the regex operator =~ doesn't work as I would expect it to.

Perhaps I'm missing something in my understanding of Prometheus and I would welcome being corrected.

-=david=-

fabxc (Member) commented Feb 14, 2018

@dharrigan this indeed sounds like a bug and not a fault in your storage.
Do your metrics contain sensitive data, or could you possibly provide us with some of the storage files for inspection?

A few hours after startup you should be seeing directories with random IDs appearing in your data directory. If you could send one of those my way, that would be hugely helpful to debug this.

brian-brazil (Member) commented Mar 8, 2018

Are you still seeing this with the 2.2.0 rcs?

brian-brazil (Member) commented Mar 21, 2018

Is this still happening with 2.2.1?

mattsdni commented Apr 17, 2018

@brian-brazil I believe this may still be an issue in 2.2.1.
I run this query:

container_cpu_usage_seconds_total{namespace="default", job="kubernetes-cadvisor"}

And I get a metric with this label: pod_name="prometheus-3101542156-zkk49"

If I update the query to this

container_cpu_usage_seconds_total{namespace="default", job="kubernetes-cadvisor", pod_name="prometheus-.*"}

I get no results back.

brian-brazil (Member) commented Apr 17, 2018

That's as expected, you're using an equality matcher.
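
To illustrate the difference, compare the two selectors below (a sketch reusing the labels from the comment above; the actual result depends on which series are present):

# equality matcher: looks for the literal label value "prometheus-.*", so it matches nothing
container_cpu_usage_seconds_total{namespace="default", job="kubernetes-cadvisor", pod_name="prometheus-.*"}

# regex matcher: matches pod names starting with "prometheus-"
container_cpu_usage_seconds_total{namespace="default", job="kubernetes-cadvisor", pod_name=~"prometheus-.*"}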

mattsdni commented Apr 17, 2018

Oh, my bad. Changed to =~ and it works.

brian-brazil (Member) commented Apr 17, 2018

I'm going to presume this is resolved then.

lock bot commented Mar 22, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators on Mar 22, 2019
