Federate endpoint returns empty response for query #1811

Closed
svenmueller opened this Issue Jul 13, 2016 · 19 comments

svenmueller commented Jul 13, 2016

What did you do?

When I call the endpoint /federate?match[]={__name__%3D~"^monitor%3A.*"} I get an empty response (status code 200).

What did you expect to see?
The endpoint should return values having a label starting with "monitor". Another query shows that the external label is set (see the external label "monitor"):

# TYPE node_netstat_IpExt_InBcastOctets untyped
node_netstat_IpExt_InBcastOctets{job="node",instance="xyz:9100",monitor="codelab-monitor"} 0 1468426209014
# TYPE node_vmstat_nr_written untyped
node_vmstat_nr_written{instance="xyz:9100",job="node",monitor="codelab-monitor"} 1.4616345e+07 1468426209014
# TYPE node_netstat_TcpExt_TCPOFOMerge untyped
node_netstat_TcpExt_TCPOFOMerge{job="node",instance="xyz:9100",monitor="codelab-monitor"} 1 1468426208080

What did you see instead? Under which circumstances?
Instead of values having the label "monitor" I get an empty response with status code 200.

Environment

  • System information:

    Linux 3.13.0-91-generic x86_64

  • Prometheus version:

    prometheus, version 0.20.0 (branch: master, revision: aeab25c)
    build user: root@77050118f904
    build date: 20160616-08:38:14
    go version: go1.6.2

  • Prometheus configuration file:

# my global config
global:
  scrape_interval:     15s # By default, scrape targets every 15 seconds.
  evaluation_interval: 15s # By default, evaluate rules every 15 seconds.
  # scrape_timeout is set to the global default (10s).

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
      monitor: 'codelab-monitor'

# Load and evaluate rules in this file every 'evaluation_interval' seconds.
rule_files:
  - "/etc/prometheus/rules/sample.rules"
  # - "second.rules"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ['localhost:9090']

  # Scrape the Node Exporter every 5 seconds.
  - job_name: 'node'
    scrape_interval: 5s
    file_sd_configs:
      - files:
        - /etc/prometheus/targets/*.yaml

Any idea why the query doesn't match anything?

Author

svenmueller commented Jul 13, 2016

I'm using hierarchical federation and want to scrape values from the "child" Prometheus instances. Therefore I'm using the external label in the match query (I followed the blog post here: http://www.robustperception.io/scaling-and-federating-prometheus/).

Any idea why I get no results?

Member

fabxc commented Jul 14, 2016

__name__ refers to the metric name. What you seem to want here is /federate?match[]={monitor%3D~".*"}.

Author

svenmueller commented Jul 14, 2016

Hi,

Thx for the response. I just tried /federate?match[]={monitor%3D~".*"} but it returns an empty result.

But there are definitely values having the label monitor:

example /federate?match[]={job%3D~"node.*"}

# TYPE node_memory_KernelStack untyped
node_memory_KernelStack{instance="xyz:9100",job="node",monitor="codelab-monitor"} 2.482176e+06 1468477408520
# TYPE node_network_transmit_compressed untyped
node_network_transmit_compressed{device="veth952ec83",instance="xyz9100",job="node",monitor="codelab-monitor"} 0 1468477404608
# TYPE node_vmstat_nr_slab_unreclaimable untyped

Member

fabxc commented Jul 14, 2016

My bad, at least one matcher must not match the empty string, so it should be /federate?match[]={monitor%3D~".+"} instead.

Your second one satisfies the requirement by having the plain node in there.
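
The rule described above can be sketched in a few lines. This is only an illustration, assuming fully anchored regex matchers; it uses Python's `re` module rather than the RE2 engine Prometheus uses, and `matches_empty` and `selector_is_allowed` are hypothetical helper names, not Prometheus functions.

```python
import re

def matches_empty(op, value):
    """Return True if a label matcher would accept the empty string."""
    if op == "=":
        return value == ""
    if op == "!=":
        return value != ""
    if op == "=~":
        # Regex matchers are treated as fully anchored in this sketch.
        return re.fullmatch(value, "") is not None
    if op == "!~":
        return re.fullmatch(value, "") is None
    raise ValueError(f"unknown operator: {op}")

def selector_is_allowed(matchers):
    """A selector needs at least one matcher that rejects the empty string."""
    return any(not matches_empty(op, value) for _, op, value in matchers)

print(selector_is_allowed([("monitor", "=~", ".*")]))   # False
print(selector_is_allowed([("monitor", "=~", ".+")]))   # True
print(selector_is_allowed([("job", "=~", "node.*")]))   # True
```

Under this model `{job=~"node.*"}` passes because the literal `node` prefix can never match an empty label value.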

Author

svenmueller commented Jul 14, 2016

Thx a lot @fabxc, but the URI /federate?match[]={monitor%3D~".+"} also doesn't work for me (same empty result). To me it looks as if a query against the external label "monitor" always returns an empty result.

More examples (see the example metrics above):

Returns empty result

  • /federate?match[]={monitor%3D"codelab-monitor"}
  • /federate?match[]={job%3D~"n.+"}

Returns a result

  • /federate?match[]={job%3D~"node.*"} (also strange, why does .* work but .+ not)

Is there any way to get more information (logs) about why queries against the external label monitor return nothing? Is there any other match pattern to scrape all the data from the "child" nodes?

(@fabxc btw, I could also give you access to my playground env so you can try it out yourself)

Member

fabxc commented Jul 14, 2016

And again my bad. At least the + needs to be urlencoded as well, and in general other parts too. To avoid any ambiguities, this is the correct unencoded URL path: /federate?match[]={monitor=~".+"}.

Testing with curl can avoid any ambiguities:
curl -G --data-urlencode 'match[]={monitor=~".+"}' example.org/federate

Similarly, as Prometheus treats a non-existent label the same as an empty label, this should work equally well:
curl -G --data-urlencode 'match[]={monitor!=""}' example.org/federate
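
The same encoding can be done programmatically. The sketch below uses Python's standard `urllib.parse`; `federate_url` is a hypothetical helper and `http://example.org` is a placeholder host. It also shows why an unencoded `+` fails: under standard form-style query decoding, a raw `+` is read as a space.

```python
from urllib.parse import parse_qs, urlencode

def federate_url(base, selectors):
    """Build a /federate URL with every selector percent-encoded."""
    query = urlencode([("match[]", s) for s in selectors])
    return f"{base}/federate?{query}"

url = federate_url("http://example.org", ['{job=~".+"}'])
print(url)

# A raw "+" in a query string decodes to a space, so the regex
# that arrives is "n. " rather than "n.+".
raw_query = 'match[]={job%3D~"n.+"}'
print(parse_qs(raw_query)["match[]"][0])  # {job=~"n. "}
```

This would explain why `{job%3D~"node.*"}` returned data while `{job%3D~"n.+"}` did not: only the second contains a character that changes meaning when left unencoded.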

Author

svenmueller commented Jul 14, 2016

Hi,

Many thx for your effort! Using external labels doesn't seem to work for me:

$ curl -v -u xxx:yyy -G --data-urlencode 'match[]={monitor!=""}' https://mydomain.com/federate
$ # again nothing returned, status code 200

same for:

$ curl -v -u xxx:yyy -G --data-urlencode 'match[]={monitor=~".+"}' https://mydomain.com/federate
$ # again nothing returned, status code 200

When using a different label, it works ('monitor' label is in the result)

$ curl -v -u xxx:yyy -G --data-urlencode 'match[]={job=~".+"}' https://mydomain.com/federate
# TYPE node_network_transmit_drop untyped
node_network_transmit_drop{device="eth2",job="node",instance="bla.com:9100",monitor="codelab-monitor"} 0 1468502652522
# TYPE node_network_transmit_multicast untyped
node_network_transmit_multicast{job="node",instance="bla:9100",device="vethwe-bridge",monitor="codelab-monitor"} 0 1468502654905
# TYPE node_filesystem_readonly untyped
node_filesystem_readonly{instance="bla:9100",job="node",device="/dev/xvda1",fstype="ext4",mountpoint="/",monitor="codelab-monitor"} 0 1468502652057

Member

fabxc commented Jul 14, 2016

Ah, this is about the external label. I totally missed that. Literally every time series exposed on /federate will have this label attached, so selecting on it would literally be a no-op filtering-wise.

What do you want to achieve here?

Member

brian-brazil commented Jul 14, 2016

External labels are added after filtering on the /federate endpoint.
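
A toy model of that ordering, which is not Prometheus source code, shows why a matcher on `monitor` selects nothing even though `monitor` appears in the output. The series, the `federate` function, and its predicate argument are all assumptions made for illustration.

```python
# External labels as configured in the thread's prometheus.yml.
EXTERNAL_LABELS = {"monitor": "codelab-monitor"}

# Series as stored locally: external labels are not part of them.
STORED = [
    {"__name__": "node_vmstat_nr_written", "job": "node", "instance": "xyz:9100"},
    {"__name__": "up", "job": "prometheus", "instance": "localhost:9090"},
]

def federate(label, predicate):
    """Select on stored labels first, then attach external labels."""
    # A missing label is treated like an empty value.
    selected = [s for s in STORED if predicate(s.get(label, ""))]
    # External labels are merged in only after selection.
    return [{**EXTERNAL_LABELS, **s} for s in selected]

by_monitor = federate("monitor", lambda v: v != "")
by_job = federate("job", lambda v: v == "node")

print(by_monitor)            # [] since no stored series has a "monitor" label
print(by_job[0]["monitor"])  # codelab-monitor
```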

Author

svenmueller commented Jul 14, 2016

I have a "global" Prometheus server that scrapes data from the other "slave" Prometheus servers (federation). These lower-level Prometheus servers (one per environment/DC) scrape the data from the targets where the node-exporter is running to expose the metrics.

I want to do the alerting and queries on the "global" Prometheus instance, which therefore should have the data of all child Prometheus servers.

What kind of match query can I use to scrape all the metrics from the "slave" Prometheus servers?

Member

brian-brazil commented Jul 14, 2016

I want to do the alerting and queries on the "global" Prometheus instance, which therefore should have the data of all child Prometheus servers.

The global shouldn't have all data, it should have aggregated data from the slaves. This is what the first example on http://www.robustperception.io/scaling-and-federating-prometheus/ is showing.

Author

svenmueller commented Jul 14, 2016

@brian-brazil Our use case is described in the documentation under "Cross-service federation" (https://prometheus.io/docs/operating/federation/#cross-service-federation). We want to do the alerting and querying/graphing against a single Prometheus having all data (centralized).

Author

svenmueller commented Jul 14, 2016

@brian-brazil btw, in your slave example you are using the match query {__name__=~"^slave:.*"}. How does this prefix "slave:" get added to the scraped metrics?

Member

brian-brazil commented Jul 14, 2016

We want to do the alerting and querying/graphing against a single Prometheus having all data (centralized).

This is generally a bad idea, you should push down a given alert as low as you can in your federation hierarchy to increase reliability.

The only time to have alerts from cross-service data is when there's no other way such an alert can be expressed, and in that case you'd only pull in exactly the metrics you need via federation.

How does this prefix "slave:" get added to the scraped metrics?

That'd come from a recording rule doing aggregation.
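
As a rough illustration of what such a rule computes, the toy model below sums a metric across instances per job and stores the result under a new, prefixed name. It is not Prometheus rule syntax; the sample data, the `record_sum_by_job` helper, and the target name `slave:node_network_transmit_drop:sum` are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical raw samples on a lower-level server: (labels, value).
samples = [
    ({"__name__": "node_network_transmit_drop", "job": "node", "instance": "a:9100"}, 1.0),
    ({"__name__": "node_network_transmit_drop", "job": "node", "instance": "b:9100"}, 2.0),
]

def record_sum_by_job(samples, source, target):
    """Sum `source` over all instances per job and store it as `target`."""
    totals = defaultdict(float)
    for labels, value in samples:
        if labels["__name__"] == source:
            totals[labels["job"]] += value
    return [({"__name__": target, "job": job}, total) for job, total in totals.items()]

aggregated = record_sum_by_job(
    samples, "node_network_transmit_drop", "slave:node_network_transmit_drop:sum"
)
print(aggregated)

# Only the aggregated series matches the federation selector's name regex.
name_pattern = "^slave:.*"
print([labels["__name__"] for labels, _ in aggregated
       if re.fullmatch(name_pattern, labels["__name__"])])
```

The point of the pattern is that the global server pulls one small aggregated series per job instead of one series per instance.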

Author

svenmueller commented Jul 14, 2016

Ok, that makes sense to me. Thx for the feedback! I think I will reconsider my setup and eventually get rid of the global instance and do alerting/querying against the specific Prometheus instances.

Member

brian-brazil commented Jul 14, 2016

It's fine to have a global Prometheus, it's handy for some types of analysis and alerts such as around capacity planning.

Member

juliusv commented Jul 15, 2016

Yeah, it's fine to have a global Prometheus, just most of the alerts should
be in the local ones.

Just for completeness' sake, even though it's not recommended, you can
pull all metrics over federation by simply matching e.g.:

{__name__=~".+"}

But yeah, matching against external labels doesn't work since those are
applied on the way out after metric selection.


Member

juliusv commented Jul 15, 2016

Seems we can close this, as this is intended behavior. Could be clearer in the docs somewhere maybe.

