
Prometheus federation #2318

Closed
vivekbny opened this Issue Jan 4, 2017 · 3 comments

vivekbny commented Jan 4, 2017

Hi All,

We are trying to cluster our Prometheus instances. We are creating two Prometheus instances; if one goes down (for any reason), we should be able to retrieve the data from the other instance.

Below is our yml file for the 1st Prometheus instance (r0106sn0v):

# my global config
global:
  scrape_interval: 15s     # By default, scrape targets every 15 seconds.
  evaluation_interval: 15s # By default, scrape targets every 15 seconds.
  # scrape_timeout is set to the global default (10s).

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'prometheus.Prometheus-PRD-Env'

# Load and evaluate rules in this file every 'evaluation_interval' seconds.
rule_files:
  - "alert.rules"
  # - "first.rules"
  # - "second.rules"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's prometheus.Prometheus itself.
# DEV 1 Cluster
scrape_configs:
  # The job name is added as a label job=<job_name> to any timeseries scraped from this config.
  - job_name: 'Cluster01'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 60s

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ['r01055n0v.bnymellon.net:9100']

  - job_name: 'federate'
    scrape_interval: 15s
    honor_labels: true
    metrics_path: '/federate'

    params:
      match[]: [job=="Cluster01"]

    static_configs:
      - targets: ['r01071n0v.bnymellon.net:9090']

And for the 2nd Prometheus instance (r01071n0v):

# my global config
global:
  scrape_interval: 15s     # By default, scrape targets every 15 seconds.
  evaluation_interval: 15s # By default, scrape targets every 15 seconds.
  # scrape_timeout is set to the global default (10s).

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'prometheus.Prometheus-PRD-Env'

# Load and evaluate rules in this file every 'evaluation_interval' seconds.
rule_files:
  - "alert.rules"
  # - "first.rules"
  # - "second.rules"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's prometheus.Prometheus itself.
# DEV 1 Cluster
scrape_configs:
  # The job name is added as a label job=<job_name> to any timeseries scraped from this config.
  - job_name: 'Cluster02'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 60s

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ['r01055n0v.bnymellon.net:9100']

  - job_name: 'federate'
    scrape_interval: 15s
    honor_labels: true
    metrics_path: '/federate'

    params:
      match[]: [job=="Cluster02"]

    static_configs:
      - targets: ['r0106sn0v.bnymellon.net:9090']

When we try to access the federate URL, we get the error below:
parse error at char 4: could not parse remaining input "=="Cluster01""...

Is the above method the right way to do federation?

Our current architecture is as below:
Exporters -> Prometheus -> Grafana

We are trying to set up Prometheus so that if one node goes down, Grafana can still show the data coming from the other node, avoiding a single point of failure.

beorn7 commented Jan 4, 2017

Two mistakes here:

First, to run two Prometheus servers for redundancy and (manual) failover, run them both in parallel with the same config, i.e. both will scrape all targets directly, rather than having one federate from the other.
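For illustration only, a minimal sketch of that approach (hostnames and job name taken from the configs above; everything else is assumed, not a verified setup): both servers carry the same prometheus.yml and scrape the node exporter directly, with no federate job at all.

# identical prometheus.yml on both r0106sn0v and r01071n0v (sketch)
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  # Both servers scrape the exporter themselves, so each holds a full copy of the data.
  - job_name: 'Cluster01'
    scrape_interval: 60s
    static_configs:
      - targets: ['r01055n0v.bnymellon.net:9100']

Grafana can then point at either instance (or at a load balancer in front of both), since each server keeps its own complete set of scraped samples.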

Second, the match[] parameter must be an instant vector selector as per https://prometheus.io/docs/operating/federation/#configuring-federation. [job=="Cluster02"] is not an instant vector selector. You probably want {job="Cluster02"}.
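If federation is still wanted on top of that, a minimal sketch of the federate job from the second config with a valid instant vector selector (the 'match[]' quoting follows the federation docs; the hostname is the one from the question):

scrape_configs:
  - job_name: 'federate'
    scrape_interval: 15s
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job="Cluster02"}'   # instant vector selector, not [job=="..."]
    static_configs:
      - targets: ['r0106sn0v.bnymellon.net:9090']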

gouthamve commented Jun 14, 2017

@brian-brazil This can be closed as it is a usage question.

lock bot commented Mar 23, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Mar 23, 2019
