Prometheus federation #2318
Comments
Two mistakes here: First, to run two Prometheus servers for redundancy and (manual) failover, run them both in parallel with the same config, i.e. both will scrape all targets directly, rather than having one federate from the other. Second, the
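For illustration, a minimal sketch of that parallel setup, assuming a single identical prometheus.yml deployed to both servers and reusing the node-exporter target from the question below; neither server federates from the other:

```yaml
# prometheus.yml -- identical copy deployed to both Prometheus servers.
# Each server scrapes every target itself, so either one alone holds the full data.
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'Cluster01'
    scrape_interval: 60s
    static_configs:
      - targets: ['r01055n0v.bnymellon.net:9100']   # node exporter host from the question
```

With this layout, losing either server only loses that server's own copy of the data; the other keeps scraping independently.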
@brian-brazil This can be closed as it is a usage question.
brian-brazil closed this on Jun 14, 2017
lock bot commented Mar 23, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
vivekbny commented Jan 4, 2017 • edited

Hi All,
We are trying to cluster the Prometheus instances. We are creating two Prometheus instances so that if one goes down (for any reason), we can still retrieve the data from the other instance.
Below is our yml file for the 1st Prometheus instance (r0106sn0v):
```yaml
# my global config
global:
  scrape_interval: 15s      # By default, scrape targets every 15 seconds.
  evaluation_interval: 15s  # By default, scrape targets every 15 seconds.
  # scrape_timeout is set to the global default (10s).

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'prometheus.Prometheus-PRD-Env'

# Load and evaluate rules in this file every 'evaluation_interval' seconds.
rule_files:
  # - "first.rules"
  # - "second.rules"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's prometheus.Prometheus itself.
# DEV 1 Cluster
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'Cluster01'
    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 60s
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['r01055n0v.bnymellon.net:9100']

  - job_name: 'federate'
    scrape_interval: 15s
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]': ['job=="Cluster01"']
    static_configs:
      - targets: ['r01071n0v.bnymellon.net:9090']
```
And the yml file for the 2nd Prometheus instance (r01071n0v):
```yaml
# my global config
global:
  scrape_interval: 15s      # By default, scrape targets every 15 seconds.
  evaluation_interval: 15s  # By default, scrape targets every 15 seconds.
  # scrape_timeout is set to the global default (10s).

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'prometheus.Prometheus-PRD-Env'

# Load and evaluate rules in this file every 'evaluation_interval' seconds.
rule_files:
  # - "first.rules"
  # - "second.rules"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's prometheus.Prometheus itself.
# DEV 1 Cluster
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'Cluster02'
    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 60s
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['r01055n0v.bnymellon.net:9100']

  - job_name: 'federate'
    scrape_interval: 15s
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]': ['job=="Cluster02"']
    static_configs:
      - targets: ['r0106sn0v.bnymellon.net:9090']
```
When we try to access the federate URL, we get the error below:

`parse error at char 4: could not parse remaining input "=="Cluster01""...`

Is the above method the right way to do federation?
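For reference, that parse error appears to come from the `match[]` value: the /federate endpoint expects instant vector selectors such as `{job="Cluster01"}`, not a bare `job=="..."` expression. A minimal sketch of the federate job with that selector, reusing the hostnames from the configs above:

```yaml
  - job_name: 'federate'
    scrape_interval: 15s
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]': ['{job="Cluster01"}']   # instant vector selector, in curly braces
    static_configs:
      - targets: ['r01071n0v.bnymellon.net:9090']
```

The same selector can be checked by hand with, for example, `curl -G --data-urlencode 'match[]={job="Cluster01"}' http://r01071n0v.bnymellon.net:9090/federate`.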
Our current architecture is as below:
Exporters -> Prometheus -> Grafana
We are trying to set up Prometheus in such a way that if one node goes down, the data is still visible in Grafana from the other node, to avoid a single point of failure.
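On the Grafana side, both servers can simply be configured as separate data sources so dashboards can be switched to the surviving one. A hedged sketch using Grafana's data source provisioning format (assuming a Grafana version that supports provisioning; names and file path are illustrative, hostnames come from the question):

```yaml
# /etc/grafana/provisioning/datasources/prometheus.yml (illustrative path)
apiVersion: 1
datasources:
  - name: Prometheus-A            # first server, r0106sn0v
    type: prometheus
    access: proxy
    url: http://r0106sn0v.bnymellon.net:9090
  - name: Prometheus-B            # second server, r01071n0v (manual failover target)
    type: prometheus
    access: proxy
    url: http://r01071n0v.bnymellon.net:9090
```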