Propagate static_labels from target collectors config to metrics to enable re-use of query config across a fleet of identical but unique server DSNs #58

Open
michaeljoy opened this issue Feb 27, 2020 · 2 comments

@michaeljoy

This is a usability feature request more than anything, but it would enable the use of sql_exporter at scale across immense fleets without needless duplication of config entries.

In the following scenario, the ability to propagate a static_label from the target/collector config down to the metrics, as an inherited label, would be amazing. Granted, you can't technically label the target itself, but the metrics should not have to be uniquely configured for each and every unique DSN in its own metric collector config when they are identical in every way except for the label. The ability to specify a static label at the target/collector level would enable clean, reproducible filtering of targets in queries and dashboards without unnecessary, duplicative metric collector configs.

In our deployment scenario we have no static collector hosts; all scrapers run on serverless hosts whose dynamic DNS/host names are registered to Prometheus on the fly.

Scenario:

  • multiple db servers with identical databases, each uniquely identified by a static cluster identifier
  • all db servers require the same monitoring queries; the only difference between each db server/cluster is the label

Problem:

  • Currently you have to create a UNIQUE metric config file for EACH and EVERY individual DSN just to get UNIQUE labels on its metrics, so that you can filter in Prometheus per UNIQUE ID without breaking queries, dashboards, and alerts every time the scraper cycles.
  • In our case, each db server has a remote sql_exporter scraper running in an ephemeral Docker container with a dynamic name and address. The addresses are dynamically registered to SRV records and picked up by Prometheus when a container cycles, ensuring monitoring continuity across container redeploys and container hardware failure/lifecycle (see the discovery sketch after this list).
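
For context, the Prometheus side of this discovery setup looks roughly like the sketch below; the SRV record name and refresh interval are placeholders, not our actual values:

scrape_configs:
  - job_name: 'sql_exporter'
    # Discover the ephemeral sql_exporter containers via the SRV record
    # they register under on deploy (record name is a placeholder).
    dns_sd_configs:
      - names: ['_sql-exporter._tcp.example.org']
        type: SRV
        refresh_interval: 30s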

Result:

  • Thousands of individual metric config templates, each differing only in the handful of unique labels identifying the DSN/server/cluster it is associated with.
  • Thousands of metric configurations differing only by a cluster_identifier label (this seems needlessly duplicative; if the label were propagated from the target, the same metric config could be re-used across all servers of a similar persuasion).

Preferred Result:

  • 1 config per DSN
  • 1 shared metric config for all DSNs, propagating each DSN's unique label/identifier from the DSN config (a hypothetical sketch of this follows the current-state example below)
  • Eliminates needless config duplication and minimizes the chance of accidental config drift when adding, removing, and updating monitoring queries across thousands of sql_exporter containers, since all configs would be identical save for the DSN
  • Makes templating of the config generation far less painful and far less duplicative
  • Perhaps this identifier label could be applied at the collector_files/collector_names level to minimize duplication?

Current State Example:

  • unique-server-id.yml
global:
  scrape_timeout: 5m
  # Subtracted from Prometheus' scrape_timeout to give us some headroom and prevent Prometheus from timing out first.
  scrape_timeout_offset: 500ms
  # Minimum interval between collector runs: by default (0s) collectors are executed on every scrape.
  min_interval: 30s
  # Maximum number of open connections to any one target. Metric queries will run concurrently on multiple connections,
  # as will concurrent scrapes.
  max_connections: 3
  # Maximum number of idle connections to any one target. Unless you use very long collection intervals, this should
  # always be the same as max_connections.
  max_idle_connections: 3

# The target to monitor and the collectors to execute on it.
target:
  data_source_name: "postgres://secret_monitoring_user:SuperSecretMonitoringPasswordHere@unique-server-id.fqdn.tld:5439/unique_identifier?sslmode=require"
  collectors: [unique-server-id]

# Collector files specifies a list of globs. One collector definition is read from each matching file.
collector_files:
  - "unique-server-id.collector.yml"

(Example shows only 1 metric instead of dozens, for clarity and brevity. The primary problem lies in requiring a static_labels entry per metric_name, when it is really just a static label for the DSN, used to enable metric filtering on that label.)

  • "unique-server-id.collector.yml"

# The collector name must match the name referenced in the target's `collectors` list.
collector_name: unique-server-id

metrics:
  - metric_name: prefix_something_nothing_age_seconds
    static_labels:
      cluster_identifier: unique-server-id
    type: gauge
    help: "Age in seconds of something from nothing"
    values: [age_seconds]
    query: |
      SELECT datediff(s, timestamp, getdate()) AS age_seconds FROM something.nothing ORDER BY timestamp DESC limit 1;
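
For illustration, a hypothetical sketch of the preferred state is below. To be clear, a static_labels key under target is NOT currently supported by sql_exporter; it is the proposed behavior, and the file names are placeholders:

# unique-server-id.yml -- still one tiny file per DSN (proposed syntax)
target:
  data_source_name: "postgres://monitoring_user:password@unique-server-id.fqdn.tld:5439/unique_identifier?sslmode=require"
  # Proposed: labels declared once here would be inherited by every metric
  # produced by the collectors run against this target.
  static_labels:
    cluster_identifier: unique-server-id
  collectors: [shared]

collector_files:
  - "shared.collector.yml"

# shared.collector.yml -- one metric config shared verbatim across the fleet
collector_name: shared
metrics:
  - metric_name: prefix_something_nothing_age_seconds
    type: gauge
    help: "Age in seconds of something from nothing"
    values: [age_seconds]
    query: |
      SELECT datediff(s, timestamp, getdate()) AS age_seconds FROM something.nothing ORDER BY timestamp DESC LIMIT 1;
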
@free
Owner

free commented Mar 24, 2020

Why not apply the "static label per DSN" in the Prometheus config? Something like this:

scrape_configs:
  - job_name: 'somedb'
    static_configs:
      - targets: [ 'instance1.cluster13.foo.org:9399', 'instance2.cluster13.foo.org:9399' ]
        labels:
          cluster_identifier: cluster13
      - targets: [ 'instance1.cluster14.foo.org:9399', 'instance2.cluster14.foo.org:9399' ]
        labels:
          cluster_identifier: cluster14

This also works with file_sd_config, but you have to place the labels in the targets file, next to the targets themselves (i.e. the part after static_configs above all goes into the targets file).
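
For example, the targets file (name hypothetical) referenced from file_sd_configs would contain something like:

# targets/somedb.yml, pointed to by file_sd_configs in prometheus.yml
- targets: [ 'instance1.cluster13.foo.org:9399', 'instance2.cluster13.foo.org:9399' ]
  labels:
    cluster_identifier: cluster13
- targets: [ 'instance1.cluster14.foo.org:9399', 'instance2.cluster14.foo.org:9399' ]
  labels:
    cluster_identifier: cluster14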

Or, if you're using some other form of service discovery (e.g. kubernetes_sd_config), you can use metric_relabel_configs to create a cluster_identifier label out of e.g. the instance name. It's all documented here: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config
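
As a sketch, assuming hostnames of the form instanceN.clusterM.foo.org (the SRV record name and the regex are assumptions about the naming scheme), the label can be derived at relabeling time:

scrape_configs:
  - job_name: 'somedb'
    dns_sd_configs:
      - names: ['_sql-exporter._tcp.foo.org']
    relabel_configs:
      # Extract the second DNS label (e.g. "cluster13" out of
      # "instance1.cluster13.foo.org:9399") into cluster_identifier.
      - source_labels: [__address__]
        regex: '[^.]+\.([^.]+)\..*'
        target_label: cluster_identifier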

@michaeljoy
Author

True, though dns_sd_config just gets messy with the regex, and we don't necessarily name the scraper host the same as the cluster_identifier, since the scrapers aren't running on the same hosts as the databases.

That is certainly a valid idea; however, we also have other labels we want to add that don't belong in DNS, like customer_name, customer_id, etc., so it still gets super messy trying to do it in the Prometheus config (and we also want this to be dynamic).

Long term we'll have to replace the service discovery with Consul to get this out-of-band tagging without having to automate hundreds of duplicative file templates that are pulled down at container start to bootstrap the config.

I guess the min-maxer in me just wants to see the labels defined once alongside the DSN, instead of dozens of times per file next to the DSN in the query target config, since you can't have more than one server per config anyway.
