selective metric drop with relabel config #3092

Closed
lastsky opened this Issue Aug 20, 2017 · 5 comments

lastsky commented Aug 20, 2017

node_systemd_unit_state is an expensive metric, so I spent about 4 hours googling and testing different configs, trying to create a rule that does the following:

keep:

node_systemd_unit_state{name="nginx.service",state="active"} 1
node_systemd_unit_state{name="sshd.service",state="active"} 1

drop all others:

<...>
node_systemd_unit_state{name="nginx.service",state="deactivating"} 0
node_systemd_unit_state{name="nginx.service",state="failed"} 0
node_systemd_unit_state{name="nginx.service",state="inactive"} 0
node_systemd_unit_state{name="systemd-logind.service",state="active"} 1
node_systemd_unit_state{name="systemd-logind.service",state="deactivating"} 0
node_systemd_unit_state{name="systemd-logind.service",state="failed"} 0
node_systemd_unit_state{name="systemd-logind.service",state="inactive"} 0
<...>
  • Prometheus version:

version 1.7.1

  • Prometheus configuration file:
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 15s
    file_sd_configs:
        - files:
            - '/etc/prometheus/file_sd_configs/*.yml'

rule_files:
    - '/etc/prometheus/rules/*.conf'

Configuration lines that do not work:

- metric_relabel_config:
    - source_labels: [__name__, name, state]
      regex: node_systemd_unit_state;nginx.service;active
      relabel_action: keep

    - source_labels: [__name__, name, state]
      regex: node_systemd_unit_state;sshd.service;active
      relabel_action: keep

   - source_labels: [__name__]
     regex: node_systemd_unit_state
     relabel_action: drop

Target file for the selective metric drop:

# cat /etc/prometheus/file_sd_configs/hardware.yml 
- targets:
    - "10.137.161.82:9100"
    - "10.137.161.83:9100"
    - "10.137.161.84:9100"
    - "10.137.161.85:9100"
  labels:
    job: node

I tested about 20 examples, and after running time curl -XDELETE -g 'http://localhost:9090/api/v1/series?match[]=node_systemd_unit_state' I see the series in Prometheus again. That is, if Prometheus starts at all.

Could you explain where metric_relabel_config should go: in prometheus.yml or in the static target files? Please add a working example to the docs.

Tired. Thanks.
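
For reference, a minimal sketch of where the block belongs, assuming the Prometheus 1.x configuration format. The key is metric_relabel_configs (plural), each rule uses action rather than relabel_action, and the list sits inside a scrape config in prometheus.yml, not in the file_sd target files. The single rule shown drops every node_systemd_unit_state series and only illustrates placement; it is not the selective rule asked for.

```yaml
global:
  scrape_interval: 15s

rule_files:
  - '/etc/prometheus/rules/*.conf'

scrape_configs:
  - job_name: 'prometheus'
    file_sd_configs:
      - files:
          - '/etc/prometheus/file_sd_configs/*.yml'
    # Applied to every scraped sample of this job before ingestion.
    metric_relabel_configs:
      # The regex is anchored at both ends, so this matches the
      # metric name exactly.
      - source_labels: [__name__]
        regex: node_systemd_unit_state
        action: drop
```

Note that chaining several keep rules would not give the intended result: the first keep rule discards every series that does not match it, including all unrelated metrics from the target.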

@lastsky lastsky changed the title relabel_config selective metric drop with relabel config Aug 20, 2017

lastsky commented Aug 20, 2017

After another 4 hours of googling examples:

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 15s
    file_sd_configs:
        - files:
            - '/etc/prometheus/file_sd_configs/*.yml'

    metric_relabel_configs:
      - source_labels: [__name__,name]
        regex:         node_systemd_unit_state;nginx.*
        action:        drop

This works for dropping series like:

node_systemd_unit_state{name="nginx.service",state="activating"} 0
node_systemd_unit_state{name="nginx.service",state="active"} 1
node_systemd_unit_state{name="nginx.service",state="deactivating"} 0
node_systemd_unit_state{name="nginx.service",state="failed"} 0
node_systemd_unit_state{name="nginx.service",state="inactive"} 0

OK, but no form of exclusion inside the regex works.
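
A likely reason is that Prometheus regexes use RE2 syntax, which has no negative lookahead, and the pattern is anchored at both ends. One hedged workaround is to enumerate the values to drop instead of trying to exclude the values to keep. The sketch below assumes the only state values are the five shown above, and it drops every non-active state for all units:

```yaml
metric_relabel_configs:
  # Values of the source labels are joined with ';' before matching.
  # 'active' is not in the alternation, so active series are untouched.
  - source_labels: [__name__, state]
    regex: node_systemd_unit_state;(activating|deactivating|failed|inactive)
    action: drop
```

This goes in the same place as the metric_relabel_configs block above, inside the scrape config.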

lastsky commented Aug 20, 2017

So the temporary solution is -collector.systemd.unit-whitelist=(nginx.service|ssh.service), but I would like to know the correct way to do flexible, selective metric drops in scrape_configs.

How to:

  1. keep node_systemd_unit_state for a list of services with state="active":
node_systemd_unit_state{name="nginx.service",state="active"} 1
node_systemd_unit_state{name="sshd.service",state="active"} 1
  2. drop all other node_systemd_unit_state* series:
<...>
node_systemd_unit_state{name="nginx.service",state="deactivating"} 0
node_systemd_unit_state{name="nginx.service",state="failed"} 0
node_systemd_unit_state{name="nginx.service",state="inactive"} 0
node_systemd_unit_state{name="systemd-logind.service",state="active"} 1
node_systemd_unit_state{name="systemd-logind.service",state="deactivating"} 0
node_systemd_unit_state{name="systemd-logind.service",state="failed"} 0
node_systemd_unit_state{name="systemd-logind.service",state="inactive"} 0
<...>
  3. and keep all other metrics from the target: node_.*, process_.* etc.
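
The three requirements above can be sketched with a temporary marker label, since a single drop regex cannot express "everything except". This is a hedged example, not a confirmed answer from the maintainers: the label name __tmp_keep and its value are made up for illustration, and the sketch assumes a Prometheus version that supports the labeldrop action.

```yaml
metric_relabel_configs:
  # 1. Mark the series to keep. The default action is 'replace';
  #    series that do not match the regex are left unchanged.
  - source_labels: [__name__, name, state]
    regex: 'node_systemd_unit_state;(nginx|sshd)\.service;active'
    target_label: __tmp_keep
    replacement: 'yes'
  # 2. Drop node_systemd_unit_state series that carry no marker.
  #    A missing label contributes an empty string, so unmarked
  #    series join to 'node_systemd_unit_state;'.
  - source_labels: [__name__, __tmp_keep]
    regex: 'node_systemd_unit_state;'
    action: drop
  # 3. Remove the marker from the series that remain.
  - regex: __tmp_keep
    action: labeldrop
```

Other metrics such as node_.* and process_.* never match the drop rule in step 2, because their joined value starts with a different metric name.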

read node_systemd_unit_state as another_i_fucked_this_target_expensive_metric if you wish.

@lastsky lastsky closed this Aug 20, 2017

@lastsky lastsky reopened this Aug 21, 2017

lastsky commented Aug 21, 2017

This is NOT urgent, but please provide an example. Thanks.

brian-brazil (Member) commented Oct 18, 2017

It makes more sense to ask questions like this on the prometheus-users mailing list rather than in a GitHub issue. On the mailing list, more people are available to potentially respond to your question, and the whole community can benefit from the answers provided.

lock bot commented Mar 23, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Mar 23, 2019
