targets defined by consul_sd_config listed multiple times #1083

Closed
guoshimin opened this Issue Sep 14, 2015 · 23 comments

guoshimin commented Sep 14, 2015

I built a version from master on 2015-09-09 (commit 9a70ee7). I use Consul service discovery to define targets. Over time, the same targets get added again and again: over the weekend, some targets ended up listed half a dozen times on the status page.

Member

fabxc commented Sep 14, 2015

If you hover over the "Base Labels" column in the status page, you are shown some labels.
Can you verify whether those are also the same for all those targets?

Also your configuration would be helpful, especially any relabeling.

Contributor

robbiet480 commented Sep 19, 2015

I'm also having the same problem now, but on 0.16.0rc1. Base Labels are the same for duplicated targets. My config is below:

global:
  scrape_interval:     15s  # default = every 15 seconds.
  evaluation_interval: 15s  # default = every 15 seconds.
  # scrape_timeout is set to the global default (10s).

rule_files:
  - "/etc/prometheus/rules/prometheus.rules"

scrape_configs:

  - job_name: 'prometheus'
    scrape_interval: 5s
    scrape_timeout: 10s
    target_groups:
      - targets: ['localhost:9090']

  - job_name: 'consul'
    target_groups:
      - targets: ['localhost:9107']

  - job_name: "overwritten-default"
    consul_sd_configs:
    - server:   '127.0.0.1:8500'
      datacenter: 'aws-us-east-1'
      services: ['api', 'data-processor', 'data-importer', 'app-proxy', 'memcached_exporter', 'node_exporter', 'jmx_exporter', 'cloudwatch']

    relabel_configs:
    - source_labels: ['__meta_consul_service']
      regex:         '(.*)'
      target_label:  'job'
      replacement:   '$1'
    - source_labels: ['__meta_consul_tags']
      regex:         '.*,jobname=(.*?),.*'
      target_label:  'job'
      replacement:   '$1'
    - source_labels: ['__meta_consul_node']
      regex:         '(.*)'
      target_label:  'instance'
      replacement:   '$1'
    - source_labels: ['__meta_consul_tags']
      regex:         '.*,Type=(.*?),.*'
      target_label:  'purpose'
      replacement:   '$1'
    - source_labels: ['__meta_consul_tags']
      regex:         '.*,Environment=(.*?),.*'
      target_label:  'environment'
      replacement:   '$1'
    - source_labels: ['__meta_consul_tags']
      regex:         '.*,instancetype=(.*?),.*'
      target_label:  'instancetype'
      replacement:   '$1'
    - source_labels: ['__meta_consul_tags']
      regex:         '.*,jmx_type=(.*?),.*'
      target_label:  'jmx_monitored_application'
      replacement:   '$1'
    - source_labels: ['__meta_consul_tags']
      regex:         '.*,jmx_type=(.*?),.*'
      target_label:  'job'
      replacement:   '$1'
    - source_labels: ['__meta_consul_dc']
      regex:         '(.*?)-(.*)'
      target_label:  'vendor'
      replacement:   '$1'
    - source_labels: ['__meta_consul_dc']
      regex:         '(.*?)-(.*)'
      target_label:  'region'
      replacement:   '$2'

    # metric_relabel_configs:
    # - source_labels: ['exported_job']
    #   regex:         '(.*?)'
    #   target_label:  'job'
    #   replacement:   '$1'
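
As a rough illustration of how the rules in this config interact, here is a minimal Go sketch (not Prometheus's actual relabeling code). It assumes that rules run in order, that a rule whose regex does not match leaves the labels alone, that the regex has to match the whole source value, and that `__meta_consul_tags` looks like `,jobname=billing,Type=worker,`. The `rule` type, the `apply` function, and the sample tag values are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// rule is a hypothetical, simplified stand-in for one relabel_configs entry.
type rule struct {
	source, regex, target, replacement string
}

// exampleRules mirrors a few of the rules from the config above.
var exampleRules = []rule{
	{"__meta_consul_service", `(.*)`, "job", "$1"},
	{"__meta_consul_tags", `.*,jobname=(.*?),.*`, "job", "$1"},
	{"__meta_consul_tags", `.*,Type=(.*?),.*`, "purpose", "$1"},
	{"__meta_consul_tags", `.*,jmx_type=(.*?),.*`, "job", "$1"},
}

// apply runs the rules in order on a copy of labels. A rule that does not
// match is skipped, so a later matching rule overwrites an earlier one.
func apply(labels map[string]string, rules []rule) map[string]string {
	out := make(map[string]string, len(labels))
	for k, v := range labels {
		out[k] = v
	}
	for _, r := range rules {
		// Assumption: the regex must match the entire source value.
		re := regexp.MustCompile("^(?:" + r.regex + ")$")
		val := out[r.source]
		m := re.FindStringSubmatchIndex(val)
		if m == nil {
			continue
		}
		out[r.target] = string(re.ExpandString(nil, r.replacement, val, m))
	}
	return out
}

func main() {
	in := map[string]string{
		"__meta_consul_service": "api",
		"__meta_consul_tags":    ",jobname=billing,Type=worker,", // assumed tag format
	}
	out := apply(in, exampleRules)
	fmt.Println(out["job"], out["purpose"]) // prints "billing worker"
}
```

Under these assumptions, a service tagged `jobname=billing` ends up with `job="billing"` rather than the Consul service name, because the later rule overwrites the earlier one; a service without that tag keeps the service name as its job.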

Contributor

robbiet480 commented Sep 19, 2015

FYI: reverting to Prometheus 0.15.1 fixes this problem and doesn't leave the duplicated targets on my status page.

Author

guoshimin commented Sep 19, 2015

Base labels are completely identical.

Config:

consul_sd_configs:
  - server: 'consul1.test.databricks.com:8500'
    services:
      - 'jenkins-slave'
relabel_configs:
  - source_labels: ['__meta_consul_address']
    target_label: '__address__'
    regex: '(.*)'
    replacement: '${1}:8080'
  - source_labels: ['__meta_consul_tags']
    target_label: 'slave_name'
    regex: '.*slave_name=([^,]*).*'
    replacement: '${1}'
  - source_labels: ['__meta_consul_tags']
    target_label: 'public_hostname'
    regex: '.*public_hostname=([^,]*).*'
    replacement: '${1}'


Member

fabxc commented Sep 19, 2015

Sorry, I should have been clearer. If you move your mouse over the base labels shown on the status page, a tooltip pops up that shows the label set before it is digested into the base labels. The question is whether the labels in those tooltips are also identical.

Contributor

robbiet480 commented Sep 19, 2015

@fabxc Yup, those are the labels I checked. Base labels are the same across duplicated targets. I checked multiple tags and targets to make sure.

Author

guoshimin commented Sep 19, 2015

Yep, same here. It would also be nice if the base labels could be displayed in a copy-and-pastable way.

Member

brian-brazil commented Sep 19, 2015

The current way they're displayed was a quick hack to aid debugging; ultimately there should be a page showing a breakdown of all the labels.

fabxc added this to the v0.16.0 milestone Sep 21, 2015

Contributor

robbiet480 commented Sep 21, 2015

I'm assuming that this was caused by one of these commits: 4e84b86#diff-49cc5c249318707068b4168c27ab0260, 0138d37#diff-49cc5c249318707068b4168c27ab0260

Will look into it more in the next few days and will report back with what I find out.

Member

fabxc commented Sep 21, 2015

Yes, those are the most likely candidates. Thanks.

Member

juliusv commented Sep 28, 2015

@robbiet480 @guoshimin there's a slight chance that this was coincidentally fixed by #1116, which has just been merged. Could you give master another try?

Author

guoshimin commented Sep 29, 2015

Built a binary at bf4e4a8, still having the same problem.

Member

juliusv commented Sep 30, 2015

Ok, thanks! We'll need to dig deeper into that then.

Member

juliusv commented Oct 6, 2015

@robbiet480 Any news on figuring out which of the two commits you mentioned creates this bug?

Contributor

robbiet480 commented Oct 6, 2015

@juliusv Haven't gotten to it yet. I wrote myself a note to test again later today and will report back.

Member

juliusv commented Oct 6, 2015

@robbiet480 Awesome - that's the last critical bug before we can release 0.16.0 :)

Member

juliusv commented Oct 8, 2015

I now managed to reproduce this locally against our own Consul setup (we don't usually use Consul's interface directly for SD). Digging deeper now.

Member

juliusv commented Oct 8, 2015

Can confirm that this regression was introduced by #970. Looking deeper...

Member

juliusv commented Oct 8, 2015

Found the bug, PR incoming.

Member

juliusv commented Oct 8, 2015

Fix PR is out here: #1151

Member

fabxc commented Oct 8, 2015

Thanks a lot!


Member

juliusv commented Oct 8, 2015

@fabxc It's the only thing blocking a 0.16.0 release :) Go go go :P

juliusv added a commit that referenced this issue Oct 9, 2015

Fix SD mechanism source prefix handling.
The prefixed target provider changed a pointerized target group that was
reused in the wrapped target provider, causing an ever-increasing chain
of source prefixes in target groups from the Consul target provider.

We now make this bug generally impossible by switching the target group
channel from pointer to value type and thus ensuring that target groups
are copied before being passed on to other parts of the system.

I tried to not let the depointerization leak too far outside of the
channel handling (both upstream and downstream) because I tried that
initially and caused some nasty bugs, which I want to minimize.

Fixes #1083
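
The failure mode described in this commit message can be reproduced in a few lines of Go. This is only a sketch, not the Prometheus code: `TargetGroup`, `refreshByPointer`, and `refreshByValue` are made-up names, and it assumes the consumer tracks groups by their `Source` string, so every new source value looks like a new group.

```go
package main

import "fmt"

// TargetGroup is a hypothetical, stripped-down target group; only the
// Source field matters for this sketch.
type TargetGroup struct {
	Source string
}

// refreshByPointer sends the same cached *TargetGroup n times. The consumer
// prepends a prefix in place, which also changes the producer's cached
// group, so the prefix chain grows with every refresh.
func refreshByPointer(n int, prefix string) map[string]bool {
	cached := &TargetGroup{Source: "jenkins-slave"}
	ch := make(chan *TargetGroup, n)
	for i := 0; i < n; i++ {
		ch <- cached
	}
	close(ch)

	known := map[string]bool{} // assumed: groups are tracked by source
	for tg := range ch {
		tg.Source = prefix + ":" + tg.Source // mutates the shared group
		known[tg.Source] = true
	}
	return known
}

// refreshByValue sends the group by value. Each send copies the struct, so
// the consumer's prefixing never reaches the producer's cached group.
func refreshByValue(n int, prefix string) map[string]bool {
	cached := TargetGroup{Source: "jenkins-slave"}
	ch := make(chan TargetGroup, n)
	for i := 0; i < n; i++ {
		ch <- cached
	}
	close(ch)

	known := map[string]bool{}
	for tg := range ch {
		tg.Source = prefix + ":" + tg.Source // changes only the local copy
		known[tg.Source] = true
	}
	return known
}

func main() {
	// Three refreshes: the pointer variant yields three distinct sources,
	// the value variant yields one.
	fmt.Println(len(refreshByPointer(3, "consul")), len(refreshByValue(3, "consul"))) // prints "3 1"
}
```

In the pointer variant the third refresh arrives with the source `consul:consul:consul:jenkins-slave`, which would match the report of the same target being listed more often the longer the server runs.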

juliusv closed this in #1151 Oct 9, 2015

juliusv added a commit that referenced this issue Oct 16, 2015


fabxc added a commit that referenced this issue Jan 11, 2016


lock bot commented Mar 24, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Mar 24, 2019
