
Scraping both file_sd_configs and dns_sd_configs scrapes only dns #3365

Closed
theonlydoo opened this Issue Oct 27, 2017 · 10 comments

theonlydoo commented Oct 27, 2017

What did you do?

- job_name: 'foobar'
  scrape_interval: 30s
  file_sd_configs:
    - files:
      -  /etc/prometheus/config/_p_foobar_s__foobar/_p_foobar_s__foobar.json

- job_name: 'servdisc'
  scrape_interval: 30s
  dns_sd_configs:
    - names:
      - _supporthttp._tcp.barfoo-foobar.tld.

What did you expect to see?

Scraping on both targets

What did you see instead? Under which circumstances?

only scraping on dns_sd_configs

Environment

  • System information:

    Linux 4.9.0-3-amd64 x86_64 / debian stretch

  • Prometheus version:

$ prometheus --version
prometheus, version 1.8.0 (branch: HEAD, revision: 3569eef8b1bc062bb5df43181b938277818f365b)
  build user:       root@bd4857492255
  build date:       20171006-22:12:46
  go version:       go1.9.1
  • Prometheus configuration file:
scrape_configs:

  - job_name: 'foobar'
    scrape_interval: 30s
    file_sd_configs:
      - files:
        -  /etc/prometheus/config/_p_foobar_s__foobar/_p_foobar_s__foobar.json

  - job_name: 'servdisc'
    scrape_interval: 30s
    dns_sd_configs:
      - names:
        - _supporthttp._tcp.barfoo-foobar.tld.

the content of the json is :

[{
    "labels": {
        "env": "prod",
        "group": "foo_bar_service",
        "service_type": "webservice"
    },
    "target": [
        "foo01.tld:7646",
        "foo02.tld:7646",
        "foo03.tld:7646",
        "foo04.tld:7646"
    ]
}]

If I completely disable dns_sd_configs, the file_sd job is scraped again.

cstyan commented Dec 5, 2017

@theonlydoo I tried this out on 2.0.0 and wasn't able to reproduce the issue you're seeing. Can you try upgrading and confirm/deny that you still see the issue? My DNS setup wasn't entirely working (connection refused to the DNS-resolved address), but it's finding the target, as you can see below.

Targets: [screenshot]

Config: [screenshot]

krasi-georgiev commented Dec 13, 2017

@theonlydoo
I did a big refactoring of the service discovery code, if you want to give it a try and report whether the bug is still there.

Here is a link to download an executable for Linux 64bit
https://github.com/krasi-georgiev/prometheus/releases/download/v2.0.0-beta.x/prometheus

theonlydoo commented Dec 19, 2017

@krasi-georgiev thanks for the link, I've tried it out:

not doing so well: [screenshot]

vs the same config on a stable release (2.0, built on 08/11/2017): [screenshot]

and still no file_sd scraping; for reference, here is a sanitized config file:

# my global config
global:
  scrape_interval:     15s # By default, scrape targets every 15 seconds.
  evaluation_interval: 15s
  scrape_timeout:       15s
  # scrape_timeout is set to the global default (10s).

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
      monitor: 'foobar'

# Load and evaluate rules in this file every 'evaluation_interval' seconds.
rule_files:

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ['127.0.0.1:9090']


### Infrastructure autoconf ###

scrape_configs:



  - job_name: 'foobarbaz'
    scrape_interval: 30s
    file_sd_configs:
      - files:
        -  /etc/prometheus/config/_p_foobarbaz_s__foobarstats/_p_foobarbaz_s__foobarstats.json


  - job_name: 'servdisc'
    scrape_interval: 30s
    dns_sd_configs:
      - names:
        - _supporthttp._tcp.foobarbaz.sd.

as you can see, I've got both dns_sd and file_sd but only dns_sd is handled in both cases, even though promtool fully validates this config

krasi-georgiev commented Dec 20, 2017

@theonlydoo is this a copy/paste error, or do you really have 2 x scrape_configs: blocks?
I think this is what is causing your issue.

cstyan commented Dec 20, 2017

@theonlydoo can you try some things?

  • removing the `.` at the end of your DNS SRV name
  • checking file permissions for your file_sd files: does the user running Prometheus have read access to them?

here's my example config:

scrape_configs:
- job_name: "overwritten-default"
  scrape_interval: 10s

  file_sd_configs:
  - files:
    - filesd1.json
  dns_sd_configs:
  - names:
    - _http._tcp.appServer

which works even if I move one of the SD methods into a separate job

theonlydoo commented Dec 21, 2017

@krasi-georgiev I have 2 scrape configs, but the first one is not read by prom. I've removed it, and still no file_sd_configs enabled

@cstyan I've tried removing the `.` at the end of my internal TLD in the config: still the same.

File permissions are OK. If I disable the dns_sd_config, I do not see any file_sd with this config. This is weird, since I've tried this simple config:

scrape_configs:
  - job_name: 'node_exporter'
    scrape_interval: 30s
    file_sd_configs:
      - files:
        -  /etc/prometheus/config/_p_node_exporter_s__cassandra_log/_p_node_exporter_s__cassandra_log.json



  - job_name: 'servdisc'
    scrape_interval: 30s
    dns_sd_configs:
      - names:
        - _supporthttp._tcp.foobar.sd.

it starts and validates, but I have no file_sd job

If I move

/etc/prometheus/config/_p_node_exporter_s__cassandra_log/_p_node_exporter_s__cassandra_log.json

it doesn't raise an error (so the file is not watched, as I do not see it when I run `lsof -p $(pidof prometheus)`),

and if I append a new file_sd file that doesn't exist at all, promtool refuses to validate the configuration and prometheus refuses to start.

theonlydoo commented Dec 21, 2017

OK, I've got it... This was a HUGE PEBKAC: apparently no error is raised when the JSON config file does not match the format Prometheus expects, so it is silently ignored. I found the error while doing a diff between my previous config management branch and this one.

On one hand, you had :

[{
    "labels": {
        "env": "prod", 
        "group": "foo_log", 
        "hosting": "company"
    }, 
    "targets": [
        "bar-3.foo:19100", 
        "bar-1.foo:19100", 
        "bar-2.foo:19100"
    ]
}]

on the other :

[{
    "labels": {
        "env": "prod", 
        "group": "foo_log", 
        "hosting": "company"
    }, 
    "target": [
        "bar-3.foo:19100", 
        "bar-1.foo:19100", 
        "bar-2.foo:19100"
    ]
}]

so the typo between `target` and `targets` raised no error: the JSON was still syntactically valid, it just didn't match the expected schema.

So the problem was not at all in the file_sd or dns_sd, but in the json parsing!

Thank you all for the debug effort 😃
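For what it's worth, this silent failure is easy to reproduce outside Prometheus: Go's default JSON decoder drops unknown keys, so a misspelled `target` unmarshals cleanly into an empty target list. A minimal sketch — the `TargetGroup` struct here is just an illustration of the file_sd shape, not Prometheus's actual type:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TargetGroup mirrors the schema file_sd expects: a "targets" list plus labels.
type TargetGroup struct {
	Targets []string          `json:"targets"`
	Labels  map[string]string `json:"labels"`
}

// parseFileSD unmarshals file_sd JSON the way Go's default decoder does:
// unknown keys, such as a misspelled "target", are silently dropped.
func parseFileSD(data []byte) ([]TargetGroup, error) {
	var groups []TargetGroup
	err := json.Unmarshal(data, &groups)
	return groups, err
}

func main() {
	buggy := []byte(`[{"labels":{"env":"prod"},"target":["foo01.tld:7646"]}]`)
	groups, err := parseFileSD(buggy)
	// err is nil and Targets is empty: the typo is swallowed without a trace.
	fmt.Println(err, len(groups), len(groups[0].Targets))
}
```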

brian-brazil commented Dec 21, 2017

That sounds like a bug on our side; we should be verifying that the files parse and not doing an update if they don't, same as if EC2 started returning errors halfway through a poll.

krasi-georgiev commented Dec 21, 2017

strict JSON unmarshaling will be added in golang 1.10
golang/go#15314

in the meantime we can use the function from this PR to return an error if the parsed file has some unknown fields
sloppyio/cli#1

I will open a PR.
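As a sketch of what strict decoding looks like: `json.Decoder.DisallowUnknownFields` (the Go 1.10 feature referenced above) makes the misspelled key fail loudly instead of being dropped. The `TargetGroup` struct is again only an illustration of the file_sd shape:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// TargetGroup mirrors the schema file_sd expects: a "targets" list plus labels.
type TargetGroup struct {
	Targets []string          `json:"targets"`
	Labels  map[string]string `json:"labels"`
}

// parseStrict decodes with DisallowUnknownFields, so a misspelled key
// like "target" produces an error rather than being silently ignored.
func parseStrict(data []byte) ([]TargetGroup, error) {
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields()
	var groups []TargetGroup
	err := dec.Decode(&groups)
	return groups, err
}

func main() {
	buggy := []byte(`[{"labels":{"env":"prod"},"target":["foo01.tld:7646"]}]`)
	_, err := parseStrict(buggy)
	fmt.Println(err) // reports an unknown field "target" error
}
```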

lock bot commented Mar 22, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Mar 22, 2019
