Prometheus not scraping all subdomains correctly? #4646

Closed
augmenter opened this Issue Sep 21, 2018 · 14 comments

augmenter commented Sep 21, 2018

Prometheus 2.3.2 (I also tried 2.4.2, which did not scrape any of the targets below).

I have 3 identical jobs, each connecting to a different instance of the same working product.
See the configuration below for the job settings.

  • Job1 works perfectly.
  • Job2 shows up as 'UP' in the Targets UI, but no data shows in the Graph UI.
  • Job3 serves the scrape output of Job2, downloaded by hand and saved to a local server as a .txt file; it works perfectly and the data gets scraped.

Job1 and Job2 both return the same headers on request, and both are HTTPS with a valid certificate. This can't be a format issue in the metrics, as Job3 is a verbatim text copy of Job2's output, and it works.

Prometheus seems not to scrape subdomains unless they are www.?
That is the only conclusion I have been able to reach.

When I change Job1 from the www. subdomain to a bo. subdomain, it stops scraping as well.

Proposal

This should just work.

  • System information:

    Windows 10

  • Prometheus version:

    prometheus, version 2.3.2 (branch: HEAD, revision: 71af5e2)
    build user: root@5258e0bd9cc1
    build date: 20180712-14:13:08
    go version: go1.10.3

  • Prometheus configuration file:

```yaml
  - job_name: 'job1'
    scheme: https
    static_configs:
      - targets: ['www.example.com']
    metrics_path: '/prom/?txt'
  - job_name: 'job2'
    scheme: https
    static_configs:
      - targets: ['api.example2.com']
    metrics_path: '/prom/?txt'
  - job_name: 'job3'
    scheme: http
    static_configs:
      - targets: ['test.ddev']
    metrics_path: '/ca_prom_tmp.txt'
```
simonpasquier commented Sep 21, 2018

Prometheus doesn't do anything fancy with respect to DNS lookups. You can have a look at the targets page in the UI to get more information, and also try running the server with --log.level=debug.

augmenter commented Sep 21, 2018

@simonpasquier: Log file
```
level=info ts=2018-09-21T13:17:05.72614Z caller=main.go:543 msg="TSDB started"
level=info ts=2018-09-21T13:17:05.72614Z caller=main.go:603 msg="Loading configuration file" filename=prometheus.yml
level=info ts=2018-09-21T13:17:05.7291404Z caller=main.go:629 msg="Completed loading of configuration file" filename=prometheus.yml
level=info ts=2018-09-21T13:17:05.7291404Z caller=main.go:502 msg="Server is ready to receive web requests."
```

This is all I see with --log.level=debug

The UI shows the target as UP. Last scrape: 6.498s ago.

simonpasquier commented Sep 21, 2018

What does `{job="job2"}` return?

augmenter commented Sep 21, 2018

```
scrape_duration_seconds{instance="api.example2.com:443",job="job2"} 0.1509415
scrape_samples_post_metric_relabeling{instance="in.game-program.com:443",job="job2"} 0
scrape_samples_scraped{instance="api.example2.com:443",job="job2"} 0
up{instance="api.example2.com:443",job="job2"} 1
```

simonpasquier commented Sep 21, 2018

Hmm, why has the instance label changed for `scrape_samples_post_metric_relabeling`? Have you pasted the full scrape configuration?

augmenter commented Sep 21, 2018

Yes, this is it:

```yaml
global:
  scrape_interval: 15s
  scrape_timeout: 10s
  evaluation_interval: 15s
alerting:
  alertmanagers:
  - static_configs:
    - targets: []
    scheme: http
    timeout: 10s
scrape_configs:
- job_name: job2
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /prom/?txt
  scheme: https
  static_configs:
  - targets:
    - api.example.com
```
simonpasquier commented Sep 21, 2018

Really, Prometheus doesn't have special requirements on DNS names. If `scrape_samples_scraped{instance="api.example2.com:443",job="job2"}` is 0 and `up` is 1 for the same job/instance, it means that the endpoint didn't expose any metrics.
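To illustrate what `scrape_samples_scraped` counts, here is a rough stdlib-only sketch (not Prometheus's actual parser) that tallies sample lines in a text-format scrape body — a target with `up` 1 but zero samples succeeded at the HTTP level yet exposed nothing parseable:

```python
# Rough sketch (NOT Prometheus's real parser): count sample lines in a
# text-exposition-format scrape body. Blank lines and comment/metadata
# lines (# HELP, # TYPE) are not samples.
def count_samples(body: str) -> int:
    count = 0
    for line in body.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            count += 1
    return count
```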

augmenter commented Sep 21, 2018

But if I point my browser at the same URL, or curl the URL, I can download the scrape contents fine. I put them in Job3 and they work. Can I somehow enable more logging by editing the source code and trying that?

simonpasquier commented Sep 25, 2018

Yes, additional logs are your best bet. You can look here:

`scrapeErr := sl.scraper.scrape(scrapeCtx, buf)`

augmenter commented Oct 3, 2018

I don't know how to build Prometheus with Go. However, I tried promtool to check the metrics, and it gives the following output:

`error while linting: text format parsing error in line 1955: invalid metric name`

Unfortunately that points at the last line of the metrics (followed by 4 newlines, 0x0D00 0x0A00):

```
telemetry_kafka_total{group="raw_bets",server="ca-v2"} 171
telemetry_kafka_total{group="raw_player",server="ca-v2"} 108
```

However, the same test fails on a working endpoint that is being scraped fine, so maybe it's a promtool bug...

simonpasquier commented Oct 3, 2018

The 0x0D character is the culprit. The Prometheus text format only supports line feeds, see https://prometheus.io/docs/instrumenting/exposition_formats/#text-format-details
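A quick way to spot this kind of problem is to scan the raw payload bytes. The sketch below is illustrative (not part of any Prometheus tooling); it flags stray carriage returns, and also NUL bytes, since the "0x0D00 0x0A00" pairs reported above look like UTF-16-encoded CR LF rather than the UTF-8 the format requires:

```python
# Sketch: flag byte sequences that break the Prometheus text exposition
# format, which requires UTF-8 with LF-only line endings.
def find_format_problems(payload: bytes) -> list[str]:
    problems = []
    if b"\x00" in payload:
        problems.append("NUL bytes present (payload may be UTF-16, not UTF-8)")
    if b"\r" in payload:
        problems.append("carriage returns present (LF-only line endings required)")
    if payload and not payload.endswith(b"\n"):
        problems.append("missing final line feed")
    return problems
```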

augmenter commented Oct 3, 2018

Ok. Until I removed the trailing \r I was receiving errors like "error while linting: text format parsing error in line 1731: invalid metric name" and "api_response_average counter metrics should have "_total" suffix". After I fixed those, promtool stopped returning errors. However, scraping still does not work. When I curl the metric output to a txt file, the contents are visible:

```
$ curl -s https://example.com/prom?txt > metric.txt
$ curl -s https://example.com/prom?txt | ./promtool.exe check metrics
```

The odd thing about all this is that the working domain had the same wrong line endings and some wrong types, and it could still be scraped.
The working endpoint's metrics are 170k, the non-working one's 100k, so file size can't be the issue. The working domain has gzip enabled while the non-working one does not, but the docs say that's optional.
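The trailing-\r cleanup described above can be sketched as a small exporter-side helper (hypothetical, not part of Prometheus or promtool):

```python
# Hypothetical helper: normalize a metrics payload for the Prometheus
# text format -- convert CRLF/CR line endings to LF and ensure the body
# ends with exactly one final line feed.
def normalize_metrics(payload: bytes) -> bytes:
    fixed = payload.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return fixed.rstrip(b"\n") + b"\n"
```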

simonpasquier commented Oct 4, 2018

Very weird... Would you be able to run the same configuration on a Linux box, so that we can tell whether it is a platform issue or a more general one?

augmenter commented Oct 4, 2018

Tried running it in Docker for Mac, same result. Retried on Windows with 2.4.2 after fixing the scrape errors reported by promtool; still nothing is scraped. Running promtool against both 2.3.2 and 2.4.2 gave no errors.

@augmenter augmenter closed this Oct 8, 2018

@lock lock bot locked and limited conversation to collaborators Apr 6, 2019
