Posting alert to alertmanager does not work with DNS wildcard record #611

Closed
quolix opened this Issue Mar 23, 2015 · 8 comments

quolix commented Mar 23, 2015

When trying to post an alert, Prometheus outputs:

Error sending notification: Post http://_prom-alertmanager._tcp.marathon.mesos.redir.corp.quobyte.com/api/alerts: dial tcp: lookup _prom-alertmanager._tcp.marathon.mesos.redir.corp.quobyte.com: no such host

*.redir.corp.quobyte.com is a wildcard DNS record pointing to an HTTP redirector, which would send a 303 to Prometheus's HTTP client.

Prometheus is invoked with:

  -alertmanager.url="http://_prom-alertmanager._tcp.marathon.mesos.redir.corp.quobyte.com" \

dig, however, resolves the name:

$ dig _prom-alertmanager._tcp.marathon.mesos.redir.corp.quobyte.com

; <<>> DiG 9.9.4-RedHat-9.9.4-14.el7_0.1 <<>> _prom-alertmanager._tcp.marathon.mesos.redir.corp.quobyte.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 23640
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;_prom-alertmanager._tcp.marathon.mesos.redir.corp.quobyte.com. IN A

;; ANSWER SECTION:
_prom-alertmanager._tcp.marathon.mesos.redir.corp.quobyte.com. 0 IN A 192.168.1.250

;; Query time: 1 msec
;; SERVER: 192.168.1.32#53(192.168.1.32)
;; WHEN: Mo Mär 23 17:51:45 CET 2015
;; MSG SIZE  rcvd: 95
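A related detail about the redirector mentioned above, as a hedged sketch: this self-contained program uses only Go's standard library and is not Prometheus code. The /redirect and /target paths and the postThroughRedirect helper are made up for the demo. It shows how Go's http.Client handles a 303 See Other in response to a POST: it follows the redirect automatically but re-sends the request as a GET without a body, so a POSTed alert payload would not reach the redirect target as a POST. This only becomes relevant once name resolution works; it is not the cause of the lookup error above.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// postThroughRedirect starts a local stand-in for the redirector, POSTs a body
// to it and reports the method and body that reached the redirect target.
func postThroughRedirect() (method, body string, err error) {
	mux := http.NewServeMux()
	// Hypothetical redirector endpoint: answer with 303 See Other.
	mux.HandleFunc("/redirect", func(w http.ResponseWriter, r *http.Request) {
		http.Redirect(w, r, "/target", http.StatusSeeOther)
	})
	// The target echoes back what it received, as "METHOD|BODY".
	mux.HandleFunc("/target", func(w http.ResponseWriter, r *http.Request) {
		b, _ := io.ReadAll(r.Body)
		fmt.Fprintf(w, "%s|%s", r.Method, b)
	})
	srv := httptest.NewServer(mux)
	defer srv.Close()

	// http.Post uses http.DefaultClient, which follows redirects automatically.
	resp, err := http.Post(srv.URL+"/redirect", "application/json", strings.NewReader(`[{"alert":"demo"}]`))
	if err != nil {
		return "", "", err
	}
	defer resp.Body.Close()
	out, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", "", err
	}
	method, body, _ = strings.Cut(string(out), "|")
	return method, body, nil
}

func main() {
	method, body, err := postThroughRedirect()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("method=%s body=%q\n", method, body) // prints: method=GET body=""
}
```

If the POST semantics must survive a redirect, the redirector would need to answer with 307 or 308 instead, which Go's client follows while keeping the method and body.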

juliusv commented Mar 23, 2015

Strange, I couldn't reproduce it yet with this little snippet (which is basically what Prometheus does internally):

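package main

import (
  "fmt"
  "time"

  "github.com/prometheus/prometheus/utility"
)

func main() {
  // HTTP client with a 10s deadline from Prometheus's utility package.
  c := utility.NewDeadlineClient(10 * time.Second)
  resp, err := c.Post("http://foobar.mindbasket.com/", "text/plain", nil)
  if err != nil {
    // resp is nil when err is set, so report the error instead of resp.Status.
    fmt.Println("error:", err)
    return
  }
  defer resp.Body.Close()
  fmt.Println(resp.Status)
}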

That seems to work for *.mindbasket.com. Not exactly sure what to try next. Maybe I could come and work from your office for a day :)

quolix commented Mar 23, 2015

Update: this is a plain CentOS 7 machine. wget and curl have the same problem, whereas dig and nslookup happily return the record.

juliusv commented Mar 23, 2015

What does your /etc/resolv.conf look like? Also, tcpdumping the difference between dig and curl could be interesting.

juliusv commented Mar 23, 2015

Also, /etc/nsswitch.conf...

quolix commented Mar 23, 2015

tcpdump was a good hint. It seems to be an IPv6 problem on Linux. None of the suggestions I found online help, on either Ubuntu or CentOS 7.

This works (dig):
21:52:44.321490 IP (tos 0x0, ttl 64, id 21192, offset 0, flags [none], proto UDP (17), length 118)
    192.168.1.250.46644 > 192.168.1.32.53: [bad udp cksum 0x84de -> 0xed24!] 29300+ [1au] A? _prom-alertmanager._tcp.marathon.mesos.redir.corp.quobyte.com. ar: . OPT UDPsize=4096 (90)
21:52:44.322057 IP (tos 0x0, ttl 64, id 55235, offset 0, flags [DF], proto UDP (17), length 123)
    192.168.1.32.53 > 192.168.1.250.46644: [udp sum ok] 29300* q: A? _prom-alertmanager._tcp.marathon.mesos.redir.corp.quobyte.com. 1/0/0 _prom-alertmanager._tcp.marathon.mesos.redir.corp.quobyte.com. [0s] A 192.168.1.250 (95)

This doesn't (curl, wget):

21:38:54.928992 IP (tos 0x0, ttl 64, id 43913, offset 0, flags [DF], proto UDP (17), length 107)
    192.168.1.250.53247 > 192.168.1.32.53: [bad udp cksum 0x84d3 -> 0xa81f!] 44558+ A? _prom-alertmanager._tcp.marathon.mesos.redir.corp.quobyte.com. (79)
21:38:54.929359 IP (tos 0x0, ttl 64, id 55073, offset 0, flags [DF], proto UDP (17), length 123)
    192.168.1.32.53 > 192.168.1.250.53247: [udp sum ok] 44558* q: A? _prom-alertmanager._tcp.marathon.mesos.redir.corp.quobyte.com. 1/0/0 _prom-alertmanager._tcp.marathon.mesos.redir.corp.quobyte.com. [0s] A 192.168.1.250 (95)
21:38:54.929480 IP (tos 0x0, ttl 64, id 43914, offset 0, flags [DF], proto UDP (17), length 107)
    192.168.1.250.53247 > 192.168.1.32.53: [bad udp cksum 0x84d3 -> 0x8858!] 45781+ AAAA? _prom-alertmanager._tcp.marathon.mesos.redir.corp.quobyte.com. (79)
21:38:54.929875 IP (tos 0x0, ttl 64, id 55074, offset 0, flags [DF], proto UDP (17), length 107)
    192.168.1.32.53 > 192.168.1.250.53247: [udp sum ok] 45781 q: AAAA? _prom-alertmanager._tcp.marathon.mesos.redir.corp.quobyte.com. 0/0/0 (79)

juliusv commented Mar 23, 2015

OK, so it looks like the failing tools also send an IPv6 AAAA query after the A query, then see the empty AAAA reply and conclude that the record doesn't exist? The DNS server seems to be doing the right thing, though, judging by https://www.ietf.org/rfc/rfc4074.txt, Section 3.
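To make the hypothesis in this comment concrete, here is a simplified sketch of how a resolver that sends separate A and AAAA queries could merge the two replies. It is not how glibc, curl or Prometheus implement it; the dnsReply type, combine function and error values are hypothetical. The only standard facts used are the DNS response codes 0 (NOERROR) and 3 (NXDOMAIN). The point: an empty NOERROR reply (NODATA), like the AAAA answer in the capture above, only means "no records of this type" and should not hide the address returned by the A query.

```go
package main

import (
	"errors"
	"fmt"
)

// Standard DNS RCODE values (RFC 1035).
const (
	rcodeNoError  = 0
	rcodeNXDomain = 3
)

// dnsReply is a hypothetical, simplified view of one DNS response.
type dnsReply struct {
	Rcode   int
	Answers []string // addresses from the answer section
}

var (
	errNoSuchHost = errors.New("no such host")        // every query returned NXDOMAIN
	errNoAddress  = errors.New("no address for host") // no usable A/AAAA answers (NODATA or other failures)
)

// combine merges the A and AAAA replies for one name. An empty NOERROR reply
// contributes no addresses but must not mask addresses from the other query.
func combine(a, aaaa dnsReply) ([]string, error) {
	var addrs []string
	nxdomain := true
	for _, r := range []dnsReply{a, aaaa} {
		if r.Rcode != rcodeNXDomain {
			nxdomain = false
		}
		if r.Rcode == rcodeNoError {
			addrs = append(addrs, r.Answers...)
		}
	}
	switch {
	case len(addrs) > 0:
		return addrs, nil
	case nxdomain:
		return nil, errNoSuchHost
	default:
		return nil, errNoAddress
	}
}

func main() {
	// Mirrors the capture above: the A query returns one address, the AAAA
	// query returns NOERROR with an empty answer section (NODATA).
	a := dnsReply{Rcode: rcodeNoError, Answers: []string{"192.168.1.250"}}
	aaaa := dnsReply{Rcode: rcodeNoError}
	addrs, err := combine(a, aaaa)
	fmt.Println(addrs, err) // [192.168.1.250] <nil>

	// Only NXDOMAIN means the name does not exist at all.
	_, err = combine(dnsReply{Rcode: rcodeNXDomain}, dnsReply{Rcode: rcodeNXDomain})
	fmt.Println(err) // no such host
}
```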

Does curl -4 work?

You could try disabling IPv6 on the machine, but I don't know if that's still a reasonable option nowadays... not sure about the proper fix yet.

quolix commented May 4, 2015

curl -4 indeed exhibits the same problem.

I've now changed my setup so that hostnames don't contain _, and the DNS lookups work.
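Given that resolution, a pre-flight check on the configured URL could catch such names early. This is a hedged sketch, not part of Prometheus: validHostname is a hypothetical helper implementing the RFC 952/RFC 1123 host name syntax (dot-separated labels of 1 to 63 letters, digits or hyphens, not starting or ending with a hyphen). Under that syntax a label such as _prom-alertmanager or _tcp is not a valid host name, even though DNS can store it and dig resolves it; the _name._tcp form is the SRV-record naming convention of RFC 2782. How strictly a given resolver enforces this varies; the thread only establishes that removing _ fixed the lookups here. The underscore-free URL in main is likewise hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// validHostname reports whether name follows the RFC 952/1123 host name
// syntax: dot-separated labels of 1-63 letters, digits or hyphens that do
// not start or end with a hyphen, at most 253 characters overall.
func validHostname(name string) bool {
	name = strings.TrimSuffix(name, ".") // allow a trailing dot (FQDN)
	if name == "" || len(name) > 253 {
		return false
	}
	for _, label := range strings.Split(name, ".") {
		if len(label) == 0 || len(label) > 63 {
			return false
		}
		if label[0] == '-' || label[len(label)-1] == '-' {
			return false
		}
		for i := 0; i < len(label); i++ {
			c := label[i]
			isAlnum := c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z' || c >= '0' && c <= '9'
			if !isAlnum && c != '-' {
				return false // rejects '_' among other characters
			}
		}
	}
	return true
}

func main() {
	for _, raw := range []string{
		"http://_prom-alertmanager._tcp.marathon.mesos.redir.corp.quobyte.com",
		"http://prom-alertmanager.marathon.mesos.redir.corp.quobyte.com", // hypothetical underscore-free name
	} {
		u, err := url.Parse(raw)
		if err != nil {
			fmt.Println(raw, "parse error:", err)
			continue
		}
		fmt.Println(u.Hostname(), validHostname(u.Hostname()))
	}
}
```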

quolix closed this May 4, 2015

lock bot commented Mar 24, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Mar 24, 2019