Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

check_dig: -T soa broken as of version 2.4.3 #738

Open
maggu opened this issue Oct 30, 2023 · 12 comments

@maggu

maggu commented Oct 30, 2023

It appears that check_dig now requires an exact match (PR #652); anything else results in a "WARNING".

This completely breaks monitoring of SOA records. Unless I'm mistaken, there's no way to use "-T soa" with an expected answer (-a) without getting a "WARNING", unless the full record, including the SOA serial number, is given. That is obviously not feasible, since the serial changes whenever the zone is updated.

% /usr/lib64/nagios/plugins/check_dig -H 8.8.8.8 -T soa -l se. -a "catcher-in-the-rye.nic.se. registry-default.nic.se. 2023102813 1800 1800 864000 7200"
DNS OK - 0,022 seconds response time (se.   10011 IN SOA catcher-in-the-rye.nic.se. registry-default.nic.se. 2023102813 1800 1800 864000 7200)|time=0,022517s;;;0,000000
% /usr/lib64/nagios/plugins/check_dig -H 8.8.8.8 -T soa -l se. -a catcher-in-the-rye.nic.se.
DNS WARNING - 0,022 seconds response time (Server not found in ANSWER SECTION)|time=0,022317s;;;0,000000

This worked fine before 2.4.3. I would appreciate having an option for partial matching available again.
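To make the difference concrete, here is a minimal POSIX shell sketch (a hypothetical illustration, not check_dig's actual C code) that compares an expected string against the data portion of the SOA answer line quoted above, once as an exact match and once as a substring match. The helper names `rdata`, `exact_match` and `substring_match` are invented for this example, and it assumes the answer line has the usual NAME TTL CLASS TYPE DATA layout.

```shell
# Hypothetical illustration only, NOT check_dig's implementation.

# rdata LINE: print every field after NAME TTL CLASS TYPE, joined by single spaces.
rdata() {
    printf '%s\n' "$1" |
        awk '{ out = $5; for (i = 6; i <= NF; i++) out = out " " $i; print out }'
}

# Exact comparison: the expected string must equal the whole record data.
exact_match() {
    [ "$(rdata "$1")" = "$2" ]
}

# Substring comparison: the expected string may appear anywhere in the data.
substring_match() {
    case "$(rdata "$1")" in
        *"$2"*) return 0 ;;
        *) return 1 ;;
    esac
}

answer='se. 10011 IN SOA catcher-in-the-rye.nic.se. registry-default.nic.se. 2023102813 1800 1800 864000 7200'

if exact_match "$answer" 'catcher-in-the-rye.nic.se.'; then
    echo "exact: OK"
else
    echo "exact: WARNING"        # printed: the serial and timers are missing
fi

if substring_match "$answer" 'catcher-in-the-rye.nic.se.'; then
    echo "substring: OK"         # printed: the primary name occurs in the data
else
    echo "substring: WARNING"
fi
```

Under exact matching, only the complete data string (serial included) would pass, which is the practical problem described above.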

sawolf added the Bug label on Nov 10, 2023
@kraken-jim
Contributor

kraken-jim commented Nov 13, 2023

Hi, Maggu. IMO, the problem with the old behaviour shows up when, for example, one is testing a DNS record for a specific required value:

# host b.resolvers.level3.net
b.resolvers.level3.net has address 4.2.2.2

If partial matches are accepted, it's impossible for Nagios to check that the A record is correct, because a match string of "4.2.2.2" would also match 4.2.2.200, 204.2.2.29, or many other addresses.
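A quick POSIX shell sketch of that false-positive problem, using the addresses from the example above. `substring_match` is a hypothetical helper, not part of check_dig; it simply asks whether the expected string occurs anywhere in the returned value.

```shell
# Hypothetical helper: succeeds if $2 occurs anywhere inside $1.
substring_match() {
    case "$1" in
        *"$2"*) return 0 ;;
        *) return 1 ;;
    esac
}

expected='4.2.2.2'
for addr in 4.2.2.2 4.2.2.200 204.2.2.29; do
    if substring_match "$addr" "$expected"; then
        echo "$addr: matches $expected"
    else
        echo "$addr: no match"
    fi
done
# All three addresses report a match, although only the first one is correct.
```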

What specific DNS functionality are you hoping to monitor or verify with your SOA request? I wonder whether a less verbose resource record, one that is also less likely to change frequently, might be a better surrogate.

If all you want to check is that an SOA record exists, your example works for me if I omit the trailing dot on the domain (and the -a argument):

/usr/local/libexec/nagios/check_dig -H 8.8.8.8 -T soa -l se
DNS OK - 0.019 seconds response time (se. 30 IN SOA catcher-in-the-rye.nic.se. registry-default.nic.se. 2023111321 1800 1800 864000 7200)|time=0.019092s;;;0.000000

@maggu
Author

maggu commented Nov 14, 2023

Hello!

Yes, I completely understand why it was changed and what the change was meant to solve. I'm trying to point out that the change also breaks other things.

One thing we monitor is whether the domain delegation is correct. If you have several domains, not all of them may be actively used, so a change might otherwise go unnoticed, for example if the registration expires or the domain is somehow hijacked. In a large organization, at least the first case tends to happen occasionally because of human error.

I don't believe we're alone in this, and I currently don't see a good alternative. Just verifying that an SOA exists is not enough. NS records are hard to monitor, since there are always several of them and you can't know which one you will get back in the reply.

Possible solutions might be to introduce a new flag that modifies the behaviour, or to stop comparing when whitespace is encountered. I guess the latter should be enough to solve the A/AAAA issue.
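The second proposal, stopping the comparison at whitespace, could look roughly like the POSIX shell sketch below. This is one hypothetical reading of the idea, not an existing check_dig option: the expected string must either equal the record data exactly or equal its beginning up to a space. That way `4.2.2.2` no longer matches `4.2.2.200`, while the SOA primary server name alone still matches. It assumes the record data has already been normalized to single-space-separated fields; `boundary_match` is an invented name.

```shell
# Hypothetical sketch of "stop comparing at whitespace"; not check_dig code.
# boundary_match DATA EXPECTED
#   succeeds if EXPECTED equals DATA, or equals DATA up to a field boundary
#   (i.e. DATA continues with a single space right after EXPECTED).
boundary_match() {
    data=$1 expected=$2
    [ "$data" = "$expected" ] && return 0
    case "$data" in
        "$expected "*) return 0 ;;   # quoted part is matched literally
        *) return 1 ;;
    esac
}

soa='catcher-in-the-rye.nic.se. registry-default.nic.se. 2023102813 1800 1800 864000 7200'

boundary_match "$soa" 'catcher-in-the-rye.nic.se.' && echo "SOA primary name: OK"
boundary_match '4.2.2.200' '4.2.2.2' || echo "4.2.2.200 vs 4.2.2.2: no match"
boundary_match '4.2.2.2' '4.2.2.2' && echo "4.2.2.2 vs 4.2.2.2: OK"
```

A match that stops mid-field (for example `catcher-in-the-rye` without the trailing dot) is still rejected, which keeps the A/AAAA case strict.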

@maggu
Author

maggu commented Nov 14, 2023

Hmm. Actually, monitoring NS records seems to work. check_dig handles it better than I thought.
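One way a checker can cope with several NS records returned in an unpredictable order is to scan every answer line and succeed if any one of them carries the expected name. The POSIX shell sketch below illustrates that idea; it is not taken from check_dig, the helper name `any_record_equals` is hypothetical, and the `example.org.` answer lines are made-up sample data rather than real query output.

```shell
# Hypothetical sketch: succeed if ANY answer line's data field equals $1.
# Reads answer lines (NAME TTL CLASS TYPE DATA) from standard input.
any_record_equals() {
    expected=$1
    while read -r name ttl class type data rest; do
        # Order does not matter: every record is compared in turn.
        if [ "$data" = "$expected" ]; then
            return 0
        fi
    done
    return 1
}

# Made-up sample answer section with two NS records.
answers='example.org. 3600 IN NS ns2.example.net.
example.org. 3600 IN NS ns1.example.net.'

if printf '%s\n' "$answers" | any_record_equals ns1.example.net.; then
    echo "DNS OK"
else
    echo "DNS WARNING"
fi
```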

@maggu
Author

maggu commented Nov 22, 2023

I've rewritten our SOA checks as NS checks and upgraded from 2.4.0 to 2.4.6 in the test environment. The NS checks now work.

However, there are apparently also issues with PTR records in 2.4.3:

With 2.4.0:

% /usr/lib64/nagios/plugins/check_dig -H 8.8.8.8 -l 28.2.20.149.in-addr.arpa. -T ptr -a victor.isc.org.
DNS OK - 0,034 seconds response time (28.2.20.149.in-addr.arpa. 3591 IN PTR victor.isc.org.)|time=0,034218s;;;0,000000
% /usr/lib64/nagios/plugins/check_dig -H 8.8.8.8 -l 8.2.0.0.0.0.0.0.0.0.0.0.0.0.0.0.2.0.0.0.b.6.0.0.0.0.5.0.1.0.0.2.ip6.arpa. -T ptr -a victor.isc.org.
DNS OK - 0,034 seconds response time (8.2.0.0.0.0.0.0.0.0.0.0.0.0.0.0.2.0.0.0.b.6.0.0.0.0.5.0.1.0.0.2.ip6.arpa. 3474 IN PTR victor.isc.org.)|time=0,034350s;;;0,000000

With 2.4.6:

% /usr/lib64/nagios/plugins/check_dig -H 8.8.8.8 -l 28.2.20.149.in-addr.arpa. -T ptr -a victor.isc.org.
DNS OK - 0,034 seconds response time (28.2.20.149.in-addr.arpa. 3554 IN PTR victor.isc.org.)|time=0,033959s;;;0,000000
% /usr/lib64/nagios/plugins/check_dig -H 8.8.8.8 -l 8.2.0.0.0.0.0.0.0.0.0.0.0.0.0.0.2.0.0.0.b.6.0.0.0.0.5.0.1.0.0.2.ip6.arpa. -T ptr -a victor.isc.org.
DNS WARNING - 0,038 seconds response time (Server not found in ANSWER SECTION)|time=0,037693s;;;0,000000

maggu changed the title from "check_dig: -T soa broken as of version 2.4.3" to "check_dig: -T soa and -T ptr broken as of version 2.4.3" on Nov 22, 2023
@kraken-jim
Contributor

kraken-jim commented Nov 23, 2023

Just as additional information, the same IPv6 PTR check still returns OK for me as of 2.4.4:

$ /usr/local/libexec/nagios/check_dig -H 8.8.8.8 -l 8.2.0.0.0.0.0.0.0.0.0.0.0.0.0.0.2.0.0.0.b.6.0.0.0.0.5.0.1.0.0.2.ip6.arpa. -T ptr -a victor.isc.org.
DNS OK - 0.018 seconds response time (8.2.0.0.0.0.0.0.0.0.0.0.0.0.0.0.2.0.0.0.b.6.0.0.0.0.5.0.1.0.0.2.ip6.arpa. 3191 IN PTR victor.isc.org.)|time=0.018500s;;;0.000000

@maggu
Author

maggu commented Nov 23, 2023

I see. So it's in fact a different bug, then. Should I create a new issue?

@kraken-jim
Contributor

kraken-jim commented Nov 23, 2023

My hexdump example from earlier seems to have gone missing, but using your IPv6 example, please post the result of piping the ANSWER SECTION of dig's output through hexdump. The cause may still be the lack of tab delimiters in the dig output.

Gee, I asked you this without thinking it through myself. I am using drill, à la alias dig='drill', in the global login script for users. I imagine putting it in the nagios user's own login script would work as well. You may also need to install the ldns-utils package if you don't have drill on your system. I have no access to any RH systems, so I can only speculate about where to get drill there.

Once you have both drill and dig installed, test for spaces vs. tabs:

# dig output shows spaces in ANSWER SECTION:
$ /usr/local/bin/dig @8.8.8.8 8.2.0.0.0.0.0.0.0.0.0.0.0.0.0.0.2.0.0.0.b.6.0.0.0.0.5.0.1.0.0.2.ip6.arpa. ptr | grep ' '
; <<>> DiG 9.18.19 <<>> @8.8.8.8 8.2.0.0.0.0.0.0.0.0.0.0.0.0.0.0.2.0.0.0.b.6.0.0.0.0.5.0.1.0.0.2.ip6.arpa. ptr
...
;; QUESTION SECTION:
;8.2.0.0.0.0.0.0.0.0.0.0.0.0.0.0.2.0.0.0.b.6.0.0.0.0.5.0.1.0.0.2.ip6.arpa. IN PTR
;; ANSWER SECTION:
8.2.0.0.0.0.0.0.0.0.0.0.0.0.0.0.2.0.0.0.b.6.0.0.0.0.5.0.1.0.0.2.ip6.arpa. 2751 IN PTR victor.isc.org.
;; Query time: 15 msec
;; SERVER: 8.8.8.8#53(8.8.8.8) (UDP)
...

# drill output has no spaces in ANSWER SECTION:
$ /usr/bin/drill @8.8.8.8 8.2.0.0.0.0.0.0.0.0.0.0.0.0.0.0.2.0.0.0.b.6.0.0.0.0.5.0.1.0.0.2.ip6.arpa. ptr | grep ' '
;; ->>HEADER<<- opcode: QUERY, rcode: NOERROR, id: 44239
;; flags: qr rd ra ; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0 
;; QUESTION SECTION:
;; 8.2.0.0.0.0.0.0.0.0.0.0.0.0.0.0.2.0.0.0.b.6.0.0.0.0.5.0.1.0.0.2.ip6.arpa.    IN      PTR
;; ANSWER SECTION:
;; AUTHORITY SECTION:
;; ADDITIONAL SECTION:
;; Query time: 13 msec
;; SERVER: 8.8.8.8
;; WHEN: Thu Nov 23 12:20:56 2023
;; MSG SIZE  rcvd: 118
# using drill in lieu of dig will provide tab-delimited output:
$ alias dig='drill'
$ dig @8.8.8.8 8.2.0.0.0.0.0.0.0.0.0.0.0.0.0.0.2.0.0.0.b.6.0.0.0.0.5.0.1.0.0.2.ip6.arpa. ptr | 
  grep -xA3 ';; ANSWER SECTION:' |
  hexdump -Cv
00000000  3b 3b 20 41 4e 53 57 45  52 20 53 45 43 54 49 4f  |;; ANSWER SECTIO|
00000010  4e 3a 0a 38 2e 32 2e 30  2e 30 2e 30 2e 30 2e 30  |N:.8.2.0.0.0.0.0|
00000020  2e 30 2e 30 2e 30 2e 30  2e 30 2e 30 2e 30 2e 30  |.0.0.0.0.0.0.0.0|
00000030  2e 30 2e 32 2e 30 2e 30  2e 30 2e 62 2e 36 2e 30  |.0.2.0.0.0.b.6.0|
00000040  2e 30 2e 30 2e 30 2e 35  2e 30 2e 31 2e 30 2e 30  |.0.0.0.5.0.1.0.0|
00000050  2e 32 2e 69 70 36 2e 61  72 70 61 2e 09 33 36 30  |.2.ip6.arpa..360|
00000060  30 09 49 4e 09 50 54 52  09 76 69 63 74 6f 72 2e  |0.IN.PTR.victor.|
00000070  69 73 63 2e 6f 72 67 2e  0a 0a 3b 3b 20 41 55 54  |isc.org...;; AUT|
00000080  48 4f 52 49 54 59 20 53  45 43 54 49 4f 4e 3a 0a  |HORITY SECTION:.|
00000090
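Whether an answer line is tab- or space-delimited only matters if the parser splits on one specific character. As a hedged illustration (not check_dig's parser, whose splitting logic is not shown in this thread), the POSIX shell sketch below extracts the data field with awk's default field splitting, which treats any run of spaces or tabs as a single separator. The tab-delimited drill line from the hexdump above and a space-delimited line like the dig output above then yield the same result. `rdata_of` is a hypothetical helper name.

```shell
# Hypothetical helper: print the data part of an answer line
# (everything after NAME TTL CLASS TYPE). awk's default field
# splitting collapses runs of spaces and/or tabs into one separator.
rdata_of() {
    printf '%s\n' "$1" |
        awk '{ out = $5; for (i = 6; i <= NF; i++) out = out " " $i; print out }'
}

name='8.2.0.0.0.0.0.0.0.0.0.0.0.0.0.0.2.0.0.0.b.6.0.0.0.0.5.0.1.0.0.2.ip6.arpa.'

# Tab-delimited, as in the drill hexdump (bytes 0x09 between fields).
tabbed=$(printf '%s\t3600\tIN\tPTR\tvictor.isc.org.' "$name")
# Space-delimited, as in the dig output above.
spaced="$name 2751 IN PTR victor.isc.org."

rdata_of "$tabbed"   # prints victor.isc.org.
rdata_of "$spaced"   # prints victor.isc.org.
```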

HTH,

Jim

@maggu
Author

maggu commented Nov 24, 2023

I do get spaces with dig and tabs with drill, yes.

% drill @8.8.8.8 8.2.0.0.0.0.0.0.0.0.0.0.0.0.0.0.2.0.0.0.b.6.0.0.0.0.5.0.1.0.0.2.ip6.arpa. ptr | grep -xA3 ';; ANSWER SECTION:' | hexdump -Cv
00000000  3b 3b 20 41 4e 53 57 45  52 20 53 45 43 54 49 4f  |;; ANSWER SECTIO|
00000010  4e 3a 0a 38 2e 32 2e 30  2e 30 2e 30 2e 30 2e 30  |N:.8.2.0.0.0.0.0|
00000020  2e 30 2e 30 2e 30 2e 30  2e 30 2e 30 2e 30 2e 30  |.0.0.0.0.0.0.0.0|
00000030  2e 30 2e 32 2e 30 2e 30  2e 30 2e 62 2e 36 2e 30  |.0.2.0.0.0.b.6.0|
00000040  2e 30 2e 30 2e 30 2e 35  2e 30 2e 31 2e 30 2e 30  |.0.0.0.5.0.1.0.0|
00000050  2e 32 2e 69 70 36 2e 61  72 70 61 2e 09 33 36 30  |.2.ip6.arpa..360|
00000060  30 09 49 4e 09 50 54 52  09 76 69 63 74 6f 72 2e  |0.IN.PTR.victor.|
00000070  69 73 63 2e 6f 72 67 2e  0a 0a 3b 3b 20 41 55 54  |isc.org...;; AUT|
00000080  48 4f 52 49 54 59 20 53  45 43 54 49 4f 4e 3a 0a  |HORITY SECTION:.|
00000090

Would you mind sharing the thoughts behind this experiment? I take it then that you are unable to reproduce it with 2.4.6 on your system?

@maggu
Author

maggu commented Nov 24, 2023

# mv /usr/bin/dig /usr/bin/dig.backup
# ln -s /usr/bin/drill /usr/bin/dig
# /usr/lib64/nagios/plugins/check_dig -H 8.8.8.8 -l 8.2.0.0.0.0.0.0.0.0.0.0.0.0.0.0.2.0.0.0.b.6.0.0.0.0.5.0.1.0.0.2.ip6.arpa. -T ptr -a victor.isc.org.
DNS WARNING - 0,016 seconds response time (Server not found in ANSWER SECTION)|time=0,016466s;;;0,000000
# dig -v
dig version 1.7.1 (ldns version 1.7.1)
Written by NLnet Labs.

Copyright (c) 2004-2008 NLnet Labs.
Licensed under the revised BSD license.
There is NO warranty; not even for MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE.
#

@pemensik

My PR #745 seems to fix this issue also.

@maggu
Author

maggu commented Dec 1, 2023

Quoting pemensik: "My PR #745 seems to fix this issue also."

Awesome! Then it looks like all my problems have a solution. If the decision is that this plugin will no longer support SOA monitoring, I think the issue can be closed.

maggu changed the title from "check_dig: -T soa and -T ptr broken as of version 2.4.3" back to "check_dig: -T soa broken as of version 2.4.3" on Dec 1, 2023
@sawolf
Member

sawolf commented Dec 6, 2023

I'm realizing now that I never commented on this issue; sorry about that. If you were relying on the plugin to monitor SOA records and that's now broken, then we broke backward compatibility, and I intend for us to fix that in a future release.

I agree with the other commenters that most DNS monitoring will probably want an exact match, but our usual approach in such cases is to keep the existing default and introduce a flag that enables the new behaviour, here exact matching (compare check_http, which uses plain HTTP by default and requires you to specify that you expect SSL with Server Name Indication checking). I took the patch as-is for the last release because I didn't realize anyone was relying on partial matching.
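The compatibility approach described here, keeping partial matching as the default and making exact matching opt-in, could be sketched in POSIX shell as follows. The mode argument and the `matches`/`contains` helpers are hypothetical illustrations; they are not real check_dig options and not the plugin's actual (C) implementation.

```shell
# Hypothetical helper: succeeds if $2 occurs anywhere inside $1.
contains() {
    case "$1" in
        *"$2"*) return 0 ;;
        *) return 1 ;;
    esac
}

# matches DATA EXPECTED [MODE]
#   MODE is "partial" (the backward-compatible default) or "exact" (opt-in).
matches() {
    data=$1 expected=$2 mode=${3:-partial}
    case "$mode" in
        exact)   [ "$data" = "$expected" ] ;;
        partial) contains "$data" "$expected" ;;
        *)       echo "unknown match mode: $mode" >&2; return 2 ;;
    esac
}

soa='catcher-in-the-rye.nic.se. registry-default.nic.se. 2023102813 1800 1800 864000 7200'

matches "$soa" 'catcher-in-the-rye.nic.se.' && echo "default (partial): OK"
matches "$soa" 'catcher-in-the-rye.nic.se.' exact || echo "exact: WARNING"
```

Existing SOA checks keep working unchanged, while users who need strict A/AAAA verification opt in to exact matching explicitly.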

4 participants