dns: short read lock #619
@MikeSchroll ac8cd78 almost certainly exposed the error you're seeing, but that error has always existed; before ac8cd78 it was simply ignored. The error is coming from line 636 in dcdbddd (see also line 24 in dcdbddd), for the case `n == 0 && err == nil`.
I'm not familiar enough to say what the correct behaviour here is, or whether this error is safe to ignore, but a simple (untested) patch to address this issue is:
|
This is, I think, the main issue here, as I'm also not 100% sure what is correct; of course #613 is somewhat more theoretical than stuff not working in prod, i.e. rollback? OTOH we should stop on some errors. Adding a bunch of other error conditions sounds a bit like whack-a-mole. So I'm inclined to roll back ac8cd78 (and do a similar thing in the TCP handler) and try again. The main error we want to break on is: socket is closed (i.e. the server one) |
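Agree rollback is safest till we work out the path forward.
If doing `!opErr.Temporary()` is unsafe (for reasons above) and we don't want to risk further unexpected cases, we may want to fall back to the ugly (but effective) version I had originally:
```go
if err != nil {
	if strings.Contains(err.Error(), "use of closed network connection") {
		return err
	}
	continue
}
```
Other thoughts: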
|
[ Quoting <notifications@github.com> in "Re: [miekg/dns] dns: short read loc..." ]
Agree rollback is safest till we work out the path forward.
Done, and pushed a 1.0.3 tag (and version.go update) for those tracking version
(now that we have them).
If doing `!opErr.Temporary()` is unsafe (for reasons above) and we don't want to risk further unexpected cases, we may want to fall back to the ugly (but effective) version I had originally:
```go
if err != nil {
	if strings.Contains(err.Error(), "use of closed network connection") {
		return err
	}
	continue
}
```
Think this makes sense to do; those strings are de facto part of the API, so
there is little chance they will change.
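For illustration, the string-matching check above could sit in a read loop like the one below. This is a minimal sketch, not the library's code: `isClosedConn`, `readLoop`, and `closedReadErr` are hypothetical names, and the `errors.Is(err, net.ErrClosed)` branch assumes Go 1.16 or later, where that sentinel exists.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"strings"
)

// isClosedConn reports whether err looks like the error the net package
// returns for a locally closed socket. The string match mirrors the check
// quoted above; errors.Is(err, net.ErrClosed) is the typed form (Go 1.16+).
func isClosedConn(err error) bool {
	if err == nil {
		return false
	}
	return errors.Is(err, net.ErrClosed) ||
		strings.Contains(err.Error(), "use of closed network connection")
}

// readLoop (hypothetical) keeps reading datagrams and returns only when the
// socket has been closed; every other read error is skipped.
func readLoop(conn *net.UDPConn) error {
	buf := make([]byte, 512)
	for {
		n, _, err := conn.ReadFromUDP(buf)
		if err != nil {
			if isClosedConn(err) {
				return err
			}
			continue
		}
		_ = buf[:n] // a real server would hand the datagram to its handler here
	}
}

// closedReadErr opens a loopback UDP socket, closes it, and returns the error
// that readLoop reports for it.
func closedReadErr() error {
	conn, err := net.ListenUDP("udp", &net.UDPAddr{IP: net.IPv4(127, 0, 0, 1)})
	if err != nil {
		return err
	}
	conn.Close()
	return readLoop(conn)
}

func main() {
	err := closedReadErr()
	fmt.Println(err, isClosedConn(err))
}
```

One caveat with this shape: if a non-closed error is persistent rather than transient, the `continue` branch spins, so a real loop would probably want a backoff or an error budget.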
Other thoughts:
1. It is valid for a UDP datagram to have a zero length (though ofc not valid for DNS). As such, `ErrShortRead` represents a temporary failure, and IMHO should implement the `net.Error` interface and return `Temporary() == true`, and we should reinstate the error handling we rolled back.
2. @MikeSchroll does your server code retry after it receives an error from `ActivateAndServe()` or `ListenAndServe()`?
Re 2: that was never part of the implicit API, i.e. the error returned would be
on startup, not while running.
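Point 1 above could look roughly like this. It is a sketch under assumptions: `shortReadError` is a hypothetical stand-in for `ErrShortRead`, and `isTemporary`, `serve`, and `scriptedReader` are illustrative helpers rather than anything in the library. Note also that the `Temporary` method of `net.Error` has been deprecated in later Go releases, so this shows the idea as discussed here, not a recommendation.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// shortReadError is a hypothetical short-read error that reports itself as
// temporary, so a serve loop can skip it instead of shutting down.
type shortReadError struct{}

func (shortReadError) Error() string   { return "dns: short read" }
func (shortReadError) Timeout() bool   { return false }
func (shortReadError) Temporary() bool { return true }

// Compile-time check that the type satisfies net.Error.
var _ net.Error = shortReadError{}

var errFatal = errors.New("socket closed")

// isTemporary reports whether any error in err's chain says it is temporary.
func isTemporary(err error) bool {
	var t interface{ Temporary() bool }
	return errors.As(err, &t) && t.Temporary()
}

// serve calls read until it returns a non-temporary error, and reports how
// many temporary errors were skipped along the way.
func serve(read func() error) (skipped int, err error) {
	for {
		if err := read(); err != nil {
			if isTemporary(err) {
				skipped++
				continue
			}
			return skipped, err
		}
	}
}

// scriptedReader replays errs in order, then returns errFatal forever.
func scriptedReader(errs ...error) func() error {
	i := 0
	return func() error {
		if i < len(errs) {
			err := errs[i]
			i++
			return err
		}
		return errFatal
	}
}

func main() {
	skipped, err := serve(scriptedReader(shortReadError{}, nil, shortReadError{}))
	fmt.Println(skipped, err)
}
```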
/Miek
…--
Miek Gieben
|
If the implicit intention of [...] Depending on the application and environment, listener sockets failing/closing is not uncommon. Consider one of my use cases for this library: a container engine runs a DNS server on a bridge interface IP (specific to each container) for the life of the container, but when the interfaces are brought down, Linux cascades the failures to the sockets bound to the interfaces' IPs, which manifests as a [...] Recommend you change the definition of an error returned from [...] |
Following the code from
Apparently this only applies to TCP sockets though. In this case it would seem that the zero return value literally means a zero-byte datagram was received. I would say that the code in #617 was actually correct and that I'll open a pull request with my suggested fix for both #619 (this issue) and #613. |
I'm not sure if I've read this correctly, but are you suggesting we send back a reply to an empty UDP packet? If this is the case, I strongly disagree. That will make all such servers planetwide vulnerable to being abused as part of a DNS amplification attack (for DDoS), albeit with a low multiplier. This would be a risky code change. |
@twitchyliquid64 That's exactly what I'm suggesting. While I do agree it's a potential for amplification, it's exactly one-byte worse than the existing code (i.e. it's not any worse). If you send a 1-byte invalid datagram, you'll already receive the exact same "FormatError" response. |
@twitchyliquid64 Basically the current code is wrong, n == 0 *does not* indicate an error. If you think returning a "FormatError" response to small datagrams is a problem, that is a separate matter. #621 simply makes the code correct. |
Agreed. I'll file a separate bug when it's not almost midnight to discuss
DNS amplification.
|
@twitchyliquid64 (FYI with a really quick reading of the code, I believe the "FormatError" response should be exactly 12-bytes. I don't know whether that changes your assessment at all). |
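To put rough numbers on the multiplier being discussed, here is a small back-of-the-envelope sketch. Counting on-the-wire size as IPv4 header (20 bytes, no options) plus UDP header (8 bytes) plus payload is an assumption of this sketch, as is the 12-byte response taken from the comment above; `onWireRatio` is a hypothetical helper and link-layer framing is ignored.

```go
package main

import "fmt"

const (
	ipv4Header = 20 // minimum IPv4 header, no options (assumption of this sketch)
	udpHeader  = 8  // fixed UDP header size
)

// onWireRatio returns response bytes divided by request bytes, counting the
// IPv4 and UDP headers on both sides.
func onWireRatio(reqPayload, respPayload int) float64 {
	overhead := ipv4Header + udpHeader
	return float64(overhead+respPayload) / float64(overhead+reqPayload)
}

func main() {
	// Compare an empty request and a 1-byte request, each answered with a
	// 12-byte header-only response.
	for _, req := range []int{0, 1} {
		fmt.Printf("request payload %d bytes: ratio %.2f\n", req, onWireRatio(req, 12))
	}
}
```

Under these assumptions the ratio is about 1.43 for an empty request and about 1.38 for a 1-byte request, which is consistent with the "low multiplier" and "not any worse" remarks above.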
A few bytes is unlikely to matter, but best not for me to try and reason
implications rn (lying in bed with the lights off). Tomorrow :)
|
A UDP datagram smaller than 12 bytes is not a correct DNS packet. The header of a DNS packet is 12 bytes, and then you still need a question section (domain, type, class), the shortest of which (I think) is 2 + 2 + 2 = 6 bytes. Not sure what other nameservers do in this case |
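The sizes mentioned here can be checked by building a query by hand. This is a minimal sketch using only the standard library; `rootQuery` and `hasFullHeader` are hypothetical helpers, not part of miekg/dns. By this sketch's count, the root name encodes as a single zero byte, so the smallest question comes out at 1 + 2 + 2 = 5 bytes rather than 6, and the smallest well-formed query at 17 bytes.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// headerSize is the fixed DNS header: ID, flags, QDCOUNT, ANCOUNT, NSCOUNT
// and ARCOUNT, six 16-bit fields in total.
const headerSize = 12

// hasFullHeader reports whether b is long enough to contain a DNS header.
func hasFullHeader(b []byte) bool {
	return len(b) >= headerSize
}

// rootQuery builds a query for the root name with the given type and class.
func rootQuery(id, qtype, qclass uint16) []byte {
	msg := make([]byte, headerSize, headerSize+5)
	binary.BigEndian.PutUint16(msg[0:2], id) // ID
	binary.BigEndian.PutUint16(msg[4:6], 1)  // QDCOUNT = 1, flags left at zero
	msg = append(msg, 0x00)                  // root name: a single zero-length label
	msg = append(msg, byte(qtype>>8), byte(qtype))
	msg = append(msg, byte(qclass>>8), byte(qclass))
	return msg
}

func main() {
	q := rootQuery(0x1234, 2, 1) // type NS (2), class IN (1)
	fmt.Println(len(q), hasFullHeader(q), hasFullHeader(q[:11]))
}
```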
CoreDNS, NSD and BIND return FormErr for the case where you only send the header:
|
Seems smaller writes (<12 bytes) are not responded to. We should probably do the same. |
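The behaviour described above (drop anything shorter than a header, answer FormErr otherwise) might be sketched as follows. `respond` is a hypothetical helper; answering every sufficiently long datagram with FormErr is a simplification for illustration and stands in for whatever a real handler would return.

```go
package main

import "fmt"

const headerSize = 12 // fixed DNS header size in bytes

// respond returns nil for datagrams too short to hold a DNS header, meaning
// "send nothing". For anything longer it returns a 12-byte header-only reply
// with the request ID echoed, QR set, and RCODE = 1 (FormErr).
func respond(req []byte) []byte {
	if len(req) < headerSize {
		return nil
	}
	reply := make([]byte, headerSize)
	copy(reply[0:2], req[0:2]) // echo the query ID
	reply[2] = 0x80            // QR = 1: this is a response
	reply[3] = 0x01            // RCODE = 1: format error
	return reply
}

func main() {
	short := make([]byte, 1)
	headerOnly := make([]byte, headerSize)
	headerOnly[0], headerOnly[1] = 0xbe, 0xef
	fmt.Println(respond(short) == nil, len(respond(headerOnly)))
}
```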
It would also be good to have some exposure to a prod env for these changes. (I can recompile CoreDNS locally). What about TCP? |
* Do not return ErrShortRead in readUDP

A read of zero bytes indicates a peer shutdown for TCP sockets (and thus returning ErrShortRead is fine in readTCP), but not for UDP sockets. For UDP sockets, a read of zero bytes literally indicates a zero-byte datagram and is a valid return value, not an error. Removing this case will cause readUDP to correctly return a zero-byte message.

* Return non-temporary error from serveUDP loop

Fixes #613
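The UDP half of that claim can be observed directly with the standard library. The sketch below sends an empty datagram over loopback and returns what `ReadFromUDP` reports; `readZeroByteDatagram` is a hypothetical helper, and the behaviour is what I would expect on Linux (other platforms may differ).

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// readZeroByteDatagram sends an empty datagram to a loopback UDP socket and
// returns the byte count and error that ReadFromUDP reports for it.
func readZeroByteDatagram() (int, error) {
	loopback := &net.UDPAddr{IP: net.IPv4(127, 0, 0, 1)}

	srv, err := net.ListenUDP("udp", loopback)
	if err != nil {
		return -1, err
	}
	defer srv.Close()

	cli, err := net.ListenUDP("udp", loopback)
	if err != nil {
		return -1, err
	}
	defer cli.Close()

	// A nil payload produces a datagram with a zero-length body.
	if _, err := cli.WriteToUDP(nil, srv.LocalAddr().(*net.UDPAddr)); err != nil {
		return -1, err
	}

	// Bound the wait so a lost datagram cannot hang the example.
	if err := srv.SetReadDeadline(time.Now().Add(2 * time.Second)); err != nil {
		return -1, err
	}
	buf := make([]byte, 512)
	n, _, err := srv.ReadFromUDP(buf)
	return n, err
}

func main() {
	n, err := readZeroByteDatagram()
	fmt.Printf("n=%d err=%v\n", n, err)
}
```

If the read comes back with a zero count and a nil error, the datagram simply arrived empty, which is the case the commit stops treating as a failure.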
@miekg #619 (this issue) is definitely caused by the misplaced [...] Merging #621 removes the [...] It would definitely be good to have some sort of test case covering this and #613. |
To mirror what I said on the PR for completeness:
Therefore, we should avoid responding to packets that small, as the PR does. |
This morning, having done a new release yesterday with the latest miekg/dns, we began receiving traffic (even traffic we don't consider authorized, and drop) to a dev instance which caused our server code to lock up, and not process any additional incoming queries.
This is the error message:
level=error msg="unable to listen" address="REDACTED:53" error="dns: short read" log_message="unable to listen" protocol=udp region=FRA serverID=71302
Initially looking into it, we suspected commit ac8cd78, tied to issue #613.
We've noticed that the error locks up listeners individually, one per IP/port we're listening on; for instance, one server which listens on 20+ IPs/ports had 5 locked up by this error. The rest continued to function.
We've seen this only on UDP, but that may be because that's where most of our traffic comes from.
We've initially confirmed this suspicion by rolling back miekg/dns versions, and have not had the error recur. We do not have a way to replicate the error, but we'd like to open this new issue with hopes that others will have some thoughts on the change which is causing this.