too many goroutines #997
@miekg I think this is something we should fix. I'm thinking of a simple opt-in/opt-out semaphore with an optional timeout. We can just use a

Something like this, perhaps:

```go
type Server struct {
	// Controls the maximum number of UDP queries to process concurrently. If the value is
	// zero or negative, no limit is applied. Once the limit is reached, the server will block
	// or reject the query depending on UDPQueryTimeout.
	MaxConcurrentUDPQueries int
	// If this timeout is positive, the server will reject the query if it can't handle the
	// request within the timeout.
	UDPQueryTimeout time.Duration
}
```

This approach isn't perfect and has problems around choosing the correct value, but I think we'll keep seeing these problems otherwise. I'm not sure whether it makes sense to try to respond with an error or just drop the query on the floor.

@miekg Thoughts?
> I'm not sure whether it makes sense to try and respond with an error or just drop the query on the floor.
> @miekg Thoughts?
Indiscriminately dropping queries on the floor doesn't solve it. This may actually trigger more queries because of the bad retry behavior of most resolvers (which we don't control).
What would be better is to check the original arrival time of the query and drop it if it's still around and older than 5s. This is what the cancel plugin in CoreDNS now does.
@miekg I'm not sure it's possible to do that, though. Because we read packets in a single loop before dispatching, we'll only ever have one packet in hand (blocked in that loop) while the rest are buffered by the kernel.
Another thing that came to mind is that you can't do this after the server loop.
In an ill-fated attempt to do this in CoreDNS, it just dropped packets after it saw N concurrent goroutines, i.e. there is no core support needed to drop packets.
Don't hang on to your goroutines, or at least calculate how many you can handle, set up an internal buffer, and work with a context to drop older ones.
This is not something we need to implement here, IMO.
In CoreDNS we can (not enabled) throw away everything older than 5 seconds, because clients will have timed out by then. If we can get hold of the packet's timestamp, we can implement that here.
@tmthrgd Could we bring back the workers? I think they were ripped out a bit hastily, although that code could be massively improved.
Now actually reading your proposal correctly: what is that value going to be, and can it be dynamically determined? If not (I've seen this at Google), you're just passing the problem down to the operator (SRE at Google), who is left with the same question. To add to the difficulty, we compile for various platforms and CPU architectures.
The workers introduced bugs, so I'd rather not just bring them back without thought. That's exactly what my earlier objections were about: if the value isn't obvious or computable, you just shift the problem elsewhere. I'm not sure what the right approach is. Just thinking aloud, perhaps we could add some sort of
That still pushes the problem downwards; I'd rather do something sensible here. I'm also load testing CoreDNS (localhost <-> localhost, so take it with a slight grain of salt), with the backend served by the erratic plugin (which can introduce delays and drops; so far I've only tested with delays). Doing this with
Personally, as I've run into some of these problems myself, I would love an option to essentially use my callback instead of
This might be a way forward, but neither the original issue in CoreDNS nor the issue above, as initially posted, has been root-caused. If you don't close out (old) goroutines, you'll end up with a lot of them.
An insightful comment on the CoreDNS issue: coredns/coredns#2593 (comment)

In general, getting rid of your goroutines is what you want to do; usually this is fine, except when your backend is so slow that you can't. There are two ways out here:

For (2), even with worker pools we got into a bad place (this might be due to the implementation we had at the time, but I'm not too sure about that). Even if you think you've done (2), you still want (1). So I think focusing on (1) makes sense here. WDYT @tmthrgd?
@miekg For (1), you would need to respond in the same goroutine that received the request, i.e. before https://github.com/miekg/dns/blob/master/server.go#L434 and also before https://github.com/miekg/dns/blob/master/server.go#L479.
Detection and prevention need to know the payload size parsed from the request, the number of goroutines, the goroutine size, and the memory available to the process (cgroup v1: /sys/fs/cgroup/memory/memory.limit_in_bytes).
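A sketch of gathering some of those inputs with the standard library, assuming Linux with cgroup v1 (the file does not exist on other platforms or under cgroup v2, whose `memory.max` file may contain the literal string `max`). The helper names are hypothetical. Go does not expose per-goroutine stack size directly, so that input is not covered here.

```go
package main

import (
	"os"
	"runtime"
	"strconv"
	"strings"
)

// cgroupV1MemLimitPath is the cgroup v1 file cited above. It only exists on
// Linux hosts using cgroup v1.
const cgroupV1MemLimitPath = "/sys/fs/cgroup/memory/memory.limit_in_bytes"

// parseMemLimit parses the contents of memory.limit_in_bytes.
func parseMemLimit(s string) (uint64, error) {
	return strconv.ParseUint(strings.TrimSpace(s), 10, 64)
}

// memLimit returns the cgroup v1 memory limit, or ok=false when the file is
// missing or unparsable (non-Linux, cgroup v2, and so on).
func memLimit() (limit uint64, ok bool) {
	b, err := os.ReadFile(cgroupV1MemLimitPath)
	if err != nil {
		return 0, false
	}
	n, err := parseMemLimit(string(b))
	if err != nil {
		return 0, false
	}
	return n, true
}

// snapshot gathers two more of the inputs mentioned above: the current
// goroutine count and the heap currently in use.
func snapshot() (goroutines int, heapInUse uint64) {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	return runtime.NumGoroutine(), ms.HeapInuse
}
```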
@miekg I would be very open to trying (1) if someone has a solid idea of how to do that.
@szuecs Getting the current memory allocated to a process is not portable (sadly). We could potentially make memory the knob you need to tweak, i.e. "I have 2 GB, please figure out how many things I can do with that, and SERVFAIL if I hit it."
@tmthrgd Memory usage seems to be the overarching thing we can do something sensible about. We could start with the dumb thing of
Follow-up question: should this be a core "feature" or be left to the application? I.e., even in the case of CoreDNS, you don't want a cache plugin to suffer from a slow backend used in the forward plugin.
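A minimal sketch of the "I have N bytes, SERVFAIL beyond that" knob, assuming each in-flight query is charged its payload size plus a fixed, guessed overhead. `memBudget` and `perQueryOverhead` are hypothetical, and the 8 KiB figure is an assumption rather than a measurement.

```go
package main

import "sync/atomic"

// perQueryOverhead is an assumed fixed cost per in-flight query (goroutine
// stack, buffers, parsed message). It is a guess, not a measurement.
const perQueryOverhead = 8 << 10 // 8 KiB

// memBudget is a hypothetical admission controller for a byte budget.
type memBudget struct {
	limit int64 // total bytes in-flight queries may use
	used  int64 // bytes currently reserved; accessed atomically
}

// reserve tries to account for one query of payloadSize bytes. It returns
// the reserved cost (to pass to release) and false when the budget would be
// exceeded, in which case the caller would answer SERVFAIL instead.
func (b *memBudget) reserve(payloadSize int) (int64, bool) {
	cost := int64(payloadSize) + perQueryOverhead
	if atomic.AddInt64(&b.used, cost) > b.limit {
		atomic.AddInt64(&b.used, -cost) // roll back the failed reservation
		return 0, false
	}
	return cost, true
}

// release returns a reservation made by a successful reserve.
func (b *memBudget) release(cost int64) {
	atomic.AddInt64(&b.used, -cost)
}
```

When `reserve` fails, a miekg/dns handler could answer with `m.SetRcode(r, dns.RcodeServerFailure)` instead of doing the work. Because `reserve` adds first and rolls back on failure, two racing callers near the limit may both be rejected; that errs on the side of shedding load.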
One of the more interesting things we could do here is slow down, i.e. intentionally start sleeping in the loop that accepts packets once we detect that we are going to breach some limit in the next second or so. I think this pushes the queue of waiting packets out to the network interface, where kernel-level limits kick in, meaning that eventually you'll reach a state where you (hopefully) send back an ICMP unreachable message.
I'm thinking along these lines:
Slightly better, I think:
I'll factor this out into a user-callable function. I'm not sure what signature it should have or how it should be named. It also needs to be tested somehow.
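If possible, can you also make it configurable and/or opt-in/out? (Ideally, give us the tools to build `serveUDP` ourselves and call `serveUDPPacket` in our own ways, but I understand that's unfortunately not a focus.)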
> If possible, can you also make it configurable and/or opt-in/out? (Ideally, give us the tools to build `serveUDP` ourselves and call `serveUDPPacket` in our own ways, but I understand that's unfortunately not a focus.)
If we agree this is a sensible way forward, it will be a user-defined function with a default implementation, just like the AcceptFunc. But I'm also afraid of capping performance unnecessarily.
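A sketch of what a user-defined function with a default implementation could look like, in the spirit of the AcceptFunc. The `DispatchFunc` type, the `server` type, and both dispatch functions are hypothetical; nothing here is part of the miekg/dns API. The default keeps the one-goroutine-per-packet behavior described in this issue, so users who don't opt in see no performance cap.

```go
package main

// DispatchFunc is a hypothetical hook: the server would call it for every
// packet it reads instead of unconditionally starting a goroutine.
type DispatchFunc func(serve func())

// defaultDispatch keeps the current behavior: one goroutine per packet.
func defaultDispatch(serve func()) { go serve() }

// limitedDispatch is one possible user-supplied replacement. It runs at most
// max handlers concurrently and calls reject for packets beyond that.
func limitedDispatch(max int, reject func()) DispatchFunc {
	if max <= 0 {
		return defaultDispatch // no limit requested
	}
	slots := make(chan struct{}, max)
	return func(serve func()) {
		select {
		case slots <- struct{}{}: // slot taken synchronously, before the goroutine starts
			go func() {
				defer func() { <-slots }()
				serve()
			}()
		default:
			reject()
		}
	}
}

// server stands in for the real Server type.
type server struct {
	Dispatch DispatchFunc // nil means defaultDispatch
}

func (s *server) dispatch(serve func()) {
	d := s.Dispatch
	if d == nil {
		d = defaultDispatch
	}
	d(serve)
}
```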
You can add a limit reader or something similar, or keep track of concurrency yourself. After lots of back and forth, I don't think we should provide something out of the box.
Hi,
I'm using this on Windows 10. When I use 200 threads running nslookup to send UDP packets to the DNS server, I find that there are too many goroutines, occupying a large amount of heap memory.
So I reviewed the code and found that there is no limit when processing UDP packets in server.go. I'm not sure whether that is all right. Can anybody help me?
my code:
dns code: