net: mass connection spike leads to unpredictable amount of memory usage #35407
Comments
Related #15735 (comment)
@networkimprov, I don't see the relation? That they both involve network stuff?
@bradfitz you proposed to eliminate goroutines for idle connections:
@szuecs why memory consumption ~= sizeof(http.Request) * sizeof(goroutine) * number(connections), not
@rootdeep I guess, because I did it wrong, thanks for pointing it out!
Could this https://pkg.go.dev/golang.org/x/net/netutil#LimitListener be a solution?
A spike in TCP connections can lead to a spike in memory usage in TCP handlers (for example http.ServeHTTP), which makes memory usage unpredictable. The same happens for UDP servers.
Example projects, that have this behavior:
Known cases when this can happen:
While investigating connection spikes that caused an oom kill of our http proxy skipper, I tried to understand the underlying issue.
The problem is caused by an unbounded number of goroutines created in the Accept() loop: see the last line of the function at https://golang.org/src/net/http/server.go#L2895, where line 2927 creates one goroutine per accepted connection.
I could reproduce a DoS kind of situation with minimal Go code running in docker containers.
In production memory spikes up to more than 2Gi. Normal memory usage in the same production setup is less than 100Mi.
Below I will show how to create spikes that are not manageable with an unbounded number of goroutines.
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (go env)?
go env Output
What did you do?
To show the impact I create a test setup:
[attack client] -> [backend]
backend:
Create a docker container:
build:
Start the minimal Go backend
Create the attack client that produces the connection spike
When the connection refused errors start, the backend shows:
Exit code 137 means oom kill.
What did you expect to see?
no oom kill, but http 5xx or connection refused or similar errors
What did you see instead?
oom kill
Possible solution
http.Server{} could have a MaxConcurrency option that would limit the number of goroutines that are created. An implementation could be done with a semaphore. It is possible to implement a fix without a breaking change, such that an unbounded number of goroutines is the 0 value for the mentioned new option. Another idea would be to set this value automatically via finding the cgroup memory limit for the current process, because the relation should be: