net/http: read: connection reset by peer under high load #20960

Open
liranp opened this Issue Jul 9, 2017 · 8 comments

@liranp

liranp commented Jul 9, 2017

What version of Go are you using (go version)?

go1.8.3

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOOS="linux"

What did you do?

  1. Edit /etc/sysctl.conf and reload with sysctl -p.
  2. Edit /etc/security/limits.conf.
  3. Execute the following server and client (on different hosts) to simulate multiple HTTP requests (a rough sketch of the client is included below):
    Server https://play.golang.org/p/B3UCQ_4mWm
    Client https://play.golang.org/p/_XQRqp8K5b
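
In case the playground links rot: the client is roughly of the shape below, firing many concurrent GETs at a single URL and logging any error. The address, concurrency, and request count are illustrative placeholders, not the exact values from the playground code.

```go
package main

import (
	"io"
	"io/ioutil"
	"log"
	"net/http"
	"sync"
)

func main() {
	const (
		target      = "http://<addr>:8080" // placeholder: the server's address
		concurrency = 100
		requests    = 10000
	)

	var wg sync.WaitGroup
	sem := make(chan struct{}, concurrency) // bound the number of in-flight requests

	for i := 0; i < requests; i++ {
		wg.Add(1)
		sem <- struct{}{}
		go func() {
			defer wg.Done()
			defer func() { <-sem }()

			resp, err := http.Get(target)
			if err != nil {
				log.Println(err) // "connection reset by peer" shows up here
				return
			}
			// Drain and close the body so the connection can be reused.
			io.Copy(ioutil.Discard, resp.Body)
			resp.Body.Close()
		}()
	}
	wg.Wait()
}
```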

What did you expect to see?

No errors at all.

What did you see instead?

Errors like the following:

Get http://<addr>:8080: read tcp [::1]:65244->[::1]:8080: read: connection reset by peer
Get http://<addr>:8080: write tcp [::1]:62575->[::1]:8080: write: broken pipe
Get http://<addr>:8080: dial tcp [::1]:8080: getsockopt: connection reset by peer
Get http://<addr>:8080: readLoopPeekFailLocked: read tcp [::1]:51466->[::1]:8080: read: connection reset by peer

@ALTree ALTree added this to the Go1.10 milestone Jul 11, 2017

@rsc rsc modified the milestones: Go1.10, Go1.11 Nov 22, 2017

@teejays

teejays commented Dec 1, 2017

I've been seeing similar issues with a similar setup, where the server is on Go 1.6 and the client, which is making many concurrent requests, is on Go 1.9.2. Do we know what causes this? It seems like some kind of unintentional built-in rate limiting.

@bradfitz

Member

bradfitz commented Dec 1, 2017

Go has no built-in rate limiting.

If you get broken network connections, they might be real (a peer really did disconnect), or they might be from hitting kernel limits on one of the sides.
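
To rule out the per-process file-descriptor limit, something like the following sketch prints it from inside the program (Linux/macOS; system-wide limits such as net.core.somaxconn or the ephemeral port range still have to be checked with sysctl):

```go
package main

import (
	"fmt"
	"syscall"
)

func main() {
	// Print the soft/hard open-file limits for this process so they can be
	// compared against the number of concurrent connections being opened.
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		fmt.Println("getrlimit:", err)
		return
	}
	fmt.Printf("open files: soft=%d hard=%d\n", rl.Cur, rl.Max)
}
```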

@bradfitz

Member

bradfitz commented Jul 9, 2018

Can anybody reproduce this and investigate?

@bradfitz bradfitz modified the milestones: Go1.11, Unplanned Jul 9, 2018

@bradfitz bradfitz added the help wanted label Jul 9, 2018

@GeorgeErickson

GeorgeErickson commented Jul 12, 2018

Adding a comment to remind myself:

I've seen similar issues and should have time to investigate in the next month or so.

@dubbelpunt

dubbelpunt commented Aug 24, 2018

In my opinion, this issue occurs when the response is streamed into a dynamically allocated buffer.

No error is thrown when conn.Read() reads into a fixed-length buffer.
The error is thrown sporadically when the response is read via:
io.Copy(&buf, conn)
ioutil.ReadAll(conn)
conn.Read() streamed until EOF

Note: even when the error is thrown, the response is buffered correctly and completely.

Go version I used: 1.10.3
OS: FreeBSD 11.2 (64-bit)
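
To make the comparison concrete, a rough sketch of the two read paths (conn is assumed to be an established net.Conn that the peer closes after writing its response; this is not the exact code from my test):

```go
package readcheck

import (
	"io"
	"io/ioutil"
	"net"
)

// readFixed does a single Read into a fixed-length buffer and never waits for
// EOF, so a late RST from the peer is never observed.
func readFixed(conn net.Conn) ([]byte, error) {
	buf := make([]byte, 64*1024)
	n, err := conn.Read(buf)
	if err != nil && err != io.EOF {
		return nil, err
	}
	return buf[:n], nil
}

// readUntilEOF keeps reading until EOF; an RST that arrives after all data has
// been received can surface here as "connection reset by peer" even though the
// buffered response is already complete.
func readUntilEOF(conn net.Conn) ([]byte, error) {
	return ioutil.ReadAll(conn)
}
```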

@drasko

drasko commented Sep 17, 2018

I am also seeing this issue during HTTP load testing.

Maybe I am hitting kernel TCP stack limits. @bradfitz, any idea how to check whether this is a kernel issue?

@dxjones

dxjones commented Nov 21, 2018

I am seeing this error frequently (read: connection reset by peer) when testing an HTTP/1.1 client that is generating heavy load on a server. I have not checked whether the problem goes away with HTTP/2 (it could, since requests would be interleaved on a single connection).

I am running the client and server both on the same localhost, and both are running a managed pool of worker goroutines in an attempt to limit the demand for resources.

The client and server each have 350+ goroutines (as reported by runtime.NumGoroutine()) and are running as root on a Mac laptop with the maximum number of open files and sockets raised to 8192 (ulimit -n 8192).

I can run the same code on Ubuntu 18.04 LTS with more cores, memory, sockets, etc., and I eventually run into the same error message, although at a higher traffic load.

If there is a kernel limit, it would be nice to know what it is, so we could write our code to stay below that limit intentionally, rather than tweaking parameters and hoping it won't be a problem.
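
For what it's worth, a sketch of the kind of shared-client setup that tends to reduce these errors under load (the transport numbers are illustrative placeholders, not the exact values from my test):

```go
package main

import (
	"io"
	"io/ioutil"
	"log"
	"net/http"
	"time"
)

// One shared client for all workers; raising MaxIdleConnsPerHost (default 2)
// lets connections be reused instead of constantly opened and torn down,
// which is a common source of resets and port exhaustion under load.
var client = &http.Client{
	Timeout: 10 * time.Second,
	Transport: &http.Transport{
		MaxIdleConns:        500,
		MaxIdleConnsPerHost: 500,
		IdleConnTimeout:     90 * time.Second,
	},
}

func fetch(url string) error {
	resp, err := client.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	// Drain the body so the connection is returned to the idle pool.
	_, err = io.Copy(ioutil.Discard, resp.Body)
	return err
}

func main() {
	if err := fetch("http://localhost:8080"); err != nil {
		log.Println(err)
	}
}
```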

I see this topic was active a year ago... has there been any progress on this in the past year, @bradfitz?

@mcauto

mcauto commented Nov 27, 2018

Can I simply retry the HTTP request when this happens?
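
For idempotent requests, a simple retry with backoff is one option; a rough sketch (the URL and attempt count are placeholders):

```go
package main

import (
	"fmt"
	"io/ioutil"
	"net/http"
	"time"
)

// getWithRetry retries an idempotent GET a few times with a growing delay.
// Only safe for requests that can be repeated without side effects.
func getWithRetry(url string, attempts int) (*http.Response, error) {
	var lastErr error
	for i := 0; i < attempts; i++ {
		resp, err := http.Get(url)
		if err == nil {
			return resp, nil
		}
		lastErr = err
		time.Sleep(time.Duration(i+1) * 500 * time.Millisecond)
	}
	return nil, lastErr
}

func main() {
	resp, err := getWithRetry("http://localhost:8080", 3)
	if err != nil {
		fmt.Println("giving up:", err)
		return
	}
	defer resp.Body.Close()
	body, _ := ioutil.ReadAll(resp.Body)
	fmt.Println(len(body), "bytes")
}
```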
