Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: TestIssue3595 failure with "write: broken pipe" #51104

Open
bcmills opened this issue Feb 9, 2022 · 4 comments
Open

net/http: TestIssue3595 failure with "write: broken pipe" #51104

bcmills opened this issue Feb 9, 2022 · 4 comments
Labels
NeedsInvestigation

Comments

@bcmills
Copy link
Member

@bcmills bcmills commented Feb 9, 2022

--- FAIL: TestIssue3595 (0.67s)
    transport_test.go:2031: Post: Post "http://127.0.0.1:12511": write tcp 127.0.0.1:12512->127.0.0.1:12511: write: broken pipe
FAIL
FAIL	net/http

greplogs --dashboard -md -l -e 'FAIL: TestIssue3595 ' --since=2021-01-01

2022-02-08T23:38:05-0b7e586/freebsd-amd64-race

@bcmills
Copy link
Member Author

@bcmills bcmills commented Feb 9, 2022

Just looking at the scenario this is testing, I suspect that it is a real bug in the net/http client (CC @neild).

My hypothesis is that the test is racing between filling up the server's input buffer and receiving its error response.

@bcmills bcmills added the NeedsInvestigation label Feb 9, 2022
@bcmills
Copy link
Member Author

@bcmills bcmills commented Feb 9, 2022

There is another issue here that seems to be closely related.

Running the test locally, I noticed that it always takes a little over 500ms. Most of that time is spent waiting on the sync.WaitGroup that wraps the Serve call on the underlying (*net/http).Server:
https://cs.opensource.google/go/go/+/master:src/net/http/httptest/server.go;l=249;drc=770f1de8c54256d5b17447028e47b201ba8e62c8

That server is itself stuck waiting on the following goroutine:

goroutine 18 [sleep]:
runtime.gopark(0x7367ee60ccff9?, 0x5fa4db?, 0x0?, 0x0?, 0x89cc00?)
        /usr/local/google/home/bcmills/go/src/runtime/proc.go:366 +0xd6 fp=0xc000133ae0 sp=0xc000133ac0 pc=0x439b96
time.Sleep(0x1dcd6500)
        /usr/local/google/home/bcmills/go/src/runtime/time.go:194 +0x12e fp=0xc000133b20 sp=0xc000133ae0 pc=0x46622e
net/http.(*conn).closeWriteAndWait(0xc0000b2000)
        /usr/local/google/home/bcmills/go/src/net/http/server.go:1734 +0x4f fp=0xc000133b58 sp=0xc000133b20 pc=0x6c786f
net/http.(*conn).serve(0xc0000b2000, {0x97f4a0, 0xc0000a00c0})
        /usr/local/google/home/bcmills/go/src/net/http/server.go:1975 +0x79a fp=0xc000133fb8 sp=0xc000133b58 pc=0x6c839a
net/http.(*Server).Serve.func3()
        /usr/local/google/home/bcmills/go/src/net/http/server.go:3071 +0x2e fp=0xc000133fe0 sp=0xc000133fb8 pc=0x6cddae
runtime.goexit()
        /usr/local/google/home/bcmills/go/src/runtime/asm_amd64.s:1571 +0x1 fp=0xc000133fe8 sp=0xc000133fe0 pc=0x4693c1
created by net/http.(*Server).Serve
        /usr/local/google/home/bcmills/go/src/net/http/server.go:3071 +0x4db

which was the very delay added for #3595. 🤔

@bcmills
Copy link
Member Author

@bcmills bcmills commented Feb 9, 2022

That delay occurs even if the client closes the request body immediately upon receiving the server's response, which seems like a bug to me (but perhaps a separate one).

@bcmills
Copy link
Member Author

@bcmills bcmills commented Feb 9, 2022

Indeed, I can replicate this failure mode by commenting out the time.Sleep in closeWriteAndWait. The test is marked parallel (presumably precisely because of that extra shutdown latency), and it's plausible that — especially on a race-detector-enabled builder — the scheduling overhead could perhaps push the test over that arbitrary 500ms boundary, or select the wrong branch when both the response and the write error are available.

(The latter scenario was described in #3595 (comment).)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation
Projects
None yet
Development

No branches or pull requests

1 participant