Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: TestIssue3595 failure with "write: broken pipe" #51104

Closed
bcmills opened this issue Feb 9, 2022 · 5 comments
Closed

net/http: TestIssue3595 failure with "write: broken pipe" #51104

bcmills opened this issue Feb 9, 2022 · 5 comments
Assignees
Labels
NeedsFix The path to resolution is known, but the work has not been done. Testing An issue that has been verified to require only test changes, not just a test failure.
Milestone

Comments

@bcmills
Copy link
Contributor

bcmills commented Feb 9, 2022

#!watchflakes
post <- pkg == "net/http" && test == "TestIssue3595" && `write: broken pipe`
--- FAIL: TestIssue3595 (0.67s)
    transport_test.go:2031: Post: Post "http://127.0.0.1:12511": write tcp 127.0.0.1:12512->127.0.0.1:12511: write: broken pipe
FAIL
FAIL	net/http

greplogs --dashboard -md -l -e 'FAIL: TestIssue3595 ' --since=2021-01-01

2022-02-08T23:38:05-0b7e586/freebsd-amd64-race

@bcmills
Copy link
Contributor Author

bcmills commented Feb 9, 2022

Just looking at the scenario this is testing, I suspect that it is a real bug in the net/http client (CC @neild).

My hypothesis is that the test is racing between filling up the server's input buffer and receiving its error response.

@bcmills bcmills added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Feb 9, 2022
@bcmills
Copy link
Contributor Author

bcmills commented Feb 9, 2022

There is another issue here that seems to be closely related.

Running the test locally, I noticed that it always takes a little over 500ms. Most of that time is spent waiting on the sync.WaitGroup that wraps the Serve call on the underlying (*net/http).Server:
https://cs.opensource.google/go/go/+/master:src/net/http/httptest/server.go;l=249;drc=770f1de8c54256d5b17447028e47b201ba8e62c8

That server is itself stuck waiting on the following goroutine:

goroutine 18 [sleep]:
runtime.gopark(0x7367ee60ccff9?, 0x5fa4db?, 0x0?, 0x0?, 0x89cc00?)
        /usr/local/google/home/bcmills/go/src/runtime/proc.go:366 +0xd6 fp=0xc000133ae0 sp=0xc000133ac0 pc=0x439b96
time.Sleep(0x1dcd6500)
        /usr/local/google/home/bcmills/go/src/runtime/time.go:194 +0x12e fp=0xc000133b20 sp=0xc000133ae0 pc=0x46622e
net/http.(*conn).closeWriteAndWait(0xc0000b2000)
        /usr/local/google/home/bcmills/go/src/net/http/server.go:1734 +0x4f fp=0xc000133b58 sp=0xc000133b20 pc=0x6c786f
net/http.(*conn).serve(0xc0000b2000, {0x97f4a0, 0xc0000a00c0})
        /usr/local/google/home/bcmills/go/src/net/http/server.go:1975 +0x79a fp=0xc000133fb8 sp=0xc000133b58 pc=0x6c839a
net/http.(*Server).Serve.func3()
        /usr/local/google/home/bcmills/go/src/net/http/server.go:3071 +0x2e fp=0xc000133fe0 sp=0xc000133fb8 pc=0x6cddae
runtime.goexit()
        /usr/local/google/home/bcmills/go/src/runtime/asm_amd64.s:1571 +0x1 fp=0xc000133fe8 sp=0xc000133fe0 pc=0x4693c1
created by net/http.(*Server).Serve
        /usr/local/google/home/bcmills/go/src/net/http/server.go:3071 +0x4db

which was the very delay added for #3595. 🤔

@bcmills
Copy link
Contributor Author

bcmills commented Feb 9, 2022

That delay occurs even if the client closes the request body immediately upon receiving the server's response, which seems like a bug to me (but perhaps a separate one).

@bcmills
Copy link
Contributor Author

bcmills commented Feb 9, 2022

Indeed, I can replicate this failure mode by commenting out the time.Sleep in closeWriteAndWait. The test is marked parallel (presumably precisely because of that extra shutdown latency), and it's plausible that — especially on a race-detector-enabled builder — the scheduling overhead could perhaps push the test over that arbitrary 500ms boundary, or select the wrong branch when both the response and the write error are available.

(The latter scenario was described in #3595 (comment).)

@gopherbot
Copy link

Change https://go.dev/cl/527196 mentions this issue: net/http: scale rstAvoidanceDelay to reduce test flakiness

@bcmills bcmills added the Testing An issue that has been verified to require only test changes, not just a test failure. label Sep 11, 2023
@bcmills bcmills modified the milestones: Backlog, Go1.22 Sep 11, 2023
@bcmills bcmills added the NeedsFix The path to resolution is known, but the work has not been done. label Sep 11, 2023
@gopherbot gopherbot removed the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Sep 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsFix The path to resolution is known, but the work has not been done. Testing An issue that has been verified to require only test changes, not just a test failure.
Projects
Status: Done
Development

No branches or pull requests

3 participants