Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: bundled x/net/http2.responseWriter hangs forever #42534

Closed
dtelyukh opened this issue Nov 12, 2020 · 22 comments
Closed

net/http: bundled x/net/http2.responseWriter hangs forever #42534

dtelyukh opened this issue Nov 12, 2020 · 22 comments
Milestone

Comments

@dtelyukh
Copy link

@dtelyukh dtelyukh commented Nov 12, 2020

What version of Go are you using (go version)?

go version go1.15.4 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/dionysius/.cache/go-build"
GOENV="/home/dionysius/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/dionysius/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/dionysius/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/lib/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build977072123=/tmp/go-build -gno-record-gcc-switches"

What did you do?

So, let's begin

High-Level Problem Description
I'm creating reverse proxy with Caddy Server and my own plugin. I use http/2 and Server Push. Sometimes requests hang forever. Here is screenshot from Chrome DevTools:
scr2020-11-12 15-02-06

Low-Level Problem Description
So, I started to debug this situation. I found that my code execution stuck at (http.responseWriter).Write(), which is an instance of http2responseWriter.
With help of pprof I found that lockup happens in two functions: http2serverConn.writeHeaders and http2serverConn.writeDataFromHandler - endless waiting of data from done channel.
Here is an illustration from pprof:
out02

Next I built go from source with adding some debug messages and start to dive deeper.
I found a problem with frames are sent to output. At this line N frames were pushed: https://github.com/golang/go/blob/go1.15.4/src/net/http/h2_bundle.go#L4692. After push-function scheduleFrameWrite-function is called. I watched into it and found that it often exit here: https://github.com/golang/go/blob/go1.15.4/src/net/http/h2_bundle.go#L4817. And only M (M < N) frames were popped from queue here: https://github.com/golang/go/blob/go1.15.4/src/net/http/h2_bundle.go#L4837

Pushed Frames
2020/11/12 12:12:35 push writeFrame for streamID=0 size=0
2020/11/12 12:12:35 push writeFrame for streamID=119 size=0
2020/11/12 12:12:35 push writeFrame for streamID=26 size=0
2020/11/12 12:12:35 push writeFrame for streamID=119 size=0
2020/11/12 12:12:35 push writeFrame for streamID=26 size=15169
2020/11/12 12:12:35 push writeFrame for streamID=28 size=0
2020/11/12 12:12:35 push writeFrame for streamID=119 size=0
2020/11/12 12:12:35 push writeFrame for streamID=26 size=0
2020/11/12 12:12:35 push writeFrame for streamID=28 size=5566
2020/11/12 12:12:35 push writeFrame for streamID=30 size=0
2020/11/12 12:12:35 push writeFrame for streamID=119 size=0
2020/11/12 12:12:35 push writeFrame for streamID=28 size=0
2020/11/12 12:12:35 push writeFrame for streamID=30 size=77162
2020/11/12 12:12:35 push writeFrame for streamID=32 size=0
2020/11/12 12:12:35 push writeFrame for streamID=119 size=0
2020/11/12 12:12:35 push writeFrame for streamID=32 size=354
2020/11/12 12:12:35 push writeFrame for streamID=30 size=0
2020/11/12 12:12:35 push writeFrame for streamID=119 size=0
2020/11/12 12:12:35 push writeFrame for streamID=34 size=0
2020/11/12 12:12:35 push writeFrame for streamID=119 size=0
2020/11/12 12:12:35 push writeFrame for streamID=34 size=483
2020/11/12 12:12:35 push writeFrame for streamID=119 size=0
2020/11/12 12:12:35 push writeFrame for streamID=38 size=0
2020/11/12 12:12:35 push writeFrame for streamID=36 size=0
2020/11/12 12:12:35 push writeFrame for streamID=38 size=27485
2020/11/12 12:12:35 push writeFrame for streamID=40 size=0
2020/11/12 12:12:35 push writeFrame for streamID=119 size=0
2020/11/12 12:12:35 push writeFrame for streamID=36 size=6726
2020/11/12 12:12:35 push writeFrame for streamID=40 size=6293
2020/11/12 12:12:35 push writeFrame for streamID=119 size=0
2020/11/12 12:12:35 push writeFrame for streamID=42 size=0
2020/11/12 12:12:35 push writeFrame for streamID=119 size=0
2020/11/12 12:12:35 push writeFrame for streamID=119 size=0
2020/11/12 12:12:35 push writeFrame for streamID=44 size=0
2020/11/12 12:12:35 push writeFrame for streamID=42 size=11249
2020/11/12 12:12:35 push writeFrame for streamID=46 size=0
2020/11/12 12:12:35 push writeFrame for streamID=40 size=0
2020/11/12 12:12:35 push writeFrame for streamID=48 size=0
2020/11/12 12:12:35 push writeFrame for streamID=119 size=0
2020/11/12 12:12:35 push writeFrame for streamID=44 size=9293
2020/11/12 12:12:35 push writeFrame for streamID=46 size=2626
2020/11/12 12:12:35 push writeFrame for streamID=48 size=346
2020/11/12 12:12:35 push writeFrame for streamID=119 size=6381
2020/11/12 12:12:35 push writeFrame for streamID=42 size=0
2020/11/12 12:12:35 push writeFrame for streamID=44 size=0
2020/11/12 12:12:35 push writeFrame for streamID=36 size=0
2020/11/12 12:12:35 push writeFrame for streamID=119 size=0
2020/11/12 12:12:35 push writeFrame for streamID=38 size=0
2020/11/12 12:12:36 push writeFrame for streamID=121 size=0
2020/11/12 12:12:36 push writeFrame for streamID=121 size=1021
2020/11/12 12:12:36 push writeFrame for streamID=123 size=0
2020/11/12 12:12:36 push writeFrame for streamID=125 size=0
2020/11/12 12:12:36 push writeFrame for streamID=127 size=0
2020/11/12 12:12:36 push writeFrame for streamID=129 size=0
2020/11/12 12:12:36 push writeFrame for streamID=131 size=0
2020/11/12 12:12:36 push writeFrame for streamID=133 size=0
2020/11/12 12:12:36 push writeFrame for streamID=135 size=0
2020/11/12 12:12:36 push writeFrame for streamID=137 size=0
2020/11/12 12:12:36 push writeFrame for streamID=139 size=0
2020/11/12 12:12:36 push writeFrame for streamID=141 size=0
2020/11/12 12:12:36 push writeFrame for streamID=143 size=0
2020/11/12 12:12:36 push writeFrame for streamID=145 size=0
2020/11/12 12:12:36 push writeFrame for streamID=147 size=0
2020/11/12 12:12:36 push writeFrame for streamID=149 size=0
2020/11/12 12:12:36 push writeFrame for streamID=151 size=0
2020/11/12 12:12:36 push writeFrame for streamID=153 size=0
2020/11/12 12:12:36 push writeFrame for streamID=155 size=0
2020/11/12 12:12:36 push writeFrame for streamID=157 size=0
2020/11/12 12:12:36 push writeFrame for streamID=159 size=0
2020/11/12 12:12:36 push writeFrame for streamID=161 size=0
2020/11/12 12:12:36 push writeFrame for streamID=163 size=0
2020/11/12 12:12:36 push writeFrame for streamID=165 size=0
2020/11/12 12:12:36 push writeFrame for streamID=167 size=0
2020/11/12 12:12:36 push writeFrame for streamID=169 size=0
2020/11/12 12:12:36 push writeFrame for streamID=171 size=0
2020/11/12 12:12:36 push writeFrame for streamID=173 size=0
2020/11/12 12:12:36 push writeFrame for streamID=175 size=0
2020/11/12 12:12:36 push writeFrame for streamID=201 size=0
2020/11/12 12:12:36 push writeFrame for streamID=179 size=0
2020/11/12 12:12:36 push writeFrame for streamID=181 size=0
2020/11/12 12:12:36 push writeFrame for streamID=183 size=0
2020/11/12 12:12:36 push writeFrame for streamID=185 size=0
2020/11/12 12:12:36 push writeFrame for streamID=187 size=0
2020/11/12 12:12:36 push writeFrame for streamID=207 size=0
2020/11/12 12:12:36 push writeFrame for streamID=203 size=0
2020/11/12 12:12:36 push writeFrame for streamID=209 size=0
2020/11/12 12:12:36 push writeFrame for streamID=205 size=0
2020/11/12 12:12:36 push writeFrame for streamID=189 size=0
2020/11/12 12:12:36 push writeFrame for streamID=213 size=0
2020/11/12 12:12:36 push writeFrame for streamID=211 size=0
2020/11/12 12:12:36 push writeFrame for streamID=221 size=0
2020/11/12 12:12:36 push writeFrame for streamID=193 size=0
2020/11/12 12:12:36 push writeFrame for streamID=191 size=0
2020/11/12 12:12:36 push writeFrame for streamID=197 size=0
2020/11/12 12:12:36 push writeFrame for streamID=199 size=0
2020/11/12 12:12:36 push writeFrame for streamID=195 size=0
2020/11/12 12:12:36 push writeFrame for streamID=215 size=0
2020/11/12 12:12:36 push writeFrame for streamID=217 size=0
2020/11/12 12:12:36 push writeFrame for streamID=223 size=0
2020/11/12 12:12:36 push writeFrame for streamID=219 size=0
2020/11/12 12:12:36 push writeFrame for streamID=225 size=0
2020/11/12 12:12:36 push writeFrame for streamID=227 size=0
2020/11/12 12:12:36 push writeFrame for streamID=123 size=3464
2020/11/12 12:12:36 push writeFrame for streamID=225 size=312351
2020/11/12 12:12:36 push writeFrame for streamID=225 size=0
2020/11/12 12:12:36 push writeFrame for streamID=229 size=0
2020/11/12 12:12:36 push writeFrame for streamID=231 size=0
2020/11/12 12:12:37 push writeFrame for streamID=177 size=0
Popped Frames
2020/11/12 12:12:35 sched pop writeFrame for streamID=0 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=119 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=26 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=119 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=26 size=15169
2020/11/12 12:12:35 sched pop writeFrame for streamID=28 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=119 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=26 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=28 size=5566
2020/11/12 12:12:35 sched pop writeFrame for streamID=30 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=119 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=28 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=30 size=16384
2020/11/12 12:12:35 sched pop writeFrame for streamID=30 size=16384
2020/11/12 12:12:35 sched pop writeFrame for streamID=30 size=16384
2020/11/12 12:12:35 sched pop writeFrame for streamID=30 size=16384
2020/11/12 12:12:35 sched pop writeFrame for streamID=30 size=11626
2020/11/12 12:12:35 sched pop writeFrame for streamID=32 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=119 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=32 size=354
2020/11/12 12:12:35 sched pop writeFrame for streamID=30 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=119 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=34 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=119 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=119 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=38 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=34 size=483
2020/11/12 12:12:35 sched pop writeFrame for streamID=36 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=119 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=40 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=38 size=16384
2020/11/12 12:12:35 sched pop writeFrame for streamID=119 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=40 size=6293
2020/11/12 12:12:35 sched pop writeFrame for streamID=119 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=42 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=119 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=44 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=42 size=11249
2020/11/12 12:12:35 sched pop writeFrame for streamID=46 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=119 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=48 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=44 size=9293
2020/11/12 12:12:35 sched pop writeFrame for streamID=48 size=346
2020/11/12 12:12:35 sched pop writeFrame for streamID=46 size=2626
2020/11/12 12:12:35 sched pop writeFrame for streamID=36 size=6726
2020/11/12 12:12:35 sched pop writeFrame for streamID=119 size=6381
2020/11/12 12:12:35 sched pop writeFrame for streamID=38 size=11101
2020/11/12 12:12:35 sched pop writeFrame for streamID=40 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=42 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=44 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=36 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=119 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=38 size=0
2020/11/12 12:12:36 sched pop writeFrame for streamID=121 size=0
2020/11/12 12:12:36 sched pop writeFrame for streamID=121 size=1021
2020/11/12 12:12:36 sched pop writeFrame for streamID=123 size=0
2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=0
2020/11/12 12:12:36 sched pop writeFrame for streamID=123 size=3464
2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=16384
2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=16384
2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=16384
2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=16384
2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=16384
2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=16384
2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=16384
2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=16384
2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=16384
2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=16384
2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=16384
2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=16384
2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=16384
2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=16384
2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=16384
2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=16384
2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=16384
2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=16384
2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=16384
2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=1055
2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=0

What did you expect to see?

No lockups.

What did you see instead?

Random lockups.

@dtelyukh
Copy link
Author

@dtelyukh dtelyukh commented Nov 12, 2020

Possible related issue #23559

@networkimprov
Copy link

@networkimprov networkimprov commented Nov 12, 2020

@fraenkel
Copy link
Contributor

@fraenkel fraenkel commented Nov 13, 2020

@dtelyukh We are going to need something that can reproduce the issue. It would also help to enable http2 debug and a thread dump when it hangs.

@fraenkel
Copy link
Contributor

@fraenkel fraenkel commented Nov 13, 2020

Your title says the bundled version of http2 has this issue. Are you implying that if you use the latest x/net/http2 you dont?

@dtelyukh
Copy link
Author

@dtelyukh dtelyukh commented Nov 13, 2020

@dtelyukh We are going to need something that can reproduce the issue. It would also help to enable http2 debug and a thread dump when it hangs.

It's not easy to prepare code, which can reproduce this problem with 100% guarantee, but I think I could try.
Here is debug.log for GODEBUG=http2debug=2
http2.debug.log
and goroutines dump with pprof
goroutine.dump.log

Your title says the bundled version of http2 has this issue. Are you implying that if you use the latest x/net/http2 you dont?

h2_bundle.go used by third-party code. I didn't try to use x/net/http2 directly. Do you mean that I should do that?

@fraenkel
Copy link
Contributor

@fraenkel fraenkel commented Nov 13, 2020

Don't worry. I am going to need something that can reproduce this issue.
I can see why nothing is making progress but I don't know why.
The debug log is incomplete or slightly broken, but from what I do see there is an oddity.

2020/11/13 10:44:00 http2: Framer 0xc000af61c0: read HEADERS flags=END_STREAM|END_HEADERS|PRIORITY stream=1511 len=21
2020/11/13 10:44:00 http2: Framer 0xc000af61c0: wrote PUSH_PROMISE flags=END_HEADERS stream=1511 len=293
2020/11/13 10:44:00 http2: Framer 0xc000af61c0: read PRIORITY stream=314 len=5
2020/11/13 10:44:00 http2: Framer 0xc000af61c0: wrote PUSH_PROMISE flags=END_HEADERS stream=1511 len=33
2020/11/13 10:44:00 http2: Framer 0xc000af61c0: wrote HEADERS flags=END_HEADERS stream=314 len=120
2020/11/13 10:44:00 http2: Framer 0xc000af61c0: wrote PUSH_PROMISE flags=END_HEADERS stream=1511 len=44
2020/11/13 10:44:00 http2: Framer 0xc000af61c0: wrote HEADERS flags=END_HEADERS stream=316 len=115

Notice the stream for the PUSH is 1511 but the above is the first time I see that Framer. And the rest are in the 300s. I don't exactly see how this happened.
There are multiple PUSH_PROMISE all with the same stream id which is also odd.

@dtelyukh
Copy link
Author

@dtelyukh dtelyukh commented Nov 13, 2020

I truncated log-file after each successful request until it was hanged. Maybe that is why the log-file was broken.

I attach here other log-file, which was made when I caught problem from the first time. This log-file was never truncated.
http2.debug.log

@cagedmantis cagedmantis added this to the Backlog milestone Nov 13, 2020
@dtelyukh
Copy link
Author

@dtelyukh dtelyukh commented Nov 14, 2020

@fraenkel, we prepared test application for problem reproducing. My apologies for so complicated app. We cannot extract some small piece of code, because we don't know where is the problem exactly.
I sent credentials to michael.fraenkel@gmail.com.

  1. Point Chromium-based browser to https://cardonecapital.hc04.dorofeev.me/.
  2. Press F12 and check "Disable cache" box.
  3. Press F5 until request will hangs.

To have more chances to catch the problem it should to remove proxy cache:

  1. Stop Caddy Server kill -SIGTERM <caddy process id>
  2. rm -fR /home/user/caddy-cache
  3. ./caddy run&

To patch or debug server:

  1. Custom cache plugin code is here: /home/user/smart-cache
  2. Caddy Server code is here: /home/user/go/src/github.com/caddyserver/caddy
  3. To rebuild Caddy:
  • cd /home/user/go/src/github.com/caddyserver/caddy/cmd/caddy
  • CGO_ENABLED=0 go build
  • mv ./caddy ~/caddy
  • cd
  • sudo setcap 'cap_net_bind_service=+ep' ./caddy
  • kill -SIGTERM <caddy process id>
  • ./caddy run&
@fraenkel
Copy link
Contributor

@fraenkel fraenkel commented Nov 15, 2020

@dtelyukh I did find a way to cause the hang locally, from my machine it would never happen.
for i in {1..1000}; do echo $i; nghttp https://cardonecapital.hc04.dorofeev.me/ -n; done would eventually hang.
Once I attempted to compile a new Go, I could never get caddy to rebuild. I always ended up with
qtls init failure. Attempting to fix that, I didn't realize at the time that the src tree was a bit special so I can no longer make any progress since I cannot download your smart-cache module.
If you could fix the tree, at least next time I know to make a copy of the entire tree before doing anything. I am a bit concerned that a simple go mod tidy prevented any further compilation of caddy.

@fraenkel
Copy link
Contributor

@fraenkel fraenkel commented Nov 15, 2020

Never mind, I got it working again....

@fraenkel
Copy link
Contributor

@fraenkel fraenkel commented Nov 15, 2020

So one thing I did verify is that using the latest golang/x/net/http2 code does not cause the hang I see with my simple testcase.

@dtelyukh
Copy link
Author

@dtelyukh dtelyukh commented Nov 15, 2020

@fraenkel, what can I help?

@fraenkel
Copy link
Contributor

@fraenkel fraenkel commented Nov 15, 2020

You can see the one line change I made to caddyserver with a go mod tidy. See if this new version hangs for you as well.

@dtelyukh
Copy link
Author

@dtelyukh dtelyukh commented Nov 15, 2020

@fraenkel, this new version is never hangs for us. And also we noticed that app become faster. 👍 Thank you!

@dtelyukh
Copy link
Author

@dtelyukh dtelyukh commented Nov 15, 2020

Full page load time (with all resources) is 2% less than with old http/2, and median absolute deviation is 1% less too. So it's both faster, and shows more stable performance.

mholt pushed a commit to caddyserver/caddy that referenced this issue Nov 18, 2020
@dtelyukh
Copy link
Author

@dtelyukh dtelyukh commented Nov 19, 2020

@fraenkel, should I close this ticket?

@fraenkel
Copy link
Contributor

@fraenkel fraenkel commented Nov 19, 2020

yes, given there is a solution and this should be fixed in 1.16 although one should verify that is true.

@dtelyukh dtelyukh closed this Nov 19, 2020
mholt added a commit to caddyserver/caddy that referenced this issue Nov 20, 2020
* implement default values for header directive

closes #3804

* remove `set_default` header op and rely on "require" handler instead

This has the following advantages over the previous attempt:

- It does not introduce a new operation for headers, but rather nicely
  extends over an existing feature in the header handler.
- It removes the need to specify the header as "deferred" because it is
  already implicitely deferred by the use of the require handler. This
  should be less confusing to the user.

* add integration test for header directive in caddyfile

* bubble up errors when parsing caddyfile header directive

* don't export unnecessarily and don't canonicalize headers unnecessarily

* fix response headers not passed in blocks

* caddyfile: fix clash when using default header in block

Each header is now set in a separate handler so that it doesn't clash
with other headers set/added/deleted in the same block.

* caddyhttp: New idle_timeout default of 5m

* reverseproxy: fix random hangs on http/2 requests with server push (#3875)

see golang/go#42534

* Refactor and cleanup with improvements

* More specific link

Co-authored-by: Matthew Holt <mholt@users.noreply.github.com>
Co-authored-by: Денис Телюх <telyukh.denis@gmail.com>
@dtelyukh
Copy link
Author

@dtelyukh dtelyukh commented Dec 3, 2020

@fraenkel, this bug is still exist. But we found more clear way to reproduce it.

An issue in Caddy's repository: caddyserver/caddy#3896

How to reproduce

It depends on the proxied website and caddy config, and some random factors, thus it occurs with different frequency on different hardware. The steps are:

  1. Add this line to /etc/hosts 127.0.0.1 terem-pro.localhost
  2. Clone Caddy repository git clone https://github.com/caddyserver/caddy.git
  3. Build binary from master (a patch with the version from Nov 10, 2020 of x/net is already included) cd caddy/cmd/caddy && go build
  4. sudo setcap 'cap_net_bind_service=+ep' ./caddy
  5. Create Caddyfile with this content
https://terem-pro.localhost {                                                 
    handle {                                                                                                                                                           
        reverse_proxy https://www.terem-pro.ru {
            header_up host {http.reverse_proxy.upstream.host}
        }                                
                                         
        push / {                         
            /local/components/terem/catalog.list/templates/index.best.seller/style.css
            /local/components/terem/new_services.content/templates/home.banner.lots/style.css
            /local/components/terem/slider.blocks/templates/slider.useful/style.css
            /local/components/terem/standard.blocks/templates/call.action.white/style.css
            /local/components/terem/review.list/templates/carousel.home/style.css
            /local/components/terem/standard.blocks/templates/promo.red.home/style.css
            /local/components/terem/promotion.list/templates/home.slider/style.css
            /local/components/terem/form.form/templates/template.pdf/style.css
            /local/templates/terem/components/bitrix/menu/template.header.menu.top/desktop-menu.css
            /local/components/terem/form.form/templates/template.taxi/style.css
            /assets/resources/css/home.css
            /local/templates/terem/components/bitrix/menu/template.header.menu-mobile/style_menu.css
            /local/components/terem/catalog.type.list/templates/.default/style.css
            /assets/resources/css/styles.css
            /bitrix/cache/css/s1/terem/template_ad73b02503569e1113abf0b013fdbb28/template_ad73b02503569e1113abf0b013fdbb28_v1.css?16067202133580
            /bitrix/cache/css/s1/terem/page_074396ca6d41424fe878cb365c109aa1/page_074396ca6d41424fe878cb365c109aa1_v1.css?160672023225970
        }
    }
}
  1. Run Caddy server ./caddy run
  2. Open developer tools network tab, to visually see the hangs; checking the "disable cache" toggle will help reproduce the problem faster, but is not necessary
  3. Navigate to / page in a browser (i.e., https://terem-pro.localhost)
  4. Wait till it fully loads
  5. If it didn't hang on step 8 - hit f5, and again, wait till it fully loads; repeat several times if needed
  6. On our test server, it usually hangs after 2-3 reloads. On some devices, it might require 10-15 attempts but still hangs at some point.
@dtelyukh dtelyukh reopened this Dec 3, 2020
@fraenkel
Copy link
Contributor

@fraenkel fraenkel commented Dec 7, 2020

I reproduced a hang but its using the bundle http2 stack.

goroutine 5252 [select, 3 minutes]:
net/http.(*http2serverConn).writeHeaders(0xc000582900, 0xc000feec60, 0xc0003c93b0, 0xa, 0x24a6800)
	/snap/go/6745/src/net/http/h2_bundle.go:5753 +0x172
net/http.(*http2responseWriterState).writeChunk(0xc000be7200, 0xc000fc1000, 0xdf, 0x1000, 0xc000fcfdf0, 0x1, 0xc0009a8180)
	/snap/go/6745/src/net/http/h2_bundle.go:6020 +0x3bf
@dtelyukh
Copy link
Author

@dtelyukh dtelyukh commented Dec 8, 2020

Write me, please, to fix it should bundle a recent version of http2-library? Or does this library itself need to be fixed?
Is there a some workaround? Should we use http2-library directly?

@dtelyukh
Copy link
Author

@dtelyukh dtelyukh commented Dec 8, 2020

Looks like with this hack it never hangs
http2.ConfigureServer(s, nil)
With it bundled version is not use, doesn't it?

@dtelyukh
Copy link
Author

@dtelyukh dtelyukh commented Dec 9, 2020

The solution is to use explicitly x/net/http2.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
4 participants
You can’t perform that action at this time.