Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: bundled x/net/http2.responseWriter hangs forever #42534

Open
dtelyukh opened this issue Nov 12, 2020 · 27 comments
Open

net/http: bundled x/net/http2.responseWriter hangs forever #42534

dtelyukh opened this issue Nov 12, 2020 · 27 comments

Comments

@dtelyukh
Copy link

@dtelyukh dtelyukh commented Nov 12, 2020

What version of Go are you using (go version)?

go version go1.15.4 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/dionysius/.cache/go-build"
GOENV="/home/dionysius/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/dionysius/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/dionysius/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/lib/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build977072123=/tmp/go-build -gno-record-gcc-switches"

What did you do?

So, let's begin

High-Level Problem Description
I'm creating reverse proxy with Caddy Server and my own plugin. I use http/2 and Server Push. Sometimes requests hang forever. Here is screenshot from Chrome DevTools:
scr2020-11-12 15-02-06

Low-Level Problem Description
So, I started to debug this situation. I found that my code execution stuck at (http.responseWriter).Write(), which is an instance of http2responseWriter.
With help of pprof I found that lockup happens in two functions: http2serverConn.writeHeaders and http2serverConn.writeDataFromHandler - endless waiting of data from done channel.
Here is an illustration from pprof:
out02

Next I built go from source with adding some debug messages and start to dive deeper.
I found a problem with frames are sent to output. At this line N frames were pushed: https://github.com/golang/go/blob/go1.15.4/src/net/http/h2_bundle.go#L4692. After push-function scheduleFrameWrite-function is called. I watched into it and found that it often exit here: https://github.com/golang/go/blob/go1.15.4/src/net/http/h2_bundle.go#L4817. And only M (M < N) frames were popped from queue here: https://github.com/golang/go/blob/go1.15.4/src/net/http/h2_bundle.go#L4837

Pushed Frames
2020/11/12 12:12:35 push writeFrame for streamID=0 size=0
2020/11/12 12:12:35 push writeFrame for streamID=119 size=0
2020/11/12 12:12:35 push writeFrame for streamID=26 size=0
2020/11/12 12:12:35 push writeFrame for streamID=119 size=0
2020/11/12 12:12:35 push writeFrame for streamID=26 size=15169
2020/11/12 12:12:35 push writeFrame for streamID=28 size=0
2020/11/12 12:12:35 push writeFrame for streamID=119 size=0
2020/11/12 12:12:35 push writeFrame for streamID=26 size=0
2020/11/12 12:12:35 push writeFrame for streamID=28 size=5566
2020/11/12 12:12:35 push writeFrame for streamID=30 size=0
2020/11/12 12:12:35 push writeFrame for streamID=119 size=0
2020/11/12 12:12:35 push writeFrame for streamID=28 size=0
2020/11/12 12:12:35 push writeFrame for streamID=30 size=77162
2020/11/12 12:12:35 push writeFrame for streamID=32 size=0
2020/11/12 12:12:35 push writeFrame for streamID=119 size=0
2020/11/12 12:12:35 push writeFrame for streamID=32 size=354
2020/11/12 12:12:35 push writeFrame for streamID=30 size=0
2020/11/12 12:12:35 push writeFrame for streamID=119 size=0
2020/11/12 12:12:35 push writeFrame for streamID=34 size=0
2020/11/12 12:12:35 push writeFrame for streamID=119 size=0
2020/11/12 12:12:35 push writeFrame for streamID=34 size=483
2020/11/12 12:12:35 push writeFrame for streamID=119 size=0
2020/11/12 12:12:35 push writeFrame for streamID=38 size=0
2020/11/12 12:12:35 push writeFrame for streamID=36 size=0
2020/11/12 12:12:35 push writeFrame for streamID=38 size=27485
2020/11/12 12:12:35 push writeFrame for streamID=40 size=0
2020/11/12 12:12:35 push writeFrame for streamID=119 size=0
2020/11/12 12:12:35 push writeFrame for streamID=36 size=6726
2020/11/12 12:12:35 push writeFrame for streamID=40 size=6293
2020/11/12 12:12:35 push writeFrame for streamID=119 size=0
2020/11/12 12:12:35 push writeFrame for streamID=42 size=0
2020/11/12 12:12:35 push writeFrame for streamID=119 size=0
2020/11/12 12:12:35 push writeFrame for streamID=119 size=0
2020/11/12 12:12:35 push writeFrame for streamID=44 size=0
2020/11/12 12:12:35 push writeFrame for streamID=42 size=11249
2020/11/12 12:12:35 push writeFrame for streamID=46 size=0
2020/11/12 12:12:35 push writeFrame for streamID=40 size=0
2020/11/12 12:12:35 push writeFrame for streamID=48 size=0
2020/11/12 12:12:35 push writeFrame for streamID=119 size=0
2020/11/12 12:12:35 push writeFrame for streamID=44 size=9293
2020/11/12 12:12:35 push writeFrame for streamID=46 size=2626
2020/11/12 12:12:35 push writeFrame for streamID=48 size=346
2020/11/12 12:12:35 push writeFrame for streamID=119 size=6381
2020/11/12 12:12:35 push writeFrame for streamID=42 size=0
2020/11/12 12:12:35 push writeFrame for streamID=44 size=0
2020/11/12 12:12:35 push writeFrame for streamID=36 size=0
2020/11/12 12:12:35 push writeFrame for streamID=119 size=0
2020/11/12 12:12:35 push writeFrame for streamID=38 size=0
2020/11/12 12:12:36 push writeFrame for streamID=121 size=0
2020/11/12 12:12:36 push writeFrame for streamID=121 size=1021
2020/11/12 12:12:36 push writeFrame for streamID=123 size=0
2020/11/12 12:12:36 push writeFrame for streamID=125 size=0
2020/11/12 12:12:36 push writeFrame for streamID=127 size=0
2020/11/12 12:12:36 push writeFrame for streamID=129 size=0
2020/11/12 12:12:36 push writeFrame for streamID=131 size=0
2020/11/12 12:12:36 push writeFrame for streamID=133 size=0
2020/11/12 12:12:36 push writeFrame for streamID=135 size=0
2020/11/12 12:12:36 push writeFrame for streamID=137 size=0
2020/11/12 12:12:36 push writeFrame for streamID=139 size=0
2020/11/12 12:12:36 push writeFrame for streamID=141 size=0
2020/11/12 12:12:36 push writeFrame for streamID=143 size=0
2020/11/12 12:12:36 push writeFrame for streamID=145 size=0
2020/11/12 12:12:36 push writeFrame for streamID=147 size=0
2020/11/12 12:12:36 push writeFrame for streamID=149 size=0
2020/11/12 12:12:36 push writeFrame for streamID=151 size=0
2020/11/12 12:12:36 push writeFrame for streamID=153 size=0
2020/11/12 12:12:36 push writeFrame for streamID=155 size=0
2020/11/12 12:12:36 push writeFrame for streamID=157 size=0
2020/11/12 12:12:36 push writeFrame for streamID=159 size=0
2020/11/12 12:12:36 push writeFrame for streamID=161 size=0
2020/11/12 12:12:36 push writeFrame for streamID=163 size=0
2020/11/12 12:12:36 push writeFrame for streamID=165 size=0
2020/11/12 12:12:36 push writeFrame for streamID=167 size=0
2020/11/12 12:12:36 push writeFrame for streamID=169 size=0
2020/11/12 12:12:36 push writeFrame for streamID=171 size=0
2020/11/12 12:12:36 push writeFrame for streamID=173 size=0
2020/11/12 12:12:36 push writeFrame for streamID=175 size=0
2020/11/12 12:12:36 push writeFrame for streamID=201 size=0
2020/11/12 12:12:36 push writeFrame for streamID=179 size=0
2020/11/12 12:12:36 push writeFrame for streamID=181 size=0
2020/11/12 12:12:36 push writeFrame for streamID=183 size=0
2020/11/12 12:12:36 push writeFrame for streamID=185 size=0
2020/11/12 12:12:36 push writeFrame for streamID=187 size=0
2020/11/12 12:12:36 push writeFrame for streamID=207 size=0
2020/11/12 12:12:36 push writeFrame for streamID=203 size=0
2020/11/12 12:12:36 push writeFrame for streamID=209 size=0
2020/11/12 12:12:36 push writeFrame for streamID=205 size=0
2020/11/12 12:12:36 push writeFrame for streamID=189 size=0
2020/11/12 12:12:36 push writeFrame for streamID=213 size=0
2020/11/12 12:12:36 push writeFrame for streamID=211 size=0
2020/11/12 12:12:36 push writeFrame for streamID=221 size=0
2020/11/12 12:12:36 push writeFrame for streamID=193 size=0
2020/11/12 12:12:36 push writeFrame for streamID=191 size=0
2020/11/12 12:12:36 push writeFrame for streamID=197 size=0
2020/11/12 12:12:36 push writeFrame for streamID=199 size=0
2020/11/12 12:12:36 push writeFrame for streamID=195 size=0
2020/11/12 12:12:36 push writeFrame for streamID=215 size=0
2020/11/12 12:12:36 push writeFrame for streamID=217 size=0
2020/11/12 12:12:36 push writeFrame for streamID=223 size=0
2020/11/12 12:12:36 push writeFrame for streamID=219 size=0
2020/11/12 12:12:36 push writeFrame for streamID=225 size=0
2020/11/12 12:12:36 push writeFrame for streamID=227 size=0
2020/11/12 12:12:36 push writeFrame for streamID=123 size=3464
2020/11/12 12:12:36 push writeFrame for streamID=225 size=312351
2020/11/12 12:12:36 push writeFrame for streamID=225 size=0
2020/11/12 12:12:36 push writeFrame for streamID=229 size=0
2020/11/12 12:12:36 push writeFrame for streamID=231 size=0
2020/11/12 12:12:37 push writeFrame for streamID=177 size=0
Popped Frames
2020/11/12 12:12:35 sched pop writeFrame for streamID=0 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=119 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=26 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=119 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=26 size=15169
2020/11/12 12:12:35 sched pop writeFrame for streamID=28 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=119 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=26 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=28 size=5566
2020/11/12 12:12:35 sched pop writeFrame for streamID=30 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=119 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=28 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=30 size=16384
2020/11/12 12:12:35 sched pop writeFrame for streamID=30 size=16384
2020/11/12 12:12:35 sched pop writeFrame for streamID=30 size=16384
2020/11/12 12:12:35 sched pop writeFrame for streamID=30 size=16384
2020/11/12 12:12:35 sched pop writeFrame for streamID=30 size=11626
2020/11/12 12:12:35 sched pop writeFrame for streamID=32 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=119 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=32 size=354
2020/11/12 12:12:35 sched pop writeFrame for streamID=30 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=119 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=34 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=119 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=119 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=38 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=34 size=483
2020/11/12 12:12:35 sched pop writeFrame for streamID=36 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=119 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=40 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=38 size=16384
2020/11/12 12:12:35 sched pop writeFrame for streamID=119 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=40 size=6293
2020/11/12 12:12:35 sched pop writeFrame for streamID=119 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=42 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=119 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=44 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=42 size=11249
2020/11/12 12:12:35 sched pop writeFrame for streamID=46 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=119 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=48 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=44 size=9293
2020/11/12 12:12:35 sched pop writeFrame for streamID=48 size=346
2020/11/12 12:12:35 sched pop writeFrame for streamID=46 size=2626
2020/11/12 12:12:35 sched pop writeFrame for streamID=36 size=6726
2020/11/12 12:12:35 sched pop writeFrame for streamID=119 size=6381
2020/11/12 12:12:35 sched pop writeFrame for streamID=38 size=11101
2020/11/12 12:12:35 sched pop writeFrame for streamID=40 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=42 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=44 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=36 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=119 size=0
2020/11/12 12:12:35 sched pop writeFrame for streamID=38 size=0
2020/11/12 12:12:36 sched pop writeFrame for streamID=121 size=0
2020/11/12 12:12:36 sched pop writeFrame for streamID=121 size=1021
2020/11/12 12:12:36 sched pop writeFrame for streamID=123 size=0
2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=0
2020/11/12 12:12:36 sched pop writeFrame for streamID=123 size=3464
2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=16384
2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=16384
2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=16384
2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=16384
2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=16384
2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=16384
2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=16384
2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=16384
2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=16384
2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=16384
2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=16384
2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=16384
2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=16384
2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=16384
2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=16384
2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=16384
2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=16384
2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=16384
2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=16384
2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=1055
2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=0

What did you expect to see?

No lockups.

What did you see instead?

Random lockups.

@dtelyukh
Copy link
Author

@dtelyukh dtelyukh commented Nov 12, 2020

Possible related issue #23559

@networkimprov
Copy link

@networkimprov networkimprov commented Nov 12, 2020

@fraenkel
Copy link
Contributor

@fraenkel fraenkel commented Nov 13, 2020

@dtelyukh We are going to need something that can reproduce the issue. It would also help to enable http2 debug and a thread dump when it hangs.

@fraenkel
Copy link
Contributor

@fraenkel fraenkel commented Nov 13, 2020

Your title says the bundled version of http2 has this issue. Are you implying that if you use the latest x/net/http2 you dont?

@dtelyukh
Copy link
Author

@dtelyukh dtelyukh commented Nov 13, 2020

@dtelyukh We are going to need something that can reproduce the issue. It would also help to enable http2 debug and a thread dump when it hangs.

It's not easy to prepare code, which can reproduce this problem with 100% guarantee, but I think I could try.
Here is debug.log for GODEBUG=http2debug=2
http2.debug.log
and goroutines dump with pprof
goroutine.dump.log

Your title says the bundled version of http2 has this issue. Are you implying that if you use the latest x/net/http2 you dont?

h2_bundle.go used by third-party code. I didn't try to use x/net/http2 directly. Do you mean that I should do that?

@fraenkel
Copy link
Contributor

@fraenkel fraenkel commented Nov 13, 2020

Don't worry. I am going to need something that can reproduce this issue.
I can see why nothing is making progress but I don't know why.
The debug log is incomplete or slightly broken, but from what I do see there is an oddity.

2020/11/13 10:44:00 http2: Framer 0xc000af61c0: read HEADERS flags=END_STREAM|END_HEADERS|PRIORITY stream=1511 len=21
2020/11/13 10:44:00 http2: Framer 0xc000af61c0: wrote PUSH_PROMISE flags=END_HEADERS stream=1511 len=293
2020/11/13 10:44:00 http2: Framer 0xc000af61c0: read PRIORITY stream=314 len=5
2020/11/13 10:44:00 http2: Framer 0xc000af61c0: wrote PUSH_PROMISE flags=END_HEADERS stream=1511 len=33
2020/11/13 10:44:00 http2: Framer 0xc000af61c0: wrote HEADERS flags=END_HEADERS stream=314 len=120
2020/11/13 10:44:00 http2: Framer 0xc000af61c0: wrote PUSH_PROMISE flags=END_HEADERS stream=1511 len=44
2020/11/13 10:44:00 http2: Framer 0xc000af61c0: wrote HEADERS flags=END_HEADERS stream=316 len=115

Notice the stream for the PUSH is 1511 but the above is the first time I see that Framer. And the rest are in the 300s. I don't exactly see how this happened.
There are multiple PUSH_PROMISE all with the same stream id which is also odd.

@dtelyukh
Copy link
Author

@dtelyukh dtelyukh commented Nov 13, 2020

I truncated log-file after each successful request until it was hanged. Maybe that is why the log-file was broken.

I attach here other log-file, which was made when I caught problem from the first time. This log-file was never truncated.
http2.debug.log

@cagedmantis cagedmantis added this to the Backlog milestone Nov 13, 2020
@dtelyukh
Copy link
Author

@dtelyukh dtelyukh commented Nov 14, 2020

@fraenkel, we prepared test application for problem reproducing. My apologies for so complicated app. We cannot extract some small piece of code, because we don't know where is the problem exactly.
I sent credentials to michael.fraenkel@gmail.com.

  1. Point Chromium-based browser to https://cardonecapital.hc04.dorofeev.me/.
  2. Press F12 and check "Disable cache" box.
  3. Press F5 until request will hangs.

To have more chances to catch the problem it should to remove proxy cache:

  1. Stop Caddy Server kill -SIGTERM <caddy process id>
  2. rm -fR /home/user/caddy-cache
  3. ./caddy run&

To patch or debug server:

  1. Custom cache plugin code is here: /home/user/smart-cache
  2. Caddy Server code is here: /home/user/go/src/github.com/caddyserver/caddy
  3. To rebuild Caddy:
  • cd /home/user/go/src/github.com/caddyserver/caddy/cmd/caddy
  • CGO_ENABLED=0 go build
  • mv ./caddy ~/caddy
  • cd
  • sudo setcap 'cap_net_bind_service=+ep' ./caddy
  • kill -SIGTERM <caddy process id>
  • ./caddy run&

@fraenkel
Copy link
Contributor

@fraenkel fraenkel commented Nov 15, 2020

@dtelyukh I did find a way to cause the hang locally, from my machine it would never happen.
for i in {1..1000}; do echo $i; nghttp https://cardonecapital.hc04.dorofeev.me/ -n; done would eventually hang.
Once I attempted to compile a new Go, I could never get caddy to rebuild. I always ended up with
qtls init failure. Attempting to fix that, I didn't realize at the time that the src tree was a bit special so I can no longer make any progress since I cannot download your smart-cache module.
If you could fix the tree, at least next time I know to make a copy of the entire tree before doing anything. I am a bit concerned that a simple go mod tidy prevented any further compilation of caddy.

@fraenkel
Copy link
Contributor

@fraenkel fraenkel commented Nov 15, 2020

Never mind, I got it working again....

@fraenkel
Copy link
Contributor

@fraenkel fraenkel commented Nov 15, 2020

So one thing I did verify is that using the latest golang/x/net/http2 code does not cause the hang I see with my simple testcase.

@dtelyukh
Copy link
Author

@dtelyukh dtelyukh commented Nov 15, 2020

@fraenkel, what can I help?

@fraenkel
Copy link
Contributor

@fraenkel fraenkel commented Nov 15, 2020

You can see the one line change I made to caddyserver with a go mod tidy. See if this new version hangs for you as well.

@dtelyukh
Copy link
Author

@dtelyukh dtelyukh commented Nov 15, 2020

@fraenkel, this new version is never hangs for us. And also we noticed that app become faster. 👍 Thank you!

@dtelyukh
Copy link
Author

@dtelyukh dtelyukh commented Nov 15, 2020

Full page load time (with all resources) is 2% less than with old http/2, and median absolute deviation is 1% less too. So it's both faster, and shows more stable performance.

@dtelyukh
Copy link
Author

@dtelyukh dtelyukh commented Nov 19, 2020

@fraenkel, should I close this ticket?

@fraenkel
Copy link
Contributor

@fraenkel fraenkel commented Nov 19, 2020

yes, given there is a solution and this should be fixed in 1.16 although one should verify that is true.

@dtelyukh dtelyukh closed this Nov 19, 2020
mholt added a commit to caddyserver/caddy that referenced this issue Nov 20, 2020
* implement default values for header directive

closes #3804

* remove `set_default` header op and rely on "require" handler instead

This has the following advantages over the previous attempt:

- It does not introduce a new operation for headers, but rather nicely
  extends over an existing feature in the header handler.
- It removes the need to specify the header as "deferred" because it is
  already implicitely deferred by the use of the require handler. This
  should be less confusing to the user.

* add integration test for header directive in caddyfile

* bubble up errors when parsing caddyfile header directive

* don't export unnecessarily and don't canonicalize headers unnecessarily

* fix response headers not passed in blocks

* caddyfile: fix clash when using default header in block

Each header is now set in a separate handler so that it doesn't clash
with other headers set/added/deleted in the same block.

* caddyhttp: New idle_timeout default of 5m

* reverseproxy: fix random hangs on http/2 requests with server push (#3875)

see golang/go#42534

* Refactor and cleanup with improvements

* More specific link

Co-authored-by: Matthew Holt <mholt@users.noreply.github.com>
Co-authored-by: Денис Телюх <telyukh.denis@gmail.com>
@dtelyukh
Copy link
Author

@dtelyukh dtelyukh commented Dec 3, 2020

@fraenkel, this bug is still exist. But we found more clear way to reproduce it.

An issue in Caddy's repository: caddyserver/caddy#3896

How to reproduce

It depends on the proxied website and caddy config, and some random factors, thus it occurs with different frequency on different hardware. The steps are:

  1. Add this line to /etc/hosts 127.0.0.1 terem-pro.localhost
  2. Clone Caddy repository git clone https://github.com/caddyserver/caddy.git
  3. Build binary from master (a patch with the version from Nov 10, 2020 of x/net is already included) cd caddy/cmd/caddy && go build
  4. sudo setcap 'cap_net_bind_service=+ep' ./caddy
  5. Create Caddyfile with this content
https://terem-pro.localhost {                                                 
    handle {                                                                                                                                                           
        reverse_proxy https://www.terem-pro.ru {
            header_up host {http.reverse_proxy.upstream.host}
        }                                
                                         
        push / {                         
            /local/components/terem/catalog.list/templates/index.best.seller/style.css
            /local/components/terem/new_services.content/templates/home.banner.lots/style.css
            /local/components/terem/slider.blocks/templates/slider.useful/style.css
            /local/components/terem/standard.blocks/templates/call.action.white/style.css
            /local/components/terem/review.list/templates/carousel.home/style.css
            /local/components/terem/standard.blocks/templates/promo.red.home/style.css
            /local/components/terem/promotion.list/templates/home.slider/style.css
            /local/components/terem/form.form/templates/template.pdf/style.css
            /local/templates/terem/components/bitrix/menu/template.header.menu.top/desktop-menu.css
            /local/components/terem/form.form/templates/template.taxi/style.css
            /assets/resources/css/home.css
            /local/templates/terem/components/bitrix/menu/template.header.menu-mobile/style_menu.css
            /local/components/terem/catalog.type.list/templates/.default/style.css
            /assets/resources/css/styles.css
            /bitrix/cache/css/s1/terem/template_ad73b02503569e1113abf0b013fdbb28/template_ad73b02503569e1113abf0b013fdbb28_v1.css?16067202133580
            /bitrix/cache/css/s1/terem/page_074396ca6d41424fe878cb365c109aa1/page_074396ca6d41424fe878cb365c109aa1_v1.css?160672023225970
        }
    }
}
  1. Run Caddy server ./caddy run
  2. Open developer tools network tab, to visually see the hangs; checking the "disable cache" toggle will help reproduce the problem faster, but is not necessary
  3. Navigate to / page in a browser (i.e., https://terem-pro.localhost)
  4. Wait till it fully loads
  5. If it didn't hang on step 8 - hit f5, and again, wait till it fully loads; repeat several times if needed
  6. On our test server, it usually hangs after 2-3 reloads. On some devices, it might require 10-15 attempts but still hangs at some point.

@dtelyukh dtelyukh reopened this Dec 3, 2020
@fraenkel
Copy link
Contributor

@fraenkel fraenkel commented Dec 7, 2020

I reproduced a hang but its using the bundle http2 stack.

goroutine 5252 [select, 3 minutes]:
net/http.(*http2serverConn).writeHeaders(0xc000582900, 0xc000feec60, 0xc0003c93b0, 0xa, 0x24a6800)
	/snap/go/6745/src/net/http/h2_bundle.go:5753 +0x172
net/http.(*http2responseWriterState).writeChunk(0xc000be7200, 0xc000fc1000, 0xdf, 0x1000, 0xc000fcfdf0, 0x1, 0xc0009a8180)
	/snap/go/6745/src/net/http/h2_bundle.go:6020 +0x3bf

@dtelyukh
Copy link
Author

@dtelyukh dtelyukh commented Dec 8, 2020

Write me, please, to fix it should bundle a recent version of http2-library? Or does this library itself need to be fixed?
Is there a some workaround? Should we use http2-library directly?

@dtelyukh
Copy link
Author

@dtelyukh dtelyukh commented Dec 8, 2020

Looks like with this hack it never hangs
http2.ConfigureServer(s, nil)
With it bundled version is not use, doesn't it?

@dtelyukh
Copy link
Author

@dtelyukh dtelyukh commented Dec 9, 2020

The solution is to use explicitly x/net/http2.

@dtelyukh
Copy link
Author

@dtelyukh dtelyukh commented Feb 8, 2021

The problem is not solved neither by 1.16rc1, nor by 1.15.8. Easy steps for reproducing are here: caddyserver/caddy#3896

WIthout explicit usage of /x/net/http2 http2.ConfigureServer(s, nil) hangs still happen.

@dtelyukh dtelyukh reopened this Feb 8, 2021
harshavardhana added a commit to harshavardhana/minio that referenced this issue Feb 11, 2021
due to lots of issues with x/net/http2, as
well as the hundled h2_bundle.go in the go
runtime should be avoided for now.

golang/go#23559
golang/go#42534
golang/go#43989
golang/go#33425
golang/go#29246

With collection of such issues present, it
make sense to remove HTTP2 support for now
harshavardhana added a commit to harshavardhana/minio that referenced this issue Feb 11, 2021
due to lots of issues with x/net/http2, as
well as the bundled h2_bundle.go in the go
runtime should be avoided for now.

golang/go#23559
golang/go#42534
golang/go#43989
golang/go#33425
golang/go#29246

With collection of such issues present, it
make sense to remove HTTP2 support for now
harshavardhana added a commit to minio/minio that referenced this issue Feb 11, 2021
due to lots of issues with x/net/http2, as
well as the bundled h2_bundle.go in the go
runtime should be avoided for now.

golang/go#23559
golang/go#42534
golang/go#43989
golang/go#33425
golang/go#29246

With collection of such issues present, it
make sense to remove HTTP2 support for now
poornas added a commit to poornas/minio that referenced this issue Feb 16, 2021
* fix: metacache should only rename entries during cleanup (minio#11503)

To avoid large delays in metacache cleanup, use rename
instead of recursive delete calls, renames are cheaper
move the content to minioMetaTmpBucket and then cleanup
this folder once in 24hrs instead.

If the new cache can replace an existing one, we should
let it replace since that is currently being saved anyways,
this avoids pile up of 1000's of metacache entires for
same listing calls that are not necessary to be stored
on disk.

* turn off http2 for TLS setups for now (minio#11523)

due to lots of issues with x/net/http2, as
well as the bundled h2_bundle.go in the go
runtime should be avoided for now.

golang/go#23559
golang/go#42534
golang/go#43989
golang/go#33425
golang/go#29246

With collection of such issues present, it
make sense to remove HTTP2 support for now

* fix: save ModTime properly in disk cache (minio#11522)

fix minio#11414

* fix: osinfos incomplete in case of warnings (minio#11505)

The function used for getting host information
(host.SensorsTemperaturesWithContext) returns warnings in some cases.

Returning with error in such cases means we miss out on the other useful
information already fetched (os info).

If the OS info has been succesfully fetched, it should always be
included in the output irrespective of whether the other data (CPU
sensors, users) could be fetched or not.

* fix: avoid timed value for network calls (minio#11531)

additionally simply timedValue to have RWMutex
to avoid concurrent calls to DiskInfo() getting
serialized, this has an effect on all calls that
use GetDiskInfo() on the same disks.

Such as getOnlineDisks, getOnlineDisksWithoutHealing

* fix: support IAM policy handling for wildcard actions (minio#11530)

This PR fixes

- allow 's3:versionid` as a valid conditional for
  Get,Put,Tags,Object locking APIs
- allow additional headers missing for object APIs
- allow wildcard based action matching

* fix: in MultiDelete API return MalformedXML upon empty input (minio#11532)

To follow S3 spec

* Update yaml files to latest version RELEASE.2021-02-14T04-01-33Z

* fix: multiple pool reads parallelize when possible (minio#11537)

* Add support for remote tier management

With this change MinIO's ILM supports transitioning objects to a remote tier.
This change includes support for Azure Blob Storage, AWS S3 and Google Cloud
Storage as remote tier storage backends.

Co-authored-by: Poorna Krishnamoorthy <poorna@minio.io>
Co-authored-by: Krishna Srinivas <krishna@minio.io>
Co-authored-by: Krishnan Parthasarathi <kp@minio.io>

Co-authored-by: Harshavardhana <harsha@minio.io>
Co-authored-by: Poorna Krishnamoorthy <poornas@users.noreply.github.com>
Co-authored-by: Shireesh Anjal <355479+anjalshireesh@users.noreply.github.com>
Co-authored-by: Anis Elleuch <vadmeste@users.noreply.github.com>
Co-authored-by: Minio Trusted <trusted@minio.io>
Co-authored-by: Krishna Srinivas <krishna.srinivas@gmail.com>
Co-authored-by: Poorna Krishnamoorthy <poorna@minio.io>
Co-authored-by: Krishna Srinivas <krishna@minio.io>
harshavardhana added a commit to minio/minio that referenced this issue Feb 17, 2021
due to lots of issues with x/net/http2, as
well as the bundled h2_bundle.go in the go
runtime should be avoided for now.

golang/go#23559
golang/go#42534
golang/go#43989
golang/go#33425
golang/go#29246

With collection of such issues present, it
make sense to remove HTTP2 support for now
kerneltime pushed a commit to kerneltime/minio that referenced this issue Feb 18, 2021
due to lots of issues with x/net/http2, as
well as the bundled h2_bundle.go in the go
runtime should be avoided for now.

golang/go#23559
golang/go#42534
golang/go#43989
golang/go#33425
golang/go#29246

With collection of such issues present, it
make sense to remove HTTP2 support for now
harshavardhana pushed a commit to minio/minio that referenced this issue Feb 18, 2021
due to lots of issues with x/net/http2, as
well as the bundled h2_bundle.go in the go
runtime should be avoided for now.

golang/go#23559
golang/go#42534
golang/go#43989
golang/go#33425
golang/go#29246

With collection of such issues present, it
make sense to remove HTTP2 support for now
@ReillyTevera
Copy link

@ReillyTevera ReillyTevera commented Mar 4, 2021

I was able to reproduce this with the reverse proxy Traefik, and can confirm that directly calling the /x/net/http2 ConfigureServer method fixes this issue. I can also confirm that this is not fixed in the golang 1.16.0 stdlib, @fraenkel any idea when we can expect the fix to make its way there?

@networkimprov
Copy link

@networkimprov networkimprov commented Mar 4, 2021

cc @toothrot @dmitshur re possible release issue

@dmitshur
Copy link
Contributor

@dmitshur dmitshur commented Mar 4, 2021

@geropl
Copy link

@geropl geropl commented May 10, 2021

👋 Any update on this issue? Any idea how and when this might be resolved?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
7 participants