New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Occasional sigsegv with QUIC in 0.11.1 #2346

Closed
vcsjones opened this Issue Nov 14, 2018 · 41 comments

Comments

6 participants
@vcsjones
Copy link

vcsjones commented Nov 14, 2018

1. What version of Caddy are you using (caddy -version)?

0.11.1.

2. What are you trying to do?

Run caddy with -quic.

3. What is your entire Caddyfile?

Omitted for now, but can try to make a sample that reproduces the problem if relevant.

4. How did you run Caddy (give the full command and describe the execution environment)?

/usr/local/bin/caddy -log stdout -quic -agree=true -conf=/opt/caddy/Caddyfile/Caddyfile -root=/var/tmp

lsb_release -a:

Distributor ID:	Ubuntu
Description:	Ubuntu 16.04.5 LTS
Release:	16.04
Codename:	xenial

Caddy starts up fine under this scenario, but Caddy appears to occasionally crash with the backtrace below in the journalctl log. I can't reliably reproduce it, and I'm not sure if the issue is upstream in quic-go or if it is Caddy's use of quic-go (I do not see an issue in quic-go that looks like this backtrace).

I did not have a single crash with months of uptime in Caddy with 0.11.0, but observed this in 0.11.1 a few hours afters updating.

Removing -quic from the startup appears to have resolved the issue.

Nov 14 05:57:51 ip-10-0-0-24 caddy[1296]: panic: runtime error: invalid memory address or nil pointer dereference
Nov 14 05:57:51 ip-10-0-0-24 caddy[1296]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0x8b734c]
Nov 14 05:57:51 ip-10-0-0-24 caddy[1296]: goroutine 3829 [running]:
Nov 14 05:57:51 ip-10-0-0-24 caddy[1296]: github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go/internal/crypto.(*certChain).getCertForSNI(0xc00000c5c0, 0xc0000315c0, 0x10, 0xc000400ea0, 0xfde33730, 0x28f7167692445711)
Nov 14 05:57:51 ip-10-0-0-24 caddy[1296]: #011/tmp/gopath_11-14-0232.202407298/src/github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go/internal/crypto/cert_chain.go:66 +0x6c
Nov 14 05:57:51 ip-10-0-0-24 caddy[1296]: github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go/internal/crypto.(*certChain).GetLeafCert(0xc00000c5c0, 0xc0000315c0, 0x10, 0xc000000000, 0xc000031500, 0x10, 0xc000400f90, 0x10)
Nov 14 05:57:51 ip-10-0-0-24 caddy[1296]: #011/tmp/gopath_11-14-0232.202407298/src/github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go/internal/crypto/cert_chain.go:51 +0x43
Nov 14 05:57:51 ip-10-0-0-24 caddy[1296]: github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go/internal/handshake.(*cryptoSetupServer).handleMessage(0xc0000e2c80, 0xc00025b080, 0x410, 0x518, 0xc000270780, 0x1, 0xc8b180, 0xc00037b1e0)
Nov 14 05:57:51 ip-10-0-0-24 caddy[1296]: #011/tmp/gopath_11-14-0232.202407298/src/github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go/internal/handshake/crypto_setup_server.go:162 +0x18f
Nov 14 05:57:51 ip-10-0-0-24 caddy[1296]: github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go/internal/handshake.(*cryptoSetupServer).HandleCryptoStream(0xc0000e2c80, 0xbf0858, 0x0)
Nov 14 05:57:51 ip-10-0-0-24 caddy[1296]: #011/tmp/gopath_11-14-0232.202407298/src/github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go/internal/handshake/crypto_setup_server.go:119 +0x21a
Nov 14 05:57:51 ip-10-0-0-24 caddy[1296]: github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go.(*session).run.func1(0xc000394200)
Nov 14 05:57:51 ip-10-0-0-24 caddy[1296]: #011/tmp/gopath_11-14-0232.202407298/src/github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go/session.go:452 +0x3a
Nov 14 05:57:51 ip-10-0-0-24 caddy[1296]: created by github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go.(*session).run
Nov 14 05:57:51 ip-10-0-0-24 caddy[1296]: #011/tmp/gopath_11-14-0232.202407298/src/github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go/session.go:451 +0x8a
@david9991

This comment has been minimized.

Copy link

david9991 commented Nov 26, 2018

Same issue here, 0.11.1 crashing frequently with -quic enabled.

@marten-seemann

This comment has been minimized.

Copy link
Contributor

marten-seemann commented Nov 29, 2018

Hello @vcsjones and @david9991, I looked into this crash, and it seems to come from the fact that quic-go doesn't have a certificate chain:
The crash occurs when accessing the tls.Config in the certChain struct in https://github.com/lucas-clemente/quic-go/blob/gquic/internal/crypto/cert_chain.go#L66, which is constructed right when quic.Listen is called: https://github.com/lucas-clemente/quic-go/blob/gquic/server.go#L119-L126.

Since the tls.Config parameter of Listen is documented to required a non-nil tls.Config, this is probably a problem with how Caddy is using quic-go.

It would therefore be very helpful if you could post your Caddyfile here, maybe we can then trace back what's going wrong here.

@david9991

This comment has been minimized.

Copy link

david9991 commented Nov 29, 2018

My configuration is simple:

example.com:443 {
	tls test@example.com
        root /www
	proxy /ws http://127.0.0.1:8080 {
		header_upstream Host "xxx.com"
	}
}

and start with $ caddy -quic.
Then open it with Chrome, I can see QUIC is enabled correctly.
46525954_10217004521521803_7228765546437672960_n
But after several hours running, the side become inaccessible, and I found Caddy has crashed. It doesn't crash immediately when it's started.

@marten-seemann

This comment has been minimized.

Copy link
Contributor

marten-seemann commented Nov 29, 2018

@david9991 Is this the only host you have configured? Or do you have any other configuration entries, and do they all have a certificate configured?

@david9991

This comment has been minimized.

Copy link

david9991 commented Nov 29, 2018

@marten-seemann Yes, I indeed have some other HTTP only entries with TLS off, and configured as simple as my previous one and serves static only contents. And they also can keep running with QUIC enabled entries for hours before the crash.
Now I deleted all these HTTP only entries, leave the only automatic HTTPS entry, and see if the crash will still happen.

@david9991

This comment has been minimized.

Copy link

david9991 commented Nov 29, 2018

@marten-seemann Crashed with only one entry after an hour:

xxx.com:443 {
	tls david@xxx.com
	root /var/www/html
}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0x895b1c]

goroutine 582 [running]:
github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go/internal/crypto.(*certChain).getCertForSNI(0xc42000e030, 0xc42001c770, 0x10, 0xc4200d1ea0, 0xa4f4f522, 0xfcbae7e689a3560d)
        /root/go/src/github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go/internal/crypto/cert_chain.go:66 +0x6c
github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go/internal/crypto.(*certChain).GetLeafCert(0xc42000e030, 0xc42001c770, 0x10, 0xc400000000, 0xc42001c700, 0x10, 0xc4200d1f90, 0x10)
        /root/go/src/github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go/internal/crypto/cert_chain.go:51 +0x43
github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go/internal/handshake.(*cryptoSetupServer).handleMessage(0xc4200feb40, 0xc4203ca580, 0x410, 0x518, 0xc4201fd0e0, 0x1, 0x38, 0xc420236ea0)
        /root/go/src/github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go/internal/handshake/crypto_setup_server.go:162 +0x190
github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go/internal/handshake.(*cryptoSetupServer).HandleCryptoStream(0xc4200feb40, 0x45bb16, 0xba24de)
        /root/go/src/github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go/internal/handshake/crypto_setup_server.go:119 +0x22b
github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go.(*session).run.func1(0xc420334200)
        /root/go/src/github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go/session.go:452 +0x3a
created by github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go.(*session).run
        /root/go/src/github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go/session.go:451 +0xa8
@marten-seemann

This comment has been minimized.

Copy link
Contributor

marten-seemann commented Nov 29, 2018

That and the fact that the crash only occurs after a certain time is interesting.

Maybe @mholt has any idea what could be going on here? quic-go doesn’t clone the tls.Config, maybe Caddy is making any changes to it later that could cause the crash?

@vcsjones

This comment has been minimized.

Copy link

vcsjones commented Nov 29, 2018

My whole caddyfile is here: https://github.com/vcsjones/vcsjones.com/blob/master/_server/Caddyfile

Of note, I haven't configured a certificate manually. I'm letting caddy manage everything with Let's Encrypt / automatic TLS.

The crash was also not immediately for me as well. It started fine, and in a few hours the first crash occurred.

@marten-seemann

This comment has been minimized.

Copy link
Contributor

marten-seemann commented Nov 29, 2018

Ok, I had another look into this. It seems like the only way for this crash to occur is when the tls.Config.GetConfigForClient callback is set and returns a nil in https://github.com/lucas-clemente/quic-go/blob/da8415324bf76b3c2455ebfbb6dd80e41d331493/internal/crypto/cert_chain.go#L106-L113. This is valid according to the documentation:

GetConfigForClient, if not nil, is called after a ClientHello is received from a client. It may return a non-nil Config in order to change the Config that will be used to handle this connection. If the returned Config is nil, the original Config will be used.

I'll prepare a fix in quic-go.

@marten-seemann

This comment has been minimized.

Copy link
Contributor

marten-seemann commented Nov 29, 2018

@vcsjones and @david9991: I just merged lucas-clemente/quic-go#1657, which should hopefully fix this. Could you recompile Caddy with this fix, and see if it works now? If so, I'd then prepare a PR for Caddy to get this fix merged asap.

@vcsjones

This comment has been minimized.

Copy link

vcsjones commented Nov 29, 2018

Yep, need a few hours to get through some work stuff.

@david9991

This comment has been minimized.

Copy link

david9991 commented Nov 30, 2018

Seems Caddy using an older version of quic-go

root@debian:~/go/src/github.com/mholt/caddy# ~/go/bin/govendor fetch github.com/lucas-clemente/quic-go
root@debian:~/go/src/github.com/mholt/caddy# go install github.com/mholt/caddy/caddy
# github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go
vendor/github.com/lucas-clemente/quic-go/crypto_stream.go:15:21: undefined: wire.CryptoFrame
vendor/github.com/lucas-clemente/quic-go/crypto_stream.go:21:38: undefined: wire.CryptoFrame
vendor/github.com/lucas-clemente/quic-go/crypto_stream.go:41:49: undefined: wire.CryptoFrame
vendor/github.com/lucas-clemente/quic-go/crypto_stream.go:101:71: undefined: wire.CryptoFrame
vendor/github.com/lucas-clemente/quic-go/crypto_stream_manager.go:33:56: undefined: wire.CryptoFrame
vendor/github.com/lucas-clemente/quic-go/packet_packer.go:28:19: undefined: wire.ExtendedHeader
vendor/github.com/lucas-clemente/quic-go/receive_stream.go:18:26: undefined: wire.ResetStreamFrame
@david9991

This comment has been minimized.

Copy link

david9991 commented Nov 30, 2018

May be it's my fault, how to upgrade vendor package properly? XD

@marten-seemann

This comment has been minimized.

Copy link
Contributor

marten-seemann commented Nov 30, 2018

Please use the gquic branch.

@david9991

This comment has been minimized.

Copy link

david9991 commented Nov 30, 2018

@marten-seemann New problem, I have upgrade the vendor package, switch to gquic, and recompile Caddy, and browse with Chrome, I found the alt-svc header in response is quic=":443"; ma=2592000; v="101" that I don't think v="101" is supported in Chrome. So Chrome will always use TLS 1.2 instead.

@marten-seemann

This comment has been minimized.

Copy link
Contributor

marten-seemann commented Nov 30, 2018

That doesn’t sound right. You can also just apply the diff of the commit, if vendoring causes too many problems.

@david9991

This comment has been minimized.

Copy link

david9991 commented Nov 30, 2018

Cherry-picked 20719a7 & f95db23, now testing...

@david9991

This comment has been minimized.

Copy link

david9991 commented Nov 30, 2018

After cherry-picked these two commits to Caddy's base instead, it also cannot switch to QUIC mode any more (Chrome see them as BROKEN QUIC connection) and keep using TLS 1.2 instead. Something wrong in the fix, or in Caddy's usage? I am stuck. XD
@vcsjones Can you confirm? or if I did something wrong.

@david9991

This comment has been minimized.

Copy link

david9991 commented Nov 30, 2018

@marten-seemann I did two more tests by revert and re-apply this two commits, and now I can confirm that they stopped Chrome to detect QUIC connection.

@marten-seemann

This comment has been minimized.

Copy link
Contributor

marten-seemann commented Nov 30, 2018

I have no idea what's going on there. These commits don't even touch QUIC versions or how QUIC is advertised.

@david9991

This comment has been minimized.

Copy link

david9991 commented Nov 30, 2018

@marten-seemann Please confirm the patch. diff.txt

@marten-seemann

This comment has been minimized.

Copy link
Contributor

marten-seemann commented Nov 30, 2018

Looks good, although you just need the commit modifying cert_chain.go.

@david9991

This comment has been minimized.

Copy link

david9991 commented Dec 3, 2018

More investigation, I tried to add some log here:

func (c *certChain) getCertForSNI(sni string) (*tls.Certificate, error) {
	log.SetOutput(os.Stdout)
	conf := c.config
	log.Println("TEST 11", sni)
	conf, err := maybeGetConfigForClient(conf, sni)
	if err != nil {
		log.Println("TEST 8", conf, err)
		return nil, err
	}
	...
}

And got:

2018/12/03 11:47:03 TEST 11 gooddomain.com
2018/12/03 11:47:03 TEST 9 &{<nil> <nil> [] map[] 0x1430410 <nil> <nil> <nil> <nil> [h2 http/1.1]  0 <nil> false [...] true false [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0] <nil> ... ... [... ...] false 0 <nil> {{0 0} 1} {{0 0} 0 0 0 0} [{[...] [...] [...]}]} ...
2018/12/03 11:47:03 TEST 10 &{[[...] [] <nil>}
2018/12/03 11:47:03 TEST 1 &{[[...]] ... [...] [] <nil>} <nil>
2018/12/03 11:47:03 TEST 11 lst.kjxeyr
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0x14bc35b]

goroutine 838 [running]:
github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go/internal/crypto.(*certChain).getCertForSNI(0xc00000e028, 0xc000300180, 0xa, 0xc000278ea0, 0xc43afca, 0x71c6a8c857f7d0d)
	/Users/David/go/src/github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go/internal/crypto/cert_chain.go:70 +0x27b
github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go/internal/crypto.(*certChain).GetLeafCert(0xc00000e028, 0xc000300180, 0xa, 0xc000000000, 0xc000300100, 0xa, 0xc000278f90, 0x10)
	/Users/David/go/src/github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go/internal/crypto/cert_chain.go:53 +0x43
github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go/internal/handshake.(*cryptoSetupServer).handleMessage(0xc00051ea00, 0xc0001ae000, 0x410, 0x51e, 0xc0001d2630, 0x1, 0xc0000c4be8, 0xc00020a101)
	/Users/David/go/src/github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go/internal/handshake/crypto_setup_server.go:162 +0x18f
github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go/internal/handshake.(*cryptoSetupServer).HandleCryptoStream(0xc00051ea00, 0x17f6530, 0x0)
	/Users/David/go/src/github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go/internal/handshake/crypto_setup_server.go:119 +0x21a
github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go.(*session).run.func1(0xc0000ac600)
	/Users/David/go/src/github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go/session.go:452 +0x3a
created by github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go.(*session).run
	/Users/David/go/src/github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go/session.go:451 +0x8a

Which gooddomain.com is the good domain name, but before the crash it became invalid lst.kjxeyr.

Then I added some more log here:

func maybeGetConfigForClient(c *tls.Config, sni string) (*tls.Config, error) {
	log.Println("TEST 12", c, sni)
	log.Println("TEST 14", c.GetConfigForClient, sni)
	if c.GetConfigForClient == nil {
		return c, nil
	}
	log.Println("TEST 13", c.GetConfigForClient)
	return c.GetConfigForClient(&tls.ClientHelloInfo{
		ServerName: sni,
	})
}

Second time I got:

...
2018/12/03 15:06:25 TEST 14 0x1430480 gnwv.asrpkn
2018/12/03 15:06:25 TEST 13 0x1430480
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0x14bc35b]

goroutine 1148 [running]:
github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go/internal/crypto.(*certChain).getCertForSNI(0xc000254000, 0xc000344044, 0xb, 0xc000274ea0, 0x44ecb86a, 0x2c9f5d3552221985)
	/Users/David/go/src/github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go/internal/crypto/cert_chain.go:70 +0x27b
github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go/internal/crypto.(*certChain).GetLeafCert(0xc000254000, 0xc000344044, 0xb, 0xc000000000, 0xc000344000, 0xb, 0xc000274f90, 0x10)
	/Users/David/go/src/github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go/internal/crypto/cert_chain.go:53 +0x43
github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go/internal/handshake.(*cryptoSetupServer).handleMessage(0xc000530500, 0xc000223180, 0x410, 0x51d, 0xc0002520c0, 0x1, 0x0, 0x0)
	/Users/David/go/src/github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go/internal/handshake/crypto_setup_server.go:162 +0x18f
github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go/internal/handshake.(*cryptoSetupServer).HandleCryptoStream(0xc000530500, 0xc0005497d0, 0x1035b59)
	/Users/David/go/src/github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go/internal/handshake/crypto_setup_server.go:119 +0x21a
github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go.(*session).run.func1(0xc000358000)
	/Users/David/go/src/github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go/session.go:452 +0x3a
created by github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go.(*session).run
	/Users/David/go/src/github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go/session.go:451 +0x8a

After sni became invalid, it seems crashed in the line:

	return c.GetConfigForClient(&tls.ClientHelloInfo{
		ServerName: sni,
	})

is a crash what will happen if sni is invalid here?

@david9991

This comment has been minimized.

Copy link

david9991 commented Dec 3, 2018

The investigation is based on quic-go v0.10.0. With your fixes, Chrome indeed no longer switch to QUIC mode.

@david9991

This comment has been minimized.

Copy link

david9991 commented Dec 3, 2018

I changed

	return c.GetConfigForClient(&tls.ClientHelloInfo{
		ServerName: sni,
	})

to

	return c.GetConfigForClient(&tls.ClientHelloInfo{
		ServerName: "test.crash",
	})

Guess what, it crashed...

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0x14bc14c]

goroutine 82 [running]:
github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go/internal/crypto.(*certChain).getCertForSNI(0xc0000ba5a8, 0xc0001e0a30, 0xd, 0xc000255ea0, 0xb29b4555, 0x2c576134c8b84cea)
	/Users/David/go/src/github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go/internal/crypto/cert_chain.go:66 +0x6c
github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go/internal/crypto.(*certChain).GetLeafCert(0xc0000ba5a8, 0xc0001e0a30, 0xd, 0xc00007b620, 0xc0001e0a01, 0xd, 0xc000255f90, 0x10)
	/Users/David/go/src/github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go/internal/crypto/cert_chain.go:51 +0x43
github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go/internal/handshake.(*cryptoSetupServer).handleMessage(0xc0001148c0, 0xc0002ce000, 0x514, 0x5fc, 0xc000148bd0, 0x1, 0xc000164480, 0xc0001644a0)
	/Users/David/go/src/github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go/internal/handshake/crypto_setup_server.go:162 +0x18f
github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go/internal/handshake.(*cryptoSetupServer).HandleCryptoStream(0xc0001148c0, 0xc0001733a0, 0xc0001733c0)
	/Users/David/go/src/github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go/internal/handshake/crypto_setup_server.go:119 +0x21a
github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go.(*session).run.func1(0xc0000aca00)
	/Users/David/go/src/github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go/session.go:452 +0x3a
created by github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go.(*session).run
	/Users/David/go/src/github.com/mholt/caddy/vendor/github.com/lucas-clemente/quic-go/session.go:451 +0x8a
@david9991

This comment has been minimized.

Copy link

david9991 commented Dec 3, 2018

First: quic-go shouldn't crash even SNI is undefined in config file. @marten-seemann
Second: We still don't know why SNI became invalid. @francislavoie @marten-seemann
Third: You have to revert your two fixes, and we should do more investigate on why it prevents Chrome to switch to QUIC. @marten-seemann

@marten-seemann

This comment has been minimized.

Copy link
Contributor

marten-seemann commented Dec 3, 2018

Sorry, I don’t know how to help you here. I provided a (possible) fix, so please run the code with the fix. I’m 100% sure that this fix doesn’t prevent Chrome from using QUIC, this has to be a problem in your setup.

Maybe @vcsjones can try applying the fix?

@mholt mholt referenced this issue Dec 4, 2018

Closed

INVALIDARGUMENT #2377

@Henrocker

This comment has been minimized.

Copy link

Henrocker commented Dec 6, 2018

Crafted the patch:

From 20719a7c50c1f97ff0f2272010cda5710794c1d0 Mon Sep 17 00:00:00 2001
From: Marten Seemann <martenseemann@gmail.com>
Date: Thu, 29 Nov 2018 20:33:02 +0700
Subject: [PATCH] use the original tls.Config if tls.Config.GetConfigForClient
 returns nil

---
 internal/crypto/cert_chain.go      | 15 ++++++++++-----
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/internal/crypto/cert_chain.go b/internal/crypto/cert_chain.go
index 0c728fd25..45af2952d 100644
--- a/internal/crypto/cert_chain.go
+++ b/internal/crypto/cert_chain.go
@@ -56,8 +56,7 @@ func (c *certChain) GetLeafCert(sni string) ([]byte, error) {
 }
 
 func (c *certChain) getCertForSNI(sni string) (*tls.Certificate, error) {
-	conf := c.config
-	conf, err := maybeGetConfigForClient(conf, sni)
+	conf, err := maybeGetConfigForClient(c.config, sni)
 	if err != nil {
 		return nil, err
 	}
@@ -107,7 +106,13 @@ func maybeGetConfigForClient(c *tls.Config, sni string) (*tls.Config, error) {
 	if c.GetConfigForClient == nil {
 		return c, nil
 	}
-	return c.GetConfigForClient(&tls.ClientHelloInfo{
-		ServerName: sni,
-	})
+	confForClient, err := c.GetConfigForClient(&tls.ClientHelloInfo{ServerName: sni})
+	if err != nil {
+		return nil, err
+	}
+	// if GetConfigForClient returns nil, use the original config
+	if confForClient == nil {
+		return c, nil
+	}
+	return confForClient, nil
 }

and currently caddy is running with it since half an hour without any panic.
Giving feedback tomorrow when tested for a day.

@david9991

This comment has been minimized.

Copy link

david9991 commented Dec 7, 2018

@Henrocker Could you check from Chrome's developer tools -> security that if the connection type is QUIC not TLS 1.2? And I'll try the patch you attached one more time.

@david9991

This comment has been minimized.

Copy link

david9991 commented Dec 7, 2018

@Henrocker Seems this single file patch works for me, but current gquic branch doesn't. Could you give current gquic branch a try?

@david9991

This comment has been minimized.

Copy link

david9991 commented Dec 7, 2018

With previous single file patched, it works, and then I tried to apply this patch

diff --git a/server.go b/server.go
index d9af6a43..3d370549 100644
--- a/server.go
+++ b/server.go
@@ -123,6 +123,9 @@ func Listen(conn net.PacketConn, tlsConf *tls.Config, config *Config) (Listener,
 }
 
 func listen(conn net.PacketConn, tlsConf *tls.Config, config *Config) (*server, error) {
+	if tlsConf == nil || (len(tlsConf.Certificates) == 0 && tlsConf.GetCertificate == nil) {
+		return nil, errors.New("quic: neither Certificates nor GetCertificate set in tls.Config")
+	}
 	certChain := crypto.NewCertChain(tlsConf)
 	kex, err := crypto.NewCurve25519KEX()
 	if err != nil {

QUIC stop working now.

@david9991

This comment has been minimized.

Copy link

david9991 commented Dec 7, 2018

I have add these debug code at front of the condition:

     log.SetOutput(os.Stdout)
     log.Println(tlsConf, len(tlsConf.Certificates), tlsConf.GetCertificate)

Got this:

2018/12/06 19:44:13 &{<nil> <nil> [] map[] <nil> <nil> 0x809750 <nil> <nil> [h2 http/1.1]  0 <nil> false [] false false [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0] <nil> 0 0 [] false 0 <nil> {{0 0} 0} {{0 0} 0 0 0 0} []} 0 <nil>
2018/12/06 19:44:13 serving QUIC connections: quic: neither Certificates nor GetCertificate set in tls.Config

that I believe that's why gquic branch didn't work.

@Henrocker

This comment has been minimized.

Copy link

Henrocker commented Dec 7, 2018

@david9991 Agree! Single file patch works for me, too!

Applied Patch shows connections with QUIC v44 with Canary Chrome. Gonna try gquic branch.

@Henrocker

This comment has been minimized.

Copy link

Henrocker commented Dec 7, 2018

Tested gquic branch... Caddy indeed stops serving over QUIC.

@marten-seemann

This comment has been minimized.

Copy link
Contributor

marten-seemann commented Dec 7, 2018

Thanks @Henrocker and @david9991, this is really helpful. I think I know now what is going on:
I introduced a check in lucas-clemente/quic-go#1654 that quic.Listen is called with a tls.Config that is non-nil and has either tls.Config.Certificates or tls.Config.GetCertificate set: https://github.com/lucas-clemente/quic-go/blob/e0c849af519ad287ab5afe24136e97b421d57eb8/server.go#L125-L128. This is the same check that the standard library TLS implementation uses: https://github.com/golang/go/blob/276870d6e0ff5d22b73feee56e0bad9096f01b22/src/crypto/tls/tls.go#L75-L78.

Now it looks like Caddy is using tls.Config.GetConfigForClient in order to return a tls.Config based on the SNI. It seems like the check in quic.Listen should therefore also regrad a tls.Config as valid if tls.Config.GetConfigForClient is set. Before making that change, I'd like to understand why the standard library doesn't include this check. Any ideas?

@francislavoie

This comment has been minimized.

Copy link
Collaborator

francislavoie commented Dec 7, 2018

@marten-seemann in case you missed it, #1303 is the reason for that change

@marten-seemann

This comment has been minimized.

Copy link
Contributor

marten-seemann commented Dec 7, 2018

I played around with the tls package, and found out the following. Let's assume that I have a valid tls.Config (with certificates configured) conf.

Then it is valid to call

tls.Listen("tcp", "locahost:0", conf)

but not

c := &tls.Config{
	GetConfigForClient: func(*tls.ClientHelloInfo) (*tls.Config, error) {
		return conf, nil
	},
}
tls.Listen("tcp", "locahost:0", c)

The second way is the type of tls.Config that Caddy is using, and quic-go is rejecting that config, analogously to the behavior of the tls package. I think the standard library should accept such a tls.Config, and I opened golang/go#29139 to discuss this.

Depending on the feedback on this issue, I see two different ways forward here:

  • If people agree that this is an issue and should be addressed, quic-go should also accept such a tls.Config.
  • If there are good reasons not to support such a tls.Config, quic-go should continue to behave analogously to the standard library. In that case, Caddy could provide a valid tls.Config by setting additionally setting the GetCertificate callback, e.g. to always return an error.
@francislavoie

This comment has been minimized.

Copy link
Collaborator

francislavoie commented Dec 7, 2018

Disclaimer: I'm no expert here;

Why not both? We could patch Caddy to add that setting for now, and if at some point in the future a fix is merged into tls, then it could be removed.

@dhannyz

This comment has been minimized.

Copy link

dhannyz commented Dec 18, 2018

Playing with chrome v71+ i've found that, at least on linux (i havent checked other OSs yet), by default QUIC is disabled in chrome://flags. So it's worth double checking that it is enabled.

@francislavoie

This comment has been minimized.

Copy link
Collaborator

francislavoie commented Jan 14, 2019

@marten-seemann just to clarify, after the PR has been merged, should this just work fine in Caddy master, or does the latest quic-go need to be pulled? Just wondering what needs to happen to push this issue along (I don't use quic myself).

@marten-seemann

This comment has been minimized.

Copy link
Contributor

marten-seemann commented Jan 15, 2019

We still need to pull in the latest changes in the gquic branch in quic-go. I still have one patch for that branch, which I will merge today or tomorrow. I'll send you a PR then.

@marten-seemann marten-seemann referenced this issue Jan 17, 2019

Merged

update quic-go to v0.10.1 #2431

4 of 4 tasks complete

@mholt mholt closed this in #2431 Jan 17, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment