Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GSO handling causes data race #4228

Open
Savolro opened this issue Jan 2, 2024 · 5 comments
Open

GSO handling causes data race #4228

Savolro opened this issue Jan 2, 2024 · 5 comments
Labels

Comments

@Savolro
Copy link

Savolro commented Jan 2, 2024

Hi! I would like to report an issue. Key to reproduction is to have GSO not working in the system. This can be simulated with:

ethtool -K eth0 tx-checksumming off

Then running such main.go application:

package main

import (
	"log"
	"net/http"
	"sync"

	"github.com/quic-go/quic-go/http3"
)

func main() {
	rt := &http3.RoundTripper{}
	var wg sync.WaitGroup
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			req, err := http.NewRequest(http.MethodGet, "https://google.com", nil)
			if err != nil {
				log.Fatalln(err)
			}
			_, err = rt.RoundTrip(req)
			if err != nil {
				log.Fatalln(err)
			}
		}()
	}
	wg.Wait()
}

results in such output when executed with -race:

==================
WARNING: DATA RACE
Write at 0x00c00007e298 by goroutine 16:
  github.com/quic-go/quic-go.(*sconn).Write()
      /go/pkg/mod/github.com/quic-go/quic-go@v0.40.1/send_conn.go:65 +0x205
  github.com/quic-go/quic-go.(*sendQueue).Run()
      /go/pkg/mod/github.com/quic-go/quic-go@v0.40.1/send_queue.go:81 +0x2d0
  github.com/quic-go/quic-go.(*connection).run.func2()
      /go/pkg/mod/github.com/quic-go/quic-go@v0.40.1/connection.go:510 +0x4a

Previous read at 0x00c00007e298 by goroutine 14:
  github.com/quic-go/quic-go.(*sconn).capabilities()
      /go/pkg/mod/github.com/quic-go/quic-go@v0.40.1/send_conn.go:97 +0x57
  github.com/quic-go/quic-go.(*connection).sendPackets()
      /go/pkg/mod/github.com/quic-go/quic-go@v0.40.1/connection.go:1874 +0x6dd
  github.com/quic-go/quic-go.(*connection).triggerSending()
      /go/pkg/mod/github.com/quic-go/quic-go@v0.40.1/connection.go:1778 +0x470
  github.com/quic-go/quic-go.(*connection).run()
      /go/pkg/mod/github.com/quic-go/quic-go@v0.40.1/connection.go:632 +0x137e
  github.com/quic-go/quic-go.(*client).dial.func1()
      /go/pkg/mod/github.com/quic-go/quic-go@v0.40.1/client.go:214 +0x70

Goroutine 16 (running) created at:
  github.com/quic-go/quic-go.(*connection).run()
      /go/pkg/mod/github.com/quic-go/quic-go@v0.40.1/connection.go:509 +0x3b9
  github.com/quic-go/quic-go.(*client).dial.func1()
      /go/pkg/mod/github.com/quic-go/quic-go@v0.40.1/client.go:214 +0x70

Goroutine 14 (running) created at:
  github.com/quic-go/quic-go.(*client).dial()
      /go/pkg/mod/github.com/quic-go/quic-go@v0.40.1/client.go:213 +0xa6a
  github.com/quic-go/quic-go.dial()
      /go/pkg/mod/github.com/quic-go/quic-go@v0.40.1/client.go:159 +0x527
  github.com/quic-go/quic-go.(*Transport).dial()
      /go/pkg/mod/github.com/quic-go/quic-go@v0.40.1/transport.go:196 +0x3d3
  github.com/quic-go/quic-go.(*Transport).DialEarly()
      /go/pkg/mod/github.com/quic-go/quic-go@v0.40.1/transport.go:178 +0x99
  github.com/quic-go/quic-go/http3.(*RoundTripper).getClient.(*RoundTripper).makeDialer.func1()
      /go/pkg/mod/github.com/quic-go/quic-go@v0.40.1/http3/roundtrip.go:288 +0xfb
  github.com/quic-go/quic-go/http3.(*client).dial()
      /go/pkg/mod/github.com/quic-go/quic-go@v0.40.1/http3/client.go:122 +0x163
  github.com/quic-go/quic-go/http3.(*client).RoundTripOpt.func1()
      /go/pkg/mod/github.com/quic-go/quic-go@v0.40.1/http3/client.go:262 +0x91
  sync.(*Once).doSlow()
      /usr/local/go/src/sync/once.go:74 +0xf0
  sync.(*Once).Do()
      /usr/local/go/src/sync/once.go:65 +0x44
  github.com/quic-go/quic-go/http3.(*client).RoundTripOpt()
      /go/pkg/mod/github.com/quic-go/quic-go@v0.40.1/http3/client.go:261 +0x2e9
  github.com/quic-go/quic-go/http3.(*RoundTripper).RoundTripOpt()
      /go/pkg/mod/github.com/quic-go/quic-go@v0.40.1/http3/roundtrip.go:150 +0xad1
  github.com/quic-go/quic-go/http3.(*RoundTripper).RoundTrip()
      /go/pkg/mod/github.com/quic-go/quic-go@v0.40.1/http3/roundtrip.go:164 +0x184
  main.main.func1()
      /source/main.go:22 +0x16d
==================
Found 1 data race(s)
exit status 66
@Savolro
Copy link
Author

Savolro commented Jan 2, 2024

Actual data race happens when accessing sconn.gotGSOError. Changing its' type to atomic.Bool solves the problem. I suspect wroteFirstPacket field may cause the same issue as it can also be edited by different goroutines simultaniously. But I did not dig deep into more details whether doing that would be a good enough solution. Please let me know if it would and I could create a PR

@marten-seemann
Copy link
Member

Thanks for reporting. Changing the type of gotGSOError to atomic.Bool is probably not the correct solution though, as it just masks the underlying race condition.

@oliverpool
Copy link

oliverpool commented Mar 20, 2024

Affected as well (I initially thought it was a data race like #4303 - however even when disabling 0-RTT, the data race appears sometime).

as it just masks the underlying race condition

I have run it a couple of times with a simple test of mine and apparently the GSOError does not appear on the first write (in the following example it appears on the 7th):

> 1x: gotGSOError false (send_conn.go:97)
> 6x: writePacket error nil (send_conn.go:63)
> 2x: gotGSOError false (send_conn.go:97)
> 1x: gotGSOError false (send_conn.go:97)
> 1x: writePacket error(*net.OpError)  (send_conn.go:63) <- this is a GSOError

With such an execution trace, I see two possible solutions:

  • make c.gotGSOError atomic or mutex protected
  • adjust client.go:467 connState := conn.ConnectionState().TLS since this triggers the data race, but does not care about the GSO capability (it access the variable within ConnectionState(), but only takes the TLS field). For instance the ConnectionState method could be split:
type Connection interface {
	// ...

	// ConnectionState returns information about the TLS connection state
	ConnectionState() tls.ConnectionState
	// QuicState returns basic details about the QUIC connection.
	// Warning: This API should not be considered stable and might change soon.
	QuicState() QuicState

	// ...
}   

I would be willing to draft a PR to address this.

@marten-seemann
Copy link
Member

as it just masks the underlying race condition

I have run it a couple of times with a simple test of mine and apparently the GSOError does not appear on the first write (in the following example it appears on the 7th):

Interesting! What system are you running on?

  • adjust client.go:467 connState := conn.ConnectionState().TLS since this triggers the data race, but does not care about the GSO capability (it access the variable within ConnectionState(), but only takes the TLS field). For instance the ConnectionState method could be split:

This wouldn't solve the race, it would just hide it, right?

@oliverpool
Copy link

What system are you running on?

Arch Linux.

Investigating a bit more, the GSOError happens at the first GSO packet (which is the 7th). The first 6 packets have gsoSize set to 0 (during which handshakeConfirmed is false).

adjust client.go:467 connState := conn.ConnectionState().TLS since this triggers the data race, but does not care about the GSO capability (it access the variable within ConnectionState(), but only takes the TLS field). For instance the ConnectionState method could be split:

This wouldn't solve the race, it would just hide it, right?

I think this would solve the race because it happens during the dial phase, before the connection is returned to the caller.

So nothing outside of the library should be able to call ConnectionState before the GSOError value is settled (except if you want to handle GSO being disabled during a connection, in which case I don't see any solution without a mutex).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants