sampler data race fix #4004
Conversation
executor/resources/sampler.go (Outdated)

@@ -34,6 +35,9 @@ func (s *Sub[T]) Close(captureLast bool) ([]T, error) {
	delete(s.sampler.subs, s)
	s.sampler.mu.Unlock()

	s.mu.Lock()
Not quite sure I understand this race. In line 35 the sub is already deleted from the sampler (with a lock), so it does not appear in `range s.subs` anymore.

I do get that a sub may be closed while `range active` is running. Maybe after `callback()` it should take `s.mu.RLock()` again, check if any of the `active` were removed from `subs`, and skip them if they were.
here's the data race warning:
WARNING: DATA RACE
Read at 0x00c0003b72b8 by goroutine 317:
github.com/moby/buildkit/executor/resources.(*Sub[...]).Close()
/src/executor/resources/sampler.go:37 +0x124
github.com/moby/buildkit/solver/llbsolver.(*ProvenanceCreator).Predicate()
/src/solver/llbsolver/provenance.go:520 +0x20c
github.com/moby/buildkit/solver/llbsolver.(*Solver).recordBuildHistory.func1.1()
/src/solver/llbsolver/solver.go:216 +0x118
github.com/moby/buildkit/solver/llbsolver.(*Solver).recordBuildHistory.func1.2()
/src/solver/llbsolver/solver.go:254 +0x108
golang.org/x/sync/errgroup.(*Group).Go.func1()
/src/vendor/golang.org/x/sync/errgroup/errgroup.go:75 +0x82
Previous write at 0x00c0003b72b8 by goroutine 68:
github.com/moby/buildkit/executor/resources.(*Sampler[...]).run()
/src/executor/resources/sampler.go:117 +0x7aa
github.com/moby/buildkit/executor/resources.(*Sampler[...]).Record.func1()
/src/executor/resources/sampler.go:82 +0x47
Goroutine 317 (running) created at:
golang.org/x/sync/errgroup.(*Group).Go()
/src/vendor/golang.org/x/sync/errgroup/errgroup.go:72 +0x12e
github.com/moby/buildkit/solver/llbsolver.(*Solver).recordBuildHistory.func1()
/src/solver/llbsolver/solver.go:253 +0xabd
github.com/moby/buildkit/solver/llbsolver.(*Solver).Solve.func2()
/src/solver/llbsolver/solver.go:474 +0xaf
runtime.deferreturn()
/usr/local/go/src/runtime/panic.go:476 +0x32
github.com/moby/buildkit/control.(*Controller).Solve()
/src/control/control.go:437 +0x1bfa
github.com/moby/buildkit/api/services/control._Control_Solve_Handler.func1()
/src/api/services/control/control.pb.go:2438 +0x88
github.com/moby/buildkit/util/grpcerrors.UnaryServerInterceptor()
/src/util/grpcerrors/intercept.go:14 +0x7c
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1()
/src/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25 +0x8e
go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc.UnaryServerInterceptor.func1()
/src/vendor/go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc/interceptor.go:342 +0x6fa
main.unaryInterceptor.func1()
/src/cmd/buildkitd/main.go:603 +0x27c
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1()
/src/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25 +0x8e
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1()
/src/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:34 +0x126
github.com/moby/buildkit/api/services/control._Control_Solve_Handler()
/src/api/services/control/control.pb.go:2440 +0x1dd
google.golang.org/grpc.(*Server).processUnaryRPC()
/src/vendor/google.golang.org/grpc/server.go:1336 +0x177a
google.golang.org/grpc.(*Server).handleStream()
/src/vendor/google.golang.org/grpc/server.go:1704 +0xff8
google.golang.org/grpc.(*Server).serveStreams.func1.2()
/src/vendor/google.golang.org/grpc/server.go:965 +0xec
Goroutine 68 (running) created at:
github.com/moby/buildkit/executor/resources.(*Sampler[...]).Record()
/src/executor/resources/sampler.go:82 +0x2db
github.com/moby/buildkit/solver/llbsolver.(*Solver).Solve()
/src/solver/llbsolver/solver.go:419 +0x164
github.com/moby/buildkit/control.(*Controller).Solve()
/src/control/control.go:437 +0x1bfa
github.com/moby/buildkit/api/services/control._Control_Solve_Handler.func1()
/src/api/services/control/control.pb.go:2438 +0x88
github.com/moby/buildkit/util/grpcerrors.UnaryServerInterceptor()
/src/util/grpcerrors/intercept.go:14 +0x7c
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1()
/src/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25 +0x8e
go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc.UnaryServerInterceptor.func1()
/src/vendor/go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc/interceptor.go:342 +0x6fa
main.unaryInterceptor.func1()
/src/cmd/buildkitd/main.go:603 +0x27c
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1()
/src/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25 +0x8e
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1()
/src/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:34 +0x126
github.com/moby/buildkit/api/services/control._Control_Solve_Handler()
/src/api/services/control/control.pb.go:2440 +0x1dd
google.golang.org/grpc.(*Server).processUnaryRPC()
/src/vendor/google.golang.org/grpc/server.go:1336 +0x177a
google.golang.org/grpc.(*Server).handleStream()
/src/vendor/google.golang.org/grpc/server.go:1704 +0xff8
google.golang.org/grpc.(*Server).serveStreams.func1.2()
/src/vendor/google.golang.org/grpc/server.go:965 +0xec
edit: this was produced by running `CGO_ENABLED=1 GOBUILDFLAGS="-race" TESTFLAGS="-v -test.run=TestClientGatewayIntegration/TestClientGatewaySolve" TESTPKGS=./client ./hack/test integration`
> Maybe after `callback()` it should take `s.mu.RLock()` again and check if any of the `active` were removed from `subs` and skip them if they were.

I like this idea; I updated the code to have the subs all use the same sampler lock, which simplifies things.

The race I was getting warnings for (see above) was that `ss.err` is being set under `run()`, but also read by `Close` when doing `if s.err != nil {`.
Protect against a data race occurring while both sampler.Run and sub.Close access the same data from multiple threads.

Signed-off-by: Alex Couture-Beil <alex@earthly.dev>
Force-pushed from 54a203f to 364340b
Why is the `s.sampler.mu.Unlock()` now in a defer? If the sub is already removed from `subs`, then that should take care of it not showing up in `range` cycles in `run()` anymore.
Force-pushed from 364340b to 86a740f

Protect against a data race occurring while both sampler.Run and sub.Close access the same data from multiple threads.

Signed-off-by: Alex Couture-Beil <alex@earthly.dev>