
sampler data race fix #4004

Merged: 2 commits, Jul 10, 2023

Conversation

@alexcb (Collaborator) commented on Jul 7, 2023:

Protect against a data race occurring while both sampler.Run and sub.Close access the same data from multiple threads.

@alexcb alexcb marked this pull request as ready for review July 8, 2023 01:03
Review thread on executor/resources/sampler.go (resolved):
```diff
@@ -34,6 +35,9 @@ func (s *Sub[T]) Close(captureLast bool) ([]T, error) {
 	delete(s.sampler.subs, s)
 	s.sampler.mu.Unlock()
 
+	s.mu.Lock()
```
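To make the hunk easier to follow, here is a minimal, self-contained sketch of the shape of Close after this change. Only sampler.mu, subs, the added s.mu.Lock(), and the s.err check quoted later in this thread come from the PR; the samples field, the captureLast handling, and the exact type layouts are assumptions for illustration.

```go
package main

import "sync"

// Toy stand-ins for the real types; field layouts are assumed.
type Sub[T any] struct {
	sampler *Sampler[T]
	mu      sync.Mutex
	err     error
	samples []T
}

type Sampler[T any] struct {
	mu   sync.RWMutex
	subs map[*Sub[T]]struct{}
}

func (s *Sub[T]) Close(captureLast bool) ([]T, error) {
	// Unregister first so run() stops handing this sub new samples.
	s.sampler.mu.Lock()
	delete(s.sampler.subs, s)
	s.sampler.mu.Unlock()

	// The added lock: run() may already hold a reference to this sub from a
	// snapshot taken before the delete, and may still write err/samples.
	s.mu.Lock()
	defer s.mu.Unlock()

	if s.err != nil {
		return nil, s.err
	}
	return s.samples, nil // captureLast handling elided
}

func main() {}
```

For this per-sub lock to close the race, run() would also have to take the same mutex around its writes, which is part of what the discussion below turns on.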
@tonistiigi (Member) commented on Jul 10, 2023:

Not quite sure I understand this race. On line 35 the sub is already deleted from the sampler (with a lock), so it does not appear in range s.subs anymore.

I do get that sub may be closed while range active is running. Maybe after callback() it should take s.mu.RLock() again and check if any of the active were removed from subs, and skip them if they were.
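One possible reading of this suggestion, reusing the toy types from the sketch above (the callback signature and loop structure are assumptions; the real run() is not shown in this thread):

```go
// Snapshot the subscribers, sample outside the lock, then re-take the lock
// and skip any sub that Close removed from subs in the meantime.
func (s *Sampler[T]) sampleOnce(callback func() (T, error)) {
	s.mu.RLock()
	active := make([]*Sub[T], 0, len(s.subs))
	for sub := range s.subs {
		active = append(active, sub)
	}
	s.mu.RUnlock()

	value, err := callback() // may block; Close can run concurrently

	s.mu.RLock()
	for _, sub := range active {
		if _, ok := s.subs[sub]; !ok {
			continue // Closed while callback() ran; its caller owns it now
		}
		// Writing under RLock is safe only because a single run() goroutine
		// is the sole writer and Close takes the full write lock first.
		sub.err = err
		sub.samples = append(sub.samples, value)
	}
	s.mu.RUnlock()
}
```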

@alexcb (Collaborator, Author) commented on Jul 10, 2023:

Here's the data race warning:

```text

WARNING: DATA RACE
Read at 0x00c0003b72b8 by goroutine 317:
  github.com/moby/buildkit/executor/resources.(*Sub[...]).Close()
      /src/executor/resources/sampler.go:37 +0x124
  github.com/moby/buildkit/solver/llbsolver.(*ProvenanceCreator).Predicate()
      /src/solver/llbsolver/provenance.go:520 +0x20c
  github.com/moby/buildkit/solver/llbsolver.(*Solver).recordBuildHistory.func1.1()
      /src/solver/llbsolver/solver.go:216 +0x118
  github.com/moby/buildkit/solver/llbsolver.(*Solver).recordBuildHistory.func1.2()
      /src/solver/llbsolver/solver.go:254 +0x108
  golang.org/x/sync/errgroup.(*Group).Go.func1()
      /src/vendor/golang.org/x/sync/errgroup/errgroup.go:75 +0x82

Previous write at 0x00c0003b72b8 by goroutine 68:
  github.com/moby/buildkit/executor/resources.(*Sampler[...]).run()
      /src/executor/resources/sampler.go:117 +0x7aa
  github.com/moby/buildkit/executor/resources.(*Sampler[...]).Record.func1()
      /src/executor/resources/sampler.go:82 +0x47

Goroutine 317 (running) created at:
  golang.org/x/sync/errgroup.(*Group).Go()
      /src/vendor/golang.org/x/sync/errgroup/errgroup.go:72 +0x12e
  github.com/moby/buildkit/solver/llbsolver.(*Solver).recordBuildHistory.func1()
      /src/solver/llbsolver/solver.go:253 +0xabd
  github.com/moby/buildkit/solver/llbsolver.(*Solver).Solve.func2()
      /src/solver/llbsolver/solver.go:474 +0xaf
  runtime.deferreturn()
      /usr/local/go/src/runtime/panic.go:476 +0x32
  github.com/moby/buildkit/control.(*Controller).Solve()
      /src/control/control.go:437 +0x1bfa
  github.com/moby/buildkit/api/services/control._Control_Solve_Handler.func1()
      /src/api/services/control/control.pb.go:2438 +0x88
  github.com/moby/buildkit/util/grpcerrors.UnaryServerInterceptor()
      /src/util/grpcerrors/intercept.go:14 +0x7c
  github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1()
      /src/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25 +0x8e
  go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc.UnaryServerInterceptor.func1()
      /src/vendor/go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc/interceptor.go:342 +0x6fa
  main.unaryInterceptor.func1()
      /src/cmd/buildkitd/main.go:603 +0x27c
  github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1()
      /src/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25 +0x8e
  github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1()
      /src/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:34 +0x126
  github.com/moby/buildkit/api/services/control._Control_Solve_Handler()
      /src/api/services/control/control.pb.go:2440 +0x1dd
  google.golang.org/grpc.(*Server).processUnaryRPC()
      /src/vendor/google.golang.org/grpc/server.go:1336 +0x177a
  google.golang.org/grpc.(*Server).handleStream()
      /src/vendor/google.golang.org/grpc/server.go:1704 +0xff8
  google.golang.org/grpc.(*Server).serveStreams.func1.2()
      /src/vendor/google.golang.org/grpc/server.go:965 +0xec

Goroutine 68 (running) created at:
  github.com/moby/buildkit/executor/resources.(*Sampler[...]).Record()
      /src/executor/resources/sampler.go:82 +0x2db
  github.com/moby/buildkit/solver/llbsolver.(*Solver).Solve()
      /src/solver/llbsolver/solver.go:419 +0x164
  github.com/moby/buildkit/control.(*Controller).Solve()
      /src/control/control.go:437 +0x1bfa
  github.com/moby/buildkit/api/services/control._Control_Solve_Handler.func1()
      /src/api/services/control/control.pb.go:2438 +0x88
  github.com/moby/buildkit/util/grpcerrors.UnaryServerInterceptor()
      /src/util/grpcerrors/intercept.go:14 +0x7c
  github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1()
      /src/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25 +0x8e
  go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc.UnaryServerInterceptor.func1()
      /src/vendor/go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc/interceptor.go:342 +0x6fa
  main.unaryInterceptor.func1()
      /src/cmd/buildkitd/main.go:603 +0x27c
  github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1()
      /src/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25 +0x8e
  github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1()
      /src/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:34 +0x126
  github.com/moby/buildkit/api/services/control._Control_Solve_Handler()
      /src/api/services/control/control.pb.go:2440 +0x1dd
  google.golang.org/grpc.(*Server).processUnaryRPC()
      /src/vendor/google.golang.org/grpc/server.go:1336 +0x177a
  google.golang.org/grpc.(*Server).handleStream()
      /src/vendor/google.golang.org/grpc/server.go:1704 +0xff8
  google.golang.org/grpc.(*Server).serveStreams.func1.2()
      /src/vendor/google.golang.org/grpc/server.go:965 +0xec

```

Edit: this was produced by running CGO_ENABLED=1 GOBUILDFLAGS="-race" TESTFLAGS="-v -test.run=TestClientGatewayIntegration/TestClientGatewaySolve" TESTPKGS=./client ./hack/test integration

@alexcb (Collaborator, Author) commented:

> Maybe after callback() it should take s.mu.RLock() again and check if any of the active were removed from subs, and skip them if they were.

I like this idea; I updated the code to have the subs all use the same sampler lock, which simplifies things.

The race I was getting warnings for (see above) was that a sub's err field is set in run() (as ss.err), but also read by Close when doing if s.err != nil {.
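A sketch of the direction that fix takes, again with the toy types from above but without the per-sub mutex (hypothetical shape; the actual change is in the diff): run() and Close both synchronize on the one sampler mutex, so the write to a sub's err in run() and the if s.err != nil read in Close can never overlap.

```go
// With a single shared lock, anything still in subs is safe to update, and
// anything Close already removed can no longer be reached by this loop.
func (s *Sampler[T]) sampleOnce(callback func() (T, error)) {
	value, err := callback() // sample outside the lock

	s.mu.Lock()
	for sub := range s.subs {
		sub.err = err
		sub.samples = append(sub.samples, value)
	}
	s.mu.Unlock()
}
```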

Protect against a data race occurring while both sampler.Run and sub.Close
access the same data from multiple threads.

Signed-off-by: Alex Couture-Beil <alex@earthly.dev>
@alexcb alexcb force-pushed the acb/sampler-data-race-fix branch 3 times, most recently from 54a203f to 364340b on July 10, 2023 17:49
Review thread on executor/resources/sampler.go (outdated):
@tonistiigi (Member) left a comment:

Why is the s.sampler.mu.Unlock() now in a defer? If the sub is already removed from subs, then that should take care of it not showing up in range cycles in run() anymore.
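For context, the difference under discussion, sketched with the toy types from above minus the per-sub mutex (hypothetical; the updated Close itself is not quoted in this thread): holding sampler.mu for the whole body via defer keeps the s.err read ordered against run()'s writes under the same mutex, whereas unlocking right after the delete would leave that read unguarded.

```go
func (s *Sub[T]) Close(captureLast bool) ([]T, error) {
	s.sampler.mu.Lock()
	defer s.sampler.mu.Unlock() // held until return, covering the err read

	delete(s.sampler.subs, s) // run() can no longer reach this sub

	if s.err != nil {
		return nil, s.err
	}
	return s.samples, nil // captureLast handling elided
}
```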

Signed-off-by: Alex Couture-Beil <alex@earthly.dev>
@alexcb alexcb force-pushed the acb/sampler-data-race-fix branch from 364340b to 86a740f on July 10, 2023 17:57