Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TestCompletenessInit race condition #19294

Closed
zmb3 opened this issue Dec 12, 2022 · 7 comments · Fixed by #19390
Closed

TestCompletenessInit race condition #19294

zmb3 opened this issue Dec 12, 2022 · 7 comments · Fixed by #19390
Assignees

Comments

@zmb3
Copy link
Collaborator

zmb3 commented Dec 12, 2022

Failure

Link(s) to logs

Relevant snippet

    testing.go:1319: race detected during execution of test
--- FAIL: TestCompletenessInit (12.87s)
@zmb3
Copy link
Collaborator Author

zmb3 commented Dec 12, 2022

Took 10-15 minutes to reproduce locally, but I was able to trigger it.

==================
WARNING: DATA RACE
Read at 0x00c000702c60 by goroutine 37749:
  github.com/gravitational/trace.(*TraceErr).UserMessage()
      /Users/zmb/go/pkg/mod/github.com/gravitational/trace@v1.1.19/trace.go:291 +0xe4
  github.com/gravitational/trace.(*TraceErr).Error()
      /Users/zmb/go/pkg/mod/github.com/gravitational/trace@v1.1.19/trace.go:324 +0x28
  github.com/sirupsen/logrus.(*JSONFormatter).Format()
      /Users/zmb/go/pkg/mod/github.com/gravitational/logrus@v1.4.4-0.20210817004754-047e20245621/json_formatter.go:70 +0x184
  github.com/gravitational/teleport/lib/utils.(*JSONFormatter).Format()
      /Users/zmb/teleport/lib/utils/formatter.go:249 +0x1e0
  github.com/sirupsen/logrus.(*Entry).write()
      /Users/zmb/go/pkg/mod/github.com/gravitational/logrus@v1.4.4-0.20210817004754-047e20245621/entry.go:289 +0xe0
  github.com/sirupsen/logrus.(*Entry).log()
      /Users/zmb/go/pkg/mod/github.com/gravitational/logrus@v1.4.4-0.20210817004754-047e20245621/entry.go:252 +0x704
  github.com/sirupsen/logrus.(*Entry).Log()
      /Users/zmb/go/pkg/mod/github.com/gravitational/logrus@v1.4.4-0.20210817004754-047e20245621/entry.go:301 +0x80
  github.com/sirupsen/logrus.(*Entry).Warn()
      /Users/zmb/go/pkg/mod/github.com/gravitational/logrus@v1.4.4-0.20210817004754-047e20245621/entry.go:322 +0x29c
  github.com/gravitational/teleport/lib/cache.(*Cache).update()
      /Users/zmb/teleport/lib/cache/cache.go:823 +0x17c
  github.com/gravitational/teleport/lib/cache.(*Cache).Start.func1()
      /Users/zmb/teleport/lib/cache/cache.go:771 +0x64

Previous write at 0x00c000702c60 by goroutine 37460:
  github.com/gravitational/trace.(*TraceErr).AddUserMessage()
      /Users/zmb/go/pkg/mod/github.com/gravitational/trace@v1.1.19/trace.go:260 +0xe8
  github.com/gravitational/trace.Wrap()
      /Users/zmb/go/pkg/mod/github.com/gravitational/trace@v1.1.19/trace.go:64 +0x1d0
  github.com/gravitational/teleport/lib/cache.(*Cache).fetch.func2()
      /Users/zmb/teleport/lib/cache/cache.go:1262 +0x494
  golang.org/x/sync/errgroup.(*Group).Go.func1()
      /Users/zmb/go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:75 +0x6c

Goroutine 37749 (running) created at:
  github.com/gravitational/teleport/lib/cache.(*Cache).Start()
      /Users/zmb/teleport/lib/cache/cache.go:771 +0x294
  github.com/gravitational/teleport/lib/cache.New()
      /Users/zmb/teleport/lib/cache/cache.go:749 +0x1524
  github.com/gravitational/teleport/lib/cache.TestCompletenessInit()
      /Users/zmb/teleport/lib/cache/cache_test.go:588 +0x68c
  testing.tRunner()
      /opt/homebrew/Cellar/go/1.19.4/libexec/src/testing/testing.go:1446 +0x188
  testing.(*T).Run.func1()
      /opt/homebrew/Cellar/go/1.19.4/libexec/src/testing/testing.go:1493 +0x40

Goroutine 37460 (finished) created at:
  golang.org/x/sync/errgroup.(*Group).Go()
      /Users/zmb/go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:72 +0x10c
  github.com/gravitational/teleport/lib/cache.(*Cache).fetch()
      /Users/zmb/teleport/lib/cache/cache.go:1244 +0x520
  github.com/gravitational/teleport/lib/cache.(*Cache).fetchAndWatch()
      /Users/zmb/teleport/lib/cache/cache.go:941 +0x328
  github.com/gravitational/teleport/lib/cache.(*Cache).update()
      /Users/zmb/teleport/lib/cache/cache.go:817 +0x14c
  github.com/gravitational/teleport/lib/cache.(*Cache).Start.func1()
      /Users/zmb/teleport/lib/cache/cache.go:771 +0x64
==================
--- FAIL: TestCompletenessInit (2.47s)
    testing.go:1319: race detected during execution of test
FAIL
FAIL	github.com/gravitational/teleport/lib/cache	12.820s
FAIL

@Tener
Copy link
Contributor

Tener commented Dec 14, 2022

@Tener
Copy link
Contributor

Tener commented Dec 14, 2022

The stack traces above were interesting, so I took a look. I believe the core issue is with the trace.Wrap function, since it modifies an existing error in place by calling AddUserMessage on it. This is a problem if another goroutine is in the process of calling Error() on *trace.TraceError, as ultimately this leads to race accessing TraceErr.Messages field.

Here is a small test replicating the issue with only trace as a dependency:

func TestRaceErrorWrap(t *testing.T) {
	baseErr := trace.BadParameter("foo")

	iters := 1000

	wg := sync.WaitGroup{}
	wg.Add(2)

	// trace.Wrap with user message
	go func() {
		for i := 0; i < iters; i++ {
			_ = trace.Wrap(baseErr, "foo bar %q", "baz")
		}
		wg.Done()
	}()

	// plain Error() call
	go func() {
		for i := 0; i < iters; i++ {
			_ = baseErr.Error()
		}
		wg.Done()
	}()

	wg.Wait()
}

Running with -race finds the issue pretty quickly.

I think this should be fixed by making trace.Wrap return a copy of the error in case AddUserMessage is called (which isn't often); what do you think?

@espadolini
Copy link
Contributor

@ibeckermayer
Copy link
Contributor

@Tener
Copy link
Contributor

Tener commented Dec 23, 2022

Reopening as v9 and v10 are still affected.

@Tener Tener reopened this Dec 23, 2022
@Tener
Copy link
Contributor

Tener commented Dec 29, 2022

Closing as gravitational/trace has now been updated on all active branches.

#19719
#19718

@Tener Tener closed this as completed Dec 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants