
testing: summary and test output interleaved #40771

Closed
mattmoor opened this issue Aug 13, 2020 · 50 comments
Labels: FrozenDueToAge, NeedsFix (The path to resolution is known, but the work has not been done.)

@mattmoor

mattmoor commented Aug 13, 2020

What version of Go are you using (go version)?

>> go version
go version go1.14.6 linux/amd64

Does this issue reproduce with the latest release?

We haven't yet picked up 1.15, which was cut just yesterday, and this is an intermittent flake for us.

What operating system and processor architecture are you using (go env)?

go env Output
>> go version
go version go1.14.6 linux/amd64

What did you do?

We are running our e2e tests with t.Parallel() and -v through gotestsum.

What did you expect to see?

gotestsum uses test2json to parse the build log output, and I expect to see passing test results for all of the tests run.

What did you see instead?

Action: pass events are missing for typically 2 of the tests.

We were originally chasing this here (has links to richer logs): gotestyourself/gotestsum#141

To include some relevant context here, we see several test summaries that are interleaved with other output, which seems to trip up test2json. Here are three from a single run (3 diff API versions):

Normal (what you'd normally see):

--- PASS: TestContainerExitingMsg (0.01s)
    --- PASS: TestContainerExitingMsg/tcp (27.63s)
    --- PASS: TestContainerExitingMsg/http (160.37s)

Weird (interleaved, but surfaces events properly):

--- PASS: TestContainerExitingMsg (0.00s)
        + 				Message:            "Waiting for Envoys to receive Endpoints data.",
    --- PASS: TestContainerExitingMsg/http (179.23s)
          			},
          		},
          		Annotations: nil,
    --- PASS: TestContainerExitingMsg/tcp (194.07s)

Broken (interleaved with another test context switch, breaks things):

--- PASS: TestContainerExitingMsg (0.00s)
=== CONT  TestBlueGreenRoute
    kubelogs.go:201: I 01:01:58.381 webhook-98759d99-pfkgh [conversion/conversion.go:133] [serving-tests/blue-green-route-atsgeusz-q56cd] Converting [kind=Revision group=serving.knative.dev version=v1] to version serving.knative.dev/v1beta1
    --- PASS: TestContainerExitingMsg/http (49.33s)
    --- PASS: TestContainerExitingMsg/tcp (165.25s)

If I change the ordering of lines in this last section, I get the appropriate set of events.

@andybons andybons changed the title go test -v can interleave summary and test output cmd/go: go test -v can interleave summary and test output Aug 13, 2020
@andybons andybons added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Aug 13, 2020
@andybons andybons added this to the Unplanned milestone Aug 13, 2020
@andybons
Member

@bcmills @matloob @jayconrod

@dnephin
Contributor

dnephin commented Aug 13, 2020

The test2json bug can be seen in these logs:

{"Time":"2020-08-11T01:03:41.641590332Z","Action":"output","Package":"knative.dev/serving/test/conformance/api/v1","Test":"TestContainerExitingMsg","Output":"--- PASS: TestContainerExitingMsg (0.00s)\n"}
{"Time":"2020-08-11T01:03:41.641597357Z","Action":"pass","Package":"knative.dev/serving/test/conformance/api/v1","Test":"TestContainerExitingMsg","Elapsed":0}
{"Time":"2020-08-11T01:03:41.641625254Z","Action":"output","Package":"knative.dev/serving/test/conformance/api/v1","Test":"TestBlueGreenRoute","Output":"    --- PASS: TestContainerExitingMsg/http (49.33s)\n"}
{"Time":"2020-08-11T01:03:41.641632549Z","Action":"output","Package":"knative.dev/serving/test/conformance/api/v1","Test":"TestBlueGreenRoute","Output":"    --- PASS: TestContainerExitingMsg/tcp (165.25s)\n"}

The --- PASS TestContainerExitingMsg output is attributed to TestBlueGreenRoute, and test2json does not create Action: pass events for those 2 subtests.
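To make the mis-attribution mechanism concrete, here is a rough, illustrative sketch of how a test2json-style parser assigns output lines to tests (this is not the real cmd/test2json code, just a model of the failure mode): `=== CONT` switches the "current test", and indented `--- PASS:` subtest summaries are credited to whichever test is current, so the interleaved snippet credits both subtests to TestBlueGreenRoute.

```go
package main

import (
	"fmt"
	"strings"
)

// attribute is an illustrative sketch of test2json-style attribution, not
// the real implementation: "=== CONT Foo" makes Foo the current test, and
// indented "--- PASS:" subtest summaries are credited to the current test.
func attribute(lines []string) map[string]string {
	owner := make(map[string]string)
	current := ""
	for _, line := range lines {
		trimmed := strings.TrimSpace(line)
		switch {
		case strings.HasPrefix(line, "=== CONT"):
			current = strings.TrimSpace(strings.TrimPrefix(line, "=== CONT"))
		case strings.HasPrefix(line, "--- PASS:"):
			// Unindented summary: the named test owns its own line.
			current = strings.Fields(trimmed)[2]
			owner[current] = current
		case strings.HasPrefix(trimmed, "--- PASS:"):
			// Indented subtest summary: credited to the *current* test,
			// which goes wrong if a "=== CONT" intervened.
			owner[strings.Fields(trimmed)[2]] = current
		}
	}
	return owner
}

func main() {
	broken := []string{
		"--- PASS: TestContainerExitingMsg (0.00s)",
		"=== CONT  TestBlueGreenRoute",
		"    kubelogs.go:201: I 01:01:58.381 ...",
		"    --- PASS: TestContainerExitingMsg/http (49.33s)",
		"    --- PASS: TestContainerExitingMsg/tcp (165.25s)",
	}
	owner := attribute(broken)
	fmt.Println(owner["TestContainerExitingMsg/http"]) // TestBlueGreenRoute
	fmt.Println(owner["TestContainerExitingMsg/tcp"])  // TestBlueGreenRoute
}
```

With the `=== CONT` line moved below the two subtest summaries, the same sketch attributes both subtests to TestContainerExitingMsg, matching mattmoor's observation that reordering the lines yields the right events.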

@mattmoor
Author

I actually suspect that the testing package has races related to the === markers.

Here's another strange interleaving I see, which ultimately leaves certain log lines under the incorrect heading:

=== CONT  TestSecretVolume             <-- Nothing logged wtf?
=== CONT  TestRevisionTimeout
    kubelogs.go:201: I 01:00:41.328 autoscaler-86d4bd9bcb-hvnt8 [kpa-class-podautoscaler-controller] [serving-tests/revision-timeout-when-b67b736422814c3736c90c658f29bd58-zwcnw8nw] Sleeping additionally for 57.671904984s before can scale to 0
    # These log lines actually belong to the first empty heading!
    kubelogs.go:201: I 01:00:49.594 webhook-98759d99-pfkgh [conversion/conversion.go:133] [serving-tests/secret-volume-dsivzkqq] Converting [kind=Route group=serving.knative.dev version=v1] to version serving.knative.dev/v1alpha1
    kubelogs.go:201: I 01:00:52.333 contour-ingress-controller-5dc846564f-jtmpz [controller/controller.go:520] [serving-tests/projected-secret-volume-erpaxwuw] Reconcile succeeded. Time taken: 31.082µs

Is there a way to disable the streaming interleaved output? If so, then we can probably mitigate many of these issues by simply throwing that in our CI setup.

@bcmills
Contributor

bcmills commented Aug 13, 2020

This is probably related to the fix for #39308, and thus also to #40657. (CC @jadekler)

@bcmills bcmills changed the title cmd/go: go test -v can interleave summary and test output testing: summary and test output interleaved Aug 13, 2020
@bcmills
Contributor

bcmills commented Aug 13, 2020

@mattmoor, the output in that last snippet does not look like a call to t.Logf. Was it written directly to os.Stdout or os.Stderr? (If so, the answer is “don't write to os.Stdout or os.Stderr in a parallel test if you want the output to associate correctly”.)

@bcmills bcmills added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Aug 13, 2020
@mattmoor
Author

Should all be t.Logf: https://github.com/knative/pkg/blob/c9f9284521f197a17fcdbaf5c4360bef8ef8570d/test/logstream/kubelogs.go#L222

@bcmills @dnephin Is there any way to mitigate this?

Ideally we'd want the -v behavior prior to streaming, but I'd also consider/propose dropping -v in CI if that's the only way until this is fixed.

@mattmoor
Author

You may be looking at this formatting, but that's going through the logf: https://github.com/knative/pkg/blob/c9f9284521f197a17fcdbaf5c4360bef8ef8570d/test/logstream/kubelogs.go#L185

@mattmoor
Author

Switching to gotestsum's equivalent of "no -v" doesn't help, I suspect that's because -json is purely output based? 😞

@bcmills
Contributor

bcmills commented Aug 14, 2020

@mattmoor, are these t.Logf calls occurring after the test has already completed? (Is the --- PASS line actually chronologically misplaced, or is this more just a matter of missing === CONT headers?)

@mattmoor
Author

No, we stop calling t.Logf for a particular test when the cancel function is called, which is registered via t.Cleanup

@bcmills
Contributor

bcmills commented Aug 14, 2020

Could you please provide a link to an actual test that is failing? I can see how this library is wired to its exported API, but that doesn't really help me understand the sequencing or concurrency properties of the calls in the actual test.

@bcmills bcmills added WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. and removed WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. labels Aug 14, 2020
@mattmoor
Author

A lot of tests are failing with this. The ones pertinent to the logs in the linked issue are here: https://github.com/knative/serving/blob/d9d93bd2ce10140af538acaa787aec031c526401/test/conformance/api/v1/errorcondition_test.go#L160

The specific callsite I link above is what initializes the logstream here:
https://github.com/knative/serving/blob/d9d93bd2ce10140af538acaa787aec031c526401/test/conformance.go#L72-L73

@bcmills
Contributor

bcmills commented Aug 18, 2020

@mattmoor, could you provide a complete test output demonstrating the problem? I'd like to see which messages (particularly === CONT and === PAUSE messages) precede the misordered ones.

Did this start as of go1.14.6, or has it been broken across all go1.14 releases?

@mattmoor
Author

I believe the event problem was new in 1.14.6, but we had other issues related to the fix that landed in .6

The top comment of gotestyourself/gotestsum#141 has a link to download the full build log. It is interleaved with some other bash output, but when I was playing with test2json locally I'd just snipped the go test output out of one of these.

@bcmills
Contributor

bcmills commented Aug 18, 2020

@mattmoor, one more question: do the tests in question pass if the -race flag is set? Trying to rule out an unlikely data race.

@mattmoor
Author

Running go test with args: -race -count=1 -timeout=30m ./test/conformance/api/... ./test/conformance/runtime/... ./test/e2e --resolvabledomain=false --ingressClass=contour.ingress.networking.knative.dev -tags=e2e

Looks like this is with -race.

@bcmills
Contributor

bcmills commented Aug 18, 2020

Thanks, that's helpful. I haven't been able to reproduce this with all logging occurring synchronously within the test, but I can get a very similar symptom by logging after the test has returned (which appears to bypass both the streaming output and synchronization):

https://play.golang.org/p/SnpZP6n0lvk

func TestWithDelayedOutput(t *testing.T) {
	t.Parallel()

	subtestsDone := make(chan struct{})
	var delayed sync.WaitGroup
	for i := 0; i < 3; i++ {
		t.Run(fmt.Sprint(i), func(t *testing.T) {
			delayed.Add(1)
			go func() {
				<-subtestsDone
				defer delayed.Done()
				t.Logf(t.Name())
			}()
		})
	}

	close(subtestsDone)
	delayed.Wait()
}
Output:

=== RUN   TestWithDelayedOutput
=== PAUSE TestWithDelayedOutput
=== CONT  TestWithDelayedOutput
=== RUN   TestWithDelayedOutput/0
=== RUN   TestWithDelayedOutput/1
=== RUN   TestWithDelayedOutput/2
--- PASS: TestWithDelayedOutput (0.00s)
    --- PASS: TestWithDelayedOutput/0 (0.00s)
    --- PASS: TestWithDelayedOutput/1 (0.00s)
    --- PASS: TestWithDelayedOutput/2 (0.00s)
    prog.go:22: TestWithDelayedOutput/0
    prog.go:22: TestWithDelayedOutput/1
    prog.go:22: TestWithDelayedOutput/2
PASS

I suspect that it won't be hard to extrapolate from that to a racy write.

@bcmills
Contributor

bcmills commented Aug 18, 2020

I am able to reproduce the interrupting === CONT line by wrapping this bytes.NewReader in an iotest.OneByteReader, but I still can't get it to reproduce without.

@mattmoor
Author

@bcmills What if there is logging after the test returns, but before the t.Cleanups have executed?

This is actually very possible today since the t.Cleanup is what cancels our logstream to the test's t.Log
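The window in question can be sketched with a stand-in for the testing package (fakeT below is hypothetical, not the real *testing.T): the test function returns first, and only afterwards do the registered Cleanup callbacks fire, so a goroutine that is only canceled by a Cleanup can still log in between.

```go
package main

import "fmt"

// fakeT is a hypothetical stand-in for *testing.T, used only to show ordering.
type fakeT struct{ cleanups []func() }

func (t *fakeT) Cleanup(f func()) { t.cleanups = append(t.cleanups, f) }

// run mimics the testing package's sequencing: the test function returns,
// and only afterwards do registered Cleanup callbacks fire (last-in,
// first-out). A goroutine that keeps logging until a Cleanup cancels it
// can therefore still log in the window between return and Cleanup.
func run(t *fakeT, test func(*fakeT)) {
	test(t)
	fmt.Println("test body returned; stream not yet canceled")
	for i := len(t.cleanups) - 1; i >= 0; i-- {
		t.cleanups[i]()
	}
}

func main() {
	t := &fakeT{}
	run(t, func(t *fakeT) {
		t.Cleanup(func() { fmt.Println("cleanup: cancel logstream") })
	})
}
```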

@bcmills
Contributor

bcmills commented Aug 19, 2020

@mattmoor, it appears that logging to a test after the test function but before the t.Cleanup is generally racy.

Logging to a subtest before the main test's Cleanup results in a data race:

https://play.golang.org/p/EaLLhYQejXa

==================
WARNING: DATA RACE
Read at 0x00c000104658 by goroutine 10:
  testing.(*common).frameSkip()
      /usr/local/google/home/bcmills/sdk/gotip/src/testing/testing.go:479 +0x26e
  testing.(*common).decorate()
      /usr/local/google/home/bcmills/sdk/gotip/src/testing/testing.go:524 +0x64
  testing.(*common).logDepth()
      /usr/local/google/home/bcmills/sdk/gotip/src/testing/testing.go:734 +0x1a9
  testing.(*common).log()
      /usr/local/google/home/bcmills/sdk/gotip/src/testing/testing.go:720 +0x8f
  testing.(*common).Logf()
      /usr/local/google/home/bcmills/sdk/gotip/src/testing/testing.go:766 +0x21
  example%2ecom.TestWithDelayedOutput.func1.1()
      /tmp/tmp.BqaVZz8DDs/example.com/main_test.go:20 +0xe5

Previous write at 0x00c000104658 by goroutine 7:
  testing.(*common).Cleanup.func1()
      /usr/local/google/home/bcmills/sdk/gotip/src/testing/testing.go:858 +0xba
  testing.(*common).runCleanup()
      /usr/local/google/home/bcmills/sdk/gotip/src/testing/testing.go:936 +0xca
  testing.tRunner.func2()
      /usr/local/google/home/bcmills/sdk/gotip/src/testing/testing.go:1102 +0x74
  testing.tRunner()
      /usr/local/google/home/bcmills/sdk/gotip/src/testing/testing.go:1112 +0x22a

Goroutine 10 (running) created at:
  example%2ecom.TestWithDelayedOutput.func1()
      /tmp/tmp.BqaVZz8DDs/example.com/main_test.go:17 +0x92
  testing.tRunner()
      /usr/local/google/home/bcmills/sdk/gotip/src/testing/testing.go:1108 +0x202

Goroutine 7 (running) created at:
  testing.(*T).Run()
      /usr/local/google/home/bcmills/sdk/gotip/src/testing/testing.go:1159 +0x796
  testing.runTests.func1()
      /usr/local/google/home/bcmills/sdk/gotip/src/testing/testing.go:1430 +0xa6
  testing.tRunner()
      /usr/local/google/home/bcmills/sdk/gotip/src/testing/testing.go:1108 +0x202
  testing.runTests()
      /usr/local/google/home/bcmills/sdk/gotip/src/testing/testing.go:1428 +0x5aa
  testing.(*M).Run()
      /usr/local/google/home/bcmills/sdk/gotip/src/testing/testing.go:1338 +0x4eb
  main.main()
      _testmain.go:43 +0x236
==================
--- FAIL: TestWithDelayedOutput (0.00s)
    main_test.go:20: TestWithDelayedOutput/0
    main_test.go:20: TestWithDelayedOutput/2
    main_test.go:20: TestWithDelayedOutput/1
    testing.go:1023: race detected during execution of test
FAIL
exit status 1
FAIL	example.com	0.020s

Likewise for logging to the main test function itself:

https://play.golang.org/p/sGxwXx_D8KH

==================
WARNING: DATA RACE
Read at 0x00c000114358 by goroutine 9:
  testing.(*common).frameSkip()
      /usr/local/google/home/bcmills/sdk/gotip/src/testing/testing.go:479 +0x26e
  testing.(*common).decorate()
      /usr/local/google/home/bcmills/sdk/gotip/src/testing/testing.go:524 +0x64
  testing.(*common).logDepth()
      /usr/local/google/home/bcmills/sdk/gotip/src/testing/testing.go:751 +0x5ab
  testing.(*common).log()
      /usr/local/google/home/bcmills/sdk/gotip/src/testing/testing.go:720 +0x8f
  testing.(*common).Logf()
      /usr/local/google/home/bcmills/sdk/gotip/src/testing/testing.go:766 +0x21
  example%2ecom.TestWithDelayedOutput.func1()
      /tmp/tmp.BqaVZz8DDs/example.com/main_test.go:14 +0x71

Previous write at 0x00c000114358 by goroutine 7:
  testing.(*common).Cleanup.func1()
      /usr/local/google/home/bcmills/sdk/gotip/src/testing/testing.go:858 +0xba
  testing.(*common).runCleanup()
      /usr/local/google/home/bcmills/sdk/gotip/src/testing/testing.go:936 +0xca
  testing.tRunner.func2()
      /usr/local/google/home/bcmills/sdk/gotip/src/testing/testing.go:1102 +0x74
  testing.tRunner()
      /usr/local/google/home/bcmills/sdk/gotip/src/testing/testing.go:1112 +0x22a

Goroutine 9 (running) created at:
  example%2ecom.TestWithDelayedOutput()
      /tmp/tmp.BqaVZz8DDs/example.com/main_test.go:12 +0x86
  testing.tRunner()
      /usr/local/google/home/bcmills/sdk/gotip/src/testing/testing.go:1108 +0x202

Goroutine 7 (running) created at:
  testing.(*T).Run()
      /usr/local/google/home/bcmills/sdk/gotip/src/testing/testing.go:1159 +0x796
  testing.runTests.func1()
      /usr/local/google/home/bcmills/sdk/gotip/src/testing/testing.go:1430 +0xa6
  testing.tRunner()
      /usr/local/google/home/bcmills/sdk/gotip/src/testing/testing.go:1108 +0x202
  testing.runTests()
      /usr/local/google/home/bcmills/sdk/gotip/src/testing/testing.go:1428 +0x5aa
  testing.(*M).Run()
      /usr/local/google/home/bcmills/sdk/gotip/src/testing/testing.go:1338 +0x4eb
  main.main()
      _testmain.go:43 +0x236
==================
--- FAIL: TestWithDelayedOutput (0.00s)
    main_test.go:14: BRAAAAINS
    testing.go:1023: race detected during execution of test
FAIL
exit status 1
FAIL	example.com	0.022s

On the other hand, logging to a test before that test function's own Cleanup seems to be tolerated (https://play.golang.org/p/X514jUu6UPM). I'm not sure why that is. If you've got a use-case where it matters, I would suggest filing a separate issue about the safety of t.Log during a t.Cleanup function.

@gopherbot
Contributor

Change https://golang.org/cl/249026 mentions this issue: testing: flush test summaries to stdout atomically when streaming output

@bcmills
Contributor

bcmills commented Aug 19, 2020

@mattmoor, I still can't reproduce the reported behavior without modifying the code, but I think I understand where it's coming from. Could you try patching in https://golang.org/cl/249026 and see if that fixes the test output in your environment?

@bcmills bcmills removed this from the Unplanned milestone Aug 19, 2020
@bcmills
Contributor

bcmills commented Sep 2, 2020

@mattmoor, this should be fixed at head, but since I couldn't reproduce the failure without injecting a OneByteReader I'm a little uncomfortable backporting to 1.14.9 or 1.15.2 without further testing. Any chance you could try a build from source (https://golang.org/doc/install/source#head) or use golang.org/dl/gotip to confirm that it fixes the symptoms you observed?

@gopherbot
Contributor

Change https://golang.org/cl/252638 mentions this issue: [release-branch.go1.14] testing: flush test summaries to stdout atomically when streaming output

@mattmoor
Author

mattmoor commented Sep 2, 2020

I'll talk to the folks that operate the Knative CI infrastructure. This hits about once a day in each of our post-submit test legs, so if we could convert even just one of those, it'd be a pretty strong signal IMO.

@bcmills
Contributor

bcmills commented Sep 2, 2020

If you want to minimize other variables, you could also try building from source on release-branch.go1.14 with CL 252638 manually patched in, and/or release-branch.go1.15 with CL 252637 manually patched in.

@dmitshur dmitshur added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Sep 2, 2020
@mattmoor
Author

mattmoor commented Sep 8, 2020

Alrighty, I think I have good news, but you should take this with the grain of salt that this thing is obvs non-deterministic 😅

So in our typical "contour-latest" leg, we see this manifesting typically a couple times a day (periodic run every ~2 hours).

[screenshot: test history for the contour-latest leg showing the recurring failures]

Our test-infra folks added an experimental leg that's been running with this change, and clicking through each of the legs to see their results in "spyglass" (we don't have it in "testgrid") I don't see any manifestations of this in ~31 runs so far.

So I'm optimistic that this fixes things 🤞 🎉

@mattmoor
Author

This is still biting us pretty pervasively across our various e2e tests. When do we think we can expect this in a patch release on 1.14 or 1.15?

@bcmills
Contributor

bcmills commented Sep 17, 2020

@mattmoor, if we don't get any reports of regressions at head, then I would guess that this can make 1.15.3.

(It would help to have more data supporting that it fixes your issue — you mentioned ~31 runs earlier; have you had any occurrences of the bug using the patched version since then?)

@mattmoor
Author

I will try to dig the link to those runs back up and see if they are still running, but I haven't looked since posting above.

@mattmoor
Author

Something had broken the test image, but they kicked it today and it's running again. Every build since has been clean, and I'll keep an eye on it over the weekend.

For my own sanity: https://prow.knative.dev/?job=ci-knative-serving-contour-latest-beta-prow-tests

@mattmoor
Author

So far another 16 runs without this issue manifesting 🤞

@mattmoor
Author

Up to 29 runs without this issue. As a parallel datapoint, in our unmodified leg over roughly the same period 6 of the last 35 runs have failed with this.

@mattmoor
Author

Up to 44 runs without this issue manifesting. How much longer should we do this? I think we'll soon fall off of our run history and fall victim to my horrible mental bookmarking / checkpointing in these numbers.

Future self: Green run number 44 was 9/21 at 15:43:56.

@bcmills
Contributor

bcmills commented Sep 22, 2020

That seems like pretty solid evidence to me. I'll talk with the folks on the release team and see if we can get this into the next minor release.

Thanks for following up!

gopherbot pushed a commit that referenced this issue Oct 5, 2020
…cally when streaming output

While debugging #40771, I realized that the chatty printer should only
ever print to a single io.Writer (normally os.Stdout). The other
Writer implementations in the chain write to local buffers, but if we
wrote a test's output to a local buffer, then we did *not* write it to
stdout and we should not store it as the most recently logged test.

Because the chatty printer should only ever print to one place, it
shouldn't receive an io.Writer as an argument — rather, it shouldn't
be used at all for destinations other than the main output stream.

On the other hand, when we flush the output buffer to stdout in the
top-level flushToParent call, it is important that we not allow some
other test's output to intrude between the test summary header and the
remainder of the test's output. cmd/test2json doesn't know how to
parse such an intrusion, and it's confusing to humans too.

No test because I couldn't reproduce the user-reported error without
modifying the testing package. (This behavior seems to be very
sensitive to output size and/or goroutine scheduling.)

Fixes #40880
Updates #40771
Updates #38458

Change-Id: Ic19bf1d535672b096ba1c8583a3b74aab6d6d766
Reviewed-on: https://go-review.googlesource.com/c/go/+/249026
Run-TryBot: Bryan C. Mills <bcmills@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Jay Conrod <jayconrod@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
(cherry picked from commit 51c0bdc)
Reviewed-on: https://go-review.googlesource.com/c/go/+/252638
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Bryan C. Mills <bcmills@google.com>
gopherbot pushed a commit that referenced this issue Oct 5, 2020
…cally when streaming output

While debugging #40771, I realized that the chatty printer should only
ever print to a single io.Writer (normally os.Stdout). The other
Writer implementations in the chain write to local buffers, but if we
wrote a test's output to a local buffer, then we did *not* write it to
stdout and we should not store it as the most recently logged test.

Because the chatty printer should only ever print to one place, it
shouldn't receive an io.Writer as an argument — rather, it shouldn't
be used at all for destinations other than the main output stream.

On the other hand, when we flush the output buffer to stdout in the
top-level flushToParent call, it is important that we not allow some
other test's output to intrude between the test summary header and the
remainder of the test's output. cmd/test2json doesn't know how to
parse such an intrusion, and it's confusing to humans too.

No test because I couldn't reproduce the user-reported error without
modifying the testing package. (This behavior seems to be very
sensitive to output size and/or goroutine scheduling.)

Fixes #40881
Updates #40771
Updates #38458

Change-Id: Ic19bf1d535672b096ba1c8583a3b74aab6d6d766
Reviewed-on: https://go-review.googlesource.com/c/go/+/249026
Run-TryBot: Bryan C. Mills <bcmills@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Jay Conrod <jayconrod@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
(cherry picked from commit 51c0bdc)
Reviewed-on: https://go-review.googlesource.com/c/go/+/252637
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Bryan C. Mills <bcmills@google.com>
@calexicoz

Hi.

It seems that this change to only flush on completion essentially prevents you from watching tests in the terminal. This is not too much of an issue for CI/CD, but it's unsettling when running tests manually. Is there any workaround to still be able to see some progress while tests run, and not just a wall of text after 30 minutes? Am I missing something big here?

@bcmills
Contributor

bcmills commented Nov 16, 2020

@calexicoz, the change for this issue was not to “to only flush on completion”. Flushing on completion is the default behavior for Go tests.

The major change was to add support for streaming output while the test is running, but only if the -v flag is set (#24929). This issue was a fix to the synchronization in the implementation of that feature.

@calexicoz

calexicoz commented Nov 17, 2020

Mmm. Something changed recently though. I am running my tests as:

go test -v -tags integration_testing ./pkg/test/integration/...

And it definitely does not stream. It's a long test suite, so it runs for 20min or so then outputs everything. I'm on go version go1.15.5 darwin/amd64. Should I open a new issue? It was working until yesterday morning when I updated Homebrew and got the newest Go version. Moving the -v flag around to different locations on the command line does not work either.

@bcmills
Contributor

bcmills commented Nov 17, 2020

@calexicoz, yes, please open a new issue (ideally with steps to reproduce it).

(Is it possible that all of the logging in your test is going to a *testing.T for which the corresponding Test function or Run callback has already returned?)

@calexicoz

(Is it possible that all of the logging in your test is going to a *testing.T for which the corresponding Test function or Run callback has already returned?)

I don't have any logging in the test proper. All I expect is simply to see the RUN and PASS statements as it goes. It used to do that and does not anymore. I'll see if I can reproduce it in a test repo and then I'll open a new issue.
