Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failure logging #80253

Merged
merged 4 commits into from Aug 26, 2019

Conversation

@pohly
Copy link
Contributor

commented Jul 17, 2019

What type of PR is this?
/kind feature

What this PR does / why we need it:

Logging of failed assertions in E2E testing is inconsistent and sub-optimal:

  • some test failures (those that use framework.Failf or
    framework.ExpectNoError) are logged immediately, Gomega failures
    are not
  • framework.ExpectNoError logs just the error, which is often useless
    without the additional explanation
  • the full stack trace is broken (a bug in Ginkgo)
  • failures that occur after some others are not reported at all;
    this can happen in cleanup code and while that code should better
    be written so that it continues instead of failing on an assertion,
    in practice quite a lot of code fails and when it does, the output
    would be useful to have

This made debugging a test with long-running cleanup code problematic.

Special notes for your reviewer:

This PR is split into an introduction of a unit test that demonstrates the current status followed by individual commits that address these shortcomings. The effect of each change can be seen in the corresponding change of the unit test.

Does this PR introduce a user-facing change?:

`framework.ExpectNoError` no longer logs the error and instead relies on using the new `log.Fail` as Gomega fail handler.
@pohly

This comment has been minimized.

Copy link
Contributor Author

commented Jul 17, 2019

/sig testing

@spiffxp

This comment has been minimized.

Copy link
Member

commented Aug 21, 2019

/approve
I think this is worth trying to land but would like someone from #testing-commons to take a look (also, needs rebase)

/cc @andrewsykim @timothysc
please hand off to someone else as you see fit

@k8s-ci-robot k8s-ci-robot requested review from andrewsykim and timothysc Aug 21, 2019

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

commented Aug 21, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pohly, spiffxp

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@neolit123

This comment has been minimized.

Copy link
Member

commented Aug 21, 2019

/priority backlog
LGTM

this is nice to have, thank @pohly
please rebase and we can merge it.

ginkgowrapper.Fail(nowStamp()+": "+msg, 1+offset)
}

// Fail is a replacement for ginkgo.Fail which logs the problem as it occurs
// and then calls ginkgowrapper.Fail.

This comment has been minimized.

Copy link
@neolit123

neolit123 Aug 21, 2019

Member

as a side note, it doesn't seem that we have that many direct calls to ginkgo.Fail:
https://github.com/kubernetes/kubernetes/search?q=%22ginkgo.Fail%22&unscoped_q=%22ginkgo.Fail%22

@oomichi

This comment has been minimized.

Copy link
Member

commented Aug 21, 2019

/cc @oomichi

@k8s-ci-robot k8s-ci-robot requested a review from oomichi Aug 21, 2019

e2e: unit test for logger.go and ginkgowrapper.go
This test runs a fake Ginkgo suite with various errors and checks how
logger.go respectively ginkgowrapper.go handle this. Right now, the
outcome is sub-optimal:

- some test failures (those that use framework.Failf or
  framework.ExpectNoError) are logged immediately, Gomega failures
  are not
- framework.ExpectNoError logs just the error, which is often useless
  without the additional explanation
- failures that occur after some others are not reported at all;
  this can happen in cleanup code and while that code should better
  be written so that it contines instead of failing on an assertion,
  in practice quite a lot of code fails and when it does, the output
  would be useful to have
- the full stack trace is odd and doesn't start with the expected
  function name

Example output:

• Failure [0.002 seconds]
log
/nvme/gopath/src/k8s.io/kubernetes/test/e2e/framework/log/logger_test.go:35
  fails [It]
  /nvme/gopath/src/k8s.io/kubernetes/test/e2e/framework/log/logger_test.go:39

  Jul 17 12:00:52.545: I'm failing.

  /nvme/gopath/src/k8s.io/kubernetes/test/e2e/framework/log/logger_test.go:41

  Full Stack Trace
  	/nvme/gopath/src/k8s.io/kubernetes/test/e2e/framework/log/logger.go:51 +0x143
  k8s.io/kubernetes/test/e2e/framework/log.Failf(...)
  	/nvme/gopath/src/k8s.io/kubernetes/test/e2e/framework/log/logger.go:43
  k8s.io/kubernetes/test/e2e/framework/log_test.glob..func1.2.1(...)
  	/nvme/gopath/src/k8s.io/kubernetes/test/e2e/framework/log/logger_test.go:41
  k8s.io/kubernetes/test/e2e/framework/log_test.glob..func1.2()
  	/nvme/gopath/src/k8s.io/kubernetes/test/e2e/framework/log/logger_test.go:42 +0x52
  k8s.io/kubernetes/vendor/github.com/onsi/ginkgo/internal/leafnodes.(*runner).runSync(0xc00029b020, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
  	/nvme/gopath/src/k8s.io/kubernetes/test/e2e/framework/log/logger_test.go:65 +0x1c8
  testing.tRunner(0xc000358600, 0x19818c0)
  	/nvme/gopath/go/src/testing/testing.go:865 +0xc0
  created by testing.(*T).Run
  	/nvme/gopath/go/src/testing/testing.go:916 +0x35a

------------------------------
Jul 17 12:00:52.545: INFO: before
Jul 17 12:00:52.545: INFO: I'm failing.
Jul 17 12:00:52.547: INFO: after
• Failure [0.002 seconds]
log
/nvme/gopath/src/k8s.io/kubernetes/test/e2e/framework/log/logger_test.go:35
  asserts [It]
  /nvme/gopath/src/k8s.io/kubernetes/test/e2e/framework/log/logger_test.go:44

  false is never true
  Expected
      <bool>: false
  to equal
      <bool>: true

  /nvme/gopath/src/k8s.io/kubernetes/test/e2e/framework/log/logger_test.go:45

  Full Stack Trace
  	/nvme/gopath/src/k8s.io/kubernetes/vendor/github.com/onsi/gomega/internal/assertion/assertion.go:75 +0x1f1
  k8s.io/kubernetes/vendor/github.com/onsi/gomega/internal/assertion.(*Assertion).To(0xc00035f6c0, 0x1b42140, 0xc000350dd0, 0xc000350de0, 0x1, 0x1, 0x42e35f)
  	/nvme/gopath/src/k8s.io/kubernetes/vendor/github.com/onsi/gomega/internal/assertion/assertion.go:38 +0xc7
  k8s.io/kubernetes/test/e2e/framework/log_test.glob..func1.3()
  	/nvme/gopath/src/k8s.io/kubernetes/test/e2e/framework/log/logger_test.go:45 +0x17e
  k8s.io/kubernetes/vendor/github.com/onsi/ginkgo/internal/leafnodes.(*runner).runSync(0xc00029b0e0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
  	/nvme/gopath/src/k8s.io/kubernetes/test/e2e/framework/log/logger_test.go:65 +0x1c8
  testing.tRunner(0xc000358600, 0x19818c0)
  	/nvme/gopath/go/src/testing/testing.go:865 +0xc0
  created by testing.(*T).Run
  	/nvme/gopath/go/src/testing/testing.go:916 +0x35a

------------------------------
Jul 17 12:00:52.548: INFO: before
Jul 17 12:00:52.549: INFO: after
• Failure [0.002 seconds]
log
/nvme/gopath/src/k8s.io/kubernetes/test/e2e/framework/log/logger_test.go:35
  error [It]
  /nvme/gopath/src/k8s.io/kubernetes/test/e2e/framework/log/logger_test.go:47

  hard-coded error
  Unexpected error:
      <*errors.errorString | 0xc000351930>: {
          s: "I'm an error, nice to meet to.",
      }
      I'm an error, nice to meet to.
  occurred

  /nvme/gopath/src/k8s.io/kubernetes/test/e2e/framework/log/logger_test.go:49

  Full Stack Trace
  	/nvme/gopath/src/k8s.io/kubernetes/test/e2e/framework/util.go:1376 +0x191
  k8s.io/kubernetes/test/e2e/framework.ExpectNoError(...)
  	/nvme/gopath/src/k8s.io/kubernetes/test/e2e/framework/util.go:1367
  k8s.io/kubernetes/test/e2e/framework/log_test.glob..func1.4()
  	/nvme/gopath/src/k8s.io/kubernetes/test/e2e/framework/log/logger_test.go:49 +0xc9
  k8s.io/kubernetes/vendor/github.com/onsi/ginkgo/internal/leafnodes.(*runner).runSync(0xc00029b200, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
  	/nvme/gopath/src/k8s.io/kubernetes/test/e2e/framework/log/logger_test.go:65 +0x1c8
  testing.tRunner(0xc000358600, 0x19818c0)
  	/nvme/gopath/go/src/testing/testing.go:865 +0xc0
  created by testing.(*T).Run
  	/nvme/gopath/go/src/testing/testing.go:916 +0x35a

------------------------------
Jul 17 12:00:52.550: INFO: before
Jul 17 12:00:52.550: INFO: Unexpected error occurred: I'm an error, nice to meet to.
Jul 17 12:00:52.551: INFO: after

@pohly pohly force-pushed the pohly:failure-logging branch from 742bca6 to f3dd835 Aug 26, 2019

pohly added 3 commits Jul 17, 2019
e2e: consistent failure logging
All failures are worth logging immediately, not just unexpected
errors. That helps understand tests that have long-running cleanup
operations with their own logging, because the failure will be visible
inside the test output.

The logging in framework.ExpectNoError also was rather poor, because
it only showed the error, but not the additional information about it.

Tests suites now should use log.Fail as Gomega failure handler instead
of ginkgowrapper.Fail. log.Fail will handle the logging for all
failures before proceeding to record the failure in Ginkgo.

Because logging is always done also after a test failure, additional
failures during cleanup are now visible. Ginkgo itself just ignores
them.
e2e logger test: handle paths in Kubernetes CI
Source code paths during //test/e2e/framework/log:go_default_test in
the Kubernetes CI start with relative paths. To avoid too broad
matching of the regex, those paths that occur in practice are named
explicitly as alternatives to the leading slash.

@pohly pohly force-pushed the pohly:failure-logging branch from f3dd835 to cdaba2b Aug 26, 2019

@pohly

This comment has been minimized.

Copy link
Contributor Author

commented Aug 26, 2019

/retest

@pohly

This comment has been minimized.

Copy link
Contributor Author

commented Aug 26, 2019

please rebase and we can merge it.

@neolit123: done

@neolit123

This comment has been minimized.

Copy link
Member

commented Aug 26, 2019

/lgtm
given this has been up for a while and the general approach has seen a +1 let's merge it.
if anyone has objections let's amend.

i will log a separate ticket for cleaning up direct ginkgo.Fail calls.

@k8s-ci-robot k8s-ci-robot added the lgtm label Aug 26, 2019

@k8s-ci-robot k8s-ci-robot merged commit 18d7f88 into kubernetes:master Aug 26, 2019

21 of 24 checks passed

pull-kubernetes-conformance-kind-ipv6 Job triggered.
Details
pull-kubernetes-e2e-gce-device-plugin-gpu Job triggered.
Details
tide Not mergeable. Needs lgtm label.
Details
cla/linuxfoundation pohly authorized
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-bazel-test Job succeeded.
Details
pull-kubernetes-conformance-image-test Skipped.
pull-kubernetes-cross Skipped.
pull-kubernetes-dependencies Job succeeded.
Details
pull-kubernetes-e2e-gce Job succeeded.
Details
pull-kubernetes-e2e-gce-100-performance Job succeeded.
Details
pull-kubernetes-e2e-gce-csi-serial Skipped.
pull-kubernetes-e2e-gce-iscsi Skipped.
pull-kubernetes-e2e-gce-iscsi-serial Skipped.
pull-kubernetes-e2e-gce-storage-slow Skipped.
pull-kubernetes-godeps Skipped.
pull-kubernetes-integration Job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce-big Job succeeded.
Details
pull-kubernetes-local-e2e Skipped.
pull-kubernetes-node-e2e Job succeeded.
Details
pull-kubernetes-node-e2e-containerd Job succeeded.
Details
pull-kubernetes-typecheck Job succeeded.
Details
pull-kubernetes-verify Job succeeded.
Details
pull-publishing-bot-validate Skipped.

@k8s-ci-robot k8s-ci-robot added this to the v1.16 milestone Aug 26, 2019

@neolit123

This comment has been minimized.

Copy link
Member

commented Aug 26, 2019

i will log a separate ticket for cleaning up direct ginkgo.Fail calls.

#81939

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

commented Aug 26, 2019

@pohly: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
pull-kubernetes-conformance-kind-ipv6 cdaba2b link /test pull-kubernetes-conformance-kind-ipv6

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.