Add --verbose option to test/e2e/ #24471

Closed
jayunit100 opened this issue Apr 19, 2016 · 11 comments
Assignees
Labels
area/test lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. sig/testing Categorizes an issue or PR as relevant to SIG Testing.

Comments

@jayunit100
Member

jayunit100 commented Apr 19, 2016

Problem
A few of the e2e tests are very verbose at 10+ node scale. This makes it hard to read and interpret the results of a test run, especially when there are multiple failures. The two problems I've seen in the past:

  • Tests hanging at large scale.
  • Failures dumping all data for hundreds of nodes to stdout.

Proposed Solution (4/18/2016)

When developing e2es at small (2-4) node scale, this information is very useful. We should keep logging plenty of detail, but minimize what is actually printed unless the user specifies --verbose.

UPDATED SOLUTION (4/21/2016)

The simplest solution wound up being to support only (2) below. (1) can be done later if we really need it.

So we should:

  1. Introspect the cluster and find out how many nodes there are.
  2. Add a Debugf which runs by default in small clusters.
  3. Make dump call Debugf.
  4. Drastically reduce the use of Logf across the e2e suite.
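
The steps above might look roughly like the following. This is only a sketch, written in Python to match the script later in this thread rather than the e2e framework's Go. `SMALL_CLUSTER_MAX_NODES`, `make_debugf`, and `dump_nodes` are hypothetical names, and the threshold of 4 is an assumption taken from the "small (2-4) node" remark.

```python
import sys

# Assumption: clusters up to this size count as "small" (from the 2-4 node remark).
SMALL_CLUSTER_MAX_NODES = 4


def make_debugf(node_count, verbose=False, out=sys.stdout):
    """Return a debugf that prints only in small clusters or when verbose is set."""
    enabled = verbose or node_count <= SMALL_CLUSTER_MAX_NODES

    def debugf(fmt, *args):
        if enabled:
            out.write("DEBUG: " + (fmt % args if args else fmt) + "\n")

    return debugf


def dump_nodes(nodes, debugf):
    """Hypothetical dump helper: per-node detail goes through debugf, not Logf."""
    for name in nodes:
        debugf("node %s: <details elided>", name)
```

With this shape, `dump_nodes(nodes, make_debugf(len(nodes)))` prints per-node detail on a 2-node cluster and stays silent on a 100-node one unless `verbose=True` is passed.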

cc @kubernetes/sig-testing @kubernetes/sig-scalability

@jayunit100
Member Author

jayunit100 commented Apr 20, 2016

This script counts the log lines for each step in the E2Es...

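import fileinput

# Count the log lines that follow each "STEP" marker in e2e output.
curr_step = "N/A"
curr_count = 0

def status():
    # curr_step keeps its trailing newline, so the count lands on the next line.
    print(curr_step, " ", curr_count)

print("starting")
for line in fileinput.input():
    if "STEP" in line:
        status()
        curr_step = line
        curr_count = 0
    else:
        curr_count += 1
status()  # report the final step too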

Running this on the test output from the CI, I get these culprits for scale-out overlogging...

STEP: Waiting for a default service account to be provisioned in namespace
  85
STEP: Waiting for a default service account to be provisioned in namespace
  85
STEP: Waiting for a default service account to be provisioned in namespace
  85
STEP: creating replication controller cleanup60-7b6834b4-0720-11e6-a521-42010af0000d in namespace e2e-tests-kubelet-hb0om
  59
STEP: deleting replication controller cleanup60-7b6834b4-0720-11e6-a521-42010af0000d in namespace e2e-tests-kubelet-hb0om
  103
STEP: Waiting for a default service account to be provisioned in namespace
  125
STEP: creating replication controller svc-latency-rc in namespace e2e-tests-svc-latency-cs82v
  407
STEP: Waiting for a default service account to be provisioned in namespace
  85
STEP: Waiting for a default service account to be provisioned in namespace
  125
STEP: creating replication controller proxy-service-rd0m4 in namespace e2e-tests-proxy-51h6k
  692

We could actually run this at the end of Jenkins jobs if we want to punish overlogging programmatically, but for now I'll just audit these tests and make them less verbose.
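
A programmatic check of that kind could be sketched as below. This is an assumption about how such a gate might work, not something that exists in the CI. `count_lines_per_step`, `find_overloggers`, `exit_code`, and the budget of 100 lines per step are all hypothetical.

```python
def count_lines_per_step(lines):
    """Return (step, count) pairs: log lines seen after each STEP marker."""
    counts = []
    step, count = "N/A", 0
    for line in lines:
        if "STEP" in line:
            counts.append((step, count))
            step, count = line.strip(), 0
        else:
            count += 1
    counts.append((step, count))  # do not lose the final step
    return counts


def find_overloggers(lines, max_lines=100):
    """Return the steps whose line count exceeds max_lines (an assumed budget)."""
    return [(s, c) for s, c in count_lines_per_step(lines) if c > max_lines]


def exit_code(lines, max_lines=100):
    """Print offenders and return 1 if any step is over budget, else 0."""
    offenders = find_overloggers(lines, max_lines)
    for step, count in offenders:
        print("%6d  %s" % (count, step))
    return 1 if offenders else 0
```

At the end of a job, something like `sys.exit(exit_code(open(log_path)))` would fail the run when any step exceeds the budget; `log_path` stands in for wherever the job writes its test output.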

@jayunit100
Member Author

fyi @timothysc "scale killers" overlogging.

@timothysc
Member

Then fix them ;-)

@jayunit100
Member Author

On it yup

@jayunit100
Member Author

Culprit: config.DefaultReporterConfig.Verbose, which is currently set to true.

  • This has the advantage of giving us spec progress.
  • The disadvantage is that in ginkgo that also means the Logs get streamed out.

So, I think the simplest solution is to have the debug logs go into an output file, which has both INFO as well as DEBUG.
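
The split described here (progress on the console, everything in a file) can be illustrated with Python's standard `logging` module. This is only an analogy, since the e2e suite itself is Go and uses ginkgo and glog; `make_logger` and the file path passed to it are hypothetical.

```python
import logging
import sys


def make_logger(debug_path, name="e2e"):
    """Console gets INFO and above; the file at debug_path gets DEBUG and above."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)  # let everything through to the handlers
    logger.propagate = False        # avoid duplicate output via the root logger

    console = logging.StreamHandler(sys.stdout)
    console.setLevel(logging.INFO)

    debug_file = logging.FileHandler(debug_path)
    debug_file.setLevel(logging.DEBUG)

    fmt = logging.Formatter("%(levelname)s: %(message)s")
    # Note: calling make_logger twice with the same name adds duplicate handlers.
    for handler in (console, debug_file):
        handler.setFormatter(fmt)
        logger.addHandler(handler)
    return logger
```

A call such as `log.debug("per-node dump")` then reaches only the file, while `log.info("spec progress")` reaches both the file and the console.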

@jayunit100
Member Author

jayunit100 commented Apr 21, 2016

Dug some more. I have a patch that will do this the easy way:

  • Require -v 2 to see the ugly granular logs, which go through the Debugf pathway.
  • User can decide and no need for extra directories/files.

This separates three things: ginkgo progress logs, debug e2e logs (which almost always really should be verbose), and the other logs (which can be turned on or off via glog's -v).

k8s-github-robot pushed a commit that referenced this issue Sep 23, 2016
Automatic merge from submit-queue

Logging soak

Implements #24427 

Needs 

- #24471 so that it doesnt clog test outputs for scale
- builds on the utils function added in support of #22869 

cc @timothysc @kubernetes/sig-testing
@k8s-github-robot k8s-github-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label May 31, 2017
@0xmichalis
Contributor

/sig testing

@k8s-ci-robot k8s-ci-robot added the sig/testing Categorizes an issue or PR as relevant to SIG Testing. label Jun 20, 2017
@0xmichalis 0xmichalis removed the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jun 20, 2017
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 29, 2017
@spiffxp
Member

spiffxp commented Jan 7, 2018

/remove-area test-infra
/area test

@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 10, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

8 participants