Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Print progress updates to stdout and publish to URL #84524

Merged
merged 1 commit into from
Nov 11, 2019

Conversation

johnSchnake
Copy link
Contributor

What type of PR is this?
/kind feature

Minor testing feature to clarify/surface progress of a test run.

What this PR does / why we need it:
One common frustration of end-users running the e2e suite is that
they take a significant amount of time and it is difficult to
gauge progress.

Even if tailing the logs it can be difficult to see where one
test starts and another ends or understand the if there have been
failures in the past 1h of logs.

This change adds a new custom reporter that prints summary information
as tests complete. This includes the number of tests to run and how
many have been passed/failed/skipped along with which tests have failed.

A new flag can be set which pushes these values to an endpoint. This is
intended for integration with Sonobuoy but any endpoint could consume and
surface this data to the user so they can better understand the state
of the current test run.

Which issue(s) this PR fixes:
Fixes #78535

Special notes for your reviewer:
I've published an image that utilizes this; see the sonobuoy run command below. It is based on the v1.16.0 conformance image but swaps out the e2e.test binary for my own from this PR.

Does this PR introduce a user-facing change?:
Only for testing; does not impact k8s users themselves.

A new flag "progress-report-url" has been added to the test context which allows progress information about the test run to be sent to a webhook. In addition, this information is printed to stdout to aid in users watching the logs.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
Example output in logs:

{"msg":"PASSED [sig-storage] Downward API volume should update annotations on modification [NodeConformance] [Conformance]","total":277,"completed":5,"numFailures":0,"numSkipped":98}

Example command to utilize this with Sonobuoy (will have some more Sonobuoy improvements to make this a default and such, but wanted to show existing value):

sonobuoy run --kube-conformance-image=schnake/conformance:progressv4 --plugin-env e2e.E2E_EXTRA_ARGS="--progress-report-url=http://localhost:8099/progress"

And this leads to the value being readable from sonobuoy status --json:

{"plugins":[{"plugin":"e2e","node":"global","status":"running","result-status":"","result-counts":null,"progress":{"name":"e2e","node":"global","timestamp":"2019-10-29T15:29:48.0908739Z","msg":"PASSED [sig-cli] Kubectl client Kubectl describe should check if kubectl describe prints relevant information for rc and pods  [Conformance]","total":277,"completed":24}}],"status":"running","tar-info":{"name":"","created":"0001-01-01T00:00:00Z","sha256":"","size":0}}

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Oct 29, 2019
@k8s-ci-robot k8s-ci-robot added area/e2e-test-framework Issues or PRs related to refactoring the kubernetes e2e test framework area/test sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Oct 29, 2019
@johnSchnake
Copy link
Contributor Author

/sig testing
/area e2e-test-framework
/priority important-longterm

  • Marking as import-longterm but I'm hoping to get this merged for the v1.17 release.
  • Technically a feature but is so small and is just a testing "feature" that a KEP does not seem necessary/appropriate.
  • Happy to take feedback about tweaks to the printed data but I'm hoping we don't bikeshed too much at this time. We can tweak this over time if need-be but the basics about how many tests pass/fail are invaluable.

@dims @andrewsykim

@k8s-ci-robot k8s-ci-robot added priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Oct 29, 2019
@neolit123
Copy link
Member

/approve
i like the idea!

@johnSchnake
Copy link
Contributor Author

/retest

@johnSchnake
Copy link
Contributor Author

/assign @andrewsykim

@johnSchnake
Copy link
Contributor Author

/assign @timothysc

@johnSchnake
Copy link
Contributor Author

/retest

@johnSchnake
Copy link
Contributor Author

@hh

// NewProgressReporter returns a progress reporter which posts updates to the given URL.
func NewProgressReporter(progressUpdateURL string) *ProgressReporter {
c := &http.Client{
Timeout: time.Second * 10,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks great! But are you thinking about making Timeout configurable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really anticipate something like that being particularly configurable. It is really just specifying a timeout so it doesn't hang forever. And regardless, the penalty for one slow request is that it gets dropped effectively which is OK IMO for one or two events.

Do you see a reason it should be a higher value? I wouldn't care about increasing it to a high, but reasonable, value.


// NewProgressReporter returns a progress reporter which posts updates to the given URL.
func NewProgressReporter(progressUpdateURL string) *ProgressReporter {
c := &http.Client{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, If the progressUpdateURL is empty, we don't have to create a http client.

@@ -292,6 +295,8 @@ func RegisterCommonFlags(flags *flag.FlagSet) {

flags.BoolVar(&TestContext.ListImages, "list-images", false, "If true, will show list of images used for runnning tests.")
flags.StringVar(&TestContext.KubectlPath, "kubectl-path", "kubectl", "The kubectl binary to use. For development, you might use 'cluster/kubectl.sh' here.")

flags.StringVar(&TestContext.ProgressUpdateURL, "progress-report-url", "", "The URL to POST progress updates to as the suite runs to assist in aiding integrations. If empty, no messages sent.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: just a question.
Why don't we name ProgressReportURL instead?
Most test context variables consist of the same words from the option names like the above.
That could be easy to search the code from the option name.

reporter.NumSkipped++
return
default:
return
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a question: This default case is unexpected, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct; we are just parsing for success/failure/skip. There are other types but they aren't really applicable at this time for how we're using this (https://github.com/onsi/ginkgo/blob/000d317071340972710dc1da9ae3c2c016a7a560/types/types_test.go#L10-L16)

Copy link
Member

@andrewsykim andrewsykim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM and seems useful, should probably be discussed in SIG testing or Testing Commons meeting (again) before approving?

test/e2e/reporters/progress.go Outdated Show resolved Hide resolved
test/e2e/reporters/progress.go Outdated Show resolved Hide resolved
test/e2e/reporters/progress.go Outdated Show resolved Hide resolved
test/e2e/reporters/progress.go Outdated Show resolved Hide resolved
test/e2e/reporters/progress.go Outdated Show resolved Hide resolved
// sendUpdates serializes the current progress and prints it to stdout and also posts it to the configured endpoint if set.
func (reporter *ProgressReporter) sendUpdates() {
b := reporter.serialize()
fmt.Println(string(b))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this log line intentional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes; one of the conditions/goals for doing this came from @timothysc during our testing commons meeting and it was to print out something useful to the user to get the same sort of feedback. Currently, when looking at the logs you can't make much sense of how the run as a whole is going or how close it is to finishing until it is done.

This solves that problem.

@johnSchnake
Copy link
Contributor Author

@andrewsykim Our next meeting for testing commons is on the 11th (this coming monday) correct? If that's the case then I'm OK waiting to merge until then. I brought this up months ago when the ticket was filed but it had come up at the Aug5th meeting (some time ago!) and feedback was positive and I was given the green light to do this as long as I added stdout printing in additino to the webhook.

The only question implementation-wise I have is whether or not I should print the full list of failures to stdout as well. The way the Sonobuoy webhook is right now it expects the full status each time so we need to POST it that way. However, if you have a busted run with hundreds of failures it will definitely be sub-optimal to reprint the failed tests each time. The downside of changing the way its printed in each case is (1) it removes a benefit of this approach where you can grep for one of these progress lines and each one will be self-contained and really tell you the full state of affairs (2) it complicates the code a bit (3) may be less performant if we have to serialize a second time but the items are small most of the time so not a huge deal

One common frustration of end users running the e2e suite is that
they take a significant amount of time and it is difficult to
gauge progress.

Even if tailing the logs it can be difficult to see where one
test starts and another ends or understand the if there have been
failures in the past 1h of logs.

This change adds a new custom reporter which prints summary information
as tests complete. This includes the number of tests to run and how
many have been passed/failed/skipped along with which tests have failed.

A new flag can be set which pushes these values to an endpoint. This is
intended for integration with Sonobuoy but any endpoint could consume and
surface this data to the user so they can better understand the state
of the current test run.

// Stream the progress to stdout and optionally a URL accepting progress updates.
r = append(r, e2ereporters.NewProgressReporter(framework.TestContext.ProgressReportURL))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just skip appending the progress reporter if the URL is empty?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That means we also don't log to stdout but maybe that's okay?

Copy link
Member

@andrewsykim andrewsykim Nov 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe logging to stdout should be a separate ginko reporter that's enabled by default?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand the motivation to decouple the URL and stdout printing; all the logic to count/gather/serialize would be the same. Even if we reused this datatype and had one for stdout printing and one for posting to URLs, then the memory footprint and processing overhead would be double (one for each of the reporters).

What is the desired endgoal; you want the ability to disable this functionality?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes -- it sounds like the new progress reporter is actually doing two things: reporting progress via stdout and reporting progress by sending a POST request to a URL. Based on how ginkgo deals with custom reports, it sounds like they should be decoupled. Reporting to stdout can be on by default but still optional.

I won't block on this though but something we should consider in the future.

@johnSchnake
Copy link
Contributor Author

I refreshed the image just to be sure we were testing the latest code; you can run:

sonobuoy run --plugin e2e --kube-conformance-image=schnake/conformance:demo --plugin-env e2e.E2E_EXTRA_ARGS="--progress-report-url=http://localhost:8099/progress" --image-pull-policy=IfNotPresent

to see it in action. Then wait a bit and start watching sonobuoy status --json to see the progress.

@andrewsykim
Copy link
Member

/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 11, 2019
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andrewsykim, johnSchnake, neolit123

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 11, 2019
// As new tests are completed it updates the values and prints them to stdout and optionally, sends the updates
// to the configured URL.
type ProgressReporter struct {
LastMsg string `json:"msg"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a future PR, I think a LastReported timestamp would be useful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/e2e-test-framework Issues or PRs related to refactoring the kubernetes e2e test framework area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improved test progress reporting
7 participants