Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

writing-good-e2e-tests.md: polling, contexts and failure messages #7021

Merged
merged 1 commit into from Mar 1, 2023

Conversation

pohly
Copy link
Contributor

@pohly pohly commented Jan 2, 2023

This reflects the recent changes that came with Ginkgo v2. The goal is to get this reviewed and then send out an email to Kubernetes-Dev to make developers aware of these changes.

/sig testing
/cc @aojea @onsi

@k8s-ci-robot k8s-ci-robot added the sig/testing Categorizes an issue or PR as relevant to SIG Testing. label Jan 2, 2023
@k8s-ci-robot
Copy link
Contributor

@pohly: GitHub didn't allow me to request PR reviews from the following users: onsi.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

This reflects the recent changes that came with Ginkgo v2. The goal is to get this reviewed and then send out an email to Kubernetes-Dev to make developers aware of these changes.

/sig testing
/cc @aojea @onsi

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. area/developer-guide Issues or PRs related to the developer guide size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 2, 2023
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pohly

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 2, 2023
```

On the other hand, overly verbose logging, particularly of non-error conditions, can make
it unnecessarily difficult to figure out whether a test failed and if
so why? So don't log lots of irrelevant stuff either.

Except for this special case, using Gomega assertions directly is
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that there is wide consensus on using Gomega, I personally prefer to use stdlib directly, and I know some other contributors too

if !strings.Contains(actualStr, expectedSubStr) {
   framework.Failf("expected %s to contain %s",actualStr, expectedSubStr) 
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you expand on this later

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this already shows how hand-written code can be problematic: is the failure message still readable when actualStr is empty? When using %q, how readable is it for multi-line strings?

The other problem is inconsistency.

I understand that we cannot require the usage of gomega, but I still think that we should encourage it where it makes sense.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to write about kubernetes/kubernetes#113298 here, which implies that we need to get that merged first.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added something about that.

@pohly pohly force-pushed the testing-e2e-guidelines branch 2 times, most recently from aaa2a3d to 0566459 Compare January 3, 2023 07:49

#### Polling and timeouts ####

When waiting for something to happen, use a reasonable timeout. Without it, a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should encourage to use the framework timeouts https://github.com/kubernetes/kubernetes/blob/1edbb8cf1a864c2dd71a1e5b7282de4bc3d8f5fa/test/e2e/framework/timeouts.go#L21-L39 so we have consistency on the operations.

We also have to set some upper bounds, is not ok that a test takes more than 5 minutes to run a pod , per example

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about that, but it leads to another can of worms: if we recommend using NewTimeoutContextWithDefaults, what should we tell developers when they need a timeout that isn't covered by the existing ones? Should such timeouts be configurable? How?

The ones referenced above are not configurable. Others are.

It's a mess 😭

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And then there are timeouts hard-coded even inside the framework:

https://github.com/kubernetes/kubernetes/blob/1edbb8cf1a864c2dd71a1e5b7282de4bc3d8f5fa/test/e2e/framework/pod/wait.go#L41-L65

Did I mention "mess" already?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, but the common timeouts like podStart and podDelete should be the same, those has demonstrated to be useful to surface CRI regressions ... also, people tend to increase timeouts to infinite instead of debugging, and that use to hide underly issues ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kubernetes/kubernetes#114783 is my attempt to clean this up. If we can agree on that and merge it first, then I am more comfortable with pointing developers towards TimeoutContext.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

@aojea
Copy link
Member

aojea commented Jan 3, 2023

I'm a bit worried about the contexts sections, I can see how some users can shoot themselves in the foot, but I can't see how that can be clearer, context cancellation is complex 🤷

Let's wait for Ben and Aaron opinion too

/assign @spiffxp @BenTheElder

@pohly
Copy link
Contributor Author

pohly commented Jan 3, 2023

I'm a bit worried about the contexts sections, I can see how some users can shoot themselves in the foot

The tests fail fairly reliably when someone gets it wrong, so as long as the tests are actually run it's not that bad.

@jberkus
Copy link
Contributor

jberkus commented Jan 7, 2023

@pohly when this is ready to go, can we do a blog post on the contributor blog as well?

@pohly
Copy link
Contributor Author

pohly commented Jan 7, 2023

can we do a blog post on the contributor blog as well?

My intention was to send an email to kubernetes-dev, but I can also do a blog post. Or perhaps both?

by https://go.k8s.io/triage.

A good failure message:
- identifies the test failure
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we say whether text should not needs to include names of dynamic properties like pod names

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comment actually conflicts with the later suggestion:

It is good practice to include details like the object that failed some assertion in the failure message because

it seems like aggregation handles it all right. You can disregard the comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. Aggregation simplifies the failure message. I have added a paragraph with a link to the relevant code.

- identifies the test failure
- has enough details to provide some initial understanding of what went wrong

"Timeout" is not a useful error
message. "Timed out after 60 seconds waiting for pod xxx to enter
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in this example does pod xxx contains the unique pod name? WOuld it make it harder to aggregate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above.

@pohly pohly changed the title RFC: writing-good-e2e-tests.md: polling, contexts and failure messages writing-good-e2e-tests.md: polling, contexts and failure messages Feb 16, 2023
@pohly
Copy link
Contributor Author

pohly commented Feb 16, 2023

I've pushed an update that I consider complete at this time.

@aojea: please take another look.

@@ -38,17 +52,288 @@ like the following generates rather useless errors:
Expect(err).NotTo(HaveOccurred())
```

Rather
Errors returned by client-go can be very detailed. A better way to test for
errors is with `framework.ExpectNoError` because it logs the full error and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see , this seems to be the only helper not wrapping gomega
kubernetes/kubernetes#115961

`framework.ExpectEqual` is not useful because dumping the actual and expected
value just distracts from the underlying failure reason.

Dumping structs with `format.Object` is recommended. Starting with Kubernetes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does this mean for an user? can we add an example of how to use it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.


When aborting a manual `gingko ./test/e2e` invocation with CTRL-C or a signal,
the currently running test(s) should stop immediately. This gets achieved by
accepting a `ctx context.Context` as first parameter in the Ginkgo callback
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is by far the best addition ❤️


There are some gotchas:

- Don't use the `ctx` passed into `ginkgo.It` in a `ginkgo.DeferCleanup`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, that was a bit of a footgun ...

and passed in.

The E2E framework therefore uses a different approach:
- [`framework.Gomega()`](https://github.com/pohly/kubernetes/blob/222f65506252354da012c2e9d5457a6944a4e681/test/e2e/framework/expect.go#L80-L101)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if you meant to use your fork here, maybe it is because it was not merged at the time of writing this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably - fixed.

[`gomega.Eventually`](https://pkg.go.dev/github.com/onsi/gomega#Eventually)
satisfies all of these criteria and therefore is recommended, but not required.
In https://github.com/kubernetes/kubernetes/pull/113298,
[test/e2e/framework/pods/wait.go](https://github.com/pohly/kubernetes/blob/222f65506252354da012c2e9d5457a6944a4e681/test/e2e/framework/pod/wait.go)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as in https://github.com/kubernetes/community/pull/7021/files#r1121389210, this url is from your fork

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Comment on lines +317 to +348
Some of the E2E framework sub-packages have helper functions that wait for
certain domain-specific conditions. Currently most of these functions don't
follow best practices (not using gomega.Eventually, error messages not very
informative). [Work is
planned](https://github.com/kubernetes/kubernetes/issues/106575) in that
area, so beware that these APIs may
change at some point.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, this is true but I'm afraid we leave stale information forever, but considering the pace we move I think that it may be ok :)

area, so beware that these APIs may
change at some point.

- Use `gomega.Consistently` to ensure that some condition is true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a great advice

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reading it again and considering a PR that I just reviewed, this statement might be interpreted as doing something like Eventually(func() bool {...}).Should(BeTrue()). That's the "check boolean" anti-pattern.

I'll rephrase...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my experience, examples that people can copy paste is the best solution for documentation ... see stackoverflow 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extended and linked to an example in the previous section.

- Both `gomega.Consistently` and `gomega.Eventually` can be aborted early via
[`gomega.StopPolling`](https://onsi.github.io/gomega/#bailing-out-early---polling-functions).

- Avoid polling with functions that don't take a context (`wait.Poll`,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should add the linter for this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aojea
Copy link
Member

aojea commented Mar 1, 2023

LGTM

just the comments about the URL pointing to your repo @pohly , I can LGTM if you want after it

This reflects the recent changes that came with Ginkgo v2.
@aojea
Copy link
Member

aojea commented Mar 1, 2023

/lgtm

Thanks

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 1, 2023
@k8s-ci-robot k8s-ci-robot merged commit 665eecf into kubernetes:master Mar 1, 2023
pohly added a commit to pohly/contributor-site that referenced this pull request Apr 5, 2023
"Writing good E2E tests" was already updated a while ago in
kubernetes/community#7021. As suggested
there (kubernetes/community#7021 (comment)),
we should bring this update to the attention of more contributors, hence this
blog post.
pohly added a commit to pohly/contributor-site that referenced this pull request Apr 9, 2023
"Writing good E2E tests" was already updated a while ago in
kubernetes/community#7021. As suggested
there (kubernetes/community#7021 (comment)),
we should bring this update to the attention of more contributors, hence this
blog post.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/developer-guide Issues or PRs related to the developer guide cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants