Label flaky e2es with [Flaky] & slow tests with [Slow] #19021
Conversation
We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for the commit author(s). If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google. |
Labelling this PR as size/M |
GCE e2e build/test failed for commit d7e1a8604f2392ab8a2b6091ed3638c8a394dd49. |
@spxtr this is ready for review. |
GCE e2e test build/test passed for commit 436574f4233ef7ba0de5fc522cec00ba11f61df5. |
GCE e2e test build/test passed for commit e96a00225b9eb4f33f0df32f91fb92ec88ebeb84. |
GCE e2e test build/test passed for commit 5ee8850. |
LGTM but I haven't tested it. |
@k8s-oncall: currently the PR builder is totally shot because stuff I changed in #18900 was coupled with these skip lists. I'm going to manually merge this so that hopefully we can get back to where we were. |
Label flaky e2es with [Flaky] & slow tests with [Slow]
Squash? |
@mikedanese I didn't want to squash because I wanted it to be clear what I had done, (in particular, the last two commits I wanted to keep clearly apart, since they're fairly invasive). |
Why is this preferable to the FLAKY env vars. Those are much easier to track. I would prefer if we keep the test labels to invariant properties of tests |
Why are they much easier to track? I (and perhaps I'm the exception) think this is easier, because this sets the metadata about the test in the test itself, rather than in some random file elsewhere in the repo.
I see flakiness as an invariant property of a test; if we have to change the test or the code proper to make the test not flaky, that PR should similarly update the label to call it not flaky anymore. |
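To make the label-based scheme under discussion concrete: the label lives in the Ginkgo test name itself, and a runner skips labeled tests with a regex. Here is a minimal sketch in shell; the test names are made up for illustration, and this is not the actual hack/jenkins runner logic:

```shell
#!/usr/bin/env bash
# Hypothetical test names; in the e2e suite the label is part of the
# Ginkgo Describe/It string, e.g. "DaemonRestart [Flaky] ...".
tests=(
  "Pods should be restarted"
  "DaemonRestart [Flaky] survives a restart"
  "Density [Slow] starts many pods"
)

# Skip anything labeled [Flaky] or [Slow], mirroring a runner flag
# like --ginkgo.skip="\[(Flaky|Slow)\]".
skip='\[(Flaky|Slow)\]'

for t in "${tests[@]}"; do
  if [[ "$t" =~ $skip ]]; then
    echo "SKIP: $t"
  else
    echo "RUN:  $t"
  fi
done
```

The point of the thread: changing a test's flakiness means editing the test, so the label sits right next to the code that must change.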
@quinton-hoole had a comment on a related issue, following up here.
Looking at this PR, actually, the partitioning of flaky tests isn't/wasn't a function of environment: everything just skipped the same sets. I'm trying to reduce the number of dimensions by which we partition the e2e tests so that we can more sanely keep track of what's running where. I'm going to keep working on labels for the other dimensions. |
@ihmccreery What you're seeing is in fact a form of inheritance, and does in fact make sense if you think about it a bit. GCE_FLAKY_TESTS are the tests that are flaky on GCE, whether they run in serial, parallel, GKE, non-GKE etc. So that's why you see them being inherited. Make sense? |
@ihmccreery what would perhaps make sense is to restructure the inheritance hierarchy something like the following:
etc. |
To tell the truth there is a form of inheritance, but it comes from laziness. I created GKE_FLAKY suite when we started to have tests that are stable on GCE and very flaky on GKE. There was no well defined semantics, and we used GCE_FLAKY as things that are generally flaky, and GKE_FLAKY for things that are flaky on GKE (i.e. all the rules for GKE tests were adding those two sets, and GCE tests were using GCE only list). I see that @quinton-hoole responded as well:) |
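The "inheritance" described above can be sketched as regex concatenation; the variable names come from the thread, but the values here are illustrative, not the real skip lists from the Jenkins scripts:

```shell
#!/usr/bin/env bash
# Illustrative values; the real lists lived in the Jenkins e2e scripts.
GCE_FLAKY_TESTS="Services.*change.the.type|DaemonRestart"
GKE_FLAKY_TESTS="ServiceLoadBalancer"

# GCE suites skipped only the GCE list...
GCE_SKIP="${GCE_FLAKY_TESTS}"
# ...while GKE suites "inherited" GCE flakes by concatenating both lists
# into one skip regex.
GKE_SKIP="${GCE_FLAKY_TESTS}|${GKE_FLAKY_TESTS}"

echo "GKE skip regex: ${GKE_SKIP}"
```

This is the implicit hierarchy @quinton-hoole describes: a GKE run skips everything flaky on GCE plus everything flaky only on GKE.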
tl;dr: @quinton-hoole you're correct, but I think it falls into YAGNI. Your proposal seems reasonable to me, but, like I said, it's an added level of complexity in an already (IMO) overly-complex system of partitioning tests. In my (limited) experience, our team hasn't really experienced any pain from not being able to label where precisely a test is and isn't flaky; indeed, we don't even really have great tooling to determine such a thing, even if we could label it. On the flip side, we've experienced quite a lot of pain from trying to manage a very complex system of partitioning tests, not having tests running in certain places, etc. I propose we keep a single, monolithic [Flaky] label. |
@ihmccreery - we already had GKE only flaky tests. What do you suggest we do with them - do we keep them running and block our build, turn them off and miss a possible regression, or don't block on GKE suite failures? |
No, the GKE flaky tests were a subset of the GCE flaky tests, so the skipped tests were actually identical in both environments. I'm suggesting that if a test is flaky in GKE, mark it as [Flaky]. |
(This is exactly the kind of confusion I'm trying to avoid.) |
@ihmccreery The existing regex scheme was put in place precisely because we needed it to disable tests that only flaked when run in parallel, or only when run on GKE, or only when run on AWS etc. You can collapse that all into [Flaky], but you will lose a lot of fidelity if you do. Over to @ixdy to continue this review, as I'm not supposed to stick my nose into testing stuff :-) |
At the time, that was the case. That said, this test was on the flaky list anyway. tl;dr: I agree with @ihmccreery that we probably don't need to worry about this, at least not right now. |
I'm OK with removing GKE_FLAKY, as it's not very useful now. OTOH, PARALLEL_FLAKY is still a thing. If I understand your proposal right, you want to have a single suite of 'flaky' tests, running exactly as they do in the 'normal' suite (e.g. in parallel). It's a valid idea, as it puts a bit more pressure on test owners. What's more, it will require that tests be better written; I'm only afraid of the engineering cost of fixing the existing ones... So generally, after some thought, I actually like the idea of a single "flaky" suite. The main drawback I see is the cost of fixing things that are currently flaky only when run in parallel. @ihmccreery - I'm a huge fan of this effort and your work. |
Continued work on #10548.
Notably, I'm collapsing:
GKE_FLAKY_TESTS
GCE_FLAKY_TESTS
GCE_PARALLEL_FLAKY_TESTS
into one label, [Flaky]. If a test is flaky in the parallel run, then perhaps it's disruptive and should be marked as such, or it's flaky. (There was one set of tests, DaemonRestart, in GCE_PARALLEL_FLAKY_TESTS that was already in [Disruptive], so I just kicked it out of flaky entirely.)