Label flaky e2es with [Flaky] & slow tests with [Slow] #19021

Merged
merged 4 commits into from Dec 22, 2015

Conversation

ikehz
Contributor

@ikehz ikehz commented Dec 22, 2015

Continued work on #10548.

Notably, I'm collapsing:

  • GKE_FLAKY_TESTS
  • GCE_FLAKY_TESTS
  • GCE_PARALLEL_FLAKY_TESTS

into one label, [Flaky]. If a test is flaky in the parallel run, then perhaps it's disruptive and should be marked as such, or it's flaky. (There was one set of tests—DaemonRestart—in GCE_PARALLEL_FLAKY_TESTS already in [Disruptive], so I just kicked it out of flaky entirely.)

@googlebot

We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for the commit author(s). If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.

@k8s-github-robot

Labelling this PR as size/M

@k8s-github-robot added the size/M label (denotes a PR that changes 30-99 lines, ignoring generated files) Dec 22, 2015
@ikehz ikehz assigned spxtr and unassigned ixdy Dec 22, 2015
@k8s-bot

k8s-bot commented Dec 22, 2015

GCE e2e build/test failed for commit d7e1a8604f2392ab8a2b6091ed3638c8a394dd49.

@ikehz changed the title from "[WIP] Label flaky e2es with [Flaky] & slow tests with [Slow]" to "Label flaky e2es with [Flaky] & slow tests with [Slow]" Dec 22, 2015
@ikehz
Contributor Author

ikehz commented Dec 22, 2015

@spxtr this is ready for review.

@ikehz ikehz added cla: yes and removed cla: no labels Dec 22, 2015
@k8s-bot

k8s-bot commented Dec 22, 2015

GCE e2e test build/test passed for commit 436574f4233ef7ba0de5fc522cec00ba11f61df5.

@k8s-bot

k8s-bot commented Dec 22, 2015

GCE e2e test build/test passed for commit e96a00225b9eb4f33f0df32f91fb92ec88ebeb84.

@k8s-bot

k8s-bot commented Dec 22, 2015

GCE e2e test build/test passed for commit 5ee8850.

@spxtr added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) Dec 22, 2015
@spxtr
Contributor

spxtr commented Dec 22, 2015

LGTM but I haven't tested it.

@ikehz
Contributor Author

ikehz commented Dec 22, 2015

@k8s-oncall: currently the PR builder is totally shot because stuff I changed in #18900 was coupled with these skip lists. I'm going to manually merge this so that hopefully we can get back to where we were.

ikehz pushed a commit that referenced this pull request Dec 22, 2015
Label flaky e2es with [Flaky] & slow tests with [Slow]
@ikehz ikehz merged commit eeb727a into kubernetes:master Dec 22, 2015
@mikedanese
Member

Squash?

@ikehz
Contributor Author

ikehz commented Dec 22, 2015

@mikedanese I didn't want to squash because I wanted it to be clear what I had done. In particular, I wanted to keep the last two commits clearly apart, since they're fairly invasive.

@mikedanese
Member

Why is this preferable to the FLAKY env vars? Those are much easier to track. I would prefer if we keep the test labels to invariant properties of tests.

@ikehz
Contributor Author

ikehz commented Dec 26, 2015

Those [env vars] are much easier to track.

Why are they much easier to track? I (and perhaps I'm the exception) think this is easier, because this sets the metadata about the test in the test, rather than in some random file in hack/. Also, it's a label rather than a regular expression, which can cause all kinds of problems (matching things that it shouldn't match, matching things that don't even exist, etc.).
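
The regex problems mentioned above can be illustrated with a minimal Python sketch. This is not the project's actual tooling: the test names and skip patterns below are hypothetical, and only the bracketed `[Flaky]` label mirrors the convention in this PR. It shows an unanchored pattern catching a second test by accident, a pattern that matches nothing, and a label match for comparison.

```python
import re

# Hypothetical test names; only the [Flaky] label follows this PR's convention.
TEST_NAMES = [
    "Services should serve a basic endpoint",
    "Services should serve a basic endpoint from pods on multiple nodes",
    "Pods should restart cleanly [Flaky]",
]

# Hypothetical regex skip list, kept in a file separate from the tests.
SKIP_PATTERNS = [
    r"Services\sshould\sserve\sa\sbasic\sendpoint",  # intended for one test only
    r"Networking\sshould\sdo\ssomething\sremoved",   # the test no longer exists
]


def skipped_by_regex(names, patterns):
    """Return the names matched by any pattern (unanchored search)."""
    return [n for n in names if any(re.search(p, n) for p in patterns)]


def stale_patterns(names, patterns):
    """Return the patterns that match no existing test."""
    return [p for p in patterns if not any(re.search(p, n) for n in names)]


def skipped_by_label(names, label="[Flaky]"):
    """Return the names that carry the literal label."""
    return [n for n in names if label in n]
```

In this sketch the first pattern skips two tests although it was written for one, and the second pattern silently skips nothing, while the label selects exactly the test that carries it.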

I would prefer if we keep the test labels to invariant properties of tests

I see flakiness as an invariant property of a test; if we have to change the test or the code proper to make the test not flaky, that PR should similarly update the label to call it not flaky anymore.

@ikehz
Contributor Author

ikehz commented Jan 4, 2016

@quinton-hoole had a comment on a related issue, following up here.

@ihmccreery How do you deal with the case where a given test is flaky in one environment and not others? Stated another way, flakiness is a function of (test, environment) as per the existing regex scheme, not just (test).

Looking at this PR, actually, the partitioning of flaky tests isn't/wasn't a function of environment: everything just skipped GCE_FLAKY_TESTS, and GKE_FLAKY_TESTS was just a redundant subset. So I didn't take flakiness as a function of (test, environment) into account.

I'm trying to reduce the number of dimensions by which we partition the e2e tests so that we can more sanely keep track of what's running where. I'm going to work on labels for [SkipIfEnv:GKE], e.g., and we could do something similar for flakes (this work needs to be combined with a refactoring of the Go proper calls to SkipIfProviderIs). I'm not really opposed to having env-specific skip lists, just that as the project stands now, it wasn't something it seemed like we needed, and was yet another added layer of complexity to how we partition tests.
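
As a rough illustration of how environment-specific labels could sit alongside `[Flaky]`, here is a minimal Python sketch. The `[SkipIfEnv:GKE]` label is only proposed in this thread, so its exact syntax and semantics here are assumptions, and `labels`, `should_run`, and `include_flaky` are hypothetical names that do not exist in the e2e framework.

```python
import re

# Matches bracketed labels such as [Flaky] or [SkipIfEnv:GKE].
LABEL_RE = re.compile(r"\[([^\]]+)\]")


def labels(test_name):
    """Extract the text of every bracketed label in a test name."""
    return LABEL_RE.findall(test_name)


def should_run(test_name, env, include_flaky=False):
    """Decide whether a test runs in `env` (assumed semantics, see lead-in)."""
    for label in labels(test_name):
        # Flaky tests are excluded unless the suite explicitly asks for them.
        if label == "Flaky" and not include_flaky:
            return False
        # A SkipIfEnv label excludes the test in the named environment only.
        if label.startswith("SkipIfEnv:") and label.split(":", 1)[1] == env:
            return False
    return True
```

With this shape, a test named `"KubeProxy should test kube-proxy [SkipIfEnv:GKE]"` would be skipped when `env` is `"GKE"` and run elsewhere, without any separate skip list.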

@ghost

ghost commented Jan 4, 2016

@ihmccreery What you're seeing is in fact a form of inheritance, and does in fact make sense if you think about it a bit. GCE_FLAKY_TESTS are the tests that are flaky on GCE, whether they run in serial, parallel, GKE, non-GKE etc. So that's why you see them being inherited. Make sense?

@ghost

ghost commented Jan 4, 2016

@ihmccreery what would perhaps make sense is to restructure the inheritance hierarchy something like the following:

  • FLAKY_TESTS (tests that flake everywhere - the test or system is fundamentally broken)
    • FLAKY_GCE_TESTS (these flake on GCE, but not necessarily other cloud providers)
      • FLAKY_GCE_PARALLEL_TESTS (these fail when run in parallel on GCE)
    • FLAKY_AWS_TESTS
      • FLAKY_AWS_PARALLEL_TESTS

etc.
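
The hierarchy above can be sketched in Python as follows. This is only an illustration under assumptions: the list names come from the proposal, but the parent mapping as a data structure, the entries in each list, and the `effective_skips` helper are hypothetical.

```python
# Each list names its parent (None for the root), following the proposal above.
PARENT = {
    "FLAKY_TESTS": None,
    "FLAKY_GCE_TESTS": "FLAKY_TESTS",
    "FLAKY_GCE_PARALLEL_TESTS": "FLAKY_GCE_TESTS",
    "FLAKY_AWS_TESTS": "FLAKY_TESTS",
    "FLAKY_AWS_PARALLEL_TESTS": "FLAKY_AWS_TESTS",
}

# Hypothetical contents of each list (placeholder test names).
OWN_ENTRIES = {
    "FLAKY_TESTS": {"test-a"},
    "FLAKY_GCE_TESTS": {"test-b"},
    "FLAKY_GCE_PARALLEL_TESTS": {"test-c"},
    "FLAKY_AWS_TESTS": {"test-d"},
    "FLAKY_AWS_PARALLEL_TESTS": set(),
}


def effective_skips(list_name):
    """Union of a list's own entries with those of all its ancestors."""
    skips = set()
    while list_name is not None:
        skips |= OWN_ENTRIES[list_name]
        list_name = PARENT[list_name]
    return skips
```

A parallel GCE run would then skip everything in its own list plus the GCE-wide and global lists, while an AWS run would be unaffected by GCE-only entries.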

@gmarek
Contributor

gmarek commented Jan 4, 2016

To tell the truth, there is a form of inheritance, but it comes from laziness. I created the GKE_FLAKY suite when we started to have tests that were stable on GCE and very flaky on GKE. There were no well-defined semantics: we used GCE_FLAKY for things that are generally flaky, and GKE_FLAKY for things that are flaky on GKE (i.e. all the rules for GKE tests added those two sets, and GCE tests used only the GCE list).

I see that @quinton-hoole responded as well:)

@ikehz
Contributor Author

ikehz commented Jan 4, 2016

tl;dr: @quinton-hoole you're correct, but I think it falls into YAGNI.

@quinton-hoole your proposal seems reasonable to me, but, like I said, it's an added level of complexity in an already (IMO) overly complex system of partitioning tests. In my (limited) experience, our team hasn't really experienced any pain from not being able to label precisely where a test is and isn't flaky. Indeed, we don't even have great tooling to determine such a thing, even if we could label it. On the flip side, we've experienced quite a lot of pain from trying to manage a very complex system of partitioning tests, not having tests running in certain places, etc.

I propose we keep a single, monolithic [Flaky] label until we find that we actually need more fine-grained control. Certainly, when folks file bugs for flaky tests, they should mention the environments in which the flakiness occurs.

@gmarek
Contributor

gmarek commented Jan 4, 2016

@ihmccreery - we already had GKE-only flaky tests. What do you suggest we do with them: keep them running and block our build, turn them off and miss a possible regression, or not block on GKE suite failures?

@ikehz
Contributor Author

ikehz commented Jan 4, 2016

We already had GKE only flaky tests.

No, the GKE flaky tests were a subset of the GCE flaky tests, so the skipped tests were actually identical in both environments.
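
The subset argument can be checked with a small Python sketch. It assumes the old scheme as described in this thread (GKE runs skip both lists, GCE runs skip only the GCE list); the `skipped` helper and the list contents are hypothetical.

```python
def skipped(env, gce_flaky, gke_flaky):
    """Tests skipped in `env` under the old scheme as described in this thread."""
    if env == "GKE":
        # GKE rules add both lists together.
        return set(gce_flaky) | set(gke_flaky)
    # GCE rules use only the GCE list.
    return set(gce_flaky)


# Hypothetical contents: the GKE list is a subset of the GCE list.
gce_flaky = {"test-a", "test-b", "test-c"}
gke_flaky = {"test-b"}

# When gke_flaky is a subset of gce_flaky, the union adds nothing.
same = skipped("GKE", gce_flaky, gke_flaky) == skipped("GCE", gce_flaky, gke_flaky)
```

The two environments would only diverge if the GKE list contained an entry missing from the GCE list.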

I'm suggesting that if a test is flaky in GKE, mark it as [Flaky]. That will move it to the flaky suites (GKE, GCE, AWS, etc.) until it's fixed.
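
A minimal Python sketch of that routing, assuming suites are selected purely by the presence of the `[Flaky]` label in the test name; `partition_suites` and the test names in the usage note are hypothetical.

```python
def partition_suites(test_names, label="[Flaky]"):
    """Split test names into (stable, flaky) lists based on a literal label."""
    stable, flaky = [], []
    for name in test_names:
        # A labelled test goes to the flaky suite, everything else stays put.
        (flaky if label in name else stable).append(name)
    return stable, flaky
```

For example, `partition_suites(["Pods should start", "Pods should restart [Flaky]"])` puts the second test in the flaky list regardless of which environment first showed the flakiness.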

@ikehz
Contributor Author

ikehz commented Jan 4, 2016

No, the GKE flaky tests were a subset of the GCE flaky tests, so the skipped tests were actually identical in both environments.

(This is exactly the kind of confusion I'm trying to avoid.)

@ghost

ghost commented Jan 4, 2016

@ihmccreery The existing regex scheme was put in place precisely because we needed it to disable tests that only flaked when run in parallel, or only when run on GKE, or only when run on AWS, etc. You can collapse that all into [Flaky], but you will lose a lot of fidelity if you do.

@ixdy to continue this review, as I'm not supposed to stick my nose into testing stuff :-)

@ixdy
Member

ixdy commented Jan 4, 2016

At the time GKE_FLAKY_TESTS was introduced (#17821), it contained a test ("KubeProxy\sshould\stest\skube-proxy") which was not on the GCE_FLAKY_TESTS list.

That said, this test was on the GCE_SLOW_TESTS list, and thus basically wasn't running in any of the GCE builds. And it was added to GCE_FLAKY_TESTS almost immediately afterwards in #17831.

tl;dr I agree with @ihmccreery that we probably don't need to worry about this, at least not right now.

@gmarek
Contributor

gmarek commented Jan 5, 2016

I'm OK with removing GKE_FLAKY, as it's not very useful now.

OTOH, PARALLEL_FLAKY is still a thing. If I understand your proposal right, you want to have a single suite of 'flaky' tests, running exactly as they do in the 'normal' suite (e.g. in parallel). It's a valid idea, as it puts a bit more pressure on test owners. What's more, it will require tests to be better written. I'm only worried about the engineering cost of fixing the existing ones...

So generally, after some thought, I actually like the idea of a single "flaky" suite. The main drawback I see is the cost of fixing things that are currently flaky when run in parallel.

@ihmccreery - I'm a huge fan of this effort and your work.

Labels
area/test, area/test-infra, lgtm, size/M

9 participants