This could leak RCs/pods if there's an error communicating with the apiserver. Is that something that could break other tests?
If there's a problem communicating with the apiserver, an RC and its pods would likely be left in the system, and that could definitely impact other tests. I'm not sure there's much we can do, though: if the apiserver is unresponsive, I don't see how we clean up after the test.
Retries would be the typical answer, but I haven't checked how much we use retries in our e2e tests. If you give a request 3 tries to succeed instead of 1, flakes are less likely.
A retry might work, but without a real-world case to test against, how many attempts to make, how long to wait between them, and even whether retrying helps at all are all guesses. I can add retries, but I wouldn't be able to verify that they actually solve a problem. I'm also a little reluctant to cover up a communication problem: the apiserver is going to have to be responsive under load.
I don't recall many retries for operations in the e2e tests, probably because few of them stress the system to the point of causing communication timeouts the way this test suite could.
At the moment, this test is disabled and won't run unless explicitly enabled, because of the nature of the test. It really belongs in a performance test suite.