[WIP] Make testing.StartTestServer close cleanly #50690
Conversation
Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please follow instructions at https://github.com/kubernetes/kubernetes/wiki/CLA-FAQ to sign the CLA. It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Hi @frobware. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line.
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: frobware
Associated issue: 49489
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing
Instead of injecting a stop channel through all these constructors (which then inject the channel into various server structs which have Run methods which accept stop channels), is there a way to pass through the stop channel from the outermost Run invocations?
@ironcladlou The trouble I have with that is that the place that can hold the channel is in various Config types. To me there is a difference between what is effectively static configuration, which could be reused to create another server, and the channel, which is most definitely active. Having said that, some configuration values, if applied to create a new server, could cause creation to fail (e.g., bind port).
As far as I can tell, all this wiring to add state to the completedConfig is so that GenericAPIServer.installAPIResources can call Destroy on storage instances... It's not clear to me why NonBlockingRun (which has the stop channel via Run) can't do the teardown. It seems incredibly strange to add the stop channel (which relates only to execution, as pertains to calls to Run) to the config state and to make it a creation dependency. I still maintain the stop channel should propagate via (and ONLY via) Run, and if there are some places where that breaks down, we should look very closely at those cases, because there's probably some other refactoring which needs to be done.
(force-pushed from 9ed4f30 to 22b8d5a)
I was trying to avoid adding state to the GenericAPIServer. We can certainly call an additional
Seems to me that any stateful component which allows registration/installation of stuff, where the component controls the lifecycle independently, should also support de-registration/uninstallation and cascading destruction of said stuff. On that note, is GenericAPIServer really intended to control the lifecycle of Stores? Are the Stores we're explicitly shutting down shared with other components? I'm finding it really difficult to understand the actual lifecycle of most components being wired around the system. I wonder if this change makes the lifecycle more or less opaque. 😟
I don't agree with that. A stop channel is essentially a context. Passing a context during creation is a good and established pattern. Moreover, we use it everywhere. Introducing another pattern for this purpose for Stores feels wrong. Moreover, our plumbing is aligned along creation only. I don't want to double the complexity by adding shutdown logic. A context perfectly merges those two goals.
/ok-to-test
I did experiment with passing a stop channel all the way through to the stores but it was significantly more intrusive: frobware@2e045be
/cc @deads2k PTAL. Thanks.
Most of the rest of our code takes a stop channel on a
I would agree with @sttts if creation-based context were the actual pattern employed throughout the codebase. Instead, I have seen context only applied via Run-type methods post-creation. I have no objection to switching paradigms if everybody else is fine with having a mixture of patterns. (Although not everybody would agree that passing context around is a good idea in general.)
I think you're right about the watch cache at least.
(force-pushed from 22b8d5a to 71107c5)
/retest
Our mixture in the apiserver is more like "using a stopCh" vs. "leaking running goroutines". I agree with @deads2k. @deads2k brought up the idea to port @smarterclayton's integration test runner, which launches one process per test. This would solve our problem as well, as we could leak whatever we want without consequences. Is this something we can get soonish (= within a few weeks)? If not, I would prefer the solution here in the PR. It's not perfect, a bit ugly, but good enough and it exists now. The additional complexity is very limited, and it's forgiving in the sense that if a destruction chain is slightly wrong it won't kill us, it will only leave some garbage.
The author has no solution either, at least none on top of the language. I agree that Go should have better support for process trees and partial shutdown of those. But it doesn't in 1.x. So a context is the best pattern we can get for now.
Thanks for bringing that up. To be honest, I have more confidence in the isolated test process approach than our ability to get graceful shutdown working (and keep it working) across the board. I don't know that graceful shutdown is something anybody even cares about outside a test context. I'd almost rather see this PR replaced with the per-process test runner if there was widespread acceptance of the idea.
When I was looking at this originally, I measured the startup time of the server to be around 8s (on my hardware).
@frobware: The following test failed, say /retest to rerun them all:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.
@frobware PR needs rebase |
This PR hasn't been active in 90 days. Closing this PR. Please reopen if you would like to work towards merging this change when the PR is ready for the next round of review. You can add the 'keep-open' label to prevent this from happening again, or add a comment to keep it open for another 90 days.
@frobware are you still around out there somewhere or should I take this and try to drive it forward? |
I am, but working on other things ATM. Feel free to drive forward. Thanks. |
@MHBauer go ahead, and assign me for review or ping me for discussion. Am too busy to drive this myself, but am happy to support with review, opinion and direction as far as I can.
This PR is still mentioned in a comment in the code, but it's closed - is anyone going to finish this? |
bump |
FYI - I'm resurrecting this PR in #109303 |
What this PR does / why we need it:
This PR ensures that the test apiserver closes cleanly. Without this change there are many repeated reconnection attempts to etcd, at sub-second intervals, accompanied by a lot of log spam indicating that the connection could not be made. This also results in the accumulation of many thousands of goroutines, which in turn prevent effective use of the test server across multiple test functions within the same process.
This PR introduces Storage.Destroy(), and the test server now closes all its stores when it is shut down.
Prior to this change, ~1500+ goroutines remain after the server stops. With the change, ~200 remain; this is a stepping-stone on the way to reducing that further.
Which issue this PR fixes:
Fixes #49489
Special notes for your reviewer:
Release note: