Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change sidecars kep to implementable #1109

Merged

Conversation

Joseph-Irving
Copy link
Member

This PR moves the sidecars KEP #753 to be implementable. Allowing us to progress with developing the feature for 1.16.

If anyone thinks something is missing from the proposal now's the time to discuss it.

/sig apps
/sig node
/assign @kow3ns @enisoc

@k8s-ci-robot k8s-ci-robot added sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/node Categorizes an issue or PR as relevant to SIG Node. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 19, 2019
@k8s-ci-robot
Copy link
Contributor

Hi @Joseph-Irving. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 19, 2019
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory labels Jun 19, 2019
@Joseph-Irving Joseph-Irving mentioned this pull request Jun 19, 2019
11 tasks
@Joseph-Irving
Copy link
Member Author

@derekwaynecarr @dchen1107 if you have any comments from the sig-node side that would be appreciated

@janetkuo
Copy link
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 24, 2019
@janetkuo
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 24, 2019
@sjenning
Copy link
Contributor

I feel compelled to go on record saying I'm not in favor of complicating the pod lifecycle any more than it already is, especially without good reason. The use cases put forth is in the KEP are not strong enough to justify platform level support in my opinion.

The whole thing basically distils to "I want to control the start and stop sequence of the containers in the pod".

The KEP throws shade on the idea of wrapping the dependent application in a script that will delay the start of the main application until the sidecar is operational, but that is a perfectly valid solution. A similar setup can be used to wrap the sidecar application on the termination side.

We, to this day, are still fixing bugs related to init containers kubernetes/kubernetes#78328 As someone whose team supports the kubelet for a large customer base, this is something I fear will cause a lot of fallout over a long period of time and set a precedent for further lifecycle types (i.e. complications) in the future.

I recognize that a lot of effort has gone into the KEP already and I thought I might come around to the idea if I thought about it for long enough, but I haven't.

@Joseph-Irving
Copy link
Member Author

@sjenning, I must respectfully disagree you. This has been a long requested feature for kubernetes, as the sidecar pattern becomes increasingly common with things like services meshes it's becoming more apparent that this problem could do with an official solution.
Various companies are inventing their own similar but different workarounds to these issues, surely instead of expecting everyone to solve a common problem independently, it would be better if we could offer one simple solution that can everyone can use?
The user experience of adding a sidecar to your pod, realising things now don't work as you expected, doing some googling, reading that you need to add bash wrappers, google what a bash wrapper is, realising you have no shell in your containers, rebuilding your containers, rinse and repeat for every single service. Is not that great.

I'm aware that init containers caused some issues with their implementation, but this is not init containers, it's a significantly smaller change. We're not adding an entirely new set of containers that behave very differently, it's just modifying the startup/shutdown behaviour of normal containers. I would hope the the lessons learned from init containers would allow us to make sure we don't make the same mistakes again instead of being worried about repeating it. This would be going in as alpha and behind a feature-gate, so I would hope that we would have a very good degree of confidence, due to good test coverage as well as end user testing, before we promote this for general consumption. The last thing anyone wants is for users to be impacted by a bad new feature in Kubernetes.

@derekwaynecarr
Copy link
Member

derekwaynecarr commented Jun 26, 2019

An issue recently discussed in sig-node was related to handling graceful termination when a machine is powered off. Typically, static pods and/or daemonsets may not be drained during node maintenance. As a result, these pods are not actually terminated via normal drain and instead we delegate to systemd (on Linux) to handle termination. Right now, we have a bug in the kubelet/CRI where we are unable to handle termination periods that differ from systemd defaults on the host during poweroff of a machine because we did not provide the right information on start container calls assuming the kubelet would always be there to handle stop container calls (which isn’t the case on shutdown). It is probably worth calling out in this kep that there will be a similar limitation on powerdown that termination ordering is not guaranteed. This is important on nodes where the kubelet and/or container runtime may have crashed and/or may not be functional. I don’t think this is a blocker to this KEP, but it is the type of thing we discover after the fact. In this case , we know we are unable to do this now, so we should make sure to track it and account for it in the KEP and documentation for when the lifecycle guarantees are not satisfiable.

@derekwaynecarr
Copy link
Member

note: edited my prior comment to make clear that we are not able to enforce termination sequences when a host is powered off.

@derekwaynecarr
Copy link
Member

right now, the CRI implementer is unable to know if a container is or is not a sidecar to try to setup any shutdown ordering semantics as well. We may want to think what we could do there upfront to see if we could handle poweroff.

@derekwaynecarr
Copy link
Member

since a sidecar cold stop/start multiple times over life of a pod whose restart policy is setup appropriately the ordering guarantees are set in stone so hopefully folks can account for that when using the pattern.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 26, 2019
@Joseph-Irving
Copy link
Member Author

@derekwaynecarr, I've added a section to the Risks and Mitigations summarising your comments, please let me know if I've misinterpreted you or there's anything else I should add.

@derekwaynecarr
Copy link
Member

/approve

thank you for the clarification text. given this has API review, I think it’s fine to explore the alpha phase of the feature and learn more during implementation.

I agree with @sjenning that we need to be careful about implementation to mitigate the challenges we had with init containers. I also think we should not give any precedence to side car containers when handling in place pod resizing versus normal containers.

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 27, 2019
@derekwaynecarr
Copy link
Member

FYI @mrunalp if you have thoughts about sidecar containers, shutdown ordering, and enhancing the CRI so it could possibly know more in this area.

@janetkuo
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 28, 2019
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: derekwaynecarr, janetkuo, Joseph-Irving

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit b3554aa into kubernetes:master Jun 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/node Categorizes an issue or PR as relevant to SIG Node. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants