
SCTP support implementation for Kubernetes #64973

Merged
merged 9 commits into from Aug 28, 2018

Conversation

@janosi
Contributor

commented Jun 11, 2018

What this PR does / why we need it: This PR adds SCTP support to Kubernetes, including Service, Endpoint, and NetworkPolicy.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #44485

Special notes for your reviewer:

Release note:


SCTP is now supported as an additional protocol (alpha) alongside TCP and UDP in Pod, Service, Endpoint, and NetworkPolicy.

@janosi

Contributor Author

commented Jun 11, 2018

/assign @thockin

@thockin

Member

commented Jun 11, 2018

I know precious little about SCTP - I have no hands-on experience. @kubernetes/sig-network-feature-requests
I do know, for example, that GCP's load balancer only supports SCTP in single-endpoint mode: https://cloud.google.com/compute/docs/load-balancing/network/forwarding-rules

So at a minimum, all of the cloud implementations need to be checked for compatibility and disabled when not compatible.

This should probably start as a proposal to sig-network to gather context (like the above GCP point) and then become a KEP for broader review.

@janosi

Contributor Author

commented Jun 11, 2018

@thockin Do I understand correctly that you would like us to go back to square one, start with a sig-network proposal first, and then, once some criteria are met, create a KEP? And that the goal is to check that when a cloud provider API receives an unsupported protocol value, it answers with an error, and that this error is handled correctly in the relevant cloud provider logic of k8s?

@janosi

Contributor Author

commented Jun 12, 2018

@thockin
I have now checked the EnsureLoadBalancer implementations of the different cloud providers (pkg/cloudprovider/providers):

AWS: accepts only "TCP" as Protocol; otherwise it returns an error.

if port.Protocol != v1.ProtocolTCP {

Azure: accepts only "TCP" or "UDP"; otherwise it returns an error.

return &transportProto, &securityProto, &probeProto, fmt.Errorf("only TCP and UDP are supported for Azure LoadBalancers")

CloudStack: accepts only "TCP" or "UDP"; otherwise it returns an error.

return nil, fmt.Errorf("unsupported load balancer protocol: %v", port.Protocol)

GCE: allows only "TCP" or "UDP" for an external LB; otherwise it returns an error. For an internal LB there is no explicit protocol check.

if ports[0].Protocol != v1.ProtocolTCP && ports[0].Protocol != v1.ProtocolUDP {

OpenStack: my understanding is that OpenStack supports SCTP, so this PR enables SCTP in the OpenStack cloud provider code.

oVirt, Photon, vSphere: these require the registration of a "cloud provider", and it is then up to that cloud provider's "LoadBalancer" implementation how it handles the different protocols.

pkg/controller/service/service_controller.go also has a helpful warning at the point where it calls the EnsureLoadBalancer() function of the relevant cloud provider:

// - Not all cloud providers support all protocols and the next step is expected to return
// an error for unsupported protocols

As far as I can see, all the supported cloud providers follow this principle and handle unsupported protocol values in their code.

Of course we are happy to receive comments from SIG Network. Please help invite the relevant people here if the "sig/network" label is not enough as a heads-up.

@MaximProshin


commented Jun 13, 2018

LKSCTP is not loaded by default in Linux, which is what allows developers to run user-land SCTP stacks. With your change LKSCTP will be loaded automatically, and it will intercept all incoming SCTP packets, which will lead to a mess and, essentially, to aborted SCTP associations. In other words, this change will prevent developers from running user-land SCTP in a K8s cluster.

@janosi

Contributor Author

commented Jun 13, 2018

@MaximProshin Is there the same effect when those are in different network namespaces? I.e. when the user space SCTP implementation is used inside a pod that does not use the host network namespace?

UPDATE: If the network interface of the pod is not a tun/tap-like one, and the user-space SCTP stack uses a raw socket with IPPROTO_SCTP, then namespaces will not save us. On the other hand, in the world without containers one also had to ensure that such user-space SCTP apps were not deployed on the same nodes as applications that would trigger the loading of lksctp. This drives me toward thinking about a way to reserve nodes for such user-land SCTP applications - assuming, of course, that those applications are deployed as containers/pods that share the kernel with other pods, i.e. not as e.g. Kata Containers or similar.

@@ -25,6 +25,8 @@ import (

"k8s.io/api/core/v1"
utiliptables "k8s.io/kubernetes/pkg/util/iptables"

"github.com/nokia/sctp"

@dcbw

dcbw Jun 13, 2018

Member

I think you need to update the BUILD file in this directory to account for this import addition. run 'hack/update-bazel.sh' and I think it'll do that for you.

@@ -33,6 +33,7 @@ import (
"time"

"github.com/golang/glog"
"github.com/nokia/sctp"

@dcbw

dcbw Jun 13, 2018

Member

Same here, bazel update.

@@ -1656,6 +1667,18 @@ func openLocalPort(lp *utilproxy.LocalPort) (utilproxy.Closeable, error) {
return nil, err
}
socket = conn
case "sctp":
//SCTP is not supported by golang/net, or any other built-in lib,

@dcbw

dcbw Jun 13, 2018

Member

mis-indented whitespace; also there should be a single space after the // in comments.

@janosi

Contributor Author

commented Jun 14, 2018

@dcbw Thank you for the comments! Updated with hack/update-gofmt.sh and hack/update-bazel.sh. Also did a rebase to the current master.

@MaximProshin


commented Jun 15, 2018

Yes, namespaces don't help to isolate user-land SCTP from LKSCTP, so there should at least be a way to not load (or to unload) LKSCTP. If I compare K8s with a native Linux OS, I would say the PR is not backward compatible: previously LKSCTP wasn't loaded by default, while with K8s it will be.

@janosi

Contributor Author

commented Jun 15, 2018

@MaximProshin My understanding is that lksctp is not loaded as long as no SCTP Service (with Cluster IP or NodePort) is created, because of the on-demand loading. Once someone creates an SCTP Service the module is loaded, but it is an explicit request from the user. Just like when a user started a native Linux process that wanted to use SCTP.
If my understanding is wrong, please point to the place that triggers lksctp loading right at node startup, i.e. before there is an explicit request for that. Thank you!

UPDATE: we tested the module load case explicitly with the current code proposed here. My understanding is right: the current solution does not trigger the loading of the SCTP module by default (e.g. when kube-x is loaded at bootup). The SCTP module was loaded only when an SCTP Service was created. So, from this perspective, if someone upgrades her k8s to a version that includes this feature, she will not experience any problems as long as she does not create SCTP Services explicitly.
This was about backward compatibility.
Starting from this standpoint we can discuss what we can do in order to have both SCTP Services with ClusterIP/NodePort and a userspace SCTP stack supported at the same time on the same cluster.
I think we can agree that these were not supported together earlier either - i.e. if I had an application that used userspace SCTP and then started another app that used kernel SCTP on the same node, the result was the same problem. The solution, most probably, was to dedicate nodes to either of these SCTP users, i.e. to ensure they are not deployed on the same nodes. Here we have the same challenge: on the nodes where userspace SCTP apps are to be deployed, we must not use kernel-level SCTP in any way.

@janosi

Contributor Author

commented Jun 16, 2018

@thockin I created a KEP, the PR is at kubernetes/community#2276

@thockin

Member

commented Jun 18, 2018

Thanks for the KEP. This also needs discussion on sig-network. A lot of significant issues in there.

@janosi

Contributor Author

commented Jun 19, 2018

Sure, we can take it to the next SIG Network meeting. Am I allowed to write it on the agenda?

@janosi janosi force-pushed the nokia:k8s-sctp branch from 18e85dd to 5d8b1ef Aug 27, 2018
Laszlo Janosi
Laszlo Janosi added 2 commits Aug 27, 2018
@janosi

Contributor Author

commented Aug 27, 2018

Hopefully this PR kubernetes/test-infra#9161 solved the GKE test issues on 27th August 2018, so
/retest

UPDATE (30 minutes later): that PR is not in effect yet. The retest failed.

@thockin

Member

commented Aug 28, 2018

/lgtm
/approve
/retest

@k8s-ci-robot k8s-ci-robot added the lgtm label Aug 28, 2018
@k8s-ci-robot

Contributor

commented Aug 28, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: janosi, thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@janosi

Contributor Author

commented Aug 28, 2018

/retest

@janosi

Contributor Author

commented Aug 28, 2018

/retest

@janosi

Contributor Author

commented Aug 28, 2018

/retest

@k8s-github-robot

Contributor

commented Aug 28, 2018

Automatic merge from submit-queue (batch tested with PRs 67694, 64973, 67902). If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit 2eb14e3 into kubernetes:master Aug 28, 2018
18 checks passed:
Submit Queue - Queued to run github e2e tests a second time.
cla/linuxfoundation - janosi authorized
pull-kubernetes-bazel-build - Job succeeded.
pull-kubernetes-bazel-test - Job succeeded.
pull-kubernetes-cross - Skipped
pull-kubernetes-e2e-gce - Job succeeded.
pull-kubernetes-e2e-gce-100-performance - Job succeeded.
pull-kubernetes-e2e-gce-device-plugin-gpu - Job succeeded.
pull-kubernetes-e2e-gke - Job succeeded.
pull-kubernetes-e2e-kops-aws - Job succeeded.
pull-kubernetes-e2e-kubeadm-gce - Skipped
pull-kubernetes-integration - Job succeeded.
pull-kubernetes-kubemark-e2e-gce-big - Job succeeded.
pull-kubernetes-local-e2e - Skipped
pull-kubernetes-local-e2e-containerized - Job succeeded.
pull-kubernetes-node-e2e - Job succeeded.
pull-kubernetes-typecheck - Job succeeded.
pull-kubernetes-verify - Job succeeded.
@janosi

Contributor Author

commented Aug 28, 2018

@thockin Thank you for guiding me through this adventure, for the review, and for the approval!
@MaximProshin @danwinship @dcbw @KomorkinMikhail @matrohon @m1093782566 Thank you for your comments and reviews!

@justaugustus

Member

commented Sep 11, 2018

Linking the features issue for this: kubernetes/enhancements#614

@yujuhong

Member

commented Sep 12, 2018

I don't think this got enough attention from sig-node. It adds a new SCTP protocol to CRI and may require changes in the CRI shims.

@kubernetes/sig-node-api-reviews @kubernetes/sig-node-pr-reviews @mrunalp @Random-Liu @resouer @feiskyer

@feiskyer

Member

commented Sep 12, 2018

On the Kubernetes side we solve this problem by not listening on the SCTP ports defined for Services with ClusterIP, NodePort, or externalIP, nor in the case when containers with an SCTP HostPort are defined. In this way we avoid loading the kernel module due to Kubernetes actions.

@yujuhong Should we skip SCTP port mapping in kuberuntime instead?

@janosi

Contributor Author

commented Sep 12, 2018

@feiskyer @yujuhong I wonder whether the same logic could be applied on the CRI side as was selected for the load balancers: k8s itself does not restrict the usage of SCTP as a protocol in load balancer requests, and it is the task of the cloud provider's load balancer plugin to filter out unsupported protocols.
For example, AFAIK, Docker supports SCTP for host port binding.
Also, I wonder if a clear and straightforward rejection of a request with an unsupported protocol would be better from the user's perspective than just silently skipping it.
