Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set enableServiceLinks to false as default #121787

Open
jmcgrath207 opened this issue Nov 8, 2023 · 27 comments
Open

Set enableServiceLinks to false as default #121787

jmcgrath207 opened this issue Nov 8, 2023 · 27 comments
Assignees
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/feature Categorizes issue or PR as related to a new feature. sig/network Categorizes an issue or PR as relevant to SIG Network. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@jmcgrath207
Copy link
Contributor

jmcgrath207 commented Nov 8, 2023

What would you like to be added?

I want to make enableServiceLinks disabled by default.

Why is this needed?

Service link environment variables are injected into every pod. It creates three environment variables for every service in a respective pod's namespace, regardless of how they are related. This behavior was observed in EKS 1.24.

In larger namespaces, this can cause pods to crash due to too many environmental variables. I assume this is due to Container runtime or Operating System Limitations.

I assume this was needed for legacy network reasons or installs that do not run cluster DNS. Regardless, Cluster DNS is added by default for most Kubernetes installers (excluding mircok8s) out there today.

I am willing to contribute if this is the direction we want to go.

@jmcgrath207 jmcgrath207 added the kind/feature Categorizes issue or PR as related to a new feature. label Nov 8, 2023
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Nov 8, 2023
@k8s-ci-robot
Copy link
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Nov 8, 2023
@neolit123
Copy link
Member

/sig network

@k8s-ci-robot k8s-ci-robot added sig/network Categorizes an issue or PR as relevant to SIG Network. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Nov 8, 2023
@uablrek
Copy link
Contributor

uablrek commented Nov 9, 2023

I think a changed default will be rejected for backward compatibility reasons, but have you considered a mutating webhook?

@jmcgrath207
Copy link
Contributor Author

jmcgrath207 commented Nov 9, 2023

So, I have seen a solution to use Kyverno.

Example:
https://stackoverflow.com/a/69555759/3263650

With this proposal, I am trying to accomplish this:

  • Remove situations where troubleshooting is needed to figure out this problem
    • ex. There is no logging that suggests enableServiceLinks causes this problem; there are only errors when there are too many environment variables for a pod, making it not easy to troubleshoot.
  • Not having third-party dependencies to fix this.

FWIW, I believe this is the code that performs this action.
https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kubelet_pods.go#L554

@jmcgrath207
Copy link
Contributor Author

jmcgrath207 commented Nov 27, 2023

@uablrek

Here is an example of an error we ran into where too many environment variables were set in Java, causing the container to crash. Setting enableServiceLinks: false fixed this.

quarkusio/quarkus#36394

@uablrek
Copy link
Contributor

uablrek commented Nov 27, 2023

I don't question the problem, and I agree that the environment variables are basically obsolete, and the default should be enableServiceLinks=false, but...

With a so large customer base as K8s has, changing defaults is very risky. According to Hyrum's law it will ruin someones workflow.

Then again, this change may be acceptable. I will bring it up if I attend the next sig/network bi-weekly meeting.

@uablrek
Copy link
Contributor

uablrek commented Dec 7, 2023

It should be noted that even with enableServiceLinks: false the environment variables for the "kubernetes" service are defined.

@thockin thockin self-assigned this Dec 7, 2023
@thockin
Copy link
Member

thockin commented Dec 7, 2023

This one is hard. Changing defaults is a breaking change. The real answer is to rev the "core" API to v2 at which point which we can change defaults. But that's a big topic. BIG.

We can easily do a dumb admission controller (always set it to false), but I'll have to look into whether we can be smarter and only set it if the user didn't speciically ask for true (it's still a breaking change, but you get to control it as a cluster admin). It's complicated because defaulting happens BEFORE admission control. We could perhaps move th edefaulting into the BeforeCreate hook so admission control would see it as nil (thankfully it is a pointer field!). I could support that, but it would be one more built-in admission controller to enable.

Earliest that could happen would be 1.30. Using gatekeeper or kyverno is something that can work NOW.

@jpbetz @deads2k @apelisse - do you think that we should solve for this general problem? Specifically, should we have a normal way to set default values AFTER admission control? Or that admission control could categorically know "this value was set by the user" vs "this was set by defaulting" ? Is there some way?

@apelisse
Copy link
Member

Technically, one could look at the managed fields to see if the field was applied (since we don't track defaulted fields), that can probably be done in the admission chain. I can't remember if we send the full managed fields to the admission webhooks, but I think we do.

@uablrek
Copy link
Contributor

uablrek commented Dec 14, 2023

I would like to explain why I don't think this is a one-man problem, or a corner-case that can be taken lightly.

A very large environment is a burden for fork/exec intensive applications

The environment is copied to child processes. A smart OS like Linux may have some copy-on-demand policy for environments, but even so many programs alter their environment, interpreters in particular. Just check the $BASH_* variables.

Depending on start order the problem may not be seen at first

In Unix'es the environment of a running process is only altered by the process itself. So, PODs that are started before the services are created will have small environments. They can run without problems a long time, but when the POD is restarted it will suddenly get the very large environment that it can't handle, and dive into a restart-loop in worst case.

Experiments and tests

I wrote a script that creates a number of services:

export namespace=svc-test
kubectl create namespace $namespace
time createsvcs 800
...
service/test-service-0002 created
service/test-service-0001 created
real    1m 2.77s
user    0m 42.50s
sys     0m 7.60s
createsvcs
#! /bin/sh
##
## namespace=default prefix=test-service selector="app: alpine-env" \
## createsvcs <number>
##
##   Create a number of Kubernetes services for test.
##

if test -z "$1"; then
	grep '^##' $0 | cut -c3-
	exit 0
fi

test -n "$namespace" || namespace=default
test -n "$prefix" || prefix=test-service
test -n "$selector" || selector="app: alpine-env"

# createsvc <name> <selector>
createsvc() {
	# cat | kubectl create -f - <<EOF
	# Some shells (bash) appends the caller's stdin to the command above
	# for some reason, so the command hangs
	local tmp=/tmp/$1.yaml
	cat > $tmp <<EOF
apiVersion: v1
kind: Service
metadata:
  name: $1
spec:
  selector:
    $2
  ports:
  - port: 7005
EOF
	kubectl create -n $namespace -f $tmp
	rm -f $tmp
}

cnt=$1
while test $cnt -gt 0; do
	createsvc $(printf "$prefix-%04d" $cnt) "$selector"
	cnt=$((cnt - 1))
done

Start a POD and check it's environment:

kubectl -n $namespace apply -f alpine-env.yaml
pod=$(kubectl -n $namespace get pods -l app=alpine-env -o name | head -1)
kubectl -n $namespace exec $pod -- env | wc -lc
     5612    266659
kubectl -n $namespace exec $pod -- env | tail
TEST_SERVICE_0531_PORT_7005_TCP_PORT=7005
TEST_SERVICE_0088_PORT_7005_TCP_PROTO=tcp
TEST_SERVICE_0101_PORT_7005_TCP=tcp://[fd00:4000::3324]:7005
TEST_SERVICE_0589_PORT_7005_TCP_PORT=7005
TEST_SERVICE_0155_PORT_7005_TCP_PORT=7005
TEST_SERVICE_0303_PORT_7005_TCP_PORT=7005
TEST_SERVICE_0401_PORT_7005_TCP_ADDR=fd00:4000::75a2
TEST_SERVICE_0157_SERVICE_HOST=fd00:4000::f6a7
TEST_SERVICE_0693_SERVICE_PORT=7005
HOME=/root

The issue author got a Java crash for an environment >5100 lines. My test POD doesn't crash, so that specific problem may be for Java.

alpine-env.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: alpine-env
spec:
  selector:
    matchLabels:
      app: alpine-env
  replicas: 1
  template:
    metadata:
      labels:
        app: alpine-env
    spec:
      #enableServiceLinks: false
      containers:
      - name: alpine
        image: alpine:latest
        imagePullPolicy: IfNotPresent
        command: ["nc", "-lk", "-p", "7005", "-e", "hostname"]
        ports:
        - name: nc
          containerPort: 7005

Set enableServiceLinks: false, and try again:

sed -i -e 's,#enableServiceLinks,enableServiceLinks,' alpine-env.yaml
kubectl -n $namespace apply -f alpine-env.yaml
pod=$(kubectl -n $namespace get pods -l app=alpine-env -o name | head -1)
kubectl -n $namespace exec $pod -- env | wc -lc
       12       423
KUBERNETES_SERVICE_PORT_HTTPS=443
KUBERNETES_PORT=tcp://[fd00:4000::1]:443
KUBERNETES_PORT_443_TCP=tcp://[fd00:4000::1]:443
KUBERNETES_PORT_443_TCP_PROTO=tcp
KUBERNETES_PORT_443_TCP_PORT=443
KUBERNETES_PORT_443_TCP_ADDR=fd00:4000::1
KUBERNETES_SERVICE_HOST=fd00:4000::1
KUBERNETES_SERVICE_PORT=443
HOME=/root

As you can see the "kubernetes" service is defined even with enableServiceLinks: false, which allows access to the API server even with old code.

@thockin
Copy link
Member

thockin commented Dec 20, 2023

I'm not saying I disagree. This is bad. What I am saying is that changing it would be a breaking change, and we just don't do that. If we think this is important enough to violate our own rules, why? If we think it is unlikely to break anyone, where's the data to back that?

Doing an optional admission controller is the only non-breaking way to do this. If someone wants to do it, it should not be technically hard.

@aojea
Copy link
Member

aojea commented Dec 20, 2023

Doing an optional admission controller is the only non-breaking way to do this. If someone wants to do it, it should not be technically hard.

Just if someone wonders how difficult is to create an admission controller check #97395 , is a good opportunity for contributors that have certain experience with the kubernetes code base

/help

@k8s-ci-robot
Copy link
Contributor

@aojea:
This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

  • Why are we solving this issue?
  • To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
  • Does this issue have zero to low barrier of entry?
  • How can the assignee reach out to you for help?

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

Doing an optional admission controller is the only non-breaking way to do this. If someone wants to do it, it should not be technically hard.

Just if someone wonders how difficult is to create an admission controller check #97395 , is a good opportunity for contributors that have certain experience with the kubernetes code base

/help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Dec 20, 2023
@jmcgrath207
Copy link
Contributor Author

jmcgrath207 commented Dec 22, 2023

Thanks, everyone, for reviewing this and offering solutions.

I want to work on this issue and have development experience with admissions webhooks that may help 🤞 . However, I am a first-time contributor so it will take me much longer.

@thockin
Copy link
Member

thockin commented Dec 22, 2023

No pressure! It's been this way for almost 10 years :)

@aojea
Copy link
Member

aojea commented Dec 22, 2023

/assign @jmcgrath207

@uablrek
Copy link
Contributor

uablrek commented Dec 27, 2023

Since I haven't written a webhook myself, I took the opportunity to learn. Available at: https://github.com/uablrek/slink-webhook

@aojea
Copy link
Member

aojea commented Jan 7, 2024

Technically, one could look at the managed fields to see if the field was applied (since we don't track defaulted fields), that can probably be done in the admission chain. I can't remember if we send the full managed fields to the admission webhooks, but I think we do.

@apelisse I was playing with the PR in #122631 and dumping the object that arrives at the admission plugin I can not see any difference in the ManagedFields values between setting it on creation or defaulting

Explicitly setting it

I0107 11:18:32.326361 1854719 admission.go:71] ------------------ DEBUG: Managed fields[{endpoints.test Update v1 2024-01-07 11:18:32.325218227 +0000 UTC FieldsV1 {"f:metadata":{"f:labels":{".":{},"f:foo":{}}},"f:spec":{"f:containers":{"k:{"name":"fake-name"}":{".":{},"f:image":{},"f:imagePullPolicy":{},"f:name":{},"f:resources":{},"f:terminationMessagePath":{},"f:terminationMessagePolicy":{}}},"f:dnsPolicy":{},"f:enableServiceLinks":{},"f:nodeName":{},"f:restartPolicy":{},"f:schedulerName":{},"f:securityContext":{},"f:terminationGracePeriodSeconds":{}}} }]

Defaulting:

I0107 11:20:36.944322 1859414 admission.go:71] ------------------ DEBUG: Managed fields[{endpoints.test Update v1 2024-01-07 11:20:36.94324503 +0000 UTC FieldsV1 {"f:metadata":{"f:labels":{".":{},"f:foo":{}}},"f:spec":{"f:containers":{"k:{"name":"fake-name"}":{".":{},"f:image":{},"f:imagePullPolicy":{},"f:name":{},"f:resources":{},"f:terminationMessagePath":{},"f:terminationMessagePolicy":{}}},"f:dnsPolicy":{},"f:enableServiceLinks":{},"f:nodeName":{},"f:restartPolicy":{},"f:schedulerName":{},"f:securityContext":{},"f:terminationGracePeriodSeconds":{}}} }]

@aojea
Copy link
Member

aojea commented Jan 26, 2024

Technically, one could look at the managed fields to see if the field was applied (since we don't track defaulted fields), that can probably be done in the admission chain. I can't remember if we send the full managed fields to the admission webhooks, but I think we do.

@apelisse I was playing with the PR in #122631 and dumping the object that arrives at the admission plugin I can not see any difference in the ManagedFields values between setting it on creation or defaulting

Explicitly setting it

I0107 11:18:32.326361 1854719 admission.go:71] ------------------ DEBUG: Managed fields[{endpoints.test Update v1 2024-01-07 11:18:32.325218227 +0000 UTC FieldsV1 {"f:metadata":{"f:labels":{".":{},"f:foo":{}}},"f:spec":{"f:containers":{"k:{"name":"fake-name"}":{".":{},"f:image":{},"f:imagePullPolicy":{},"f:name":{},"f:resources":{},"f:terminationMessagePath":{},"f:terminationMessagePolicy":{}}},"f:dnsPolicy":{},"f:enableServiceLinks":{},"f:nodeName":{},"f:restartPolicy":{},"f:schedulerName":{},"f:securityContext":{},"f:terminationGracePeriodSeconds":{}}} }]

Defaulting:

I0107 11:20:36.944322 1859414 admission.go:71] ------------------ DEBUG: Managed fields[{endpoints.test Update v1 2024-01-07 11:20:36.94324503 +0000 UTC FieldsV1 {"f:metadata":{"f:labels":{".":{},"f:foo":{}}},"f:spec":{"f:containers":{"k:{"name":"fake-name"}":{".":{},"f:image":{},"f:imagePullPolicy":{},"f:name":{},"f:resources":{},"f:terminationMessagePath":{},"f:terminationMessagePolicy":{}}},"f:dnsPolicy":{},"f:enableServiceLinks":{},"f:nodeName":{},"f:restartPolicy":{},"f:schedulerName":{},"f:securityContext":{},"f:terminationGracePeriodSeconds":{}}} }]

@jpbetz @liggitt do you know , maybe, if is it possible to use managedfields for differentiating between defaulting by the apiserver or explicitly set by the user?

@liggitt
Copy link
Member

liggitt commented Jan 26, 2024

I can not see any difference in the ManagedFields values between setting it on creation or defaulting

That's surprising to me, but I verified that normal create requests populate all the defaulted fields as attributed to the creator. Only if I created the pod using server-side apply did the managed fields exclude the defaulted fields (but no built-in controllers create pods that way, so an admission plugin can't rely on that)

@jpbetz @liggitt do you know , maybe, if is it possible to use managedfields for differentiating between defaulting by the apiserver or explicitly set by the user?

(the issues with managedFields including defaulted fields aside) I think you can do something like the following on pod create:

  1. decode the incoming object from the admission review into a *v1.Pod instance
  2. iterate over the pod.Metadata.ManagedFields entries (I think there should only be a single entry on create, but maybe iterate just to be forward compatible)
  3. for each entry, extract the fields owned by that manager (fields = ExtractPod(manager))
  4. check if the extracted fields for that manager included enableServiceLinks (fields.Spec != nil && fields.Spec.EnableServiceLinks != nil)

@thockin thockin added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Feb 1, 2024
@rwilson-release
Copy link

First point, you can (and do) make breaking changes, you just need to announce the plan to make the deprecation, waiting period, then implementation. So, as an example, k8s-v21 will deprecate the old missing default value of "true" with an emitted warning and in k8s-v24 it will be changed from true to false. The deprecations page will show you that the value is changing. Documentation can point you to an upgrade path or alternatives.

Second point, nobody in their right minds would use this feature as implemented: I am not aware of any use cases in my experience of applications looking for (and using) and environment variable values like tcp://IP:PORT for service discovery. Anyone who absolutely needed this functionality could add enableServiceLinks: true to their manifests prior to the breaking change implementation which would be 3-n versions hence.

@rwilson-release
Copy link

It should be noted that even with enableServiceLinks: false the environment variables for the "kubernetes" service are defined.

This is probably some more legacy that can be handled separately. The environment variables prefixed with KUBERNETES_ are not likely to cause any collisions or issues like the ones that are very common with SERVICENAME_*.

@thockin
Copy link
Member

thockin commented Apr 4, 2024

@rwilson-release What we CAN (technically) do and what we ACTUALLY do are not always the same.

As a project, we have a pretty firm stance that we DO NOT CHANGE default behaviors, unless the benefit vastly outweighs the cost. We have a sort of "budget" for disruptive changes, and we need to spend it wisely because it is very slow to replenish.

You also need to consider that some huge fraction of Kube users are on managed services, which can and do upgrade clusters without notifying users. As such, consider this:

  • You are a user who depends on this feature
  • Your app works on Friday evening
  • Your provider upgrades your cluster over the weekend, including all the nodes (causing pod restarts)
  • You app is broken on Monday morning

How unhappy are you? Who are you unhappy with? When your provider says "Oh, that's because Kube v1.31 changed a default", do you feel like you trust Kubernetes MORE or LESS?

nobody in their right minds would use this feature as implemented

Maybe. We simply do not have data on it, so we err on the side of caution. The risk of breaking the above user is, IMO, higher than the benefit of having a cleaner default. We didn't invent this format, we stole it from Docker. Perhaps we wish we had not (I wish we had not), but we did, and it's been a "feature" for almost 10 years. Lots of time to pick up some users.

WE DON'T BREAK USERS (without REALLY REALLY good reasons). Even if we think it's a dumb feature. Even if we think it imposes some hardship on people who want a sane default. Even if we suspect that nobody in their right mind uses it.

@rwilson-release
Copy link

WE DON'T BREAK USERS (without REALLY REALLY good reasons). Even if we think it's a dumb feature. Even if we think it imposes some hardship on people who want a sane default. Even if we suspect that nobody in their right mind uses it.

I think that's fine and great and I don't disagree. I do think there is a way to move forward that isn't as bad as you describe it, but I could be wrong. I can only pray and hope we find a way to make changes without requiring huge monumental waterfall breaks I guess.

@thockin
Copy link
Member

thockin commented Apr 4, 2024

I once believed that we could declare a deprecation loudly enough that it would be OK to change defaults within an API version. I no longer believe that. Some people don't read announcement blogs or CHANGELOGS. They don't subscribe to this GitHub repo. They don't follow me on Twitter. They don't even read urgent bulletins from their cloud-providers. They don't see API warnings (they either gloss right over them or they are buried inside CD system logs for "succeeded" runs).

I have yet to find a vector which will get close enough to 100% that we could do this.

And even if they DO see it, we're asking them to update their PERFECTLY GOOD deployments, which incurs testing and requalification, and generally just eats time. For what benefit?

We have (rarely) changed things in this way but it takes SO MUCH effort, that the impact has to be dramatic. Like "could make the front page of the New York Times" dramatic.

I can only pray and hope we find a way to make changes without requiring huge monumental waterfall breaks

Well, we have a way, it's just monumental in its scope and the pressure to do it has not reached critical - we create a core/v2 API. That's where we can change defaults (among other things). Users of core/v1 can remain blissfully ignorant. People who care can opt-in to using v2 on a case-by-case basis. I personally think we should do it, but I am not signing up to do the work right now - we have a lot of things more pressing that this. :(

@rwilson-release
Copy link

We have the luxury of generating manifests for our customers, so we've rolled out a global default for their manifests and our corner of the world operates a little bit better maybe.

It occurs to me that if/when you are going to have a major revision to core/v2 API that you should just remove the feature completely rather than set the default. After all, removing all of that code and magical behaviour would make more sense and presumably add more benefits and value than simply changing a default value. That's my vote at least. Thanks for your attention.

@thockin
Copy link
Member

thockin commented Apr 8, 2024

The way our API machinery works is that API versions are just "lenses" into the underlying data. Removing it from v2 means removing it from v1, and we don't do that. :(

Trust me, there's nobody who wants to delete this more than me, but...

koct9i added a commit to koct9i/yt-k8s-operator that referenced this issue Apr 10, 2024
This is legacy feature which contaminates environment variables
with docker-compatible references to all services in namespace.

Link: https://kubernetes.io/docs/tutorials/services/connect-applications-service/#accessing-the-service
Link: kubernetes/kubernetes#121787
koct9i added a commit to koct9i/yt-k8s-operator that referenced this issue Apr 10, 2024
This is legacy feature which contaminates environment variables
with docker-compatible references to all services in namespace.

Link: https://kubernetes.io/docs/tutorials/services/connect-applications-service/#accessing-the-service
Link: kubernetes/kubernetes#121787
koct9i added a commit to ytsaurus/yt-k8s-operator that referenced this issue Apr 11, 2024
This is legacy feature which contaminates environment variables
with docker-compatible references to all services in namespace.

Link: https://kubernetes.io/docs/tutorials/services/connect-applications-service/#accessing-the-service
Link: kubernetes/kubernetes#121787
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/feature Categorizes issue or PR as related to a new feature. sig/network Categorizes an issue or PR as relevant to SIG Network. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

No branches or pull requests

9 participants