
1000+ services in the same namespace will degrade pod start time & eventually prevent pods from starting #92615

Closed
dprotaso opened this issue Jun 29, 2020 · 29 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/scalability Categorizes an issue or PR as relevant to SIG Scalability. triage/needs-information Indicates an issue needs more information in order to work on it.

Comments

@dprotaso
Contributor

From: #92226

What happened:

Having a modest number of services in your namespace will eventually degrade pod start time and after a certain number of services pods will fail to start.

What you expected to happen:

The number of services in a namespace should not prevent a pod from starting or degrade its start time.

How to reproduce it (as minimally and precisely as possible):

Example here: knative/serving#8498

Anything else we need to know?:

The culprit is the large list of env vars on pods coming from service links (which are on by default)
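For context, each ClusterIP service contributes a block of docker-link-style variables to every pod in its namespace when service links are enabled. A sketch of what a single service named foo exposing port 80 injects (10.0.0.11 is an invented example cluster IP):

```shell
# Env vars injected by service links for service "foo", port 80
# (10.0.0.11 is an example value, not a real cluster IP):
FOO_SERVICE_HOST=10.0.0.11
FOO_SERVICE_PORT=80
FOO_PORT=tcp://10.0.0.11:80
FOO_PORT_80_TCP=tcp://10.0.0.11:80
FOO_PORT_80_TCP_PROTO=tcp
FOO_PORT_80_TCP_PORT=80
FOO_PORT_80_TCP_ADDR=10.0.0.11
```

With long service names and multiple ports per service, this multiplies quickly: roughly seven variables per service/port pair, each carrying the full service name.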

Environment:

  • Kubernetes version (use kubectl version): 1.16
  • Cloud provider or hardware configuration: GKE
@dprotaso dprotaso added the kind/bug Categorizes issue or PR as related to a bug. label Jun 29, 2020
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jun 29, 2020
@liggitt liggitt changed the title A modest number of services in the same namespace will degrade pod start time & eventually prevent pods from starting 1000+ services in the same namespace will degrade pod start time & eventually prevent pods from starting Jun 29, 2020
@dprotaso
Contributor Author

/sig node

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jun 29, 2020
@mattmoor
Contributor

To clarify, 1000 Knative Services currently translates to about 2000 Cluster IP services and 1000 ExternalName services.

@schweikert

One cause of bad performance with a large number of environment variables can be bash. This code appears to scale as O(n^2) with the number of variables, and it is called on bash startup: https://github.com/bminor/bash/blob/d894cfd104086ddf68c286e67a5fb2e02eb43b7b/variables.c#L4197
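That quadratic growth can be probed locally with a rough micro-benchmark (a sketch, assuming a GNU/Linux system with bash and coreutils; the variable names are invented to mimic service-link vars, and absolute timings will vary by machine):

```shell
#!/bin/sh
# Time a trivial `bash -c` as the number of exported env vars grows.
# bash scans the inherited environment at startup, so startup time
# should grow noticeably faster than linearly if the O(n^2) path is hit.
bench() {
  n=$1
  (
    i=0
    while [ "$i" -lt "$n" ]; do
      export "LONG_SERVICE_NAME_${i}_PORT_80_TCP_ADDR=10.0.0.1"
      i=$((i + 1))
    done
    t0=$(date +%s%N)                # GNU date: nanoseconds since epoch
    bash -c ':'
    t1=$(date +%s%N)
    echo "$n vars: $(( (t1 - t0) / 1000000 )) ms"
  )
}

bench 1000
bench 8000
```

Running each count in a subshell keeps the exported variables from leaking into the parent environment between runs.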

@mattmoor
Contributor

mattmoor commented Jul 2, 2020

Neither of the containers in question uses bash as its entrypoint; both are built from gcr.io/distroless/static:nonroot.

@bytetwin

bytetwin commented Jul 8, 2020

@mattmoor @dprotaso I am interested in picking this up and need some help.
Can you give a bit more explanation of the issue and point me to the file where this is happening? I'll pick it up from there. Thanks.

@dprotaso
Contributor Author

dprotaso commented Jul 8, 2020

@hanumanthan I'm not familiar with the K8s codebase to be able to point you to the right place. I'd probably reach out for help in the K8s slack - https://slack.k8s.io/

@dims
Member

dims commented Jul 28, 2020

xref #93399

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 26, 2020
@dprotaso
Contributor Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 26, 2020
@lingsamuel
Contributor

lingsamuel commented Nov 4, 2020

Could you provide a minimal reproducible example without Knative? @mattmoor @dprotaso
I created 1000+ ExternalName services and 1500+ ClusterIP services (all ClusterIP svcs point to the same pod; if the problem is env-var related, I think that doesn't matter), then started a new Pod. It starts normally and quickly.
I took a quick glance at the related code and didn't find any suspicious codepath (maybe I missed something?). I also tried to start a container directly with docker with 2000+ env vars, and it starts quickly.

@mattmoor
Contributor

mattmoor commented Nov 4, 2020

Each service is ~8 env vars, typically with long names. Knative currently creates 2 ClusterIP and 1 ExternalName per service, and things failed for me at around 1200 Knative services.

I don't think there's anything wrong with your repro except the numbers, I'd create 1200 ExternalName and 2400 ClusterIP services.

@lingsamuel
Contributor

lingsamuel commented Nov 5, 2020

Still can't repro; I have 1400 ExternalName and 2900 ClusterIP services now.

A potential reason for slow startup is the container entrypoint manipulating/scanning env vars, as bitnami/postgresql does. It does start more slowly (0/1 Running to 1/1 Running) than normal, but the time from ContainerCreating to 0/1 Running is normal.

Your timing data comes from:

kubectl get ksvc -ojsonpath='{range .items[*]}{.metadata.name},{.metadata.creationTimestamp},{.status.conditions[?(@.type=="Ready")].lastTransitionTime}{"\n"}{end}'

This also includes the time from 0/1 Running to 1/1 Running. Does your test image do something with env vars?

@mattmoor
Contributor

mattmoor commented Nov 5, 2020

No, this was a pretty trivial Go HTTP server on GKE.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 3, 2021
@dprotaso
Contributor Author

dprotaso commented Feb 3, 2021

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 3, 2021
@SergeyKanzhelev
Member

/remove-lifecycle frozen

since the issue needs more information, removing the freeze to make sure we get traction here.

@mattmoor @dprotaso reading the comments, I see that the attempt to repro this failed. Can you please provide more information on how to repro this and why you believe the culprit is env variables? Also, what runtime do you use to run containers?

/triage needs-information
/sig scalability

@k8s-ci-robot k8s-ci-robot added triage/needs-information Indicates an issue needs more information in order to work on it. sig/scalability Categorizes an issue or PR as relevant to SIG Scalability. and removed lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. labels Jun 25, 2021
@dprotaso
Contributor Author

/lifecycle frozen

Here's an example: creating about ~1050 of these services (on a single-node kind cluster) causes the deployment's container to fail to start.

apiVersion: v1
kind: Service
metadata:
  generateName: long-name-name-name-name-long-name-name-name-name-long-name-nam
spec:
  type: ClusterIP
  ports:
    - name: long-name-name-name-name-long-name-name-name-name-long-nam-http
      port: 80
      targetPort: 8080
    - name: long-name-name-name-name-long-name-name-name-name-long-nm-https
      port: 443
      targetPort: 8443
    - name: long-name-name-name-name-long-name-name-name-name-long-xmetricz
      port: 9090
      targetPort: 9090
    - name: long-name-name-name-name-long-name-name-name-name-long-xhealthz
      port: 9091
      targetPort: 9091
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: some-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
      - name: app
        image: gcr.io/knative-samples/helloworld-go
        ports:
        - containerPort: 80

containerd logs are

2021-06-25T01:25:51.427945289Z stderr F standard_init_linux.go:228: exec user process caused: argument list too long

If you see the originally referenced issue, you can mitigate this by disabling service links, but changing the default wasn't an option, hence I opened this performance-tracking issue. Essentially, the explosion of env vars causes massive runc configs to be written, and you eventually hit Linux limits when starting the process.
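The "argument list too long" error above is execve(2) returning E2BIG: the argv and environment handed to the new process exceed kernel limits (ARG_MAX overall; Linux additionally caps each individual string at MAX_ARG_STRLEN, about 128 KiB on 4 KiB-page systems). A quick local sketch of the same failure mode, using one oversized variable instead of thousands of small ones (Linux assumed):

```shell
# Sketch (Linux, 4 KiB pages assumed): one env string over MAX_ARG_STRLEN
# (~128 KiB) makes the next exec fail with E2BIG.
getconf ARG_MAX                      # overall limit for argv + environ
BIG=$(head -c 200000 /dev/zero | tr '\0' 'x')
export BIG                           # export is a shell builtin: no exec yet
/bin/true 2>/dev/null || echo "exec failed: Argument list too long (E2BIG)"
unset BIG                            # clean up so later commands exec normally
```

The shell itself keeps running because export is a builtin; only the fork+exec of an external binary trips the limit, which matches runc failing at exec time rather than kubelet failing earlier.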

@k8s-ci-robot k8s-ci-robot added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Jun 25, 2021
@kloudwrangler

kloudwrangler commented Nov 5, 2021

I had this issue recently on a customer cluster. There were too many services in a single namespace, which caused a bunch of env vars to appear on every pod. The problem might not be immediately apparent, but try running a timed workload (e.g. a bash script that performs sqrt()) in that namespace and the same workload in a different namespace. For me, the difference was astronomical. The reason might be that any child process created needs to inherit all these env vars, so any type of workload that creates many child processes will suffer. My fix was to add spec.template.spec.enableServiceLinks: false to the deployment, since the pods did not depend on these environment variables to function properly.
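That fix looks like this in a Deployment manifest (a sketch reusing the image from the repro above; enableServiceLinks is a standard Pod-spec field that defaults to true):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: some-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      enableServiceLinks: false   # opt these pods out of service-link env vars
      containers:
      - name: app
        image: gcr.io/knative-samples/helloworld-go
```

Pods still resolve services via cluster DNS; only the injected environment variables are suppressed.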

@mattmoor
Contributor

mattmoor commented Nov 5, 2021

Yeah, we opted to change the default in Knative over this. I doubt Kubernetes will make such a breaking change.

@dprotaso
Contributor Author

dprotaso commented Nov 5, 2021

They won't introduce the breaking change - that's why this issue is open.

@SergeyKanzhelev this issue is tagged needs-information but you should have everything you need. Do you have any more questions?

@schweikert

See also my comment: this issue particularly affects bash, which has a routine with O(n^2) complexity in the number of environment variables. It's easy to reproduce: create 1000+ services and do a 'kubectl exec' in a container that uses bash as its entrypoint.

I wish spec.template.spec.enableServiceLinks could default to false in a Pod v2 definition. I feel like very few people use this functionality, yet it affects everyone's performance.

Additionally, it would be great to improve bash to scale better with the number of environment variables.

@dprotaso
Contributor Author

dprotaso commented Nov 7, 2021

See also my comment: this issue particularly affects bash

Eventually all workloads fall over - my example doesn't use bash

@ragnarkurmwunder

ragnarkurmwunder commented Sep 27, 2022

We had a lot of services in the same namespace.
The MariaDB startup script was printing one line of text every ~3 seconds.
apt-get install -y php crashed the container due to extreme slowness.
The simplest way to reproduce the problem in the container was: time bash -c 'echo | cat'.
This command took somewhere between 4 and 14 seconds.
We found that the test command had to be bash and contain a pipe | (forking a subshell?).
Single commands were OK. Other shells were OK too.

enableServiceLinks: false helped. When this is set to false, a DNS service like CoreDNS or KubeDNS needs to be installed if name resolution is needed.

@emirot

emirot commented Dec 5, 2022

It would be easier if this could be disabled at the namespace level.

@chris93111

Same issue with Knative; this causes latency in the running application. enableServiceLinks: false solved my problem.

@SergeyKanzhelev
Member

@dprotaso can you please check if enableServiceLinks: false helps.

@dprotaso
Contributor Author

@SergeyKanzhelev it does - we have that set by default in Knative.

I believe I left this issue open so that it could be addressed - maybe in a future v2 of the Pod API

@aojea
Member

aojea commented Jan 26, 2024

/close

duplicate of #121787, where the solution has been discussed and proposed

I'm almost certain this behavior will not be implemented for a hypothetical v2 Pod API 😄

@k8s-ci-robot
Contributor

@aojea: Closing this issue.

In response to this:

/close

duplicate of #121787, where the solution has been discussed and proposed

I'm almost certain this behavior will not be implemented for a hypothetical v2 Pod API 😄

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
