Many ConfigMaps and Pods slow down cluster, until it becomes unavailable (since 1.12) #74412
On v1.12, the kubelet logs will contain messages like this: […] and the kube-controller-manager logs: […]
/sig scalability
@kubernetes/sig-scalability-bugs
@qmfrederik - a couple of questions:
You can't run more than 110 pods on a node at once. So I'm assuming you create the jobs roughly at once and then wait as they proceed: we first schedule ~100 of their pods (there are ~10 system pods already running on that node), and then, as those finish, new pods get scheduled onto the node. Am I right? @kubernetes/sig-node-bugs @yujuhong - FYI
We have encountered this problem too (#74302). It is related to the maximum number of concurrent HTTP/2 streams and the change to the kubelet's ConfigMap manager. By default, the HTTP/2 server in kube-apiserver allows 250 concurrent streams per connection, and from version 1.13.x on, every ConfigMap consumes at least one stream for its watch in the kubelet. Once the limit is hit, the kubelet gets stuck communicating with kube-apiserver and the node becomes NotReady.
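For context, here is a minimal standalone sketch — not kube-apiserver source; the port, cert paths, and handler are made up — of how a Go HTTP/2 server's per-connection stream cap is configured with golang.org/x/net/http2. kube-apiserver exposes the equivalent knob as its --http2-max-streams-per-connection flag, and every long-lived watch occupies one stream:

```go
// Minimal sketch (not kube-apiserver code): configuring an HTTP/2
// server's per-connection stream limit. Every long-lived watch request
// holds one stream for its lifetime, so 250 concurrent watches
// saturate a single connection.
package main

import (
	"fmt"
	"log"
	"net/http"

	"golang.org/x/net/http2"
)

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/watch", func(w http.ResponseWriter, r *http.Request) {
		// A watch-like handler: block until the client goes away,
		// holding one HTTP/2 stream the whole time.
		<-r.Context().Done()
	})

	srv := &http.Server{Addr: ":8443", Handler: mux} // placeholder port

	// 250 is the default discussed above; kube-apiserver's
	// --http2-max-streams-per-connection flag feeds the equivalent setting.
	if err := http2.ConfigureServer(srv, &http2.Server{MaxConcurrentStreams: 250}); err != nil {
		log.Fatal(err)
	}

	fmt.Println("serving with a 250-stream-per-connection cap")
	// HTTP/2 requires TLS here; cert paths are placeholders.
	log.Fatal(srv.ListenAndServeTLS("server.crt", "server.key"))
}
```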
@wojtek-t To answer your questions:
What @YueHonghui says seems consistent with what I'm experiencing. So it appears that the kubelet is still watching ConfigMaps of pods which have completed, and ultimately you hit the maximum concurrent HTTP/2 stream limit. Would it make sense for the kubelet to stop watching a ConfigMap once the pod which consumes it has completed?
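To make the mechanics concrete, here is a hedged client-go sketch (it assumes a reachable cluster; the kubeconfig path and ConfigMap name are placeholders): each Watch call holds one HTTP/2 stream on the apiserver connection until Stop() is called, which is why watches that are never stopped accumulate.

```go
// Sketch: one client-go watch = one HTTP/2 stream, held until Stop().
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig") // placeholder
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Watching a single ConfigMap occupies one stream for as long as
	// the watch lives ("job-config" is a hypothetical name).
	w, err := client.CoreV1().ConfigMaps("default").Watch(context.Background(), metav1.ListOptions{
		FieldSelector: "metadata.name=job-config",
	})
	if err != nil {
		panic(err)
	}

	// Consume events while the pod using the ConfigMap is running...
	for ev := range w.ResultChan() {
		fmt.Println("event:", ev.Type)
		break
	}

	// ...and release the stream once the pod has completed. The bug
	// discussed here is, in effect, this call never happening.
	w.Stop()
}
```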
@YueHonghui - hmm.. I thought that once we hit the per-connection limit (250), we would automatically open a new connection...
It should stop watching: […]
Are there any logs, metrics, ... I can capture to see whether the kubelet actually stops watching? Like a metric for concurrent HTTP/2 connections or something similar?
Unfortunately it's not easy...
@wojtek-t Thanks, I'll give that a try (probably tomorrow) and let you know.
/cc @deads2k @lavalamp @MikeSpreitzer this issue might be suggesting that policing/prioritizing requests merely by user/groups is not sufficient for cluster robustness. We should also classify these requests with finer granularity within components like the kubelet, according to verbs/resources.
@yue9944882 This issue is due to the golang bug that limits HTTP/2 connections for no reason, no?
yes, to clarify, i mean we can probably limit the WATCH connections (under 250) for the kubelet at the server side, to "make room" for the PATCH calls from the client side. will this help the case?
@yue9944882 @wojtek-t @lavalamp I have posted a goroutine stacktrace of the kubelet to #74302 (comment). In the case we encountered, the kubelet doesn't use a new connection to communicate with kube-apiserver when it hits the limit of max concurrent streams. This seems to be due to a golang bug in the HTTP/2 connection pool implementation.
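To illustrate the described behavior, here is a self-contained sketch (my construction, not kubelet code): an h2c server capped at 3 concurrent streams, and a Go HTTP/2 client whose fourth request stalls instead of opening a second connection. Newer golang.org/x/net versions open extra connections by default, so StrictMaxConcurrentStreams is set to emulate the older behavior discussed here.

```go
// Demo: with all streams on the single connection busy, the transport
// queues the next request instead of dialing a second connection.
package main

import (
	"context"
	"crypto/tls"
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
	"time"

	"golang.org/x/net/http2"
	"golang.org/x/net/http2/h2c"
)

func main() {
	// Watch-like handler: send headers, then hold the stream open.
	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
		w.(http.Flusher).Flush()
		<-r.Context().Done()
	})

	// Plaintext HTTP/2 (h2c) server capped at 3 concurrent streams,
	// standing in for the apiserver's 250-stream default.
	srv := httptest.NewServer(h2c.NewHandler(handler, &http2.Server{MaxConcurrentStreams: 3}))
	defer srv.Close()

	client := &http.Client{
		Transport: &http2.Transport{
			AllowHTTP: true, // speak h2c for the demo
			DialTLS: func(network, addr string, _ *tls.Config) (net.Conn, error) {
				return net.Dial(network, addr)
			},
			// Emulate the older transport behavior that bit the kubelet:
			// queue on the busy connection instead of dialing a new one.
			StrictMaxConcurrentStreams: true,
		},
	}

	// Occupy all three streams with long-lived "watches" (response
	// bodies are deliberately never closed, keeping the streams busy).
	for i := 0; i < 3; i++ {
		go client.Get(srv.URL)
	}
	time.Sleep(500 * time.Millisecond)

	// A fourth request finds no free stream and times out -- just like
	// the kubelet's node-status updates once watches ate every stream.
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	req, _ := http.NewRequestWithContext(ctx, http.MethodGet, srv.URL, nil)
	_, err := client.Do(req)
	fmt.Println("4th request:", err) // expect: context deadline exceeded
}
```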
@yue9944882 : what I am hearing here is that the problem is not the apiservers being overloaded but rather a connection management problem in the kubelet. So this is not begging to be solved by traffic policing in the apiservers. |
yes
@wojtek-t I took the time to reproduce this (on v1.13.3) with logging enabled (v=3) on the API server. Even after all jobs have completed (i.e. the only running pods are the kube-system pods), roughly every 10 seconds ~230 new log entries (which seems awfully close to 250) for ConfigMap watches appear. Here are a couple of cycles, simplified:
Here are the full logs: kube-api-server-logs.txt.gz. So it appears that, somewhere, at least one process did not stop watching. Happy to run further tests / provide additional information if it helps you.
Heh... - I think I know where the problem is.
And this one is triggered only by pod deletion. The problem is that pods that are owned by Jobs are not deleted when they complete (they are only eventually garbage-collected). So it seems there are two problems here:
Also:
@yue9944882 - this won't help in general, because it may be valid to have more than 250 connections (if there are more than that many different secrets/configmaps).
@wojtek-t
Yeah - but that doesn't solve the problem of previous releases... |
One thought, besides the actual fixes themselves: we should add scale tests for the per-node secrets/configmaps limit, so we can catch such issues in the future. At least a kubelet integration test.
Definitely. Where would the best place to do that be? |
I would think https://github.com/kubernetes/kubernetes/tree/master/test/e2e_node? Actually, our density test in its current form allows setting secretsPerPod - https://github.com/kubernetes/kubernetes/blob/master/test/e2e/scalability/density.go#L550. If we change that to a higher value, we would test this case. cc @wojtek-t @krzysied - Can we make such a change in cluster-loader?
If those are using replicasets then the secrets will not be unique per pod, right? Unique pods, with unique secrets, would more reliably exercise this.
You're right. It's probably better to do that (though I think the chances of overlapping secrets on a node will be quite low in the case of our large-cluster tests with many RCs). Also, thinking a bit more, it seems like we'll need to test two things:
The first one seems more important than the second.
it looks like since we vendor …
The watch-based strategy has a couple of bugs: 1) golang's HTTP/2 client blocks when the per-connection stream limit is reached, and 2) the kubelet does not clean up watches for terminated pods. This patch configures the cache-based strategy. Once golang 1.12 is in use and the kubelet patch is merged, we can go back to the watch-based strategy. ref: kubernetes/kubernetes#74412 ref: kubernetes/kubernetes#74412 (comment)
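For reference, here is a hedged sketch of the workaround above: rendering a KubeletConfiguration that selects the cache-based strategy. The types come from the published k8s.io/kubelet/config/v1beta1 package (the exact constant name is my best recollection, so treat it as an assumption).

```go
// Sketch: emit a kubelet config fragment selecting the cache-based
// ConfigMap/Secret change detection strategy.
package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	kubeletv1beta1 "k8s.io/kubelet/config/v1beta1"
	"sigs.k8s.io/yaml"
)

func main() {
	cfg := kubeletv1beta1.KubeletConfiguration{
		TypeMeta: metav1.TypeMeta{
			APIVersion: "kubelet.config.k8s.io/v1beta1",
			Kind:       "KubeletConfiguration",
		},
		// "Cache" avoids holding one long-lived watch (one HTTP/2
		// stream) per ConfigMap/Secret; "Watch" is the strategy that
		// triggers this issue.
		ConfigMapAndSecretChangeDetectionStrategy: kubeletv1beta1.TTLCacheChangeDetectionStrategy,
	}

	out, err := yaml.Marshal(cfg)
	if err != nil {
		panic(err)
	}
	// The output can serve as the kubelet's --config file.
	fmt.Print(string(out))
}
```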
@liggitt @qmfrederik Hello 👋 I'd like to remind you that code freeze starts today, PST EOD! ❄️ 🏔️ As far as I can see, #74781 has been punted to 1.15, and the somewhat related PR #71501 doesn't have a milestone. There is still #74809, but that doesn't seem to fix this issue. Is this issue still relevant to 1.14, or should it be moved to another milestone?
@liggitt: Closing this issue.
yes, it is in progress
Sorry - it was unfortunate that I was OOO exactly when this was discovered. Reverting (given that it was a 3-line change) seemed like the most reasonable option. Regarding testing this, I actually don't think we need a large cluster.
The application-apply of the stx-openstack application on simplex configurations has been failing since the barbican chart was added to the application. The failure was due to lost node status messages from the kubelet to the kube-apiserver, which causes the node to be marked NotReady and endpoints to be removed. The root cause is the kubernetes bug here: kubernetes/kubernetes#74412

In short, the addition of the barbican chart added enough new secrets/configmaps that the kubelet hit the limit of http2-max-streams-per-connection. As done upstream, the fix is to change the following kubelet config: configMapAndSecretChangeDetectionStrategy (from Watch to Cache).

Change-Id: Ic816a91984c4fb82546e4f43b5c83061222c7d05
Closes-bug: 1820928
Signed-off-by: Bart Wensley <barton.wensley@windriver.com>
Upgrading from kubernetes 1.13.5 to 1.15.0 meant the config needed to be updated to handle whatever was deprecated or dropped in 1.14 and 1.15.

1) Removed "ConfigMapAndSecretChangeDetectionStrategy = Watch", reported by kubernetes/kubernetes#74412, because this was a golang deficiency and is fixed by the newer version of golang.
2) Enforced the kubernetes 1.15.3 version.
3) Updated v1alpha3 to v1beta2, since alpha3 was dropped in 1.14. Changed fields for beta1 and beta2 are mentioned in these docs:
https://godoc.org/k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/v1beta1
https://godoc.org/k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/v1beta2
4) cgroup validation checking now includes the pids subfolder.
5) Updated ceph-config-helper to be v1.15 kubernetes compatible, which means the stx-openstack version check needed to be increased.

Change-Id: Ibe3d5960c5dee1d217d01fbb56c785581dd1b42c
Story: 2005860
Task: 35841
Depends-On: https://review.opendev.org/#/c/671150
Signed-off-by: Al Bailey <Al.Bailey@windriver.com>
What happened:
I schedule multiple jobs in my cluster. Each job uses a different ConfigMap which contains the configuration for that job.
This worked well on version 1.11 of Kubernetes. After upgrading to 1.12 or 1.13, I noticed that doing this causes the cluster to slow down significantly, up to the point where nodes are marked NotReady and no new work is scheduled.
For example, consider a scenario in which I schedule, on a single-node cluster, 400 jobs, each with its own ConfigMap, that print "Hello World".
On v1.11, it takes about 10 minutes for the cluster to process all jobs. New jobs can be scheduled.
On v1.12 and v1.13, it takes about 60 minutes for the cluster to process all jobs. After this, no new jobs can be scheduled.
What you expected to happen:
I did not expect this scenario to make my nodes unavailable in Kubernetes 1.12 and 1.13; I would have expected the behavior I observed in 1.11.
How to reproduce it (as minimally and precisely as possible):
The easiest way seems to be to schedule, on a single-node cluster, about 300 jobs, as in the sketch below:
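The exact manifest is not preserved in this capture, so the following is a hedged reconstruction in Go using client-go (the image, names, namespace, and kubeconfig path are assumptions): N jobs, each consuming its own ConfigMap and printing "Hello World".

```go
// Hedged reproducer sketch: 300 jobs, each with its own ConfigMap.
package main

import (
	"context"
	"fmt"

	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	ctx := context.Background()

	for i := 0; i < 300; i++ {
		name := fmt.Sprintf("hello-%d", i)

		// One ConfigMap per job -- under the watch-based strategy each
		// one costs the kubelet a watch (one HTTP/2 stream).
		_, err = client.CoreV1().ConfigMaps("default").Create(ctx, &corev1.ConfigMap{
			ObjectMeta: metav1.ObjectMeta{Name: name},
			Data:       map[string]string{"greeting": "Hello World"},
		}, metav1.CreateOptions{})
		if err != nil {
			panic(err)
		}

		// The job's pod reads its greeting from its own ConfigMap.
		_, err = client.BatchV1().Jobs("default").Create(ctx, &batchv1.Job{
			ObjectMeta: metav1.ObjectMeta{Name: name},
			Spec: batchv1.JobSpec{
				Template: corev1.PodTemplateSpec{
					Spec: corev1.PodSpec{
						RestartPolicy: corev1.RestartPolicyNever,
						Containers: []corev1.Container{{
							Name:    "hello",
							Image:   "busybox",
							Command: []string{"sh", "-c", `echo "$GREETING"`},
							Env: []corev1.EnvVar{{
								Name: "GREETING",
								ValueFrom: &corev1.EnvVarSource{
									ConfigMapKeyRef: &corev1.ConfigMapKeySelector{
										LocalObjectReference: corev1.LocalObjectReference{Name: name},
										Key:                  "greeting",
									},
								},
							}},
						}},
					},
				},
			},
		}, metav1.CreateOptions{})
		if err != nil {
			panic(err)
		}
	}
	fmt.Println("created 300 jobs, each consuming its own ConfigMap")
}
```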
I can consistently reproduce this issue in a VM-based environment, which I configure using Vagrant. You can find the full setup here: https://github.com/qmfrederik/k8s-job-repro
Anything else we need to know?:
Happy to provide further information as needed
Environment:
Kubernetes version (kubectl version): v1.12 through v1.13
OS (cat /etc/os-release): 18.04.1 LTS (Bionic Beaver)
Kernel (uname -a): Linux vagrant 4.15.0-29-generic #31-Ubuntu SMP Tue Jul 17 15:39:52 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux