
Large kubemark performance tests failing with timeout during ns deletion #53327

Closed
shyamjvs opened this issue Oct 2, 2017 · 81 comments
Assignees: shyamjvs
Labels: kind/bug · kind/failing-test · priority/critical-urgent · sig/node · sig/scalability · sig/scheduling

Comments

@shyamjvs
Member

shyamjvs commented Oct 2, 2017

We've been seeing these failures continuously in kubemark-5000 for quite some time now - https://k8s-testgrid.appspot.com/sig-scalability#kubemark-5000
Even in kubemark-500 we're occasionally seeing flakes - https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-kubemark-500-gce/8806

On first look, I'm seeing pods on hollow-nodes stuck in Pending during test namespace deletion:

I1002 02:45:51.134] Oct  2 02:45:51.134: INFO: POD                     NODE               PHASE    GRACE  CONDITIONS
I1002 02:45:51.135] Oct  2 02:45:51.134: INFO: load-small-14408-4r45s  hollow-node-lmblz  Pending  1s     [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2017-10-01 20:26:46 +0000 UTC  } {Ready False 0001-01-01 00:00:00 +0000 UTC 2017-10-01 20:26:46 +0000 UTC ContainersNotReady containers with unready status: [load-small-14408]} {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2017-10-01 20:26:46 +0000 UTC  }]
I1002 02:45:51.135] Oct  2 02:45:51.134: INFO: load-small-14408-q8tfd  hollow-node-lm9c4  Pending  1s     [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2017-10-01 20:26:46 +0000 UTC  } {Ready False 0001-01-01 00:00:00 +0000 UTC 2017-10-01 20:26:46 +0000 UTC ContainersNotReady containers with unready status: [load-small-14408]} {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2017-10-01 20:26:46 +0000 UTC  }]

Digging into it now. Sorry for the delay, I've been busy with release scalability validation.

cc @kubernetes/sig-scalability-bugs @wojtek-t @gmarek

@shyamjvs shyamjvs self-assigned this Oct 2, 2017
@k8s-ci-robot k8s-ci-robot added sig/scalability Categorizes an issue or PR as relevant to SIG Scalability. kind/bug Categorizes issue or PR as related to a bug. labels Oct 2, 2017
@shyamjvs shyamjvs changed the title Large kubemark performance tests failing with timeout during deletion Large kubemark performance tests failing with timeout during ns deletion Oct 2, 2017
@shyamjvs shyamjvs added kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. priority/P0 labels Oct 2, 2017
@shyamjvs
Member Author

shyamjvs commented Oct 2, 2017

I'm seeing multiple such error lines in the hollow-kubelet logs of the currently live cluster:

E1002 13:41:13.582577       7 kubelet.go:1220] Image garbage collection failed multiple times in a row: invalid capacity 0 on image filesystem
E1002 13:41:13.998714       7 kubelet_network.go:305] Failed to ensure that nat chain KUBE-MARK-DROP exists: error creating chain "KUBE-MARK-DROP": executable file not found in $PATH: 
E1002 13:41:14.435637       7 eviction_manager.go:238] eviction manager: unexpected err: failed to get root cgroup stats: failed to get cgroup stats for "/": unexpected number of containers: 0
E1002 13:41:24.435805       7 eviction_manager.go:238] eviction manager: unexpected err: failed to get root cgroup stats: failed to get cgroup stats for "/": unexpected number of containers: 0

@kubernetes/sig-node-bugs @yujuhong @Random-Liu - Any ideas?

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. kind/bug Categorizes issue or PR as related to a bug. labels Oct 2, 2017
@shyamjvs
Member Author

shyamjvs commented Oct 2, 2017

Seems like the kubelet's ImageGCPeriod is 5m, and I'm seeing such logs every 5 minutes - so it's pretty much failing every time.

FYI - I'm also seeing this error logged once at the start:

E1002 04:31:13.470857       7 eviction_manager.go:238] eviction manager: unexpected err: failed to get node info: node 'hollow-node-5n2f6' not found

[Edit]: Ignore this error. It's only seen initially, when node info is not available yet. Later it goes fine and reaches the next stage, where root cgroup stats are fetched, but that too fails (as I think the hollow-kubelet doesn't expose stats). These errors are unrelated to our problem, IIUC.

@shyamjvs
Member Author

shyamjvs commented Oct 3, 2017

OK, so here's what I found out from the live cluster:

  • A few namespaces are stuck in the Terminating state forever during the test's deletion phase, and the job eventually times out.
  • Those namespaces are in that state because each has a density-test pod that is stuck Terminating forever:
e2e-tests-density-30-10-r6h65   density-latency-pod-1309-sjgj8   0/1       Terminating   0          10h
e2e-tests-density-30-35-fmg9s   density-latency-pod-3284-zl2gp   0/1       Terminating   0          9h
e2e-tests-density-30-36-c525x   density-latency-pod-2735-rrdnk   0/1       Terminating   0          9h
e2e-tests-density-30-8-kd2sc    density-latency-pod-1307-4fj7d   0/1       Terminating   0          10h
...
  • Looking at one such pod, I found that its deletion timestamp had been set and its termination grace period was 30s (but the pod was never deleted - even hours after that timestamp):
Name:				density-latency-pod-1309-sjgj8
Namespace:			e2e-tests-density-30-10-r6h65
Node:				hollow-node-zvx46/10.64.69.26
Start Time:			Mon, 02 Oct 2017 06:25:33 +0000
Labels:				name=density-latency-pod-1309
				type=density-latency-pod
Annotations:			kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"e2e-tests-density-30-10-r6h65","name":"density-latency-pod-1309","uid"...
Status:				Terminating (expires Mon, 02 Oct 2017 06:26:03 +0000)
Termination Grace Period:	30s
IP:				
Created By:			ReplicationController/density-latency-pod-1309
Controlled By:			ReplicationController/density-latency-pod-1309
  • It seems like the GC didn't ensure the pod's deletion. From the controller-manager logs, here's what I found:
I1002 05:34:50.054625       9 replication_controller.go:450] Too few "e2e-tests-density-30-10-r6h65"/"density-latency-pod-1309" replicas, need 1, creating 1
I1002 05:34:50.056816       9 event.go:218] Event(v1.ObjectReference{Kind:"ReplicationController", Namespace:"e2e-tests-density-30-10-r6h65", Name:"density-latency-pod-1309", UID:"68bcbf97-a733-11e7-a55f-42010a8e0fd3", APIVersion:"v1", ResourceVersion:"2846356", FieldPath:""}): type: 'Normal' reason: 'SuccessfulCreate' Created pod: density-latency-pod-1309-26pgw
I1002 06:25:29.924919       9 garbagecollector.go:375] processing item [v1/Pod, namespace: e2e-tests-density-30-10-r6h65, name: density-latency-pod-1309-26pgw, uid: 68be2c02-a733-11e7-a55f-42010a8e0fd3]
I1002 06:25:29.924946       9 replication_controller.go:597] Replication Controller has been deleted e2e-tests-density-30-10-r6h65/density-latency-pod-1309
I1002 06:25:29.926630       9 garbagecollector.go:479] delete object [v1/Pod, namespace: e2e-tests-density-30-10-r6h65, name: density-latency-pod-1309-26pgw, uid: 68be2c02-a733-11e7-a55f-42010a8e0fd3] with propagation policy Background
I1002 06:25:33.260880       9 replication_controller.go:450] Too few "e2e-tests-density-30-10-r6h65"/"density-latency-pod-1309" replicas, need 1, creating 1
I1002 06:25:33.267788       9 event.go:218] Event(v1.ObjectReference{Kind:"ReplicationController", Namespace:"e2e-tests-density-30-10-r6h65", Name:"density-latency-pod-1309", UID:"68bcbf97-a733-11e7-a55f-42010a8e0fd3", APIVersion:"v1", ResourceVersion:"2847363", FieldPath:""}): type: 'Normal' reason: 'SuccessfulCreate' Created pod: density-latency-pod-1309-sjgj8
E1002 06:25:33.270014       9 replication_controller.go:421] replicationcontrollers "density-latency-pod-1309" not found
E1002 06:25:33.505893       9 replication_controller.go:421] replicationcontrollers "density-latency-pod-1309" not found
I1002 06:25:33.753529       9 garbagecollector.go:375] processing item [v1/Pod, namespace: e2e-tests-density-30-10-r6h65, name: density-latency-pod-1309-sjgj8, uid: 7ea33bf3-a73a-11e7-a55f-42010a8e0fd3]
I1002 06:25:33.756248       9 garbagecollector.go:479] delete object [v1/Pod, namespace: e2e-tests-density-30-10-r6h65, name: density-latency-pod-1309-sjgj8, uid: 7ea33bf3-a73a-11e7-a55f-42010a8e0fd3] with propagation policy Background
I1002 06:25:36.664103       9 replication_controller.go:597] Replication Controller has been deleted e2e-tests-density-30-10-r6h65/density-latency-pod-1309
  • Strangely, the replication controller created this pod even after the corresponding RC was deleted (note that the earlier replica density-latency-pod-1309-26pgw was also deleted). The new pod was then almost immediately marked for deletion by the GC.
  • I manually tried deleting the pod multiple times; each time the command reported the pod as successfully deleted, yet the pod remained in the same 'Terminating' state.

Smells like a bug in the GC. Also, it doesn't seem right that a pod can end up in such a 'forever undeletable' state.

@gmarek Any idea about this? Or whom should I be cc'ing here?
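For reference, a rough client-go sketch (modern client-go signatures; the 10-minute threshold and default kubeconfig path are arbitrary choices, not part of the investigation above) that lists pods in the state described here: marked for deletion but still returned by the apiserver long after their grace period.

```go
package main

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the default kubeconfig; an in-cluster config would work too.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	pods, err := client.CoreV1().Pods(metav1.NamespaceAll).List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, p := range pods.Items {
		if p.DeletionTimestamp == nil {
			continue // not marked for deletion
		}
		// A pod whose deletion timestamp is long past, yet which the apiserver still
		// returns, is the "forever Terminating" case described in the comment above.
		if time.Since(p.DeletionTimestamp.Time) > 10*time.Minute {
			fmt.Printf("%s/%s terminating since %s\n", p.Namespace, p.Name, p.DeletionTimestamp)
		}
	}
}
```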

@gmarek
Contributor

gmarek commented Oct 3, 2017

I think it's the desired behavior if the kubelet is unresponsive. We stopped issuing 'hard deletes' from the control plane because of StatefulSets (cc @smarterclayton). Now the control plane only marks the Pod for deletion and waits for the kubelet to clean up resources and ack the delete by sending the second delete request.

@shyamjvs
Member Author

shyamjvs commented Oct 3, 2017

FYI - the corresponding hollow-kubelet was alive when I checked.

@gmarek
Contributor

gmarek commented Oct 3, 2017

Then we need logs and involve @dchen1107, @yujuhong or someone else from the Node team.

@shyamjvs
Member Author

shyamjvs commented Oct 3, 2017

Let me turn on hollow-node logs for 5k with --v=2 for kubelet (would that be sufficient @yujuhong?)

@gmarek
Contributor

gmarek commented Oct 3, 2017

My guess is that it's hard to tell up front ;) It certainly won't hurt to have more logs.

@wojtek-t
Member

wojtek-t commented Oct 3, 2017

Are we sure that the nodes are not CPU-starved? I think we've seen something like that in the past in such a scenario.

@shyamjvs
Member Author

shyamjvs commented Oct 3, 2017

I'm seeing the following logs throughout the test from the real kubelet:

I1003 05:00:17.859851    1684 eviction_manager.go:325] eviction manager: no resources are starved

The hollow-kubelet resource usages are not directly available from the logs, but I took a look at the live cluster just now and they seem normal so far (I'll check again during deletion).

@shyamjvs
Member Author

I dug deeper into this today and here's what I found:

  • Earlier, when kubemark-scale had a string of green runs in a row, the runs consistently took less time (~15h); then the time increased (~18h), which is when we also started seeing many failures
  • This increase seems to be due to:
    • If the load test runs first, namespace deletion leads to a timeout (due to pods remaining undeleted)
    • If the density test runs first, the deletion phase takes ~3h longer than before (8m -> 2h for deleting latency pods, and 1h -> 1h40m for deleting saturation pods)
  • This jump in time is seen across runs 614 and 615 (ref: https://k8s-gubernator.appspot.com/builds/kubernetes-jenkins/logs/ci-kubernetes-kubemark-gce-scale?before=624)
  • Here's the diff of commits across those runs

@shyamjvs
Member Author

Will look at the other smaller-scale kubemark jobs too, to see if a similar issue shows up there.

@shyamjvs
Member Author

OK, it seems like there's also a bump in time for the kubemark-500 job (from 47m -> 1h) across runs 8256 and 8257 (https://k8s-gubernator.appspot.com/builds/kubernetes-jenkins/logs/ci-kubernetes-kubemark-500-gce?before=8275). And this was around the same time we saw the bump for kubemark-5000. This is great - we now have a much smaller diff.

@shyamjvs
Member Author

We're also seeing a small difference for kubemark-100 (from 40m -> 45m) across runs 4369 and 4370 (https://k8s-gubernator.appspot.com/builds/kubernetes-jenkins/logs/ci-kubernetes-kubemark-100-gce?before=4385), and the commit diff is the same. From the diff:

@wojtek-t
Member

@shyamjvs - since this is visible in kubemark-500, can you please locally revert the PR you suspect and verify whether that's the cause?

@shyamjvs
Member Author

Yup.. Doing that now.

@shyamjvs
Member Author

Some updates:

  • The increase in deletion-phase time no longer seems to be present in the newer runs of kubemark-scale. Looking at kubemark-500, it seems to have recovered across runs 8844 and 8845 (https://k8s-gubernator.appspot.com/builds/kubernetes-jenkins/logs/ci-kubernetes-kubemark-500-gce?before=8863), and the diff suggests @dashpole 's kubelet-GC fix helped.
  • Interestingly, his PR also fixed this issue with kubemark (besides the thundering-herd issue, which wasn't seen on kubemark). @dashpole - any idea why that happened? Btw, I see that you defined a fake podDeletionProvider used by the fake kubelet runtime manager, but we don't seem to be using it in hollow-kubelet.
  • We're now seeing a second regression (which started slightly before the above was fixed). The time taken by the cluster-saturation phase in the density test went up from 50m -> 7h. It's also observed in kubemark-500 at a smaller scale (2m40s -> 5m20s). I'm now checking the diff locally.

@shyamjvs
Member Author

shyamjvs commented Oct 11, 2017

Unable to reproduce the difference locally. I also can't see any obvious test-infra changes across those runs.
Will try to reproduce the test-infra environment locally as closely as possible.

@wojtek-t
Member

As discussed offline, this is clearly related to the scheduler:

  • before, it was able to schedule 100 pods/s
  • after, it is only scheduling 50 pods/s
    [both consistently]

It seems to me that this is a qps-related issue.

BTW - looking into kubemark-scale, it seems there might be yet another regression in the scheduler.

@kubernetes/sig-scheduling-bugs

@k8s-ci-robot k8s-ci-robot added the sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. label Oct 11, 2017
@shyamjvs
Member Author

shyamjvs commented Nov 28, 2017

@dashpole From the above, I believe the problem is not that killContainer() failed with an error for that container. Rather, the container was not even seen as part of the pod (because its container ID field was empty), which probably explains why you couldn't see those logs: the for loop you referred to was iterating over an empty list (one which didn't include the container at all).

Does that sound reasonable to you?

@dashpole
Contributor

It's a bug in the fake docker client's filtering. Thanks to @Random-Liu for helping me debug this.

@shyamjvs
Member Author

shyamjvs commented Nov 28, 2017

OK... I think I've found the cause. Before deleting a pod, the kubelet checks whether all of its containers have been deleted (this check was introduced by @dashpole in his PR #50350). However, the way we check that all containers have been deleted is by checking that the pod's containerStatuses field is an empty list. This misses the case where that list is non-empty but all of its entries correspond to non-existent containers (i.e. ones with an empty containerID field, which don't correspond to any real container). We should still allow deletion of the pod in that case.

This is a bug IMO. I'll send out a fix for this. It should most likely also solve this regression.
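To make the condition concrete, here's a hypothetical sketch (not the actual kubelet code) of the difference between the length-only check and one that ignores statuses with no real container behind them:

```go
package sketch

import v1 "k8s.io/api/core/v1"

// naiveCleanedUp mirrors the length-only check described above: it only considers the
// pod's containers gone when containerStatuses is an empty list.
func naiveCleanedUp(status v1.PodStatus) bool {
	return len(status.ContainerStatuses) == 0
}

// cleanedUp also treats entries with an empty ContainerID as already cleaned up, since
// they don't point at any real container.
func cleanedUp(status v1.PodStatus) bool {
	for _, cs := range status.ContainerStatuses {
		if cs.ContainerID != "" {
			return false // a real container is still being tracked
		}
	}
	return true
}
```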

@shyamjvs
Member Author

Sorry - I just saw your comment above after posting mine. What did you find out?
Also, the thing I explained in my comment seems to be a bug. WDYT?

@dashpole
Contributor

where do you see that non-existent containers have an empty containerID?

@dashpole
Contributor

oh, I see

@dashpole
Contributor

So the key is that we are not looking at the status in the status manager; we are looking at the containerStatuses from the status provided by the runtime.
The reason the container status is completely empty is that it was set by the status manager here.
The status from the runtime is based on what ListContainers returns in the fake_client. That should still exist, even though the status in the status manager has been overridden.

@dashpole
Contributor

The other important thing to note here is that the PodResourcesAreReclaimed and HandlePodCleanups functions use different caches. The podCache (used by PodResourcesAreReclaimed) is based on the most recent PLEG events, whereas the runtimeCache (used by HandlePodCleanups) is populated by querying the runtime. Hence the filtering bug means we do not see the container in HandlePodCleanups, but we do see it in PodResourcesAreReclaimed.
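Purely to illustrate the two-cache situation (the types below are hypothetical stand-ins, not the real kubelet caches):

```go
package sketch

// runtimeView models the cache rebuilt by querying the runtime's ListContainers (the
// path HandlePodCleanups uses); the filtering bug dropped the created-but-never-started
// container from this view.
type runtimeView map[string][]string // pod UID -> container IDs

// plegView models the cache fed by PLEG events (the path PodResourcesAreReclaimed uses);
// the same container was still present here, so the pod never looked fully reclaimed.
type plegView map[string][]string // pod UID -> container IDs

// viewsDisagree reports whether the two caches track a different set of containers for a
// pod - the inconsistency that kept the pod stuck.
func viewsDisagree(r runtimeView, p plegView, podUID string) bool {
	return len(r[podUID]) != len(p[podUID])
}
```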

@shyamjvs
Member Author

shyamjvs commented Nov 28, 2017

I discussed this offline with @dashpole, and he updated his PR to fix the bug in the docker client mock. Turns out my guess in #53327 (comment) was right that there are issues on the mock side :) Also, I closed my PR as it's no longer needed.

@dashpole Let's get your change in ASAP, so we can have some results by tomorrow. Hoping for a green run.

k8s-github-robot pushed a commit that referenced this issue Nov 29, 2017
Automatic merge from submit-queue (batch tested with PRs 55893, 55906, 55026). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

[Test Fix] Mock docker network dependencies and fix filtering bug

This PR only affects the mocked docker runtime, and has no impact on the kubelet.

Issue #53327 

When kubernetes creates a pod using the docker shim, it creates a container which contains the pod's network namespace, and then creates containers which specify that namespace.
The current mocked docker does not mock this interaction, and thus allows a container to be created even when the container whose network it is joining does not exist.
This allows the mocked kubelet to end up in a state where the pod does not exist, but a container in the pod does, and this breaks pod deletion.

This fixes the above by only allowing containers to be started if the container whose network it is trying to join is running.

Additionally, this PR fixes a filtering bug where we were incorrectly comparing docker container statuses.

/assign @shyamjvs 
can you test this to see if it fixes the issue?
/assign @Random-Liu 
for approval after @shyamjvs confirms this works.
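A rough sketch of the start-time check that PR describes (hypothetical types, not the actual fake docker client code):

```go
package sketch

import (
	"fmt"
	"strings"
)

type fakeContainer struct {
	ID          string
	Running     bool
	NetworkMode string // "container:<id>" when joining another container's network namespace
}

type fakeDockerClient struct {
	containers map[string]*fakeContainer
}

// StartContainer refuses to start a container that joins the network namespace of a
// container which is not running, mirroring the real daemon's behaviour.
func (f *fakeDockerClient) StartContainer(id string) error {
	c, ok := f.containers[id]
	if !ok {
		return fmt.Errorf("no such container: %s", id)
	}
	if netID := strings.TrimPrefix(c.NetworkMode, "container:"); netID != c.NetworkMode {
		netContainer, ok := f.containers[netID]
		if !ok || !netContainer.Running {
			// The same error shows up in the hollow-kubelet logs later in this thread.
			return fmt.Errorf("cannot join network of a non running container: %s", netID)
		}
	}
	c.Running = true
	return nil
}
```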
@shyamjvs
Member Author

Closing this as we finally have the kubemark-5000 job green - https://k8s-testgrid.appspot.com/sig-scalability-kubemark#kubemark-5000 (run 737)
Thanks everyone for the effort!

/close

@shyamjvs
Member Author

Seems like this issue has appeared again in the following run - https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-kubemark-gce-scale/738
It was a timeout while deleting a namespace, and the build log suggests it's because of undeleted pods. Reopening the bug for further investigation.

cc @dashpole

@shyamjvs shyamjvs reopened this Nov 30, 2017
@shyamjvs
Member Author

shyamjvs commented Nov 30, 2017

My strong feeling is that your fix is working almost as desired, except that occasionally it fails to delete certain containers. I suspect your toDockerContainerStatus() function may not be handling all the cases properly (w.r.t. container states)?

@shyamjvs
Member Author

Sadly, we don't have the hollow-kubelet logs for kubemark-5000 as we're not collecting them due to their explosive volume.

@dashpole
Contributor

The function is based on toDockerContainerStatus and toRuntimeAPIContainerState. It should be correct. It is possible there is another bug somewhere.

@shyamjvs
Member Author

shyamjvs commented Dec 4, 2017

@dashpole I managed to get one of the hollow-kubelet logs from the currently running kubemark-5000 cluster where we're seeing a forever undeleted pod 'load-medium-1010-gh8nw'. Attaching it here -

kubelet-hollow-node-gnc8w.log

And the log lines concerning that particular pod are:

I1204 14:46:20.998627       9 kubelet.go:1836] SyncLoop (ADD, "api"): "load-medium-1010-gh8nw_e2e-tests-load-30-nodepods-11-ksdr9(e428a834-d901-11e7-9448-42010a8e0fd3)"
I1204 14:46:21.102577       9 reconciler.go:217] operationExecutor.VerifyControllerAttachedVolume started for volume "default-token-xchk5" (UniqueName: "kubernetes.io/secret/e428a834-d901-11e7-9448-42010a8e0fd3-default-token-xchk5") pod "load-medium-1010-gh8nw" (UID: "e428a834-d901-11e7-9448-42010a8e0fd3") 
I1204 14:46:21.202808       9 reconciler.go:262] operationExecutor.MountVolume started for volume "default-token-xchk5" (UniqueName: "kubernetes.io/secret/e428a834-d901-11e7-9448-42010a8e0fd3-default-token-xchk5") pod "load-medium-1010-gh8nw" (UID: "e428a834-d901-11e7-9448-42010a8e0fd3") 
I1204 14:46:21.208597       9 operation_generator.go:522] MountVolume.SetUp succeeded for volume "default-token-xchk5" (UniqueName: "kubernetes.io/secret/e428a834-d901-11e7-9448-42010a8e0fd3-default-token-xchk5") pod "load-medium-1010-gh8nw" (UID: "e428a834-d901-11e7-9448-42010a8e0fd3") 
I1204 14:46:21.290338       9 kubelet.go:1852] SyncLoop (DELETE, "api"): "load-medium-1010-gh8nw_e2e-tests-load-30-nodepods-11-ksdr9(e428a834-d901-11e7-9448-42010a8e0fd3)"
I1204 14:46:21.299284       9 kuberuntime_manager.go:385] No sandbox for pod "load-medium-1010-gh8nw_e2e-tests-load-30-nodepods-11-ksdr9(e428a834-d901-11e7-9448-42010a8e0fd3)" can be found. Need to start a new one
I1204 14:46:21.359059       9 kubelet_pods.go:1098] Killing unwanted pod "load-medium-1010-gh8nw"
I1204 14:46:21.795311       9 kubelet.go:1881] SyncLoop (PLEG): "load-medium-1010-gh8nw_e2e-tests-load-30-nodepods-11-ksdr9(e428a834-d901-11e7-9448-42010a8e0fd3)", event: &pleg.PodLifecycleEvent{ID:"e428a834-d901-11e7-9448-42010a8e0fd3", Type:"ContainerDied", Data:"d26c6dc1df8a493f"}
E1204 14:46:21.897191       9 pod_workers.go:186] Error syncing pod e428a834-d901-11e7-9448-42010a8e0fd3 ("load-medium-1010-gh8nw_e2e-tests-load-30-nodepods-11-ksdr9(e428a834-d901-11e7-9448-42010a8e0fd3)"), skipping: failed to "StartContainer" for "load-medium-1010" with RunContainerError: "failed to start container \"4d3a382426ddc7b9\": failed to start container \"4d3a382426ddc7b9\": Error response from daemon: cannot join network of a non running container: d26c6dc1df8a493f"
I1204 14:46:23.332238       9 kubelet_pods.go:1098] Killing unwanted pod "load-medium-1010-gh8nw"

Clearly we're no longer seeing those 'pod is terminated, but some containers have not been cleaned up' messages repeating forever (like we were earlier) - which seems to be thanks to your fix. However, the pod still remains undeleted. My suspicion is something around the network-related hack you added in the hollow-kubelet as part of the fix. Could you PTAL?

@dashpole
Contributor

dashpole commented Dec 4, 2017

Looking...

@dashpole
Contributor

dashpole commented Dec 4, 2017

This looks like the culprit log line:
remote_runtime.go:246] RemoveContainer "4d3a382426ddc7b9" from runtime service failed: rpc error: code = Unknown desc = failed to remove container "4d3a382426ddc7b9": container not stopped

It looks like even though we return an error when starting the container, we have already added that container to the "RunningContainers" list in the fake client during CreateContainer.
I spoke with @Random-Liu, and we think the best solution for now is to change ListContainers so that it filters out containers which have not been started when options.All is not set. I will post a PR.
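A hypothetical sketch of that ListContainers change (not the real fake docker client): when options.All is unset, containers that were created but never started are filtered out.

```go
package sketch

type fakeContainer struct {
	ID      string
	Started bool
}

type listOptions struct {
	All bool // when false, only containers that were actually started should be listed
}

// listContainers hides created-but-never-started containers unless All is requested, so
// they no longer show up as "running" and block pod cleanup.
func listContainers(containers []*fakeContainer, opts listOptions) []*fakeContainer {
	var out []*fakeContainer
	for _, c := range containers {
		if !opts.All && !c.Started {
			continue
		}
		out = append(out, c)
	}
	return out
}
```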

@dashpole
Contributor

dashpole commented Dec 4, 2017

Ref: #56821

k8s-github-robot pushed a commit that referenced this issue Dec 5, 2017
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

[Test Fix] Fake docker client can remove containers which have not been started

**What this PR does / why we need it**:
During kubemark-5000 scalability tests, which use the fake docker client, we encountered a bug where containers could not be deleted if their pod was deleted before the container was started.
This is because we only remove containers from the `ExitedContainers` list. Containers are only added to that list once they have been created, started, and then stopped. However, containers that have only been created, but never started, cannot be deleted. This PR fixes the issue by allowing containers with `State.Running=false` to be deleted.

**Which issue(s) this PR fixes**:
Ref #53327 

**Release note**:
```release-note
NONE
```
/sig node
/kind bug
/priority critical-urgent
/assign @Random-Liu @dchen1107 @shyamjvs
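A small sketch of the fix as described in the PR text above (hypothetical types, not the actual fake docker client): removal succeeds for any container that is not running, whether it exited or was never started at all.

```go
package sketch

import "fmt"

type containerState struct {
	Running bool
}

type fakeContainer struct {
	ID    string
	State containerState
}

type fakeDockerClient struct {
	containers map[string]*fakeContainer
}

// RemoveContainer deletes any container that is not currently running; previously only
// containers that had been started and then stopped could be removed.
func (f *fakeDockerClient) RemoveContainer(id string) error {
	c, ok := f.containers[id]
	if !ok {
		return fmt.Errorf("no such container: %s", id)
	}
	if c.State.Running {
		// Matches the "container not stopped" error quoted earlier in the thread.
		return fmt.Errorf("failed to remove container %q: container not stopped", id)
	}
	delete(f.containers, id)
	return nil
}
```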
@shyamjvs
Member Author

The job is now green - https://k8s-testgrid.appspot.com/sig-scalability-kubemark#kubemark-5000
We've had 3 successful runs (750, 752, 753). Run 751 failed, but that was due to a failure while creating the master (it might help once @porridge moves the job to us-east1-b, if he hasn't already).

Closing this issue. Thanks!

@porridge
Member

porridge commented Dec 11, 2017 via email

@porridge
Member

porridge commented Dec 11, 2017 via email

akhilerm pushed a commit to akhilerm/apimachinery that referenced this issue Sep 20, 2022
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Optimize random string generator to avoid multiple locks & use bit-masking

Ref kubernetes/kubernetes#53327

We recently started seeing a ~50% decrease in scheduling throughput (e.g. in the kubemark-500 scale job), and it turns out kubernetes/kubernetes#53135 introduced it.
The reason is [this call](https://github.com/jsafrane/kubernetes/blob/2caae38d323720a96be34606589f41488ba82b87/plugin/pkg/scheduler/algorithm/predicates/predicates.go#L272), which creates a random 32-character string.
From the code of the `rand` utility (which is used heavily throughout the system for randomizing object names), I noticed the following performance issues:
- to create an n-character string, we make n calls to `rand.Intn()`, each of which does a lock+unlock operation on the RNG, while a single lock+unlock operation is enough for all of them
- we choose one character (from an alphabet of 27 chars) per random integer, while we can select 10 characters from a single int63 (by masking and bit-shifting), since 1 character uses just 5 bits of randomness
- the character set is defined as a global slice (mutable), so the compiler needs to fetch the slice's length on each invocation of `len()` (and we make n of those), while we can just use a const string (immutable), which makes the length available as a cached constant (yes, Go does that!)

This PR is making the above fixes. I'll try to add some benchmarking to measure the difference (as @wojtek-t suggested).

/cc @kubernetes/sig-scalability-misc @kubernetes/sig-scheduling-bugs @kubernetes/sig-api-machinery-misc @wojtek-t @smarterclayton

Kubernetes-commit: f1d9962fec2ee2d52f41bcbc14b920a3ab66d9a9
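For context, a minimal standalone sketch of the bit-masking idea described in that commit message (using math/rand directly rather than the shared, locked RNG in k8s.io/apimachinery/pkg/util/rand; not the exact upstream code):

```go
package main

import (
	"fmt"
	"math/rand"
)

const (
	alphanums          = "bcdfghjklmnpqrstvwxz2456789" // const string: its length is a compile-time constant
	idxBits            = 5                             // 5 bits are enough to index 27 characters
	idxMask            = 1<<idxBits - 1                // extracts the low 5 bits
	maxAlphanumsPerInt = 63 / idxBits                  // characters obtainable from one Int63
)

// randString builds an n-character string with roughly one RNG call per dozen characters
// instead of one call (and one lock/unlock) per character.
func randString(n int) string {
	b := make([]byte, n)
	random, remaining := rand.Int63(), maxAlphanumsPerInt
	for i := 0; i < n; {
		if remaining == 0 {
			random, remaining = rand.Int63(), maxAlphanumsPerInt
		}
		// Values >= len(alphanums) are rejected to keep the distribution uniform.
		if idx := int(random & idxMask); idx < len(alphanums) {
			b[i] = alphanums[idx]
			i++
		}
		random >>= idxBits
		remaining--
	}
	return string(b)
}

func main() {
	fmt.Println(randString(32)) // e.g. a random 32-character suffix
}
```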