Don't try to create VolumeSpec immediately after underlying PVC is being deleted #86670

Closed
wants to merge 1 commit into from

Contributor

@tedyu tedyu commented Dec 27, 2019

What type of PR is this?
/kind bug

What this PR does / why we need it:
As @mm4tt mentioned in #86434, when a PVC is being deleted, desired_state_of_world_populator shouldn't try to create a VolumeSpec.

This PR is a performance bug fix.

The fix has been reworked based on @jingxu97's comment:
#86670 (comment)

The original proposal was to add a map, pvcsUnderDeletion, to desiredStateOfWorldPopulator.
The map goes from the namespaced claim name to the earliest time at which the next GET request to the API server may be made.
That proposal can handle generic API server error returns and confines the change to desired_state_of_world_populator.go.
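
The map described in the original proposal could look roughly like the sketch below. This is not the code from this PR: the type pvcGetLimiter, its methods, and the 30-second retry window are hypothetical names and values chosen for illustration; only the idea of mapping a namespaced claim name to the earliest time of the next GET comes from the description above.

```go
package main

import (
	"fmt"
	"time"
)

// pvcGetLimiter is a hypothetical stand-in for the pvcsUnderDeletion map:
// namespaced claim name -> earliest time the next GET may be issued.
type pvcGetLimiter struct {
	retryAfter time.Duration
	nextGet    map[string]time.Time
}

func newPVCGetLimiter(retryAfter time.Duration) *pvcGetLimiter {
	return &pvcGetLimiter{retryAfter: retryAfter, nextGet: make(map[string]time.Time)}
}

// claimKey builds the namespaced claim name used as the map key.
func claimKey(namespace, name string) string {
	return namespace + "/" + name
}

// allowGet reports whether a GET for the claim may be issued at time now.
func (l *pvcGetLimiter) allowGet(namespace, name string, now time.Time) bool {
	next, tracked := l.nextGet[claimKey(namespace, name)]
	return !tracked || !now.Before(next)
}

// markUnderDeletion records that the claim was observed being deleted at now
// and postpones the next GET by retryAfter.
func (l *pvcGetLimiter) markUnderDeletion(namespace, name string, now time.Time) {
	l.nextGet[claimKey(namespace, name)] = now.Add(l.retryAfter)
}

// forget removes the claim's entry, for example once the owning pod is gone.
func (l *pvcGetLimiter) forget(namespace, name string) {
	delete(l.nextGet, claimKey(namespace, name))
}

// demoLimiter runs a small scenario and returns the allowGet results in order.
func demoLimiter() []bool {
	l := newPVCGetLimiter(30 * time.Second) // assumed retry window
	start := time.Unix(0, 0)

	results := []bool{l.allowGet("ns", "claim", start)} // untracked claim
	l.markUnderDeletion("ns", "claim", start)
	results = append(results, l.allowGet("ns", "claim", start.Add(10*time.Second))) // inside window
	results = append(results, l.allowGet("ns", "claim", start.Add(30*time.Second))) // window elapsed
	l.forget("ns", "claim")
	results = append(results, l.allowGet("ns", "claim", start.Add(31*time.Second))) // entry removed
	return results
}

func main() {
	fmt.Println(demoLimiter())
}
```

Passing the current time in as a parameter, rather than calling time.Now() inside the methods, keeps the sketch deterministic and easy to unit test.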

Which issue(s) this PR fixes:
Fixes #86434

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/bug Categorizes issue or PR as related to a bug. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Dec 27, 2019
@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. approved Indicates a PR has been approved by an approver from all required OWNERS files. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Dec 27, 2019
@tedyu
Contributor Author

tedyu commented Dec 27, 2019

/test pull-kubernetes-e2e-gce

@tedyu
Contributor Author

tedyu commented Dec 27, 2019

/cc @msau42

Contributor

@mattjmcnaughton mattjmcnaughton left a comment

Thanks for tackling this bug :)

I left some stylistic comments below.

I also have two requests :)

  1. I'd love to see some form of test coverage for this change.
  2. Can you share more about how we can be confident that we're updating pvcBeingDeleted everywhere that we need to? These types of changes always make me nervous that either we're missing a place where we should be updating pvcBeingDeleted, or more likely, in six months someone will make a change and forget to update pvcBeingDeleted.

@tedyu
Contributor Author

tedyu commented Dec 28, 2019

I renamed the map to pvcsUnderDeletion (plural) to distinguish it from the constant.

For test coverage: after going through desired_state_of_world_populator_test.go, I haven't found a way to hit the following branch:

		if pvc.ObjectMeta.DeletionTimestamp != nil {
			return nil, errors.New(pvcBeingDeleted)
		}

I think the best proof is the (significantly) reduced QPS in the load test (ref #86434).

For request 2, findAndRemoveDeletedPods is the only func that handles removing a pod from a volume, and it covers cleaning up pvcsUnderDeletion.
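
As a rough illustration of how the DeletionTimestamp branch and the cleanup step could be exercised in isolation, here is a minimal sketch. It deliberately uses simplified stand-in types instead of the real Kubernetes API objects and test fakes: claim, checkClaim, errPVCBeingDeleted, and the underDeletion map are all hypothetical names, and the cleanup shown is only an assumption about what removing a pod's entries might look like.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// claim is a simplified stand-in for a PersistentVolumeClaim object.
type claim struct {
	Namespace, Name   string
	DeletionTimestamp *time.Time
}

// errPVCBeingDeleted plays the role of the error returned by the quoted check.
var errPVCBeingDeleted = errors.New("pvc is being deleted")

// checkClaim mirrors the quoted branch: a claim that carries a deletion
// timestamp is rejected, any other claim is accepted.
func checkClaim(c claim) error {
	if c.DeletionTimestamp != nil {
		return errPVCBeingDeleted
	}
	return nil
}

// demoDeletionCheck exercises both branches and a hypothetical cleanup step.
func demoDeletionCheck() (rejected, accepted bool, remaining int) {
	now := time.Unix(0, 0)
	deleting := claim{Namespace: "ns", Name: "data", DeletionTimestamp: &now}
	live := claim{Namespace: "ns", Name: "logs"}

	underDeletion := map[string]struct{}{} // hypothetical pvcsUnderDeletion
	for _, c := range []claim{deleting, live} {
		if errors.Is(checkClaim(c), errPVCBeingDeleted) {
			underDeletion[c.Namespace+"/"+c.Name] = struct{}{}
		}
	}
	rejected = errors.Is(checkClaim(deleting), errPVCBeingDeleted)
	accepted = checkClaim(live) == nil

	// Assumed cleanup: when the pod referencing the claim is removed,
	// its entry is dropped from the map.
	delete(underDeletion, "ns/data")
	return rejected, accepted, len(underDeletion)
}

func main() {
	fmt.Println(demoDeletionCheck())
}
```

A test in the real package would need a claim object with DeletionTimestamp set to be served by whatever fake client the existing tests use; that wiring is not shown here.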

@tedyu
Contributor Author

tedyu commented Dec 28, 2019

/priority important-soon

@k8s-ci-robot k8s-ci-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Dec 28, 2019
@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Jan 14, 2020
@tedyu
Contributor Author

tedyu commented Jan 14, 2020

@jingxu97
Can you look at desired_state_of_world_test.go and see if the changes are good?

Once you confirm, I will replicate the changes to other tests.

@yutedz yutedz force-pushed the pvc-being-del branch 2 times, most recently from 026c470 to 415ac40 Compare January 14, 2020 22:14
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 14, 2020
@tedyu
Contributor Author

tedyu commented Jan 14, 2020

/test pull-kubernetes-kubemark-e2e-gce-big

@tedyu
Contributor Author

tedyu commented Jan 15, 2020

I was looking at the failed storage test in https://prow.k8s.io/view/gcs/kubernetes-jenkins/pr-logs/pull/86670/pull-kubernetes-e2e-gce/1217223508564643840/

From kubelet log in https://gcsweb.k8s.io/gcs/kubernetes-jenkins/pr-logs/pull/86670/pull-kubernetes-e2e-gce/1217223508564643840/artifacts/e2e-856a555d1e-674b9-minion-group-02tq/

I0114 23:39:01.972740    1355 operation_generator.go:647] MountVolume.SetUp succeeded for volume "default-token-9s5qj" (UniqueName: "kubernetes.io/secret/b3ab2900-06bc-46c9-8915-2ddac072a92b-default-token-9s5qj") pod "security-context-f563ac6a-291f-401c-b301-d67f3b3d3e45" (UID: "b3ab2900-06bc-46c9-8915-2ddac072a92b")
...
I0114 23:41:02.054635    1355 kubelet.go:1930] SyncLoop (DELETE, "api"): "security-context-f563ac6a-291f-401c-b301-d67f3b3d3e45_persistent-local-volumes-test-6988(b3ab2900-06bc-46c9-8915-2ddac072a92b)"

I0114 23:41:02.125883    1355 status_manager.go:551] Status for pod "security-context-f563ac6a-291f-401c-b301-d67f3b3d3e45_persistent-local-volumes-test-6988(b3ab2900-06bc-46c9-8915-2ddac072a92b)" updated successfully: (2, {Phase:Pending Conditions:[{Type:Initialized Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-01-14 23:39:01 +0000 UTC Reason: Message:} {Type:Ready Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-01-14 23:39:01 +0000 UTC Reason:ContainersNotReady Message:containers with unready status: [write-pod]} {Type:ContainersReady Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-01-14 23:39:01 +0000 UTC Reason:ContainersNotReady Message:containers with unready status: [write-pod]} {Type:PodScheduled Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-01-14 23:39:01 +0000 UTC Reason: Message:}] Message: Reason: NominatedNodeName: HostIP:10.40.0.3 PodIP: PodIPs:[] StartTime:2020-01-14 23:39:01 +0000 UTC InitContainerStatuses:[] ContainerStatuses:[{Name:write-pod State:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:0,Signal:0,Reason:,Message:,StartedAt:0001-01-01 00:00:00 +0000 UTC,FinishedAt:0001-01-01 00:00:00 +0000 UTC,ContainerID:,}} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:false RestartCount:0 Image:docker.io/library/busybox:1.29 ImageID: ContainerID: Started:0xc001453400}] QOSClass:BestEffort EphemeralContainerStatuses:[]})
I0114 23:41:02.126093    1355 kubelet_pods.go:934] Pod "security-context-f563ac6a-291f-401c-b301-d67f3b3d3e45_persistent-local-volumes-test-6988(b3ab2900-06bc-46c9-8915-2ddac072a92b)" is terminated, but some volumes have not been cleaned up

@tedyu
Contributor Author

tedyu commented Jan 15, 2020

From kubelet.log in https://gcsweb.k8s.io/gcs/kubernetes-jenkins/pr-logs/pull/86670/pull-kubernetes-e2e-gce/1217254343355404288/artifacts/e2e-c322eb374e-674b9-minion-group-24bm/

I0115 01:56:34.902488    1428 factory.go:177] Factory "containerd" was unable to handle container "/kubepods/besteffort/pod1e617c1f-5568-4c33-aa85-2c9483ad37d1/919e7ebfd5db866149bd52b8f5d3d13bcb50a83afd6f5ad4c556a15e3828998d"
I0115 01:56:34.899460    1428 kuberuntime_container.go:821] Removing container "39d1ac2d44c7e4c7db2fa7aad9e0e8da5eac0f1d25a25e280fc04bb7c9212a48"
E0115 01:56:34.899583    1428 remote_runtime.go:222] StartContainer "aaa68b7792ef66104be4c03734a721739ffd72b0810db6a4b82184869cdc2076" from runtime service failed: rpc error: code = Unknown desc = failed to start container "aaa68b7792ef66104be4c03734a721739ffd72b0810db6a4b82184869cdc2076": Error response from daemon: OCI runtime start failed: container process is already dead: unknown
E0115 01:56:34.904209    1428 kuberuntime_manager.go:801] container start failed: RunContainerError: failed to start container "aaa68b7792ef66104be4c03734a721739ffd72b0810db6a4b82184869cdc2076": Error response from daemon: OCI runtime start failed: container process is already dead: unknown

Should be related to:
#86312

@tedyu
Contributor Author

tedyu commented Jan 15, 2020

/test pull-kubernetes-e2e-gce

@tedyu
Contributor Author

tedyu commented Jan 15, 2020

/test pull-kubernetes-e2e-gce

@tedyu
Contributor Author

tedyu commented Jan 15, 2020

@msau42 @jingxu97
The following two tests frequently failed (in various test runs):

[sig-storage] PersistentVolumes-local [Volume type: dir-bindmounted] One pod requesting one prebound PVC should be able to mount volume and write from pod1 (2m32s)

[sig-storage] PersistentVolumes-local [Volume type: dir-link] One pod requesting one prebound PVC should be able to mount volume and read from pod1

I have verified that calling desiredStateOfWorld.GetUniqueVolumeName() versus the getUniqueVolumeName() in dswp makes no difference.

Let me see if I can find out why the above two tests don't pass.

@tedyu
Contributor Author

tedyu commented Jan 23, 2020

/test pull-kubernetes-e2e-gce

@tedyu
Contributor Author

tedyu commented Jan 23, 2020

/test pull-kubernetes-e2e-gce

@jingxu97
Contributor

@tedyu thanks, please let me know if I can help with anything.

@wojtek-t
Member

@tedyu - any progress on that? I think this would be really useful to have - we're hitting it a lot in scalability tests (I know, we can change the tests, but fixing that seems more appropriate in general)

@tedyu
Contributor Author

tedyu commented Feb 13, 2020

@wojtek-t
Pardon the late reply.
This PR is almost ready; it just needs a green run of e2e-gce.

@tedyu
Contributor Author

tedyu commented Feb 13, 2020

/test pull-kubernetes-e2e-gce

@k8s-ci-robot
Contributor

@tedyu: The following test failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
pull-kubernetes-e2e-gce 0066184 link /test pull-kubernetes-e2e-gce

@tedyu
Contributor Author

tedyu commented Feb 14, 2020

Closing in favor of #88141.

@tedyu tedyu closed this Feb 14, 2020
@@ -326,10 +327,16 @@ func (dswp *desiredStateOfWorldPopulator) processPodVolumes(
allVolumesAdded = false
continue
Contributor

Will this continue skip the rest of the loop body if createVolumeSpec fails?

Contributor Author

If volumeSpec is not useful, the subsequent GetUniqueVolumeName() call shouldn't be made. Hence the continue.

If you know a way to bypass using volumeSpec, please let me know.

Contributor Author

podVolume.Name is of the form: dswp-test-volume-name

uniqueVolumeName is of the form: fake-plugin/dswp-test-volume-name

I cannot use podVolume.Name in place of uniqueVolumeName
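
The control flow discussed in this thread can be sketched as follows. This is a simplified illustration, not the populator code: podVolume, createVolumeSpec, uniqueVolumeName, and processVolumes are hypothetical stand-ins, and the "<plugin>/<volume>" naming simply follows the form quoted above. It shows why a failed spec creation leads to continue (there is no plugin information to build a unique name from) and why the pod-level volume name cannot stand in for the unique name.

```go
package main

import (
	"errors"
	"fmt"
)

// podVolume is a simplified stand-in for a pod's volume entry.
type podVolume struct {
	Name       string
	PluginName string // empty simulates a volume whose spec cannot be created
}

// createVolumeSpec is a hypothetical stand-in that returns the plugin name
// needed to build the unique volume name, or an error.
func createVolumeSpec(v podVolume) (string, error) {
	if v.PluginName == "" {
		return "", errors.New("cannot create volume spec for " + v.Name)
	}
	return v.PluginName, nil
}

// uniqueVolumeName follows the "<plugin>/<volume>" form quoted above.
func uniqueVolumeName(pluginName, volumeName string) string {
	return pluginName + "/" + volumeName
}

// processVolumes returns the unique names that were added and whether every
// volume was processed successfully.
func processVolumes(volumes []podVolume) (added []string, allVolumesAdded bool) {
	allVolumesAdded = true
	for _, v := range volumes {
		plugin, err := createVolumeSpec(v)
		if err != nil {
			// Without a spec there is nothing to derive a unique name from,
			// so skip the rest of this iteration.
			allVolumesAdded = false
			continue
		}
		added = append(added, uniqueVolumeName(plugin, v.Name))
	}
	return added, allVolumesAdded
}

func main() {
	added, all := processVolumes([]podVolume{
		{Name: "dswp-test-volume-name", PluginName: "fake-plugin"},
		{Name: "broken-volume"},
	})
	fmt.Println(added, all)
}
```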

if err == nil {
// Add volume to desired state of world
_, err = dswp.desiredStateOfWorld.AddPodToVolume(
uniquePodName, pod, uniqueVolumeName, volumeSpec, podVolume.Name, volumeGidValue)
Contributor

I think you don't actually need to change the AddPodToVolume interface.
The earlier GetUniqueVolumeName call is made only if createVolumeSpec fails.
Here, AddPodToVolume is called only after createVolumeSpec succeeds.

Sorry for the confusion.

Contributor Author

As you mentioned, GetUniqueVolumeName is called when createVolumeSpec fails. It seems the refactoring in desiredStateOfWorld is useful.
I'd like to keep it if possible.

@@ -271,8 +280,8 @@ func (dsw *desiredStateOfWorld) AddPodToVolume(
 		dsw.volumesToMount[volumeName] = volumeToMount{
 			volumeName:              volumeName,
 			podsToMount:             make(map[types.UniquePodName]podToMount),
-			pluginIsAttachable:      attachable,
-			pluginIsDeviceMountable: deviceMountable,
+			pluginIsAttachable:      dsw.isAttachableVolume(volumeSpec),
Contributor

If you don't change the interface, you can call GetUniqueVolumeName here; I think that function could return the unique name along with the pluginIsAttachable and pluginIsDeviceMountable information.
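
One way to read this suggestion is a single helper that returns the unique name together with the two plugin capability flags. The sketch below is only an illustration of that shape under simplified assumptions: volumeSpec, volumeIdentity, and getVolumeIdentity are hypothetical names, and the real GetUniqueVolumeName may have a different signature and behavior.

```go
package main

import "fmt"

// volumeSpec is a simplified stand-in for the real volume spec type.
type volumeSpec struct {
	Name            string
	PluginName      string
	Attachable      bool
	DeviceMountable bool
}

// volumeIdentity bundles what the comment suggests a single call could return.
type volumeIdentity struct {
	UniqueName      string
	Attachable      bool
	DeviceMountable bool
}

// getVolumeIdentity is a hypothetical helper: it derives the unique name and
// reports the plugin capabilities in one call, so the caller does not need to
// look them up again.
func getVolumeIdentity(spec volumeSpec) (volumeIdentity, error) {
	if spec.PluginName == "" {
		return volumeIdentity{}, fmt.Errorf("no plugin found for volume %q", spec.Name)
	}
	return volumeIdentity{
		UniqueName:      spec.PluginName + "/" + spec.Name,
		Attachable:      spec.Attachable,
		DeviceMountable: spec.DeviceMountable,
	}, nil
}

func main() {
	id, err := getVolumeIdentity(volumeSpec{
		Name:       "dswp-test-volume-name",
		PluginName: "fake-plugin",
		Attachable: true,
	})
	fmt.Println(id, err)
}
```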

Labels
area/kubelet area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note-none Denotes a PR that doesn't merit a release note. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/storage Categorizes an issue or PR as relevant to SIG Storage. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Kubelets issue 2K QPS of GET PersistentVolumeClaim requests during 5K node load test
10 participants