When volume is not marked in-use, do not backoff #106853

Merged

Conversation

gnufied (Member) commented Dec 7, 2021

We unnecessarily trigger exponential backoff when a volume is not yet marked in-use. Instead, we can wait for the volume to be marked as in-use before triggering the operation_executor. This should reduce the time needed to mount already-attached volumes.

/sig storage
/kind bug

cc @jsafrane @jingxu97

Allow attached volumes to be mounted more quickly by skipping exponential backoff when checking for reported-in-use volumes

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. sig/storage Categorizes an issue or PR as relevant to SIG Storage. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Dec 7, 2021
@k8s-ci-robot k8s-ci-robot added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Dec 7, 2021
gnufied (Member Author) commented Dec 7, 2021

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Dec 7, 2021
gnufied (Member Author) commented Dec 7, 2021

/assign @jingxu97

jsafrane (Member) commented Dec 8, 2021

/approve

Maybe a little context: right now, when a pod lands on a node, kubelet does two things in parallel:

  1. It updates node.status.volumesInUse every 10 seconds.
  2. The VolumeManager calls VerifyControllerAttachedVolume, an operation with exponential backoff that checks whether the volume is in node.status.volumesInUse. By the time step 1 writes the node status, the backoff of VerifyControllerAttachedVolume can already be at 5-10 seconds, which delays the volume mount.

With this PR, the exponential backoff only starts once the VolumeManager knows the volume is already in node.status.volumesInUse and node.status.volumesAttached (which might not be the latest value, so a final check by getting the API object directly is still needed), speeding up pod startup by a few seconds.

@jingxu97, PTAL
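For illustration only (not part of the PR): a minimal sketch of how quickly an exponential backoff schedule grows past kubelet's 10-second volumesInUse update interval. The 500 ms initial delay and factor of 2 below are assumed values, not the operation executor's actual configuration:

```go
package main

import (
	"fmt"
	"time"
)

// Prints an assumed backoff schedule. After a few failed attempts the next
// retry is already several seconds away, so a volume that becomes "in use"
// right after a failed check can wait noticeably long for the next retry.
func main() {
	delay := 500 * time.Millisecond // assumed initial backoff
	elapsed := time.Duration(0)
	for attempt := 1; attempt <= 5; attempt++ {
		elapsed += delay
		fmt.Printf("attempt %d: %v elapsed, next retry in %v\n", attempt, elapsed, delay*2)
		delay *= 2 // exponential growth: 0.5s, 1s, 2s, 4s, 8s, ...
	}
}
```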

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. sig/apps Categorizes an issue or PR as relevant to SIG Apps. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Dec 8, 2021
@gnufied gnufied force-pushed the disable-exp-backoff-volume-not-inuse branch 2 times, most recently from f7d760a to 90bb32f Compare December 9, 2021 12:56
gnufied (Member Author) commented Dec 9, 2021

@jsafrane @jingxu97 I went ahead and implemented a similar mechanism to avoid exponential backoff while checking node.Status.VolumesAttached too. I am not yet sure whether we should split the PR in two, but please review carefully anyway; we can decide on that later.

gnufied (Member Author) commented Dec 9, 2021

/retest

@ehashman ehashman added this to Triage in SIG Node PR Triage Dec 11, 2021
jingxu97 (Contributor) commented Dec 16, 2021

In the desired state of world, the volumeToMount struct already has the ReportedInUse information. Whenever the node status is updated, kubelet marks volumeToMount.ReportedInUse in the desired state via desiredStateOfWorld.MarkVolumesReportedInUse. So the reconciler can check this directly, with no need to check the node status?

Does ReportedInUse also mean the volume is already attached to the node? I thought we need to check both: node.Status.VolumesAttached is updated by the attach-detach controller, whereas node.Status.VolumesInUse is updated by kubelet.

In the current logic in VerifyControllerAttachedVolumeFunc, we check both ReportedInUse and VolumesAttached. To avoid the ReportedInUse check triggering backoff, we can move the ReportedInUse check into the reconciler, before calling VerifyControllerAttachedVolumeFunc. Only after ReportedInUse is set do we go ahead with the previous logic of checking VolumesAttached.
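A minimal, self-contained sketch of that pre-check idea (illustrative only: the type and helper names below are simplified stand-ins, not the PR's actual desired-state-of-world code):

```go
package main

import "fmt"

// volumeToMount models only the fields relevant here; the real struct lives
// in the kubelet volume manager's desired state of world.
type volumeToMount struct {
	volumeName         string
	pluginIsAttachable bool
	reportedInUse      bool
}

// readyToVerifyAttach is a hypothetical helper: an attachable volume that is
// not yet reported in node.status.volumesInUse is simply skipped on this
// reconciler pass instead of being handed to the operation executor, so no
// exponential backoff is recorded for it.
func readyToVerifyAttach(v volumeToMount) bool {
	if v.pluginIsAttachable && !v.reportedInUse {
		return false // wait for the next sync; kubelet will mark it in use soon
	}
	return true
}

func main() {
	v := volumeToMount{volumeName: "pv-1", pluginIsAttachable: true}
	fmt.Println(readyToVerifyAttach(v)) // false: not yet in volumesInUse
	v.reportedInUse = true
	fmt.Println(readyToVerifyAttach(v)) // true: proceed to VerifyControllerAttachedVolume
}
```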

gnufied (Member Author) commented Dec 16, 2021

> In the current logic in VerifyControllerAttachedVolumeFunc, we check both ReportedInUse and VolumesAttached. To avoid the ReportedInUse check triggering backoff, we can move the ReportedInUse check into the reconciler before calling VerifyControllerAttachedVolumeFunc.

Yes, I already did that. See https://github.com/kubernetes/kubernetes/pull/106853/files#diff-e9392bf9a117fa5eda2756ca13783db044a18ad4585b9b1fad776f660dc13068R206

gnufied (Member Author) commented Dec 17, 2021

@jingxu97 if the PR looks alright, can you please lgtm?

assert.NoError(t, volumetesting.VerifyWaitForAttachCallCount(
    0 /* expectedWaitForAttachCallCount */, fakePlugin))
assert.NoError(t, volumetesting.VerifyMountDeviceCallCount(
    0 /* expectedMountDeviceCallCount */, fakePlugin))
Contributor:
Without this code change to pre-check volumesInUse and volumesAttached, would the test also pass? It would fail VerifyControllerAttachedVolume, so it would not reach the WaitForAttach call either?

Member Author:
Yes, that sounds right. But then the IsOperationSafeToRetry check would fail. Since we can't count VerifyControllerAttachedVolume calls, though, this test should be fine?

Contributor:
Yeah, it is OK. We can try to improve the test logic later.

@@ -270,6 +270,20 @@ func (kvh *kubeletVolumeHost) GetNodeLabels() (map[string]string, error) {
return node.Labels, nil
}

func (kvh *kubeletVolumeHost) GetAttachedVolumes() (map[v1.UniqueVolumeName]string, error) {
Contributor:
GetAttachedVolumes() is used in actual_state_of_world on both the controller and kubelet sides.
How about using a different name, like GetAttachedVolumesFromNodeStatus?

Member Author:
Fixed.
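For reference, a self-contained sketch of roughly what a node-status-backed lookup like this returns (the types and names below are simplified stand-ins, not the actual kubeletVolumeHost code):

```go
package main

import "fmt"

// attachedVolume and nodeStatus are simplified stand-ins for the v1 API types;
// in kubelet the data would come from the informer-backed node lister.
type attachedVolume struct {
	name       string // v1.UniqueVolumeName in the real API
	devicePath string
}

type nodeStatus struct {
	volumesAttached []attachedVolume
}

// attachedVolumesFromNodeStatus builds a unique-volume-name -> device-path map
// from node.Status.VolumesAttached, which is roughly the shape a method named
// GetAttachedVolumesFromNodeStatus would return.
func attachedVolumesFromNodeStatus(s nodeStatus) map[string]string {
	out := make(map[string]string, len(s.volumesAttached))
	for _, av := range s.volumesAttached {
		out[av.name] = av.devicePath
	}
	return out
}

func main() {
	s := nodeStatus{volumesAttached: []attachedVolume{
		{name: "kubernetes.io/csi/driver^vol-1", devicePath: "/dev/xvdf"},
	}}
	fmt.Println(attachedVolumesFromNodeStatus(s))
}
```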

@@ -1514,6 +1515,16 @@ func (og *operationGenerator) GenerateVerifyControllerAttachedVolumeFunc(
return volumetypes.GeneratedOperations{}, volumeToMount.GenerateErrorDetailed("VerifyControllerAttachedVolume.FindPluginBySpec failed", err)
}

if volumeToMount.PluginIsAttachable {
cachedAttachedVolumes, _ := og.volumePluginMgr.Host.GetAttachedVolumes()
Contributor:
Maybe add a comment indicating that this is an early check of attached volumes from the node status via the node lister, to avoid backoff? Later we call the API server directly to get the latest node status and confirm the VolumesAttached list, to avoid a race condition.

Member Author:
Added a comment.
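To make the intent of that comment concrete, here is a hedged sketch of the two-step flow being described (illustrative names and types, not the PR's actual code in GenerateVerifyControllerAttachedVolumeFunc):

```go
package main

import "fmt"

// nodeView is a simplified stand-in for a Node object's VolumesAttached set.
type nodeView struct {
	volumesAttached map[string]bool
}

// verifyAttached models the two checks discussed above:
//  1. a cheap early check against the informer-cached node (node lister); if
//     the volume is not there yet, we bail out before the expensive operation,
//     so no exponential backoff is started;
//  2. a final check against a node fetched directly from the API server,
//     because the cache may lag behind the attach-detach controller.
func verifyAttached(volume string, cachedNode, apiNode nodeView) (bool, string) {
	if !cachedNode.volumesAttached[volume] {
		return false, "not yet in cached node.Status.VolumesAttached; skip without backoff"
	}
	if !apiNode.volumesAttached[volume] {
		return false, "not attached according to the API server"
	}
	return true, "attached"
}

func main() {
	cached := nodeView{volumesAttached: map[string]bool{"vol-1": true}}
	api := nodeView{volumesAttached: map[string]bool{"vol-1": true}}
	fmt.Println(verifyAttached("vol-1", cached, api))
}
```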

jingxu97 (Contributor):
The code looks good to me, just a few small comments.

Thank you for the PR! I think this change can really improve performance. Hopefully we can get some data about it. We should also consider cherry-picking this.

@gnufied gnufied force-pushed the disable-exp-backoff-volume-not-inuse branch from 90bb32f to 7989f27 Compare December 20, 2021 16:57
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 20, 2021
gnufied (Member Author) commented Dec 20, 2021

The usual improvement from measurements:

Before:

volume_operation_total_seconds_bucket{operation_name="volume_mount",plugin_name="kubernetes.io/vsphere-volume",le="0.1"} 0
volume_operation_total_seconds_bucket{operation_name="volume_mount",plugin_name="kubernetes.io/vsphere-volume",le="0.25"} 0
volume_operation_total_seconds_bucket{operation_name="volume_mount",plugin_name="kubernetes.io/vsphere-volume",le="0.5"} 0
volume_operation_total_seconds_bucket{operation_name="volume_mount",plugin_name="kubernetes.io/vsphere-volume",le="1"} 0
volume_operation_total_seconds_bucket{operation_name="volume_mount",plugin_name="kubernetes.io/vsphere-volume",le="2.5"} 0
volume_operation_total_seconds_bucket{operation_name="volume_mount",plugin_name="kubernetes.io/vsphere-volume",le="5"} 0
volume_operation_total_seconds_bucket{operation_name="volume_mount",plugin_name="kubernetes.io/vsphere-volume",le="10"} 10
volume_operation_total_seconds_bucket{operation_name="volume_mount",plugin_name="kubernetes.io/vsphere-volume",le="15"} 10
volume_operation_total_seconds_bucket{operation_name="volume_mount",plugin_name="kubernetes.io/vsphere-volume",le="25"} 12
volume_operation_total_seconds_bucket{operation_name="volume_mount",plugin_name="kubernetes.io/vsphere-volume",le="50"} 12
volume_operation_total_seconds_bucket{operation_name="volume_mount",plugin_name="kubernetes.io/vsphere-volume",le="120"} 12
volume_operation_total_seconds_bucket{operation_name="volume_mount",plugin_name="kubernetes.io/vsphere-volume",le="300"} 12
volume_operation_total_seconds_bucket{operation_name="volume_mount",plugin_name="kubernetes.io/vsphere-volume",le="600"} 12
volume_operation_total_seconds_bucket{operation_name="volume_mount",plugin_name="kubernetes.io/vsphere-volume",le="+Inf"} 12
volume_operation_total_seconds_sum{operation_name="volume_mount",plugin_name="kubernetes.io/vsphere-volume"} 123.11098267599999
volume_operation_total_seconds_count{operation_name="volume_mount",plugin_name="kubernetes.io/vsphere-volume"} 12

After:
volume_operation_total_seconds_bucket{operation_name="volume_mount",plugin_name="kubernetes.io/vsphere-volume",le="0.25"} 0
volume_operation_total_seconds_bucket{operation_name="volume_mount",plugin_name="kubernetes.io/vsphere-volume",le="0.5"} 0
volume_operation_total_seconds_bucket{operation_name="volume_mount",plugin_name="kubernetes.io/vsphere-volume",le="1"} 0
volume_operation_total_seconds_bucket{operation_name="volume_mount",plugin_name="kubernetes.io/vsphere-volume",le="2.5"} 1
volume_operation_total_seconds_bucket{operation_name="volume_mount",plugin_name="kubernetes.io/vsphere-volume",le="5"} 2
volume_operation_total_seconds_bucket{operation_name="volume_mount",plugin_name="kubernetes.io/vsphere-volume",le="10"} 10
volume_operation_total_seconds_bucket{operation_name="volume_mount",plugin_name="kubernetes.io/vsphere-volume",le="15"} 13
volume_operation_total_seconds_bucket{operation_name="volume_mount",plugin_name="kubernetes.io/vsphere-volume",le="25"} 13
volume_operation_total_seconds_bucket{operation_name="volume_mount",plugin_name="kubernetes.io/vsphere-volume",le="50"} 13
volume_operation_total_seconds_bucket{operation_name="volume_mount",plugin_name="kubernetes.io/vsphere-volume",le="120"} 13
volume_operation_total_seconds_bucket{operation_name="volume_mount",plugin_name="kubernetes.io/vsphere-volume",le="300"} 13
volume_operation_total_seconds_bucket{operation_name="volume_mount",plugin_name="kubernetes.io/vsphere-volume",le="600"} 13
volume_operation_total_seconds_bucket{operation_name="volume_mount",plugin_name="kubernetes.io/vsphere-volume",le="+Inf"} 13
volume_operation_total_seconds_sum{operation_name="volume_mount",plugin_name="kubernetes.io/vsphere-volume"} 107.01795236
volume_operation_total_seconds_count{operation_name="volume_mount",plugin_name="kubernetes.io/vsphere-volume"} 13

So overall the average mount time drops from about 10.26s (123.11 / 12) to 8.23s (107.02 / 13). Also, as you can see, most operations now complete within 15s, whereas before some operations took around 20s to complete.

jsafrane (Member):
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 21, 2021
jingxu97 (Contributor):
> (quoting the measurements above: average mount time roughly 10.26s before vs 8.23s after)

That's great, already a 20% improvement!
/lgtm
/approve

jingxu97 (Contributor):
/assign @Random-Liu
Could you help review and approve this PR? Thanks!

Random-Liu (Member):
/lgtm
/approve

k8s-ci-robot (Contributor):
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gnufied, jingxu97, jsafrane, Random-Liu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 23, 2021
pacoxu (Member) commented Dec 23, 2021

Unrelated flake in pull-kubernetes-integration: FAIL: TestCronJobLaunchesPodAndCleansUp

/retest

@k8s-ci-robot k8s-ci-robot merged commit f0dbc32 into kubernetes:master Dec 23, 2021
SIG Node PR Triage automation moved this from Triage to Done Dec 23, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.24 milestone Dec 23, 2021
k8s-ci-robot added a commit that referenced this pull request Jan 25, 2022
…853-upstream-release-1.22

Automated cherry pick of #106853: When volume is not marked in-use, do not backoff
k8s-ci-robot added a commit that referenced this pull request Jan 25, 2022
…853-upstream-release-1.23

Automated cherry pick of #106853: When volume is not marked in-use, do not backoff
jingxu97 (Contributor) commented Apr 9, 2022

@gnufied should we consider cherrypick this change?

gnufied (Member Author) commented Apr 12, 2022

@jingxu97 I already backported this to 1.23 and 1.22; were you thinking of earlier versions than those?

jingxu97 (Contributor):
Oh, I missed it.
Maybe 1.21 could also be useful.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.