azure: remove disk locks per vm during attach/detach #85115
Conversation
/area provider/azure (for testing repercussions of removing the lock and ensuring updates to the same VM are not a problem + soak tests) /cc @khenidak
/assign @andyzhangx
@aramase The disk attach/detach lock cannot be removed right now. Is there a way to replace it with a better lock?
@andyzhangx I'm going to test the behavior of removing the lock. If there are any issues, then yes, we can investigate using the existing lock with a different config, or a different locking implementation.
/test pull-kubernetes-e2e-gce

1 similar comment

/test pull-kubernetes-e2e-gce
@@ -58,17 +57,16 @@ var defaultBackOff = kwait.Backoff{
 	Jitter: 0.0,
 }
 
-// acquire lock to attach/detach disk in one node
-var diskOpMutex = keymutex.NewHashed(0)
/test pull-kubernetes-node-e2e-containerd
/test pull-kubernetes-kubemark-e2e-gce-big
Test result -
The tests were done on a cluster with 2 vCPUs, which means the number of mutexes equals the CPU count (2). On AKS, the master has 8 vCPUs according to @andyzhangx, so with the old keymutex there could be up to 8 concurrent updates (attach/detach) in parallel, provided there are no hash collisions. With the changes in this PR, all VM updates happen in parallel, which reduces disk attach/detach time when many disks are involved. @andyzhangx PTAL! /hold
/lgtm
/approve
/hold cancel
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: andyzhangx, aramase. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/priority important-soon
@aramase could you also cherry-pick this PR to old releases? Thanks.
@andyzhangx will do!
…5-upstream-release-1.16 Automated cherry pick of #85115: remove disk locks per vm
…5-upstream-release-1.15 Automated cherry pick of #85115: remove disk locks per vm
…5-upstream-release-1.14 Automated cherry pick of #85115: remove disk locks per vm
What type of PR is this?
/kind bug
What this PR does / why we need it:
Currently, the per-VM lock logic defaults to a pool of mutexes sized to the number of CPUs, which throttles the number of concurrent attach/detach requests. With the updated locking logic, disk attach/detach operations for different nodes run concurrently, while operations for the same node are performed sequentially.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?:
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: