bugfix: fix csi plugin concurrency issue on FuseRecovery and NodeUnpublishVolume #3448
Signed-off-by: trafalgarzzz <trafalgarz@outlook.com>
/test fluid-e2e
pkg/csi/plugins/nodeserver.go
}
// targetPath is corrupted
I suggest using if-else to make it easier to understand.
Done.
pkg/csi/recover/recover.go
should, err := r.shouldRecover(point.MountPath)
if err != nil {
	glog.Warningf("FuseRecovery: found path %s which is unable to recover due to error %v, skip it", point.MountPath, err)
	continue
If we continue here, the lock on mountPath will never be released.
Fixed. Thanks for pointing this out!
Codecov Report
@@            Coverage Diff             @@
##           master    #3448      +/-   ##
==========================================
- Coverage   64.28%   64.24%   -0.04%
==========================================
  Files         442      442
  Lines       26420    26441      +21
==========================================
+ Hits        16984    16988       +4
- Misses       7430     7444      +14
- Partials     2006     2009       +3
pkg/csi/recover/recover.go
brokenMounts, err := mountinfo.GetBrokenMountPoints()
if err != nil {
	glog.Error(err)
	return
}

for _, point := range brokenMounts {
	glog.V(4).Infof("Get broken mount point: %v", point)
	if lock := r.locks.TryAcquire(point.MountPath); !lock {
How about using r.checkAndRecoverMounts() to wrap the logic in the loop? Then we can use defer r.locks.Release to make the logic simpler.
Thanks for the suggestion. Moved it into func doRecover.
Kudos, SonarCloud Quality Gate passed!
/lgtm
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: cheyang
…blishVolume (fluid-cloudnative#3448)

* Add comments for NodeUnpublishVolume
* Refactor NodeUnpublishVolume code
* FuseRecovery uses volume locks to avoid race conditions
* Refactor node server with codes.Internal error code
* Rename CSI Config to RunningContext
* Fix github actions checks
* Fix lock release
* Refactor recover logic

Signed-off-by: trafalgarzzz <trafalgarz@outlook.com>
…and NodeUnpublishVolume (#3448) (#3453)

* Bugfix: ignore not connected error in NodeUnpublishVolume (#3445)
  * ignore not connected error in NodeUnpublishVolume
  * fix check nil error
  * simplify error judgment
* bugfix: fix csi plugin concurrency issue on FuseRecovery and NodeUnpublishVolume (#3448)
  * Add comments for NodeUnpublishVolume
  * Refactor NodeUnpublishVolume code
  * FuseRecovery uses volume locks to avoid race conditions
  * Refactor node server with codes.Internal error code
  * Rename CSI Config to RunningContext
  * Fix github actions checks
  * Fix lock release
  * Refactor recover logic

Signed-off-by: wangshulin <wangshulin@smail.nju.edu.cn>
Signed-off-by: trafalgarzzz <trafalgarz@outlook.com>
Co-authored-by: wangshulin <89928606+wangshli@users.noreply.github.com>
Ⅰ. Describe what this PR does
Fix a potential concurrency issue when FuseRecovery and NodeUnpublishVolume operate on the same target path.
Ⅱ. Does this pull request fix one issue?
fixes #3449
Ⅲ. List the added test cases (unit test/integration test) if any, please explain if no tests are needed.
Ⅳ. Describe how to verify it
Ⅴ. Special notes for reviews