Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix race condition issue for smb mount on windows #75371

Merged
merged 1 commit into from Mar 18, 2019

Conversation

@andyzhangx
Copy link
Member

andyzhangx commented Mar 14, 2019

What type of PR is this?
/kind bug

What this PR does / why we need it:
This PR brings the lock when do smb mounting on the node with same target source, otherwise there would be race condition as user already hits.

Which issue(s) this PR fixes:

Fixes #75180

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

fix race condition issue for smb mount on windows

/kind bug
/assign @jingxu97 @msau42 @PatrickLang @michmike @craiglpeters
/priority critical-urgent
/sig windows
/test pull-kubernetes-cross

@andyzhangx andyzhangx force-pushed the andyzhangx:smb-mount-win-lock branch from ae9dec8 to 4b4b6cd Mar 14, 2019

@andyzhangx

This comment has been minimized.

Copy link
Member Author

andyzhangx commented Mar 14, 2019

/test pull-kubernetes-cross

@michmike

This comment has been minimized.

Copy link

michmike commented Mar 14, 2019

@andyzhangx is this something we need for 1.14? can you talk a little bit about how a user could hit this issue (under what conditions)?
cc: @craiglpeters

@andyzhangx

This comment has been minimized.

Copy link
Member Author

andyzhangx commented Mar 15, 2019

hi @michmike I could easily repro this issue by "kubectl create -f https://raw.githubusercontent.com/andyzhangx/demo/master/windows/azurefile/aspnet-azurefile-multiple-replicas.yaml", the error is like following:

Events:
  Type     Reason       Age    From                 Message
  ----     ------       ----   ----                 -------
  Normal   Scheduled    2m28s  default-scheduler    Successfully assigned default/deployment-azurefile-7c8599c8c8-4fbbn to 1426k8s001
  Warning  FailedMount  2m17s  kubelet, 1426k8s001  MountVolume.SetUp failed for volume "pvc-b731c131-46d2-11e9-ba1f-000d3aa28f1b" : Remove-SmbGlobalMapping failed: exit status 1, output: "Remove-SmbGlobalMapping : No MSFT_SmbGlobalMapping objects found with property 'RemotePath' equal to \r\n'\\\\fa33e7109465b11e98xxxx.file.core.windows.net\\andy-win1134-dynamic-pvc-b731c131-46d2-11e9-ba1f-000d3aa28f1b'.  \r\nVerify the value of the property and retry.\r\nAt line:1 char:1\r\n+ Remove-SmbGlobalMapping -RemotePath $Env:smbremotepath -Force\r\n+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\r\n    + CategoryInfo          : ObjectNotFound: (\\\\fa33e7109465b...1f-000d3xxx:String) [Remove-SmbGlobalMapping], Ci \r\n   mJobException\r\n    + FullyQualifiedErrorId : CmdletizationQuery_NotFound_RemotePath,Remove-SmbGlobalMapping\r\n \r\n"
  Warning  FailedMount  2m12s  kubelet, 1426k8s001  MountVolume.SetUp failed for volume "pvc-b731c131-46d2-11e9-ba1f-000d3aa28f1b" : Remove-SmbGlobalMapping failed: exit status 1, output: "Remove-SmbGlobalMapping : This network connection does not exist. \r\nAt line:1 char:1\r\n+ Remove-SmbGlobalMapping -RemotePath $Env:smbremotepath -Force\r\n+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\r\n    + CategoryInfo          : NotSpecified: (MSFT_SmbGlobalM...le.core.win...):ROOT/Microsoft/...mbGlobalMapping) [Rem \r\n   ove-SmbGlobalMapping], CimException\r\n    + FullyQualifiedErrorId : Windows System Error 2250,Remove-SmbGlobalMapping\r\n \r\n"
  Normal   Pulling      96s    kubelet, 1426k8s001  pulling image "mcr.microsoft.com/dotnet/framework/aspnet:4.7.2-windowsservercore-ltsc2019"

And I have verified this PR has fixed this issue. It's better make it into v1.14 if possible.
/test pull-kubernetes-cross

cc @harshaperera532

@andyzhangx

This comment has been minimized.

Copy link
Member Author

andyzhangx commented Mar 15, 2019

/test pull-kubernetes-cross

@michmike

This comment has been minimized.

Copy link

michmike commented Mar 15, 2019

i pinged sig-storage
cc: @kubernetes/sig-storage-pr-reviews

@jsafrane

This comment has been minimized.

Copy link
Member

jsafrane commented Mar 15, 2019

I can't say anything about Windows, but the code looks OK.
/approve

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

k8s-ci-robot commented Mar 15, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andyzhangx, jsafrane

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@michmike

This comment has been minimized.

Copy link

michmike commented Mar 15, 2019

/milestone v1.14

@k8s-ci-robot k8s-ci-robot added this to the v1.14 milestone Mar 15, 2019

@michmike michmike added this to 1.14 Release Blocking (Windows GA, gMSA alpha) in SIG-Windows Mar 15, 2019

@jingxu97

This comment has been minimized.

Copy link
Contributor

jingxu97 commented Mar 15, 2019

I have a few questions

  1. Why we hit this issue in 1.13? Do we have the same issue before?

  2. So in windows, there will be race condition if user tries to mount with the same target? But according to your example, isn't the mount target different under different pod volume directory?

  3. How often you hit this problem, I tried the similar deployment, but could not reproduce it.

@andyzhangx

This comment has been minimized.

Copy link
Member Author

andyzhangx commented Mar 16, 2019

Why we hit this issue in 1.13? Do we have the same issue before?

I introduced a fix for remount same smb path issue on windows: #73661, that fix introduces this issue, it's not a regression btw since original remount on same smb path does not work.

So in windows, there will be race condition if user tries to mount with the same target? But according to your example, isn't the mount target different under different pod volume directory?

I used 6 replicas in my example, though those pod has different mount dirs, while they point to same smb path, that will cause race condition.

How often you hit this problem, I tried the similar deployment, but could not reproduce it.

If you use same smb path with multiple pods in parallel, I think it could reproduce easily.

@jingxu97 Let me explain a little more here. On Linux, every smb mount is independent even mount with same smb path, so there won't be race condition, while on Windows, we are using SMB Global Mapping, same smb path shares the smb mount, so we need to add lock for this entry.

@xmudrii

This comment has been minimized.

Copy link
Member

xmudrii commented Mar 18, 2019

@andyzhangx Is this PR still planned to be merged in time for 1.14? I'd like to remind that Code Thaw is starting tomorrow (Tuesday PST EOD), so it would be nice to merge this PR by then, otherwise it'll compete with herd of incoming PRs post-thaw.

@andyzhangx

This comment has been minimized.

Copy link
Member Author

andyzhangx commented Mar 18, 2019

@xmudrii yes, it should be merged into v1.14
could someone lgtm, thanks. @jingxu97 @msau42

@michmike

This comment has been minimized.

Copy link

michmike commented Mar 18, 2019

@neolit123 can you please take a look as well?

@spiffxp

This comment has been minimized.

Copy link
Member

spiffxp commented Mar 18, 2019

/lgtm
/hold
Before removing the hold I'd like to hear back from @jingxu97 to understand if their questions have been sufficiently answered

@k8s-ci-robot k8s-ci-robot added the lgtm label Mar 18, 2019

@jingxu97

This comment has been minimized.

Copy link
Contributor

jingxu97 commented Mar 18, 2019

/lgtm

@spiffxp

This comment has been minimized.

Copy link
Member

spiffxp commented Mar 18, 2019

/hold cancel

@neolit123
Copy link
Member

neolit123 left a comment

LGTM

@michmike

This comment has been minimized.

Copy link

michmike commented Mar 18, 2019

thank you all for a quick review and approval!

@michmike

This comment has been minimized.

Copy link

michmike commented Mar 18, 2019

/lgtm

@k8s-ci-robot k8s-ci-robot merged commit a4f2590 into kubernetes:master Mar 18, 2019

17 checks passed

cla/linuxfoundation andyzhangx authorized
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-bazel-test Job succeeded.
Details
pull-kubernetes-conformance-image-test Skipped.
pull-kubernetes-cross Job succeeded.
Details
pull-kubernetes-e2e-gce Job succeeded.
Details
pull-kubernetes-e2e-gce-100-performance Job succeeded.
Details
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
Details
pull-kubernetes-godeps Skipped.
pull-kubernetes-integration Job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce-big Job succeeded.
Details
pull-kubernetes-local-e2e Skipped.
pull-kubernetes-node-e2e Job succeeded.
Details
pull-kubernetes-typecheck Job succeeded.
Details
pull-kubernetes-verify Job succeeded.
Details
pull-publishing-bot-validate Skipped.
tide In merge pool.
Details

SIG-Windows automation moved this from 1.14 Release Blocking (Windows GA, gMSA alpha) to Done (v1.15) Mar 18, 2019

@michmike michmike moved this from Done (v1.15) to Done (v.1.14) in SIG-Windows Mar 18, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.