
Volumes fail to clean up when kubelet restarts due to race between actual and desired state #75345

Open
jingxu97 opened this Issue Mar 13, 2019 · 9 comments

@jingxu97
Contributor

jingxu97 commented Mar 13, 2019

We have tests that delete pods while kubelet restarts. The test is a bit flaky because of the following race condition:

  1. Stop kubelet, delete the pod gracefully, and restart kubelet.
  2. The pod's volumes are first put into the desired state of the volume manager. The populator then tries to remove them because the pod is deleted; this normally fails at first because of a check on whether the actual state already has a record of the volume.
  3. The actual state verifies the volume is attached and adds a record for it, but does not yet record it as mounted.
  4. The desired state removes the volume right after step 3.
  5. The reconciler never gets a chance to unmount the volume, because the actual state does not record it as mounted (only as attached).

I think in step 3, we should check whether the actual state records this volume as mounted, instead of just checking whether it exists (attached).
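
To make the proposed check concrete, here is a minimal Go sketch; the names (actualState, desiredState, VolumeExists, VolumeMountedForPod, DeletePodFromVolume) are hypothetical stand-ins for the kubelet volume manager's real types, not its actual API:

```go
package volumemanager

// Hypothetical stand-ins for the kubelet volume manager's caches.
type actualState interface {
	// VolumeExists reports whether there is any record of the volume
	// (attached, possibly not yet mounted).
	VolumeExists(volName string) bool
	// VolumeMountedForPod reports whether the volume is recorded as mounted
	// for the given pod.
	VolumeMountedForPod(podUID, volName string) bool
}

type desiredState interface {
	DeletePodFromVolume(podUID, volName string)
}

// removeDeletedPodVolume models the populator's cleanup decision for a pod
// that has already been deleted. The racy behavior removes the desired-state
// entry as soon as any record exists in the actual state, even if the volume
// is only attached (steps 3-4 above); the proposed behavior waits until the
// volume is recorded as mounted, so the reconciler still gets a chance to
// unmount it.
func removeDeletedPodVolume(asw actualState, dsw desiredState, podUID, volName string) {
	// Racy check (current behavior as described above):
	//   if asw.VolumeExists(volName) { dsw.DeletePodFromVolume(podUID, volName) }
	//
	// Proposed check: only remove once the volume is mounted for this pod.
	if asw.VolumeMountedForPod(podUID, volName) {
		dsw.DeletePodFromVolume(podUID, volName)
	}
}
```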

@mariantalla

Contributor

mariantalla commented Mar 15, 2019

Hey @jingxu97, just checking in - any updates on this one?

@athenabot


athenabot commented Mar 16, 2019

/sig node

These SIGs are my best guesses for this issue. Please comment /remove-sig <name> if I am incorrect about one.
🤖 I am an (alpha) bot run by @vllry. 👩‍🔬

@mariantalla

Contributor

mariantalla commented Mar 18, 2019

/remove-sig node

@k8s-ci-robot removed the sig/node label Mar 18, 2019

@mariantalla

Contributor

mariantalla commented Mar 18, 2019

Hi @jingxu97 / @msau42 - any news on this one? #75328 is currently flaking about 15% of the time in release-blocking boards.

@jingxu97

Contributor Author

jingxu97 commented Mar 18, 2019

@mariantalla sorry for the delay. I will work on a fix today.

@spiffxp

Member

spiffxp commented Mar 18, 2019

/milestone v1.14

@k8s-ci-robot added this to the v1.14 milestone Mar 18, 2019

@spiffxp

Member

spiffxp commented Mar 18, 2019

/priority important-soon

jingxu97 added a commit to jingxu97/kubernetes that referenced this issue Mar 19, 2019

Fix race condition between actual and desired state in kubelet volume manager

This PR fixes issue kubernetes#75345. The fix modifies the check of the volume in the actual state when validating whether the volume can be removed from the desired state: only if the volume is already recorded as mounted in the actual state can it be removed from the desired state.
For the case where mounting always fails, this still works because the check also validates whether the pod still exists in the pod manager. If mounting fails, the pod should be removable from the pod manager, so the volume can also be removed from the desired state.
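
As a rough illustration of the condition the commit message describes, here is a small Go sketch; the inputs (volumeMounted, podInPodManager) are assumed stand-ins for the kubelet's actual-state and pod-manager lookups, not the real API:

```go
package volumemanager

// shouldRemoveFromDesiredState sketches the check described in the commit
// message above. The desired-state entry is kept while the volume is not yet
// mounted and the pod is still known to the pod manager; once the volume is
// mounted, or the pod has been removed from the pod manager (for example when
// mounting keeps failing), the entry can be removed.
func shouldRemoveFromDesiredState(volumeMounted, podInPodManager bool) bool {
	if !volumeMounted && podInPodManager {
		// Wait: let the mount finish so the reconciler can unmount it later.
		return false
	}
	return true
}
```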
@jingxu97

Contributor Author

jingxu97 commented Mar 19, 2019

Opened PR #75458 to fix it.

@nikopen

Member

nikopen commented Mar 21, 2019

/milestone clear

The PR can merge to master and be cherry-picked to 1.14.1 if it doesn't make 1.14.0.

@k8s-ci-robot removed this from the v1.14 milestone Mar 21, 2019
