Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Distinguish between volume path and mount path #74653

Merged

Conversation

cofyc
Copy link
Member

@cofyc cofyc commented Feb 27, 2019

What type of PR is this?

Uncomment only one /kind <> line, hit enter to put that in a new line, and remove leading whitespaces from that line:

/kind api-change
/kind bug
/kind cleanup
/kind design
/kind documentation
/kind failing-test
/kind feature
/kind flake

What this PR does / why we need it:

Which issue(s) this PR fixes:

partially addresses #74650

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Because some plugins mount volume on sub-directory of volume path, we need to distinguish between volume path and mount path to fix issue in reconstruction.

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. release-note-none Denotes a PR that doesn't merit a release note. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Feb 27, 2019
@cofyc
Copy link
Member Author

cofyc commented Feb 27, 2019

/priority important-soon
/sig storage
/assign @msau42 @jingxu97

@k8s-ci-robot k8s-ci-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/storage Categorizes an issue or PR as relevant to SIG Storage. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Feb 27, 2019
@k8s-ci-robot k8s-ci-robot added area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. labels Feb 27, 2019
@@ -917,7 +917,7 @@ func (oe *operationExecutor) CheckVolumeExistenceOperation(
if attachable != nil {
var isNotMount bool
var mountCheckErr error
if isNotMount, mountCheckErr = mounter.IsLikelyNotMountPoint(mountPath); mountCheckErr != nil {
if isNotMount, mountCheckErr = mounter.IsNotMountPoint(mountPath); mountCheckErr != nil {
Copy link
Member Author

@cofyc cofyc Feb 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess for safety, we should use mounter.IsNotMountPoint here because mount source may be a directory on the host (bind mount) (reverted, because it checks for attachable volume only).

A very special case here is mount.IsNotMountPoint/IsLikelyNotMountPoint may fail because mount point is broken (e.g. stale NFS file handle). If this happens, reconstruction will fail and never recover because reconstruction happens only once.

I'm not sure if this is expected. If it is not expected, maybe we need to use another way to check volume existence in reconstruction or don't check it at all?

@cofyc cofyc force-pushed the fix74650-CheckVolumeExistenceOperation branch from e85a1ca to 8940976 Compare February 27, 2019 11:31
@msau42
Copy link
Member

msau42 commented Feb 27, 2019

/test pull-kubernetes-e2e-gce-csi-serial

if checkErr != nil {
return nil, checkErr
}
// If mount or symlink doesn't exist, volume reconstruction should be failed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jingxu97 can this check be removed? This means that the directory still exists and should be cleaned up.

Also this assumes that all volumes use mounts, which is not necessarily true (esp for ephemeral types).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reference, this is the PR that introduced this check. Reading through the original bug and discussion, I'm not sure how this fix actually addresses the original issue... cc @chakri-nelluri

#51985

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@msau42 PR #51985 adds an extra check to make sure that the volume is mounted on a reboot and return a failure for volume reconstruction, if it is not. Hope this clarifies the need for check.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok @jingxu97 and I discussed and I understand now.

If the volume was partially torn down (ie the mount point is gone, but the directory still exists), then we don't want reconstruction to add the volume back to the ASW (because it's actually not mounted) and cause an empty directory to be mounted into the pod. Let's update the comment here to make that reason more obvious.

However, this brings up some other issues:

  • What happens if the volume plugin doesn't actually use mounts? Maybe this isn't an issue for attachable plugins. I can't think of a real attachable plugin that is not mount-based.
  • If there was a case of kubelet/node crash in the middle of cleanup, then we may not be able to properly cleanup the directories for CSI because we only try reconstruction once during kubelet initialization, when the CSI plugin may not be registered yet. This is also more of a corner case and we don't need to consider fixing it urgently for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jingxu97 can this check be removed? This means that the directory still exists and should be cleaned up.

Also this assumes that all volumes use mounts, which is not necessarily true (esp for ephemeral types).

@msau42 This code is conditional and was only meant for plugins which support attach and needs to be reattached and mounted on a node reboot.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • What happens if the volume plugin doesn't actually use mounts? Maybe this isn't an issue for attachable plugins. I can't think of a real attachable plugin that is not mount-based.

Block devices?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • If there was a case of kubelet/node crash in the middle of cleanup, then we may not be able to properly cleanup the directories for CSI because we only try reconstruction once during kubelet initialization, when the CSI plugin may not be registered yet. This is also more of a corner case and we don't need to consider fixing it urgently for now.

@msau42 This might show up if we restart Kubelet to recover for any other issues..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This CheckVolumeExistance function does handle raw block, although I wonder if it possible to the link to change across reboots? I think this is what the losetup is supposed to protect against. cc @wongma7

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only kubelet manipulates these links, technically it can change but so can a mount device (we check that a mount point exists only, not if it points to the right thing). (losetup doesn't help here as the failure scenario the losetup code is intended to prevent is something different like: volume at /dev/sda somehow gets attached then another volume gets attached at /dev/sda, silently. The loop device holding onto /dev/sda is supposed to prevent detach from happening at all. If there's a reboot, I don't believe the loop devices persist anyhow.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should test the reboot behavior for raw block separately to make sure we're not going to accidentally give the wrong device to container if the device names change on a reboot.

@msau42
Copy link
Member

msau42 commented Mar 4, 2019

/lgtm
for fixing the immediate issue. There are a few other followup issues that we can investigate/address separately.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 4, 2019
@saad-ali
Copy link
Member

saad-ali commented Mar 5, 2019

/assign

@cofyc cofyc changed the title Distinguish volume path with mount path Distinguish between volume path and mount path Mar 6, 2019
Copy link
Member

@saad-ali saad-ali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

/assign @jingxu97
Jing is our reconstruction expert, let's make sure she takes a look at this.

@@ -516,6 +507,16 @@ func (rc *reconciler) reconstructVolume(volume podVolume) (*reconstructedVolume,
newMounterErr)
}

// Check existence of mount point for filesystem volume or symbolic link for block volume
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you update comment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment describes how CheckVolumeExistenceOperation works internally to check volume existence. It's correct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @saad-ali means to add a comment about the change? Instead of using volume.volumePath, we need to use volumeMounter.GetPath() because the mount point setup is different for different plugins. We have to be careful about this when adding/modifying code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, sorry, because it's already merged, we can add it later.

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cofyc, saad-ali

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 7, 2019
@k8s-ci-robot k8s-ci-robot merged commit 8dd6028 into kubernetes:master Mar 7, 2019
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Mar 14, 2019
@tpepper
Copy link
Member

tpepper commented Mar 18, 2019

/kind bug

k8s-ci-robot added a commit that referenced this pull request Mar 19, 2019
…upstream-release-1.13

Automated cherry pick of #74653: Distinguish volume path with mount path
@cofyc cofyc deleted the fix74650-CheckVolumeExistenceOperation branch May 4, 2019 07:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants