Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make mount errors uncertain #110650

Closed

Conversation

gnufied
Copy link
Member

@gnufied gnufied commented Jun 18, 2022

Make all mount errors result in uncertainly mounted volmes

This is a quickfix for - #96635 and possibly #109991 and other related issues.

This PR marks all volumes as uncertain if mount operation fails. But this PR although shorter than #110670 has some caveats:

  1. As I mentioned in Fix pod stuck in termination state when mount fails or gets skipped after kubelet restart #110670 , this PR introduces a slight behaviour change in how/when pods are deleted if kubelet never successfully mounted the volume. After this PR - all volumes (even for which mounts are failing) must be unmounted if pod is to be deleted.
  2. The other problem is - in some cases mount operation may fail even before reaching the plugin. For example if a volume was mounted and while kubelet was down, an user edited topology of the node, then after kubelet restart such volumes can't be mounted anymore and hence they won't be unmounted as well, even if pod is deleted. Such volumes will be forever stuck to the node until manually fixed.

This is why I think, adding volumes read during reconstruction in uncertain state as proposed by #110670 is kinda necessary. But then - part#2 of that fix, where kubelet is marking such volumes as unmounted anyways if mount fails can either use this change or we need to keep track of reconstructed volumes in ASOW as proposed in - #110670

@k8s-ci-robot
Copy link
Contributor

@gnufied: Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jun 18, 2022
@gnufied
Copy link
Member Author

gnufied commented Jun 18, 2022

/sig storage
/kind bug

@k8s-ci-robot k8s-ci-robot added sig/storage Categorizes an issue or PR as relevant to SIG Storage. kind/bug Categorizes issue or PR as related to a bug. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. labels Jun 18, 2022
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gnufied

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. labels Jun 18, 2022
@gnufied
Copy link
Member Author

gnufied commented Jun 19, 2022

/retest

@pacoxu pacoxu added this to Needs Reviewer in SIG Node PR Triage Jun 20, 2022
@pacoxu
Copy link
Member

pacoxu commented Jun 20, 2022

#110658 is opened for the storage-snapshot test failure.

@jingxu97
Copy link
Contributor

could you add in the issue comments about why this change is needed? @gnufied

@gnufied
Copy link
Member Author

gnufied commented Jun 21, 2022

@jingxu97 added an detailed explanation.

@gnufied
Copy link
Member Author

gnufied commented Jun 29, 2022

/retest

@k8s-ci-robot
Copy link
Contributor

@gnufied: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-kubernetes-e2e-gce-storage-snapshot f27c245 link false /test pull-kubernetes-e2e-gce-storage-snapshot

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@xing-yang
Copy link
Contributor

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 24, 2022
@k8s-ci-robot
Copy link
Contributor

@gnufied: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 28, 2022
@pacoxu pacoxu moved this from Needs Reviewer to Waiting on Author in SIG Node PR Triage Aug 3, 2022
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 26, 2022
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 25, 2022
@gnufied gnufied closed this Dec 5, 2022
SIG Node PR Triage automation moved this from Waiting on Author to Done Dec 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. kind/bug Categorizes issue or PR as related to a bug. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Development

Successfully merging this pull request may close these issues.

None yet

6 participants