
Robust VolumeManager reconstruction after kubelet restart #3756

Open
5 of 8 tasks
jsafrane opened this issue Jan 19, 2023 · 24 comments
Labels
lead-opted-in Denotes that an issue has been opted in to a release sig/storage Categorizes an issue or PR as relevant to SIG Storage. stage/stable Denotes an issue tracking an enhancement targeted for Stable/GA status

Comments

@jsafrane
Member

jsafrane commented Jan 19, 2023

Enhancement Description

Please keep this description up to date. This will help the Enhancement Team to track the evolution of the enhancement efficiently.

@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jan 19, 2023
@jsafrane
Member Author

/sig storage

@k8s-ci-robot k8s-ci-robot added sig/storage Categorizes an issue or PR as relevant to SIG Storage. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jan 20, 2023
@jsafrane
Member Author

/label lead-opted-in
/milestone v1.27
/stage alpha

@k8s-ci-robot k8s-ci-robot added the stage/alpha Denotes an issue tracking an enhancement targeted for Alpha status label Jan 27, 2023
@k8s-ci-robot k8s-ci-robot added this to the v1.27 milestone Jan 27, 2023
@jsafrane jsafrane added the lead-opted-in Denotes that an issue has been opted in to a release label Jan 27, 2023
@npolshakova

Hello @jsafrane 👋, 1.27 Enhancements team here.

Just checking in as we approach enhancements freeze on 18:00 PDT Thursday 9th February 2023.

This enhancement is targeting stage alpha for 1.27 (correct me if otherwise).

Here's where this enhancement currently stands:

  • KEP readme using the latest template has been merged into the k/enhancements repo.
  • KEP status is marked as implementable for latest-milestone: 1.27
  • KEP readme has an updated detailed test plan section filled out
  • KEP readme has up to date graduation criteria
  • KEP has a production readiness review that has been completed and merged into k/enhancements.

It looks like #3763 will address most of these issues.

The status of this enhancement is marked as at risk. Please keep the issue description up-to-date with appropriate stages as well. Thank you!

@jsafrane
Member Author

jsafrane commented Feb 9, 2023

This enhancement is targeting stage alpha for 1.27 (correct me if otherwise).

We're targeting Beta directly; the alpha was part of https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/1710-selinux-relabeling

The KEP is merged together with the PRR approval, so I think everything is fine for 1.27.

@npolshakova

Great! This enhancement meets all the requirements for being included in v1.27 and is now tracked for the release.

One thing to note: make sure to update the PRR section in the KEP README.

/stage beta
/remove-label tracked/no
/label tracked/yes

@k8s-ci-robot k8s-ci-robot added tracked/yes Denotes an enhancement issue is actively being tracked by the Release Team stage/beta Denotes an issue tracking an enhancement targeted for Beta status and removed stage/alpha Denotes an issue tracking an enhancement targeted for Alpha status labels Feb 9, 2023
@k8s-ci-robot
Contributor

@npolshakova: Those labels are not set on the issue: tracked/no


@mickeyboxell

Hi @jsafrane 👋, I’m reaching out from the 1.27 Release Docs team. This enhancement is marked as ‘Needs Docs’ for the 1.27 release.

Please follow the steps detailed in the documentation to open a PR against the dev-1.27 branch in the k/website repo. This PR can be just a placeholder at this time, and must be created by March 16. For more information, please take a look at Documenting for a release to familiarize yourself with the documentation requirements for the release.

Please feel free to reach out with any questions. Thanks!

@npolshakova

npolshakova commented Mar 9, 2023

Hi @jsafrane,

Checking in as we approach 1.27 code freeze at 17:00 PDT on Tuesday 14th March 2023.

Please ensure the following items are completed:

  • All PRs to the Kubernetes repo that are related to your enhancement are linked in the above issue description (for tracking purposes).
  • All PRs are fully merged by the code freeze deadline.

For this enhancement, it looks like the following PRs are open and need to be merged before code freeze:

Please let me know if there are any other PRs in k/k I should be tracking for this KEP.
As always, we are here to help should questions come up. Thanks!

@jsafrane
Member Author

kubernetes/kubernetes#115972 and kubernetes/kubernetes#115965 are part of this feature and have already been merged.

@jsafrane
Member Author

Docs: kubernetes/website#40038

@Atharva-Shinde Atharva-Shinde removed this from the v1.27 milestone May 14, 2023
@Atharva-Shinde Atharva-Shinde removed tracked/yes Denotes an enhancement issue is actively being tracked by the Release Team lead-opted-in Denotes that an issue has been opted in to a release labels May 14, 2023
starlingx-github pushed a commit to starlingx/stx-puppet that referenced this issue Jul 28, 2023
After a forced reboot, kubelet fails to clean up orphaned
volume files and directories. These orphaned volumes
persist because the directories still hold files; this
would normally be cleaned up during an orderly shutdown,
but is left behind by the forced reboot.

The volume directories are tied to the pods' unique identifiers.
Since none of the pods are running after the reboot, the associated
volume directories and files can be safely removed.

The work in this commit will eventually be superseded by
kubernetes/enhancements#3756.

Test Plan:

PASS:
- Build an iso and install an aio-sx system and verify
that the cleanup script is installed and the kubelet service
is running with
ExecStartPre=/usr/local/bin/kubelet-cleanup-orphaned-volumes.sh.

PASS:
- Reboot the system with active pods that
contain files in their volumes directories and
verify that all volume directories and their files under
/var/lib/kubelet/pods/ are deleted after reboot.

PASS:
- Verify that explicitly restarting the kubelet service does
not attempt to delete kubelet volume directories.

PASS:
- Verify volume-subpaths directories and files are cleaned up
after reboot.

Closes-Bug: 2027810

Change-Id: Ie7e637c4d5e79ec08d33bd80dade35890b711548
Signed-off-by: Gleb Aronsky <gleb.aronsky@windriver.com>
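The commit message does not include the cleanup script itself. A minimal sketch of the approach it describes could look like the following. It assumes (hypothetically) that a marker file under `/run`, which is a tmpfs on typical systemd hosts and therefore empty after every boot, is how a reboot is told apart from a plain kubelet restart. The function name, marker path and the "skip anything still mounted" safety check are illustrative assumptions, not the StarlingX implementation.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, NOT the StarlingX kubelet-cleanup-orphaned-volumes.sh.
# A marker file under /run (tmpfs on typical systemd hosts, so it is gone
# after every boot) distinguishes the first kubelet start after a reboot from
# a plain "systemctl restart kubelet", which must not delete anything.

# Usage: cleanup_orphaned_volumes [kubelet_root] [marker_file] [mounts_file]
cleanup_orphaned_volumes() {
  local root="${1:-/var/lib/kubelet}"
  local marker="${2:-/run/kubelet-volume-cleanup.done}"
  local mounts="${3:-/proc/mounts}"
  local dir

  if [[ -e "$marker" ]]; then
    echo "kubelet restarted without a reboot, nothing removed"
    return 0
  fi

  # volumes/ and volume-subpaths/ of every pod directory left from the last boot.
  for dir in "$root"/pods/*/volumes "$root"/pods/*/volume-subpaths; do
    [[ -d "$dir" ]] || continue
    # Safety net: skip any tree that still has a mount point inside it
    # (field 2 of the mounts table is the mount point).
    if awk -v p="$dir/" 'index($2, p) == 1 {f = 1} END {exit !f}' "$mounts"; then
      echo "skipping $dir: still has mounts"
      continue
    fi
    echo "removing $dir"
    rm -rf -- "$dir"
  done

  # Later kubelet starts in this boot see the marker and leave everything alone.
  touch -- "$marker"
}
```

Called without arguments from the `ExecStartPre=` line named in the test plan, it would clean up only on the first kubelet start after boot, which matches the "restart does not delete" check.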
@yuqitao

yuqitao commented Nov 18, 2023

Could we take kubernetes/kubernetes#121937 into consideration to make VolumeManager reconstruction more robust? @jsafrane

@jsafrane
Member Author

/milestone v1.30
/label lead-opted-in

@k8s-ci-robot k8s-ci-robot added this to the v1.30 milestone Jan 23, 2024
@jsafrane jsafrane added the lead-opted-in Denotes that an issue has been opted in to a release label Jan 23, 2024
@jsafrane
Member Author

Could we take kubernetes/kubernetes#121937 into consideration to make VolumeManager reconstruction more robust? @jsafrane

I think that reconstruction of global directories is worth its own feature + feature gate. Otherwise we would need to go back to alpha with NewVolumeManagerReconstruction.

@jsafrane
Member Author

jsafrane commented Jan 23, 2024

Downgrade/upgrade test report

I tested installation of 1.28.5 (the feature is enabled there) -> downgrade kubelet to 1.27.9 -> upgrade kubelet back to 1.28.5 using a vanilla cluster installed by kops. The feature is limited to kubelet, so I did not downgrade the API server or KCM.

No issues found: the new/old kubelet can read /var/lib/kubelet written by the old/new kubelet just fine and clean up its volumes.
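For a concrete picture of what "reading /var/lib/kubelet" involves: when kubelet cannot find a pod in the API server, it has to rebuild its state from the pod volume directories (`/var/lib/kubelet/pods/<pod UID>/volumes/<plugin>/<volume>`, as in the `mount` output in this report) together with the mount table. The following is a hypothetical shell illustration, not kubelet's Go code, that lists those directories and whether each one appears in the mount table; the function name and output format are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical illustration, NOT kubelet code: print one line
# "<pod UID> <plugin> <volume> mounted|unmounted" per volume directory found
# under <kubelet root>/pods/<pod UID>/volumes/<plugin>/<volume>.

# Usage: list_pod_volumes [kubelet_root] [mounts_file]
list_pod_volumes() {
  local root="${1:-/var/lib/kubelet}" mounts="${2:-/proc/mounts}"
  local dir rest uid plugin volume state
  for dir in "$root"/pods/*/volumes/*/*; do
    [[ -d "$dir" ]] || continue
    rest="${dir#"$root"/pods/}"            # <uid>/volumes/<plugin>/<volume>
    uid="${rest%%/*}"
    plugin="${rest#*/volumes/}"
    plugin="${plugin%%/*}"
    volume="${dir##*/}"
    # CSI volumes are mounted at <volume dir>/mount; other plugins typically
    # use the volume directory itself as the mount point, if they mount at all.
    if awk -v a="$dir" -v b="$dir/mount" '$2 == a || $2 == b {f = 1} END {exit !f}' "$mounts"; then
      state=mounted
    else
      state=unmounted
    fi
    printf '%s %s %s %s\n' "$uid" "$plugin" "$volume" "$state"
  done
}
```

Run on the worker node as root, `list_pod_volumes` would show the three `pvc-...` CSI volumes from the StatefulSet as mounted before the downgrade.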

  • $ shows a command on a client machine that has a valid $KUBECONFIG.
  • [root@i-067d188457e418795 ~]# shows a command that runs on a worker node of a cluster.
  1. Install Kubernetes 1.28.5 on AWS via kops + cordon the master (to force all pods onto the single worker). The feature is enabled by default there.
$ kubectl get node
NAME                  STATUS                     ROLES           AGE     VERSION
i-047a75412be9c4c78   Ready,SchedulingDisabled   control-plane   14m     v1.28.5
i-067d188457e418795   Ready                      node            9m43s   v1.28.5
  1. On the node, download kubelet 1.27.9 to /usr/local/bin/kubelet-old
[root@i-067d188457e418795 ~]# curl -L https://dl.k8s.io/v1.27.9/bin/linux/amd64/kubelet -o /usr/local/bin/kubelet-old
[root@i-067d188457e418795 ~]# chmod 755 /usr/local/bin/kubelet-old
[root@i-067d188457e418795 ~]# /usr/local/bin/kubelet-old --version
Kubernetes v1.27.9
  1. Run a StatefulSet with 3 replicas, one PVC each. All of them will run on the single worker node.
$ kubectl get pod -o wide
NAME     READY   STATUS    RESTARTS   AGE     IP             NODE                  NOMINATED NODE   READINESS GATES
test-0   1/1     Running   0          5m36s   100.96.1.83    i-067d188457e418795   <none>           <none>
test-1   1/1     Running   0          5m36s   100.96.1.125   i-067d188457e418795   <none>           <none>
test-2   1/1     Running   0          5m36s   100.96.1.97    i-067d188457e418795   <none>           <none>
  1. Notice what AWS EBS volumes are mounted on the node
[root@i-067d188457e418795 ~]# mount | grep nvme
/dev/nvme1n1 on /var/lib/kubelet/plugins/kubernetes.io/csi/ebs.csi.aws.com/7a4cb7998eeb7e14975a26affeb2aa6af76bb0a10f47cd1f130381f9f5dde9ff/globalmount type ext4 (rw,relatime,seclabel)
/dev/nvme3n1 on /var/lib/kubelet/plugins/kubernetes.io/csi/ebs.csi.aws.com/0b3edd46c63b582efaf67b521b8fd47aad2e4684ac762629d67d97e89794ead4/globalmount type ext4 (rw,relatime,seclabel)
/dev/nvme2n1 on /var/lib/kubelet/plugins/kubernetes.io/csi/ebs.csi.aws.com/4be4c2a6844fbab07b76071826a07476c4fc0cedc7772f4be4a2e60632dcd053/globalmount type ext4 (rw,relatime,seclabel)
/dev/nvme1n1 on /var/lib/kubelet/pods/65ecfa1f-0b7f-4504-ba9c-725b60c5c761/volumes/kubernetes.io~csi/pvc-ddd5f720-274f-4f56-b1c4-5b6c53670093/mount type ext4 (rw,relatime,seclabel)
/dev/nvme2n1 on /var/lib/kubelet/pods/6265edac-ab1f-47af-a3ba-75189cb22c99/volumes/kubernetes.io~csi/pvc-0b86ad21-5cd3-4dea-bdaf-763533f29307/mount type ext4 (rw,relatime,seclabel)
/dev/nvme3n1 on /var/lib/kubelet/pods/9a009c6b-f7a0-4375-a3cd-42d303e9216f/volumes/kubernetes.io~csi/pvc-18060273-7ec5-43a1-9441-2be55394aff0/mount type ext4 (rw,relatime,seclabel)
  1. Test downgrade: shut down kubelet, change systemd kubelet.service to kubelet-old, force-delete the pods.
[root@i-067d188457e418795 ~]# systemctl stop kubelet
[root@i-067d188457e418795 ~]# vi /usr/lib/systemd/system/kubelet.service # Change the kubelet binary to /usr/local/bin/kubelet-old
[root@i-067d188457e418795 ~]# systemctl daemon-reload
$ kubectl delete pod --all --force

Force-delete will trigger the volume reconstruction in kubelet - the newly started kubelet cannot see the deleted Pods in the API server and thus has to reconstruct its state from the OS (/proc/mounts, /var/lib/kubelet/pods/, ...).
Note that StatefulSet will re-create the Pods almost instantly, but they will have different UIDs, i.e. they are different Pods for kubelet.

[root@i-067d188457e418795 ~]# systemctl start kubelet.service
  1. Check that the new kubelet is 1.27.9:
$ kubectl get node
NAME                  STATUS                     ROLES           AGE   VERSION
i-047a75412be9c4c78   Ready,SchedulingDisabled   control-plane   19m   v1.28.5
i-067d188457e418795   Ready                      node            14m   v1.27.9
  1. Check that the 1.27 kubelet unmounted the old volumes:
[root@i-067d188457e418795 ~]# mount | egrep "65ecfa1f-0b7f-4504-ba9c-725b60c5c761|6265edac-ab1f-47af-a3ba-75189cb22c99|9a009c6b-f7a0-4375-a3cd-42d303e9216f"
<empty output>
  1. Check that the /var/lib/kubelet/pods/<uid> directories of all deleted pods were fully cleaned up:
[root@i-067d188457e418795 ~]# ls -la /var/lib/kubelet/pods/{65ecfa1f-0b7f-4504-ba9c-725b60c5c761,6265edac-ab1f-47af-a3ba-75189cb22c99,9a009c6b-f7a0-4375-a3cd-42d303e9216f}
ls: cannot access '/var/lib/kubelet/pods/65ecfa1f-0b7f-4504-ba9c-725b60c5c761': No such file or directory
ls: cannot access '/var/lib/kubelet/pods/6265edac-ab1f-47af-a3ba-75189cb22c99': No such file or directory
ls: cannot access '/var/lib/kubelet/pods/9a009c6b-f7a0-4375-a3cd-42d303e9216f': No such file or directory
  1. Test upgrade: shut down kubelet, change the systemd service back to kubelet 1.28.5, force-delete existing pods, start kubelet. Showing just results:
$ kubectl get node
NAME                  STATUS                     ROLES           AGE   VERSION
i-047a75412be9c4c78   Ready,SchedulingDisabled   control-plane   29m   v1.28.5
i-067d188457e418795   Ready                      node            24m   v1.28.5

All pod-dirs of the force-deleted pods were unmounted + deleted.
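The manual per-UID checks in the steps above (`mount | egrep` for leftover mounts, `ls` for leftover pod directories) could be scripted with a small helper like the one below. It is a hypothetical sketch, not part of the original test; the `KUBELET_ROOT` and `MOUNTS_FILE` overrides are assumptions added only so it can be tried outside a node.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original test): for each pod UID given,
# report whether anything under its pod directory is still mounted or the
# directory still exists. Returns non-zero if any pod was not cleaned up.
check_pods_cleaned() {
  local root="${KUBELET_ROOT:-/var/lib/kubelet}" mounts="${MOUNTS_FILE:-/proc/mounts}"
  local uid rc=0
  for uid in "$@"; do
    # Field 2 of the mounts table is the mount point; look for any under the pod dir.
    if awk -v p="$root/pods/$uid/" 'index($2, p) == 1 {f = 1} END {exit !f}' "$mounts"; then
      echo "$uid: still mounted"
      rc=1
    elif [[ -e "$root/pods/$uid" ]]; then
      echo "$uid: pod directory still exists"
      rc=1
    else
      echo "$uid: cleaned"
    fi
  done
  return "$rc"
}
```

On the worker node it would be called with the UIDs of the force-deleted pods, e.g. `check_pods_cleaned 65ecfa1f-0b7f-4504-ba9c-725b60c5c761 6265edac-ab1f-47af-a3ba-75189cb22c99 9a009c6b-f7a0-4375-a3cd-42d303e9216f`.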

@AnaMMedina21

AnaMMedina21 commented Feb 7, 2024

Hello @jsafrane 👋, Enhancements team here.

Just checking in as we approach enhancements freeze on 02:00 UTC Friday 9th February 2024.

This enhancement is targeting stage beta for v1.30 (correct me if otherwise).

Here's where this enhancement currently stands:

  • KEP readme using the latest template has been merged into the k/enhancements repo.
  • KEP status is marked as implementable for latest-milestone: v1.30. KEPs targeting stable will need to be marked as implemented after code PRs are merged and the feature gates are removed.
  • KEP readme has up-to-date graduation criteria
  • KEP has a production readiness review that has been completed and merged into k/enhancements. (For more information on the PRR process, check here).

For this KEP, we would just need to update the following:

  • Can you update the KEP to include the latest Scalability question, "Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?"
  • Can the kep.yaml milestones be updated to reflect milestone plans?

The status of this enhancement is marked as at risk for enhancement freeze. Please keep the issue description up-to-date with appropriate stages as well. Thank you!

@jsafrane
Copy link
Member Author

jsafrane commented Feb 8, 2024

Can you update the KEP to include the latest Scalability question, "Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?"

Can the kep.yaml milestones be updated to reflect milestone plans?

@AnaMMedina21, both were updated yesterday in #4432

@AnaMMedina21

KEP status is marked as implementable for latest-milestone: v1.30. KEPs targeting stable will need to be marked as implemented after code PRs are merged and the feature gates are removed.

@jsafrane Can we also update the kep.yaml's status to implemented, assuming feature flags have been removed? That would be the final thing missing for the enhancements freeze.

@meganwolf0

With all the requirements fulfilled this enhancement is now marked as tracked for the upcoming enhancements freeze 🚀

@xing-yang xing-yang added stage/stable Denotes an issue tracking an enhancement targeted for Stable/GA status and removed stage/beta Denotes an issue tracking an enhancement targeted for Beta status labels Feb 9, 2024
@xing-yang
Contributor

Fixed the label. It's moving from beta to GA in 1.30.

@jsafrane
Member Author

jsafrane commented Feb 9, 2024

Can we also update the kep.yaml's status to implemented, assuming feature flags have been removed? That would be the final thing missing for the enhancements freeze.

I think we marked KEPs as implemented after GA, but we might have been doing it wrong :-). I filed #4506

@kikisdeliveryservice
Member

kikisdeliveryservice commented Feb 12, 2024

Can we also update the kep.yaml's status to implemented, assuming feature flags have been removed? That would be the final thing missing for the enhancements freeze.

I think we marked KEPs as implemented after GA, but we might have been doing it wrong :-). I filed #4506

@jsafrane You are correct. The enhancement is only updated to implemented once all code, docs, etc. are merged and the entire feature is finished. Enhancements should not be marked implemented ahead of time.

@Princesso

Hello @jsafrane, 👋 1.30 Docs Shadow here.
Does this enhancement work planned for 1.30 require any new docs or modifications to existing docs?
If so, please follow the steps here to open a PR against the dev-1.30 branch in the k/website repo. This PR can be just a placeholder at this time and must be created before Thursday, February 22nd, 2024 18:00 PDT.
Also, take a look at Documenting for a release to familiarize yourself with the docs requirements for the release.
Thank you!

@a-mccarthy

Hi @jsafrane,

👋 from the v1.30 Communications Team! We'd love for you to opt in to write a feature blog about your enhancement!

We encourage blogs for features including, but not limited to: breaking changes, features and changes important to our users, and features that have been in progress for a long time and are graduating.

To opt in, you need to open a Feature Blog placeholder PR against the website repository.
The placeholder PR deadline is 27th February, 2024.
Here's the 1.30 Release Calendar

@jsafrane
Member Author

I opened a placeholder doc: kubernetes/website#45282

Projects
Status: Tracked
Status: Tracked for Enhancements Freeze
Development

No branches or pull requests