
[IMPROVEMENT] Support K8s 1.25 by updating removed deprecated resource versions like PodSecurityPolicy #4003

Closed
innobead opened this issue May 19, 2022 · 42 comments
Labels
  • area/kubernetes: Kubernetes related like K8s version compatibility
  • highlight: Important feature/issue to highlight
  • kind/improvement: Request for improvement of existing function
  • priority/0: Must be fixed in this release (managed by PO)
  • release/note: Note for installation, upgrade and highlighted issues
  • require/doc: Require updating the longhorn.io documentation
  • require/manual-test-plan: Require adding/updating manual test cases if they can't be automated

Comments

@innobead
Member

innobead commented May 19, 2022

Is your improvement request related to a feature? Please describe

PodSecurityPolicy has been deprecated and will be removed in K8s 1.25, so we need to find an alternative way to meet the need for PSP in Longhorn to support 1.25.

Also, some deprecated resource versions are removed in 1.25. We need to resolve this via #4239, and even consider backporting it to 1.3 & 1.2 in an adaptive way that determines the cluster's K8s version to decide which API resource version to use, if possible (except PSP, because it is removed entirely rather than bumped to a new version); see the sketch after the client-go note below.

  • Cronjob v1beta1 -> v1
  • EndpointSlice v1beta1 -> v1
  • Event v1beta1 -> v1
  • HorizontalPodAutoscaler v2beta1 -> v2
  • PodDisruptionBudget v1beta1 -> v1
  • PodSecurityPolicy v1beta1 -> removed (no replacement)

Note: client-go is backward compatible with any K8s version.

Compatibility: client-go <-> Kubernetes clusters
Since Kubernetes is backwards compatible with clients, older client-go versions will work with many different Kubernetes cluster versions.
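
For reference, a minimal sketch (not Longhorn's actual code) of such an adaptive check using client-go's discovery client; the function name and the PodDisruptionBudget example here are assumptions for illustration:

package main

import (
    "fmt"
    "strconv"
    "strings"

    "k8s.io/client-go/discovery"
    "k8s.io/client-go/rest"
)

// pdbAPIVersion picks the PodDisruptionBudget API version based on the
// server's minor version: policy/v1 exists since 1.21, and policy/v1beta1
// is removed in 1.25.
func pdbAPIVersion(cfg *rest.Config) (string, error) {
    dc, err := discovery.NewDiscoveryClientForConfig(cfg)
    if err != nil {
        return "", err
    }
    ver, err := dc.ServerVersion()
    if err != nil {
        return "", err
    }
    // Some distributions report the minor version with a suffix, e.g. "24+".
    minor, err := strconv.Atoi(strings.TrimSuffix(ver.Minor, "+"))
    if err != nil {
        return "", err
    }
    if minor >= 21 {
        return "policy/v1", nil
    }
    return "policy/v1beta1", nil
}

func main() {
    cfg, err := rest.InClusterConfig()
    if err != nil {
        panic(err)
    }
    v, err := pdbAPIVersion(cfg)
    if err != nil {
        panic(err)
    }
    fmt.Println("use", v, "for PodDisruptionBudget")
}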

Describe the solution you'd like

Deprecate PSP if it's not needed. Otherwise, we need an alternative solution such as migrating to Pod Security Admission: https://kubernetes.io/docs/tasks/configure-pod-container/migrate-from-psp/.
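
As a rough sketch of what that migration would look like for Longhorn (my illustration only, assuming client-go, the longhorn-system namespace, and the privileged Pod Security level since Longhorn runs privileged pods), the PSP is replaced by Pod Security Admission labels on the namespace:

package main

import (
    "context"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/rest"
)

func main() {
    cfg, err := rest.InClusterConfig()
    if err != nil {
        panic(err)
    }
    client := kubernetes.NewForConfigOrDie(cfg)

    ns, err := client.CoreV1().Namespaces().Get(context.TODO(), "longhorn-system", metav1.GetOptions{})
    if err != nil {
        panic(err)
    }
    if ns.Labels == nil {
        ns.Labels = map[string]string{}
    }
    // Equivalent to:
    //   kubectl label ns longhorn-system pod-security.kubernetes.io/enforce=privileged
    ns.Labels["pod-security.kubernetes.io/enforce"] = "privileged"
    if _, err := client.CoreV1().Namespaces().Update(context.TODO(), ns, metav1.UpdateOptions{}); err != nil {
        panic(err)
    }
}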

Describe alternatives you've considered

N/A

Additional context

https://www.kubernetes.dev/resources/release/#timeline
https://kubernetes.io/docs/reference/using-api/deprecation-guide/#v1-25
#4239

@innobead innobead added the area/kubernetes (Kubernetes related like K8s version compatibility), priority/0 (Must be fixed in this release (managed by PO)), kind/improvement (Request for improvement of existing function) and backport/1.3.1 labels May 19, 2022
@innobead innobead added this to the v1.4.0 milestone May 19, 2022
@innobead
Member Author

We need to consider backporting this.

@mattmeye

This was released in the Longhorn v1.2.5-rc1 release but not in the Longhorn v1.2.5 release. Is there an ETA for an official release?

@innobead
Member Author

This was released in the Longhorn v1.2.5-rc1 release but not in the Longhorn v1.2.5 release. Is there an ETA for an official release?

It was mentioned in the RC release note by mistake. It is planned for 1.4.0 instead, at the end of this year.

@daimoniac

Don't forget CronJobs #4490 (comment)

@innobead
Member Author

innobead commented Sep 13, 2022

Don't forget CronJobs #4490 (comment)

Thanks 👍, all deprecated resources will be taken care of together. cc @PhanLe1010

@skandragon

I managed to upgrade to Kubernetes 1.25 on several nodes before I noticed the issue, so I'm hoping a release comes out that handles that version, as rolling back isn't an option.

@mattmeye

@skandragon I have exactly the same problem. However, I cannot wait until the end of the year and unfortunately have to set up the cluster again.

@PurpleBooth

PurpleBooth commented Sep 21, 2022

It was mentioned in the RC release note by mistake. It is planned for 1.4.0 instead, at the end of this year.

Are you sure about this schedule? By the end of the year, 1.26 will be out (on Tuesday 6th December 2022).

@innobead
Member Author

We know people are eager for this, but for now the plan is still the same, because Longhorn has its own release scope and cadence.

Please keep following the updates here, and we will work on this sooner or later.

@skandragon

@innobead what is required beyond just changing the API calls? I know that would mean version 1.3 is locked to 1.25 or higher; the better option is likely to use the proper API calls depending on which version is in use, I suppose?

I know I was able to just change the v1beta1 references to v1 and it just worked as far as I could tell.
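
For reference, a minimal sketch of what that v1beta1 -> v1 change looks like for CronJobs in client-go; the kubeconfig loading, namespace, and listing logic are illustrative assumptions only:

package main

import (
    "context"
    "fmt"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/clientcmd"
)

func main() {
    cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
    if err != nil {
        panic(err)
    }
    client := kubernetes.NewForConfigOrDie(cfg)

    // Old (batch/v1beta1, removed in 1.25):
    //   client.BatchV1beta1().CronJobs("longhorn-system").List(...)
    // New (batch/v1, available since 1.21):
    jobs, err := client.BatchV1().CronJobs("longhorn-system").List(context.TODO(), metav1.ListOptions{})
    if err != nil {
        panic(err)
    }
    for _, j := range jobs.Items {
        fmt.Println(j.Name, j.Spec.Schedule)
    }
}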

A bigger, unrelated issue for me with Longhorn is that it won't run under Talos. :/

@BjoernPetersen

Just want to mention that the wording of the issue is misleading.

PodSecurityPolicy will be deprecated from K8s 1.25 [...]

PodSecurityPolicy has been deprecated since K8s 1.21 and will be removed in 1.25.

@kjaleshire

That's really unfortunate; Longhorn will be unusable with the most recent Kubernetes release for the next two months.

I encourage you to reconsider this release plan.

@dominch

dominch commented Oct 5, 2022

I just hit this issue on k3s (latest channel) and needed to revert to 1.24 (where another CoreDNS bug is present).
This should happen ASAP because 1.25 is on the way and any service based on Longhorn will fail :(

@innobead innobead added the highlight (Important feature/issue to highlight) label Oct 5, 2022
@MohammedNoureldin

MohammedNoureldin commented Oct 26, 2022

(subject to change at any time)

@PhanLe1010 I know that you mean an earlier date ;)

Maybe not a really relevant question, but adding a GitHub repo and installing this chart is not something that works out of the box, and no link online has helped so far. Could you summarize it in two lines of bash commands?

@PhanLe1010
Contributor

The bash commands would be something like:
git clone https://github.com/longhorn/longhorn.git && cd longhorn && helm install longhorn ./chart --namespace longhorn-system --create-namespace

Another alternative is kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/master/deploy/longhorn.yaml

@MohammedNoureldin

Oh, thanks! I thought it was possible to install it using Helm from a GitHub repo on the fly, but it turns out I need to clone it. I will try it now. Thanks a lot!

@dkrizic

dkrizic commented Oct 26, 2022

I just want to confirm that the Helm chart @ master works excellently on Kubernetes 1.25.3 with the default settings. I am now waiting for the 1.4.0 release.

@chriscchien
Contributor

chriscchien commented Oct 28, 2022

Verified in Longhorn master-head 777bc5 with the test steps below.
Result: Pass

cc @PhanLe1010 , @khushboo-rancher


Case 1: within a Kubernetes version

  • Test that basic recurring snapshot/backup jobs work correctly
Kubernetes Version | Pipeline Link | Verify Result
v1.25.3+k3s1 | Verified on local environment | ✔️
v1.24.7+k3s1 | https://ci.longhorn.io/job/private/job/longhorn-tests-regression/2239/ | ✔️
v1.23.1+k3s2 | https://ci.longhorn.io/job/public/job/master/job/sles/job/amd64/job/longhorn-tests-sles-amd64/292/testReport/tests/test_recurring_job/ | ✔️
v1.22.15+k3s1 | https://ci.longhorn.io/job/private/job/longhorn-tests-regression/2238/ | ✔️
v1.21.10+k3s1 | https://ci.longhorn.io/job/private/job/longhorn-tests-regression/2237/ | ✔️

  • Test support bundle generation. Make sure that the above jobs are in the /yamls/kubernetes/cronjobs folder
Kubernetes Version | Verify Result
v1.25.3+k3s1 | ✔️
v1.24.7+k3s1 | ✔️
v1.23.13+k3s1 | ✔️
v1.22.15+k3s1 | ✔️
v1.21.10+k3s1 | ✔️

  • Test upgrading Longhorn from v1.3.2 or v1.2.5 to master

Test basic functionalities (workloads, engine image upgrade, replica rebuilding)

Kubernetes Version | Upgrade from v1.3.2 | Upgrade from v1.2.5
v1.25.3+k3s1 | NA | NA
v1.24.7+k3s1 | ✔️ | ✔️
v1.23.13+k3s1 | ✔️ | ✔️
v1.22.15+k3s1 | ✔️ | ✔️
v1.21.10+k3s1 | ✔️ | ✔️

Verify CSI snapshot upgrade (by running pytest test_csi_snapshotter.py after the upgrade)

Kubernetes Version | Upgrade from v1.3.2 | Upgrade from v1.2.5
v1.25.3+k3s1 | NA | NA
v1.24.7+k3s1 | ✔️ | ✔️
v1.23.13+k3s1 | ✔️ | ✔️
v1.22.15+k3s1 | ✔️ | ✔️
v1.21.10+k3s1 | ✔️ | ✔️

  • Test zone and topology
Kubernetes Version | Verify Result
v1.25.3+k3s1 | ✔️
v1.24.7+k3s1 | ✔️
v1.23.13+k3s1 | ✔️
v1.22.15+k3s1 | ✔️
v1.21.10+k3s1 | ✔️

  • Test uninstallation
Kubernetes Version | Verify Result
v1.25.3+k3s1 | ✔️
v1.24.7+k3s1 | ✔️
v1.23.13+k3s1 | ✔️
v1.22.15+k3s1 | ✔️
v1.21.10+k3s1 | ✔️

Case 2: Test a Kubernetes version upgrade

  • Upgrade from v1.24.7+k3s1 to v1.25.3+k3s1 and verify workloads keep working properly
  1. Install Longhorn master in a Kubernetes v1.24 cluster
  2. Create some workloads, Longhorn volumes, cronjobs, etc.
  3. Upgrade Kubernetes to v1.25
  4. Verify that Longhorn is running OK and basic functionality works

@khushboo-rancher
Contributor

@chriscchien Could you just test one more scenario, deploying Longhorn on a hardened cluster?

@abderraxim

abderraxim commented Oct 29, 2022

The head chart installs perfectly and everything looks fine.
N.B.: please include the dev chart in your repository

MicroK8s 1.25.3

@nabeelshaikh7

And somehow, installing Longhorn crashed Traefik in my k3s cluster.

@chriscchien
Contributor

Hi @khushboo-rancher ,

I tried deploying Longhorn master on a hardened cluster, but the deployment did not succeed.
The longhorn-driver-deployer deployment raised an error, and most Longhorn component pods were not deployed:

[FailedCreate] pods "longhorn-driver-deployer-f9546bd9b-" is forbidden: PodSecurityPolicy: unable to admit pod: [spec.initContainers[0].securityContext.runAsUser: Invalid value: 0: running with the root UID is forbidden spec.containers[0].securityContext.runAsUser: Invalid value: 0: running with the root UID is forbidden]

@yangchiu
Member

yangchiu commented Nov 4, 2022

Hi @khushboo-rancher ,

I tried deploying Longhorn master on a hardened cluster, but the deployment did not succeed. The longhorn-driver-deployer deployment raised an error, and most Longhorn component pods were not deployed:

[FailedCreate] pods "longhorn-driver-deployer-f9546bd9b-" is forbidden: PodSecurityPolicy: unable to admit pod: [spec.initContainers[0].securityContext.runAsUser: Invalid value: 0: running with the root UID is forbidden spec.containers[0].securityContext.runAsUser: Invalid value: 0: running with the root UID is forbidden]

@chriscchien Would you mind sharing the test steps here?

@PhanLe1010
Contributor

PhanLe1010 commented Nov 4, 2022

You probably need to enable the PSP on this hardened cluster. Can you switch this field to true?

longhorn/chart/values.yaml

Lines 253 to 255 in 23d2641

# For Kubernetes < v1.25, if your cluster enables Pod Security Policy admission controller,
# set this to `true` to ship longhorn-psp which allow privileged Longhorn pods to start
enablePSP: false
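
(For a Helm install, that would be something like helm install longhorn ./chart --namespace longhorn-system --create-namespace --set enablePSP=true, assuming the chart checkout from the earlier comment.)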

@chriscchien
Contributor

chriscchien commented Nov 5, 2022

Hi @PhanLe1010

Using Helm to install Longhorn master-head with the command helm install longhorn ./chart -n longhorn-system and enablePSP set to true, I could deploy/uninstall Longhorn, and workloads worked well on the hardened cluster.

For deploying Longhorn with kubectl, I tried adding the section below to longhorn.yaml, but there is still a problem with longhorn-driver-deployer:

---
# Source: longhorn/templates/psp.yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: longhorn-psp
  labels:
    app.kubernetes.io/name: longhorn
    app.kubernetes.io/instance: longhorn
    app.kubernetes.io/version: v1.4.0-dev
spec:
  privileged: true
  allowPrivilegeEscalation: true
  requiredDropCapabilities:
  - NET_RAW
  allowedCapabilities:
  - SYS_ADMIN
  hostNetwork: false
  hostIPC: false
  hostPID: true
  runAsUser:
    rule: RunAsAny
  seLinux:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  volumes:
  - configMap
  - downwardAPI
  - emptyDir
  - secret
  - projected
  - hostPath
---

cc @khushboo-rancher, @yangchiu

@PhanLe1010
Contributor

PhanLe1010 commented Nov 5, 2022

longhorn.yaml is generated by running helm template. The default Helm value enablePSP is false, so helm template does not generate the PSP YAML into longhorn.yaml. This behavior is expected because we are moving away from PSP. However, it is a bit inconvenient for users running old hardened clusters with PSP enabled, so we have 2 options:

  1. Document this and point the user to the PSP YAML
  2. Put the PSP YAML into longhorn.yaml as a commented-out section. This method would require us to modify generate-longhorn-yaml.sh

cc @innobead Which option would you suggest from the PM perspective?

@PhanLe1010
Contributor

For deploying Longhorn with kubectl, I tried adding the section below to longhorn.yaml, but there is still a problem with longhorn-driver-deployer

You also need to bind the PSP to longhorn-service-account, i.e., you need to put this whole section into longhorn.yaml.
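
Roughly, the missing piece is RBAC granting longhorn-service-account the use verb on the longhorn-psp policy. A minimal sketch in client-go terms (the role and binding names are illustrative, not necessarily the chart's exact object names):

package main

import (
    "context"

    rbacv1 "k8s.io/api/rbac/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/rest"
)

func main() {
    cfg, err := rest.InClusterConfig()
    if err != nil {
        panic(err)
    }
    client := kubernetes.NewForConfigOrDie(cfg)
    ctx := context.TODO()

    // ClusterRole that allows "use" of the longhorn-psp PodSecurityPolicy.
    role := &rbacv1.ClusterRole{
        ObjectMeta: metav1.ObjectMeta{Name: "longhorn-psp-role"},
        Rules: []rbacv1.PolicyRule{{
            APIGroups:     []string{"policy"},
            Resources:     []string{"podsecuritypolicies"},
            Verbs:         []string{"use"},
            ResourceNames: []string{"longhorn-psp"},
        }},
    }
    // Bind it to the service account used by the Longhorn pods.
    binding := &rbacv1.ClusterRoleBinding{
        ObjectMeta: metav1.ObjectMeta{Name: "longhorn-psp-binding"},
        RoleRef: rbacv1.RoleRef{
            APIGroup: "rbac.authorization.k8s.io",
            Kind:     "ClusterRole",
            Name:     "longhorn-psp-role",
        },
        Subjects: []rbacv1.Subject{{
            Kind:      "ServiceAccount",
            Name:      "longhorn-service-account",
            Namespace: "longhorn-system",
        }},
    }
    if _, err := client.RbacV1().ClusterRoles().Create(ctx, role, metav1.CreateOptions{}); err != nil {
        panic(err)
    }
    if _, err := client.RbacV1().ClusterRoleBindings().Create(ctx, binding, metav1.CreateOptions{}); err != nil {
        panic(err)
    }
}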

@chriscchien
Contributor

chriscchien commented Nov 5, 2022

Thank you @PhanLe1010,

Deploying Longhorn on a hardened cluster with both kubectl and Helm worked well (including the install, create/scale workload, and uninstall steps).

@innobead
Member Author

innobead commented Nov 7, 2022

longhorn.yaml is generated by running helm template. The default Helm value enablePSP is false, so helm template does not generate the PSP YAML into longhorn.yaml. This behavior is expected because we are moving away from PSP. However, it is a bit inconvenient for users running old hardened clusters with PSP enabled, so we have 2 options:

  1. Document this and point the user to the PSP YAML
  2. Put the PSP YAML into longhorn.yaml as a commented-out section. This method would require us to modify generate-longhorn-yaml.sh

cc @innobead Which option would you suggest from the PM perspective?

I think we need to do both for users who apply the manifest directly with kubectl. If it becomes complicated in the future, we can use kustomize instead, but right now it's enough to have a separate manifest for < 1.25.

@PhanLe1010 Could you help with this? Thanks.

krittin added a commit to krittin/homeassistant-on-raspberrypi that referenced this issue Nov 11, 2022
axeII added a commit to axeII/home-ops that referenced this issue Nov 28, 2022
@innobead innobead added the release/note (Note for installation, upgrade and highlighted issues) label Nov 30, 2022
valtzu added a commit to valtzu/rpi-images that referenced this issue Dec 11, 2022
@chriscchien
Contributor

chriscchien commented Dec 14, 2022

In addition, I tested upgrading Longhorn from an older release to v1.4.0-rc1 and then upgrading the Kubernetes version to v1.25.
The scenarios below all passed.

In v1.24.7+k3s1

  • Upgrade Longhorn from v1.2.6 to v1.4.0-rc1, then upgrade Kubernetes to v1.25.4+k3s1
  • Upgrade Longhorn from v1.3.2 to v1.4.0-rc1, then upgrade Kubernetes to v1.25.4+k3s1

In v1.23.11+k3s1

  • Upgrade Longhorn from v1.2.6 to v1.4.0-rc1, then upgrade Kubernetes to v1.25.4+k3s1
  • Upgrade Longhorn from v1.3.2 to v1.4.0-rc1, then upgrade Kubernetes to v1.25.4+k3s1
