PSP Deprecation on Vintage #2533

Closed
4 tasks
Tracked by #1994
Rotfuks opened this issue Jun 5, 2023 · 9 comments
Assignees
Labels
area/kaas Mission: Cloud Native Platform - Self-driving Kubernetes as a Service team/turtles Team Turtles

Comments


Rotfuks commented Jun 5, 2023

Motivation

PodSecurityPolicy (PSP) is removed in Kubernetes 1.25.0 (it has been deprecated since 1.21), so we need to replace it before that upgrade. We already decided to use Kyverno, which is owned by Shield, as the replacement. Here we need to confirm that Kyverno is already rolled out everywhere and that it is safe to get rid of PSP with the upgrade.

Todo

  • Check with Shield on the state of their kyverno rollout
    • Add Kyverno and policies to Workload Clusters (default-apps?)
  • Investigate and document what effect this deprecation and new component will have for our V1.25.0 rollout
    • especially important: what do we have to keep in mind in the update process?
  • Investigate what would be a good test case to test for a successful rollout of the new PSS on 1.25.0

Related Issues

Outcome

  • We are prepared for the PSP removal that comes with the upgrade
  • We are confident our upgrade process does not break anything because of the PSP removal
@Rotfuks Rotfuks added area/kaas Mission: Cloud Native Platform - Self-driving Kubernetes as a Service team/turtles Team Turtles labels Jun 5, 2023
@primeroz primeroz self-assigned this Jun 7, 2023

primeroz commented Jun 8, 2023

Check with Shield on the state of their kyverno rollout

Kyverno is rolled out with policies in Audit mode; exceptions need to be created for all failing apps.

  • Dashboard reporting which workloads are failing which policy here
  • Phoenix is already adding exceptions for many apps - Add PSS exceptions for Azure Vintage policy warning #2493
  • We need to review what other teams are doing with their apps. To speed this up, we might just add the exceptions for those?
  • QUESTION: Is the plan to switch the Kyverno policies to ENFORCE mode once all exceptions are in place and applied to all clusters running on 1.24?
  • QUESTION: Should we render PSPs conditionally in the apps, based on the Kubernetes version?
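
For the version-based question above, a minimal sketch of such a gate in shell. It only assumes that PodSecurityPolicy can exist on clusters older than 1.25; `psp_available` is a hypothetical helper name, not something from the apps.

```shell
# psp_available VERSION
# Succeeds (exit 0) when VERSION (e.g. "v1.24.13") is older than 1.25,
# i.e. when the PodSecurityPolicy API can still exist on the cluster.
psp_available() {
  version="${1#v}"        # strip a leading "v"
  major="${version%%.*}"  # text before the first dot
  rest="${version#*.}"    # text after the first dot
  minor="${rest%%.*}"     # text before the second dot (or the rest)
  [ "$major" -eq 1 ] && [ "$minor" -lt 25 ]
}

# Usage (hypothetical):
# if psp_available "v1.24.13"; then echo "render PSP"; else echo "skip PSP"; fi
```

A check on the served API rather than on the version number avoids hard-coding the cut-off, which is the direction the capabilities check in the later comments takes.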

Overview of Apps Turtles need to fix

Based on This Dashboard for the set of workloads owned by Turtles defined in This Doc


primeroz commented Jun 12, 2023

Investigate and document what effect this deprecation and new component will have for our V1.25.0 rollout

  • We currently only deploy Kyverno in the Management Clusters (MC); this is true for both CAPI and Vintage
    • We need to install it on Workload Clusters (WC), alongside the policies, to replace PSP with it
    • Vintage - 19.1.0 will ship Kyverno in audit mode, 20 will ship it in enforce mode
    • CAPI - Not in progress yet - Shield will work on it after Vintage
    • We should not add any Kyverno exception to the main branch of any app we install on WCs, or the app will fail to install (because the Kyverno CRDs are missing)
  • We need to make sure we have updated all our apps with either exceptions or fixes to manifests
    • TODO: What about customer-managed workloads?
  • Upgrading a cluster to 1.25 right now requires


primeroz commented Jun 15, 2023

PSP Conditional in helm charts

from : https://github.com/search?q=org%3Agiantswarm+%22kind%3A+PodSecurityPolicy%22&type=code

Apps, Charts and operators

Customers clusters


primeroz commented Jun 15, 2023

ISSUE Check that the capabilities check is actually working

Before moving on, I wanted to check that the {{- if .Capabilities.APIVersions.Has "policy/v1beta1/PodSecurityPolicy" }} condition from https://github.com/giantswarm/node-exporter-app/pull/202/files is working.

So I:

  • Created a WC cluster on glippy with 1.24.13 - all apps were installed including node-exporter
  • Upgraded the cluster to 1.25.10
  • Removed the checksum for node-exporter to trigger a reinstall

It is still rendering the PSP ... so the check is failing for some reason
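
To rule out the cluster side, one can check whether the API server still serves PodSecurityPolicy at all. A minimal sketch, assuming `kubectl api-resources -o name` prints one `<plural>.<group>` entry per line; `psp_api_served` is a hypothetical helper, and any arguments are passed through to kubectl as global flags.

```shell
# psp_api_served [KUBECTL_FLAGS...]
# Succeeds when the API server still lists PodSecurityPolicy in the "policy" group.
psp_api_served() {
  kubectl "$@" api-resources --api-group=policy -o name \
    | grep -qx 'podsecuritypolicies.policy'
}

# Usage (kubeconfig path taken from the commands in this comment):
# if psp_api_served --kubeconfig /dev/shm/cluster05.kubeconfig; then
#   echo "PSP API still served"
# else
#   echo "PSP API is gone"
# fi
```

If this reports the API as gone while the release still fails, the problem is more likely in what Helm has stored for the release than in the template condition.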

➜ kubectl --kubeconfig /dev/shm/cluster05.kubeconfig get node | grep control-plane
cluster05-control-plane-4f2a33d4-bfzmv   Ready    control-plane   28m   v1.25.10
cluster05-control-plane-4f2a33d4-jf5pb   Ready    control-plane   33m   v1.25.10
cluster05-control-plane-4f2a33d4-qtzwg   Ready    control-plane   38m   v1.25.10

➜ kubectl --kubeconfig /dev/shm/cluster05.kubeconfig get chart -n giantswarm node-exporter -o yaml | yq .status.reason
Deployment failed for `1.16.1` with `unknown-error`: "unable to build kubernetes objects from current release manifest: resource mapping not found for name: \"node-exporter-node-exporter\" namespace: \"kube-system\" from \"\": no matches for kind \"PodSecurityPolicy\" in version \"policy/v1beta1\"\nensure CRDs are installed first"

This happens even if I try to upgrade the Helm release myself:

➜ helm --kubeconfig /dev/shm/cluster05.kubeconfig upgrade --dry-run node-exporter -n kube-system . | cat 
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /dev/shm/cluster05.kubeconfig
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /dev/shm/cluster05.kubeconfig
Error: UPGRADE FAILED: unable to build kubernetes objects from current release manifest: resource mapping not found for name: "node-exporter-node-exporter" namespace: "kube-system" from "": no matches for kind "PodSecurityPolicy" in version "policy/v1beta1"
ensure CRDs are installed first

Does the capabilities check only work on install?

I think this is because the Helm release was installed on 1.24, so its stored manifest still references the PodSecurityPolicy:

➜ helm --kubeconfig /dev/shm/cluster05.kubeconfig get all -n kube-system node-exporter 2>/dev/null | grep -i podsecuritypolicy
kind: PodSecurityPolicy

So even with the capabilities check the upgrade fails: the API is no longer available, so Helm can't compare the current release with the new one.

Apparently this is a known issue: https://stackoverflow.com/a/74204169

Helm can't create a diff patch because the Go client can no longer parse the removed API.
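
The grep above can be turned into a small reusable check for finding affected releases. This is a sketch only; `manifest_has_psp` is a hypothetical helper that reads a rendered manifest on stdin.

```shell
# manifest_has_psp
# Reads a manifest on stdin; succeeds if any object in it is a PodSecurityPolicy.
manifest_has_psp() {
  grep -Eq '^kind:[[:space:]]*PodSecurityPolicy[[:space:]]*$'
}

# Usage (release and namespace taken from the commands above):
# helm get manifest -n kube-system node-exporter | manifest_has_psp \
#   && echo "stored release still contains a PSP"
```

Running this against every release before the upgrade would show which charts still need a PSP-free release stored while the cluster is on 1.24.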


The official solution is https://helm.sh/docs/topics/kubernetes_apis/#updating-api-versions-of-a-release-manifest

  • download the release secret
  • edit it
  • update the secret

This has to be done for every chart in every cluster that has a PSP rendered.
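
The decode and encode halves of those steps can be sketched as two shell filters. This assumes Helm's default Secret storage, where `.data.release` holds the release JSON gzipped and base64-encoded twice, and GNU `base64` (for `-w0`). The function names and the Secret name in the usage comment are illustrative; in particular the revision suffix `v1` is an assumption.

```shell
# decode_release: stdin is the value of .data.release from the Helm release
# Secret; stdout is the release JSON.
decode_release() {
  base64 -d | base64 -d | gzip -dc
}

# encode_release: inverse of decode_release; output has no line wrapping.
encode_release() {
  gzip -c | base64 -w0 | base64 -w0
}

# Usage (Secret name is illustrative):
# kubectl get secret -n kube-system sh.helm.release.v1.node-exporter.v1 \
#   -o jsonpath='{.data.release}' | decode_release > release.json
# ... edit release.json ...
# encode_release < release.json   # value to write back into .data.release
```

Since PSP has no successor API version, the edit step here would presumably mean removing the PodSecurityPolicy object from the stored manifest rather than changing its apiVersion.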

Maybe we can get chart-operator to do some magic when it detects that a cluster is on 1.25, since there is also a Helm plugin for this:
https://github.com/helm/helm-mapkubeapis. Asking @giantswarm/team-honeybadger?
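
A minimal sketch of how the plugin could be driven per release, previewing with `--dry-run` before touching the stored release. `map_release_apis` is a hypothetical wrapper, and whether the plugin's default mapping file covers PodSecurityPolicy should be checked against the plugin version in use.

```shell
# map_release_apis NAMESPACE RELEASE
# Previews the rewrite with --dry-run, then applies it only if the preview succeeds.
map_release_apis() {
  helm mapkubeapis "$2" --namespace "$1" --dry-run \
    && helm mapkubeapis "$2" --namespace "$1"
}

# One-time plugin install:
# helm plugin install https://github.com/helm/helm-mapkubeapis
#
# Usage:
# map_release_apis kube-system node-exporter
```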

Otherwise the only other solution is to remove PSP before we upgrade to 1.25.

@primeroz

I am moving this to blocked since

  • we need to wait for Kyverno to be installed and enforced in all WCs

Once that is done and deployed everywhere, we can proceed with the removal of PSP in 1.24 clusters and then start testing the upgrade to 1.25.

@Rotfuks Rotfuks changed the title PSP Deprecation PSP Deprecation on Vintage Jun 27, 2023

primeroz commented Jul 11, 2023

As discussed in Slack, we will focus on Vintage and not worry about CAPI for now. This makes the task easier; it can be summarized as follows

for Vintage MC and WC

  • First, we have a release with Kyverno in 'audit' mode
  • Second, we have a release with Kyverno in 'enforce' mode
    • In this release we also remove the PSPs from the apps that are part of the release
  • We handle the apps that are not installed as part of the release
  • We upgrade to 1.25

I will put this in blocked until we are ready to start removing PSPs.


stone-z commented Jul 31, 2023

The compatibility matrix lays out the current understanding of when to use PSP vs Kyverno: https://github.com/giantswarm/security-bundle#compatibility-matrix

Basically, GS v20 == k8s v1.25 == kyverno in enforce mode, use security bundle v1.0.0 or above
GS <v20 == k8s <v1.25 == kyverno in audit mode, use security bundle v0.X.X whenever the team is ready


Rotfuks commented Aug 3, 2023

@stone-z, @alex-dabija, @T-Kukawka and I had a meeting about the PSP migration plan. Here are the notes:

  • We need to put PSS into enforce mode before we remove PSP
  • We need to remove PSP before we upgrade to K8s v1.25.0
  • We have a Grafana dashboard that shows whether PSP resources are removed and PSS is in the right mode, so it can serve as an “are you K8s v1.25.0 ready?” dashboard
    • But there are some extra apps that we’re not really seeing in that dashboard
    • It’s not the nicest way to get there, because it requires port forwarding into the WC where it’s installed - @stone-z will look into how we can improve this
  • We could block the V20 release rollout on two requirements: PSP has to be removed, and PSS has to be in enforce mode without open issues. @stone-z will look into how we could do that.
  • We either need to break our rule of only changing the Kubernetes version in major releases (K8s v1.25.0 would go into V20), or break our rule of having no breaking changes in minor releases.
  • We agreed on the following approach
    • V19.1.0 with Kyverno in audit mode
    • V19.2.0 with Kyverno in enforce mode: we drop PSP from our (default) apps, customers should drop their PSPs in this version, Kyverno audits PSPs, and we need to educate customers not to create new PSPs.
    • V20 will be blocked until PSPs are completely removed, and will roll out K8s v1.25.0
    • We can use the MCs to test the “No-PSP” policy and upgrade them directly to K8s v1.25.0
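
The two blocking conditions from these notes can be sketched as a command-line gate, as an alternative to the port-forwarded dashboard. This is an assumption-laden sketch: it assumes Kyverno ClusterPolicies expose their mode in `spec.validationFailureAction`, it is meant to run against a 1.24 cluster, and `ready_for_k8s_125` is a hypothetical name. An empty policy list passes, so it does not prove that Kyverno is installed.

```shell
# ready_for_k8s_125 [KUBECTL_FLAGS...]
# Succeeds when no PSP objects remain and no Kyverno ClusterPolicy is in a
# mode other than enforce.
ready_for_k8s_125() {
  # Any remaining PSP object blocks the upgrade.
  if kubectl "$@" get podsecuritypolicies -o name | grep -q .; then
    echo "blocked: PSP objects still exist"
    return 1
  fi
  # Policies without a validationFailureAction print an empty line and are skipped.
  if kubectl "$@" get clusterpolicies.kyverno.io \
       -o jsonpath='{range .items[*]}{.spec.validationFailureAction}{"\n"}{end}' \
     | grep -v '^$' | grep -qiv '^enforce$'; then
    echo "blocked: Kyverno policies not in enforce mode"
    return 1
  fi
  echo "ready"
}

# Usage:
# ready_for_k8s_125 --kubeconfig /path/to/wc.kubeconfig
```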

@njuettner

Closing the issue for now. From the Turtles side, we have updated all apps/components to prepare for PSS.


5 participants