[RFE] Add support for PodSecurityAdmissionConfigurationTemplate #1112

Closed
mouellet opened this issue May 1, 2023 · 14 comments

Comments

@mouellet
Contributor

mouellet commented May 1, 2023

Is your feature request related to a problem? Please describe.

Rancher v2.7.2 now comes w/ a new CRD for Pod Security Admission (PSA) Configuration Templates that can only be provisioned manually.

Describe the solution you'd like

  1. Add a new resource/datasource for rancher2_pod_security_admission_configuration_template.
  2. Add the field defaultPodSecurityAdmissionConfigurationTemplateName to the rancher2_cluster and rancher2_cluster_v2 resources.

Describe alternatives you've considered

Additional context

SURE-6290

@lazyfrosch
Contributor

lazyfrosch commented May 8, 2023

Is anyone working on this? I noticed that setting the profile cis-1.23 via the API does not set the PSACT to rancher-restricted. That default seems to be applied by the UI only.

So updating to 1.25 would require setting kubernetes_version, rke_config.machine_selector_config.config.profile and, once implemented, default_pod_security_admission_configuration_template_name.

Without a CIS profile it will work of course, but the RKE2 default under cis-1.23 will block Rancher add-ons from being deployed.

@nickvth

nickvth commented May 25, 2023

Q3? That will be too late. Can this issue be moved to Q2?

@frankbou

frankbou commented May 30, 2023

I configured a PSA template manually on an RKE1 test cluster by using the Rancher UI.
That RKE test cluster was previously created by using Terraform.

After the PSA template change, the next time I ran "terraform apply -refresh-only", I got an error returned by Terraform.

Error: rke_config.0.services.0.kube_api.0.admission_configuration.plugins: '' expected type 'string', got unconvertible type '[]interface {}', value: '[map[configuration:map[apiVersion:pod-security.admission.config.k8s.io/v1beta1 defaults:map[audit:restricted audit-version:latest enforce:restricted enforce-version:latest warn:restricted warn-version:latest] exemptions:map[namespaces:[calico-apiserver calico-system cattle-alerting cattle-csp-adapter-system cattle-epinio-system cattle-externalip-system cattle-fleet-local-system cattle-fleet-system cattle-gatekeeper-system cattle-global-data cattle-global-nt cattle-impersonation-system cattle-istio cattle-istio-system cattle-logging cattle-logging-system cattle-monitoring-system cattle-neuvector-system cattle-prometheus cattle-sriov-system cattle-system cattle-ui-plugin-system cattle-windows-gmsa-system cert-manager cis-operator-system fleet-default ingress-nginx istio-system kube-node-lease kube-public kube-system longhorn-system rancher-alerting-drivers security-scan tigera-operator]] kind:PodSecurityConfiguration] name:PodSecurity path:]]'
with module.cluster.module.rke.rancher2_cluster.this,
on ../../modules/rancher/cluster/main.tf line 1, in resource "rancher2_cluster" "this":
1: resource "rancher2_cluster" "this" {

I think this means that if you configure a PSA template on your cluster manually by using the Rancher UI, it may cause issues with existing Terraform deployments.
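
To make the mismatch in the error above easier to see, here is a rough Python sketch of the value it reports, trimmed to two exempt namespaces. The key names are copied from the error message. The helper fits_string_map is hypothetical and only models a field that accepts a flat map of strings; it is not the provider's actual decoding code.

```python
# Shape of the value reported in the error, abbreviated to two exempt
# namespaces. Key names are copied from the error message.
plugins = [
    {
        "name": "PodSecurity",
        "path": "",
        "configuration": {
            "apiVersion": "pod-security.admission.config.k8s.io/v1beta1",
            "kind": "PodSecurityConfiguration",
            "defaults": {
                "enforce": "restricted",
                "enforce-version": "latest",
                "audit": "restricted",
                "audit-version": "latest",
                "warn": "restricted",
                "warn-version": "latest",
            },
            "exemptions": {"namespaces": ["cattle-system", "kube-system"]},
        },
    }
]


def fits_string_map(attrs):
    """Return True if attrs is a dict whose values are all plain strings.

    Hypothetical helper: it models a map-of-strings field, not the
    provider's real schema handling.
    """
    return isinstance(attrs, dict) and all(
        isinstance(value, str) for value in attrs.values()
    )


# What the cluster reports once a PSA template has been applied.
admission_configuration = {"plugins": plugins}

print(fits_string_map({}))                       # empty map: fits
print(fits_string_map(admission_configuration))  # list under "plugins": does not fit
```

The empty map that Terraform stored before the change fits a map of strings, while the nested list under plugins does not, which matches the "expected type 'string', got unconvertible type '[]interface {}'" wording.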

@nickvth

nickvth commented Jul 6, 2023

There is a pull request, #1119. Maybe someone can review it.

@papanito

papanito commented Jul 6, 2023

This is really urgent for me because it basically blocks us from upgrading AND using Terraform, especially since ignore_changes cannot be used in this case.

The underlying type is not supported, so Terraform fails to parse the admission_configuration from Rancher when reading the current cluster state, before processing any ignore_changes.

So we either have to wait for a fix or remove the clusters from the Terraform state and manage them manually again.

@a-blender
Contributor

a-blender commented Jul 19, 2023

QA Test Template

RKE is ready to test and RKE2 is still being worked on. Test the RKE case using this test plan and TF rc v3.1.0-rc5.

Critical test case: please also verify that leaving the admission_configuration field unset on the earlier version of the Terraform provider and then upgrading to v3.1.0-rc5 does not throw a schema error. After upgrading to the latest provider, you should see this change in the terraform.tfstate file:

tfstate on previous tfp version:

...
"kube_api": [
                      {
                        "admission_configuration": {},
...

tfstate with updated tfp version:

...
"kube_api": [
                      {
                        "admission_configuration": [],
...

Verify no errors whatsoever are seen.

Doing this will verify that the state migration logic to change the admission_configuration PSA field from a Type.Map to a complex type does not cause regressions for users with clusters that were provisioned using an earlier version of Terraform. Reach out with any questions!
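
The tfstate comparison above could be checked with a small script instead of by eye. The following is a minimal Python sketch, not part of the test plan or the provider: the function names are made up for illustration, and the two inline JSON strings stand in for the old and new terraform.tfstate contents. It walks the whole document rather than assuming a fixed layout, so it reports every admission_configuration key it finds.

```python
import json


def find_admission_configuration(node, path="$"):
    """Yield (path, value) for every "admission_configuration" key in node."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key == "admission_configuration":
                yield child, value
            else:
                yield from find_admission_configuration(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_admission_configuration(item, f"{path}[{index}]")


def describe(value):
    """Classify the stored value: old map form, new list form, or other."""
    if isinstance(value, dict):
        return "map (pre-upgrade form)"
    if isinstance(value, list):
        return "list (post-upgrade form)"
    return f"unexpected {type(value).__name__}"


# Inline stand-ins for the two state files shown in the comment. A real
# check would load terraform.tfstate with json.load instead.
old_state = json.loads('{"kube_api": [{"admission_configuration": {}}]}')
new_state = json.loads('{"kube_api": [{"admission_configuration": []}]}')

for label, state in (("old", old_state), ("new", new_state)):
    for path, value in find_admission_configuration(state):
        print(label, path, describe(value))
```

After the upgrade and a refresh, every reported entry should be in the list form; a remaining map entry would suggest the state migration did not run for that resource.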

@Josh-Diamond
Contributor

Josh-Diamond commented Jul 20, 2023

Ticket #1112 - Test Results - ✅ - [for RKE implementation only]

Verified w/ HA Helm on Rancher v2.7-95f0b50ddf387c0d98a211f0217b34207348502d-head:

| Scenario | Test Case | Result |
| --- | --- | --- |
| 1 | Provisioned downstream RKE 1.24 cluster w/ tfp-rancher2 v1.24.1 => configure PSA => upgrade tfp-rancher2 to local build of v3.1.0-rc5 => refactor main.tf file to accommodate admission_configuration block => terraform refresh => ensure no errors | ✅ |
| 2 | Provisioned downstream RKE 1.26 cluster w/ tfp-rancher2 v1.24.1 => configure PSA => upgrade tfp-rancher2 to local build of v3.1.0-rc5 => refactor main.tf file to accommodate admission_configuration block => terraform refresh => ensure no errors | ✅ |
| 3 | Import downstream RKE 1.24 cluster w/ admission_configuration set w/ tfp-rancher2 v1.24.1 => upgrade tfp-rancher2 to local build of v3.1.0-rc5 => refactor main.tf file to accommodate admission_configuration block => terraform refresh => ensure no errors | ✅ |
| 4 | Import downstream RKE 1.26 cluster w/ admission_configuration set w/ tfp-rancher2 v1.24.1 => upgrade tfp-rancher2 to local build of v3.1.0-rc5 => refactor main.tf file to accommodate admission_configuration block => terraform refresh => ensure no errors | ✅ |
| 5 | Provisioned downstream RKE 1.26 cluster w/ tfp-rancher2 v1.24.1 => upgrade tfp-rancher2 to local build of v3.1.0-rc5 => terraform refresh => ensure no errors | ✅ |

Scenario 1 -

  1. Fresh install of Rancher v2.7-head
  2. Using tfp-rancher2 v1.24.1, provision a downstream RKE 1.24 cluster
  3. Once active, configure PSA - [steps outlined here]
  4. Once active, create a new namespace and attempt to deploy a workload
  5. Verified - workload failed due to restricted policy; as expected
  6. Upgrade tfp-rancher2 to use local build of v3.1.0-rc5
  7. Refactor main.tf to use RFE admission_configuration block under resource rancher2_cluster > rke_config > services > kube_api
  8. Refresh terraform by running terraform apply -refresh-only
  9. Verified - terraform successfully refreshes; tf.state file as expected;

Scenario 2 -

  1. Fresh install of Rancher v2.7-head
  2. Using tfp-rancher2 v1.24.1, provision a downstream RKE 1.26 cluster
  3. Once active, configure PSA - [steps outlined here]
  4. Once active, create a new namespace and attempt to deploy a workload
  5. Verified - workload failed due to restricted policy; as expected
  6. Upgrade tfp-rancher2 to use local build of v3.1.0-rc5
  7. Refactor main.tf to use RFE admission_configuration block under resource rancher2_cluster > rke_config > services > kube_api
  8. Refresh terraform by running terraform apply -refresh-only
  9. Verified - terraform successfully refreshes; tf.state file as expected;

Scenario 3 -

  1. Fresh install of Rancher v2.7-head
  2. Spin up a standalone RKE 1.24 cluster w/ admission_configuration set in cluster.yml - [I used rke v1.4.7]
  3. Once active, use tfp-rancher2 v1.24.1 to import the cluster into Rancher
  4. View Cluster Explorer and create a new namespace + deployment
  5. Verified - deployment fails due to restricted policy; as expected
  6. Upgrade tfp-rancher2 to use local build of v3.1.0-rc5
  7. Refresh terraform by running terraform apply -refresh-only
  8. Verified - terraform successfully refreshes; tf.state file as expected

Scenario 4 -

  1. Fresh install of Rancher v2.7-head
  2. Spin up a standalone RKE 1.26 cluster w/ admission_configuration set in cluster.yml - [I used rke v1.4.7]
  3. Once active, use tfp-rancher2 v1.24.1 to import the cluster into Rancher
  4. View Cluster Explorer and create a new namespace + deployment
  5. Verified - deployment fails due to restricted policy; as expected
  6. Upgrade tfp-rancher2 to use local build of v3.1.0-rc5
  7. Refresh terraform by running terraform apply -refresh-only
  8. Verified - terraform successfully refreshes; tf.state file as expected

Scenario 5 -

  1. Fresh install of Rancher v2.7-head
  2. Using tfp-rancher2 v1.24.1, provision a downstream RKE 1.26 cluster
  3. Once active, upgrade tfp-rancher2 to use local build of v3.1.0-rc5 - [terraform init -upgrade]
  4. Refresh terraform by running terraform apply -refresh-only
  5. Verified - terraform successfully refreshes; tf.state file as expected

@Josh-Diamond
Contributor

Josh-Diamond commented Jul 20, 2023

RKE test cases pass 🎉

Will close out this ticket once RKE2 has been tested + validated

lazyfrosch added a commit to lazyfrosch/terraform-provider-rancher2 that referenced this issue Jul 21, 2023

@a-blender
Contributor

a-blender commented Jul 24, 2023

@Josh-Diamond New rc v3.1.0-rc6 has been cut for PSA on rke2 clusters, waiting on build. Will add a test template shortly, stand by

@a-blender
Contributor

a-blender commented Jul 24, 2023

QA Test Template

@Josh-Diamond Please test this issue using this test plan on both an RKE2 1.25 and 1.26 cluster. It might also be a good idea to provision a 1.24 cluster with Terraform, upgrade it to 1.25, and then define the PSA template and kube-apiserver-arg in an additional apply. PSACT is supported in 1.25+ clusters, and we want to make sure there are no issues with old clusters provisioned with Terraform.

@Josh-Diamond
Contributor

Ticket #1112 - Test Results - ✅ - [for RKE2/K3s implementation]

Verified with HA Helm on Rancher v2.7-dc105f0827ef217e2b4ff0f99f48c998d62aefce-head:

| Scenario | Test Case | Result |
| --- | --- | --- |
| 1 | Provision a downstream RKE2 v1.26 cluster w/ PSACT configured | ✅ |
| 2 | Provision a downstream K3s v1.25 cluster w/ PSACT configured | ✅ |
| 3 | Provision a downstream RKE2 v1.24 cluster => upgrade k8s version to v1.25 => then configure PSA | ✅ |

Scenario 1 -

  1. Fresh install of Rancher v2.7-head
  2. Using tfp-rancher2 v3.1.0-rc6, provision a downstream RKE2 (AWS) Node driver cluster v1.26.6+rke2r1, with PSACT set to rancher-restricted
  3. Once active, navigate to Cluster Explorer > Projects/Namespaces and create a new namespace, then deploy a workload under that newly created namespace
  4. Verified - workload fails to deploy; forbidden error due to restricted policy; as expected
  5. Add a new node
  6. Verified - node successfully added + active; tf does not try to reconcile changes to kube-apiserver-arg (which is set by mutating webhook)
  7. Update PSACT from rancher-restricted to rancher-privileged
  8. Verified - cluster successfully updates
  9. Navigate to Cluster Explorer > Projects/Namespaces and create a new namespace, then deploy a workload under that newly created namespace
  10. Verified - workload successfully deploys; as expected

Scenario 2 -

  1. Fresh install of Rancher v2.7-head
  2. Using tfp-rancher2 v3.1.0-rc6, provision a downstream K3s (AWS) Node driver cluster v1.25.11+k3s1, with PSACT set to rancher-restricted
  3. Once active, navigate to Cluster Explorer > Projects/Namespaces and create a new namespace, then deploy a workload under that newly created namespace
  4. Verified - workload fails to deploy; forbidden error due to restricted policy; as expected
  5. Add a new node
  6. Verified - node successfully added + active; tf does not try to reconcile changes to kube-apiserver-arg (which is set by mutating webhook)
  7. Update PSACT from rancher-restricted to rancher-privileged
  8. Verified - cluster successfully updates
  9. Navigate to Cluster Explorer > Projects/Namespaces and create a new namespace, then deploy a workload under that newly created namespace
  10. Verified - workload successfully deploys; as expected

Scenario 3 -

  1. Fresh install of Rancher v2.7-head
  2. Using tfp-rancher2 v3.1.0-rc6, provision a downstream RKE2 (AWS) Node driver cluster v1.24.15+rke2r1
  3. Once active, using tfp-rancher2 v3.1.0-rc6, upgrade the cluster's k8s version to v1.25.11+rke2r1
  4. Once active, configure PSACT and set it to rancher-restricted
  5. Once active, navigate to Cluster Explorer > Projects/Namespaces and create a new namespace, then deploy a workload under that newly created namespace
  6. Verified - workload fails to deploy; forbidden error due to restricted policy; as expected
  7. Add a new node
  8. Verified - node successfully added + active; tf does not try to reconcile changes to kube-apiserver-arg (which is set by mutating webhook)
  9. Update PSACT from rancher-restricted to rancher-privileged
  10. Verified - cluster successfully updates
  11. Navigate to Cluster Explorer > Projects/Namespaces and create a new namespace, then deploy a workload under that newly created namespace
  12. Verified - workload successfully deploys; as expected
  13. Create a custom PSACT - (use baseline for admission control mode)
  14. Update PSACT from rancher-privileged to the custom PSACT created in step 13
  15. Verified - cluster successfully updates to custom PSACT; as expected

@Josh-Diamond
Contributor

In addition to the test cases outlined above, I've verified that w/ a downstream RKE cluster, you are able to successfully update the PSA using tfp-rancher2 v3.1.0-rc6 (e.g. privileged => restricted)

@Josh-Diamond
Contributor

All test cases pass 🎉 closing out this issue

@a-blender
Contributor

a-blender commented Aug 3, 2023

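Per the requested RFE, here is the current status:

- [x] Add the defaultPodSecurityAdmissionConfigurationTemplateName field to the cluster resources
- [ ] Add a new resource/datasource for rancher2_pod_security_admission_configuration_template - this will be addressed in #1189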

This means PSACT support for RKE and RKE2 clusters is currently available in Terraform: a user can configure an admission_configuration policy in RKE or set the default_pod_security_admission_configuration_template_name for any cluster type. However, due to the timeline within which this feature was requested, a new template resource still needs to be implemented.

If you wish to configure PSACT in Terraform with a custom admission configuration template, an easy workaround is to create the template in Rancher and then set the template name in your Terraform config file. Plans to implement the new resource will be logged in a separate issue.
