
feat: add PSACT to cluster_v2 resource and data-source #1117

Merged
merged 1 commit into from Jul 24, 2023

Conversation

lazyfrosch
Contributor

@lazyfrosch lazyfrosch commented May 10, 2023

Issue: #1112 (partial implementation for RKE2)

Feature

Implement setting the default PSACT using Terraform.

Solution

Add a new property that can be set via Terraform, similar to the existing default PSP setting.

Testing

resource "rancher2_cluster_v2" "cluster" {
  name = "test-cluster"

  kubernetes_version = "v1.25.9+rke2r1"

  default_pod_security_admission_configuration_template_name = "rancher-restricted"

  rke_config {
    machine_selector_config {
      config = {
        profile = "cis-1.23"

        protect-kernel-defaults = true
      }
    }
  }
}
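
Since this PR also adds the attribute to the cluster_v2 data source, here is a minimal sketch of reading the value back (assuming the data source exposes the same attribute name as the resource):

data "rancher2_cluster_v2" "cluster" {
  # Look up the cluster created above by name
  name = rancher2_cluster_v2.cluster.name
}

output "default_psact" {
  value = data.rancher2_cluster_v2.cluster.default_pod_security_admission_configuration_template_name
}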

Engineering Testing

Manual Testing

Automated Testing

QA Testing Considerations

Regressions Considerations

None expected; when the new attribute is unset, Rancher < 2.7.2 should be unaffected.

@lazyfrosch
Contributor Author

Please let me know if I can do anything else to contribute.

@mouellet
Contributor

@lazyfrosch, would you mind implementing this for the rancher2_cluster resource as well?

@lazyfrosch
Contributor Author

I could look into it; it should be an easy change as well. I'd love to get some feedback from the maintainers first, though.

@lazyfrosch
Contributor Author

Looks like setting PSACT will cause problems with machine_global_config: enabling PSACT makes rancher-webhook amend kube-apiserver-arg with the admission config file for the policy.

Currently working on a workaround. It might be time to switch machine_global_config to a structured data type instead of a YAML string (a hypothetical sketch of that shape follows the plan output below).

      ~ rke_config {
          ~ machine_global_config = <<-EOT
                cni: calico
              - kube-apiserver-arg:
              - - admission-control-config-file=/etc/rancher/rke2/config/rancher-psact.yaml
                tls-san:
                - k8s-test-something.example.com
            EOT
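
Purely for illustration, a hypothetical sketch of what a structured machine_global_config could look like (this is not the provider's current schema; the attribute is a YAML-encoded string today, and the block and attribute names below are made up):

rke_config {
  machine_global_config {                  # hypothetical block type, not the real schema
    cni                = "calico"
    tls_san            = ["k8s-test-something.example.com"]
    kube_apiserver_arg = []                # could then be treated as computed/merged by the provider
  }
}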

@lazyfrosch
Contributor Author

A workaround for our module would be something like the following: adding the path to the PSACT config file to kube-apiserver-arg whenever PSACT is enabled.

This at least avoids constant changes being proposed on every plan.

Any thoughts?

locals {
  # see https://github.com/rancher/webhook/blob/3cefe89c44f502905b63200d875486d3590b081e/pkg/resources/provisioning.cattle.io/v1/cluster/mutator.go#L160
  rancher_psact_mount_path = "/etc/rancher/rke2/config/rancher-psact.yaml"

  # Default parameter to enable PSACT in RKE2
  # Normally this is set by the rancher-webhook mutation (but we need to set it here for Terraform when a PSACT is selected)
  kube_apiserver_arg_psact = var.default_psa_template != null && var.default_psa_template != "" ? ["admission-control-config-file=${local.rancher_psact_mount_path}"] : []

  kube_apiserver_arg = var.kubernetes_apiserver_use_defaults ? concat(local.kube_apiserver_arg_psact, var.kubernetes_apiserver_args) : var.kubernetes_apiserver_args
}

# ...

resource "rancher2_cluster_v2" "cluster" {
  # ...
  rke_config {
    machine_global_config = yamlencode({
      cni = "calico"
      # disable = [
      #   "rke2-ingress-nginx"
      # ]
      kube-apiserver-arg = local.kube_apiserver_arg
      tls-san = [
        module.vip_control_plane_address.fqdn,
      ]
    })
  }
  # ...
}

@a-blender
Contributor

a-blender commented Jul 19, 2023

@lazyfrosch Thanks for opening this! This has already been implemented for the rancher2_cluster resource in PR #1119 so what you have now lgtm.

Did you ever find a workaround for Terraform displaying constant changes with machine_global_config? If we change it to a complex type, I'd recommend that be a PR we test separately. Could you also resolve the test conflicts?

@lazyfrosch
Contributor Author

@a-blender I'm doing a workaround in setting kube-apiserver-arg as seen above in our Terraform module for k8s clusters.

The change is made by a MutatingWebhook shipped with the rancher-webhook deployment. It is not easy to implement this properly within the provider.

@a-blender
Contributor

a-blender commented Jul 19, 2023

@lazyfrosch Ahh, so the problem is that when PSACT is enabled for a v2 prov cluster via Terraform, Rancher webhook will auto-add kube-apiserver-arg with the file for the PSACT policy, which causes Terraform to always see changes? Yeah, I doubt it's feasible to change how PSACT works on the Rancher backend. We just want TF to have as much parity with that as we can.

@lazyfrosch
Contributor Author

@a-blender exactly, Terraform would try to revert it, while the MutatingWebhook keeps editing it back.

Part of the problem is the string/YAML format of that field, which makes it hard to ignore changes to individual elements.

@a-blender a-blender requested review from a-blender, jakefhyde, jiaqiluo and a team and removed request for jiaqiluo and jakefhyde July 19, 2023 15:42
@a-blender
Contributor

a-blender commented Jul 19, 2023

@lazyfrosch Yeah, I'm looking at the lifecycle ignore_changes meta-argument, but I don't think we can ignore a single key within a string attribute (see the sketch below). And given the multi-structural nature of machine_global_config, we may want to keep the type as encoded YAML because we don't know which subfields will be a string vs. a list.
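
For illustration, a minimal sketch of that limitation (path syntax assumed for current Terraform versions): the finest granularity ignore_changes offers here is the whole machine_global_config string, not the single kube-apiserver-arg key inside the encoded YAML.

resource "rancher2_cluster_v2" "cluster" {
  # ...
  rke_config {
    machine_global_config = yamlencode({
      cni = "calico"
    })
  }

  lifecycle {
    # This silences every change to the encoded YAML string, including
    # edits we do care about; there is no way to target only the
    # kube-apiserver-arg entry that the webhook injects.
    ignore_changes = [
      rke_config[0].machine_global_config,
    ]
  }
}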

Honestly, the more I investigate this the more I like your solution. I think it'd be fine to require setting kube-apiserver-arg as a local var as part of the PSACT configuration.

Discussing options with my team.

@a-blender
Contributor

a-blender commented Jul 20, 2023

I confirmed with the following, less verbose configuration that when PSACT is set, Terraform doesn't try to apply changes to kube-apiserver-arg.

locals {
  psact_mount_path = "/etc/rancher/rke2/config/rancher-psact.yaml"
  kube_apiserver_arg = ["admission-control-config-file=${local.psact_mount_path}"]
}

resource "rancher2_cluster_v2" "ablender-rke2" {
  # ...
  default_pod_security_admission_configuration_template_name = var.default_psa_template
  rke_config {
    machine_global_config = yamlencode({
      kube-apiserver-arg = local.kube_apiserver_arg
    })
  }
}

Terraform plan (screenshot omitted)

I will open a docs PR for this.

Otherwise lgtm, @lazyfrosch can you please resolve the test file conflicts?

@lazyfrosch
Contributor Author

Will rebase it later... the conflict-resolution editor on the web is not that good 🤣

@lazyfrosch
Contributor Author

Rebased

@a-blender
Contributor

a-blender commented Jul 21, 2023

@lazyfrosch Amazing thanks. Could you give more details on what/how you tested this? Did you verify that deploying a workload functions properly with rancher-restricted set via TF?

@lazyfrosch
Contributor Author

Yeah, just create a new namespace and a deployment; it will fail without proper security context settings (see the sketch below).

Migration from 1.24 can be a bit tricky because of the PSP-to-PSACT transition, but this is not a Rancher-specific problem.

When you are on 1.24 and enable PSA, both mechanisms will check the resources being created.
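
For reference, a minimal sketch of that check expressed in Terraform (it assumes the hashicorp/kubernetes provider is pointed at the new downstream cluster via a hypothetical var.downstream_kubeconfig_path; plain kubectl works just as well): a Deployment without a restricted-compliant security context should have its Pods rejected by the rancher-restricted policy, so the rollout never completes.

provider "kubernetes" {
  config_path = var.downstream_kubeconfig_path # hypothetical variable
}

resource "kubernetes_namespace" "psa_test" {
  metadata {
    name = "psa-test"
  }
}

resource "kubernetes_deployment" "psa_test" {
  metadata {
    name      = "psa-test"
    namespace = kubernetes_namespace.psa_test.metadata[0].name
  }
  spec {
    replicas = 1
    selector {
      match_labels = { app = "psa-test" }
    }
    template {
      metadata {
        labels = { app = "psa-test" }
      }
      spec {
        container {
          name  = "nginx"
          image = "nginx:1.25"
          # No securityContext (runAsNonRoot, seccompProfile, dropped
          # capabilities), so the restricted policy denies the Pods and
          # the apply fails waiting for the rollout.
        }
      }
    }
  }
}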

@a-blender
Contributor

@lazyfrosch Gotcha, I'll test on my end as well. Can you update the Testing section of the PR with your test steps in addition to the tf config file?

@lazyfrosch
Contributor Author

Will do on Monday

@a-blender
Contributor

@lazyfrosch No worries, we are trying to get this merged today for the 2.7.5 TF release, so I'll test PSA functionality on my end and post my results here. Thanks for all your help.

@a-blender
Contributor

a-blender commented Jul 21, 2023

Test Template

Test steps

  • Provision an RKE2 1.26 cluster via Terraform with PSACT enabled
  • Verify cluster and nodes come up active
  • Verify spec has psa template name set to rancher-restricted
  • Verify creating new deployment in the default ns fails
  • Add 1 node -> terraform apply
  • Verify tf does not try to reconcile changes to kube-apiserver-arg that is set by the mutating webhook.
main.tf
terraform {
  required_providers {
    rancher2 = {
      source = "terraform.local/local/rancher2"
      version = "3.1.0-rc6"
    }
  }
}

provider "rancher2" {
  api_url   = var.rancher_api_url 
  token_key = var.rancher_admin_bearer_token
  insecure  = true
}

locals {
  rancher_psact_mount_path = "/etc/rancher/rke2/config/rancher-psact.yaml"

  # Default parameter to enable PSACT in RKE2
  # Normally this is set by rancher-webhook (but we need to set it here for Terraform when PSACT is enabled)
  kube_apiserver_arg = var.default_psa_template != null && var.default_psa_template != "" ? ["admission-control-config-file=${local.rancher_psact_mount_path}"] : []
}

# Create amazonec2 cloud credential
resource "rancher2_cloud_credential" "foo" {
  name = "foo"
  amazonec2_credential_config {
    access_key = var.aws_access_key
    secret_key = var.aws_secret_key
  }
}

# Create amazonec2 machine config v2
resource "rancher2_machine_config_v2" "foo" {
  generate_name = "ablender-machine"
  amazonec2_config {
    ami            = var.aws_ami
    region         = var.aws_region
    security_group = [var.aws_security_group_name]
    subnet_id      = var.aws_subnet_id
    vpc_id         = var.aws_vpc_id
    zone           = var.aws_zone_letter
    root_size      = var.aws_root_size
  }
}

# Create a new rancher v2 amazonec2 RKE2 Cluster v2
resource "rancher2_cluster_v2" "ablender-rke2" {
  name = var.rke2_cluster_name
  kubernetes_version = "v1.26.6+rke2r1"
  enable_network_policy = false
  default_cluster_role_for_project_members = "user"
  default_pod_security_admission_configuration_template_name = var.default_psa_template
  rke_config {
    machine_global_config = yamlencode({
      cni = "calico"
      disable-kube-proxy = false
      etcd-expose-metrics = false
      kube-apiserver-arg = local.kube_apiserver_arg
    })
    machine_pools {
      name = "pool1"
      cloud_credential_secret_name = rancher2_cloud_credential.foo.id
      control_plane_role = true
      etcd_role = true
      worker_role = true
      quantity = 1
      machine_config {
        kind = rancher2_machine_config_v2.foo.kind
        name = rancher2_machine_config_v2.foo.name
      }
    }
  }
}

Cluster is active (screenshot omitted)

PSA template name is set correctly in the spec (screenshot omitted)

Deployment fails as expected (screenshot omitted)

All test cases pass without the CIS profile being explicitly set via Terraform, but according to the docs, setting CIS is not required to verify that PSA functions correctly. This lgtm (with a separate PR to update the docs, but I'd rather get this in now so it can be tested).

@a-blender a-blender requested review from a team and jiaqiluo July 21, 2023 22:28
Member

@jiaqiluo jiaqiluo left a comment

LGTM

@jiaqiluo jiaqiluo requested a review from a team July 21, 2023 23:31
@a-blender a-blender merged commit 4642c36 into rancher:master Jul 24, 2023
1 check passed