
[RFE] New Fields for Cluster Agent Customization feature #1097

Closed
snasovich opened this issue Mar 31, 2023 · 3 comments

Comments

@snasovich
Collaborator

Implement the TFP-side changes needed to support the functionality added by rancher/rancher#41035.

@a-blender
Contributor

a-blender commented Jun 22, 2023

QA Test Template

Issue: #1097

Problem

Cluster/Fleet agent deployment customization is not supported by the rancher2 Terraform provider (TFP).

Solution

Add cluster_agent_deployment_customization and fleet_agent_deployment_customization blocks (with nested fields for tolerations, affinity, and resource requirements) to the rancher2 provider so users can define those values for RKE, RKE2/K3s, and EKS clusters.

Testing

Test RC: v3.1.0-rc2
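
If the RC is published to the Terraform Registry under rancher/rancher2 (an assumption; the configs below instead use a locally built provider from terraform.local/local/rancher2), a config can be pointed at it by pinning the version in required_providers:

terraform {
  required_providers {
    rancher2 = {
      source  = "rancher/rancher2"
      version = "3.1.0-rc2"
    }
  }
}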

Engineering Testing

This feature has been tested on rke and v2 prov (EKS) clusters by provisioning with both cluster and fleet agent deployment customization, updating the customization, removing it, and adding it back, and verifying that the cluster and agent redeploys behave as expected.

Manual Testing

Test plan

  • rke single all-role node + agent customization
    • remove agent customization, verify removed
    • add it back, verify customization is added
    • update the customization, verify it is updated in the v3 cluster spec and the agent deployment YAML
  • rke2 (v2 prov) single all-role node + agent customization
    • same as above (a sketch of the equivalent rancher2_cluster_v2 config appears after the eks config below)
  • eks 2 node + agent customization
    • same as above

Note: the cluster may not visibly enter the Updating state in the UI for added/updated agent customization or other minor updates, because the redeploy is fast.

main.tf.rke_agent_customization
terraform {
  required_providers {
    rancher2 = {
      source  = "terraform.local/local/rancher2"
      version = "1.0.0"
    }
  }
}

provider "rancher2" {
  api_url   = var.rancher_api_url
  token_key = var.rancher_admin_bearer_token
  insecure  = true
}

data "rancher2_cloud_credential" "rancher2_cloud_credential" {
  name = var.cloud_credential_name
}

resource "rancher2_cluster" "ablender-rke" {
  name = var.rke_cluster_name
  rke_config {
    kubernetes_version = "v1.26.4-rancher2-1"
    network {
      plugin = var.rke_network_plugin
    }
    services {
      etcd {
        creation = "6h"
        retention = "72h"
      }
    }
    ingress {
      default_backend = "false"
    }
  }
  cluster_auth_endpoint {}
  cluster_agent_deployment_customization {
    append_tolerations {
      effect = "NoSchedule"
      key    = "tolerate/control-plane-test"
      value  = "true"
    }
    override_affinity = <<EOF
{
  "nodeAffinity": {
    "requiredDuringSchedulingIgnoredDuringExecution": {
      "nodeSelectorTerms": [{
        "matchExpressions": [{
          "key": "beta.kubernetes.io/os",
          "operator": "NotIn",
          "values": [
            "windows"
          ]
        }]
      }]
    }
  }
}
EOF
    override_resource_requirements {
      cpu_limit      = "800m"
      cpu_request    = "500m"
      memory_limit   = "800Mi"
      memory_request = "500Mi"
    }
  }
  fleet_agent_deployment_customization {
    append_tolerations {
      effect = "NoSchedule"
      key    = "tolerate/worker-test"
      value  = "true"
    }
    override_affinity = <<EOF
{
  "nodeAffinity": {
    "requiredDuringSchedulingIgnoredDuringExecution": {
      "nodeSelectorTerms": [{
        "matchExpressions": [{
          "key": "not.this/nodepool",
          "operator": "In",
          "values": [
            "true"
          ]
        }]
      }]
    }
  }
}
EOF
    override_resource_requirements {
      cpu_limit      = "800m"
      cpu_request    = "500m"
      memory_limit   = "800Mi"
      memory_request = "500Mi"
    }
  }
}

resource "rancher2_node_template" "rancher2_node_template" {
  name = var.rke_node_template_name
  amazonec2_config {
    access_key     = var.aws_access_key
    secret_key     = var.aws_secret_key
    region         = var.aws_region
    ami            = var.aws_ami
    security_group = [var.aws_security_group_name]
    subnet_id      = var.aws_subnet_id
    vpc_id         = var.aws_vpc_id
    zone           = var.aws_zone_letter
    root_size      = var.aws_root_size
    instance_type  = var.aws_instance_type
  }
}

resource "rancher2_node_pool" "pool1" {
  cluster_id       = rancher2_cluster.ablender-rke.id
  name             = "pool1"
  hostname_prefix  = "tf-pool1-"
  node_template_id = rancher2_node_template.rancher2_node_template.id
  quantity         = 1
  control_plane    = true
  etcd             = true 
  worker           = true 
}
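
The var.* references in main.tf.rke_agent_customization above assume a variables file along the lines of the sketch below (the eks config has an analogous set). Variable names are taken from the config; types and defaults here are only placeholders.

variables.tf.rke_agent_customization (sketch)
# Placeholder variable declarations for the rke config above; adjust defaults as needed.
variable "rancher_api_url" {}
variable "rancher_admin_bearer_token" { sensitive = true }
variable "cloud_credential_name" {}
variable "rke_cluster_name" {}
variable "rke_network_plugin" { default = "canal" }
variable "rke_node_template_name" {}
variable "aws_access_key" { sensitive = true }
variable "aws_secret_key" { sensitive = true }
variable "aws_region" { default = "us-east-2" }
variable "aws_ami" {}
variable "aws_security_group_name" {}
variable "aws_subnet_id" {}
variable "aws_vpc_id" {}
variable "aws_zone_letter" { default = "a" }
variable "aws_root_size" { default = 32 }
variable "aws_instance_type" { default = "t3a.medium" }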
main.tf.eks_agent_customization
terraform {
  required_providers {
    rancher2 = {
      source  = "terraform.local/local/rancher2"
      version = "1.0.0"
    }
  }
}

provider "rancher2" {
  api_url   = var.rancher_api_url
  token_key = var.rancher_admin_bearer_token
  insecure  = true
}

resource "rancher2_cloud_credential" "foo" {
  name = "foo"
  amazonec2_credential_config {
    access_key = var.aws_access_key
    secret_key = var.aws_secret_key
  }
}

resource "rancher2_cluster" "foo" {
  name = var.eks_name
  description = "Terraform EKS cluster"
  eks_config_v2 {
    cloud_credential_id = rancher2_cloud_credential.foo.id
    region              = var.eks_region
    kubernetes_version  = var.eks_kube_version
    subnets             = var.eks_subnets
    security_groups     = var.eks_security_groups
    node_groups {
      name = "node_group1"
      instance_type = var.eks_instance_type
      desired_size  = var.eks_desired_size
      max_size      = var.eks_max_size
      min_size      = var.eks_min_size
    }
    private_access = true
    public_access  = true
  }
  cluster_agent_deployment_customization {
    append_tolerations {
      effect = "NoSchedule"
      key    = "tolerate/worker-test"
      value  = "true"
    }
    override_affinity = <<EOF
{
  "nodeAffinity": {
    "requiredDuringSchedulingIgnoredDuringExecution": {
      "nodeSelectorTerms": [{
        "matchExpressions": [{
          "key": "not.this/nodepool",
          "operator": "In",
          "values": [
            "true"
          ]
        }]
      }]
    }
  }
}
EOF
    override_resource_requirements {
      cpu_limit      = "800m"
      cpu_request    = "500m"
      memory_limit   = "800Mi"
      memory_request = "500Mi"
    }
  }
  fleet_agent_deployment_customization {
    append_tolerations {
      effect = "NoSchedule"
      key    = "tolerate/worker-test"
      value  = "true"
    }
    override_affinity = <<EOF
{
  "nodeAffinity": {
    "requiredDuringSchedulingIgnoredDuringExecution": {
      "nodeSelectorTerms": [{
        "matchExpressions": [{
          "key": "not.this/nodepool",
          "operator": "In",
          "values": [
            "true"
          ]
        }]
      }]
    }
  }
}
EOF
    override_resource_requirements {
      cpu_limit      = "800m"
      cpu_request    = "500m"
      memory_limit   = "800Mi"
      memory_request = "500Mi"
    }
  }
}
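
No config was attached for the rke2 (v2 prov) case in the test plan, so here is a minimal sketch of how the same customization blocks attach to rancher2_cluster_v2. It assumes the same terraform/provider blocks as the configs above; the cluster name and Kubernetes version are placeholders, and it is shown as a custom cluster to keep it short (a node driver setup additionally needs a rancher2_machine_config_v2 and machine_pools).

main.tf.rke2_agent_customization (sketch)
# Minimal rancher2_cluster_v2 sketch for the rke2 (v2 prov) test case.
# Block and field names follow the rke/eks examples above; values are illustrative.
resource "rancher2_cluster_v2" "rke2_agent_customization" {
  name               = "tf-rke2-agent-customization"
  kubernetes_version = "v1.26.4+rke2r1"

  cluster_agent_deployment_customization {
    append_tolerations {
      effect = "NoSchedule"
      key    = "tolerate/control-plane-test"
      value  = "true"
    }
    override_resource_requirements {
      cpu_limit      = "800m"
      cpu_request    = "500m"
      memory_limit   = "800Mi"
      memory_request = "500Mi"
    }
  }

  fleet_agent_deployment_customization {
    append_tolerations {
      effect = "NoSchedule"
      key    = "tolerate/worker-test"
      value  = "true"
    }
  }
}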

Automated Testing

Automated structure tests were updated for the rancher2_cluster and rancher2_cluster_v2 resources and sub-resources, and new test files were added for

  • toleration_v2 resource
  • resource_requirement resource
  • resource_requirement_v2 resource

Run go clean -testcache && go test -v ./rancher2 and verify all tests pass.

QA Testing Considerations

Regression Considerations

TF agent customization support touches the create/update/delete paths of all rancher2_cluster (rke) and rancher2_cluster_v2 (RKE2/K3s and hosted provider) clusters. Agent customization can also be added to AKS/GKE v2 clusters via Terraform, so that may be worth checking, but EKS is the priority. Please make sure to check rke2 (my local setup was failing for unrelated reasons during testing) as well as an upgrade case.

@zube
zube bot commented Jun 23, 2023

thaneunsoo said:

Test Environment:
Rancher version: v2.7-head 0bcf068
Rancher cluster type: HA
Docker version: 20.10

Downstream cluster type: rke2 aws node driver cluster using terraform


Testing:

Tested this issue with the following steps:

  1. Create rke2 aws node driver cluster on v2.7-head using terraform
  2. Add the new fields for cluster agent customization feature
  • clusterAgentDeploymentCustomization
    • append_tolerations
    • override_affinity
    • override_resource_requirements
      • cpu_limit
      • cpu_request
      • memory_limit
      • memory_request
  • fleetAgentDeploymentCustomization
    • append_tolerations
    • override_affinity
    • override_resource_requirements
      • cpu_limit
      • cpu_request
      • memory_limit
      • memory_request
  3. Apply plan

Result
Terraform picks up the new fields and they are applied in the cluster.
