
[RFE] New Fields for Cluster Agent Customization feature #1097

Closed
snasovich opened this issue Mar 31, 2023 · 3 comments

Comments

@snasovich
Collaborator

Implement the TFP-side changes needed to support the functionality added by rancher/rancher#41035.

@a-blender
Contributor

a-blender commented Jun 22, 2023

QA Test Template

Issue: #1097

Problem

Cluster/Fleet agent deployment customization is not supported by the rancher2 Terraform provider (TFP).

Solution

Add cluster_agent_deployment_customization and fleet_agent_deployment_customization blocks (with nested fields for tolerations, affinity, and resource requirements) to the rancher2 provider so users can define those values for RKE, RKE2/K3s, and EKS clusters.

Testing

Test RC: v3.1.0-rc2
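
If the RC is published to the Terraform Registry under rancher/rancher2 (an assumption; the configs below instead use a locally built provider from terraform.local/local/rancher2), a config can be pointed at it by pinning the version in required_providers:

terraform {
  required_providers {
    rancher2 = {
      source  = "rancher/rancher2"
      version = "3.1.0-rc2"
    }
  }
}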

Engineering Testing

This feature has been tested on rke and v2 prov (EKS) clusters by provisioning with both cluster and fleet agent deployment customization, updating the customization, removing it, and adding it back, and verifying that the cluster and agent redeploys behave as expected.

Manual Testing

Test plan

  • rke single all-role node + agent customization
    • remove agent customization, verify removed
    • add it back, verify customization is added
    • update the customization, verify it is updated in the v3 cluster spec and the agent deployment YAML
  • rke2 (v2 prov) single all-role node + agent customization
    • same as above (a sketch of the equivalent rancher2_cluster_v2 config appears after the eks config below)
  • eks 2 node + agent customization
    • same as above

Note: the cluster may not visibly enter the Updating state in the UI for added/updated agent customization or other minor updates, because the redeploy is fast.

main.tf.rke_agent_customization
terraform {
  required_providers {
    rancher2 = {
      source  = "terraform.local/local/rancher2"
      version = "1.0.0"
    }
  }
}

provider "rancher2" {
  api_url   = var.rancher_api_url
  token_key = var.rancher_admin_bearer_token
  insecure  = true
}

data "rancher2_cloud_credential" "rancher2_cloud_credential" {
  name = var.cloud_credential_name
}

resource "rancher2_cluster" "ablender-rke" {
  name = var.rke_cluster_name
  rke_config {
    kubernetes_version = "v1.26.4-rancher2-1"
    network {
      plugin = var.rke_network_plugin
    }
    services {
      etcd {
        creation = "6h"
        retention = "72h"
      }
    }
    ingress {
      default_backend = "false"
    }
  }
  cluster_auth_endpoint {}
  cluster_agent_deployment_customization {
    append_tolerations {
      effect = "NoSchedule"
      key    = "tolerate/control-plane-test"
      value  = "true"
    }
    override_affinity = <<EOF
{
  "nodeAffinity": {
    "requiredDuringSchedulingIgnoredDuringExecution": {
      "nodeSelectorTerms": [{
        "matchExpressions": [{
          "key": "beta.kubernetes.io/os",
          "operator": "NotIn",
          "values": [
            "windows"
          ]
        }]
      }]
    }
  }
}
EOF
    override_resource_requirements {
      cpu_limit      = "800m"
      cpu_request    = "500m"
      memory_limit   = "800Mi"
      memory_request = "500Mi"
    }
  }
  fleet_agent_deployment_customization {
    append_tolerations {
      effect = "NoSchedule"
      key    = "tolerate/worker-test"
      value  = "true"
    }
    override_affinity = <<EOF
{
  "nodeAffinity": {
    "requiredDuringSchedulingIgnoredDuringExecution": {
      "nodeSelectorTerms": [{
        "matchExpressions": [{
          "key": "not.this/nodepool",
          "operator": "In",
          "values": [
            "true"
          ]
        }]
      }]
    }
  }
}
EOF
    override_resource_requirements {
      cpu_limit      = "800m"
      cpu_request    = "500m"
      memory_limit   = "800Mi"
      memory_request = "500Mi"
    }
  }
}

resource "rancher2_node_template" "rancher2_node_template" {
  name = var.rke_node_template_name
  amazonec2_config {
    access_key     = var.aws_access_key
    secret_key     = var.aws_secret_key
    region         = var.aws_region
    ami            = var.aws_ami
    security_group = [var.aws_security_group_name]
    subnet_id      = var.aws_subnet_id
    vpc_id         = var.aws_vpc_id
    zone           = var.aws_zone_letter
    root_size      = var.aws_root_size
    instance_type  = var.aws_instance_type
  }
}

resource "rancher2_node_pool" "pool1" {
  cluster_id       = rancher2_cluster.ablender-rke.id
  name             = "pool1"
  hostname_prefix  = "tf-pool1-"
  node_template_id = rancher2_node_template.rancher2_node_template.id
  quantity         = 1
  control_plane    = true
  etcd             = true 
  worker           = true 
}
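
The var.* references in main.tf.rke_agent_customization above assume a variables file along the lines of the sketch below (the eks config has an analogous set). Variable names are taken from the config; types and defaults here are only placeholders.

variables.tf.rke_agent_customization (sketch)
# Placeholder variable declarations for the rke config above; adjust defaults as needed.
variable "rancher_api_url" {}
variable "rancher_admin_bearer_token" { sensitive = true }
variable "cloud_credential_name" {}
variable "rke_cluster_name" {}
variable "rke_network_plugin" { default = "canal" }
variable "rke_node_template_name" {}
variable "aws_access_key" { sensitive = true }
variable "aws_secret_key" { sensitive = true }
variable "aws_region" { default = "us-east-2" }
variable "aws_ami" {}
variable "aws_security_group_name" {}
variable "aws_subnet_id" {}
variable "aws_vpc_id" {}
variable "aws_zone_letter" { default = "a" }
variable "aws_root_size" { default = 32 }
variable "aws_instance_type" { default = "t3a.medium" }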
main.tf.eks_agent_customization
terraform {
  required_providers {
    rancher2 = {
      source  = "terraform.local/local/rancher2"
      version = "1.0.0"
    }
  }
}

provider "rancher2" {
  api_url   = var.rancher_api_url
  token_key = var.rancher_admin_bearer_token
  insecure  = true
}

resource "rancher2_cloud_credential" "foo" {
  name = "foo"
  amazonec2_credential_config {
    access_key = var.aws_access_key
    secret_key = var.aws_secret_key
  }
}

resource "rancher2_cluster" "foo" {
  name = var.eks_name
  description = "Terraform EKS cluster"
  eks_config_v2 {
    cloud_credential_id = rancher2_cloud_credential.foo.id
    region              = var.eks_region
    kubernetes_version  = var.eks_kube_version
    subnets             = var.eks_subnets
    security_groups     = var.eks_security_groups
    node_groups {
      name = "node_group1"
      instance_type = var.eks_instance_type
      desired_size  = var.eks_desired_size
      max_size      = var.eks_max_size
      min_size      = var.eks_min_size
    }
    private_access = true
    public_access  = true
  }
  cluster_agent_deployment_customization {
    append_tolerations {
      effect = "NoSchedule"
      key    = "tolerate/worker-test"
      value  = "true"
    }
    override_affinity = <<EOF
{
  "nodeAffinity": {
    "requiredDuringSchedulingIgnoredDuringExecution": {
      "nodeSelectorTerms": [{
        "matchExpressions": [{
          "key": "not.this/nodepool",
          "operator": "In",
          "values": [
            "true"
          ]
        }]
      }]
    }
  }
}
EOF
    override_resource_requirements {
      cpu_limit      = "800m"
      cpu_request    = "500m"
      memory_limit   = "800Mi"
      memory_request = "500Mi"
    }
  }
  fleet_agent_deployment_customization {
    append_tolerations {
      effect = "NoSchedule"
      key    = "tolerate/worker-test"
      value  = "true"
    }
    override_affinity = <<EOF
{
  "nodeAffinity": {
    "requiredDuringSchedulingIgnoredDuringExecution": {
      "nodeSelectorTerms": [{
        "matchExpressions": [{
          "key": "not.this/nodepool",
          "operator": "In",
          "values": [
            "true"
          ]
        }]
      }]
    }
  }
}
EOF
    override_resource_requirements {
      cpu_limit      = "800m"
      cpu_request    = "500m"
      memory_limit   = "800Mi"
      memory_request = "500Mi"
    }
  }
}
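
No config was attached for the rke2 (v2 prov) case in the test plan, so here is a minimal sketch of how the same customization blocks attach to rancher2_cluster_v2. It assumes the same terraform/provider blocks as the configs above; the cluster name and Kubernetes version are placeholders, and it is shown as a custom cluster to keep it short (a node driver setup additionally needs a rancher2_machine_config_v2 and machine_pools).

main.tf.rke2_agent_customization (sketch)
# Minimal rancher2_cluster_v2 sketch for the rke2 (v2 prov) test case.
# Block and field names follow the rke/eks examples above; values are illustrative.
resource "rancher2_cluster_v2" "rke2_agent_customization" {
  name               = "tf-rke2-agent-customization"
  kubernetes_version = "v1.26.4+rke2r1"

  cluster_agent_deployment_customization {
    append_tolerations {
      effect = "NoSchedule"
      key    = "tolerate/control-plane-test"
      value  = "true"
    }
    override_resource_requirements {
      cpu_limit      = "800m"
      cpu_request    = "500m"
      memory_limit   = "800Mi"
      memory_request = "500Mi"
    }
  }

  fleet_agent_deployment_customization {
    append_tolerations {
      effect = "NoSchedule"
      key    = "tolerate/worker-test"
      value  = "true"
    }
  }
}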

Automated Testing

Automated structure tests were updated for the rancher2_cluster and rancher2_cluster_v2 resources and sub-resources, and new test files were added for

  • toleration_v2 resource
  • resource_requirement resource
  • resource_requirement_v2 resource

Run go clean -testcache && go test -v ./rancher2 and verify all tests pass.

QA Testing Considerations

Regression Considerations

TF agent customization support touches the create/update/delete paths of all rancher2_cluster (rke) and rancher2_cluster_v2 (RKE2/K3s and hosted provider) clusters. Agent customization can also be added to AKS/GKE v2 clusters via Terraform, so that may be worth checking, but EKS is the priority. Please make sure to check rke2 (my local setup was failing for unrelated reasons during testing) as well as an upgrade case.

@zube
zube bot commented Jun 23, 2023

thaneunsoo said:

Test Environment:
Rancher version: v2.7-head 0bcf068
Rancher cluster type: HA
Docker version: 20.10

Downstream cluster type: rke2 aws node driver cluster using terraform


Testing:

Tested this issue with the following steps:

  1. Create rke2 aws node driver cluster on v2.7-head using terraform
  2. Add the new fields for cluster agent customization feature
  • clusterAgentDeploymentCustomization
    • append_tolerations
    • override_affinity
    • override_resource_requirements
      • cpu_limit
      • cpu_request
      • memory_limit
      • memory_request
  • fleetAgentDeploymentCustomization
    • append_tolerations
    • override_affinity
    • override_resource_requirements
      • cpu_limit
      • cpu_request
      • memory_limit
      • memory_request
  3. Apply plan

Result
Terraform picks up the new fields and they are applied in the cluster.
