
[cli] Loading rke config doesn't work #25416

Closed · mickaeldecastro opened this issue Feb 14, 2020 · 12 comments

Labels: area/cli, internal, kind/bug, QA/S, team/hostbusters

@mickaeldecastro commented Feb 14, 2020

**What kind of request is this (question/bug/enhancement/feature request):** bug

We are trying to create a cluster using the Rancher CLI. When we load an RKE configuration with the vsphere cloud_provider, the virtual_center settings are not applied.

**Steps to reproduce (least amount of steps as possible):**

1. Generate an RKE configuration with the `rke` binary.
2. Create a cluster using the Rancher CLI with the `--rke-config` option.

```
# rke config --name cluster.yml
# rancher cluster create test --rke-config cluster.yml
```


cluster.yml:

```
# If you intended to deploy Kubernetes in an air-gapped environment,
# please consult the documentation on how to configure custom RKE images.
nodes:
- address: "node1"
  port: "22"
  internal_address: ""
  role:
  - controlplane
  - etcd
  hostname_override: master
  user: ubuntu
  #docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints: []
- address: "node2"
  port: "22"
  internal_address: ""
  role:
  - worker
  hostname_override: worker-1
  user: ubuntu
  #docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints: []
- address: "node3"
  port: "22"
  internal_address: ""
  role:
  - worker
  hostname_override: worker-2
  user: '`'
  #docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints: []
- address: "node4"
  port: "22"
  internal_address: ""
  role:
  - worker
  hostname_override: worker-3
  user: ubuntu
  #docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints: []
services:
  etcd:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    external_urls: []
    ca_cert: ""
    cert: ""
    key: ""
    path: ""
    uid: 0
    gid: 0
    snapshot: null
    retention: ""
    creation: ""
    backup_config: null
  kube-api:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    service_cluster_ip_range: 10.43.0.0/16
    service_node_port_range: ""
    pod_security_policy: false
    always_pull_images: false
    secrets_encryption_config: null
    audit_log: null
    admission_configuration: null
    event_rate_limit: null
  kube-controller:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    cluster_cidr: 10.42.0.0/16
    service_cluster_ip_range: 10.43.0.0/16
  scheduler:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
  kubelet:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    cluster_domain: test-cluster
    infra_container_image: ""
    cluster_dns_server: 10.43.0.10
    fail_swap_on: false
    generate_serving_certificate: false
  kubeproxy:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
network:
  plugin: flannel
  options: 
    flannel_backend_type: vxlan
  mtu: 0
  node_selector: {}
authentication:
  strategy: x509
  sans: []
  webhook: null
addons: ""
addons_include: []
system_images:
  etcd: rancher/coreos-etcd:v3.4.3-rancher1
  alpine: rancher/rke-tools:v0.1.52
  nginx_proxy: rancher/rke-tools:v0.1.52
  cert_downloader: rancher/rke-tools:v0.1.52
  kubernetes_services_sidecar: rancher/rke-tools:v0.1.52
  kubedns: rancher/k8s-dns-kube-dns:1.15.0
  dnsmasq: rancher/k8s-dns-dnsmasq-nanny:1.15.0
  kubedns_sidecar: rancher/k8s-dns-sidecar:1.15.0
  kubedns_autoscaler: rancher/cluster-proportional-autoscaler:1.7.1
  coredns: rancher/coredns-coredns:1.6.5
  coredns_autoscaler: rancher/cluster-proportional-autoscaler:1.7.1
  kubernetes: rancher/hyperkube:v1.17.2-rancher1
  flannel: rancher/coreos-flannel:v0.11.0-rancher1
  flannel_cni: rancher/flannel-cni:v0.3.0-rancher5
  calico_node: rancher/calico-node:v3.10.2
  calico_cni: rancher/calico-cni:v3.10.2
  calico_controllers: rancher/calico-kube-controllers:v3.10.2
  calico_ctl: rancher/calico-ctl:v2.0.0
  calico_flexvol: rancher/calico-pod2daemon-flexvol:v3.10.2
  canal_node: rancher/calico-node:v3.10.2
  canal_cni: rancher/calico-cni:v3.10.2
  canal_flannel: rancher/coreos-flannel:v0.11.0
  canal_flexvol: rancher/calico-pod2daemon-flexvol:v3.10.2
  weave_node: weaveworks/weave-kube:2.5.2
  weave_cni: weaveworks/weave-npc:2.5.2
  pod_infra_container: rancher/pause:3.1
  ingress: rancher/nginx-ingress-controller:nginx-0.25.1-rancher1
  ingress_backend: rancher/nginx-ingress-controller-defaultbackend:1.5-rancher1
  metrics_server: rancher/metrics-server:v0.3.6
  windows_pod_infra_container: rancher/kubelet-pause:v0.1.3
ssh_key_path: ~/.ssh/id_rsa
ssh_cert_path: ""
ssh_agent_auth: false
authorization:
  mode: rbac
  options: {}
ignore_docker_version: false
kubernetes_version: "v1.17.2-rancher1-2"
private_registries: []
ingress:
  provider: "nginx"
  options: {}
  node_selector: {}
  extra_args: {}
  dns_policy: ""
  extra_envs: []
  extra_volumes: []
  extra_volume_mounts: []
cluster_name: ""
cloud_provider:
  name: vsphere
  vsphereCloudProvider:
    global:
      insecure-flag: true
    virtual_center:
      IP_VCENTER:
        user: user
        password: password
        datacenters: LOCAL
        port: 443
    workspace:
      server: IP_VCENTER
      datacenter: LOCAL
      folder: kube
      default-datastore: Kubernetes
    disk:
      scsicontrollertype: pvscsi
prefix_path: ""
addon_job_timeout: 0
bastion_host:
  address: ""
  port: ""
  user: ""
  ssh_key: ""
  ssh_key_path: ""
  ssh_cert: ""
  ssh_cert_path: ""
monitoring:
  provider: metrics-server
  options: {}
  node_selector: {}
restore:
  restore: false
  snapshot_name: ""
dns: null


```

**Result:**
The cluster is created, but cloud_provider doesn't contain the virtual_center configuration.

```
Version: v3
clusters:
  test:
    answers: {}
    dockerRootDir: /var/lib/docker
    enableNetworkPolicy: false
    localClusterAuthEndpoint: {}
    rancherKubernetesEngineConfig:
      addonJobTimeout: 30
      authentication:
        strategy: x509
      authorization:
        mode: rbac
      bastionHost: {}
      cloudProvider:
        name: vsphere
        vsphereCloudProvider:
          disk:
            scsicontrollertype: pvscsi
          global:
            insecure-flag: true
          network: {}
          workspace:
            datacenter: LOCAL
            default-datastore: Kubernetes
            folder: kube
            server: IP_VCENTER
      ignoreDockerVersion: true
      ingress:
        provider: nginx
      kubernetesVersion: v1.17.2-rancher1-2
      monitoring:
        provider: metrics-server
      network:
        plugin: canal
      nodes:
      - address: node1
        port: "22"
        role:
        - controlplane
        - etcd
        user: ubuntu
      - address: node2
        port: "22"
        role:
        - worker
        user: ubuntu
      - address: node3
        port: "22"
        role:
        - worker
        user: '`'
      - address: node4
        port: "22"
        role:
        - worker
        user: ubuntu
      restore: {}
      services:
        etcd:
          backupConfig:
            enabled: true
            intervalHours: 12
            retention: 6
          creation: 12h
          extraArgs:
            election-timeout: "5000"
            heartbeat-interval: "500"
          retention: 72h
          snapshot: false
        kubeApi: {}
        kubeController: {}
        kubelet: {}
        kubeproxy: {}
        scheduler: {}
```
**Other details that may be helpful:**

Rancher Log
```
2020/02/14 10:01:07 http: TLS handshake error from 127.0.0.1:51690: EOF
2020/02/14 10:01:13 [INFO] Deleting cluster [c-ddtfm]
2020/02/14 10:01:18 [INFO] kontainerdriver rancherkubernetesengine listening on address 127.0.0.1:37387
2020/02/14 10:01:18 [INFO] kontainerdriver rancherkubernetesengine stopped
2020/02/14 10:01:18 [INFO] Deleted cluster [c-ddtfm]
2020/02/14 10:01:18 [INFO] [mgmt-cluster-rbac-remove] Deleting namespace c-ddtfm
2020/02/14 10:01:18 [ERROR] ClusterController c-ddtfm [mgmt-cluster-rbac-remove] failed with : clusters.management.cattle.io "c-ddtfm" not found
2020/02/14 10:01:23 [INFO] [mgmt-project-rbac-remove] Deleting namespace p-2t8th
2020/02/14 10:01:23 [INFO] [mgmt-project-rbac-remove] Deleting namespace p-j7pxl
E0214 10:01:24.045505      58 tokens_controller.go:261] error synchronizing serviceaccount c-ddtfm/default: secrets "default-token-pzlw5" is forbidden: unable to create new content in namespace c-ddtfm because it is being terminated
E0214 10:01:24.055308      58 tokens_controller.go:261] error synchronizing serviceaccount c-ddtfm/default: secrets "default-token-s6cd7" is forbidden: unable to create new content in namespace c-ddtfm because it is being terminated
2020/02/14 10:01:24 [INFO] [mgmt-auth-crtb-controller] Deleting roleBinding clusterrolebinding-86sdz
E0214 10:01:24.070128      58 tokens_controller.go:261] error synchronizing serviceaccount c-ddtfm/default: secrets "default-token-hb65d" is forbidden: unable to create new content in namespace c-ddtfm because it is being terminated
E0214 10:01:24.094973      58 tokens_controller.go:261] error synchronizing serviceaccount c-ddtfm/default: secrets "default-token-pbfmz" is forbidden: unable to create new content in namespace c-ddtfm because it is being terminated
2020/02/14 10:01:26 [INFO] [mgmt-cluster-rbac-delete] Creating namespace c-55mkz
2020/02/14 10:01:26 [INFO] [mgmt-cluster-rbac-delete] Creating Default project for cluster c-55mkz
2020/02/14 10:01:26 [INFO] [mgmt-project-rbac-create] Creating namespace p-c4zqs
2020/02/14 10:01:26 [INFO] [mgmt-cluster-rbac-delete] Creating System project for cluster c-55mkz
2020/02/14 10:01:26 [INFO] [mgmt-project-rbac-create] Creating creator projectRoleTemplateBinding for user user-xmjsf for project p-c4zqs
2020/02/14 10:01:26 [INFO] [mgmt-project-rbac-create] Creating namespace p-djx6b
2020/02/14 10:01:26 [INFO] [mgmt-cluster-rbac-delete] Updating cluster c-55mkz
2020/02/14 10:01:26 [INFO] [mgmt-project-rbac-create] Creating creator projectRoleTemplateBinding for user user-xmjsf for project p-djx6b
2020/02/14 10:01:26 [INFO] [mgmt-auth-crtb-controller] Setting InitialRolesPopulated condition on project p-c4zqs
2020/02/14 10:01:26 [INFO] [mgmt-project-rbac-create] Creating creator clusterRoleTemplateBinding for user user-xmjsf for cluster c-55mkz
2020/02/14 10:01:26 [INFO] [mgmt-project-rbac-create] Updating project p-c4zqs
2020/02/14 10:01:26 [INFO] [mgmt-auth-prtb-controller] Creating clusterRole p-c4zqs-projectowner
2020/02/14 10:01:26 [INFO] [mgmt-auth-crtb-controller] Setting InitialRolesPopulated condition on project p-djx6b
2020/02/14 10:01:26 [INFO] [mgmt-auth-prtb-controller] Creating clusterRole p-djx6b-projectowner
2020/02/14 10:01:26 [INFO] [mgmt-auth-crtb-controller] Creating clusterRole c-55mkz-clusterowner
2020/02/14 10:01:26 [INFO] [mgmt-project-rbac-create] Updating project p-djx6b
2020/02/14 10:01:26 [INFO] [mgmt-auth-prtb-controller] Creating roleBinding for membership in project p-c4zqs for subject user-xmjsf
2020/02/14 10:01:26 [INFO] [mgmt-auth-crtb-controller] Creating clusterRoleBinding for membership in cluster c-55mkz for subject user-xmjsf
2020/02/14 10:01:26 [INFO] [mgmt-auth-prtb-controller] Creating clusterRole c-55mkz-clustermember
2020/02/14 10:01:26 [INFO] [mgmt-project-rbac-create] Updating project p-c4zqs
2020/02/14 10:01:26 [INFO] [mgmt-auth-prtb-controller] Creating roleBinding for membership in project p-djx6b for subject user-xmjsf
2020/02/14 10:01:26 [INFO] [mgmt-auth-crtb-controller] Setting InitialRolesPopulated condition on cluster
2020/02/14 10:01:26 [INFO] [mgmt-auth-crtb-controller] Creating role cluster-owner in namespace c-55mkz
2020/02/14 10:01:26 [INFO] [mgmt-cluster-rbac-delete] Updating cluster c-55mkz
2020/02/14 10:01:26 [INFO] [mgmt-auth-prtb-controller] Creating clusterRoleBinding for membership in cluster c-55mkz for subject user-xmjsf
2020/02/14 10:01:26 [INFO] [mgmt-auth-prtb-controller] Creating clusterRole c-55mkz-clustermember
2020/02/14 10:01:26 [INFO] [mgmt-auth-crtb-controller] Creating roleBinding for subject user-xmjsf with role cluster-owner in namespace
2020/02/14 10:01:26 [INFO] [mgmt-project-rbac-create] Updating project p-djx6b
2020/02/14 10:01:26 [INFO] [mgmt-auth-prtb-controller] Creating role project-owner in namespace c-55mkz
2020/02/14 10:01:26 [INFO] [mgmt-auth-crtb-controller] Creating role cluster-owner in namespace p-djx6b
2020/02/14 10:01:26 [INFO] [mgmt-auth-prtb-controller] Creating roleBinding for subject user-xmjsf with role project-owner in namespace
2020/02/14 10:01:26 [INFO] [mgmt-auth-crtb-controller] Creating roleBinding for subject user-xmjsf with role cluster-owner in namespace
2020/02/14 10:01:26 [INFO] [mgmt-auth-prtb-controller] Creating role project-owner in namespace p-c4zqs
2020/02/14 10:01:26 [INFO] [mgmt-auth-crtb-controller] Creating role cluster-owner in namespace p-c4zqs
2020/02/14 10:01:26 [ERROR] ProjectRoleTemplateBindingController p-djx6b/creator-project-owner [mgmt-auth-prtb-controller] failed with : clusterroles.rbac.authorization.k8s.io "c-55mkz-clustermember" already exists
2020/02/14 10:01:26 [INFO] [mgmt-auth-prtb-controller] Updating clusterRoleBinding clusterrolebinding-hr5hn for cluster membership in cluster c-55mkz for subject user-xmjsf
2020/02/14 10:01:26 [INFO] [mgmt-auth-prtb-controller] Creating role admin in namespace p-c4zqs
2020/02/14 10:01:26 [INFO] [mgmt-auth-prtb-controller] Creating roleBinding for subject user-xmjsf with role project-owner in namespace
2020/02/14 10:01:26 [INFO] [mgmt-auth-crtb-controller] Creating roleBinding for subject user-xmjsf with role cluster-owner in namespace
2020/02/14 10:01:26 [INFO] [mgmt-auth-prtb-controller] Creating roleBinding for subject user-xmjsf with role project-owner in namespace
2020/02/14 10:01:26 [INFO] [mgmt-auth-prtb-controller] Creating role project-owner in namespace p-djx6b
2020/02/14 10:01:26 [INFO] [mgmt-auth-prtb-controller] Creating role admin in namespace p-djx6b
2020/02/14 10:01:26 [INFO] [mgmt-auth-prtb-controller] Creating roleBinding for subject user-xmjsf with role admin in namespace
2020/02/14 10:01:26 [INFO] [mgmt-auth-prtb-controller] Creating roleBinding for subject user-xmjsf with role project-owner in namespace
2020/02/14 10:01:26 [INFO] [mgmt-auth-prtb-controller] Creating roleBinding for subject user-xmjsf with role admin in namespace
2020/02/14 10:01:26 [INFO] [mgmt-cluster-rbac-delete] Updating cluster c-55mkz
2020/02/14 10:01:28 http: TLS handshake error from 127.0.0.1:51750: EOF
E0214 10:01:29.067006      58 tokens_controller.go:261] error synchronizing serviceaccount p-j7pxl/default: secrets "default-token-29m9d" is forbidden: unable to create new content in namespace p-j7pxl because it is being terminated
2020/02/14 10:01:29 [INFO] [mgmt-auth-prtb-controller] Updating owner label for roleBinding clusterrolebinding-kt7vb
E0214 10:01:29.077595      58 tokens_controller.go:261] error synchronizing serviceaccount p-j7pxl/default: secrets "default-token-lz2wg" is forbidden: unable to create new content in namespace p-j7pxl because it is being terminated
2020/02/14 10:01:29 [INFO] [mgmt-auth-prtb-controller] Deleting roleBinding clusterrolebinding-kt7vb
E0214 10:01:29.092664      58 tokens_controller.go:261] error synchronizing serviceaccount p-j7pxl/default: secrets "default-token-kngk5" is forbidden: unable to create new content in namespace p-j7pxl because it is being terminated
E0214 10:01:29.117356      58 tokens_controller.go:261] error synchronizing serviceaccount p-j7pxl/default: secrets "default-token-m6mn4" is forbidden: unable to create new content in namespace p-j7pxl because it is being terminated
2020/02/14 10:01:31 http: TLS handshake error from 127.0.0.1:51760: EOF
2020/02/14 10:01:39 http: TLS handshake error from 127.0.0.1:51848: EOF
2020/02/14 10:01:39 http: TLS handshake error from 127.0.0.1:51852: EOF
2020/02/14 10:01:39 http: TLS handshake error from 127.0.0.1:51850: EOF
2020/02/14 10:01:39 http: TLS handshake error from 127.0.0.1:51854: EOF
2020/02/14 10:01:39 http: TLS handshake error from 127.0.0.1:51842: EOF
2020/02/14 10:01:56 http: TLS handshake error from 127.0.0.1:51862: EOF
2020/02/14 10:01:56 http: TLS handshake error from 127.0.0.1:51864: EOF

```

**Environment information**
- Rancher version (`rancher/rancher`/`rancher/server` image tag or shown bottom left in the UI): Rancher v2.3.5
  - User Interface: v2.3.36
  - Helm: .14.3-rancher1
  - Machine: v0.15.0-rancher29



**Cluster information**
- Cluster type (Hosted/Infrastructure Provider/Custom/Imported): vSphere
- Machine type (cloud/VM/metal) and specifications (CPU/memory): VM

gz#15447
SURE-2820
@mickaeldecastro changed the title from "Rancher cli loading rke config" to "Rancher cli loading rke config doesn't work" Feb 21, 2020
@mickaeldecastro changed the title from "Rancher cli loading rke config doesn't work" to "[cli] Loading rke config doesn't work" Feb 21, 2020
@janeczku added the kind/bug and area/cli labels Apr 9, 2020
@maggieliu modified the milestones: v2.4.4, v2.4 - Backlog Apr 9, 2020
@janeczku (Contributor) commented Apr 9, 2020

Confirmed in Rancher 2.4.3 with Rancher CLI 2.4.3-rc1

Replication

Run `rancher cluster create --rke-config rke.yaml` using the following rke.yaml content:

```
services:
  etcd:
    backup_config:
      enabled: true
      interval_hours: 44
      retention: 44

  kubelet:
    extra_binds:
      - '/var/lib/kubelet/plugins_registry:/var/lib/kubelet/plugins_registry'
```

Check cluster settings in API

Result: The user-specified RKE config is not applied.

Workaround

Change all key names from snake to camel case, e.g.:

```
services:
  etcd:
    backupConfig:
      enabled: true
      intervalHours: 3
      retention: 32

  kubelet:
    extraBinds:
      - '/var/lib/kubelet/plugins_registry:/var/lib/kubelet/plugins_registry'
```

@deniseschannon removed this from the v2.4 - Backlog milestone Jan 29, 2021
@zaggash commented Mar 3, 2021

Duplicate of #27982

gz#15447

@zaggash marked this as a duplicate of #27982 Mar 3, 2021
@zaggash commented Mar 3, 2021

The workaround does not help; some keys are not applied at all.
If you try to change the network plugin to calico:

```
network:
  plugin: calico
```

it never changes and stays at the default.

@betweenclouds commented

@zaggash did you try:
`--network-provider value   Network provider for the cluster (flannel, canal, calico) (default: "canal")`
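
For example, something like the following might force the network provider from the CLI instead of relying on the value in the rke-config file (a hypothetical invocation; check `rancher cluster create --help` for the flags your CLI version actually supports):

```
rancher cluster create test --rke-config cluster.yml --network-provider calico
```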

@maxsokolovsky (Contributor) commented

Reproduced with CLI v2.4.13. This is definitely a CLI issue and has to do with how we parse top-level keys in the RKE config file.

I don't yet know why (I will inquire), but we convert all top-level keys to JSON format. For example, rancher_kubernetes_engine_config becomes rancherKubernetesEngineConfig. As a result, the original key, due to the modification, is not found by the CLI. It's as if the value were never initialized, and the CLI falls back to defaults for it. The same is true for other top-level objects whose keys are modified by the CLI before actually being sent to the Rancher server.
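
To see why this silently produces defaults, here is a minimal sketch (my own illustration, not the actual CLI code; the clusterSpec struct and the use of sigs.k8s.io/yaml are assumptions) of how a snake_case top-level key is dropped when the target struct only declares the camelCase JSON tag:

```
// sketch.go: a minimal, hypothetical demonstration (not Rancher CLI source).
package main

import (
	"fmt"

	"sigs.k8s.io/yaml" // converts YAML to JSON, then honors encoding/json tags
)

// clusterSpec stands in for the generated cluster type; only the camelCase
// JSON tag is declared, so a snake_case key in the input is dropped silently.
type clusterSpec struct {
	RancherKubernetesEngineConfig map[string]interface{} `json:"rancherKubernetesEngineConfig,omitempty"`
}

func main() {
	snake := []byte("rancher_kubernetes_engine_config:\n  addonJobTimeout: 30\n")
	camel := []byte("rancherKubernetesEngineConfig:\n  addonJobTimeout: 30\n")

	var a, b clusterSpec
	_ = yaml.Unmarshal(snake, &a) // unknown key: no error, field stays nil
	_ = yaml.Unmarshal(camel, &b) // known key: field is populated

	fmt.Println("snake_case parsed:", a.RancherKubernetesEngineConfig) // map[]
	fmt.Println("camelCase parsed: ", b.RancherKubernetesEngineConfig) // map[addonJobTimeout:30]
}
```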

@maxsokolovsky (Contributor) commented Nov 18, 2021

I still need to ascertain the exact reason for this behavior, but it does indeed look like it has to do with how we parse/unmarshal the given config file.

@maxsokolovsky (Contributor) commented

Also, for all keys inside rancher_kubernetes_engine_config, or inside any other object at any depth (for example .services.etcd.backup_config), the CLI expects the key to be in camelCase, not snake_case, so it wants backupConfig. This is configured in rancher/rancher; the CLI just uses it. But rancher_kubernetes_engine_config itself must be in camelCase first to be unmarshalled at all. Basically, all keys should be in camelCase.
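
For illustration, here is the same etcd backup setting in the snake_case form that `rke config` generates (ignored by the CLI) versus the camelCase form the CLI expects:

```
# snake_case, as generated by `rke config` (ignored by the CLI)
services:
  etcd:
    backup_config:
      enabled: true

# camelCase, as expected by the CLI
services:
  etcd:
    backupConfig:
      enabled: true
```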

@maxsokolovsky (Contributor) commented

Opened a PR for the change, but it is currently a draft in light of the 2.6.3 release. I will mark it as ready for review once 2.6.3 is out, targeting 2.6.4.

@deniseschannon added this to the v2.6.4 milestone Dec 1, 2021
@deniseschannon added the team/hostbusters label Dec 1, 2021
@deniseschannon modified the milestones: v2.6.4, v2.6.4 - Triaged Dec 1, 2021
@snasovich self-assigned this Feb 17, 2022
@maxsokolovsky (Contributor) commented

QA Testing

Root cause

Improper config deserialization - wrong top-level field was used, so all fields were ignored.

What was fixed, or what changes have occurred

Now the necessary value is used to deserialize the RKE config.

Areas or cases that should be tested

What areas could experience regressions?

Unfortunately, not all fields can be handled properly; handling them would require a breaking change. Some fields need to be in a certain case, so users need to check those.

For example, for all keys inside rancher_kubernetes_engine_config, or inside any other object at any depth (for example .services.etcd.backup_config), the CLI expects the key to be in camelCase, not snake_case, so it wants backupConfig. This is configured in rancher/rancher; the CLI just uses it. But rancher_kubernetes_engine_config itself must be in camelCase first to be deserialized at all. Basically, all keys should be in camelCase.

Steps

  1. Provision a Rancher cluster of any recent 2.5 or 2.6 version.
  2. Get an API key and use it with the Rancher CLI to log in to the cluster.
  3. Get the following example RKE config for a new cluster:

```
default_pod_security_policy_template_id: restricted
docker_root_dir: /apps/docker
enable_cluster_alerting: false
enable_cluster_monitoring: false
enable_network_policy: false
local_cluster_auth_endpoint:
  enabled: true
name: 'test'

rancher_kubernetes_engine_config:
  addon_job_timeout: 30
  authentication:
    strategy: x509
  dns:
    nodelocal:
      ip_address: ''
      node_selector: null
      update_strategy: {}
  ignore_docker_version: true

  ingress:
    provider: nginx
  kubernetes_version: v1.17.5-rancher1-1
  monitoring:
    provider: metrics-server
    replicas: 1

  network:
    mtu: 0
    options:
      flannel_backend_type: vxlan
    plugin: canal
  private_registries:
    - is_default: true
      url: harbor.davcor.co

  services:
    etcd:
      backup_config:
        enabled: true
        interval_hours: 12
        retention: 6
        safe_timestamp: false
      creation: 12h
      extra_args:
        election-timeout: '5000'
        heartbeat-interval: '500'
      gid: 0
      retention: 72h
      snapshot: false
      uid: 0
    kube_api:
      always_pull_images: false
      pod_security_policy: true
      service_node_port_range: 30000-32767
      secrets_encryption_config:
        enabled: true
      event_rate_limit:
        enabled: true
      audit_log:
        enabled: true
    scheduler:
      extra_args:
        profiling: "false"
        address: "127.0.0.1"
    kube_controller:
      extra_args:
        profiling: "false"
        address: "127.0.0.1"
        feature-gates: "RotateKubeletServerCertificate=true"
    kubelet:
      extra_args:
        authorization-mode: "Webhook"
        streaming-connection-idle-timeout: "1800s"
        protect-kernel-defaults: "true"
        make-iptables-util-chains: "true"
        event-qps: "0"
        anonymous-auth: "false"
        feature-gates: "RotateKubeletServerCertificate=true"
        tls-cipher-suites: "TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_128_GCM_SHA256"
  ssh_agent_auth: false
  upgrade_strategy:
    drain: false
    max_unavailable_controlplane: '1'
    max_unavailable_worker: 10%
    node_drain_input:
      delete_local_data: false
      force: false
      grace_period: -1
      ignore_daemon_sets: true
      timeout: 120
windows_prefered_cluster: false
```

  4. Use the older version of the Rancher CLI without the fix. Run `rancher cluster create test-bad --rke-config cluster.yml`.
  5. Now inspect the YAML in the browser for the new pending cluster. Notice that none of the settings from the config are present. Because deserialization failed, there are only default keys and values. For example, addon_job_timeout is set to 45 and not 30, as in the config. Note also that the services are missing settings for all components: kube-controller, kube-api, and others.
  6. Now use the version of the CLI that has the fix: `rancher cluster create test-fixed --rke-config cluster.yml`.
  7. Inspect the YAML and notice that most, although not all, settings have been applied. Notice how the services have their settings applied.

@timhaneunsoo commented

Test Environment:

- Rancher version: v2.6-head e55a04c
- Rancher cluster type: HA
- Docker version: 20.10
- Downstream cluster type: rancher cli


Testing:

Tested this issue with the following steps:

  1. Provision a Rancher cluster.
  2. Get an API key and use it with the Rancher CLI to log in to the cluster.
  3. Get the following example RKE config for a new cluster (used the cluster.yml provided in the previous comment).
  4. Use the older version of the Rancher CLI without the fix. Run `rancher cluster create test-bad --rke-config cluster.yml`.
  5. Now inspect the YAML in the browser for the new pending cluster. Notice that none of the settings from the config are present. Because deserialization failed, there are only default keys and values. For example, addon_job_timeout is set to 45 and not 30, as in the config. Note also that the services are missing settings for all components: kube-controller, kube-api, and others.
  6. Now use the version of the CLI that has the fix: `rancher cluster create test-fixed --rke-config cluster.yml`.
  7. Inspect the YAML and notice that most, although not all, settings have been applied. Notice how the services have their settings applied.

Result - Low Pass

After creating the cluster, the YAML was inspected, and it seems only the etcd backupConfig had its settings applied. The other settings do not appear to have been applied as described in the testing steps. The following settings were not set to the values from the cluster.yml file:

```
dockerRootDir: /var/lib/docker
localClusterAuthEndpoint:
    enabled: false
rancherKubernetesEngineConfig:
    addonJobTimeout: 45
ingress: {}
kubernetesVersion: v1.22.6-rancher1-2
network:
    mtu: 0
    options:
      flannel_backend_type: vxlan
kubeApi: {}
kubeController: {}
kubelet: {}
kubeproxy: {}
scheduler: {}
node_drain_input:
      delete_local_data: false
      force: false
      grace_period: -1
      ignore_daemon_sets: true
      timeout: 120
```

Below is the full YAML file after creating the cluster using the Rancher CLI v2.6.4-rc1 with the configuration given in the previous comment.

```
apiVersion: management.cattle.io/v3
kind: Cluster
metadata:
  annotations:
    authz.management.cattle.io/creator-role-bindings: '{"created":["cluster-owner"],"required":["cluster-owner"]}'
    field.cattle.io/creatorId: u-gndx8
    lifecycle.cattle.io/create.cluster-agent-controller-cleanup: "true"
    lifecycle.cattle.io/create.cluster-scoped-gc: "true"
    lifecycle.cattle.io/create.mgmt-cluster-rbac-remove: "true"
  creationTimestamp: "2022-02-23T19:52:28Z"
  finalizers:
  - wrangler.cattle.io/mgmt-cluster-remove
  - controller.cattle.io/cluster-agent-controller-cleanup
  - controller.cattle.io/cluster-scoped-gc
  - controller.cattle.io/cluster-provisioner-controller
  - controller.cattle.io/mgmt-cluster-rbac-remove
  generateName: c-
  generation: 13
  labels:
    cattle.io/creator: norman
  managedFields:
  - apiVersion: management.cattle.io/v3
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:field.cattle.io/creatorId: {}
        f:generateName: {}
        f:labels:
          .: {}
          f:cattle.io/creator: {}
    manager: Go-http-client
    operation: Update
    time: "2022-02-23T19:52:28Z"
  - apiVersion: management.cattle.io/v3
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          f:authz.management.cattle.io/creator-role-bindings: {}
          f:lifecycle.cattle.io/create.cluster-agent-controller-cleanup: {}
          f:lifecycle.cattle.io/create.cluster-scoped-gc: {}
          f:lifecycle.cattle.io/create.mgmt-cluster-rbac-remove: {}
        f:finalizers:
          .: {}
          v:"controller.cattle.io/cluster-agent-controller-cleanup": {}
          v:"controller.cattle.io/cluster-provisioner-controller": {}
          v:"controller.cattle.io/cluster-scoped-gc": {}
          v:"controller.cattle.io/mgmt-cluster-rbac-remove": {}
          v:"wrangler.cattle.io/mgmt-cluster-remove": {}
      f:spec: {}
      f:status: {}
    manager: rancher
    operation: Update
    time: "2022-02-23T19:52:29Z"
  name: c-gbdgg
  resourceVersion: "385297"
  uid: 4f3c326a-4786-4e15-9d62-f2d8291c1619
spec:
  agentImageOverride: ""
  answers: {}
  description: ""
  desiredAgentImage: ""
  desiredAuthImage: ""
  displayName: test-fixed
  dockerRootDir: /var/lib/docker
  enableClusterAlerting: false
  enableClusterMonitoring: false
  enableNetworkPolicy: false
  fleetWorkspaceName: fleet-default
  internal: false
  localClusterAuthEndpoint:
    enabled: false
  rancherKubernetesEngineConfig:
    addonJobTimeout: 45
    authentication:
      strategy: x509
    authorization: {}
    bastionHost: {}
    cloudProvider: {}
    enableCriDockerd: false
    ignoreDockerVersion: true
    ingress: {}
    kubernetesVersion: v1.22.6-rancher1-2
    monitoring: {}
    network:
      plugin: canal
    restore: {}
    rotateEncryptionKey: false
    services:
      etcd:
        backupConfig:
          enabled: true
          intervalHours: 12
          retention: 6
          s3BackupConfig: null
      kubeApi: {}
      kubeController: {}
      kubelet: {}
      kubeproxy: {}
      scheduler: {}
    sshAgentAuth: false
    systemImages: {}
    upgradeStrategy:
      drain: false
      maxUnavailableControlplane: "1"
      maxUnavailableWorker: 10%
  windowsPreferedCluster: false
status:
  agentImage: ""
  aksStatus:
    privateRequiresTunnel: null
    rbacEnabled: null
    upstreamSpec: null
  allocatable:
    cpu: "0"
    memory: "0"
    pods: "0"
  appliedEnableNetworkPolicy: false
  appliedPodSecurityPolicyTemplateId: ""
  appliedSpec:
    agentImageOverride: ""
    answers: {}
    description: ""
    desiredAgentImage: ""
    desiredAuthImage: ""
    displayName: ""
    enableClusterAlerting: false
    enableClusterMonitoring: false
    enableNetworkPolicy: null
    internal: false
    localClusterAuthEndpoint:
      enabled: false
    windowsPreferedCluster: false
  authImage: ""
  capabilities:
    ingressCapabilities:
    - {}
    loadBalancerCapabilities: {}
    nodePortRange: 30000-32767
  capacity:
    cpu: "0"
    memory: "0"
    pods: "0"
  conditions:
  - status: "True"
    type: Pending
  - lastUpdateTime: "2022-02-23T19:52:29Z"
    message: waiting for etcd, controlplane and worker nodes to be registered
    reason: Provisioning
    status: Unknown
    type: Provisioned
  - lastUpdateTime: "2022-02-23T19:52:29Z"
    message: Waiting for API to be available
    status: Unknown
    type: Waiting
  - lastUpdateTime: "2022-02-23T19:52:28Z"
    status: "True"
    type: BackingNamespaceCreated
  - lastUpdateTime: "2022-02-23T19:52:28Z"
    status: "True"
    type: DefaultProjectCreated
  - lastUpdateTime: "2022-02-23T19:52:28Z"
    status: "True"
    type: SystemProjectCreated
  - lastUpdateTime: "2022-02-23T19:52:28Z"
    status: "True"
    type: InitialRolesPopulated
  - lastUpdateTime: "2022-02-23T19:52:28Z"
    status: "True"
    type: CreatorMadeOwner
  - lastUpdateTime: "2022-02-23T19:52:29Z"
    status: "True"
    type: NoDiskPressure
  - lastUpdateTime: "2022-02-23T19:52:29Z"
    status: "True"
    type: NoMemoryPressure
  - lastUpdateTime: "2022-02-23T19:52:29Z"
    status: "True"
    type: SecretsMigrated
  - lastUpdateTime: "2022-02-23T19:52:39Z"
    status: "False"
    type: Connected
  driver: rancherKubernetesEngine
  eksStatus:
    managedLaunchTemplateID: ""
    managedLaunchTemplateVersions: null
    privateRequiresTunnel: null
    securityGroups: null
    subnets: null
    upstreamSpec: null
    virtualNetwork: ""
  gkeStatus:
    privateRequiresTunnel: null
    upstreamSpec: null
  limits:
    cpu: "0"
    memory: "0"
    pods: "0"
  provider: ""
  requested:
    cpu: "0"
    memory: "0"
    pods: "0"
```

@maxsokolovsky (Contributor) commented

@timhaneunsoo, just to make sure - for the config file, did you use the example from my comment or from the issue description?
I just tried on v2.6-head using my example, and the services got deserialized properly. And I tried the example in the description and got the same behavior as you - only backupConfig in etcd got deserialized.

@timhaneunsoo commented

After review, the fix was found to be in Rancher CLI v2.6.4-rc2. Upon testing with the updated CLI version, the fix is confirmed and testing is a Pass.
The correct YAML is below:

```
apiVersion: management.cattle.io/v3
kind: Cluster
metadata:
  annotations:
    authz.management.cattle.io/creator-role-bindings: '{"created":["cluster-owner"],"required":["cluster-owner"]}'
    field.cattle.io/creatorId: u-8fsf6
    lifecycle.cattle.io/create.cluster-agent-controller-cleanup: "true"
    lifecycle.cattle.io/create.cluster-scoped-gc: "true"
    lifecycle.cattle.io/create.mgmt-cluster-rbac-remove: "true"
  creationTimestamp: "2022-02-23T23:00:37Z"
  finalizers:
  - wrangler.cattle.io/mgmt-cluster-remove
  - controller.cattle.io/cluster-agent-controller-cleanup
  - controller.cattle.io/cluster-scoped-gc
  - controller.cattle.io/cluster-provisioner-controller
  - controller.cattle.io/mgmt-cluster-rbac-remove
  generateName: c-
  generation: 14
  labels:
    cattle.io/creator: norman
  managedFields:
  - apiVersion: management.cattle.io/v3
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:field.cattle.io/creatorId: {}
        f:generateName: {}
        f:labels:
          .: {}
          f:cattle.io/creator: {}
    manager: Go-http-client
    operation: Update
    time: "2022-02-23T23:00:37Z"
  - apiVersion: management.cattle.io/v3
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          f:authz.management.cattle.io/creator-role-bindings: {}
          f:lifecycle.cattle.io/create.cluster-agent-controller-cleanup: {}
          f:lifecycle.cattle.io/create.cluster-scoped-gc: {}
          f:lifecycle.cattle.io/create.mgmt-cluster-rbac-remove: {}
        f:finalizers:
          .: {}
          v:"controller.cattle.io/cluster-agent-controller-cleanup": {}
          v:"controller.cattle.io/cluster-provisioner-controller": {}
          v:"controller.cattle.io/cluster-scoped-gc": {}
          v:"controller.cattle.io/mgmt-cluster-rbac-remove": {}
          v:"wrangler.cattle.io/mgmt-cluster-remove": {}
      f:spec: {}
      f:status: {}
    manager: rancher
    operation: Update
    time: "2022-02-23T23:00:38Z"
  name: c-w664p
  resourceVersion: "6888"
  uid: 7a6b076f-6e0c-4fda-8397-eb58d3da55c9
spec:
  agentImageOverride: ""
  answers: {}
  defaultPodSecurityPolicyTemplateName: restricted
  description: ""
  desiredAgentImage: ""
  desiredAuthImage: ""
  displayName: test-fixed
  dockerRootDir: /apps/docker
  enableClusterAlerting: false
  enableClusterMonitoring: false
  enableNetworkPolicy: false
  fleetWorkspaceName: fleet-default
  internal: false
  localClusterAuthEndpoint:
    enabled: true
  rancherKubernetesEngineConfig:
    addonJobTimeout: 30
    authentication:
      strategy: x509|webhook
    authorization: {}
    bastionHost: {}
    cloudProvider: {}
    dns:
      nodelocal:
        updateStrategy: {}
    enableCriDockerd: false
    ignoreDockerVersion: true
    ingress:
      defaultBackend: true
      defaultIngressClass: true
      provider: nginx
    kubernetesVersion: v1.17.5-rancher1-1
    monitoring:
      provider: metrics-server
      replicas: 1
    network:
      plugin: canal
    privateRegistries:
    - url: harbor.davcor.co
    restore: {}
    rotateEncryptionKey: false
    services:
      etcd:
        backupConfig:
          enabled: true
          intervalHours: 12
          retention: 6
          s3BackupConfig: null
          timeout: 300
        creation: 12h
        extraArgs:
          election-timeout: "5000"
          heartbeat-interval: "500"
        retention: 72h
        snapshot: false
      kubeApi:
        auditLog:
          enabled: true
        eventRateLimit:
          enabled: true
        podSecurityPolicy: true
        secretsEncryptionConfig:
          enabled: true
        serviceNodePortRange: 30000-32767
      kubeController:
        extraArgs:
          address: 127.0.0.1
          feature-gates: RotateKubeletServerCertificate=true
          profiling: "false"
      kubelet:
        extraArgs:
          anonymous-auth: "false"
          authorization-mode: Webhook
          event-qps: "0"
          feature-gates: RotateKubeletServerCertificate=true
          make-iptables-util-chains: "true"
          protect-kernel-defaults: "true"
          streaming-connection-idle-timeout: 1800s
          tls-cipher-suites: TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_128_GCM_SHA256
      kubeproxy: {}
      scheduler:
        extraArgs:
          address: 127.0.0.1
          profiling: "false"
    sshAgentAuth: false
    systemImages: {}
    upgradeStrategy:
      drain: false
      maxUnavailableControlplane: "1"
      maxUnavailableWorker: 10%
      nodeDrainInput:
        gracePeriod: -1
        ignoreDaemonSets: true
        timeout: 120
  windowsPreferedCluster: false
status:
  agentImage: ""
  aksStatus:
    privateRequiresTunnel: null
    rbacEnabled: null
    upstreamSpec: null
  allocatable:
    cpu: "0"
    memory: "0"
    pods: "0"
  appliedEnableNetworkPolicy: false
  appliedPodSecurityPolicyTemplateId: ""
  appliedSpec:
    agentImageOverride: ""
    answers: {}
    description: ""
    desiredAgentImage: ""
    desiredAuthImage: ""
    displayName: ""
    enableClusterAlerting: false
    enableClusterMonitoring: false
    enableNetworkPolicy: null
    internal: false
    localClusterAuthEndpoint:
      enabled: false
    windowsPreferedCluster: false
  authImage: ""
  capabilities:
    ingressCapabilities:
    - customDefaultBackend: false
      ingressProvider: nginx
    loadBalancerCapabilities: {}
    nodePortRange: 30000-32767
    pspEnabled: true
  capacity:
    cpu: "0"
    memory: "0"
    pods: "0"
  conditions:
  - status: "True"
    type: Pending
  - lastUpdateTime: "2022-02-23T23:00:38Z"
    message: waiting for etcd, controlplane and worker nodes to be registered
    reason: Provisioning
    status: Unknown
    type: Provisioned
  - lastUpdateTime: "2022-02-23T23:00:38Z"
    message: Waiting for API to be available
    status: Unknown
    type: Waiting
  - lastUpdateTime: "2022-02-23T23:00:37Z"
    status: "True"
    type: BackingNamespaceCreated
  - lastUpdateTime: "2022-02-23T23:00:37Z"
    status: "True"
    type: DefaultProjectCreated
  - lastUpdateTime: "2022-02-23T23:00:37Z"
    status: "True"
    type: SystemProjectCreated
  - lastUpdateTime: "2022-02-23T23:00:37Z"
    status: "True"
    type: InitialRolesPopulated
  - lastUpdateTime: "2022-02-23T23:00:37Z"
    status: "True"
    type: CreatorMadeOwner
  - lastUpdateTime: "2022-02-23T23:00:38Z"
    status: "True"
    type: NoDiskPressure
  - lastUpdateTime: "2022-02-23T23:00:38Z"
    status: "True"
    type: NoMemoryPressure
  - lastUpdateTime: "2022-02-23T23:00:38Z"
    status: "True"
    type: SecretsMigrated
  - lastUpdateTime: "2022-02-23T23:00:38Z"
    status: "False"
    type: Connected
  driver: rancherKubernetesEngine
  eksStatus:
    managedLaunchTemplateID: ""
    managedLaunchTemplateVersions: null
    privateRequiresTunnel: null
    securityGroups: null
    subnets: null
    upstreamSpec: null
    virtualNetwork: ""
  gkeStatus:
    privateRequiresTunnel: null
    upstreamSpec: null
  limits:
    cpu: "0"
    memory: "0"
    pods: "0"
  provider: ""
  requested:
    cpu: "0"
    memory: "0"
    pods: "0"
```

Currently, the CLI download found in the Rancher UI v2.6-head is v2.6.4-rc1, so I ran into the error while testing. The CLI download will be updated to the latest version before the release.
