
[cli] Loading rke config doesn't work #25416

Closed · mickaeldecastro opened this issue Feb 14, 2020 · 12 comments

Labels: area/cli, internal, kind/bug, QA/S, team/hostbusters

@mickaeldecastro commented Feb 14, 2020

**What kind of request is this (question/bug/enhancement/feature request):** bug

We are trying to create a cluster using the Rancher CLI. When we load an RKE configuration with the vsphere cloud_provider, the virtual_center settings are not applied.

**Steps to reproduce (least amount of steps as possible):**

1. Generate an RKE configuration with the `rke` binary.
2. Create a cluster using the Rancher CLI with the `--rke-config` option.

```
# rke config --name cluster.yml
# rancher cluster create test --rke-config cluster.yml
```


cluster.yml:

```
# If you intended to deploy Kubernetes in an air-gapped environment,
# please consult the documentation on how to configure custom RKE images.
nodes:
- address: "node1"
  port: "22"
  internal_address: ""
  role:
  - controlplane
  - etcd
  hostname_override: master
  user: ubuntu
  #docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints: []
- address: "node2"
  port: "22"
  internal_address: ""
  role:
  - worker
  hostname_override: worker-1
  user: ubuntu
  #docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints: []
- address: "node3"
  port: "22"
  internal_address: ""
  role:
  - worker
  hostname_override: worker-2
  user: '`'
  #docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints: []
- address: "node4"
  port: "22"
  internal_address: ""
  role:
  - worker
  hostname_override: worker-3
  user: ubuntu
  #docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints: []
services:
  etcd:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    external_urls: []
    ca_cert: ""
    cert: ""
    key: ""
    path: ""
    uid: 0
    gid: 0
    snapshot: null
    retention: ""
    creation: ""
    backup_config: null
  kube-api:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    service_cluster_ip_range: 10.43.0.0/16
    service_node_port_range: ""
    pod_security_policy: false
    always_pull_images: false
    secrets_encryption_config: null
    audit_log: null
    admission_configuration: null
    event_rate_limit: null
  kube-controller:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    cluster_cidr: 10.42.0.0/16
    service_cluster_ip_range: 10.43.0.0/16
  scheduler:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
  kubelet:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    cluster_domain: test-cluster
    infra_container_image: ""
    cluster_dns_server: 10.43.0.10
    fail_swap_on: false
    generate_serving_certificate: false
  kubeproxy:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
network:
  plugin: flannel
  options: 
    flannel_backend_type: vxlan
  mtu: 0
  node_selector: {}
authentication:
  strategy: x509
  sans: []
  webhook: null
addons: ""
addons_include: []
system_images:
  etcd: rancher/coreos-etcd:v3.4.3-rancher1
  alpine: rancher/rke-tools:v0.1.52
  nginx_proxy: rancher/rke-tools:v0.1.52
  cert_downloader: rancher/rke-tools:v0.1.52
  kubernetes_services_sidecar: rancher/rke-tools:v0.1.52
  kubedns: rancher/k8s-dns-kube-dns:1.15.0
  dnsmasq: rancher/k8s-dns-dnsmasq-nanny:1.15.0
  kubedns_sidecar: rancher/k8s-dns-sidecar:1.15.0
  kubedns_autoscaler: rancher/cluster-proportional-autoscaler:1.7.1
  coredns: rancher/coredns-coredns:1.6.5
  coredns_autoscaler: rancher/cluster-proportional-autoscaler:1.7.1
  kubernetes: rancher/hyperkube:v1.17.2-rancher1
  flannel: rancher/coreos-flannel:v0.11.0-rancher1
  flannel_cni: rancher/flannel-cni:v0.3.0-rancher5
  calico_node: rancher/calico-node:v3.10.2
  calico_cni: rancher/calico-cni:v3.10.2
  calico_controllers: rancher/calico-kube-controllers:v3.10.2
  calico_ctl: rancher/calico-ctl:v2.0.0
  calico_flexvol: rancher/calico-pod2daemon-flexvol:v3.10.2
  canal_node: rancher/calico-node:v3.10.2
  canal_cni: rancher/calico-cni:v3.10.2
  canal_flannel: rancher/coreos-flannel:v0.11.0
  canal_flexvol: rancher/calico-pod2daemon-flexvol:v3.10.2
  weave_node: weaveworks/weave-kube:2.5.2
  weave_cni: weaveworks/weave-npc:2.5.2
  pod_infra_container: rancher/pause:3.1
  ingress: rancher/nginx-ingress-controller:nginx-0.25.1-rancher1
  ingress_backend: rancher/nginx-ingress-controller-defaultbackend:1.5-rancher1
  metrics_server: rancher/metrics-server:v0.3.6
  windows_pod_infra_container: rancher/kubelet-pause:v0.1.3
ssh_key_path: ~/.ssh/id_rsa
ssh_cert_path: ""
ssh_agent_auth: false
authorization:
  mode: rbac
  options: {}
ignore_docker_version: false
kubernetes_version: "v1.17.2-rancher1-2"
private_registries: []
ingress:
  provider: "nginx"
  options: {}
  node_selector: {}
  extra_args: {}
  dns_policy: ""
  extra_envs: []
  extra_volumes: []
  extra_volume_mounts: []
cluster_name: ""
cloud_provider:
  name: vsphere
  vsphereCloudProvider:
    global:
      insecure-flag: true
    virtual_center:
      IP_VCENTER:
        user: user
        password: password
        datacenters: LOCAL
        port: 443
    workspace:
      server: IP_VCENTER
      datacenter: LOCAL
      folder: kube
      default-datastore: Kubernetes
    disk:
      scsicontrollertype: pvscsi
prefix_path: ""
addon_job_timeout: 0
bastion_host:
  address: ""
  port: ""
  user: ""
  ssh_key: ""
  ssh_key_path: ""
  ssh_cert: ""
  ssh_cert_path: ""
monitoring:
  provider: metrics-server
  options: {}
  node_selector: {}
restore:
  restore: false
  snapshot_name: ""
dns: null


```

**Result:**
The cluster is created, but cloud_provider doesn't contain the virtual_center configuration.

```
Version: v3
clusters:
  test:
    answers: {}
    dockerRootDir: /var/lib/docker
    enableNetworkPolicy: false
    localClusterAuthEndpoint: {}
    rancherKubernetesEngineConfig:
      addonJobTimeout: 30
      authentication:
        strategy: x509
      authorization:
        mode: rbac
      bastionHost: {}
      cloudProvider:
        name: vsphere
        vsphereCloudProvider:
          disk:
            scsicontrollertype: pvscsi
          global:
            insecure-flag: true
          network: {}
          workspace:
            datacenter: LOCAL
            default-datastore: Kubernetes
            folder: kube
            server: IP_VCENTER
      ignoreDockerVersion: true
      ingress:
        provider: nginx
      kubernetesVersion: v1.17.2-rancher1-2
      monitoring:
        provider: metrics-server
      network:
        plugin: canal
      nodes:
      - address: node1
        port: "22"
        role:
        - controlplane
        - etcd
        user: ubuntu
      - address: node2
        port: "22"
        role:
        - worker
        user: ubuntu
      - address: node3
        port: "22"
        role:
        - worker
        user: '`'
      - address: node4
        port: "22"
        role:
        - worker
        user: ubuntu
      restore: {}
      services:
        etcd:
          backupConfig:
            enabled: true
            intervalHours: 12
            retention: 6
          creation: 12h
          extraArgs:
            election-timeout: "5000"
            heartbeat-interval: "500"
          retention: 72h
          snapshot: false
        kubeApi: {}
        kubeController: {}
        kubelet: {}
        kubeproxy: {}
        scheduler: {}
```
**Other details that may be helpful:**

Rancher Log
```
2020/02/14 10:01:07 http: TLS handshake error from 127.0.0.1:51690: EOF
2020/02/14 10:01:13 [INFO] Deleting cluster [c-ddtfm]
2020/02/14 10:01:18 [INFO] kontainerdriver rancherkubernetesengine listening on address 127.0.0.1:37387
2020/02/14 10:01:18 [INFO] kontainerdriver rancherkubernetesengine stopped
2020/02/14 10:01:18 [INFO] Deleted cluster [c-ddtfm]
2020/02/14 10:01:18 [INFO] [mgmt-cluster-rbac-remove] Deleting namespace c-ddtfm
2020/02/14 10:01:18 [ERROR] ClusterController c-ddtfm [mgmt-cluster-rbac-remove] failed with : clusters.management.cattle.io "c-ddtfm" not found
2020/02/14 10:01:23 [INFO] [mgmt-project-rbac-remove] Deleting namespace p-2t8th
2020/02/14 10:01:23 [INFO] [mgmt-project-rbac-remove] Deleting namespace p-j7pxl
E0214 10:01:24.045505      58 tokens_controller.go:261] error synchronizing serviceaccount c-ddtfm/default: secrets "default-token-pzlw5" is forbidden: unable to create new content in namespace c-ddtfm because it is being terminated
E0214 10:01:24.055308      58 tokens_controller.go:261] error synchronizing serviceaccount c-ddtfm/default: secrets "default-token-s6cd7" is forbidden: unable to create new content in namespace c-ddtfm because it is being terminated
2020/02/14 10:01:24 [INFO] [mgmt-auth-crtb-controller] Deleting roleBinding clusterrolebinding-86sdz
E0214 10:01:24.070128      58 tokens_controller.go:261] error synchronizing serviceaccount c-ddtfm/default: secrets "default-token-hb65d" is forbidden: unable to create new content in namespace c-ddtfm because it is being terminated
E0214 10:01:24.094973      58 tokens_controller.go:261] error synchronizing serviceaccount c-ddtfm/default: secrets "default-token-pbfmz" is forbidden: unable to create new content in namespace c-ddtfm because it is being terminated
2020/02/14 10:01:26 [INFO] [mgmt-cluster-rbac-delete] Creating namespace c-55mkz
2020/02/14 10:01:26 [INFO] [mgmt-cluster-rbac-delete] Creating Default project for cluster c-55mkz
2020/02/14 10:01:26 [INFO] [mgmt-project-rbac-create] Creating namespace p-c4zqs
2020/02/14 10:01:26 [INFO] [mgmt-cluster-rbac-delete] Creating System project for cluster c-55mkz
2020/02/14 10:01:26 [INFO] [mgmt-project-rbac-create] Creating creator projectRoleTemplateBinding for user user-xmjsf for project p-c4zqs
2020/02/14 10:01:26 [INFO] [mgmt-project-rbac-create] Creating namespace p-djx6b
2020/02/14 10:01:26 [INFO] [mgmt-cluster-rbac-delete] Updating cluster c-55mkz
2020/02/14 10:01:26 [INFO] [mgmt-project-rbac-create] Creating creator projectRoleTemplateBinding for user user-xmjsf for project p-djx6b
2020/02/14 10:01:26 [INFO] [mgmt-auth-crtb-controller] Setting InitialRolesPopulated condition on project p-c4zqs
2020/02/14 10:01:26 [INFO] [mgmt-project-rbac-create] Creating creator clusterRoleTemplateBinding for user user-xmjsf for cluster c-55mkz
2020/02/14 10:01:26 [INFO] [mgmt-project-rbac-create] Updating project p-c4zqs
2020/02/14 10:01:26 [INFO] [mgmt-auth-prtb-controller] Creating clusterRole p-c4zqs-projectowner
2020/02/14 10:01:26 [INFO] [mgmt-auth-crtb-controller] Setting InitialRolesPopulated condition on project p-djx6b
2020/02/14 10:01:26 [INFO] [mgmt-auth-prtb-controller] Creating clusterRole p-djx6b-projectowner
2020/02/14 10:01:26 [INFO] [mgmt-auth-crtb-controller] Creating clusterRole c-55mkz-clusterowner
2020/02/14 10:01:26 [INFO] [mgmt-project-rbac-create] Updating project p-djx6b
2020/02/14 10:01:26 [INFO] [mgmt-auth-prtb-controller] Creating roleBinding for membership in project p-c4zqs for subject user-xmjsf
2020/02/14 10:01:26 [INFO] [mgmt-auth-crtb-controller] Creating clusterRoleBinding for membership in cluster c-55mkz for subject user-xmjsf
2020/02/14 10:01:26 [INFO] [mgmt-auth-prtb-controller] Creating clusterRole c-55mkz-clustermember
2020/02/14 10:01:26 [INFO] [mgmt-project-rbac-create] Updating project p-c4zqs
2020/02/14 10:01:26 [INFO] [mgmt-auth-prtb-controller] Creating roleBinding for membership in project p-djx6b for subject user-xmjsf
2020/02/14 10:01:26 [INFO] [mgmt-auth-crtb-controller] Setting InitialRolesPopulated condition on cluster
2020/02/14 10:01:26 [INFO] [mgmt-auth-crtb-controller] Creating role cluster-owner in namespace c-55mkz
2020/02/14 10:01:26 [INFO] [mgmt-cluster-rbac-delete] Updating cluster c-55mkz
2020/02/14 10:01:26 [INFO] [mgmt-auth-prtb-controller] Creating clusterRoleBinding for membership in cluster c-55mkz for subject user-xmjsf
2020/02/14 10:01:26 [INFO] [mgmt-auth-prtb-controller] Creating clusterRole c-55mkz-clustermember
2020/02/14 10:01:26 [INFO] [mgmt-auth-crtb-controller] Creating roleBinding for subject user-xmjsf with role cluster-owner in namespace
2020/02/14 10:01:26 [INFO] [mgmt-project-rbac-create] Updating project p-djx6b
2020/02/14 10:01:26 [INFO] [mgmt-auth-prtb-controller] Creating role project-owner in namespace c-55mkz
2020/02/14 10:01:26 [INFO] [mgmt-auth-crtb-controller] Creating role cluster-owner in namespace p-djx6b
2020/02/14 10:01:26 [INFO] [mgmt-auth-prtb-controller] Creating roleBinding for subject user-xmjsf with role project-owner in namespace
2020/02/14 10:01:26 [INFO] [mgmt-auth-crtb-controller] Creating roleBinding for subject user-xmjsf with role cluster-owner in namespace
2020/02/14 10:01:26 [INFO] [mgmt-auth-prtb-controller] Creating role project-owner in namespace p-c4zqs
2020/02/14 10:01:26 [INFO] [mgmt-auth-crtb-controller] Creating role cluster-owner in namespace p-c4zqs
2020/02/14 10:01:26 [ERROR] ProjectRoleTemplateBindingController p-djx6b/creator-project-owner [mgmt-auth-prtb-controller] failed with : clusterroles.rbac.authorization.k8s.io "c-55mkz-clustermember" already exists
2020/02/14 10:01:26 [INFO] [mgmt-auth-prtb-controller] Updating clusterRoleBinding clusterrolebinding-hr5hn for cluster membership in cluster c-55mkz for subject user-xmjsf
2020/02/14 10:01:26 [INFO] [mgmt-auth-prtb-controller] Creating role admin in namespace p-c4zqs
2020/02/14 10:01:26 [INFO] [mgmt-auth-prtb-controller] Creating roleBinding for subject user-xmjsf with role project-owner in namespace
2020/02/14 10:01:26 [INFO] [mgmt-auth-crtb-controller] Creating roleBinding for subject user-xmjsf with role cluster-owner in namespace
2020/02/14 10:01:26 [INFO] [mgmt-auth-prtb-controller] Creating roleBinding for subject user-xmjsf with role project-owner in namespace
2020/02/14 10:01:26 [INFO] [mgmt-auth-prtb-controller] Creating role project-owner in namespace p-djx6b
2020/02/14 10:01:26 [INFO] [mgmt-auth-prtb-controller] Creating role admin in namespace p-djx6b
2020/02/14 10:01:26 [INFO] [mgmt-auth-prtb-controller] Creating roleBinding for subject user-xmjsf with role admin in namespace
2020/02/14 10:01:26 [INFO] [mgmt-auth-prtb-controller] Creating roleBinding for subject user-xmjsf with role project-owner in namespace
2020/02/14 10:01:26 [INFO] [mgmt-auth-prtb-controller] Creating roleBinding for subject user-xmjsf with role admin in namespace
2020/02/14 10:01:26 [INFO] [mgmt-cluster-rbac-delete] Updating cluster c-55mkz
2020/02/14 10:01:28 http: TLS handshake error from 127.0.0.1:51750: EOF
E0214 10:01:29.067006      58 tokens_controller.go:261] error synchronizing serviceaccount p-j7pxl/default: secrets "default-token-29m9d" is forbidden: unable to create new content in namespace p-j7pxl because it is being terminated
2020/02/14 10:01:29 [INFO] [mgmt-auth-prtb-controller] Updating owner label for roleBinding clusterrolebinding-kt7vb
E0214 10:01:29.077595      58 tokens_controller.go:261] error synchronizing serviceaccount p-j7pxl/default: secrets "default-token-lz2wg" is forbidden: unable to create new content in namespace p-j7pxl because it is being terminated
2020/02/14 10:01:29 [INFO] [mgmt-auth-prtb-controller] Deleting roleBinding clusterrolebinding-kt7vb
E0214 10:01:29.092664      58 tokens_controller.go:261] error synchronizing serviceaccount p-j7pxl/default: secrets "default-token-kngk5" is forbidden: unable to create new content in namespace p-j7pxl because it is being terminated
E0214 10:01:29.117356      58 tokens_controller.go:261] error synchronizing serviceaccount p-j7pxl/default: secrets "default-token-m6mn4" is forbidden: unable to create new content in namespace p-j7pxl because it is being terminated
2020/02/14 10:01:31 http: TLS handshake error from 127.0.0.1:51760: EOF
2020/02/14 10:01:39 http: TLS handshake error from 127.0.0.1:51848: EOF
2020/02/14 10:01:39 http: TLS handshake error from 127.0.0.1:51852: EOF
2020/02/14 10:01:39 http: TLS handshake error from 127.0.0.1:51850: EOF
2020/02/14 10:01:39 http: TLS handshake error from 127.0.0.1:51854: EOF
2020/02/14 10:01:39 http: TLS handshake error from 127.0.0.1:51842: EOF
2020/02/14 10:01:56 http: TLS handshake error from 127.0.0.1:51862: EOF
2020/02/14 10:01:56 http: TLS handshake error from 127.0.0.1:51864: EOF

```

**Environment information**
- Rancher version (`rancher/rancher`/`rancher/server` image tag or shown bottom left in the UI): Rancher v2.3.5
  - User Interface: v2.3.36
  - Helm: .14.3-rancher1
  - Machine: v0.15.0-rancher29



**Cluster information**
- Cluster type (Hosted/Infrastructure Provider/Custom/Imported): vSphere
- Machine type (cloud/VM/metal) and specifications (CPU/memory): VM

gz#15447
SURE-2820
@mickaeldecastro changed the title from "Rancher cli loading rke config" to "Rancher cli loading rke config doesn't work" Feb 21, 2020
@mickaeldecastro changed the title from "Rancher cli loading rke config doesn't work" to "[cli] Loading rke config doesn't work" Feb 21, 2020
@janeczku added the kind/bug and area/cli labels Apr 9, 2020
@maggieliu modified the milestones: v2.4.4, v2.4 - Backlog Apr 9, 2020
@janeczku (Contributor) commented Apr 9, 2020

Confirmed in Rancher 2.4.3 with Rancher CLI 2.4.3-rc1

Replication

Run `rancher cluster create --rke-config rke.yaml` using the following rke.yaml content:

```
services:
  etcd:
    backup_config:
      enabled: true
      interval_hours: 44
      retention: 44

  kubelet:
    extra_binds:
      - '/var/lib/kubelet/plugins_registry:/var/lib/kubelet/plugins_registry'
```

Check cluster settings in API

Result: The user-specified RKE config is not applied.

Workaround

Change all key names from snake to camel case, e.g.:

```
services:
  etcd:
    backupConfig:
      enabled: true
      intervalHours: 3
      retention: 32

  kubelet:
    extraBinds:
      - '/var/lib/kubelet/plugins_registry:/var/lib/kubelet/plugins_registry'
```

@deniseschannon removed this from the v2.4 - Backlog milestone Jan 29, 2021
@zaggash commented Mar 3, 2021

Duplicate of #27982

gz#15447

@zaggash marked this as a duplicate of #27982 Mar 3, 2021
@zaggash commented Mar 3, 2021

The workaround does not help; some keys are not applied at all.
If you try to change the network plugin to calico:

```
network:
  plugin: calico
```

it never changes and stays at the default.

@betweenclouds commented

@zaggash did you try:
`--network-provider value   Network provider for the cluster (flannel, canal, calico) (default: "canal")`
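
For example, something like the following might force the network provider from the CLI instead of relying on the value in the rke-config file (a hypothetical invocation; check `rancher cluster create --help` for the flags your CLI version actually supports):

```
rancher cluster create test --rke-config cluster.yml --network-provider calico
```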

@maxsokolovsky (Contributor) commented

Reproduced with CLI v2.4.13. This is definitely a CLI issue and has to do with how we parse top-level keys in the RKE config file.

I don't yet know why (I will inquire), but we convert all top-level keys to JSON format. For example, rancher_kubernetes_engine_config becomes rancherKubernetesEngineConfig. As a result, the original key, due to the modification, is not found by the CLI. It's as if the value were never initialized, and the CLI falls back to defaults for it. The same is true for other top-level objects whose keys are modified by the CLI before actually being sent to the Rancher server.
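
To see why this silently produces defaults, here is a minimal sketch (my own illustration, not the actual CLI code; the clusterSpec struct and the use of sigs.k8s.io/yaml are assumptions) of how a snake_case top-level key is dropped when the target struct only declares the camelCase JSON tag:

```
// sketch.go: a minimal, hypothetical demonstration (not Rancher CLI source).
package main

import (
	"fmt"

	"sigs.k8s.io/yaml" // converts YAML to JSON, then honors encoding/json tags
)

// clusterSpec stands in for the generated cluster type; only the camelCase
// JSON tag is declared, so a snake_case key in the input is dropped silently.
type clusterSpec struct {
	RancherKubernetesEngineConfig map[string]interface{} `json:"rancherKubernetesEngineConfig,omitempty"`
}

func main() {
	snake := []byte("rancher_kubernetes_engine_config:\n  addonJobTimeout: 30\n")
	camel := []byte("rancherKubernetesEngineConfig:\n  addonJobTimeout: 30\n")

	var a, b clusterSpec
	_ = yaml.Unmarshal(snake, &a) // unknown key: no error, field stays nil
	_ = yaml.Unmarshal(camel, &b) // known key: field is populated

	fmt.Println("snake_case parsed:", a.RancherKubernetesEngineConfig) // map[]
	fmt.Println("camelCase parsed: ", b.RancherKubernetesEngineConfig) // map[addonJobTimeout:30]
}
```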

@maxsokolovsky (Contributor) commented Nov 18, 2021

I still need to ascertain the exact reason for this behavior, but it does indeed look like it has to do with how we parse/unmarshal the given config file.

@maxsokolovsky (Contributor) commented

Also, for all keys inside rancher_kubernetes_engine_config, or inside any other object at any depth (for example .services.etcd.backup_config), the CLI expects the key to be in camelCase, not snake_case, so it wants backupConfig. This is configured in rancher/rancher; the CLI just uses it. But rancher_kubernetes_engine_config itself must be in camelCase first to be unmarshalled at all. Basically, all keys should be in camelCase.
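
For illustration, here is the same etcd backup setting in the snake_case form that `rke config` generates (ignored by the CLI) versus the camelCase form the CLI expects:

```
# snake_case, as generated by `rke config` (ignored by the CLI)
services:
  etcd:
    backup_config:
      enabled: true

# camelCase, as expected by the CLI
services:
  etcd:
    backupConfig:
      enabled: true
```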

@maxsokolovsky (Contributor) commented

Opened a PR for the change, but it is currently a draft in light of the 2.6.3 release. I will mark it as ready for review once 2.6.3 is out, targeting 2.6.4.

@deniseschannon added this to the v2.6.4 milestone Dec 1, 2021
@deniseschannon added the team/hostbusters label Dec 1, 2021
@deniseschannon modified the milestones: v2.6.4, v2.6.4 - Triaged Dec 1, 2021
@snasovich self-assigned this Feb 17, 2022
@maxsokolovsky (Contributor) commented

QA Testing

Root cause

Improper config deserialization - wrong top-level field was used, so all fields were ignored.

What was fixed, or what changes have occurred

Now the necessary value is used to deserialize the RKE config.

Areas or cases that should be tested

What areas could experience regressions?

Unfortunately, not all fields can be handled properly; handling them would require a breaking change. Some fields need to be in a certain case, so users need to check those.

For example, for all keys inside rancher_kubernetes_engine_config, or inside any other object at any depth (for example .services.etcd.backup_config), the CLI expects the key to be in camelCase, not snake_case, so it wants backupConfig. This is configured in rancher/rancher; the CLI just uses it. But rancher_kubernetes_engine_config itself must be in camelCase first to be deserialized at all. Basically, all keys should be in camelCase.

Steps

  1. Provision a Rancher cluster of any recent 2.5 or 2.6 version.
  2. Get an API key and use it with the Rancher CLI to log in to the cluster.
  3. Get the following example RKE config for a new cluster:

```
default_pod_security_policy_template_id: restricted
docker_root_dir: /apps/docker
enable_cluster_alerting: false
enable_cluster_monitoring: false
enable_network_policy: false
local_cluster_auth_endpoint:
  enabled: true
name: 'test'

rancher_kubernetes_engine_config:
  addon_job_timeout: 30
  authentication:
    strategy: x509
  dns:
    nodelocal:
      ip_address: ''
      node_selector: null
      update_strategy: {}
  ignore_docker_version: true

  ingress:
    provider: nginx
  kubernetes_version: v1.17.5-rancher1-1
  monitoring:
    provider: metrics-server
    replicas: 1

  network:
    mtu: 0
    options:
      flannel_backend_type: vxlan
    plugin: canal
  private_registries:
    - is_default: true
      url: harbor.davcor.co

  services:
    etcd:
      backup_config:
        enabled: true
        interval_hours: 12
        retention: 6
        safe_timestamp: false
      creation: 12h
      extra_args:
        election-timeout: '5000'
        heartbeat-interval: '500'
      gid: 0
      retention: 72h
      snapshot: false
      uid: 0
    kube_api:
      always_pull_images: false
      pod_security_policy: true
      service_node_port_range: 30000-32767
      secrets_encryption_config:
        enabled: true
      event_rate_limit:
        enabled: true
      audit_log:
        enabled: true
    scheduler:
      extra_args:
        profiling: "false"
        address: "127.0.0.1"
    kube_controller:
      extra_args:
        profiling: "false"
        address: "127.0.0.1"
        feature-gates: "RotateKubeletServerCertificate=true"
    kubelet:
      extra_args:
        authorization-mode: "Webhook"
        streaming-connection-idle-timeout: "1800s"
        protect-kernel-defaults: "true"
        make-iptables-util-chains: "true"
        event-qps: "0"
        anonymous-auth: "false"
        feature-gates: "RotateKubeletServerCertificate=true"
        tls-cipher-suites: "TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_128_GCM_SHA256"
  ssh_agent_auth: false
  upgrade_strategy:
    drain: false
    max_unavailable_controlplane: '1'
    max_unavailable_worker: 10%
    node_drain_input:
      delete_local_data: false
      force: false
      grace_period: -1
      ignore_daemon_sets: true
      timeout: 120
windows_prefered_cluster: false
```

  4. Use the older version of the Rancher CLI without the fix. Run `rancher cluster create test-bad --rke-config cluster.yml`.
  5. Now inspect the YAML in the browser for the new pending cluster. Notice that none of the settings from the config are present. Because deserialization failed, there are only default keys and values. For example, addon_job_timeout is set to 45 and not 30, as in the config. Note also that the services are missing settings for all components: kube-controller, kube-api, and others.
  6. Now use the version of the CLI that has the fix: `rancher cluster create test-fixed --rke-config cluster.yml`.
  7. Inspect the YAML and notice that most, although not all, settings have been applied. Notice how the services have their settings applied.

@timhaneunsoo commented

Test Environment:

- Rancher version: v2.6-head e55a04c
- Rancher cluster type: HA
- Docker version: 20.10
- Downstream cluster type: rancher cli


Testing:

Tested this issue with the following steps:

  1. Provision a Rancher cluster.
  2. Get an API key and use it with the Rancher CLI to log in to the cluster.
  3. Get the following example RKE config for a new cluster (used the cluster.yml provided in the previous comment).
  4. Use the older version of the Rancher CLI without the fix. Run `rancher cluster create test-bad --rke-config cluster.yml`.
  5. Now inspect the YAML in the browser for the new pending cluster. Notice that none of the settings from the config are present. Because deserialization failed, there are only default keys and values. For example, addon_job_timeout is set to 45 and not 30, as in the config. Note also that the services are missing settings for all components: kube-controller, kube-api, and others.
  6. Now use the version of the CLI that has the fix: `rancher cluster create test-fixed --rke-config cluster.yml`.
  7. Inspect the YAML and notice that most, although not all, settings have been applied. Notice how the services have their settings applied.

Result - Low Pass

After creating the cluster, the YAML was inspected, and it seems only the etcd backupConfig had its settings applied. The other settings do not appear to have been applied as described in the testing steps. The following settings were not set to the values from the cluster.yml file:

```
dockerRootDir: /var/lib/docker
localClusterAuthEndpoint:
    enabled: false
rancherKubernetesEngineConfig:
    addonJobTimeout: 45
ingress: {}
kubernetesVersion: v1.22.6-rancher1-2
network:
    mtu: 0
    options:
      flannel_backend_type: vxlan
kubeApi: {}
kubeController: {}
kubelet: {}
kubeproxy: {}
scheduler: {}
node_drain_input:
      delete_local_data: false
      force: false
      grace_period: -1
      ignore_daemon_sets: true
      timeout: 120
```

Below is the full YAML file after creating the cluster using the Rancher CLI v2.6.4-rc1 with the configuration given in the previous comment.

```
apiVersion: management.cattle.io/v3
kind: Cluster
metadata:
  annotations:
    authz.management.cattle.io/creator-role-bindings: '{"created":["cluster-owner"],"required":["cluster-owner"]}'
    field.cattle.io/creatorId: u-gndx8
    lifecycle.cattle.io/create.cluster-agent-controller-cleanup: "true"
    lifecycle.cattle.io/create.cluster-scoped-gc: "true"
    lifecycle.cattle.io/create.mgmt-cluster-rbac-remove: "true"
  creationTimestamp: "2022-02-23T19:52:28Z"
  finalizers:
  - wrangler.cattle.io/mgmt-cluster-remove
  - controller.cattle.io/cluster-agent-controller-cleanup
  - controller.cattle.io/cluster-scoped-gc
  - controller.cattle.io/cluster-provisioner-controller
  - controller.cattle.io/mgmt-cluster-rbac-remove
  generateName: c-
  generation: 13
  labels:
    cattle.io/creator: norman
  managedFields:
  - apiVersion: management.cattle.io/v3
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:field.cattle.io/creatorId: {}
        f:generateName: {}
        f:labels:
          .: {}
          f:cattle.io/creator: {}
    manager: Go-http-client
    operation: Update
    time: "2022-02-23T19:52:28Z"
  - apiVersion: management.cattle.io/v3
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          f:authz.management.cattle.io/creator-role-bindings: {}
          f:lifecycle.cattle.io/create.cluster-agent-controller-cleanup: {}
          f:lifecycle.cattle.io/create.cluster-scoped-gc: {}
          f:lifecycle.cattle.io/create.mgmt-cluster-rbac-remove: {}
        f:finalizers:
          .: {}
          v:"controller.cattle.io/cluster-agent-controller-cleanup": {}
          v:"controller.cattle.io/cluster-provisioner-controller": {}
          v:"controller.cattle.io/cluster-scoped-gc": {}
          v:"controller.cattle.io/mgmt-cluster-rbac-remove": {}
          v:"wrangler.cattle.io/mgmt-cluster-remove": {}
      f:spec: {}
      f:status: {}
    manager: rancher
    operation: Update
    time: "2022-02-23T19:52:29Z"
  name: c-gbdgg
  resourceVersion: "385297"
  uid: 4f3c326a-4786-4e15-9d62-f2d8291c1619
spec:
  agentImageOverride: ""
  answers: {}
  description: ""
  desiredAgentImage: ""
  desiredAuthImage: ""
  displayName: test-fixed
  dockerRootDir: /var/lib/docker
  enableClusterAlerting: false
  enableClusterMonitoring: false
  enableNetworkPolicy: false
  fleetWorkspaceName: fleet-default
  internal: false
  localClusterAuthEndpoint:
    enabled: false
  rancherKubernetesEngineConfig:
    addonJobTimeout: 45
    authentication:
      strategy: x509
    authorization: {}
    bastionHost: {}
    cloudProvider: {}
    enableCriDockerd: false
    ignoreDockerVersion: true
    ingress: {}
    kubernetesVersion: v1.22.6-rancher1-2
    monitoring: {}
    network:
      plugin: canal
    restore: {}
    rotateEncryptionKey: false
    services:
      etcd:
        backupConfig:
          enabled: true
          intervalHours: 12
          retention: 6
          s3BackupConfig: null
      kubeApi: {}
      kubeController: {}
      kubelet: {}
      kubeproxy: {}
      scheduler: {}
    sshAgentAuth: false
    systemImages: {}
    upgradeStrategy:
      drain: false
      maxUnavailableControlplane: "1"
      maxUnavailableWorker: 10%
  windowsPreferedCluster: false
status:
  agentImage: ""
  aksStatus:
    privateRequiresTunnel: null
    rbacEnabled: null
    upstreamSpec: null
  allocatable:
    cpu: "0"
    memory: "0"
    pods: "0"
  appliedEnableNetworkPolicy: false
  appliedPodSecurityPolicyTemplateId: ""
  appliedSpec:
    agentImageOverride: ""
    answers: {}
    description: ""
    desiredAgentImage: ""
    desiredAuthImage: ""
    displayName: ""
    enableClusterAlerting: false
    enableClusterMonitoring: false
    enableNetworkPolicy: null
    internal: false
    localClusterAuthEndpoint:
      enabled: false
    windowsPreferedCluster: false
  authImage: ""
  capabilities:
    ingressCapabilities:
    - {}
    loadBalancerCapabilities: {}
    nodePortRange: 30000-32767
  capacity:
    cpu: "0"
    memory: "0"
    pods: "0"
  conditions:
  - status: "True"
    type: Pending
  - lastUpdateTime: "2022-02-23T19:52:29Z"
    message: waiting for etcd, controlplane and worker nodes to be registered
    reason: Provisioning
    status: Unknown
    type: Provisioned
  - lastUpdateTime: "2022-02-23T19:52:29Z"
    message: Waiting for API to be available
    status: Unknown
    type: Waiting
  - lastUpdateTime: "2022-02-23T19:52:28Z"
    status: "True"
    type: BackingNamespaceCreated
  - lastUpdateTime: "2022-02-23T19:52:28Z"
    status: "True"
    type: DefaultProjectCreated
  - lastUpdateTime: "2022-02-23T19:52:28Z"
    status: "True"
    type: SystemProjectCreated
  - lastUpdateTime: "2022-02-23T19:52:28Z"
    status: "True"
    type: InitialRolesPopulated
  - lastUpdateTime: "2022-02-23T19:52:28Z"
    status: "True"
    type: CreatorMadeOwner
  - lastUpdateTime: "2022-02-23T19:52:29Z"
    status: "True"
    type: NoDiskPressure
  - lastUpdateTime: "2022-02-23T19:52:29Z"
    status: "True"
    type: NoMemoryPressure
  - lastUpdateTime: "2022-02-23T19:52:29Z"
    status: "True"
    type: SecretsMigrated
  - lastUpdateTime: "2022-02-23T19:52:39Z"
    status: "False"
    type: Connected
  driver: rancherKubernetesEngine
  eksStatus:
    managedLaunchTemplateID: ""
    managedLaunchTemplateVersions: null
    privateRequiresTunnel: null
    securityGroups: null
    subnets: null
    upstreamSpec: null
    virtualNetwork: ""
  gkeStatus:
    privateRequiresTunnel: null
    upstreamSpec: null
  limits:
    cpu: "0"
    memory: "0"
    pods: "0"
  provider: ""
  requested:
    cpu: "0"
    memory: "0"
    pods: "0"
```

@maxsokolovsky (Contributor) commented

@timhaneunsoo, just to make sure - for the config file, did you use the example from my comment or from the issue description?
I just tried on v2.6-head using my example, and the services got deserialized properly. And I tried the example in the description and got the same behavior as you - only backupConfig in etcd got deserialized.

@timhaneunsoo commented

After review, the fix was found to be in Rancher CLI v2.6.4-rc2. Upon testing with the updated CLI version, the fix is confirmed and testing is a Pass.
The correct YAML is below:

```
apiVersion: management.cattle.io/v3
kind: Cluster
metadata:
  annotations:
    authz.management.cattle.io/creator-role-bindings: '{"created":["cluster-owner"],"required":["cluster-owner"]}'
    field.cattle.io/creatorId: u-8fsf6
    lifecycle.cattle.io/create.cluster-agent-controller-cleanup: "true"
    lifecycle.cattle.io/create.cluster-scoped-gc: "true"
    lifecycle.cattle.io/create.mgmt-cluster-rbac-remove: "true"
  creationTimestamp: "2022-02-23T23:00:37Z"
  finalizers:
  - wrangler.cattle.io/mgmt-cluster-remove
  - controller.cattle.io/cluster-agent-controller-cleanup
  - controller.cattle.io/cluster-scoped-gc
  - controller.cattle.io/cluster-provisioner-controller
  - controller.cattle.io/mgmt-cluster-rbac-remove
  generateName: c-
  generation: 14
  labels:
    cattle.io/creator: norman
  managedFields:
  - apiVersion: management.cattle.io/v3
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:field.cattle.io/creatorId: {}
        f:generateName: {}
        f:labels:
          .: {}
          f:cattle.io/creator: {}
    manager: Go-http-client
    operation: Update
    time: "2022-02-23T23:00:37Z"
  - apiVersion: management.cattle.io/v3
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          f:authz.management.cattle.io/creator-role-bindings: {}
          f:lifecycle.cattle.io/create.cluster-agent-controller-cleanup: {}
          f:lifecycle.cattle.io/create.cluster-scoped-gc: {}
          f:lifecycle.cattle.io/create.mgmt-cluster-rbac-remove: {}
        f:finalizers:
          .: {}
          v:"controller.cattle.io/cluster-agent-controller-cleanup": {}
          v:"controller.cattle.io/cluster-provisioner-controller": {}
          v:"controller.cattle.io/cluster-scoped-gc": {}
          v:"controller.cattle.io/mgmt-cluster-rbac-remove": {}
          v:"wrangler.cattle.io/mgmt-cluster-remove": {}
      f:spec: {}
      f:status: {}
    manager: rancher
    operation: Update
    time: "2022-02-23T23:00:38Z"
  name: c-w664p
  resourceVersion: "6888"
  uid: 7a6b076f-6e0c-4fda-8397-eb58d3da55c9
spec:
  agentImageOverride: ""
  answers: {}
  defaultPodSecurityPolicyTemplateName: restricted
  description: ""
  desiredAgentImage: ""
  desiredAuthImage: ""
  displayName: test-fixed
  dockerRootDir: /apps/docker
  enableClusterAlerting: false
  enableClusterMonitoring: false
  enableNetworkPolicy: false
  fleetWorkspaceName: fleet-default
  internal: false
  localClusterAuthEndpoint:
    enabled: true
  rancherKubernetesEngineConfig:
    addonJobTimeout: 30
    authentication:
      strategy: x509|webhook
    authorization: {}
    bastionHost: {}
    cloudProvider: {}
    dns:
      nodelocal:
        updateStrategy: {}
    enableCriDockerd: false
    ignoreDockerVersion: true
    ingress:
      defaultBackend: true
      defaultIngressClass: true
      provider: nginx
    kubernetesVersion: v1.17.5-rancher1-1
    monitoring:
      provider: metrics-server
      replicas: 1
    network:
      plugin: canal
    privateRegistries:
    - url: harbor.davcor.co
    restore: {}
    rotateEncryptionKey: false
    services:
      etcd:
        backupConfig:
          enabled: true
          intervalHours: 12
          retention: 6
          s3BackupConfig: null
          timeout: 300
        creation: 12h
        extraArgs:
          election-timeout: "5000"
          heartbeat-interval: "500"
        retention: 72h
        snapshot: false
      kubeApi:
        auditLog:
          enabled: true
        eventRateLimit:
          enabled: true
        podSecurityPolicy: true
        secretsEncryptionConfig:
          enabled: true
        serviceNodePortRange: 30000-32767
      kubeController:
        extraArgs:
          address: 127.0.0.1
          feature-gates: RotateKubeletServerCertificate=true
          profiling: "false"
      kubelet:
        extraArgs:
          anonymous-auth: "false"
          authorization-mode: Webhook
          event-qps: "0"
          feature-gates: RotateKubeletServerCertificate=true
          make-iptables-util-chains: "true"
          protect-kernel-defaults: "true"
          streaming-connection-idle-timeout: 1800s
          tls-cipher-suites: TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_128_GCM_SHA256
      kubeproxy: {}
      scheduler:
        extraArgs:
          address: 127.0.0.1
          profiling: "false"
    sshAgentAuth: false
    systemImages: {}
    upgradeStrategy:
      drain: false
      maxUnavailableControlplane: "1"
      maxUnavailableWorker: 10%
      nodeDrainInput:
        gracePeriod: -1
        ignoreDaemonSets: true
        timeout: 120
  windowsPreferedCluster: false
status:
  agentImage: ""
  aksStatus:
    privateRequiresTunnel: null
    rbacEnabled: null
    upstreamSpec: null
  allocatable:
    cpu: "0"
    memory: "0"
    pods: "0"
  appliedEnableNetworkPolicy: false
  appliedPodSecurityPolicyTemplateId: ""
  appliedSpec:
    agentImageOverride: ""
    answers: {}
    description: ""
    desiredAgentImage: ""
    desiredAuthImage: ""
    displayName: ""
    enableClusterAlerting: false
    enableClusterMonitoring: false
    enableNetworkPolicy: null
    internal: false
    localClusterAuthEndpoint:
      enabled: false
    windowsPreferedCluster: false
  authImage: ""
  capabilities:
    ingressCapabilities:
    - customDefaultBackend: false
      ingressProvider: nginx
    loadBalancerCapabilities: {}
    nodePortRange: 30000-32767
    pspEnabled: true
  capacity:
    cpu: "0"
    memory: "0"
    pods: "0"
  conditions:
  - status: "True"
    type: Pending
  - lastUpdateTime: "2022-02-23T23:00:38Z"
    message: waiting for etcd, controlplane and worker nodes to be registered
    reason: Provisioning
    status: Unknown
    type: Provisioned
  - lastUpdateTime: "2022-02-23T23:00:38Z"
    message: Waiting for API to be available
    status: Unknown
    type: Waiting
  - lastUpdateTime: "2022-02-23T23:00:37Z"
    status: "True"
    type: BackingNamespaceCreated
  - lastUpdateTime: "2022-02-23T23:00:37Z"
    status: "True"
    type: DefaultProjectCreated
  - lastUpdateTime: "2022-02-23T23:00:37Z"
    status: "True"
    type: SystemProjectCreated
  - lastUpdateTime: "2022-02-23T23:00:37Z"
    status: "True"
    type: InitialRolesPopulated
  - lastUpdateTime: "2022-02-23T23:00:37Z"
    status: "True"
    type: CreatorMadeOwner
  - lastUpdateTime: "2022-02-23T23:00:38Z"
    status: "True"
    type: NoDiskPressure
  - lastUpdateTime: "2022-02-23T23:00:38Z"
    status: "True"
    type: NoMemoryPressure
  - lastUpdateTime: "2022-02-23T23:00:38Z"
    status: "True"
    type: SecretsMigrated
  - lastUpdateTime: "2022-02-23T23:00:38Z"
    status: "False"
    type: Connected
  driver: rancherKubernetesEngine
  eksStatus:
    managedLaunchTemplateID: ""
    managedLaunchTemplateVersions: null
    privateRequiresTunnel: null
    securityGroups: null
    subnets: null
    upstreamSpec: null
    virtualNetwork: ""
  gkeStatus:
    privateRequiresTunnel: null
    upstreamSpec: null
  limits:
    cpu: "0"
    memory: "0"
    pods: "0"
  provider: ""
  requested:
    cpu: "0"
    memory: "0"
    pods: "0"
```

Currently, the CLI download found in the Rancher UI v2.6-head is v2.6.4-rc1, so I ran into the error while testing. The CLI download will be updated to the latest version before the release.
