[BUG] K3s worker only node does not work with system-default-agent args #45607

Open
orangedeng opened this issue May 27, 2024 · 5 comments
Labels
area/k3s area/kdm JIRA To be used in correspondence with the internal ticketing system. kind/bug Issues that are defects reported by users or that we know have reached a real release

Comments

@orangedeng
Contributor

Rancher Server Setup

  • Rancher version: v2.7.12(-ent)/v2.8.3(-ent)
  • Installation option (Docker install/Helm Chart): docker

Information about the Cluster

  • Kubernetes version: v1.26.15+k3s1
  • Cluster Type (Local/Downstream): Downstream custom k3s

User Information
N/A

Describe the bug
When registry.rancher.com or registry.cn-hangzhou.aliyuncs.com is used as the system-default-registry, creating a k3s cluster and then adding a worker-only node fails. The configuration delivered to the worker is as follows:

{
  "docker": false,
  "node-label": [
    "cattle.io/os=linux",
    "rke.cattle.io/machine=24fce476-fc09-4fce-9f65-29c496c5453c"
  ],
  "private-registry": "/etc/rancher/k3s/registries.yaml",
  "protect-kernel-defaults": false,
  "selinux": false,
  "server": "https://192.168.110.200:6443",
  "system-default-registry": "registry.cn-hangzhou.aliyuncs.com",
  "token": "wndn5jb5swklscqjhnhlbh9kmrpwh95d8h49l9w6k5d5fskshxhtvs"
}

Because the k3s agent does not support the system-default-registry parameter, the k3s-agent service fails to start.
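
The failure mode above can be illustrated with a small pre-flight check. This is only a sketch: the function name `find_server_only_keys` and the `SERVER_ONLY_KEYS` set are hypothetical, and the set contains only the one key named in this report rather than an authoritative list of k3s server-only flags.

```python
import json
from typing import List

# Assumption: this set is taken only from the report above. It is not an
# authoritative list of flags that k3s accepts on the server but not the agent.
SERVER_ONLY_KEYS = {"system-default-registry"}


def find_server_only_keys(agent_config_json: str) -> List[str]:
    """Return the keys in an agent config that SERVER_ONLY_KEYS marks as server-only."""
    config = json.loads(agent_config_json)
    return sorted(key for key in config if key in SERVER_ONLY_KEYS)


if __name__ == "__main__":
    # Trimmed copy of the config delivered to the worker in this report.
    delivered = json.dumps({
        "docker": False,
        "server": "https://192.168.110.200:6443",
        "system-default-registry": "registry.cn-hangzhou.aliyuncs.com",
    })
    print(find_server_only_keys(delivered))
```

A non-empty result for a worker-only node's config would indicate the condition described in this issue.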

To Reproduce
As described

Result

Expected Result
Worker should be added successfully

Screenshots
N/A

Additional context
This is most likely a KDM problem, as system-default-registry is listed in the k3s agent args.

@orangedeng orangedeng added the kind/bug Issues that are defects reported by users or that we know have reached a real release label May 27, 2024
@skanakal
Contributor

skanakal commented May 31, 2024

k3s-io/k3s#7890

SURE-8459

@skanakal skanakal added the JIRA To be used in correspondence with the internal ticketing system. label May 31, 2024
@jiaqiluo jiaqiluo added this to the v2.8.x milestone Jun 21, 2024
@jiaqiluo jiaqiluo self-assigned this Jun 21, 2024
@snasovich snasovich modified the milestones: v2.8.x, v2.x - Backlog Jun 21, 2024
@jiaqiluo
Member

jiaqiluo commented Jun 24, 2024

Causes

Two reasons cause the bug:

1/ When we enable and set the registry on a downstream (DS) K3s cluster, the Rancher UI sets the following machineSelectorConfig on the Cluster.provisioning.cattle.io/v1 object. Because the entry has no machineLabelSelector, the config is applied to all nodes:

    machineSelectorConfig:
      - config:
          protect-kernel-defaults: false
          system-default-registry: testing.registry.com

2/ system-default-registry, an argument that is supported only by k3s-server, is defined under the agentArgs list for k3s-agent in kontainer-driver-metadata (KDM).

As a result, the arg system-default-registry is sent to the worker-only node, and the k3s-agent.service on the worker node fails to start properly.
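
The first cause can be sketched as a simplified model of selector matching. This is an illustration under stated assumptions, not Rancher's actual implementation: `selector_matches` and `effective_config` are hypothetical helpers, only the `In` operator is modelled, and the merge order used here (later entries override earlier ones) is an assumption.

```python
from typing import Any, Dict, List, Optional


def selector_matches(selector: Optional[Dict[str, Any]], labels: Dict[str, str]) -> bool:
    """Simplified matcher: an absent selector matches every machine.

    Only matchExpressions with the 'In' operator are supported in this sketch.
    """
    if not selector:
        return True
    for expr in selector.get("matchExpressions", []):
        if expr["operator"] != "In":
            raise ValueError("only the 'In' operator is modelled here")
        if labels.get(expr["key"]) not in expr["values"]:
            return False
    return True


def effective_config(entries: List[Dict[str, Any]], labels: Dict[str, str]) -> Dict[str, Any]:
    """Merge the config of every entry whose selector matches the machine labels."""
    merged: Dict[str, Any] = {}
    for entry in entries:
        if selector_matches(entry.get("machineLabelSelector"), labels):
            merged.update(entry.get("config", {}))
    return merged


if __name__ == "__main__":
    # Entry as generated by the UI: no machineLabelSelector, so it matches all nodes.
    ui_entries = [{"config": {"protect-kernel-defaults": False,
                              "system-default-registry": "testing.registry.com"}}]
    worker_labels = {"cattle.io/os": "linux"}
    print(effective_config(ui_entries, worker_labels))
```

In this model the worker-only machine ends up with system-default-registry in its config, which matches the behaviour reported in this issue.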

Workaround

Instead of leaving it to Rancher UI to configure the registry on the cluster, we need to edit the cluster as YAML to add the following configuration:

    machineSelectorConfig:
      - config:
          docker: false
          protect-kernel-defaults: false
          selinux: false
      - config:
          system-default-registry: registry.hub.docker.com
        machineLabelSelector:
          matchExpressions:
            - key: rke.cattle.io/control-plane-role
              operator: In
              values:
                - 'true'

This sets the system-default-registry argument only on the control plane nodes, where k3s-server runs.

Note: if your registry requires credentials, you can still use the UI to set the registry. Instead of clicking "Save" to apply, click "Edit as YAML"; the UI should have configured the registries.config for you. Then modify the machineSelectorConfig section so that system-default-registry is applied only to the control plane nodes.
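
The manual YAML edit described in this workaround can be sketched as a transformation over the machineSelectorConfig list. This is a minimal sketch, assuming the list has already been loaded into Python dictionaries; `scope_key_to_control_plane` is a hypothetical helper and not part of Rancher or any of its tooling.

```python
import copy
from typing import Any, Dict, List

# Selector taken from the workaround YAML above.
CONTROL_PLANE_SELECTOR = {
    "matchExpressions": [
        {"key": "rke.cattle.io/control-plane-role", "operator": "In", "values": ["true"]}
    ]
}


def scope_key_to_control_plane(entries: List[Dict[str, Any]], key: str) -> List[Dict[str, Any]]:
    """Move `key` out of selector-less entries into a new control-plane-scoped entry.

    The input list is not modified; a rewritten copy is returned.
    """
    result: List[Dict[str, Any]] = []
    moved: Dict[str, Any] = {}
    for entry in copy.deepcopy(entries):
        config = entry.get("config", {})
        if "machineLabelSelector" not in entry and key in config:
            moved[key] = config.pop(key)
        result.append(entry)
    if moved:
        result.append({
            "config": moved,
            "machineLabelSelector": copy.deepcopy(CONTROL_PLANE_SELECTOR),
        })
    return result


if __name__ == "__main__":
    ui_entries = [{"config": {"protect-kernel-defaults": False,
                              "system-default-registry": "testing.registry.com"}}]
    for entry in scope_key_to_control_plane(ui_entries, "system-default-registry"):
        print(entry)
```

The output has the same two-entry shape as the workaround YAML: one unscoped entry for the general settings and one entry restricted to control plane nodes.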

@jiaqiluo
Member

jiaqiluo commented Jun 24, 2024

To properly fix this bug, we need to move the misplaced argument system-default-registry from the agentArgs to the serverArgs for k3s in KDM.
We cannot modify the templates of already-released versions, so the fix will be applied only to new versions.
We also cannot modify the UI to append the "missing" machineLabelSelector, because the effect on existing clusters would be unmanageable, especially considering that DS clusters may also be managed via the TF-rancher-provider.

The same misplacement exists in RKE2's template in KDM, but it does not cause a failure on DS RKE2 clusters because the rke2-agent does not return an error when it sees an unknown argument.
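
The proposed move from agentArgs to serverArgs can be sketched as follows. The dictionary shape used here is a hypothetical stand-in for a KDM release entry, not the real KDM schema, and `move_arg_to_server` is an illustrative helper rather than existing KDM tooling.

```python
from typing import Any, Dict


def move_arg_to_server(release: Dict[str, Any], arg_name: str) -> Dict[str, Any]:
    """Return a copy of a release entry with arg_name moved from agentArgs to serverArgs.

    Assumption: a release entry is a dict with 'agentArgs' and 'serverArgs'
    mappings from argument name to its definition.
    """
    agent_args = dict(release.get("agentArgs", {}))
    server_args = dict(release.get("serverArgs", {}))
    if arg_name in agent_args:
        definition = agent_args.pop(arg_name)
        # Keep an existing server-side definition if one is already present.
        server_args.setdefault(arg_name, definition)
    return {**release, "agentArgs": agent_args, "serverArgs": server_args}


if __name__ == "__main__":
    # Hypothetical entry; the version string and arg definitions are placeholders.
    release = {
        "version": "example",
        "agentArgs": {"system-default-registry": {"type": "string"},
                      "node-label": {"type": "array"}},
        "serverArgs": {},
    }
    print(move_arg_to_server(release, "system-default-registry"))
```

As noted above, a change of this kind would apply only to templates for new versions, since released templates are not modified.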

@jiaqiluo
Member

Hi @caroline-suse-rancher, the fix needs to be made in the k3s template in KDM, which to my knowledge is maintained by the RKE2/K3s team, so I am assigning this issue to you.

@xiaoluhong

@jiaqiluo @caroline-suse-rancher This issue is over 90 days old. Are there any updates?
