
Cannot apply KubeSchedulerConfiguration using config file on k3s startup #6301

Closed

TimoVerbrugghe opened this issue Oct 19, 2022 · 12 comments

@TimoVerbrugghe

TimoVerbrugghe commented Oct 19, 2022

Environmental Info:
K3s Version:
k3s version v1.24.6+k3s1 (a8e0c66)
go version go1.18.6

Node(s) CPU architecture, OS, and Version:
Running 3 nodes as Ubuntu Server VMs on 1 physical machine (Ryzen 5 3600); each VM has 6 GB of RAM.
Linux demoa 5.15.0-52-generic #58-Ubuntu SMP Thu Oct 13 08:03:55 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration:
3 master nodes, each of which can also act as a worker node.

Describe the bug:
Trying to apply the following configuration file for the kube-scheduler, stored at /var/lib/scheduler/scheduler-config.yaml (format as defined here: https://kubernetes.io/docs/reference/config-api/kube-scheduler-config.v1/):

```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  pluginConfig:
  - name: PodTopologySpread
    args:
      defaultConstraints:
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: ScheduleAnyway
```

Booting k3s using a config file that contains the following:

```yaml
kube-scheduler-arg:
- "config=/var/lib/scheduler/scheduler-config.yaml"
```

However, when starting up, the k3s service unit goes to a failed state and the following shows up in the logs:

```
Error: no kind "KubeSchedulerConfiguration" is registered for version "kubescheduler.config.k8s.io/v1" in scheme "k8s.io/apimachinery@v1.24.6-k3s1/pkg/r>
```

Steps To Reproduce:

  • Create the k3s config file (below) and place it at /etc/rancher/k3s/config.yaml
  • Create the scheduler config file and place it at /var/lib/scheduler/scheduler-config.yaml (see above)
  • Run k3s with curl -sfL https://get.k3s.io | sh -
  • Redo the previous 3 steps on the 2 other nodes, except without "cluster-init: true" in the k3s config file

```yaml
cluster-init: true
tls-san:
- 192.168.0.20
write-kubeconfig-mode: '644'

# Setting IPs for pods & services (needed for tailscale routing)
cluster-cidr: 10.42.0.0/16
service-cidr: 10.43.0.0/16

# Disable traefik & servicelb -> Will install traefik manually & using metallb/kube-vip
disable:
- traefik
- servicelb

kubelet-arg:
- "feature-gates=GracefulNodeShutdown=true"
- "feature-gates=MixedProtocolLBService=true"
- "node-status-update-frequency=4s"

kube-controller-manager-arg:
- "node-monitor-period=4s"
- "node-monitor-grace-period=16s"
- "pod-eviction-timeout=20s"

kube-apiserver-arg:
- "default-not-ready-toleration-seconds=20"
- "default-unreachable-toleration-seconds=20"

kube-scheduler-arg:
- "config=/var/lib/scheduler/scheduler-config.yaml"

etcd-expose-metrics: true
```

Expected behavior:
K3s server starts up and kube-scheduler configuration is applied

Actual behavior:
k3s server fails to start with the error:

```
Error: no kind "KubeSchedulerConfiguration" is registered for version "kubescheduler.config.k8s.io/v1" in scheme "k8s.io/apimachinery@v1.24.6-k3s1/pkg
```

Additional context / logs:
Let me know if I need to provide anything further. It seems like either a bug or a configuration error on my end, but I wouldn't know which :s

@brandond
Contributor

brandond commented Oct 19, 2022

You're using K3s 1.24 but trying to follow the Kubernetes 1.25 docs. Either upgrade to K3s 1.25, or reference the correct documentation version:
https://v1-24.docs.kubernetes.io/docs/reference/config-api/kube-scheduler-config.v1beta3/#kubescheduler-config-k8s-io-v1beta3-KubeSchedulerConfiguration

@TimoVerbrugghe
Author

Good catch, thanks! However, I don't understand why I'm running 1.24: this cluster was installed today with the commands and config file above, so how come it didn't install k3s 1.25? Is there any specific config I need to set to get the latest version?

@brandond
Contributor

We haven't released 1.25 to the stable channel yet, only latest. See https://docs.k3s.io/upgrades/manual#release-channels

@TimoVerbrugghe
Author

TimoVerbrugghe commented Oct 20, 2022

Mmm, just nuked my cluster and tried to start with a clean slate.

Installed k3s with curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=latest sh - and the config file mentioned above, but now I get the following error when starting the k3s server:

```
Error: json: cannot unmarshal object into Go struct field KubeSchedulerConfiguration.profiles of type []v1.KubeSchedulerProfile
```

Then k3s just tries to restart indefinitely...

If I comment out the kube-scheduler-arg portion of the config file, then k3s starts successfully.

EDIT: The same thing happens on a fresh install from the stable channel, with my scheduler-config.yaml file pointing to the v1beta3 struct to make it 1.24-compatible.

@brandond
Contributor

brandond commented Oct 20, 2022

From reading the docs you linked, it appears that the profiles value is supposed to be a list of profiles. The error makes this clear as well: it can't unmarshal your value into []v1.KubeSchedulerProfile (a list of profiles). You have a single profile as the value, not contained in a list.

If you're having a hard time constructing YAML from the docs alone, you might want to find a working example config somewhere and start from that.
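For instance, a minimal sketch of the list form that the unmarshal error is asking for, reusing the same fields as the original file (illustrative shape only, not a verified working config):

```yaml
profiles:
- pluginConfig:   # note the leading dash: each profile is one item in a list
  - name: PodTopologySpread
    args:
      defaultConstraints:
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: ScheduleAnyway
```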

@TimoVerbrugghe
Author

TimoVerbrugghe commented Oct 20, 2022

Agreed, this is indeed still new to me, so I really appreciate your input on this :).

Uninstalled k3s on all nodes using the uninstall script. For now, testing on only 1 node.

Alright, so starting over:

```yaml
apiVersion: kubescheduler.config.k8s.io/v1beta3
kind: KubeSchedulerConfiguration

profiles:
  - schedulerName: default-scheduler
    pluginConfig:
      - name: PodTopologySpread
        args:
          defaultConstraints:
            - maxSkew: 1
              topologyKey: topology.kubernetes.io/zone
              whenUnsatisfiable: ScheduleAnyway
          defaultingType: List
```

I change topologyKey to kubernetes.io/hostname, since that is the constraint whose maxSkew default I want to override (the current default is 3, as mentioned in the Kubernetes docs).

I place this file in /var/lib/scheduler/scheduler-config.yaml

I define my k3s config file, which loads in scheduler-config.yaml:

```yaml
cluster-init: true
tls-san:
- 192.168.0.20
write-kubeconfig-mode: '644'

# Setting IPs for pods & services (needed for tailscale routing)
cluster-cidr: 10.42.0.0/16
service-cidr: 10.43.0.0/16

# Disable traefik & servicelb -> Will install traefik manually & using metallb/kube-vip
disable:
- traefik
- servicelb

kubelet-arg:
- "feature-gates=GracefulNodeShutdown=true"
- "feature-gates=MixedProtocolLBService=true"
- "node-status-update-frequency=4s"

kube-controller-manager-arg:
- "node-monitor-period=4s"
- "node-monitor-grace-period=16s"
- "pod-eviction-timeout=20s"

kube-apiserver-arg:
- "default-not-ready-toleration-seconds=20"
- "default-unreachable-toleration-seconds=20"

kube-scheduler-arg:
- "config=/var/lib/scheduler/scheduler-config.yaml"

etcd-expose-metrics: true
```

I place that config file at /etc/rancher/k3s/config.yaml.

I install k3s with curl -sfL https://get.k3s.io | sh -

k3s gets into a reboot loop with the following error (retrieved from journalctl -xu k3s):

```
Oct 20 09:40:55 demoa k3s[29477]: W1020 09:40:55.437548   29477 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
Oct 20 09:40:55 demoa k3s[29477]: W1020 09:40:55.437569   29477 client_config.go:622] error creating inClusterConfig, falling back to default config: unable to load in-cluster configuration, KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT must be defined
Oct 20 09:40:55 demoa k3s[29477]: Error: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable
```

If I comment out the following in my config file:

```yaml
kube-scheduler-arg:
- "config=/var/lib/scheduler/scheduler-config.yaml"
```

then k3s starts successfully. So I'm unable to change the defaults even when using an example config file from the Kubernetes docs.

Let me know if I need to provide any more config or logs.

@brandond
Contributor

According to https://kubernetes.io/docs/reference/command-line-tools-reference/kube-scheduler/, using --config disables many of the CLI flags, including --kubeconfig. So essentially, if you want to use the config file, you need to make sure that all the parameters K3s normally sets via CLI flags are also set in the config file.
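As a sketch of what that can look like: the KubeSchedulerConfiguration API has a clientConnection.kubeconfig field that replaces the disabled --kubeconfig flag. The path below is an assumption (it's where a default K3s install keeps its generated scheduler credential, and may differ on your system); verify the actual path and any other flag values against your own startup logs.

```yaml
apiVersion: kubescheduler.config.k8s.io/v1beta3
kind: KubeSchedulerConfiguration
# Replaces the --kubeconfig CLI flag, which --config disables.
# Path is an assumption based on a default K3s install layout.
clientConnection:
  kubeconfig: /var/lib/rancher/k3s/server/cred/scheduler.kubeconfig
profiles:
  - schedulerName: default-scheduler
    pluginConfig:
      - name: PodTopologySpread
        args:
          defaultConstraints:
            - maxSkew: 1
              topologyKey: kubernetes.io/hostname
              whenUnsatisfiable: ScheduleAnyway
          defaultingType: List
```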

@TimoVerbrugghe
Author

TimoVerbrugghe commented Oct 20, 2022

Yeah, I saw that in the docs, but all the parameters that mention it are marked DEPRECATED, so I thought k3s wasn't using them. Alright, so then a potential feature request: move from running kube-scheduler with CLI options to a config-file setup that users can modify if needed, I presume?

@brandond
Contributor

The CLI flags have been deprecated for several years, and upstream has made no actual moves to remove them. They have declined to add any new flags, though, so use of new features usually requires a config file. Migrating wholesale to config files instead of CLI flags is a ways out on our radar: for example, the current support we have for configuring components via kube-scheduler-arg and the like would be completely broken, since everyone would need to migrate to a config file with a strong schema instead.

@TimoVerbrugghe
Author

TimoVerbrugghe commented Oct 20, 2022

Alright, understood. A potential compromise: mention somewhere in the docs which flags you are using with kube-scheduler, so that people who want to can create their own config files from them? I could deduce them from the logs, but that would be difficult if flag usage (and especially the default values) changed between k3s releases.

@brandond
Contributor

brandond commented Oct 20, 2022

We don't change them very frequently, but the values will depend on your configuration. Pulling them out of the logs is probably the best way to ensure the values are correct for your specific cluster; it shouldn't be too hard, since they are printed out during startup.
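A sketch of pulling those flags out, assuming (as the logs above suggest the format to be) that K3s prints a startup line containing "Running kube-scheduler --flag=value ..." to the journal:

```shell
# Extract the CLI flags K3s passed to kube-scheduler from its startup logs.
# Assumes a systemd install and a log line of the form
# "Running kube-scheduler --flag=value ...".
journalctl -u k3s --no-pager \
  | grep -m1 'Running kube-scheduler' \
  | tr ' ' '\n' \
  | grep '^--'
```

Each flag then prints on its own line, ready to be translated into the corresponding config-file fields.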

@caroline-suse-rancher
Contributor

Closing this as it appears the issue has been resolved.
