
Document how to configure a second scheduler to run as static Pod(s) #22802

Closed
burvilc opened this issue Jul 28, 2020 · 6 comments
Labels: language/en, lifecycle/rotten, sig/scheduling

Comments


burvilc commented Jul 28, 2020

This is a Bug Report

Problem:
This page shows how to implement the solution (a second scheduler) as a Deployment. Since the existing default scheduler typically runs as a static Pod, wouldn't it make sense to also document how to run the second scheduler as a static Pod?

Proposed Solution:
Update the documentation to cover the changes needed to run the second scheduler as a static Pod.

Page to Update:
https://kubernetes.io/docs/tasks/extend-kubernetes/configure-multiple-schedulers/

Kubernetes version: 1.18.x

Below is what I've posted at https://www.udemy.com/course/certified-kubernetes-administrator-with-practice-tests/learn/lecture/14298694#questions/11881944, where I tried to adapt these instructions for a static pod, but am having issues.

I’ve been trying to implement multiple schedulers using the official Kubernetes documentation at https://kubernetes.io/docs/tasks/extend-kubernetes/configure-multiple-schedulers/; following the steps in that link works fine. However, when I try to adapt it for a static pod (as in the Kodekloud lab; I think it makes much more sense to do this as a static pod rather than a deployment), I run into problems.

I get stuck when trying to use the service account, which is needed to access the endpoints objects used as part of the leader election. Note that I realize I’m using leader-elect=true, which is different from the lab. Not only is setting it to false a single point of failure, but the Kubernetes documentation says to do the opposite: its sample starts with it set to false, but the page then says to change it. Since I think the CKA exam will be based on the official documentation, I’m trying to get this to work. I’ve documented my steps here; please review this and consider what I’m missing. The specific questions I’d like answered are listed at the end of this post.
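For context, the flags I’m referring to are the ones in the scheduler’s command. A minimal sketch of the leader-election-enabled variant I’m aiming for (this is my reading of the 1.18-era documentation, not a tested configuration; the exact lock-object values are my assumption):

- command:
  - kube-scheduler
  - --address=0.0.0.0
  - --scheduler-name=my-scheduler        # must match schedulerName in the pods this scheduler should place
  - --leader-elect=true                  # what the documentation says to enable
  - --lock-object-namespace=kube-system  # lock object flags the 1.18-era docs mention (values assumed)
  - --lock-object-name=my-scheduler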

ASSUMPTIONS

My assumptions regarding the content in https://kubernetes.io/docs/tasks/extend-kubernetes/configure-multiple-schedulers/ and adapting it for a static pod:

  1. The section under ‘Package the scheduler’ can be skipped if I want to use the same image as my current default scheduler; I just use the current image and not a new one.

  2. Setting up the cluster role, cluster role binding and service account are the same.

  3. The kube-scheduler command path and the image should mirror what’s in the manifest of the currently running default scheduler.

=================================================================

ATTEMPT #1: BASELINE - STEPS IN K8S DOCUMENTATION WORK

So, I first try going through the steps in https://kubernetes.io/docs/tasks/extend-kubernetes/configure-multiple-schedulers/, with adaptations for assumption #3 above. I open https://kodekloud.com/courses/certified-kubernetes-administrator-with-practice-tests-labs/lectures/12038844 and the associated quiz tab.

I copy the yaml from the URL, with the changes for the path of kube-scheduler and image:

master $ grep -A 5 command my-scheduler.yml
   - command:
    - kube-scheduler
    - --address=0.0.0.0
    - --leader-elect=false
    - --scheduler-name=my-scheduler
    image: k8s.gcr.io/kube-scheduler:v1.18.0

I create the elements in the yaml, which seem to run fine:

master $ kubectl create -f my-scheduler.yml
serviceaccount/my-scheduler created
clusterrolebinding.rbac.authorization.k8s.io/my-scheduler-as-kube-scheduler created
clusterrolebinding.rbac.authorization.k8s.io/my-scheduler-as-volume-scheduler created
deployment.apps/my-scheduler created
master $ kubectl get all
NAME         TYPE    CLUSTER-IP  EXTERNAL-IP  PORT(S)  AGE
service/kubernetes  ClusterIP  10.96.0.1  <none>    443/TCP  3m45s
master $ kubectl get all -n kube-system | grep my
pod/my-scheduler-79d8f64fb8-96hgf       1/1   Running  0     28s
deployment.apps/my-scheduler       1/1   1      1      28s
replicaset.apps/my-scheduler-79d8f64fb8       1     1     1    28s

I update the nginx yaml in the lab and run it.

master $ vi nginx-pod.yaml
master $ kubectl create -f nginx-pod.yaml
pod/nginx created
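The only change to the nginx yaml is adding the schedulerName field to the pod spec; roughly (a sketch; nginx as the image is my assumption based on the lab’s example):

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  schedulerName: my-scheduler   # ask the second scheduler, not the default one, to place this pod
  containers:
  - name: nginx
    image: nginx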

The pod seems to have been created OK, and assigned using the alternate my-scheduler scheduler.

master $ kubectl get pods
NAME  READY  STATUS  RESTARTS  AGE
nginx  1/1   Running  0     17s
master $ kubectl describe pod nginx | grep my
 Normal Scheduled <unknown> my-scheduler   Successfully assigned default/nginx to node01


This gives a baseline, showing that the steps on the Kubernetes page work fine. Note that in this case, we have leader-elect set to true, not false.

==========================================================

ATTEMPT #2: CHANGE STEPS TO STATIC POD - PROBLEM WITH SERVICEACCOUNT

Here, I try to adapt the previous steps to use a static pod and not a deployment. I open (i.e. refresh the page to start over) https://kodekloud.com/courses/certified-kubernetes-administrator-with-practice-tests-labs/lectures/12038844 and the associated quiz tab.

  1. I copy the yaml and change it.
master $ cd /etc/kubernetes/manifests/
master $ ls
etcd.yaml kube-apiserver.yaml kube-controller-manager.yaml kube-scheduler.yaml
master $ cp -p kube-scheduler.yaml my-scheduler.yml
master $ vi my-scheduler.yml

  2. I set up the service account, cluster role, and cluster role bindings needed for leader-elect=true to work. These are the steps that come before the Deployment in the documentation (see the description below the output).
master $ cd ~
master $ vi acct-setup.yml
master $ kubectl create -f acct-setup.yml
serviceaccount/my-scheduler created
clusterrolebinding.rbac.authorization.k8s.io/my-scheduler-as-kube-scheduler created
clusterrolebinding.rbac.authorization.k8s.io/my-scheduler-as-volume-scheduler created

The following is what’s in acct-setup.yml. It’s the first part of the yaml in the Kubernetes documentation, with the Deployment section removed, since we want to deploy a static Pod rather than a Deployment. I’d think the other steps for the service account, cluster role, and cluster role bindings should stay the same.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-scheduler
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: my-scheduler-as-kube-scheduler
subjects:
- kind: ServiceAccount
  name: my-scheduler
  namespace: kube-system
roleRef:
  kind: ClusterRole
  name: system:kube-scheduler
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: my-scheduler-as-volume-scheduler
subjects:
- kind: ServiceAccount
  name: my-scheduler
  namespace: kube-system
roleRef:
  kind: ClusterRole
  name: system:volume-scheduler
  apiGroup: rbac.authorization.k8s.io

I also edit the system:kube-scheduler cluster role as the documentation specifies, adding my-scheduler to the resourceNames of the rules for the endpoints and leases resources.

master $ kubectl edit clusterrole system:kube-scheduler
clusterrole.rbac.authorization.k8s.io/system:kube-scheduler edited
master $ pwd
/etc/kubernetes/manifests
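For reference, the relevant rules in the edited cluster role end up looking roughly like this (a sketch reconstructed from the describe output below, not the exact yaml):

- apiGroups:
  - ""
  resourceNames:
  - kube-scheduler
  - my-scheduler
  resources:
  - endpoints
  verbs:
  - get
  - update
- apiGroups:
  - coordination.k8s.io
  resourceNames:
  - kube-scheduler
  - my-scheduler
  resources:
  - leases
  verbs:
  - get
  - update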

The changes seem to have taken effect:
master $ kubectl describe clusterrole system:kube-scheduler
Name:     system:kube-scheduler
Labels:    kubernetes.io/bootstrapping=rbac-defaults
Annotations: rbac.authorization.kubernetes.io/autoupdate: true
PolicyRule:
 Resources                 Non-Resource URLs Resource Names  Verbs
 ---------                 ----------------- --------------  -----
<snip>
 endpoints                 []         [kube-scheduler] [get update]
 endpoints                 []         [my-scheduler]  [get update]
 leases.coordination.k8s.io         []         [kube-scheduler] [get update]
 leases.coordination.k8s.io         []         [my-scheduler]  [get update]
 pods/status                []         []        [patch update]
master $ kubectl describe clusterrolebindings |less
master $ kubectl describe clusterrolebindings |grep -A 7 my-
Name:     my-scheduler-as-kube-scheduler
Labels:    <none>
Annotations: <none>
Role:
 Kind: ClusterRole
 Name: system:kube-scheduler
Subjects:
 Kind      Name     Namespace
--
 ServiceAccount my-scheduler kube-system


Name:     my-scheduler-as-volume-scheduler
Labels:    <none>
Annotations: <none>
Role:
 Kind: ClusterRole
 Name: system:volume-scheduler
Subjects:
 Kind      Name     Namespace
--
 ServiceAccount my-scheduler kube-system
  3. I set up the pod with leader-elect=false, and it doesn’t even run. I simply take the pod template portion of the deployment yaml from the baseline and put it in the static pod’s yaml, so I’d expect it to run. How can I check into why the static pod is not being created? (See the debugging sketch at the end of this attempt.) Note that kubectl logs my-scheduler-master returns nothing, since there is no pod to produce logs, and kubectl get events also shows nothing. Note also the significant differences from the answer key in the lab, specifically a) the command options, b) the image, and c) the ports used.
master $ cat my-scheduler.yml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    component: my-scheduler
    tier: control-plane
  name: my-scheduler
  namespace: kube-system
spec:
  serviceAccountName: my-scheduler
  containers:
  - command:
    - kube-scheduler
    - --address=0.0.0.0
    - --leader-elect=false
    - --scheduler-name=my-scheduler
    image: k8s.gcr.io/kube-scheduler:v1.18.0
    livenessProbe:
      httpGet:
        path: /healthz
        port: 10251
      initialDelaySeconds: 15
    name: kube-second-scheduler
    readinessProbe:
      httpGet:
        path: /healthz
        port: 10251
    resources:
      requests:
        cpu: '0.1'
    securityContext:
      privileged: false
    volumeMounts: []
  hostNetwork: false
  hostPID: false
  volumes: []
master $ cp -p my-scheduler.yml /etc/kubernetes/manifests/
master $ watch kubectl get pods -n kube-system
master $ date; kubectl get pods -n kube-system | grep my
Thu Jul 23 01:42:52 UTC 2020
master $ date; kubectl get pods -n kube-system | grep my
Thu Jul 23 01:43:09 UTC 2020
  4. When I remove the serviceAccountName line from the yaml, the pod does get picked up and started, but it now runs into errors on ports. While it would be logical to change the ports to avoid the conflict, I can’t see why there is one, since I’m essentially doing the same thing as in the working baseline above; the only intended difference is that I’m deploying a static pod instead of a deployment. Obviously I’m missing something here, otherwise this would work, so any thoughts as to what I’m missing would be welcome. Note that I haven’t even gotten to specifying the lock object namespace or the lock object name; I’m only referencing the service account here. I’d think that if I can’t even get the service account reference to work, the lock objects won’t work either.

I remove the serviceAccountName entry:

master $ cd /etc/kubernetes/manifests/
master $ grep service my-scheduler.yml
 serviceAccountName: my-scheduler
master $ vi my-scheduler.yml
master $ grep service my-scheduler.yml

And I see it’s restarted, but with an error.

master $ date; kubectl get pods -n kube-system | grep my
Thu Jul 23 01:46:04 UTC 2020
my-scheduler-master            0/1   CrashLoopBackOff  1     8s
master $ date; kubectl describe pod my-scheduler -n kube-system |less
Thu Jul 23 01:46:39 UTC 2020
master $ date; kubectl describe pod my-scheduler -n kube-system |tail -15
Thu Jul 23 01:48:24 UTC 2020
 PodScheduled   True
Volumes:      <none>
QoS Class:     Burstable
Node-Selectors:   <none>
Tolerations:    :NoExecute
Events:
 Type   Reason     Age          From       Message
 ----   ------     ----         ----       -------
 Normal  SandboxChanged 2m28s         kubelet, master Pod sandbox changed, it will be killed and re-created.
 Warning Unhealthy    2m8s         kubelet, master Readiness probe failed: Get http://10.244.0.6:10251/healthz: dial tcp 10.244.0.6:10251: connect: connection refused
 Normal  Created     99s (x4 over 2m36s)  kubelet, master Created container kube-second-scheduler
 Normal  Started     98s (x4 over 2m36s)  kubelet, master Started container kube-second-scheduler
 Warning BackOff     63s (x10 over 2m25s) kubelet, master Back-off restarting failed container
 Normal  Pulled     48s (x5 over 2m36s)  kubelet, master Container image "k8s.gcr.io/kube-scheduler:v1.18.0" already present on machine
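
To dig into why a static pod isn’t created (or keeps crash-looping), the places I’ve been checking are the kubelet log and the container runtime directly, since kubectl has nothing to show for a pod that never materializes. A rough sketch (assuming a systemd-managed kubelet and Docker as the runtime in this lab; a containerd-based cluster would use crictl instead of docker):

# kubelet log: shows whether the manifest in /etc/kubernetes/manifests was picked up or rejected
journalctl -u kubelet | grep -i my-scheduler

# container runtime: shows containers that started and then exited, and their logs
docker ps -a | grep my-scheduler
docker logs <container-id>    # use an ID from the previous command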

=============================================

ATTEMPT #3: EVEN WITH A WORKING CONFIGURATION, IT’S DIFFERENT

Specifically, it’s different from what’s in the lab as well as in the Kubernetes documentation.

That said, as an exercise in finding something that will actually work, I did change the port (and a couple of other settings) and came up with the following yaml, which worked for me: it deployed a second scheduler as a static pod, which I then used to deploy the nginx pod as specified in the quiz/lab. Below is an sdiff of my working yaml against the answer key for the lab at https://kodekloud.com/courses/certified-kubernetes-administrator-with-practice-tests-labs/lectures/12038844.

Some key differences in the output below:

  1. address is set to 0.0.0.0 and not 127.0.0.1 (this follows the official documentation; see the URL I mentioned above)

  2. I have --port and --secure-port defined differently; I think the way they’re defined in the answer key allows sensitive information to be passed in clear text, which is a huge security vulnerability and not something to be done in production (see the flag sketch just before the sdiff). Likewise, I’ve removed scheme; I think Kubernetes just figures this out to be https as a result.

  3. I have updated the image name to be different; I think the one in the answer key is old.

That said, again, this implementation of multiple schedulers is not what’s in the official Kubernetes documentation, and so not what I would think would be on the CKA exam.
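To make the port difference in item 2 concrete, the two variants boil down to these flags (taken from the two columns of the sdiff below):

# my manifest (left column of the sdiff): disable the insecure port, serve HTTPS on 10282
- --port=0
- --secure-port=10282

# lab answer key (right column): disable the secure port, serve plain HTTP on 10282
- --port=10282
- --secure-port=0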

master $ sdiff /etc/kubernetes/manifests/my-scheduler.yml /var/answers/my-scheduler.yaml
apiVersion: v1                         apiVersion: v1
kind: Pod                            kind: Pod
metadata:                            metadata:
                               >  annotations:
                               >   scheduler.alpha.kubernetes.io/critical-pod: ""
 creationTimestamp: null                     creationTimestamp: null
 labels:                             labels:
  component: my-scheduler                 |   component: my-scheduler
  tier: control-plane                       tier: control-plane
 name: my-scheduler                       name: my-scheduler
 namespace: kube-system                     namespace: kube-system
spec:             spec:
 containers:                           containers:
 - command:                           - command:
  - kube-scheduler                        - kube-scheduler
  - --authentication-kubeconfig=/etc/kubernetes/scheduler.c |    - --address=127.0.0.1
  - --authorization-kubeconfig=/etc/kubernetes/scheduler.co <
  - --address=0.0.0.0                    <
  - --scheduler-name=my-scheduler              <
  - --kubeconfig=/etc/kubernetes/scheduler.conf          - --kubeconfig=/etc/kubernetes/scheduler.conf
  - --port=0                        <
  - --secure-port=10282                    <
  - --leader-elect=false                     - --leader-elect=false
  image: k8s.gcr.io/kube-scheduler:v1.18.0         |   - --port=10282
                               >   - --scheduler-name=my-scheduler
                               >   - --secure-port=0
                               >   image: k8s.gcr.io/kube-scheduler-amd64:v1.16.0
  imagePullPolicy: IfNotPresent                  imagePullPolicy: IfNotPresent
  livenessProbe:                         livenessProbe:
   failureThreshold: 8                       failureThreshold: 8
   httpGet:                            httpGet:
    host: 127.0.0.1                         host: 127.0.0.1
    path: /healthz                         path: /healthz
     port: 10282                      |     port: 10282
                               >     scheme: HTTP
   initialDelaySeconds: 15                     initialDelaySeconds: 15
   timeoutSeconds: 15                       timeoutSeconds: 15
  name: my-scheduler                    |   name: kube-scheduler
  resources:                           resources:
   requests:                            requests:
    cpu: 100m                            cpu: 100m
  volumeMounts:                          volumeMounts:
  - mountPath: /etc/kubernetes/scheduler.conf           - mountPath: /etc/kubernetes/scheduler.conf
   name: kubeconfig                        name: kubeconfig
   readOnly: true                         readOnly: true
 hostNetwork: true                        hostNetwork: true
 priorityClassName: system-cluster-critical           priorityClassName: system-cluster-critical
 volumes:                            volumes:
 - hostPath:                           - hostPath:
   path: /etc/kubernetes/scheduler.conf              path: /etc/kubernetes/scheduler.conf
   type: FileOrCreate                       type: FileOrCreate
  name: kubeconfig                        name: kubeconfig
status: {}                           status: {}

=================================

MY QUESTIONS

  1. Why isn’t the static pod based on the YAML from the Kubernetes documentation starting?

  2. Why is it running into a problem related to the service account?

  3. With the following specific deviations between the lab material and the official Kubernetes website, are there plans to update the course? Or am I missing something? I'd say 3.3 and 3.4 are especially significant.
    3.1. command options for container
    3.2. image for container
    3.3. ports used, for liveness probe and otherwise
    3.4. setting leader-elect to true and not false

In summary, while the lab and the Kubernetes documentation are similar at a high level, they differ significantly in at least two areas (a and b below). I have two implementations/configurations (attempts 1 and 2 above) that don’t match what’s in the multiple-schedulers lab; the most critical differences are:

a) use of insecure HTTP and not HTTPS
b) setting leader-elect to false and not true
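
Since the sdiff above is hard to read, here is my working manifest (its left-hand column) reassembled into a single sketch. The values of the --authentication-kubeconfig and --authorization-kubeconfig flags are truncated in the diff output, so I’ve left those two flags out rather than guess at the paths:

apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    component: my-scheduler
    tier: control-plane
  name: my-scheduler
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-scheduler
    - --address=0.0.0.0
    - --scheduler-name=my-scheduler
    - --kubeconfig=/etc/kubernetes/scheduler.conf
    - --port=0
    - --secure-port=10282
    - --leader-elect=false
    image: k8s.gcr.io/kube-scheduler:v1.18.0
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10282
      initialDelaySeconds: 15
      timeoutSeconds: 15
    name: my-scheduler
    resources:
      requests:
        cpu: 100m
    volumeMounts:
    - mountPath: /etc/kubernetes/scheduler.conf
      name: kubeconfig
      readOnly: true
  hostNetwork: true
  priorityClassName: system-cluster-critical
  volumes:
  - hostPath:
      path: /etc/kubernetes/scheduler.conf
      type: FileOrCreate
    name: kubeconfig
status: {}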


sftim commented Aug 10, 2020

/retitle Document how to configure a second scheduler to run as static Pod(s)

@k8s-ci-robot k8s-ci-robot changed the title Issue with k8s.io/docs/tasks/extend-kubernetes/configure-multiple-schedulers/ Document how to configure a second scheduler to run as static Pod(s) Aug 10, 2020

sftim commented Aug 10, 2020

/sig scheduling
/language en

@k8s-ci-robot k8s-ci-robot added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. language/en Issues or PRs related to English language labels Aug 10, 2020
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 8, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 8, 2020
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
