
kube-scheduler dies because of lost lease while kube-apiserver is still working #70730

Open
gbjuno opened this Issue Nov 7, 2018 · 3 comments


gbjuno commented Nov 7, 2018

What keywords did you search in Kubernetes issues before filing this one? (If you have found any duplicates, you should instead reply there.):
kube-scheduler, scheduler, dies, lost, master

What happened:
The kube-scheduler process died in a cluster that lost two of its three master nodes, even though kube-apiserver was still working. All three master nodes were running the etcd/kube-apiserver/controller/scheduler components.

Nov  7 14:37:56 10-2-146-48 kube-scheduler[30766]: I1107 14:37:56.340805   30766 leaderelection.go:243] lock is held by 10-2-146-46_e3be14f7-e256-11e8-ae98-0050569e4dac and has not yet expired
Nov  7 14:37:56 10-2-146-48 kube-scheduler[30766]: I1107 14:37:56.340874   30766 round_trippers.go:384] GET http://10.2.146.250:8080/api/v1/namespaces/kube-system/endpoints/kube-scheduler
Nov  7 14:37:56 10-2-146-48 kube-scheduler[30766]: I1107 14:37:56.340882   30766 round_trippers.go:391] Request Headers:
Nov  7 14:37:56 10-2-146-48 kube-scheduler[30766]: I1107 14:37:56.340886   30766 round_trippers.go:394]     Accept: application/vnd.kubernetes.protobuf, */*
Nov  7 14:37:56 10-2-146-48 kube-scheduler[30766]: I1107 14:37:56.340891   30766 round_trippers.go:394]     User-Agent: kube-scheduler/v1.10.8 (linux/amd64) kubernetes/7eab6a4/leader-election
Nov  7 14:37:56 10-2-146-48 kube-scheduler[30766]: I1107 14:37:56.342930   30766 round_trippers.go:409] Response Status: 200 OK in 2 milliseconds
Nov  7 14:37:56 10-2-146-48 kube-scheduler[30766]: I1107 14:37:56.342949   30766 round_trippers.go:412] Response Headers:
Nov  7 14:37:56 10-2-146-48 kube-scheduler[30766]: I1107 14:37:56.342955   30766 round_trippers.go:415]     Content-Type: application/vnd.kubernetes.protobuf
Nov  7 14:37:56 10-2-146-48 kube-scheduler[30766]: I1107 14:37:56.342959   30766 round_trippers.go:415]     Date: Wed, 07 Nov 2018 06:37:56 GMT
Nov  7 14:37:56 10-2-146-48 kube-scheduler[30766]: I1107 14:37:56.342964   30766 round_trippers.go:415]     Content-Length: 415
Nov  7 14:37:56 10-2-146-48 kube-scheduler[30766]: I1107 14:37:56.343034   30766 request.go:872] Response Body:
Nov  7 14:37:56 10-2-146-48 kube-scheduler[30766]: 00000000  6b 38 73 00 0a 0f 0a 02  76 31 12 09 45 6e 64 70  |k8s.....v1..Endp|
Nov  7 14:37:56 10-2-146-48 kube-scheduler[30766]: 00000010  6f 69 6e 74 73 12 83 03  0a 80 03 0a 0e 6b 75 62  |oints........kub|
Nov  7 14:37:56 10-2-146-48 kube-scheduler[30766]: 00000020  65 2d 73 63 68 65 64 75  6c 65 72 12 00 1a 0b 6b  |e-scheduler....k|
Nov  7 14:37:56 10-2-146-48 kube-scheduler[30766]: 00000030  75 62 65 2d 73 79 73 74  65 6d 22 37 2f 61 70 69  |ube-system"7/api|
Nov  7 14:37:56 10-2-146-48 kube-scheduler[30766]: 00000040  2f 76 31 2f 6e 61 6d 65  73 70 61 63 65 73 2f 6b  |/v1/namespaces/k|
Nov  7 14:37:56 10-2-146-48 kube-scheduler[30766]: 00000050  75 62 65 2d 73 79 73 74  65 6d 2f 65 6e 64 70 6f  |ube-system/endpo|
Nov  7 14:37:56 10-2-146-48 kube-scheduler[30766]: 00000060  69 6e 74 73 2f 6b 75 62  65 2d 73 63 68 65 64 75  |ints/kube-schedu|
Nov  7 14:37:56 10-2-146-48 kube-scheduler[30766]: 00000070  6c 65 72 2a 24 61 64 31  65 61 39 64 32 2d 65 32  |ler*$ad1ea9d2-e2|
Nov  7 14:37:56 10-2-146-48 kube-scheduler[30766]: 00000080  33 64 2d 31 31 65 38 2d  38 66 30 38 2d 30 30 35  |3d-11e8-8f08-005|
Nov  7 14:37:56 10-2-146-48 kube-scheduler[30766]: 00000090  30 35 36 39 65 66 34 61  33 32 05 31 36 39 38 36  |0569ef4a32.16986|
Nov  7 14:37:56 10-2-146-48 kube-scheduler[30766]: 000000a0  38 00 42 08 08 ac b9 89  df 05 10 00 62 ea 01 0a  |8.B.........b...|
Nov  7 14:37:56 10-2-146-48 kube-scheduler[30766]: 000000b0  28 63 6f 6e 74 72 6f 6c  2d 70 6c 61 6e 65 2e 61  |(control-plane.a|
Nov  7 14:37:56 10-2-146-48 kube-scheduler[30766]: 000000c0  6c 70 68 61 2e 6b 75 62  65 72 6e 65 74 65 73 2e  |lpha.kubernetes [truncated 1029 chars]
Nov  7 14:37:56 10-2-146-48 kube-scheduler[30766]: I1107 14:37:56.343119   30766 leaderelection.go:243] lock is held by 10-2-146-46_e3be14f7-e256-11e8-ae98-0050569e4dac and has not yet expired
Nov  7 14:37:56 10-2-146-48 kube-scheduler[30766]: I1107 14:37:56.343175   30766 leaderelection.go:203] failed to renew lease kube-system/kube-scheduler: timed out waiting for the condition
Nov  7 14:37:56 10-2-146-48 kube-scheduler[30766]: E1107 14:37:56.343201   30766 server.go:612] lost master
Nov  7 14:37:56 10-2-146-48 kube-scheduler[30766]: F1107 14:37:56.343230   30766 helpers.go:119] error: lost lease
Nov  7 14:37:56 10-2-146-48 kube-scheduler[30766]: I1107 14:37:56.344868   30766 request.go:872] Request Body:
Nov  7 14:37:56 10-2-146-48 kube-scheduler[30766]: 00000000  6b 38 73 00 0a 0b 0a 02  76 31 12 05 45 76 65 6e  |k8s.....v1..Even|
Nov  7 14:37:56 10-2-146-48 kube-scheduler[30766]: 00000010  74 12 aa 02 0a 3c 0a 1f  6b 75 62 65 2d 73 63 68  |t....<..kube-sch|
Nov  7 14:37:56 10-2-146-48 kube-scheduler[30766]: 00000020  65 64 75 6c 65 72 2e 31  35 36 34 63 34 32 65 37  |eduler.1564c42e7|
Nov  7 14:37:56 10-2-146-48 kube-scheduler[30766]: 00000030  38 65 31 62 37 64 33 12  00 1a 0b 6b 75 62 65 2d  |8e1b7d3....kube-|
Nov  7 14:37:56 10-2-146-48 kube-scheduler[30766]: 00000040  73 79 73 74 65 6d 22 00  2a 00 32 00 38 00 42 00  |system".*.2.8.B.|
Nov  7 14:37:56 10-2-146-48 kube-scheduler[30766]: 00000050  7a 00 12 5b 0a 09 45 6e  64 70 6f 69 6e 74 73 12  |z..[..Endpoints.|
Nov  7 14:37:56 10-2-146-48 kube-scheduler[30766]: 00000060  0b 6b 75 62 65 2d 73 79  73 74 65 6d 1a 0e 6b 75  |.kube-system..ku|
Nov  7 14:37:56 10-2-146-48 kube-scheduler[30766]: 00000070  62 65 2d 73 63 68 65 64  75 6c 65 72 22 24 61 64  |be-scheduler"$ad|
Nov  7 14:37:56 10-2-146-48 kube-scheduler[30766]: 00000080  31 65 61 39 64 32 2d 65  32 33 64 2d 31 31 65 38  |1ea9d2-e23d-11e8|
Nov  7 14:37:56 10-2-146-48 kube-scheduler[30766]: 00000090  2d 38 66 30 38 2d 30 30  35 30 35 36 39 65 66 34  |-8f08-0050569ef4|
Nov  7 14:37:56 10-2-146-48 kube-scheduler[30766]: 000000a0  61 33 2a 02 76 31 32 05  31 36 39 38 36 3a 00 1a  |a3*.v12.16986:..|
Nov  7 14:37:56 10-2-146-48 kube-scheduler[30766]: 000000b0  0e 4c 65 61 64 65 72 45  6c 65 63 74 69 6f 6e 22  |.LeaderElection"|
Nov  7 14:37:56 10-2-146-48 kube-scheduler[30766]: 000000c0  40 31 30 2d 32 2d 31 34  36 2d 34 38 5f 37 33 32  |@10-2-146-48_73 [truncated 621 chars]
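The fatal sequence above ("failed to renew lease ... lost lease") is the leader-election renew loop giving up. A rough illustrative sketch of that behavior (not the real client-go code; the knob names mirror the standard `--leader-elect-*` flags, and the values are assumptions, not read from this cluster's config): the leader keeps retrying the renew until a renew deadline passes, then deliberately exits so another replica can take over.

```python
# Illustrative leader-election renew loop. Knob names mirror the standard
# flags (--leader-elect-lease-duration etc.); the values are assumptions.
LEASE_DURATION = 15.0   # how long other candidates wait before taking over
RENEW_DEADLINE = 10.0   # how long the leader keeps retrying a failed renew
RETRY_PERIOD = 2.0      # pause between renew attempts

def renew_loop(try_renew, now, sleep):
    """Retry renewing the lease until RENEW_DEADLINE passes; returning False
    models the 'failed to renew lease ... lost lease' fatal exit path."""
    deadline = now() + RENEW_DEADLINE
    while now() < deadline:
        if try_renew():
            return True
        sleep(RETRY_PERIOD)
    return False

# Simulated clock so the sketch runs instantly; every renew attempt fails,
# e.g. because the lock record names another holder whose lease looks valid.
t = [0.0]
held = renew_loop(lambda: False, now=lambda: t[0],
                  sleep=lambda s: t.__setitem__(0, t[0] + s))
print(held)  # False -> the component logs "lost lease" and exits
```

The deliberate exit is by design (a replacement leader should take over); the puzzle in this report is that it happened while the apiserver was still reachable.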

What you expected to happen:
kube-scheduler should renew its leader lease and continue running.

How to reproduce it (as minimally and precisely as possible):
Kill two of the three master nodes; the kube-scheduler running on the remaining healthy node dies.

Anything else we need to know?:
At the same time, kube-apiserver is still working:

root@10-2-146-48:/var/log/upstart# curl http://10.2.146.250:8080/api/v1/namespaces/kube-system/endpoints/kube-scheduler
{
  "kind": "Endpoints",
  "apiVersion": "v1",
  "metadata": {
    "name": "kube-scheduler",
    "namespace": "kube-system",
    "selfLink": "/api/v1/namespaces/kube-system/endpoints/kube-scheduler",
    "uid": "ad1ea9d2-e23d-11e8-8f08-0050569ef4a3",
    "resourceVersion": "16986",
    "creationTimestamp": "2018-11-07T03:31:56Z",
    "annotations": {
      "control-plane.alpha.kubernetes.io/leader": "{\"holderIdentity\":\"10-2-146-46_e3be14f7-e256-11e8-ae98-0050569e4dac\",\"leaseDurationSeconds\":15,\"acquireTime\":\"2018-11-07T06:32:47Z\",\"renewTime\":\"2018-11-07T06:36:04Z\",\"leaderTransitions\":6}"
    }
  }
}

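Doing the lease arithmetic on the annotation above (times copied verbatim from the curl output; the failure time is taken from the log's `Date` header, 06:37:56 GMT, matching the 14:37:56 local log timestamps): `renewTime` plus the 15-second `leaseDurationSeconds` had passed well before the scheduler died, so the recorded lease looks long expired at that point even though the log claims it "has not yet expired".

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%dT%H:%M:%SZ"

# Values copied from the leader-election annotation above.
renew_time = datetime.strptime("2018-11-07T06:36:04Z", FMT)
lease = timedelta(seconds=15)

# When the "failed to renew lease" / "lost lease" lines were logged
# (14:37:56 local; the response's Date header shows 06:37:56 GMT).
failed_at = datetime.strptime("2018-11-07T06:37:56Z", FMT)

expiry = renew_time + lease
print(expiry.strftime(FMT))                  # 2018-11-07T06:36:19Z
print((failed_at - expiry).total_seconds())  # 97.0 seconds past expiry
```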
Environment:

  • Kubernetes version (use kubectl version): v1.10.8
  • Cloud provider or hardware configuration: ubuntu
  • OS (e.g. from /etc/os-release): ubuntu 16.04
  • Kernel (e.g. uname -a): Linux 10-2-146-48 4.4.0-116-generic #140-Ubuntu SMP Mon Feb 12 21:23:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools: binary run
  • Others:
    etcd cluster (5 nodes);
    kube-apiserver/controller/scheduler(3 nodes + vip + haproxy in master-backup mode serving kube-apiserver)

/kind bug


gbjuno commented Nov 7, 2018

/sig Scheduling


zjj2wry commented Nov 7, 2018

/cc
@gbjuno can you provide the kubectl version information? Thanks. Also, can this be reproduced in your env?


gbjuno commented Nov 8, 2018

root@10-2-146-46:~# kubectl version
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.8", GitCommit:"7eab6a49736cc7b01869a15f9f05dc5b49efb9fc", GitTreeState:"clean", BuildDate:"2018-09-25T05:25:55Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.8", GitCommit:"7eab6a49736cc7b01869a15f9f05dc5b49efb9fc", GitTreeState:"clean", BuildDate:"2018-09-25T05:23:43Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}