Kube-scheduler dies with "Schedulercache is corrupted" #46347

Closed
pizzarabe opened this issue May 24, 2017 · 25 comments

Labels
kind/bug Categorizes issue or PR as related to a bug. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling.

@pizzarabe
BUG REPORT

Kubernetes version (use kubectl version):

kubectl version
Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.1", GitCommit:"b0b7a323cc5a4a2019b2e9520c21c7830b7f708e", GitTreeState:"clean", BuildDate:"2017-04-03T20:44:38Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.3+coreos.1", GitCommit:"bc000d3336a0b11155ac222193e6f24b6dcb5cd1", GitTreeState:"clean", BuildDate:"2017-05-19T00:19:21Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Self-hosted:
  • OS
    Container Linux by CoreOS 1353.7.0 (Master + most nodes)
    and
    Container Linux by CoreOS 1298.5.0 (some nodes)
  • Kernel
    1353.7.0: (4.9.24-coreos #1 SMP Wed Apr 26 21:44:23 UTC 2017 x86_64 Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz GenuineIntel GNU/Linux)
    1298.5.0: (Linux alien8 4.9.9-coreos-r1 #1 SMP Tue Feb 28 00:06:10 UTC 2017 x86_64 Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz GenuineIntel GNU/Linux)
  • Install tools:
    Hyperkube (https://coreos.com/kubernetes/docs/latest/getting-started.html)
  • Others:
  • The underlying Hypervisor is ESXi 5.5
  • Kubelet is running in rkt; other Kubernetes components (incl. the scheduler) are running in Docker
  • The etcd cluster (3 nodes) is v3.0.10, running inside rkt containers (using the CoreOS etcd-member.service)

What happened:

After some testing with my private image registry (using the docs at https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/), I ran into some problems with the kube-scheduler.
Creating a new pod, e.g.

apiVersion: v1
kind: Pod
metadata:
  name: privateimage-test
spec:
  containers:
    - name: private-container
      image: server:4567/namespace/container:tag
  imagePullSecrets:
    - name: privateregkey

kills the kube-scheduler:

I0524 13:22:55.458837       1 event.go:217] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"privateimage-test", UID:"ff287e49-4083-11e7-8c95-005056b70482", APIVersion:"v1", ResourceVersion:"4460349", FieldPath:""}): type: 'Normal' reason: 'Scheduled' Successfully assigned privateimage-test to alien6
E0524 13:29:33.440433       1 cache.go:290] Pod default/privateimage-test removed from a different node than previously added to.
F0524 13:29:33.440454       1 cache.go:291] Schedulercache is corrupted and can badly affect scheduling decisions

The kube-scheduler is restarted by the kubelet, and the pod is started after that.

Some output from the kubelet:

May 24 15:29:34 alien1 kubelet-wrapper[1634]: E0524 13:29:34.085188    1634 event.go:259] Could not construct reference to: '&v1.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"kube-scheduler-alien1", UID:"cd159dc8e8cca207146c55f953a24533", APIVersion:"v1", ResourceVersion:"", FieldPath:"spec.containers{kube-scheduler}"}' due to: 'object does not implement the List interfaces'. Will not report event: 'Warning' 'Unhealthy' 'Liveness probe failed: Get http://127.0.0.1:10251/healthz: dial tcp 127.0.0.1:10251: getsockopt: connection refused'
May 24 15:29:34 alien1 kubelet-wrapper[1634]: I0524 13:29:34.407447    1634 kuberuntime_manager.go:458] Container {Name:kube-scheduler Image:quay.io/coreos/hyperkube:v1.6.3_coreos.1 Command:[/hyperkube scheduler --master=http://127.0.0.1:8080 --leader-elect=true] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[] Resources:{Limits:map[] Requests:map[]} VolumeMounts:[] LivenessProbe:&Probe{Handler:Handler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/healthz,Port:10251,Host:127.0.0.1,Scheme:HTTP,HTTPHeaders:[],},TCPSocket:nil,},InitialDelaySeconds:15,TimeoutSeconds:1,PeriodSeconds:10,SuccessThreshold:1,FailureThreshold:3,} ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:nil Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
May 24 15:29:34 alien1 kubelet-wrapper[1634]: I0524 13:29:34.407623    1634 kuberuntime_manager.go:742] checking backoff for container "kube-scheduler" in pod "kube-scheduler-alien1_kube-system(cd159dc8e8cca207146c55f953a24533)"
May 24 15:29:34 alien1 kubelet-wrapper[1634]: I0524 13:29:34.407815    1634 kuberuntime_manager.go:752] Back-off 1m20s restarting failed container=kube-scheduler pod=kube-scheduler-alien1_kube-system(cd159dc8e8cca207146c55f953a24533)
May 24 15:29:34 alien1 kubelet-wrapper[1634]: E0524 13:29:34.407895    1634 pod_workers.go:182] Error syncing pod cd159dc8e8cca207146c55f953a24533 ("kube-scheduler-alien1_kube-system(cd159dc8e8cca207146c55f953a24533)"), skipping: failed to "StartContainer" for "kube-scheduler" with CrashLoopBackOff: "Back-off 1m20s restarting failed container=kube-scheduler pod=kube-scheduler-alien1_kube-system(cd159dc8e8cca207146c55f953a24533)"
May 24 15:29:44 alien1 kubelet-wrapper[1634]: I0524 13:29:44.085354    1634 kuberuntime_manager.go:458] Container {Name:kube-scheduler Image:quay.io/coreos/hyperkube:v1.6.3_coreos.1 Command:[/hyperkube scheduler --master=http://127.0.0.1:8080 --leader-elect=true] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[] Resources:{Limits:map[] Requests:map[]} VolumeMounts:[] LivenessProbe:&Probe{Handler:Handler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/healthz,Port:10251,Host:127.0.0.1,Scheme:HTTP,HTTPHeaders:[],},TCPSocket:nil,},InitialDelaySeconds:15,TimeoutSeconds:1,PeriodSeconds:10,SuccessThreshold:1,FailureThreshold:3,} ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:nil Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
May 24 15:29:44 alien1 kubelet-wrapper[1634]: I0524 13:29:44.085464    1634 kuberuntime_manager.go:742] checking backoff for container "kube-scheduler" in pod "kube-scheduler-alien1_kube-system(cd159dc8e8cca207146c55f953a24533)"
May 24 15:29:44 alien1 kubelet-wrapper[1634]: I0524 13:29:44.085588    1634 kuberuntime_manager.go:752] Back-off 1m20s restarting failed container=kube-scheduler pod=kube-scheduler-alien1_kube-system(cd159dc8e8cca207146c55f953a24533)
May 24 15:29:44 alien1 kubelet-wrapper[1634]: E0524 13:29:44.085627    1634 pod_workers.go:182] Error syncing pod cd159dc8e8cca207146c55f953a24533 ("kube-scheduler-alien1_kube-system(cd159dc8e8cca207146c55f953a24533)"), skipping: failed to "StartContainer" for "kube-scheduler" with CrashLoopBackOff: "Back-off 1m20s restarting failed container=kube-scheduler pod=kube-scheduler-alien1_kube-system(cd159dc8e8cca207146c55f953a24533)"

The journal entry right before that shows:
May 24 15:29:33 alien1 dockerd[1757]: time="2017-05-24T15:29:33.460151779+02:00" level=error msg="Error closing logger: invalid argument"

Creating a new pod with something like kubectl run busybox --image busybox /bin/sh does not kill the scheduler!

What you expected to happen:
The Scheduler should not die :)

How to reproduce it:

Not sure; the cluster worked for ~30 days without problems. The problem started after working with my private image registry. Before that I updated k8s from 1.6.1 to 1.6.3.

Anything else we need to know:

I restarted the master after the problem first occurred (that had helped before with some other problems).

@k82cn
Member

k82cn commented May 24, 2017

@kubernetes/sig-scheduling-bugs

k8s-ci-robot added the sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. label May 24, 2017
@davidopp
Member

@wojtek-t @bsalamat

@timothysc
Member

There are some PRs around this that came in recently. /cc @smarterclayton

@wojtek-t
Member

@pizzarabe - can you please provide the whole scheduler log? This is definitely some kind of race, and without logs it will be hard to debug (I think we fixed some issue with it in 1.5.x, but it seems there is something more here...)

@smarterclayton
Contributor

We were mutating the shared cache from a threaded environment, so my changes should in theory fix that.
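For context, here is a minimal Go sketch of the general hazard being described: handing out shared pointers from a cache, so that a caller's mutation silently changes the cached object seen by everyone else, versus returning copies. This is only an illustration of the concept, not the code from the referenced PR; all names here are made up.

package main

import (
	"fmt"
	"sync"
)

// fakePod stands in for a cached API object; purely illustrative.
type fakePod struct {
	Name     string
	NodeName string
}

// store hands out pointers to its cached objects.
type store struct {
	mu   sync.Mutex
	pods map[string]*fakePod
}

// GetUnsafe returns the shared pointer (the hazardous pattern): any caller
// that mutates the result changes the shared cache for all other readers.
func (s *store) GetUnsafe(name string) *fakePod {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.pods[name]
}

// GetCopy returns a copy, so callers can mutate freely without touching the cache.
func (s *store) GetCopy(name string) *fakePod {
	s.mu.Lock()
	defer s.mu.Unlock()
	if p, ok := s.pods[name]; ok {
		cp := *p
		return &cp
	}
	return nil
}

func main() {
	s := &store{pods: map[string]*fakePod{"privateimage-test": {Name: "privateimage-test"}}}

	// Mutating the shared pointer corrupts the cache for every other reader.
	s.GetUnsafe("privateimage-test").NodeName = "alien6"
	fmt.Println("cached NodeName after unsafe mutation:", s.GetUnsafe("privateimage-test").NodeName)

	// Mutating a copy leaves the cache untouched.
	s.GetCopy("privateimage-test").NodeName = "alien8"
	fmt.Println("cached NodeName after copy mutation:  ", s.GetUnsafe("privateimage-test").NodeName)
}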

@timothysc
Member

xref: #46223

timothysc added the kind/bug Categorizes issue or PR as related to a bug. label May 24, 2017
@bsalamat
Member

Given this issue, it is important to backport Clayton's fix to 1.6 and 1.5.

@smarterclayton
Contributor

smarterclayton commented May 24, 2017 via email

@wojtek-t
Member

@smarterclayton @timothysc - can you clarify why you think this PR can help with this problem? I looked into it again, and I don't see any place where we weren't making a copy before and are doing so now. I agree it fixes watch semantics, but that is unrelated to this issue.

@smarterclayton
Contributor

smarterclayton commented May 25, 2017 via email

@wojtek-t
Member

@wojtek-t
Member

The problem I see is that if it's just about status (and we are not mutating the spec), we are not using Status in the cache...

@k82cn
Member

k82cn commented May 26, 2017

According to the log E0524 13:29:33.440433 1 cache.go:290] Pod default/privateimage-test removed from a different node than previously added to., the issue is triggered by the "assigned non-terminated" Pods, which are read-only in the scheduler.

And it was said:

Not sure; the cluster worked for ~30 days without problems. The problem started after working with my private image registry. Before that I updated k8s from 1.6.1 to 1.6.3.

So it may be related to #45453, but there is no direct evidence :(.

@wojtek-t
Member

@k82cn in the past we used to have a problem (that would cause exactly this behavior) where:

  • the pod failed scheduling the first time
  • when we tried to schedule it again, we managed to find a node
  • we assumed it on that node and sent the binding to the apiserver
  • in the meantime, because of some race, we tried to schedule it one more time
  • we found a different node for it and sent another binding
  • the second binding actually succeeded
  • and the pod ended up being assumed on a different node than the one it was finally bound to

I thought this was already fixed, but maybe we should revisit whether it really is...
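To make the failure mode concrete, here is a minimal Go sketch of the kind of bookkeeping that produces the "removed from a different node than previously added to" error seen in the logs: the cache remembers which node a pod was added to, and a removal that names a different node is treated as corruption. This is a simplified illustration, not the actual scheduler cache code; all names are illustrative.

package main

import (
	"fmt"
	"sync"
)

// nodeCache is an illustrative stand-in for the scheduler cache's pod bookkeeping:
// it remembers which node each pod was added (or assumed) to.
type nodeCache struct {
	mu        sync.Mutex
	podToNode map[string]string // pod key -> node the pod was added to
}

func newNodeCache() *nodeCache {
	return &nodeCache{podToNode: map[string]string{}}
}

// AddPod records the node a pod is assumed/bound to.
func (c *nodeCache) AddPod(key, node string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.podToNode[key] = node
}

// RemovePod removes a pod. If the caller's view of the pod's node disagrees
// with what was recorded at add time, the cache is treated as corrupted; this
// mirrors the check behind the error reported from cache.go in the logs above.
func (c *nodeCache) RemovePod(key, node string) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	recorded, ok := c.podToNode[key]
	if !ok {
		return fmt.Errorf("pod %s is not in the cache", key)
	}
	if recorded != node {
		return fmt.Errorf("pod %s removed from node %s, but was previously added to node %s",
			key, node, recorded)
	}
	delete(c.podToNode, key)
	return nil
}

func main() {
	c := newNodeCache()

	// The race described above: the pod is assumed on one node...
	c.AddPod("default/privateimage-test", "alien6")

	// ...but a second, racing scheduling attempt binds it elsewhere, so the
	// eventual removal arrives with a different node name.
	if err := c.RemovePod("default/privateimage-test", "alien8"); err != nil {
		fmt.Println("cache corrupted:", err) // the real scheduler exits fatally here
	}
}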

@k82cn
Member

k82cn commented May 26, 2017

@wojtek-t, got the case :) Let me revisit it to see whether there's any clue here.

@pizzarabe
Author

@wojtek-t
sure, here is the output from kube-scheduler --v=4 (but I am not sure if this is more useful than the previous log)

$ docker ps -a
3d358c583a73        quay.io/coreos/hyperkube@sha256:30afab2ccb3d48e77d253249fd085023e68633731a2f0b62f0b89dcd227836bd        "/hyperkube scheduler"   44 seconds ago       Up 44 seconds                                     k8s_kube-scheduler_kube-scheduler-alien1_kube-system_75cd12d030f068637ff8094530e04049_1
02518c502df1        quay.io/coreos/hyperkube@sha256:30afab2ccb3d48e77d253249fd085023e68633731a2f0b62f0b89dcd227836bd        "/hyperkube scheduler"   About a minute ago   Exited (255) 44 seconds ago                   k8s_kube-scheduler_kube-scheduler-alien1_kube-system_75cd12d030f068637ff8094530e04049_0

[The dead scheduler]:

$ docker logs 02518c502df1
I0526 07:42:16.897427       1 factory.go:300] Creating scheduler from algorithm provider 'DefaultProvider'
I0526 07:42:16.897522       1 factory.go:346] Creating scheduler with fit predicates 'map[MaxAzureDiskVolumeCount:{} NoDiskConflict:{} GeneralPredicates:{} CheckNodeMemoryPressure:{} CheckNodeDiskPressure:{} NoVolumeZoneConflict:{} MaxEBSVolumeCount:{} MaxGCEPDVolumeCount:{} MatchInterPodAffinity:{} PodToleratesNodeTaints:{}]' and priority functions 'map[BalancedResourceAllocation:{} NodePreferAvoidPodsPriority:{} NodeAffinityPriority:{} TaintTolerationPriority:{} SelectorSpreadPriority:{} InterPodAffinityPriority:{} LeastRequestedPriority:{}]
I0526 07:42:16.897705       1 reflector.go:198] Starting reflector *v1.Pod (0s) from k8s.io/kubernetes/plugin/pkg/scheduler/factory/factory.go:451
I0526 07:42:16.897817       1 reflector.go:236] Listing and watching *v1.Pod from k8s.io/kubernetes/plugin/pkg/scheduler/factory/factory.go:451
I0526 07:42:16.897920       1 leaderelection.go:179] attempting to acquire leader lease...
I0526 07:42:16.898283       1 reflector.go:198] Starting reflector *v1.Pod (0s) from k8s.io/kubernetes/plugin/pkg/scheduler/factory/factory.go:454
I0526 07:42:16.898327       1 reflector.go:236] Listing and watching *v1.Pod from k8s.io/kubernetes/plugin/pkg/scheduler/factory/factory.go:454
I0526 07:42:16.898440       1 reflector.go:198] Starting reflector *v1.Service (0s) from k8s.io/kubernetes/pkg/client/informers/informers_generated/externalversions/factory.go:70
I0526 07:42:16.898958       1 reflector.go:198] Starting reflector *v1.Node (0s) from k8s.io/kubernetes/pkg/client/informers/informers_generated/externalversions/factory.go:70
I0526 07:42:16.899033       1 reflector.go:236] Listing and watching *v1.Service from k8s.io/kubernetes/pkg/client/informers/informers_generated/externalversions/factory.go:70
I0526 07:42:16.899161       1 reflector.go:236] Listing and watching *v1.Node from k8s.io/kubernetes/pkg/client/informers/informers_generated/externalversions/factory.go:70
I0526 07:42:16.900623       1 reflector.go:198] Starting reflector *v1.PersistentVolumeClaim (0s) from k8s.io/kubernetes/pkg/client/informers/informers_generated/externalversions/factory.go:70
I0526 07:42:16.900661       1 reflector.go:236] Listing and watching *v1.PersistentVolumeClaim from k8s.io/kubernetes/pkg/client/informers/informers_generated/externalversions/factory.go:70
I0526 07:42:16.900838       1 reflector.go:198] Starting reflector *v1beta1.StatefulSet (0s) from k8s.io/kubernetes/pkg/client/informers/informers_generated/externalversions/factory.go:70
I0526 07:42:16.900925       1 reflector.go:236] Listing and watching *v1beta1.StatefulSet from k8s.io/kubernetes/pkg/client/informers/informers_generated/externalversions/factory.go:70
I0526 07:42:16.901626       1 reflector.go:198] Starting reflector *v1beta1.ReplicaSet (0s) from k8s.io/kubernetes/pkg/client/informers/informers_generated/externalversions/factory.go:70
I0526 07:42:16.901663       1 reflector.go:236] Listing and watching *v1beta1.ReplicaSet from k8s.io/kubernetes/pkg/client/informers/informers_generated/externalversions/factory.go:70
I0526 07:42:16.901764       1 reflector.go:198] Starting reflector *v1.PersistentVolume (0s) from k8s.io/kubernetes/pkg/client/informers/informers_generated/externalversions/factory.go:70
I0526 07:42:16.901802       1 reflector.go:236] Listing and watching *v1.PersistentVolume from k8s.io/kubernetes/pkg/client/informers/informers_generated/externalversions/factory.go:70
I0526 07:42:16.903348       1 reflector.go:198] Starting reflector *v1.ReplicationController (0s) from k8s.io/kubernetes/pkg/client/informers/informers_generated/externalversions/factory.go:70
I0526 07:42:16.903384       1 reflector.go:236] Listing and watching *v1.ReplicationController from k8s.io/kubernetes/pkg/client/informers/informers_generated/externalversions/factory.go:70
I0526 07:42:16.908259       1 leaderelection.go:189] successfully acquired lease kube-system/kube-scheduler
I0526 07:42:16.908463       1 event.go:217] Event(v1.ObjectReference{Kind:"Endpoints", Namespace:"kube-system", Name:"kube-scheduler", UID:"404f58d4-40d6-11e7-8c95-005056b70482", APIVersion:"v1", ResourceVersion:"4732494", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' alien1 became leader
I0526 07:42:18.914381       1 leaderelection.go:204] succesfully renewed lease kube-system/kube-scheduler
I0526 07:42:20.921037       1 leaderelection.go:204] succesfully renewed lease kube-system/kube-scheduler
I0526 07:42:22.927073       1 leaderelection.go:204] succesfully renewed lease kube-system/kube-scheduler
I0526 07:42:24.933276       1 leaderelection.go:204] succesfully renewed lease kube-system/kube-scheduler
I0526 07:42:26.939491       1 leaderelection.go:204] succesfully renewed lease kube-system/kube-scheduler
I0526 07:42:28.945485       1 leaderelection.go:204] succesfully renewed lease kube-system/kube-scheduler
I0526 07:42:30.951875       1 leaderelection.go:204] succesfully renewed lease kube-system/kube-scheduler
I0526 07:42:32.958256       1 leaderelection.go:204] succesfully renewed lease kube-system/kube-scheduler
I0526 07:42:34.964190       1 leaderelection.go:204] succesfully renewed lease kube-system/kube-scheduler
I0526 07:42:36.970310       1 leaderelection.go:204] succesfully renewed lease kube-system/kube-scheduler
I0526 07:42:38.976780       1 leaderelection.go:204] succesfully renewed lease kube-system/kube-scheduler
I0526 07:42:40.983164       1 leaderelection.go:204] succesfully renewed lease kube-system/kube-scheduler
I0526 07:42:42.989471       1 leaderelection.go:204] succesfully renewed lease kube-system/kube-scheduler
I0526 07:42:44.995575       1 leaderelection.go:204] succesfully renewed lease kube-system/kube-scheduler
I0526 07:42:47.001928       1 leaderelection.go:204] succesfully renewed lease kube-system/kube-scheduler
I0526 07:42:49.008236       1 leaderelection.go:204] succesfully renewed lease kube-system/kube-scheduler
I0526 07:42:51.014911       1 leaderelection.go:204] succesfully renewed lease kube-system/kube-scheduler
I0526 07:42:53.021414       1 leaderelection.go:204] succesfully renewed lease kube-system/kube-scheduler
I0526 07:42:55.027536       1 leaderelection.go:204] succesfully renewed lease kube-system/kube-scheduler
I0526 07:42:56.335172       1 factory.go:461] About to try and schedule pod privateimage-test
E0526 07:42:56.335185       1 cache.go:290] Pod default/privateimage-test removed from a different node than previously added to.
I0526 07:42:56.335194       1 scheduler.go:156] Attempting to schedule pod: default/privateimage-test
F0526 07:42:56.335199       1 cache.go:291] Schedulercache is corrupted and can badly affect scheduling decisions

[The restarted scheduler]:

$ docker logs 3d358c583a73
I0526 07:42:57.111661       1 factory.go:300] Creating scheduler from algorithm provider 'DefaultProvider'
I0526 07:42:57.111769       1 factory.go:346] Creating scheduler with fit predicates 'map[NoVolumeZoneConflict:{} MaxEBSVolumeCount:{} GeneralPredicates:{} PodToleratesNodeTaints:{} CheckNodeMemoryPressure:{} CheckNodeDiskPressure:{} MaxGCEPDVolumeCount:{} MaxAzureDiskVolumeCount:{} MatchInterPodAffinity:{} NoDiskConflict:{}]' and priority functions 'map[SelectorSpreadPriority:{} InterPodAffinityPriority:{} LeastRequestedPriority:{} BalancedResourceAllocation:{} NodePreferAvoidPodsPriority:{} NodeAffinityPriority:{} TaintTolerationPriority:{}]
I0526 07:42:57.111919       1 reflector.go:198] Starting reflector *v1.Pod (0s) from k8s.io/kubernetes/plugin/pkg/scheduler/factory/factory.go:451
I0526 07:42:57.112005       1 reflector.go:236] Listing and watching *v1.Pod from k8s.io/kubernetes/plugin/pkg/scheduler/factory/factory.go:451
I0526 07:42:57.112123       1 leaderelection.go:179] attempting to acquire leader lease...
I0526 07:42:57.112603       1 reflector.go:198] Starting reflector *v1.Service (0s) from k8s.io/kubernetes/pkg/client/informers/informers_generated/externalversions/factory.go:70
I0526 07:42:57.112671       1 reflector.go:198] Starting reflector *v1beta1.ReplicaSet (0s) from k8s.io/kubernetes/pkg/client/informers/informers_generated/externalversions/factory.go:70
I0526 07:42:57.112671       1 reflector.go:236] Listing and watching *v1.Service from k8s.io/kubernetes/pkg/client/informers/informers_generated/externalversions/factory.go:70
I0526 07:42:57.112748       1 reflector.go:236] Listing and watching *v1beta1.ReplicaSet from k8s.io/kubernetes/pkg/client/informers/informers_generated/externalversions/factory.go:70
I0526 07:42:57.112787       1 reflector.go:198] Starting reflector *v1beta1.StatefulSet (0s) from k8s.io/kubernetes/pkg/client/informers/informers_generated/externalversions/factory.go:70
I0526 07:42:57.112832       1 reflector.go:236] Listing and watching *v1beta1.StatefulSet from k8s.io/kubernetes/pkg/client/informers/informers_generated/externalversions/factory.go:70
I0526 07:42:57.112757       1 reflector.go:198] Starting reflector *v1.Pod (0s) from k8s.io/kubernetes/plugin/pkg/scheduler/factory/factory.go:454
I0526 07:42:57.113230       1 reflector.go:236] Listing and watching *v1.Pod from k8s.io/kubernetes/plugin/pkg/scheduler/factory/factory.go:454
I0526 07:42:57.113581       1 reflector.go:198] Starting reflector *v1.PersistentVolume (0s) from k8s.io/kubernetes/pkg/client/informers/informers_generated/externalversions/factory.go:70
I0526 07:42:57.113638       1 reflector.go:236] Listing and watching *v1.PersistentVolume from k8s.io/kubernetes/pkg/client/informers/informers_generated/externalversions/factory.go:70
I0526 07:42:57.114138       1 reflector.go:198] Starting reflector *v1.Node (0s) from k8s.io/kubernetes/pkg/client/informers/informers_generated/externalversions/factory.go:70
I0526 07:42:57.114311       1 reflector.go:236] Listing and watching *v1.Node from k8s.io/kubernetes/pkg/client/informers/informers_generated/externalversions/factory.go:70
I0526 07:42:57.115015       1 reflector.go:198] Starting reflector *v1.PersistentVolumeClaim (0s) from k8s.io/kubernetes/pkg/client/informers/informers_generated/externalversions/factory.go:70
I0526 07:42:57.115343       1 reflector.go:236] Listing and watching *v1.PersistentVolumeClaim from k8s.io/kubernetes/pkg/client/informers/informers_generated/externalversions/factory.go:70
I0526 07:42:57.116031       1 reflector.go:198] Starting reflector *v1.ReplicationController (0s) from k8s.io/kubernetes/pkg/client/informers/informers_generated/externalversions/factory.go:70
I0526 07:42:57.116071       1 reflector.go:236] Listing and watching *v1.ReplicationController from k8s.io/kubernetes/pkg/client/informers/informers_generated/externalversions/factory.go:70
I0526 07:42:57.122205       1 leaderelection.go:189] successfully acquired lease kube-system/kube-scheduler
I0526 07:42:57.122482       1 factory.go:461] About to try and schedule pod privateimage-test
I0526 07:42:57.122530       1 scheduler.go:156] Attempting to schedule pod: default/privateimage-test
I0526 07:42:57.122569       1 factory.go:492] Ignoring node alien1 since it is unschedulable
I0526 07:42:57.122527       1 event.go:217] Event(v1.ObjectReference{Kind:"Endpoints", Namespace:"kube-system", Name:"kube-scheduler", UID:"404f58d4-40d6-11e7-8c95-005056b70482", APIVersion:"v1", ResourceVersion:"4732579", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' alien1 became leader
I0526 07:42:57.123942       1 factory.go:584] Attempting to bind privateimage-test to alien8
I0526 07:42:57.128634       1 event.go:217] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"privateimage-test", UID:"eed52c47-41e6-11e7-8c95-005056b70482", APIVersion:"v1", ResourceVersion:"4732574", FieldPath:""}): type: 'Normal' reason: 'Scheduled' Successfully assigned privateimage-test to alien8
I0526 07:42:59.128393       1 leaderelection.go:204] succesfully renewed lease kube-system/kube-scheduler
I0526 07:43:01.137046       1 leaderelection.go:204] succesfully renewed lease kube-system/kube-scheduler

Not sure if this helps anybody, but here is the kube-scheduler manifest:

apiVersion: v1
kind: Pod
metadata:
  name: kube-scheduler
  namespace: kube-system
spec:
  hostNetwork: true
  containers:
  - name: kube-scheduler
    image: quay.io/coreos/hyperkube:v1.6.3_coreos.1
    command:
    - /hyperkube
    - scheduler
    - --master=http://127.0.0.1:8080
    - --leader-elect=true
    - --v=4
    livenessProbe:
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10251
      initialDelaySeconds: 15
      timeoutSeconds: 1

@wojtek-t
Member

@pizzarabe - thanks a lot for those logs.

Looking into the ones from the dead scheduler, they look extremely strange to me.
It seems that the pod that caused problems was the first pod we even tried to schedule (since the scheduler started). Also, the fact that we are removing a pod from the cache before it was even scheduled seems extremely strange:

I0526 07:42:56.335172       1 factory.go:461] About to try and schedule pod privateimage-test
E0526 07:42:56.335185       1 cache.go:290] Pod default/privateimage-test removed from a different node than previously added to.
I0526 07:42:56.335194       1 scheduler.go:156] Attempting to schedule pod: default/privateimage-test
F0526 07:42:56.335199       1 cache.go:291] Schedulercache is corrupted and can badly affect scheduling decisions

@wojtek-t
Member

Or maybe it's because of flushing? The logs from cache.go are Error/Fatal, so they are flushed immediately; the others are Info, so they are not. So maybe the order is a red herring?

@pizzarabe
Author

pizzarabe commented May 26, 2017

@wojtek-t

It seems that the pod that caused problems was the first pod we even tried to schedule (since the scheduler started).

Yeah, this is my test cluster; I had reproduced the error before that. You can see the scheduler was started ~1 min before I killed it again with kubectl create -f my.yaml.

Also, the fact that we are removing a pod from the cache before it was even scheduled seems extremely strange

I removed the pod with kubectl delete pod default/privateimage-test before creating it again with kubectl create -f.
Edit:
E0526 07:42:56.335185 1 cache.go:290] Pod default/privateimage-test removed from a different node than previously added to.

This was not caused by kubectl delete pod (the timestamp does not match the delete command).

@k82cn
Member

k82cn commented May 26, 2017

It seems both the "assigned non-terminated" ListerWatcher and the "un-assigned non-terminated" ListerWatcher got Pod default/privateimage-test. @wojtek-t, do you think the apiserver's log will help to see what happened to this Pod?

@tinselspoon

I noticed you said your etcd version was v3.0.10 - could you be running into #45506, where the etcd version is too old to work correctly with Kubernetes? I had that issue occur on my cluster, and it caused all sorts of strange behaviours, as pod deletions were not correctly observed by the Kubernetes master components since they weren't getting the DELETED events.

If it is this issue, you should see lines in your apiserver logs like expected type *api.Pod, but watch event object had type <nil>. The solution was to update the version of etcd3.
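If it helps, a quick way to confirm which etcd server version the cluster is actually running is etcd's plain HTTP GET /version endpoint on the client port. A minimal sketch follows; the endpoint address is an assumption (localhost on the default client port 2379) and should be adjusted to the cluster's actual etcd client URL.

package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Query the etcd /version endpoint; adjust the address for your cluster.
	resp, err := http.Get("http://127.0.0.1:2379/version")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	// Expected output is JSON like {"etcdserver":"3.0.10","etcdcluster":"3.0.0"};
	// per the discussion above, anything older than 3.0.12 is suspect.
	fmt.Println(string(body))
}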

@bsalamat
Member

@pizzarabe Are you saying that you know the reproduction steps? If so, it would help a lot if you could give us the steps so that we can investigate more and see if the issue still exists in the presence of recent fixes.

@pizzarabe
Author

@tinselspoon Yes, my kube-apiserver log says things like reflector.go:363] k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/cacher.go:215: expected type *api.Pod, but watch event object had type <nil>
I run etcd 3.0.10 because it is the current version in CoreOS stable :/

@bsalamat I can reproduce the scheduler dying (by creating a pod with the manifest I mentioned before).
I don't know what I did to cause the bug in the first place. I could reinstall the whole cluster (incl. etcd), but I am not sure I could get the server into the same state :(

@wojtek-t
Member

@pizzarabe - if you are using etcd version 3.0.10, this is definitely possible. You should upgrade to at least 3.0.12 (preferably to 3.0.17).
With 3.0.10, a bunch of different things may not work in your cluster (not only the scheduler).

We should wait to see if this is also reproducible with etcd 3.0.17. I suspect it's not, and that this is an etcd issue we are aware of (one that was fixed in later versions).

@pizzarabe
Author

@wojtek-t upgrading to 3.0.17 or 3.0.12 seems to fix the problem; after downgrading to 3.0.10 it occurs again.
So I guess this is a duplicate of #45506. I will close this issue, feel free to re-open it :)
