BUG REPORT
I'm running Elasticsearch in a StatefulSet, and within roughly 36 hours all 3 pods except the first one (-0) disappeared. That first pod is in a CrashLoopBackOff state.
I would expect that once a StatefulSet has started correctly, the individual pods are no longer dependent on the first one running correctly.
Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.2", GitCommit:"08e099554f3c31f6e6f07b448ab3ed78d0520507", GitTreeState:"clean", BuildDate:"2017-01-12T04:57:25Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"windows/amd64"}
Server Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.1", GitCommit:"82450d03cb057bab0950214ef122b67c83fb11df", GitTreeState:"clean", BuildDate:"2016-12-14T00:52:01Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}
Environment:
- GKE, GCI node image, version 1.5.2
What happened:
All StatefulSet replicas disappeared, except for the first one (-0), which was in a CrashLoopBackOff state.
What you expected to happen:
The second and third pods should stay operational even if the first one fails.
How to reproduce it (as minimally and precisely as possible):
I haven't been able to reproduce it, but it has happened a couple of times within a week.
Anything else we need to know:
The StatefulSet runs in a GKE cluster on top of preemptible nodes. To keep the preemptible nodes from all expiring at the same time, I stop and delete them at random moments before their 24 hours are up. This should spread out the deletions and make it less likely that more than one host is deleted at a time.
There's also:
- a PodDisruptionBudget for the StatefulSet with minAvailable set to n-1 (sketched below)
- a headless service (also sketched below)
- a regular ClusterIP service for access to the nodes
- a service across multiple StatefulSets to keep the Elasticsearch master, data and client nodes from living on the same hosts
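For reference, a minimal sketch of the PodDisruptionBudget (policy/v1beta1 on 1.5); the name and labels here are illustrative placeholders, not my exact manifest:

```yaml
# Sketch of the PDB guarding the Elasticsearch StatefulSet.
# Name and labels are placeholders, not the real manifest.
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: elasticsearch-data-pdb
spec:
  minAvailable: 2              # n-1 for a 3-replica StatefulSet
  selector:
    matchLabels:
      app: elasticsearch
      role: data
```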
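And a sketch of the headless service, which gives each pod its stable DNS record (again with placeholder name, labels and port):

```yaml
# Sketch of the headless service backing the StatefulSet.
# clusterIP: None is what makes it headless.
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch-discovery
spec:
  clusterIP: None
  selector:
    app: elasticsearch
  ports:
  - name: transport
    port: 9300                 # Elasticsearch transport port
```

The StatefulSet's serviceName field points at this service.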