k delete sts simple-cluster-rep0 : the sts will not be re-created #177

Open
ymettier opened this issue Oct 30, 2019 · 0 comments

Hello,

I run a simple cluster. When I delete one of its StatefulSets (statefulset.apps), the operator does not always re-create it.

Config

Relevant parameters in my config:

  replicationFactor: 3
  isolationGroups:
  - name: group1
    numInstances: 2
  - name: group2
    numInstances: 2
  - name: group3
    numInstances: 2

M3DB operator is v0.2.0

image: quay.io/m3db/m3db-operator:v0.2.0
imageID: docker-pullable://quay.io/m3db/m3db-operator@sha256:ec8069d4f2ef34c3710470cd71e776e5968c488dcdda78302694f74acea350f4

K8S is 1.15.3

$ k version
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2", GitTreeState:"clean", BuildDate:"2019-08-19T11:05:50Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2", GitTreeState:"clean", BuildDate:"2019-08-19T11:05:50Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}

The problem

At first, everything works:

$ k get pods|grep simple-cluster
simple-cluster-rep0-0                                  1/1     Running   0          11m
simple-cluster-rep0-1                                  1/1     Running   0          10m
simple-cluster-rep1-0                                  1/1     Running   0          10m
simple-cluster-rep1-1                                  1/1     Running   0          9m27s
simple-cluster-rep2-0                                  1/1     Running   0          8m51s
simple-cluster-rep2-1                                  1/1     Running   0          8m25s
$ k get sts|grep simple-cluster
simple-cluster-rep0                                  2/2     11m
simple-cluster-rep1                                  2/2     10m
simple-cluster-rep2                                  2/2     9m30s

Then I delete sts simple-cluster-rep2, and it works.

k delete sts simple-cluster-rep2

Here, "it works" means:

  • the pods rep2-0 and rep2-1 terminate and disappear
  • the sts rep2 disappears
  • the sts rep2 re-appears
  • the pods rep2-0, then rep2-1, are created
  • everything works as before.
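
The check described above can be scripted. This is a minimal sketch, assuming the StatefulSets are named `<cluster>-rep<N>` as in the listings in this report. `missing_sts` is a helper name I made up for illustration. It reads names in the format printed by `kubectl get sts -o name` and prints every expected StatefulSet that is absent.

```shell
#!/bin/sh
# Print the expected StatefulSets that are absent from the names read on stdin.
# Hypothetical helper; assumes the naming scheme <cluster>-rep<N> seen above.
missing_sts() {
    cluster="$1"   # cluster name prefix, e.g. simple-cluster
    count="$2"     # number of isolation groups (3 in my config)
    # `kubectl get sts -o name` prints "statefulset.apps/<name>"; strip the prefix.
    actual="$(sed 's|.*/||')"
    i=0
    while [ "$i" -lt "$count" ]; do
        name="${cluster}-rep${i}"
        # -F: fixed string, -x: whole line, -q: quiet
        if ! printf '%s\n' "$actual" | grep -qxF "$name"; then
            echo "$name"
        fi
        i=$((i + 1))
    done
}
```

Usage: `kubectl get sts -o name | missing_sts simple-cluster 3`. Empty output means all three StatefulSets exist.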

Then I delete sts simple-cluster-rep0. This time it does not do what I expect.

k delete sts simple-cluster-rep0

What happened:

  • the pods rep0-0 and rep0-1 terminate and disappear (as expected)
  • the sts rep0 disappears (as expected)
  • then: nothing, the sts rep0 is never re-created (not as expected)

The operator logs show it repeatedly trying to create rep2, which already exists:

k logs m3db-operator-0
[...]
{"level":"error","ts":1572448020.1069672,"msg":"statefulsets.apps \"simple-cluster-rep2\" already exists","controller":"m3db-cluster-controller"}
E1030 15:07:00.106997       1 controller.go:289] error syncing cluster 'monitoring/simple-cluster': statefulsets.apps "simple-cluster-rep2" already exists
{"level":"error","ts":1572448068.117082,"msg":"statefulsets.apps \"simple-cluster-rep2\" already exists","controller":"m3db-cluster-controller"}
E1030 15:07:48.117121       1 controller.go:289] error syncing cluster 'monitoring/simple-cluster': statefulsets.apps "simple-cluster-rep2" already exists
{"level":"error","ts":1572448128.126928,"msg":"statefulsets.apps \"simple-cluster-rep2\" already exists","controller":"m3db-cluster-controller"}
E1030 15:08:48.126978       1 controller.go:289] error syncing cluster 'monitoring/simple-cluster': statefulsets.apps "simple-cluster-rep2" already exists
{"level":"error","ts":1572448188.1496735,"msg":"statefulsets.apps \"simple-cluster-rep2\" already exists","controller":"m3db-cluster-controller"}
E1030 15:09:48.149715       1 controller.go:289] error syncing cluster 'monitoring/simple-cluster': statefulsets.apps "simple-cluster-rep2" already exists

Playing a little more...

While sts rep1 and rep2 are up and running, I delete rep1:

k delete sts simple-cluster-rep1

rep1 recovers as expected, but rep0 is still missing:

  • rep1 (sts and pods) disappears, then reappears
  • rep0 never reappears

k logs m3db-operator-0
[...]
{"level":"info","ts":1572448244.4354398,"msg":"processing statefulset","controller":"m3db-cluster-controller","name":"simple-cluster-rep1"}
{"level":"info","ts":1572448244.5221298,"msg":"created statefulset","controller":"m3db-cluster-controller","name":"simple-cluster-1"}
{"level":"info","ts":1572448244.5221558,"msg":"successfully synced item","controller":"m3db-cluster-controller","key":"monitoring/simple-cluster"}
{"level":"info","ts":1572448244.5223668,"msg":"processing statefulset","controller":"m3db-cluster-controller","name":"simple-cluster-rep1"}
{"level":"info","ts":1572448244.5326107,"msg":"waiting for statefulset to be ready","controller":"m3db-cluster-controller","name":"simple-cluster-rep1","ready":0}
{"level":"info","ts":1572448244.5326319,"msg":"successfully synced item","controller":"m3db-cluster-controller","key":"monitoring/simple-cluster"}
{"level":"info","ts":1572448248.116284,"msg":"waiting for statefulset to be ready","controller":"m3db-cluster-controller","name":"simple-cluster-rep1","ready":0}
{"level":"info","ts":1572448248.1165593,"msg":"successfully synced item","controller":"m3db-cluster-controller","key":"monitoring/simple-cluster"}
{"level":"info","ts":1572448254.924703,"msg":"processing statefulset","controller":"m3db-cluster-controller","name":"simple-cluster-rep1"}
{"level":"info","ts":1572448254.9366693,"msg":"updated pod ID","controller":"m3db-cluster-controller","pod":"simple-cluster-rep1-0","id":{"name":"simple-cluster-rep1-0"}}
{"level":"info","ts":1572448254.9396503,"msg":"Event(v1.ObjectReference{Kind:\"Pod\", Namespace:\"monitoring\", Name:\"simple-cluster-rep1-0\", UID:\"9eb4947f-5a17-459d-9759-705182c9922f\", APIVersion:\"v1\", ResourceVersion:\"15290\", FieldPath:\"\"}): type: 'Normal' reason: 'SuccessfulSync' updated pod simple-cluster-rep1-0 with ID annotation","controller":"m3db-cluster-controller"}
{"level":"info","ts":1572448254.9433367,"msg":"waiting for statefulset to be ready","controller":"m3db-cluster-controller","name":"simple-cluster-rep1","ready":0}
{"level":"info","ts":1572448254.943711,"msg":"successfully synced item","controller":"m3db-cluster-controller","key":"monitoring/simple-cluster"}
{"level":"info","ts":1572448286.937064,"msg":"processing statefulset","controller":"m3db-cluster-controller","name":"simple-cluster-rep1"}
{"level":"info","ts":1572448286.9619033,"msg":"updated pod ID","controller":"m3db-cluster-controller","pod":"simple-cluster-rep1-1","id":{"name":"simple-cluster-rep1-1"}}
{"level":"info","ts":1572448286.9650936,"msg":"Event(v1.ObjectReference{Kind:\"Pod\", Namespace:\"monitoring\", Name:\"simple-cluster-rep1-1\", UID:\"86fdeb26-a6b3-458b-9af1-76efa32a492a\", APIVersion:\"v1\", ResourceVersion:\"15403\", FieldPath:\"\"}): type: 'Normal' reason: 'SuccessfulSync' updated pod simple-cluster-rep1-1 with ID annotation","controller":"m3db-cluster-controller"}
{"level":"info","ts":1572448286.9712403,"msg":"waiting for statefulset to be ready","controller":"m3db-cluster-controller","name":"simple-cluster-rep1","ready":1}
{"level":"info","ts":1572448286.9793472,"msg":"successfully synced item","controller":"m3db-cluster-controller","key":"monitoring/simple-cluster"}
{"level":"info","ts":1572448308.1086965,"msg":"waiting for statefulset to be ready","controller":"m3db-cluster-controller","name":"simple-cluster-rep1","ready":1}
{"level":"info","ts":1572448308.1087277,"msg":"successfully synced item","controller":"m3db-cluster-controller","key":"monitoring/simple-cluster"}
{"level":"info","ts":1572448323.8477747,"msg":"processing statefulset","controller":"m3db-cluster-controller","name":"simple-cluster-rep1"}

Because deleting rep1 did not fix it (and because I have no sensitive data in my cluster yet), I delete both rep1 and rep2. At that point, rep0 is still missing.

k delete sts simple-cluster-rep1
k delete sts simple-cluster-rep2

Guess what? The operator re-created everything. Now it's working again. But I have lost my data!

Edit: my data was still on disk, and when M3 re-spawned it found the data and was able to re-use it. No data loss, only loss of service while the cluster was re-spawning. Thanks M3!
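
A possible explanation, which I have not checked against the operator source: the operator may derive the name of the next StatefulSet to create from the number of StatefulSets that currently exist. The sketch below is purely hypothetical (both function names are mine), but it reproduces the behaviour in this report. With rep1 and rep2 present, a count-based choice yields rep2, which matches the "already exists" errors. With only rep2 present it yields rep1, and with none present it rebuilds rep0, rep1 and rep2 in order. The second function shows an alternative that looks for the first unused index.

```shell
#!/bin/sh
# Hypothetical reconstruction, NOT taken from the m3db-operator source.
# Names the next StatefulSet after the number of sets that already exist.
next_by_count() {
    cluster="$1"; shift          # remaining args: existing StatefulSet names
    echo "${cluster}-rep$#"
}

# Alternative sketch: scan for the first unused index instead.
next_by_gap() {
    cluster="$1"; total="$2"; shift 2   # remaining args: existing names
    i=0
    while [ "$i" -lt "$total" ]; do
        case " $* " in
            *" ${cluster}-rep${i} "*) ;;              # already exists
            *) echo "${cluster}-rep${i}"; return 0 ;; # first gap found
        esac
        i=$((i + 1))
    done
    return 1                                          # nothing is missing
}
```

For example, `next_by_count simple-cluster simple-cluster-rep1 simple-cluster-rep2` prints `simple-cluster-rep2`, while `next_by_gap simple-cluster 3 simple-cluster-rep1 simple-cluster-rep2` prints `simple-cluster-rep0`.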
