Scylla manager controller deletes and creates tasks even without hitting conflicts #1752
Comments
Most likely related or a duplicate of #1729.
@scylladb/sig-operator we should discuss how to fix the logic behind task reconciliation. I'm wondering whether there's any reason we don't adopt manager tasks based on matching names instead of IDs?
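For illustration, a minimal sketch of what name-based adoption could look like; the types and the adoptTasksByName helper are hypothetical, not the operator's actual API:

```go
// Hypothetical sketch of name-based task adoption; types and helpers are
// illustrative and not taken from the operator's code.
package manager

// managerTask stands in for a task as reported by Scylla Manager.
type managerTask struct {
	ID   string
	Name string
}

// adoptTasksByName matches the desired task names against the tasks that
// already exist in Scylla Manager, instead of relying on IDs recorded in the
// ScyllaCluster status. Tasks found by name are adopted (their existing ID is
// reused); only names with no match need to be created.
func adoptTasksByName(desired []string, existing []managerTask) (adopted map[string]string, missing []string) {
	byName := make(map[string]string, len(existing))
	for _, t := range existing {
		byName[t.Name] = t.ID
	}

	adopted = make(map[string]string, len(desired))
	for _, name := range desired {
		if id, ok := byName[name]; ok {
			adopted[name] = id
			continue
		}
		missing = append(missing, name)
	}
	return adopted, missing
}
```

With an approach like this, a task that already exists in the manager under the expected name would not be deleted just because its ID is missing from the status.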
An additional risk is losing the task history, including e.g. its retention.
Since this is a prerequisite for #1671, I'm raising the priority and moving it to the current sprint. /priority critical-urgent
This bug was reproduced in a QA weekly test run here: https://jenkins.scylladb.com/job/scylla-operator/job/operator-master/job/functional/job/functional-eks-test/5/
Thanks @vponomaryov for reporting it. To the best of our knowledge this is a conceptual issue that has always been there and therefore is not a regression. As mentioned above, the impact is small and the cluster eventually recovers on its own. Please add a sleep or a similar workaround to your suite so it doesn't fail. We are tracking it here in order to fix it.
What happened?
There's an issue with manager task synchronisation in the scylla manager controller. On each reconciliation iteration, the controller goes through the tasks gathered from the current manager state and deletes the ones that are missing from the ScyllaCluster status.
scylla-operator/pkg/controller/manager/sync_action.go, line 273 at commit c8901da
In practice this means that tasks can be deleted right after they've been scheduled and saved in the object status. The status update doesn't even have to fail; it's enough for the next key in the queue to carry the previous generation of the object, without the updated status. This can result in many iterations of task creation and deletion.
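A simplified sketch of that deletion pattern (hypothetical types, not the actual sync_action.go code): any task reported by the manager whose ID is absent from the status gets selected for deletion, so processing a stale object without the updated status deletes the tasks that were just created.

```go
// Simplified illustration of the problematic pattern; the types and the helper
// below are hypothetical and not taken from sync_action.go.
package manager

type task struct {
	ID   string
	Name string
}

// tasksToDelete returns the manager tasks whose IDs are not recorded in the
// ScyllaCluster status. When the controller processes a stale object (previous
// generation, no updated status), this selects the tasks created by the
// previous reconciliation, which are then deleted and recreated.
func tasksToDelete(managerTasks []task, statusTaskIDs map[string]struct{}) []task {
	var toDelete []task
	for _, t := range managerTasks {
		if _, found := statusTaskIDs[t.ID]; !found {
			toDelete = append(toDelete, t)
		}
	}
	return toDelete
}
```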
Example logs:
What did you expect to happen?
The manager tasks should not be deleted after they've been created, at least not when the status update has succeeded.
How can we reproduce it (as minimally and precisely as possible)?
Scylla Operator version
master
Kubernetes platform name and version
n/a
Please attach the must-gather archive.
scylla-operator-must-gather-rc5jh7tf6fdn.tar.gz
Anything else we need to know?
While this isn't particularly dangerous, it's quite an annoyance for scripting. As a user, I'd expect to be able to wait for the status to be updated with the manager task ID and then consider the task scheduled. Unfortunately, due to this behaviour, the status of a particular task can be overwritten many times, and previously reported tasks can be deleted.
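For illustration, this is the kind of wait a script might attempt; getTaskIDFromStatus is a placeholder for however the status is read (e.g. kubectl or a typed client), not a real operator API. With the current behaviour this wait is unreliable, because a reported ID can later disappear and be replaced:

```go
// Hypothetical polling sketch; getTaskIDFromStatus is a placeholder and not
// part of the operator's API.
package example

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// getTaskIDFromStatus is a stand-in for reading the task ID from the
// ScyllaCluster status, e.g. via kubectl or a typed client.
func getTaskIDFromStatus(ctx context.Context, taskName string) (string, error) {
	return "", errors.New("not implemented in this sketch")
}

// waitForTaskScheduled polls until the status reports an ID for the named
// task, or the context expires.
func waitForTaskScheduled(ctx context.Context, taskName string) (string, error) {
	ticker := time.NewTicker(5 * time.Second)
	defer ticker.Stop()
	for {
		if id, err := getTaskIDFromStatus(ctx, taskName); err == nil && id != "" {
			return id, nil
		}
		select {
		case <-ctx.Done():
			return "", fmt.Errorf("gave up waiting for task %q: %w", taskName, ctx.Err())
		case <-ticker.C:
		}
	}
}
```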