Scylla manager controller deletes and creates tasks even without hitting conflicts #1752

Closed
Tracked by #1897
rzetelskik opened this issue Feb 26, 2024 · 6 comments · Fixed by #1850
Labels
kind/bug Categorizes issue or PR as related to a bug. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now.

Comments

@rzetelskik
Member

rzetelskik commented Feb 26, 2024

What happened?

There's an issue with manager task synchronisation in the Scylla Manager controller. On each reconciliation iteration, the controller goes through the tasks gathered from the current manager state and deletes the ones that are missing from the ScyllaCluster status.

if _, definedInStatus := s.statusIDNameMapping[id]; !definedInStatus {

In practice, this means that tasks can be deleted right after they've been scheduled and saved in the object status. It doesn't even require the status update to fail: it's enough that the next key in the queue is the previous generation of the object, without the updated status. This can result in many iterations of task creation and deletion.
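The race can be illustrated with a minimal sketch. This is not the operator's code: `managerTask` and `planSync` are hypothetical names, and the logic is reduced to the single check quoted above, assuming every desired task that is not known from the status gets re-added.

```go
package main

import "fmt"

// managerTask is a hypothetical, simplified stand-in for a task held by Scylla Manager.
type managerTask struct {
	ID   string
	Name string
}

// planSync mimics the behaviour described above: every manager task whose ID is
// missing from the status ID->name mapping is deleted, and every desired task
// that is not known through the status is added again.
func planSync(managerTasks []managerTask, statusIDNameMapping map[string]string, desiredNames []string) []string {
	var actions []string
	known := map[string]bool{}
	for _, t := range managerTasks {
		if _, definedInStatus := statusIDNameMapping[t.ID]; !definedInStatus {
			actions = append(actions, "delete task "+t.ID)
			continue
		}
		known[t.Name] = true
	}
	for _, name := range desiredNames {
		if !known[name] {
			actions = append(actions, "add task "+name)
		}
	}
	return actions
}

func main() {
	// The manager already holds the task created by the previous iteration.
	managerTasks := []managerTask{{ID: "1d81d2e6", Name: "weekly backup"}}
	desired := []string{"weekly backup"}

	// Stale object: the status written by the previous iteration is not visible yet.
	staleStatus := map[string]string{}
	fmt.Println(planSync(managerTasks, staleStatus, desired))
	// prints: [delete task 1d81d2e6 add task weekly backup]

	// Fresh object: the status already maps the task ID to its name.
	freshStatus := map[string]string{"1d81d2e6": "weekly backup"}
	fmt.Println(planSync(managerTasks, freshStatus, desired))
	// prints: []
}
```

With the stale status, the task that was just created is deleted and a new one is added under a different ID, which matches the pattern in the logs below.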

Example logs:

2024-02-26T14:50:26.589360595Z I0226 14:50:26.589344       1 manager/sync.go:93] "Started syncing ScyllaCluster" ScyllaCluster="e2e-test-scyllacluster-g6xgc-k
v4qg/basic-sh88x" startTime="2024-02-26 14:50:26.589328928 +0000 UTC m=+236.630081103"
2024-02-26T14:50:26.614357881Z I0226 14:50:26.614311       1 manager/sync.go:134] "Executing action" action="add task &{ClusterID: Enabled:true ID: Name:weekl
y backup Properties:map[location:[gcs:so-f2w8df7xf4c556cwbv64v5x8z96dxbf848cd62f5xfx5fxc25427c484d567] retention:1] Schedule:0xc0004d98f0 Tags:[] Type:backup}
"
2024-02-26T14:50:31.461267373Z I0226 14:50:31.461196       1 manager/sync.go:144] "Updating cluster status" new={"observedGeneration":2,"racks":{"us-east-1a":
{"version":"5.4.0","members":1,"readyMembers":1,"updatedMembers":1,"stale":false}},"managerId":"ed7d66c3-beee-4910-bc83-c06a41feebb7","backups":[{"name":"week
ly backup","startDate":"now","interval":"7d","numRetries":0,"location":["gcs:so-f2w8df7xf4c556cwbv64v5x8z96dxbf848cd62f5xfx5fxc25427c484d567"],"retention":1,"
id":"1d81d2e6-5330-434b-afbe-497c7f85b0dc","error":""}],"conditions":[{"type":"ServiceAccountControllerDegraded","status":"False","observedGeneration":2,"last
TransitionTime":"2024-02-26T14:49:11Z","reason":"AsExpected","message":""},{"type":"ServiceAccountControllerProgressing","status":"False","observedGeneration"
:2,"lastTransitionTime":"2024-02-26T14:49:31Z","reason":"AsExpected","message":""},{"type":"RoleBindingControllerDegraded","status":"False","observedGeneratio
n":2,"lastTransitionTime":"2024-02-26T14:49:11Z","reason":"AsExpected","message":""},{"type":"RoleBindingControllerProgressing","status":"False","observedGene
ration":2,"lastTransitionTime":"2024-02-26T14:49:31Z","reason":"AsExpected","message":""},{"type":"AgentTokenControllerDegraded","status":"False","observedGen
eration":2,"lastTransitionTime":"2024-02-26T14:49:11Z","reason":"AsExpected","message":""},{"type":"AgentTokenControllerProgressing","status":"False","observe
dGeneration":2,"lastTransitionTime":"2024-02-26T14:49:31Z","reason":"AsExpected","message":""},{"type":"StatefulSetControllerDegraded","status":"False","obser
vedGeneration":2,"lastTransitionTime":"2024-02-26T14:49:21Z","reason":"AsExpected","message":""},{"type":"StatefulSetControllerProgressing","status":"False","
observedGeneration":2,"lastTransitionTime":"2024-02-26T14:50:11Z","reason":"AsExpected","message":""},{"type":"StatefulSetControllerAvailable","status":"True"
,"observedGeneration":2,"lastTransitionTime":"2024-02-26T14:50:11Z","reason":"AsExpected","message":""},{"type":"ServiceControllerDegraded","status":"False","
observedGeneration":2,"lastTransitionTime":"2024-02-26T14:49:21Z","reason":"AsExpected","message":""},{"type":"ServiceControllerProgressing","status":"False",
"observedGeneration":2,"lastTransitionTime":"2024-02-26T14:50:11Z","reason":"AsExpected","message":""},{"type":"PDBControllerDegraded","status":"False","obser
vedGeneration":2,"lastTransitionTime":"2024-02-26T14:49:21Z","reason":"AsExpected","message":""},{"type":"PDBControllerProgressing","status":"False","observed
Generation":2,"lastTransitionTime":"2024-02-26T14:49:41Z","reason":"AsExpected","message":""},{"type":"IngressControllerDegraded","status":"False","observedGe
neration":2,"lastTransitionTime":"2024-02-26T14:49:21Z","reason":"AsExpected","message":""},{"type":"IngressControllerProgressing","status":"False","observedG
eneration":2,"lastTransitionTime":"2024-02-26T14:49:21Z","reason":"AsExpected","message":""},{"type":"JobControllerDegraded","status":"False","observedGenerat
ion":2,"lastTransitionTime":"2024-02-26T14:49:21Z","reason":"AsExpected","message":""},{"type":"JobControllerProgressing","status":"False","observedGeneration
":2,"lastTransitionTime":"2024-02-26T14:50:11Z","reason":"AsExpected","message":""},{"type":"Available","status":"True","observedGeneration":2,"lastTransition
Time":"2024-02-26T14:50:11Z","reason":"AsExpected","message":""},{"type":"Progressing","status":"False","observedGeneration":2,"lastTransitionTime":"2024-02-2
6T14:50:11Z","reason":"AsExpected","message":""},{"type":"Degraded","status":"False","observedGeneration":2,"lastTransitionTime":"2024-02-26T14:49:21Z","reaso
n":"AsExpected","message":""}]} old={"observedGeneration":2,"racks":{"us-east-1a":{"version":"5.4.0","members":1,"readyMembers":1,"updatedMembers":1,"stale":f
alse}},"managerId":"ed7d66c3-beee-4910-bc83-c06a41feebb7","conditions":[{"type":"ServiceAccountControllerDegraded","status":"False","observedGeneration":2,"la
stTransitionTime":"2024-02-26T14:49:11Z","reason":"AsExpected","message":""},{"type":"ServiceAccountControllerProgressing","status":"False","observedGeneratio
n":2,"lastTransitionTime":"2024-02-26T14:49:31Z","reason":"AsExpected","message":""},{"type":"RoleBindingControllerDegraded","status":"False","observedGenerat
ion":2,"lastTransitionTime":"2024-02-26T14:49:11Z","reason":"AsExpected","message":""},{"type":"RoleBindingControllerProgressing","status":"False","observedGe
neration":2,"lastTransitionTime":"2024-02-26T14:49:31Z","reason":"AsExpected","message":""},{"type":"AgentTokenControllerDegraded","status":"False","observedG
eneration":2,"lastTransitionTime":"2024-02-26T14:49:11Z","reason":"AsExpected","message":""},{"type":"AgentTokenControllerProgressing","status":"False","obser
vedGeneration":2,"lastTransitionTime":"2024-02-26T14:49:31Z","reason":"AsExpected","message":""},{"type":"StatefulSetControllerDegraded","status":"False","obs
ervedGeneration":2,"lastTransitionTime":"2024-02-26T14:49:21Z","reason":"AsExpected","message":""},{"type":"StatefulSetControllerProgressing","status":"False"
,"observedGeneration":2,"lastTransitionTime":"2024-02-26T14:50:11Z","reason":"AsExpected","message":""},{"type":"StatefulSetControllerAvailable","status":"Tru
e","observedGeneration":2,"lastTransitionTime":"2024-02-26T14:50:11Z","reason":"AsExpected","message":""},{"type":"ServiceControllerDegraded","status":"False"
,"observedGeneration":2,"lastTransitionTime":"2024-02-26T14:49:21Z","reason":"AsExpected","message":""},{"type":"ServiceControllerProgressing","status":"False
","observedGeneration":2,"lastTransitionTime":"2024-02-26T14:50:11Z","reason":"AsExpected","message":""},{"type":"PDBControllerDegraded","status":"False","obs
ervedGeneration":2,"lastTransitionTime":"2024-02-26T14:49:21Z","reason":"AsExpected","message":""},{"type":"PDBControllerProgressing","status":"False","observ
edGeneration":2,"lastTransitionTime":"2024-02-26T14:49:41Z","reason":"AsExpected","message":""},{"type":"IngressControllerDegraded","status":"False","observed
Generation":2,"lastTransitionTime":"2024-02-26T14:49:21Z","reason":"AsExpected","message":""},{"type":"IngressControllerProgressing","status":"False","observe
dGeneration":2,"lastTransitionTime":"2024-02-26T14:49:21Z","reason":"AsExpected","message":""},{"type":"JobControllerDegraded","status":"False","observedGener
ation":2,"lastTransitionTime":"2024-02-26T14:49:21Z","reason":"AsExpected","message":""},{"type":"JobControllerProgressing","status":"False","observedGenerati
on":2,"lastTransitionTime":"2024-02-26T14:50:11Z","reason":"AsExpected","message":""},{"type":"Available","status":"True","observedGeneration":2,"lastTransiti
onTime":"2024-02-26T14:50:11Z","reason":"AsExpected","message":""},{"type":"Progressing","status":"False","observedGeneration":2,"lastTransitionTime":"2024-02
-26T14:50:11Z","reason":"AsExpected","message":""},{"type":"Degraded","status":"False","observedGeneration":2,"lastTransitionTime":"2024-02-26T14:49:21Z","rea
son":"AsExpected","message":""}]}
2024-02-26T14:50:31.483211042Z I0226 14:50:31.483141       1 manager/sync.go:95] "Finished syncing ScyllaCluster" ScyllaCluster="e2e-test-scyllacluster-g6xgc-
kv4qg/basic-sh88x" duration="4.893795132s"
2024-02-26T14:50:31.483265030Z I0226 14:50:31.483205       1 manager/sync.go:93] "Started syncing ScyllaCluster" ScyllaCluster="e2e-test-scyllacluster-g6xgc-k
v4qg/basic-sh88x" startTime="2024-02-26 14:50:31.48318843 +0000 UTC m=+241.523940608"
2024-02-26T14:50:31.483533961Z I0226 14:50:31.483463       1 manager/controller.go:203] "Enqueuing object" Operation="Update" GVK="scylla.scylladb.com/v1, Kin
d=ScyllaCluster" Ref="e2e-test-scyllacluster-g6xgc-kv4qg/basic-sh88x" UID="1599cb90-be6b-4e3d-aa23-8c1c713dc4be"
2024-02-26T14:50:31.502044070Z I0226 14:50:31.501976       1 manager/sync.go:134] "Executing action" action="delete task \"1d81d2e6-5330-434b-afbe-497c7f85b0d
c\""
2024-02-26T14:50:31.508502912Z I0226 14:50:31.508422       1 manager/sync.go:134] "Executing action" action="add task &{ClusterID: Enabled:true ID: Name:weekl
y backup Properties:map[location:[gcs:so-f2w8df7xf4c556cwbv64v5x8z96dxbf848cd62f5xfx5fxc25427c484d567] retention:1] Schedule:0xc0006bf290 Tags:[] Type:backup}
"
2024-02-26T14:50:34.874670720Z I0226 14:50:34.874489       1 manager/sync.go:144] "Updating cluster status" new={"observedGeneration":2,"racks":{"us-east-1a":
{"version":"5.4.0","members":1,"readyMembers":1,"updatedMembers":1,"stale":false}},"managerId":"ed7d66c3-beee-4910-bc83-c06a41feebb7","backups":[{"name":"week
ly backup","startDate":"now","interval":"7d","numRetries":0,"location":["gcs:so-f2w8df7xf4c556cwbv64v5x8z96dxbf848cd62f5xfx5fxc25427c484d567"],"retention":1,"
id":"3b0080a9-acf0-412c-a9ef-3385cb338ccb","error":""}],"conditions":[{"type":"ServiceAccountControllerDegraded","status":"False","observedGeneration":2,"last
TransitionTime":"2024-02-26T14:49:11Z","reason":"AsExpected","message":""},{"type":"ServiceAccountControllerProgressing","status":"False","observedGeneration"
:2,"lastTransitionTime":"2024-02-26T14:49:31Z","reason":"AsExpected","message":""},{"type":"RoleBindingControllerDegraded","status":"False","observedGeneratio
n":2,"lastTransitionTime":"2024-02-26T14:49:11Z","reason":"AsExpected","message":""},{"type":"RoleBindingControllerProgressing","status":"False","observedGene
ration":2,"lastTransitionTime":"2024-02-26T14:49:31Z","reason":"AsExpected","message":""},{"type":"AgentTokenControllerDegraded","status":"False","observedGen
eration":2,"lastTransitionTime":"2024-02-26T14:49:11Z","reason":"AsExpected","message":""},{"type":"AgentTokenControllerProgressing","status":"False","observe
dGeneration":2,"lastTransitionTime":"2024-02-26T14:49:31Z","reason":"AsExpected","message":""},{"type":"StatefulSetControllerDegraded","status":"False","obser
vedGeneration":2,"lastTransitionTime":"2024-02-26T14:49:21Z","reason":"AsExpected","message":""},{"type":"StatefulSetControllerProgressing","status":"False","
observedGeneration":2,"lastTransitionTime":"2024-02-26T14:50:11Z","reason":"AsExpected","message":""},{"type":"StatefulSetControllerAvailable","status":"True"
,"observedGeneration":2,"lastTransitionTime":"2024-02-26T14:50:11Z","reason":"AsExpected","message":""},{"type":"ServiceControllerDegraded","status":"False","
observedGeneration":2,"lastTransitionTime":"2024-02-26T14:49:21Z","reason":"AsExpected","message":""},{"type":"ServiceControllerProgressing","status":"False",
"observedGeneration":2,"lastTransitionTime":"2024-02-26T14:50:11Z","reason":"AsExpected","message":""},{"type":"PDBControllerDegraded","status":"False","obser
vedGeneration":2,"lastTransitionTime":"2024-02-26T14:49:21Z","reason":"AsExpected","message":""},{"type":"PDBControllerProgressing","status":"False","observed
Generation":2,"lastTransitionTime":"2024-02-26T14:49:41Z","reason":"AsExpected","message":""},{"type":"IngressControllerDegraded","status":"False","observedGe
neration":2,"lastTransitionTime":"2024-02-26T14:49:21Z","reason":"AsExpected","message":""},{"type":"IngressControllerProgressing","status":"False","observedG
eneration":2,"lastTransitionTime":"2024-02-26T14:49:21Z","reason":"AsExpected","message":""},{"type":"JobControllerDegraded","status":"False","observedGenerat
ion":2,"lastTransitionTime":"2024-02-26T14:49:21Z","reason":"AsExpected","message":""},{"type":"JobControllerProgressing","status":"False","observedGeneration
":2,"lastTransitionTime":"2024-02-26T14:50:11Z","reason":"AsExpected","message":""},{"type":"Available","status":"True","observedGeneration":2,"lastTransition
Time":"2024-02-26T14:50:11Z","reason":"AsExpected","message":""},{"type":"Progressing","status":"False","observedGeneration":2,"lastTransitionTime":"2024-02-2
6T14:50:11Z","reason":"AsExpected","message":""},{"type":"Degraded","status":"False","observedGeneration":2,"lastTransitionTime":"2024-02-26T14:49:21Z","reaso
n":"AsExpected","message":""}]} old={"observedGeneration":2,"racks":{"us-east-1a":{"version":"5.4.0","members":1,"readyMembers":1,"updatedMembers":1,"stale":f
alse}},"managerId":"ed7d66c3-beee-4910-bc83-c06a41feebb7","conditions":[{"type":"ServiceAccountControllerDegraded","status":"False","observedGeneration":2,"la
stTransitionTime":"2024-02-26T14:49:11Z","reason":"AsExpected","message":""},{"type":"ServiceAccountControllerProgressing","status":"False","observedGeneratio
n":2,"lastTransitionTime":"2024-02-26T14:49:31Z","reason":"AsExpected","message":""},{"type":"RoleBindingControllerDegraded","status":"False","observedGenerat
ion":2,"lastTransitionTime":"2024-02-26T14:49:11Z","reason":"AsExpected","message":""},{"type":"RoleBindingControllerProgressing","status":"False","observedGe
neration":2,"lastTransitionTime":"2024-02-26T14:49:31Z","reason":"AsExpected","message":""},{"type":"AgentTokenControllerDegraded","status":"False","observedG
eneration":2,"lastTransitionTime":"2024-02-26T14:49:11Z","reason":"AsExpected","message":""},{"type":"AgentTokenControllerProgressing","status":"False","obser
vedGeneration":2,"lastTransitionTime":"2024-02-26T14:49:31Z","reason":"AsExpected","message":""},{"type":"StatefulSetControllerDegraded","status":"False","obs
ervedGeneration":2,"lastTransitionTime":"2024-02-26T14:49:21Z","reason":"AsExpected","message":""},{"type":"StatefulSetControllerProgressing","status":"False"
,"observedGeneration":2,"lastTransitionTime":"2024-02-26T14:50:11Z","reason":"AsExpected","message":""},{"type":"StatefulSetControllerAvailable","status":"Tru
e","observedGeneration":2,"lastTransitionTime":"2024-02-26T14:50:11Z","reason":"AsExpected","message":""},{"type":"ServiceControllerDegraded","status":"False"
,"observedGeneration":2,"lastTransitionTime":"2024-02-26T14:49:21Z","reason":"AsExpected","message":""},{"type":"ServiceControllerProgressing","status":"False
","observedGeneration":2,"lastTransitionTime":"2024-02-26T14:50:11Z","reason":"AsExpected","message":""},{"type":"PDBControllerDegraded","status":"False","obs
ervedGeneration":2,"lastTransitionTime":"2024-02-26T14:49:21Z","reason":"AsExpected","message":""},{"type":"PDBControllerProgressing","status":"False","observ
edGeneration":2,"lastTransitionTime":"2024-02-26T14:49:41Z","reason":"AsExpected","message":""},{"type":"IngressControllerDegraded","status":"False","observed
Generation":2,"lastTransitionTime":"2024-02-26T14:49:21Z","reason":"AsExpected","message":""},{"type":"IngressControllerProgressing","status":"False","observe
dGeneration":2,"lastTransitionTime":"2024-02-26T14:49:21Z","reason":"AsExpected","message":""},{"type":"JobControllerDegraded","status":"False","observedGener
ation":2,"lastTransitionTime":"2024-02-26T14:49:21Z","reason":"AsExpected","message":""},{"type":"JobControllerProgressing","status":"False","observedGenerati
on":2,"lastTransitionTime":"2024-02-26T14:50:11Z","reason":"AsExpected","message":""},{"type":"Available","status":"True","observedGeneration":2,"lastTransiti
onTime":"2024-02-26T14:50:11Z","reason":"AsExpected","message":""},{"type":"Progressing","status":"False","observedGeneration":2,"lastTransitionTime":"2024-02
-26T14:50:11Z","reason":"AsExpected","message":""},{"type":"Degraded","status":"False","observedGeneration":2,"lastTransitionTime":"2024-02-26T14:49:21Z","rea
son":"AsExpected","message":""}]}
2024-02-26T14:50:34.881386452Z I0226 14:50:34.881332       1 manager/sync.go:95] "Finished syncing ScyllaCluster" ScyllaCluster="e2e-test-scyllacluster-g6xgc-
kv4qg/basic-sh88x" duration="3.398127274s"
2024-02-26T14:50:34.881386452Z I0226 14:50:34.881366       1 manager/controller.go:148] "Hit conflict, will retry in a bit" Key="e2e-test-scyllacluster-g6xgc-
kv4qg/basic-sh88x" Error="Operation cannot be fulfilled on scyllaclusters.scylla.scylladb.com \"basic-sh88x\": the object has been modified; please apply your
 changes to the latest version and try again"
2024-02-26T14:50:34.881428335Z I0226 14:50:34.881416       1 manager/sync.go:93] "Started syncing ScyllaCluster" ScyllaCluster="e2e-test-scyllacluster-g6xgc-k
v4qg/basic-sh88x" startTime="2024-02-26 14:50:34.881397513 +0000 UTC m=+244.922149689"
2024-02-26T14:50:34.901923584Z I0226 14:50:34.901842       1 manager/sync.go:134] "Executing action" action="delete task \"3b0080a9-acf0-412c-a9ef-3385cb338cc
b\""
2024-02-26T14:50:34.908518016Z I0226 14:50:34.908436       1 manager/sync.go:134] "Executing action" action="add task &{ClusterID: Enabled:true ID: Name:weekl
y backup Properties:map[location:[gcs:so-f2w8df7xf4c556cwbv64v5x8z96dxbf848cd62f5xfx5fxc25427c484d567] retention:1] Schedule:0xc00035ab70 Tags:[] Type:backup}
"
2024-02-26T14:50:49.915788733Z E0226 14:50:49.915663       1 manager/sync.go:137] "Failed to execute action" err="Post \"http://scylla-manager/api/v1/cluster/
ed7d66c3-beee-4910-bc83-c06a41feebb7/tasks\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" action="add task &{ClusterID: Enable
d:true ID: Name:weekly backup Properties:map[location:[gcs:so-f2w8df7xf4c556cwbv64v5x8z96dxbf848cd62f5xfx5fxc25427c484d567] retention:1] Schedule:0xc00035ab70
 Tags:[] Type:backup}"
2024-02-26T14:50:49.916091671Z I0226 14:50:49.915830       1 manager/sync.go:144] "Updating cluster status" new={"observedGeneration":2,"racks":{"us-east-1a":
{"version":"5.4.0","members":1,"readyMembers":1,"updatedMembers":1,"stale":false}},"managerId":"ed7d66c3-beee-4910-bc83-c06a41feebb7","backups":[{"name":"week
ly backup","startDate":"now","interval":"7d","numRetries":0,"location":["gcs:so-f2w8df7xf4c556cwbv64v5x8z96dxbf848cd62f5xfx5fxc25427c484d567"],"retention":1,"
id":"00000000-0000-0000-0000-000000000000","error":"Post \"http://scylla-manager/api/v1/cluster/ed7d66c3-beee-4910-bc83-c06a41feebb7/tasks\": context deadline
 exceeded (Client.Timeout exceeded while awaiting headers)"}],"conditions":[{"type":"ServiceAccountControllerDegraded","status":"False","observedGeneration":2
,"lastTransitionTime":"2024-02-26T14:49:11Z","reason":"AsExpected","message":""},{"type":"ServiceAccountControllerProgressing","status":"False","observedGener
ation":2,"lastTransitionTime":"2024-02-26T14:49:31Z","reason":"AsExpected","message":""},{"type":"RoleBindingControllerDegraded","status":"False","observedGen
eration":2,"lastTransitionTime":"2024-02-26T14:49:11Z","reason":"AsExpected","message":""},{"type":"RoleBindingControllerProgressing","status":"False","observ
edGeneration":2,"lastTransitionTime":"2024-02-26T14:49:31Z","reason":"AsExpected","message":""},{"type":"AgentTokenControllerDegraded","status":"False","obser
vedGeneration":2,"lastTransitionTime":"2024-02-26T14:49:11Z","reason":"AsExpected","message":""},{"type":"AgentTokenControllerProgressing","status":"False","o
bservedGeneration":2,"lastTransitionTime":"2024-02-26T14:49:31Z","reason":"AsExpected","message":""},{"type":"StatefulSetControllerDegraded","status":"False",
"observedGeneration":2,"lastTransitionTime":"2024-02-26T14:49:21Z","reason":"AsExpected","message":""},{"type":"StatefulSetControllerProgressing","status":"Fa
lse","observedGeneration":2,"lastTransitionTime":"2024-02-26T14:50:11Z","reason":"AsExpected","message":""},{"type":"StatefulSetControllerAvailable","status":
"True","observedGeneration":2,"lastTransitionTime":"2024-02-26T14:50:11Z","reason":"AsExpected","message":""},{"type":"ServiceControllerDegraded","status":"Fa
lse","observedGeneration":2,"lastTransitionTime":"2024-02-26T14:49:21Z","reason":"AsExpected","message":""},{"type":"ServiceControllerProgressing","status":"F
alse","observedGeneration":2,"lastTransitionTime":"2024-02-26T14:50:11Z","reason":"AsExpected","message":""},{"type":"PDBControllerDegraded","status":"False",
"observedGeneration":2,"lastTransitionTime":"2024-02-26T14:49:21Z","reason":"AsExpected","message":""},{"type":"PDBControllerProgressing","status":"False","ob
servedGeneration":2,"lastTransitionTime":"2024-02-26T14:49:41Z","reason":"AsExpected","message":""},{"type":"IngressControllerDegraded","status":"False","obse
rvedGeneration":2,"lastTransitionTime":"2024-02-26T14:49:21Z","reason":"AsExpected","message":""},{"type":"IngressControllerProgressing","status":"False","obs
ervedGeneration":2,"lastTransitionTime":"2024-02-26T14:49:21Z","reason":"AsExpected","message":""},{"type":"JobControllerDegraded","status":"False","observedG
eneration":2,"lastTransitionTime":"2024-02-26T14:49:21Z","reason":"AsExpected","message":""},{"type":"JobControllerProgressing","status":"False","observedGene
ration":2,"lastTransitionTime":"2024-02-26T14:50:11Z","reason":"AsExpected","message":""},{"type":"Available","status":"True","observedGeneration":2,"lastTran
sitionTime":"2024-02-26T14:50:11Z","reason":"AsExpected","message":""},{"type":"Progressing","status":"False","observedGeneration":2,"lastTransitionTime":"202
4-02-26T14:50:11Z","reason":"AsExpected","message":""},{"type":"Degraded","status":"False","observedGeneration":2,"lastTransitionTime":"2024-02-26T14:49:21Z",
"reason":"AsExpected","message":""}]} old={"observedGeneration":2,"racks":{"us-east-1a":{"version":"5.4.0","members":1,"readyMembers":1,"updatedMembers":1,"st
ale":false}},"managerId":"ed7d66c3-beee-4910-bc83-c06a41feebb7","backups":[{"name":"weekly backup","startDate":"now","interval":"7d","numRetries":0,"location"
:["gcs:so-f2w8df7xf4c556cwbv64v5x8z96dxbf848cd62f5xfx5fxc25427c484d567"],"retention":1,"id":"1d81d2e6-5330-434b-afbe-497c7f85b0dc","error":""}],"conditions":[
{"type":"ServiceAccountControllerDegraded","status":"False","observedGeneration":2,"lastTransitionTime":"2024-02-26T14:49:11Z","reason":"AsExpected","message"
:""},{"type":"ServiceAccountControllerProgressing","status":"False","observedGeneration":2,"lastTransitionTime":"2024-02-26T14:49:31Z","reason":"AsExpected","
message":""},{"type":"RoleBindingControllerDegraded","status":"False","observedGeneration":2,"lastTransitionTime":"2024-02-26T14:49:11Z","reason":"AsExpected"
,"message":""},{"type":"RoleBindingControllerProgressing","status":"False","observedGeneration":2,"lastTransitionTime":"2024-02-26T14:49:31Z","reason":"AsExpe
cted","message":""},{"type":"AgentTokenControllerDegraded","status":"False","observedGeneration":2,"lastTransitionTime":"2024-02-26T14:49:11Z","reason":"AsExp
ected","message":""},{"type":"AgentTokenControllerProgressing","status":"False","observedGeneration":2,"lastTransitionTime":"2024-02-26T14:49:31Z","reason":"A
sExpected","message":""},{"type":"StatefulSetControllerDegraded","status":"False","observedGeneration":2,"lastTransitionTime":"2024-02-26T14:49:21Z","reason":
"AsExpected","message":""},{"type":"StatefulSetControllerProgressing","status":"False","observedGeneration":2,"lastTransitionTime":"2024-02-26T14:50:11Z","rea
son":"AsExpected","message":""},{"type":"StatefulSetControllerAvailable","status":"True","observedGeneration":2,"lastTransitionTime":"2024-02-26T14:50:11Z","r
eason":"AsExpected","message":""},{"type":"ServiceControllerDegraded","status":"False","observedGeneration":2,"lastTransitionTime":"2024-02-26T14:49:21Z","rea
son":"AsExpected","message":""},{"type":"ServiceControllerProgressing","status":"False","observedGeneration":2,"lastTransitionTime":"2024-02-26T14:50:11Z","re
ason":"AsExpected","message":""},{"type":"PDBControllerDegraded","status":"False","observedGeneration":2,"lastTransitionTime":"2024-02-26T14:49:21Z","reason":
"AsExpected","message":""},{"type":"PDBControllerProgressing","status":"False","observedGeneration":2,"lastTransitionTime":"2024-02-26T14:49:41Z","reason":"As
Expected","message":""},{"type":"IngressControllerDegraded","status":"False","observedGeneration":2,"lastTransitionTime":"2024-02-26T14:49:21Z","reason":"AsEx
pected","message":""},{"type":"IngressControllerProgressing","status":"False","observedGeneration":2,"lastTransitionTime":"2024-02-26T14:49:21Z","reason":"AsE
xpected","message":""},{"type":"JobControllerDegraded","status":"False","observedGeneration":2,"lastTransitionTime":"2024-02-26T14:49:21Z","reason":"AsExpecte
d","message":""},{"type":"JobControllerProgressing","status":"False","observedGeneration":2,"lastTransitionTime":"2024-02-26T14:50:11Z","reason":"AsExpected",
"message":""},{"type":"Available","status":"True","observedGeneration":2,"lastTransitionTime":"2024-02-26T14:50:11Z","reason":"AsExpected","message":""},{"typ
e":"Progressing","status":"False","observedGeneration":2,"lastTransitionTime":"2024-02-26T14:50:11Z","reason":"AsExpected","message":""},{"type":"Degraded","s
tatus":"False","observedGeneration":2,"lastTransitionTime":"2024-02-26T14:49:21Z","reason":"AsExpected","message":""}]}
2024-02-26T14:50:49.927692242Z I0226 14:50:49.926441       1 manager/sync.go:95] "Finished syncing ScyllaCluster" ScyllaCluster="e2e-test-scyllacluster-g6xgc-
kv4qg/basic-sh88x" duration="15.045027419s" 

What did you expect to happen?

The manager tasks should not be deleted after they've been created, at least not when the status update succeeded.

How can we reproduce it (as minimally and precisely as possible)?

  • Deploy the operator and manager
  • Create a ScyllaCluster, fill it with some data
  • Schedule a backup

Scylla Operator version

master

Kubernetes platform name and version

n/a

Please attach the must-gather archive.

scylla-operator-must-gather-rc5jh7tf6fdn.tar.gz

Anything else we need to know?

While this isn't particularly dangerous, it's quite an annoyance for scripting. As a user, I'd expect to be able to wait for the status to be updated with the manager task ID and then consider the task scheduled. Unfortunately, due to this behaviour, the status of a particular task can be overwritten many times, and the previously reported tasks deleted.

@rzetelskik added the kind/bug and needs-priority labels Feb 26, 2024
@rzetelskik
Member Author

Most likely related to, or a duplicate of, #1729.

@rzetelskik
Member Author

@scylladb/sig-operator we should discuss how to fix the logic behind task reconciliation. I'm wondering whether there's any reason we don't adopt manager tasks based on the matching names, instead of the IDs?
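As a rough sketch of what name-based adoption could look like (not a proposal for the actual implementation): `planSyncByName` and `managerTask` are hypothetical, and the sketch assumes task names are unique per cluster, so any manager task with an unknown or duplicate name is treated as orphaned.

```go
package main

import "fmt"

// managerTask is a hypothetical, simplified stand-in for a task held by Scylla Manager.
type managerTask struct {
	ID   string
	Name string
}

// planSyncByName adopts existing manager tasks by name instead of by ID, so a
// stale or empty status cannot trigger a delete-and-recreate cycle. It returns
// the actions to run and the ID->name mapping that would be written to the status.
func planSyncByName(managerTasks []managerTask, desiredNames []string) ([]string, map[string]string) {
	desired := map[string]bool{}
	for _, name := range desiredNames {
		desired[name] = true
	}

	var actions []string
	adopted := map[string]string{}
	existing := map[string]bool{}
	for _, t := range managerTasks {
		// Adopt the first task found for each desired name.
		if desired[t.Name] && !existing[t.Name] {
			adopted[t.ID] = t.Name
			existing[t.Name] = true
			continue
		}
		// Assumption: unknown or duplicate names are orphaned tasks.
		actions = append(actions, "delete task "+t.ID)
	}
	for _, name := range desiredNames {
		if !existing[name] {
			actions = append(actions, "add task "+name)
		}
	}
	return actions, adopted
}

func main() {
	managerTasks := []managerTask{{ID: "1d81d2e6", Name: "weekly backup"}}
	actions, adopted := planSyncByName(managerTasks, []string{"weekly backup"})
	// No status is consulted, so the existing task is kept and its ID re-reported.
	fmt.Println(len(actions), adopted["1d81d2e6"])
}
```

Because the decision no longer depends on the status, the same task ID is reported on every iteration regardless of which version of the object the controller observes.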

@tnozicka added the priority/important-longterm label Feb 26, 2024
@scylla-operator-bot removed the needs-priority label Feb 26, 2024
@rzetelskik
Member Author

An additional risk is losing the task history, including e.g. its retention.

@rzetelskik
Member Author

Since this is a prerequisite for #1671, I'm raising the priority and moving it to the current sprint.

/priority critical-urgent

@scylla-operator-bot added the priority/critical-urgent label Feb 28, 2024
@vponomaryov
Contributor

This bug was reproduced in the QA weekly test run here: https://jenkins.scylladb.com/job/scylla-operator/job/operator-master/job/functional/job/functional-eks-test/5/
1.12 is affected, and the fix should be targeted for it before it is released.

@tnozicka
Member

tnozicka commented Mar 6, 2024

Thanks @vponomaryov for reporting it. To the best of our knowledge, this is a conceptual issue that has always been there and is therefore not a regression. As mentioned above, the impact is small and the state eventually recovers on its own. Please add a sleep or something similar to your suite so that it doesn't fail. We are tracking the fix here.


3 participants