K8ssandra-operator memory leak after failed MedusaBackupJob #1312

adziura-tcloud · 2024-05-13T07:08:51Z

What happened?
After upgrading K8ssandra-operator from 1.7.0 to 1.16.0, I noticed periodic restarts of k8ssandra-operator due to a memory leak.
It happened twice, each time right after a failed MedusaBackupJob:

spec:
  backupType: full
  cassandraDatacenter: gke-dc1
status:
  finished:
  - example-test-1-gke-dc1-r1-sts-2
  - example-test-1-gke-dc1-r1-sts-3
  - example-test-1-gke-dc1-r1-sts-1
  inProgress:
  - example-test-1-gke-dc1-r1-sts-0
  startTime: "2024-05-12T00:30:15Z"

The MedusaBackupJob failed because a pod was restarted after running out of memory (OOM); we are using pretty small instances on the test env.
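For anyone triaging the same symptom, here is a minimal Python sketch of how the status above could be checked for a job that never completed. It is not operator code. The `is_stuck` helper and the 6-hour threshold are hypothetical, and the `finishTime` key is an assumption about how a completed job's status looks (the status shown above has no such field).

```python
from datetime import datetime, timedelta, timezone


def is_stuck(status, now, max_age=timedelta(hours=6)):
    """Heuristic: a job looks stuck if it has pods in progress, no finish
    time, and started longer ago than max_age (hypothetical threshold)."""
    # Assumption: a completed job carries a `finishTime` key in its status.
    if status.get("finishTime"):
        return False
    # No pods left in progress means nothing is pending.
    if not status.get("inProgress"):
        return False
    started = datetime.strptime(
        status["startTime"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    return now - started > max_age


# Status copied from the MedusaBackupJob in this report.
status = {
    "finished": [
        "example-test-1-gke-dc1-r1-sts-2",
        "example-test-1-gke-dc1-r1-sts-3",
        "example-test-1-gke-dc1-r1-sts-1",
    ],
    "inProgress": ["example-test-1-gke-dc1-r1-sts-0"],
    "startTime": "2024-05-12T00:30:15Z",
}

# Timestamp of the operator log line quoted in this report.
now = datetime(2024, 5, 13, 7, 7, 34, tzinfo=timezone.utc)
print(is_stuck(status, now))  # True: started more than 30 hours earlier
```

With the values from this report, the job has one pod in `inProgress` more than a day after `startTime`, so the check flags it.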

Also, MedusaBackupSchedule stops working after the failed backup: it no longer creates new MedusaBackupJobs.

Did you expect to see something different?
A failed backup should not affect the operator or subsequent backups.

How to reproduce it (as minimally and precisely as possible):
I think killing one pod during the backup should reproduce it.

K8ssandra Operator version: 1.16.0
Kubernetes version information: 1.29
Kubernetes cluster kind: GKE
K8ssandra Operator Logs: I see a lot of "MedusaBackupJob is still being processed" INFO messages

2024-05-13T07:07:34.916Z	INFO	MedusaBackupJob is still being processed	{"controller": "medusabackupjob", "controllerGroup": "medusa.k8ssandra.io", "controllerKind": "MedusaBackupJob", "MedusaBackupJob": {"name":"daily-1715473800","namespace":"cassandra"}, "namespace": "cassandra", "name": "daily-1715473800", "reconcileID": "ace193e0-a3b6-4935-80f7-736b7fbae2a0", "medusabackupjob": "cassandra/daily-1715473800", "Backup": {"namespace": "cassandra", "name": "daily-1715473800"}}
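To gauge how often the operator re-reconciles a given job, the log lines can be tallied per MedusaBackupJob. The Python sketch below assumes the log format matches the line above: tab-separated timestamp, level, message, and a JSON payload with a `medusabackupjob` key. The `count_still_processing` helper is hypothetical, and the sample line is a shortened copy of the one quoted above.

```python
import json
from collections import Counter

MESSAGE = "MedusaBackupJob is still being processed"


def count_still_processing(lines):
    """Count 'still being processed' log lines per MedusaBackupJob."""
    counts = Counter()
    for line in lines:
        # Assumed layout: timestamp, level, message, JSON payload (tab-separated).
        parts = line.rstrip("\n").split("\t", 3)
        if len(parts) != 4 or parts[2] != MESSAGE:
            continue
        try:
            fields = json.loads(parts[3])
        except json.JSONDecodeError:
            continue  # skip lines whose payload is not valid JSON
        job = fields.get("medusabackupjob")
        if job:
            counts[job] += 1
    return counts


# Shortened copy of the log line from this report (fewer JSON fields).
sample = (
    "2024-05-13T07:07:34.916Z\tINFO\tMedusaBackupJob is still being processed\t"
    '{"controller": "medusabackupjob", "namespace": "cassandra", '
    '"name": "daily-1715473800", "medusabackupjob": "cassandra/daily-1715473800"}'
)

counts = count_still_processing([sample, sample, "unrelated line"])
print(counts["cassandra/daily-1715473800"])  # 2
```

A count that keeps growing for a job whose pod has already restarted suggests the job is being requeued indefinitely.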


adziura-tcloud added the bug Something isn't working label May 13, 2024

rzvoncek self-assigned this Jun 18, 2024

adejanovski added the in-progress Issues in the state 'in-progress' label Jun 18, 2024

rzvoncek linked a pull request Jun 19, 2024 that will close this issue

Make gRPC service report backups as FAILED if lose their callback future thelastpickle/cassandra-medusa#786

Open

adejanovski added review Issues in the state 'review' and removed in-progress Issues in the state 'in-progress' labels Jun 19, 2024
