K8ssandra-operator memory leak after failed MedusaBackupJob #1312

Open
adziura-tcloud opened this issue May 13, 2024 · 0 comments · May be fixed by thelastpickle/cassandra-medusa#786
Labels
bug Something isn't working review Issues in the state 'review'

What happened?
After upgrading K8ssandra-operator from 1.7.0 to 1.16.0, I noticed periodic restarts of the operator caused by a memory leak.
It has happened twice, each time right after a failed MedusaBackupJob:

spec:
  backupType: full
  cassandraDatacenter: gke-dc1
status:
  finished:
  - example-test-1-gke-dc1-r1-sts-2
  - example-test-1-gke-dc1-r1-sts-3
  - example-test-1-gke-dc1-r1-sts-1
  inProgress:
  - example-test-1-gke-dc1-r1-sts-0
  startTime: "2024-05-12T00:30:15Z"

The MedusaBackupJob failed because a pod was restarted after running out of memory (OOM); we use fairly small instances in the test environment.
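A minimal sketch of how such a stalled job could be spotted from its status, using only the fields shown above (`inProgress`, `finished`, `startTime`). The function name `looks_stuck` and the 6-hour threshold are assumptions for illustration, not part of the operator:

```python
from datetime import datetime, timedelta, timezone


def looks_stuck(status, now, max_age=timedelta(hours=6)):
    """Return True if pods are still listed as in progress long after the start time.

    `max_age` is an arbitrary threshold chosen for this sketch.
    """
    in_progress = status.get("inProgress") or []
    start = status.get("startTime")
    if not in_progress or not start:
        return False
    started = datetime.strptime(start, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return now - started > max_age


# Status copied from the MedusaBackupJob above.
status = {
    "finished": [
        "example-test-1-gke-dc1-r1-sts-2",
        "example-test-1-gke-dc1-r1-sts-3",
        "example-test-1-gke-dc1-r1-sts-1",
    ],
    "inProgress": ["example-test-1-gke-dc1-r1-sts-0"],
    "startTime": "2024-05-12T00:30:15Z",
}

# Timestamp of the operator log line quoted in this report.
now = datetime(2024, 5, 13, 7, 7, 34, tzinfo=timezone.utc)
print(looks_stuck(status, now))
```

With the values from this report, the job started more than 30 hours before the log line was written and still lists one pod as in progress, so the check flags it.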

In addition, the MedusaBackupSchedule stops working after the failed backup: it no longer creates new MedusaBackupJobs.

Did you expect to see something different?
A failed backup should not affect the operator or subsequent backups.

How to reproduce it (as minimally and precisely as possible):
I think killing one pod while a backup is running should reproduce it.
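A possible way to script that step is sketched below. It only builds and prints a `kubectl delete pod` command rather than running it. The pod name is taken from the status above, and the namespace `cassandra` is an assumption based on the job's namespace in the operator logs. Deleting a pod is not the same as an OOM kill, so this may or may not trigger the same behaviour:

```python
import shlex


def build_delete_pod_cmd(pod, namespace):
    """Build the kubectl argv that deletes one pod (hypothetical helper)."""
    return ["kubectl", "delete", "pod", pod, "--namespace", namespace]


# Pod that was still in progress when the backup failed; namespace is assumed.
cmd = build_delete_pod_cmd("example-test-1-gke-dc1-r1-sts-0", "cassandra")

# Print the command so it can be reviewed and run by hand while a backup is in progress.
print(shlex.join(cmd))
```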

  • K8ssandra Operator version: 1.16.0

  • Kubernetes version information: 1.29

  • Kubernetes cluster kind: GKE

  • K8ssandra Operator Logs: I see a lot of "MedusaBackupJob is still being processed" INFO messages

2024-05-13T07:07:34.916Z	INFO	MedusaBackupJob is still being processed	{"controller": "medusabackupjob", "controllerGroup": "medusa.k8ssandra.io", "controllerKind": "MedusaBackupJob", "MedusaBackupJob": {"name":"daily-1715473800","namespace":"cassandra"}, "namespace": "cassandra", "name": "daily-1715473800", "reconcileID": "ace193e0-a3b6-4935-80f7-736b7fbae2a0", "medusabackupjob": "cassandra/daily-1715473800", "Backup": {"namespace": "cassandra", "name": "daily-1715473800"}}
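
To gauge how often the operator logs this message for a given job, the log could be tallied with something like the sketch below. It assumes the tab-separated layout of the line above (timestamp, level, message, JSON fields) and uses the `medusabackupjob` field as the key; the function name `count_still_processing` is made up for this example:

```python
import json
from collections import Counter

MESSAGE = "MedusaBackupJob is still being processed"


def count_still_processing(lines):
    """Count 'still being processed' log entries per MedusaBackupJob."""
    counts = Counter()
    for line in lines:
        # Expected layout: timestamp, level, message, JSON payload (tab-separated).
        parts = line.rstrip("\n").split("\t", 3)
        if len(parts) != 4 or parts[2] != MESSAGE:
            continue
        try:
            fields = json.loads(parts[3])
        except json.JSONDecodeError:
            continue  # skip lines whose payload is not valid JSON
        job = fields.get("medusabackupjob")
        if job:
            counts[job] += 1
    return counts


# Shortened version of the log line from this report, repeated twice, plus an unrelated line.
sample = (
    "2024-05-13T07:07:34.916Z\tINFO\tMedusaBackupJob is still being processed\t"
    '{"controller": "medusabackupjob", "medusabackupjob": "cassandra/daily-1715473800"}'
)
counts = count_still_processing([sample, sample, "unrelated line"])
print(dict(counts))
```

In practice the lines would come from the operator's log output, for example piped in from a file.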
@adziura-tcloud adziura-tcloud added the bug Something isn't working label May 13, 2024
@rzvoncek rzvoncek self-assigned this Jun 18, 2024
@adejanovski adejanovski added the in-progress Issues in the state 'in-progress' label Jun 18, 2024
@adejanovski adejanovski added review Issues in the state 'review' and removed in-progress Issues in the state 'in-progress' labels Jun 19, 2024