fix(k8s): delete orphaned StatefulSets before recreating to avoid PVC mismatch #21786

Merged
jujubot merged 3 commits into juju:3.6 from marceloneppel:3.6-fix-orphaned-statefulset-reuse
Feb 18, 2026

@marceloneppel marceloneppel commented Feb 13, 2026

When an application is force-removed with --force --no-wait, the StatefulSet may not be fully cleaned up. On redeployment, Juju reuses the orphaned StatefulSet via the update path, but Kubernetes does not allow modifying volumeClaimTemplates on an existing StatefulSet. Since PR #20795 changed the StorageUniqueID source of truth from the StatefulSet annotation to the application document, a new unique ID is generated on redeploy, causing a mismatch between the volume mount names and the existing PVC names. This prevents pods from starting.
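To make the mismatch concrete, here is a minimal sketch. Kubernetes names each PVC `<volumeClaimTemplate name>-<StatefulSet name>-<ordinal>`, and a volume mount only resolves if a claim template (or other pod volume) has the same name. The `templateName` scheme (storage name plus unique ID), the storage name `pgdata`, and the IDs are assumptions for illustration and may not match Juju's actual naming.

```go
package main

import "fmt"

// pvcName follows the Kubernetes StatefulSet convention:
// <volumeClaimTemplate name>-<StatefulSet name>-<pod ordinal>.
func pvcName(template, statefulSet string, ordinal int) string {
	return fmt.Sprintf("%s-%s-%d", template, statefulSet, ordinal)
}

// templateName is a hypothetical scheme that embeds the storage unique ID
// in the claim template name; Juju's real naming may differ.
func templateName(storage, uniqueID string) string {
	return storage + "-" + uniqueID
}

// unmatchedMounts returns the mount names that have no claim template
// with the same name.
func unmatchedMounts(templates, mounts []string) []string {
	known := make(map[string]bool, len(templates))
	for _, t := range templates {
		known[t] = true
	}
	var missing []string
	for _, m := range mounts {
		if !known[m] {
			missing = append(missing, m)
		}
	}
	return missing
}

func main() {
	// The orphaned StatefulSet still carries the template from the first deploy.
	oldTemplate := templateName("pgdata", "aaaa1111")
	// The redeploy generates a new unique ID, so the mounts use a new name.
	newMount := templateName("pgdata", "bbbb2222")

	fmt.Println("existing PVC:", pvcName(oldTemplate, "postgresql-k8s", 0))
	fmt.Println("unmatched mounts:", unmatchedMounts([]string{oldTemplate}, []string{newMount}))
}
```

Because the claim templates cannot be edited in place, the old template name stays on the StatefulSet and the new mount name never finds a match.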

The fix detects orphaned StatefulSets by comparing the app.juju.is/uuid annotation against the expected StorageUniqueID. When the two differ and the StatefulSet is confirmed to be owned by Juju (via the app.kubernetes.io/managed-by label and the model.juju.is/id annotation), the StatefulSet is deleted and recreated with the correct volumeClaimTemplates.
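The detection rule can be sketched roughly as follows. This is not the code from the patch: `objectMeta` is a stand-in for the StatefulSet metadata, `shouldDelete` is a hypothetical name and signature, and the `juju` label value is an assumption. Only the three label and annotation keys come from the description above.

```go
package main

import "fmt"

// objectMeta is a stand-in for the label and annotation maps on a
// Kubernetes StatefulSet; it is not the real client-go type.
type objectMeta struct {
	Labels      map[string]string
	Annotations map[string]string
}

const (
	managedByLabel    = "app.kubernetes.io/managed-by"
	modelIDAnnotation = "model.juju.is/id"
	appUUIDAnnotation = "app.juju.is/uuid"
	managedByJuju     = "juju" // assumed label value
)

// shouldDelete reports whether an existing StatefulSet looks like a
// Juju-owned orphan: it is managed by Juju, belongs to this model, and
// carries a storage unique ID different from the expected one.
// In this sketch a missing app UUID annotation counts as a mismatch.
func shouldDelete(meta objectMeta, modelID, expectedStorageID string) bool {
	if meta.Labels[managedByLabel] != managedByJuju {
		return false // not ours, never touch it
	}
	if meta.Annotations[modelIDAnnotation] != modelID {
		return false // owned by a different model
	}
	return meta.Annotations[appUUIDAnnotation] != expectedStorageID
}

func main() {
	orphan := objectMeta{
		Labels: map[string]string{managedByLabel: managedByJuju},
		Annotations: map[string]string{
			modelIDAnnotation: "model-1",
			appUUIDAnnotation: "aaaa1111",
		},
	}
	fmt.Println(shouldDelete(orphan, "model-1", "bbbb2222")) // IDs differ
	fmt.Println(shouldDelete(orphan, "model-1", "aaaa1111")) // IDs match
}
```

Checking ownership before comparing IDs keeps the delete path from ever touching a StatefulSet that Juju did not create.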

Checklist

  • Code style: imports ordered, good names, simple structure, etc
  • Comments saying why design decisions were made
  • Go unit tests, with comments saying what you're testing
  • Integration tests
  • doc.go added or updated in changed packages

QA steps

  1. Bootstrap a k8s controller: juju bootstrap microk8s micro
  2. Add a model: juju add-model test
  3. Deploy: juju deploy postgresql-k8s --trust --channel 16/edge --revision 726
  4. Wait for active, then force-remove: juju remove-application postgresql-k8s --destroy-storage --force --no-prompt --no-wait
  5. Verify orphaned StatefulSet exists: kubectl get statefulset -n test
  6. Redeploy: juju deploy postgresql-k8s --trust --channel 16/edge --revision 726
  7. Without the fix, the app is stuck at "agent initialising". With the fix, the controller logs "deleting orphaned statefulset" and the app reaches active.

Documentation changes

No user-facing workflow changes. The fix is transparent — orphaned StatefulSets are automatically cleaned up during redeployment.

Links

Issue: Fixes #21722.

Jira card: JUJU-9161

… mismatch

When a Juju-managed StatefulSet has a different storage unique ID than
expected (e.g. leftover from a force-removed deployment), delete and
recreate it instead of attempting an in-place update. Kubernetes does
not allow modifying volumeClaimTemplates on existing StatefulSets,
which causes PVC name mismatches.

Refs: juju#21722

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>

@wallyworld wallyworld left a comment

Thanks for submitting the patch, much appreciated.

I've suggested a couple of tweaks.

Comment thread internal/provider/kubernetes/application/application.go Outdated
Comment thread internal/provider/kubernetes/application/application_test.go Outdated
Comment thread internal/provider/kubernetes/application/application.go Outdated
Comment thread internal/provider/kubernetes/application/application.go Outdated
Replace the 3-second polling loop in waitForStatefulSetDeletion with an
event-driven Kubernetes informer watcher. Extract orphan detection logic
from Ensure into a new shouldDeleteExistingStatefulSet method, and add a
watchStatefulSet helper for creating the underlying notify watcher.
Update tests to use a mock watcher instead of clock advancement, and
standardize context usage across production and test code.

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
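As a rough illustration of the event-driven wait this commit describes, the sketch below re-checks for the StatefulSet only when a change notification arrives, not on a fixed timer. It uses a plain channel in place of the informer-backed notify watcher, and the `waitForDeletion` signature and `runDemo` helper are assumptions, not the code from the patch.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// waitForDeletion blocks until exists reports false. It checks once up
// front (so an already-deleted object returns immediately) and then
// re-checks each time the watcher signals a change.
func waitForDeletion(ctx context.Context, changes <-chan struct{}, exists func() (bool, error)) error {
	for {
		found, err := exists()
		if err != nil {
			return err
		}
		if !found {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case _, ok := <-changes:
			if !ok {
				return errors.New("watcher closed before deletion was observed")
			}
		}
	}
}

// runDemo simulates a single change event arriving after the delete and
// returns how many existence checks were needed.
func runDemo() (int, error) {
	checks := 0
	exists := func() (bool, error) {
		checks++
		return checks < 2, nil // present on the first check, gone on the second
	}
	changes := make(chan struct{}, 1)
	changes <- struct{}{} // one pending notification from the watcher

	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	err := waitForDeletion(ctx, changes, exists)
	return checks, err
}

func main() {
	checks, err := runDemo()
	fmt.Println(checks, err)
}
```

Taking the watcher as a channel parameter is also what lets a test feed events directly, in the spirit of the mock watcher mentioned above, without advancing a clock.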
@marceloneppel

Thanks for the review, @wallyworld! I've addressed all your comments in the latest push. Appreciate the suggestions!

@wallyworld wallyworld left a comment

Thank you for implementing this

Comment thread internal/provider/kubernetes/application/application.go Outdated
Comment thread internal/provider/kubernetes/application/application.go Outdated
Comment thread internal/provider/kubernetes/application/application.go Outdated
Skip the waitForStatefulSetDeletion call when the delete returned
NotFound, avoiding unnecessary polling. Also rename UID to UUID for
consistency and use the JujuFieldManager constant instead of a
hardcoded "juju" string.

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
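The NotFound short-circuit might look roughly like this. The sketch uses a local sentinel error `errNotFound` and a hypothetical `deleteOrphan` helper so that it runs with the standard library alone. The real code presumably checks the Kubernetes API error instead (for example with `IsNotFound` from `k8s.io/apimachinery/pkg/api/errors`), which is an assumption about the patch.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound stands in for a Kubernetes NotFound API error.
var errNotFound = errors.New("statefulset not found")

// deleteOrphan deletes the StatefulSet and waits for it to disappear,
// skipping the wait when the delete reports that it was already gone.
func deleteOrphan(del, wait func() error) error {
	err := del()
	if errors.Is(err, errNotFound) {
		return nil // already gone, nothing to wait for
	}
	if err != nil {
		return fmt.Errorf("deleting orphaned statefulset: %w", err)
	}
	return wait()
}

func main() {
	waited := false
	wait := func() error { waited = true; return nil }

	err := deleteOrphan(func() error { return errNotFound }, wait)
	fmt.Println(err, waited)
}
```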
@adisazhar123 adisazhar123 self-requested a review February 18, 2026 03:59

@adisazhar123 adisazhar123 left a comment

Thank you for the fix. I have QAed it too and it looks good.

@adisazhar123

/merge

@jujubot jujubot merged commit f1306c1 into juju:3.6 Feb 18, 2026
21 of 22 checks passed