[JUJU-2955] Sidecar teardown logic refinement to respect charm hooks. #15313

Merged
merged 32 commits into from May 29, 2023

@tlm tlm commented Mar 20, 2023

When sidecar charms and Pebble were introduced, Juju had no logic for tearing down charms. Specifically, when a pod/unit inside a Kubernetes StatefulSet was going away after receiving SIGTERM, it was given at most 5 seconds to run the charm's teardown hooks. The same applied to juju scale-application, as it simply triggered a scale within Kubernetes.

In many situations this isn't enough time for a charm to clean up after itself and move the application into a state that can be safely shut down. This PR changes juju scale-application so that it takes individual units of an application into a "zombie" state, in which teardown hooks are run with no time limit.

Once the desired number of units have entered a zombie/dead state, Juju commands the Kubernetes deployment resource to scale down.
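
For reviewers who want a mental model, the scale-down gate described above could be sketched roughly as below. This is not the code in this PR: the `Life` type, `unitsToRemove` and `safeToScale` are hypothetical names, and the sketch assumes units map one-to-one onto StatefulSet pod ordinals, with the highest ordinals removed first.

```go
package main

import (
	"fmt"
	"sort"
)

// Life is a hypothetical stand-in for a unit's lifecycle state.
type Life string

const (
	Alive Life = "alive"
	Dying Life = "dying"
	Dead  Life = "dead"
)

// unitsToRemove returns the pod ordinals that have to go away to reach the
// desired scale. With ordinals 0..n-1, every ordinal >= desired is surplus.
func unitsToRemove(ordinals []int, desired int) []int {
	var out []int
	for _, o := range ordinals {
		if o >= desired {
			out = append(out, o)
		}
	}
	sort.Ints(out)
	return out
}

// safeToScale reports whether every unit slated for removal is Dead, meaning
// its teardown hooks have finished and Kubernetes can be told to scale down.
func safeToScale(lives map[int]Life, remove []int) bool {
	for _, o := range remove {
		if lives[o] != Dead {
			return false
		}
	}
	return true
}

func main() {
	lives := map[int]Life{0: Alive, 1: Dying, 2: Dead}
	remove := unitsToRemove([]int{0, 1, 2}, 1)
	fmt.Println("to remove:", remove, "safe:", safeToScale(lives, remove))

	// Unit 1 finishes its hooks; only now may the scale command be issued.
	lives[1] = Dead
	fmt.Println("to remove:", remove, "safe:", safeToScale(lives, remove))
}
```

The point of the gate is ordering: the Kubernetes scale is changed only after the last surplus unit reports dead, so no pod is killed mid-hook.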

To make sure the zombie pod stops being considered for traffic after teardown, the charm's readiness probe is set to failing.
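
As an illustration of the readiness behaviour, here is a minimal sketch of a probe handler that starts failing once teardown begins. The `readiness` type, its `tearingDown` flag and the `/readiness` path are assumptions made for the example, not the names used by the container agent. It relies on standard Kubernetes behaviour: a pod whose readiness probe fails is taken out of Service endpoints.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// readiness is a hypothetical probe handler. It answers 200 while the unit
// is serving and 503 once teardown has started.
type readiness struct {
	tearingDown atomic.Bool
}

func (r *readiness) ServeHTTP(w http.ResponseWriter, _ *http.Request) {
	if r.tearingDown.Load() {
		w.WriteHeader(http.StatusServiceUnavailable)
		return
	}
	w.WriteHeader(http.StatusOK)
}

// probe issues one in-memory request against the handler and returns the
// HTTP status code, standing in for the kubelet's probe.
func probe(h http.Handler) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/readiness", nil))
	return rec.Code
}

func main() {
	r := &readiness{}
	fmt.Println("before teardown:", probe(r))

	// The unit enters the zombie state; the probe must now fail.
	r.tearingDown.Store(true)
	fmt.Println("during teardown:", probe(r))
}
```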

This PR also fixes storage reattaching for sidecar applications.

Depends on juju/description#128

Checklist

  • Code style: imports ordered, good names, simple structure, etc
  • Comments saying why design decisions were made
  • Go unit tests, with comments saying what you're testing
  • Integration tests, with comments saying what you're testing
  • doc.go added or updated in changed packages

QA steps

  • create a sidecar charm that sleeps on remove hook
  • bootstrap k8s
  • juju deploy mycharm
  • juju scale-application mycharm 3
  • wait for units to settle
  • juju scale-application mycharm 0
  • wait for units to start running the remove hook
  • juju scale-application mycharm 3
  • observe that the units get removed/pods get removed, then the pods come back and new units come back.
  • verify storage is reattached to the same unit
  • verify all hooks have run before the unit and pod are removed

Things to also try:

  • during scale down, delete a pod from the application, it should come back and continue the hooks.
  • add a sleep in storage-attached or storage-detached hook and ensure the pod can be deleted and comes back during that hook and it reruns successfully.
  • manually scale the stateful set in kubernetes, notice that the unwanted pods block in the init container.

Documentation changes

N/A

Bug reference

https://bugs.launchpad.net/juju/+bug/1951415
https://bugs.launchpad.net/juju/+bug/1995466
https://bugs.launchpad.net/juju/+bug/2018147

@wallyworld wallyworld left a comment

A few comments to get started....

api/agent/lifeflag/client.go Outdated
api/controller/caasapplicationprovisioner/client.go Outdated
apiserver/facades/agent/lifeflag/register.go Outdated
worker/caasapplicationprovisioner/application.go Outdated
worker/uniter/uniter.go Outdated
worker/uniter/uniter.go
worker/uniter/operation/errors.go
worker/simplesignalhandler/manifold.go
@tlm tlm changed the title from "Early draft for tear down work." to "[JUJU-2955] Early draft for tear down work." Mar 21, 2023
@hpidcock hpidcock added the 2.9 label Mar 23, 2023
@tlm tlm force-pushed the teardown-2.0 branch 2 times, most recently from fdfedf4 to 226d1ff Compare March 29, 2023 02:23
@tlm tlm marked this pull request as ready for review March 30, 2023 07:49
With Juju moving to sidecar charms running under Pebble, it is necessary to make sure that units being added to or removed from an application have their teardown hooks run to completion.

Once all units needing to be removed have become dead, Juju will issue the scale down command to Kubernetes.

This commit does not deal with units that become dead through direct control of the Kubernetes scale, bypassing Juju.

It also won't stop new units from coming up before the desired scale has been reached. This will be rectified in a separate commit.

Bug: https://bugs.launchpad.net/juju/+bug/1951415
api/common/lifeflag/facade.go Outdated
api/controller/lifeflag/client.go Outdated
cmd/containeragent/unit/manifolds.go Outdated
cmd/jujud/agent/checkconnection.go Outdated
worker/simplesignalhandler/manifold_test.go Outdated
worker/simplesignalhandler/package.go Outdated
worker/simplesignalhandler/signalwatcher.go
worker/stateconverter/converter.go Outdated
worker/uniter/operation/errors.go
@tlm tlm changed the title from "[JUJU-2955] Early draft for tear down work." to "[JUJU-2955] Start of teardown work." Apr 3, 2023
@hpidcock hpidcock changed the title from "[JUJU-2955] Start of teardown work." to "[JUJU-2955] Sidecar teardown logic refinement to respect charm hooks." Apr 13, 2023
@hpidcock
/build

@hpidcock hpidcock added the bug The PR addresses a bug label May 1, 2023
@wallyworld wallyworld left a comment

Some more comments so far....

api/common/lifeflag/facade_test.go Outdated
api/controller/caasapplicationprovisioner/client.go Outdated
api/agent/uniter/uniter.go
state/errors/application.go
state/storage.go
worker/caasapplicationprovisioner/ops.go Outdated
worker/caasapplicationprovisioner/ops.go
worker/caasapplicationprovisioner/ops.go Outdated
hpidcock and others added 9 commits May 29, 2023 09:55
PR feedback for sidecar tear down work. Have renamed the life flag
ErrNotFound to ErrEntityNotFound to better reflect the error and its
purpose.
CAASApplicationProvisioner now has a facade method for destroying units.
This commit adds missing unit tests to the client side of the Facade to
assert behaviour.
Update the error name used for life flag worker and client to be more
specific as to its purpose.
worker/uniter/runner/context/context.go Outdated
worker/uniter/storage/attachments.go
worker/uniter/runner/context/contextfactory.go Outdated
@hpidcock
/merge

@jujubot jujubot merged commit 243fd66 into juju:2.9 May 29, 2023
18 of 19 checks passed
@hpidcock hpidcock mentioned this pull request Jun 2, 2023
jujubot added a commit that referenced this pull request Jun 2, 2023
#15693

Forward ports:
- #15636
- #15650
- #15652
- #15653
- #15633
- #15657
- #15660
- #15661
- #15662
- #15663
- #15313
- #15542
- #15674

Conflicts:
- cmd/containeragent/unit/manifolds.go
- cmd/containeragent/unit/manifolds_test.go
- tests/suites/machine/machine.sh
- tests/suites/refresh/refresh.sh
@wallyworld wallyworld mentioned this pull request Jun 9, 2023
jujubot added a commit that referenced this pull request Jun 9, 2023
#15719

Merge 3.2

No conflicts. Upgrade steps were deleted.

#15636 [from jack-w-shaw/JUJU-3822_replace_ubuntu_w…](1a2759d)
#15650 [from SimonRichardson/fix-machine-test-logs-aws](c316413)
#15652 [from barrettj12/ojson](4436840)
#15653 [from barrettj12/teardown-logs](ee886dd)
#15633 [from anvial/JUJU-3821-fix-test-deploy-aks-c…](0bb6df2)
#15657 [from anvial/JUJU-3842-fix-deploy-aks-clean-…](f16dd80)
#15660 [from hpidcock/ignore-delete-network-interface](686b6f2)
#15661 [from hpidcock/fix-data-race-getosfromseries](f639f09)
#15662 [from hpidcock/fix-relation-departing-test](68ef945)
#15663 [from hpidcock/fix-test-branch](c14bcb3)
#15313 [from tlm/teardown-2.0](243fd66)
#15542 [from masnax/project-fix](8ba03ec)
#15676 [from wallyworld/newer-clients-migrate](7c1f884)
#15672 [from hpidcock/bump-juju-description-v3.0.15](acec126)
#15683 [from hpidcock/fix-persistent-storage-test](460dd21358378e7d1204348c37a9ba26e49b5871)
#15673 [from barrettj12/check-merge](5c253c3)
#15677 [from barrettj12/invalid-offer](3cb3f8b)
#15654 [from ycliuhw/fix/backendRefCount](4e5ae3c)
#15692 [from wallyworld/fix-secrets-cmr](a1fb0c4)
#15709 [from wallyworld/hook-secret-revison](840bc09)
#15701 [from hpidcock/fix-upgrade-podspec-sidecar](d465c93)
#15681 [from anvial/JUJU-3882-fix-test-deploy-test-…](f54c1ad)
#15573 [from jack-w-shaw/JUJU-3447_implement_ModelF…](f7082ca)
#15714 [from wallyworld/offer-consume=sameapp](6390036)

## QA steps

See PRs
@wallyworld wallyworld mentioned this pull request Jun 9, 2023
jujubot added a commit that referenced this pull request Jun 9, 2023
#15721

Merge 3.3


#15636 [from jack-w-shaw/JUJU-3822_replace_ubuntu_w…](1a2759d)
#15650 [from SimonRichardson/fix-machine-test-logs-aws](c316413)
#15652 [from barrettj12/ojson](4436840)
#15653 [from barrettj12/teardown-logs](ee886dd)
#15633 [from anvial/JUJU-3821-fix-test-deploy-aks-c…](0bb6df2)
#15657 [from anvial/JUJU-3842-fix-deploy-aks-clean-…](f16dd80)
#15660 [from hpidcock/ignore-delete-network-interface](686b6f2)
#15661 [from hpidcock/fix-data-race-getosfromseries](f639f09)
#15662 [from hpidcock/fix-relation-departing-test](68ef945)
#15663 [from hpidcock/fix-test-branch](c14bcb3)
#15313 [from tlm/teardown-2.0](243fd66)
#15542 [from masnax/project-fix](8ba03ec)
#15676 [from wallyworld/newer-clients-migrate](7c1f884)
#15672 [from hpidcock/bump-juju-description-v3.0.15](acec126)
#15683 [from hpidcock/fix-persistent-storage-test](460dd21358378e7d1204348c37a9ba26e49b5871)
#15673 [from barrettj12/check-merge](5c253c3)
#15677 [from barrettj12/invalid-offer](3cb3f8b)
#15654 [from ycliuhw/fix/backendRefCount](4e5ae3c)
#15692 [from wallyworld/fix-secrets-cmr](a1fb0c4)
#15709 [from wallyworld/hook-secret-revison](840bc09)
#15701 [from hpidcock/fix-upgrade-podspec-sidecar](d465c93)
#15681 [from anvial/JUJU-3882-fix-test-deploy-test-…](f54c1ad)
#15573 [from jack-w-shaw/JUJU-3447_implement_ModelF…](f7082ca)
#15714 [from wallyworld/offer-consume=sameapp](6390036)

Conflicts
```
# Conflicts:
# apiserver/facades/client/client/client_test.go
# featuretests/package_test.go
#
```

## QA steps

See PRs
jujubot added a commit that referenced this pull request Sep 13, 2023
#16231

Destroying a k8s controller or model would immediately move unit storage attachments from alive to dying to dead/removed, instead of just marking them as dying and letting the uniter remove the attachments gracefully.

This change makes model destruction more graceful for k8s units with storage attached, allowing them to correctly run their storage-detaching hooks.
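
The intended lifecycle can be pictured with the small sketch below. The `attachment` type and its methods are hypothetical and only illustrate the rule that destruction marks an attachment as dying, while the move to dead happens only after the storage-detaching hook has run.

```go
package main

import (
	"errors"
	"fmt"
)

// Life is a hypothetical lifecycle value for a storage attachment.
type Life int

const (
	Alive Life = iota
	Dying
	Dead
)

// attachment is an illustrative stand-in for a unit's storage attachment.
type attachment struct {
	life          Life
	detachHookRan bool
}

// destroy only marks the attachment as Dying; it never jumps to Dead.
func (a *attachment) destroy() {
	if a.life == Alive {
		a.life = Dying
	}
}

// remove advances a Dying attachment to Dead, but only once the
// storage-detaching hook has completed.
func (a *attachment) remove() error {
	if a.life != Dying {
		return errors.New("attachment is not dying")
	}
	if !a.detachHookRan {
		return errors.New("storage-detaching hook has not run")
	}
	a.life = Dead
	return nil
}

func main() {
	a := &attachment{}
	a.destroy()
	fmt.Println("remove before hook:", a.remove())

	a.detachHookRan = true
	fmt.Println("remove after hook:", a.remove(), "dead:", a.life == Dead)
}
```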

This is happening now because of [this](#15313) fix for [this storage bug](https://bugs.launchpad.net/juju/+bug/2018147).

## QA steps

This should not work before this change and should clean up correctly with this change.
```
$ juju bootstrap k8s
$ juju add-model a
$ juju deploy prometheus-k8s --trust
# wait for it to settle
$ juju destroy-controller k8s --destroy-all-models --destroy-storage # or release-storage
```

## Documentation changes

N/A

## Bug reference

https://bugs.launchpad.net/juju/+bug/2035058
Labels
2.9 bug The PR addresses a bug
Projects
None yet
6 participants