Merge 2.9 to 3.1 with sidecar teardown #15674
With Juju moving to sidecar charms running under Pebble, it is necessary to make sure that when units of an application are being added or removed, their teardown hooks run to completion. Once all units that need to be removed have become dead, Juju issues the scale-down command to Kubernetes. This commit does not deal with units that become dead through direct control of the Kubernetes scale, bypassing Juju. It also won't stop new units from coming up before the desired scale has been reached. This will be rectified in a separate commit. Bug: https://bugs.launchpad.net/juju/+bug/1951415
run_refresh_channel_no_new_revision is intended to test that charms are correctly refreshed to a new channel even if the revision doesn't change. Use a test charm purpose-built for this, as we cannot guarantee this will be the case for any other charm.
…tu_with_juju_qa_fixed_rev juju#15636

run_refresh_channel_no_new_revision is intended to test charms are correctly refreshed to a new channel even if the revision doesn't change. Use a test charm purpose built for this, as we cannot guarantee this will be the case for any other charm

## Checklist

- [x] Code style: imports ordered, good names, simple structure, etc
- ~[ ] Comments saying why design decisions were made~
- ~[ ] Go unit tests, with comments saying what you're testing~
- [x] [Integration tests](https://github.com/juju/juju/tree/main/tests), with comments saying what you're testing
- ~[ ] [doc.go](https://discourse.charmhub.io/t/readme-in-packages/451) added or updated in changed packages~

## QA steps

```sh
./main.sh -v -c aws -p ec2 refresh test_basic
```
That is because the new psql does not have TLS built in. The fix is simple: integrate psql with a certificates charm. However, MM should not let the container crash; it should go to the blocked state with a clear error message to the user until TLS is available. Thanks to https://bugs.launchpad.net/charm-k8s-mattermost/+bug/1997540 Fixes static analysis (formatting).
These charms are managed by the Juju team and better fit CI goals in the long run. Removes the unused dummy-sink-k8s charm.
If we're using the postgres charm, it prevents us from tearing down because of storage issues. This test doesn't need to use postgres; we should just use a simpler charm for now.
…ogs-aws juju#15650

If we're using postgres charm it prevents us from tearing down because of storage issues. This test doesn't need to use postgres, we should just use a simpler charm for now.

## Checklist

- [x] [Integration tests](https://github.com/juju/juju/tree/main/tests), with comments saying what you're testing

## QA steps

```sh
$ cd tests && ./main.sh machine
```
Should stop this warning from appearing in CI: Flag --tojson has been deprecated, please use -o=json instead
juju#15652 Should stop this warning from appearing in CI: Flag --tojson has been deprecated, please use -o=json instead *`-j` is an alias for `--tojson`, and `-o=j` is an alias for `-o=json`.*
We are having intermittent teardown failures in many tests. This should help us diagnose the issue(s).
…ries juju#15661

GetOSFromSeries was making reads (calling getOSFromSeries) outside the scope of a mutex lock.

## QA steps

Run tests with -race

## Documentation changes

N/A

## Bug reference

https://jenkins.juju.canonical.com/job/unit-tests-race-amd64/1150/consoleText
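As an illustration of the class of fix described above, here is a minimal sketch of keeping every read inside the lock. The `seriesCache` type, its fields, and the `Update` method are hypothetical stand-ins for illustration only; they are not the actual Juju code.

```go
package main

import (
	"fmt"
	"sync"
)

// seriesCache is a hypothetical stand-in for a shared series-to-OS table.
type seriesCache struct {
	mu         sync.Mutex
	osBySeries map[string]string
}

// lookup reads the map without locking. Callers must already hold mu.
func (c *seriesCache) lookup(series string) (string, bool) {
	name, ok := c.osBySeries[series]
	return name, ok
}

// OSFromSeries takes the lock before any read, so a concurrent Update
// cannot race with the lookup.
func (c *seriesCache) OSFromSeries(series string) (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	name, ok := c.lookup(series)
	if !ok {
		return "", fmt.Errorf("unknown series %q", series)
	}
	return name, nil
}

// Update writes an entry under the same lock.
func (c *seriesCache) Update(series, osName string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.osBySeries[series] = osName
}

func main() {
	c := &seriesCache{osBySeries: map[string]string{"jammy": "ubuntu"}}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.Update("focal", "ubuntu")
			_, _ = c.OSFromSeries("jammy")
		}()
	}
	wg.Wait()
	fmt.Println(c.OSFromSeries("focal"))
}
```

Running a program like this with `go run -race` is one way to confirm that the reads and writes no longer overlap.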
juju#15662

Fixes 5s sleep being inadequate. Instead polls debug-log up to 10 times with 10s delay.

## QA steps

Apply this patch

```
diff --git a/testcharms/charms/departer/hooks/self-relation-departed b/testcharms/charms/departer/hooks/self-relation-departed
index 241c4f724d..b04dd07dc8 100644
--- a/testcharms/charms/departer/hooks/self-relation-departed
+++ b/testcharms/charms/departer/hooks/self-relation-departed
@@ -1,6 +1,8 @@
 #!/bin/sh
 echo $0
+sleep 30
+
 remote_unit=$(echo $JUJU_REMOTE_UNIT)
 departing_unit=$(echo $JUJU_DEPARTING_UNIT)
```

Run the test

```
./main.sh -v -s '"test_relation_data_exchange,test_relation_list_app"' relations test_relation_departing_unit
```

## Documentation changes

N/A

## Bug reference

https://jenkins.juju.canonical.com/job/test-relations-test-relation-departing-unit-lxd/591/consoleText
juju#15663

Switch test-branch test to use juju-qa-dummy-source as it is a simple charm that won't change and has a configuration option for testing branches with.

## QA steps

```
./main.sh -v -s '"test_active_branch_output"' branches test_branch
```

## Documentation changes

N/A

## Bug reference

https://jenkins.juju.canonical.com/job/test-branches-test-branch-lxd/1147/consoleText
PR feedback for the sidecar teardown work. Renamed the life flag error ErrNotFound to ErrEntityNotFound to better reflect the error and its purpose.
CAASApplicationProvisioner now has a facade method for destroying units. This commit adds missing unit tests to the client side of the Facade to assert behaviour.
Update the error name used for the life flag worker and client to be more specific as to its purpose.
juju#15313

With the introduction of sidecar charms and Pebble, Juju had no logic to tear down charms. Specifically, when a pod/unit inside a StatefulSet in Kubernetes was going away after receiving SIGTERM, it was given at most 5 seconds to run the charm's teardown hooks. This also applied when using `juju scale-application`, as this subsequently just triggers a scale within Kubernetes. In many situations this isn't enough time for a charm to clean up after itself and bring the application into a state that can be safely shut down.

This PR introduces the ability for `juju scale-application` to take individual units of an application into a "zombie" state, in which teardown hooks are run and given an unlimited amount of time to execute. Once the desired number of units have entered a zombie/dead state, Juju commands the Kubernetes deployment resource to scale down. To make sure that the zombie pod is no longer considered for traffic after teardown, the charm's readiness probe is set to failing.

This PR also fixes storage reattaching for sidecar applications.

Depends on juju/description#128

## Checklist

- [x] Code style: imports ordered, good names, simple structure, etc
- [x] Comments saying why design decisions were made
- [x] Go unit tests, with comments saying what you're testing
- [x] [Integration tests](https://github.com/juju/juju/tree/develop/tests), with comments saying what you're testing
- [x] [doc.go](https://discourse.charmhub.io/t/readme-in-packages/451) added or updated in changed packages

## QA steps

- create a sidecar charm that sleeps on remove hook
- bootstrap k8s
- `juju deploy mycharm`
- `juju scale-application 3`
- wait for units to settle
- `juju scale-application 0`
- wait for units to start running the remove hook
- `juju scale-application 3`
- observe that the units get removed/pods get removed, then the pods come back and new units come back.
- verify storage is reattached to the same unit
- verify all hooks have run before the unit and pod are removed

Things to also try:

- during scale down, delete a pod from the application; it should come back and continue the hooks.
- add a sleep in the storage-attached or storage-detached hook and ensure the pod can be deleted, comes back during that hook, and reruns it successfully.
- manually scale the stateful set in Kubernetes; notice that the unwanted pods block in the init container.

## Documentation changes

N/A

## Bug reference

https://bugs.launchpad.net/juju/+bug/1951415
https://bugs.launchpad.net/juju/+bug/1995466
https://bugs.launchpad.net/juju/+bug/2018147
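The gating described in this PR (let the selected units finish teardown, and only then ask Kubernetes to scale) can be sketched roughly as below. This is a simplified model under stated assumptions, not the provisioner worker's actual code: `Life`, `reconcile`, and the `scale` callback are hypothetical names.

```go
package main

import "fmt"

// Life is a simplified unit lifecycle value used only in this sketch.
type Life string

const (
	Alive Life = "alive"
	Dying Life = "dying"
	Dead  Life = "dead"
)

// reconcile performs one pass over the units selected for removal.
// Alive units are marked dying so their teardown hooks can run with no
// deadline. scale is only called once every selected unit is dead.
// It reports whether the scale-down was issued.
func reconcile(life map[string]Life, toRemove []string, desired int, scale func(int)) bool {
	allDead := true
	for _, name := range toRemove {
		if life[name] == Alive {
			life[name] = Dying
		}
		if life[name] != Dead {
			allDead = false
		}
	}
	if !allDead {
		return false
	}
	scale(desired)
	return true
}

func main() {
	life := map[string]Life{"mycharm/0": Alive, "mycharm/1": Alive, "mycharm/2": Alive}
	toRemove := []string{"mycharm/1", "mycharm/2"}
	scale := func(n int) { fmt.Println("scaling to", n) }

	// First pass: units move to dying, nothing is scaled yet.
	fmt.Println(reconcile(life, toRemove, 1, scale))

	// Teardown hooks have finished, so the units are now dead.
	life["mycharm/1"], life["mycharm/2"] = Dead, Dead
	fmt.Println(reconcile(life, toRemove, 1, scale))
}
```

The point of the gate is that the orchestrator's short termination grace period never starts until the charm has reported that it is done.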
(cherry picked from commit 7e1b1dc)
    if params.IsCodeAlreadyExists(err) {
        return nil, errors.AlreadyExists
    } else if params.IsCodeNotAssigned(err) {
        return nil, errors.NotAssigned
    }
We could have used apiservererrors.RestoreError() here
Will handle in 2.9
    @@ -40,6 +42,9 @@ type Application interface {

        // Units of the application fetched from kubernetes by matching pod labels.
        Units() ([]Unit, error)

        UnitsToRemove(context.Context, int) ([]string, error)
Should comment this
Will handle in 2.9
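Since `UnitsToRemove` is undocumented in the diff, here is one plausible reading of it, offered purely as an assumption: given a desired scale, return the names of the units that a scale-down would drop. Kubernetes StatefulSets remove the highest ordinals first, so this sketch selects units whose number is at or above the desired scale, assuming unit numbers map to pod ordinals. The `unitsToRemove` helper below is hypothetical and takes the unit list directly instead of a `context.Context`.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// unitsToRemove returns the unit names whose number is >= desiredScale.
// Unit names are assumed to have the form "application/number".
func unitsToRemove(units []string, desiredScale int) ([]string, error) {
	var remove []string
	for _, name := range units {
		i := strings.LastIndex(name, "/")
		if i < 0 {
			return nil, fmt.Errorf("invalid unit name %q", name)
		}
		ordinal, err := strconv.Atoi(name[i+1:])
		if err != nil {
			return nil, fmt.Errorf("invalid unit name %q: %w", name, err)
		}
		if ordinal >= desiredScale {
			remove = append(remove, name)
		}
	}
	return remove, nil
}

func main() {
	units := []string{"mycharm/0", "mycharm/1", "mycharm/2"}
	remove, err := unitsToRemove(units, 1)
	fmt.Println(remove, err)
}
```

A doc comment on the interface method stating which units are chosen, and in what order, would settle this without a reader having to guess.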
- Fix controller terminating due to caasapplicationprovisioner updating controller colocated sidecar charm.
- Fix startup shell scripts.
- Fix proxy error not surfacing.
eae516e to 54b093c
/merge
#15693

Forward ports:

- #15636
- #15650
- #15652
- #15653
- #15633
- #15657
- #15660
- #15661
- #15662
- #15663
- #15313
- #15542
- #15674

Conflicts:

- cmd/containeragent/unit/manifolds.go
- cmd/containeragent/unit/manifolds_test.go
- tests/suites/machine/machine.sh
- tests/suites/refresh/refresh.sh