Merge 2.9 to 3.1 with sidecar teardown #15674

Merged
merged 59 commits into from
Jun 2, 2023

Conversation

tlm and others added 30 commits March 31, 2023 10:20
With Juju moving to sidecar charms running under Pebble it is
necessary to make sure that when units for an application are being
added or removed that they have their teardown hooks run to completion.

Once all units needing to be removed have become dead, Juju will issue the scale down command to Kubernetes.

This commit does not deal with the units that can become dead through direct control of Kubernetes scale bypassing Juju.

It won't stop new units from coming up before the desired scale has been reached. This will be rectified in a separate commit.

Bug: https://bugs.launchpad.net/juju/+bug/1951415
run_refresh_channel_no_new_revision is intended to test charms are
correctly refreshed to a new channel even if the revision doesn't
change.

Use a test charm purpose built for this, as we cannot guarantee this
will be the case for any other charm
…tu_with_juju_qa_fixed_rev

juju#15636

run_refresh_channel_no_new_revision is intended to test charms are
correctly refreshed to a new channel even if the revision doesn't
change.

Use a test charm purpose built for this, as we cannot guarantee this
will be the case for any other charm

## Checklist

- [x] Code style: imports ordered, good names, simple structure, etc
- ~[ ] Comments saying why design decisions were made~
- ~[ ] Go unit tests, with comments saying what you're testing~
- [x] [Integration tests](https://github.com/juju/juju/tree/main/tests), with comments saying what you're testing
- ~[ ] [doc.go](https://discourse.charmhub.io/t/readme-in-packages/451) added or updated in changed packages~

## QA steps

```sh
./main.sh -v -c aws -p ec2 refresh test_basic
```
This is because the new psql does not have TLS built in. The fix is simple: integrate psql with a certificates charm.
However, MM should not let its container crash; it should go into the blocked state with a clear error message for the user until TLS is available.
Thanks to https://bugs.launchpad.net/charm-k8s-mattermost/+bug/1997540

Fixes static-analysis (formatting).
These charms are managed by the Juju team and better fit CI goals in
the long term.

Removes unused dummy-sink-k8s charm.
If we're using postgres charm it prevents us from tearing down because
of storage issues. This test doesn't need to use postgres, we should
just use a simpler charm for now.
…ogs-aws

juju#15650

If we're using postgres charm it prevents us from tearing down because of storage issues. This test doesn't need to use postgres, we should just use a simpler charm for now.


## Checklist

- [x] [Integration tests](https://github.com/juju/juju/tree/main/tests), with comments saying what you're testing

## QA steps

```sh
$ cd tests && ./main.sh machine
```
Should stop this warning from appearing in CI:
Flag --tojson has been deprecated, please use -o=json instead
juju#15652

Should stop this warning from appearing in CI:
Flag --tojson has been deprecated, please use -o=json instead

*`-j` is an alias for `--tojson`, and `-o=j` is an alias for `-o=json`.*
We are having intermittent teardown failures in many tests.
This should help us diagnose the issue(s).
hpidcock and others added 19 commits May 26, 2023 11:45
…ries

juju#15661

GetOSFromSeries was making reads (calling getOSFromSeries) outside the scope of a mutex lock.
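
A minimal sketch of the kind of fix described above, assuming a lookup table shared between goroutines. The `seriesOS` type, its fields, and the method names are hypothetical and are not Juju's actual code; the point is only that every read of the shared map happens while the mutex is held.

```go
package main

import (
	"fmt"
	"sync"
)

// seriesOS is a hypothetical series-to-OS table guarded by mu.
// It only illustrates the locking pattern, not Juju's real data.
type seriesOS struct {
	mu     sync.Mutex
	lookup map[string]string
}

// osFromSeries takes the lock before reading the map, so a concurrent
// update cannot race with the read.
func (s *seriesOS) osFromSeries(series string) (string, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	name, ok := s.lookup[series]
	if !ok {
		return "", fmt.Errorf("unknown series %q", series)
	}
	return name, nil
}

// update writes to the table under the same lock.
func (s *seriesOS) update(series, name string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.lookup[series] = name
}

func main() {
	s := &seriesOS{lookup: map[string]string{"jammy": "ubuntu"}}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			s.update("focal", "ubuntu")
			_, _ = s.osFromSeries("jammy")
		}()
	}
	wg.Wait()
	name, err := s.osFromSeries("focal")
	fmt.Println(name, err)
}
```

Running a program like this with `go run -race` is one way to check that no unguarded read remains.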

## QA steps

Run tests with -race

## Documentation changes

N/A

## Bug reference

https://jenkins.juju.canonical.com/job/unit-tests-race-amd64/1150/consoleText
juju#15662

Fixes 5s sleep being inadequate. Instead polls debug-log up to 10 times with 10s delay.

## QA steps

Apply this patch
```
diff --git a/testcharms/charms/departer/hooks/self-relation-departed b/testcharms/charms/departer/hooks/self-relation-departed
index 241c4f724d..b04dd07dc8 100644
--- a/testcharms/charms/departer/hooks/self-relation-departed
+++ b/testcharms/charms/departer/hooks/self-relation-departed
@@ -1,6 +1,8 @@
 #!/bin/sh
 echo $0

+sleep 30
+
 remote_unit=$(echo $JUJU_REMOTE_UNIT)
 departing_unit=$(echo $JUJU_DEPARTING_UNIT)
```

Run the test
```
./main.sh -v -s '"test_relation_data_exchange,test_relation_list_app"' relations test_relation_departing_unit
```

## Documentation changes

N/A

## Bug reference

https://jenkins.juju.canonical.com/job/test-relations-test-relation-departing-unit-lxd/591/consoleText
juju#15663

Switch test-branch test to use juju-qa-dummy-source as it is a simple charm that won't change and has a configuration option for testing branches with.

## QA steps

```
./main.sh -v -s '"test_active_branch_output"' branches test_branch
```

## Documentation changes

N/A

## Bug reference

https://jenkins.juju.canonical.com/job/test-branches-test-branch-lxd/1147/consoleText
PR feedback for the sidecar teardown work. Renamed the life flag
ErrNotFound to ErrEntityNotFound to better reflect the error and its
purpose.
CAASApplicationProvisioner now has a facade method for destroying units.
This commit adds missing unit tests to the client side of the Facade to
assert behaviour.
Update the error name used for the life flag worker and client to be more
specific as to its purpose.
juju#15313

With the introduction of sidecar charms and Pebble, Juju had no logic to tear down charms. Specifically, when a pod/unit inside a StatefulSet in Kubernetes was going away after receiving SIGTERM, it was given at most 5 seconds to run the charm's tear down hooks. This also applied when using `juju scale-application`, as that simply triggers a scale within Kubernetes.

In many situations this isn't enough time for a charm to clean up after itself and move the application into a state that can be safely shut down. This PR introduces the ability for `juju scale-application` to take individual units of an application into a "zombie" state, in which tear down hooks are run and given an unlimited amount of time to execute.

Once the desired number of units have entered a zombie/dead state then Juju will command the Kubernetes deployment resource to scale down.
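
As a rough illustration of choosing which units must die before the scale down is issued, the sketch below picks every unit whose number is at or above the desired scale, highest first. It assumes unit numbers line up with StatefulSet pod ordinals (Kubernetes removes the highest ordinals first when a StatefulSet scales down). The standalone `unitsToRemove` function here is hypothetical and is not the implementation behind the `UnitsToRemove` interface method in this PR.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// unitsToRemove returns the units whose number is >= desiredScale,
// ordered from highest number to lowest.
// Assumption: unit names look like "app/N" and N matches the pod ordinal.
func unitsToRemove(units []string, desiredScale int) ([]string, error) {
	type unit struct {
		name   string
		number int
	}
	var doomed []unit
	for _, name := range units {
		idx := strings.LastIndex(name, "/")
		if idx < 0 {
			return nil, fmt.Errorf("invalid unit name %q", name)
		}
		n, err := strconv.Atoi(name[idx+1:])
		if err != nil {
			return nil, fmt.Errorf("invalid unit name %q: %w", name, err)
		}
		if n >= desiredScale {
			doomed = append(doomed, unit{name: name, number: n})
		}
	}
	sort.Slice(doomed, func(i, j int) bool {
		return doomed[i].number > doomed[j].number
	})
	out := make([]string, 0, len(doomed))
	for _, u := range doomed {
		out = append(out, u.name)
	}
	return out, nil
}

func main() {
	units := []string{"mycharm/0", "mycharm/1", "mycharm/2"}
	toRemove, err := unitsToRemove(units, 1)
	fmt.Println(toRemove, err)
}
```

Only after each returned unit has finished its tear down hooks and become dead would the scale change be sent to Kubernetes.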

To make sure that the zombie pod stops being considered for traffic after tear down, the charm's readiness probe is set to failing.
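
The readiness behaviour can be sketched with a small HTTP handler. The `readiness` type and the `/readiness` path are hypothetical and do not reflect how Juju or Pebble actually serve the probe. The sketch relies only on the fact that Kubernetes treats an HTTP probe response outside the 200-399 range as a failure and stops routing Service traffic to that pod.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// readiness reports ready until tearingDown is set.
type readiness struct {
	tearingDown atomic.Bool
}

// ServeHTTP answers 200 while the unit is alive and 503 once it has
// entered tear down, which makes the readiness probe fail.
func (r *readiness) ServeHTTP(w http.ResponseWriter, _ *http.Request) {
	if r.tearingDown.Load() {
		w.WriteHeader(http.StatusServiceUnavailable)
		return
	}
	w.WriteHeader(http.StatusOK)
}

// probe issues one in-memory request and returns the status code.
func probe(h http.Handler) int {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/readiness", nil)
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	r := &readiness{}
	fmt.Println(probe(r)) // unit alive
	r.tearingDown.Store(true)
	fmt.Println(probe(r)) // unit in tear down
}
```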

This PR also fixes storage reattaching for sidecar applications.

Depends on juju/description#128

## Checklist

- [x] Code style: imports ordered, good names, simple structure, etc
- [x] Comments saying why design decisions were made
- [x] Go unit tests, with comments saying what you're testing
- [x] [Integration tests](https://github.com/juju/juju/tree/develop/tests), with comments saying what you're testing
- [x] [doc.go](https://discourse.charmhub.io/t/readme-in-packages/451) added or updated in changed packages

## QA steps

- create a sidecar charm that sleeps on remove hook
- bootstrap k8s
- `juju deploy mycharm`
- `juju scale-application mycharm 3`
- wait for units to settle
- `juju scale-application mycharm 0`
- wait for units to start running the remove hook
- `juju scale-application mycharm 3`
- observe that the units get removed/pods get removed, then the pods come back and new units come back.
- verify storage is reattached to the same unit
- verify all hooks have run before the unit and pod are removed

Things to also try:
- during scale down, delete a pod from the application, it should come back and continue the hooks.
- add a sleep in storage-attached or storage-detached hook and ensure the pod can be deleted and comes back during that hook and it reruns successfully.
- manually scale the stateful set in kubernetes, notice that the unwanted pods block in the init container.

## Documentation changes

N/A

## Bug reference

https://bugs.launchpad.net/juju/+bug/1951415
https://bugs.launchpad.net/juju/+bug/1995466
https://bugs.launchpad.net/juju/+bug/2018147
Comment on lines +46 to +50
if params.IsCodeAlreadyExists(err) {
	return nil, errors.AlreadyExists
} else if params.IsCodeNotAssigned(err) {
	return nil, errors.NotAssigned
}

We could have used apiservererrors.RestoreError() here

Will handle in 2.9

@@ -40,6 +42,9 @@ type Application interface {

// Units of the application fetched from kubernetes by matching pod labels.
Units() ([]Unit, error)

UnitsToRemove(context.Context, int) ([]string, error)

Should comment this

Will handle in 2.9

- Fix controller terminating due to caasapplicationprovisioner updating
  controller colocated sidecar charm.
- Fix startup shell scripts.
- Fix proxy error not surfacing.
@hpidcock hpidcock force-pushed the merge-2.9-3.1-sidecar-teardown branch from eae516e to 54b093c Compare June 2, 2023 03:11

hpidcock commented Jun 2, 2023

/merge

@jujubot jujubot merged commit 5869687 into juju:3.1 Jun 2, 2023
18 of 20 checks passed
@hpidcock hpidcock mentioned this pull request Jun 2, 2023
jujubot added a commit that referenced this pull request Jun 2, 2023
#15693

Forward ports:
- #15636
- #15650
- #15652
- #15653
- #15633
- #15657
- #15660
- #15661
- #15662
- #15663
- #15313
- #15542
- #15674

Conflicts:
- cmd/containeragent/unit/manifolds.go
- cmd/containeragent/unit/manifolds_test.go
- tests/suites/machine/machine.sh
- tests/suites/refresh/refresh.sh