Description
When sidecar and podspec charms are deployed in the same model, wait_for_idle appears to return without waiting for the podspec charms to be active and idle.
In the example below, training-operator is a sidecar charm and the other two (argo-server and minio) are podspec charms.
test_pod_spec_and_sidecar.py::test_deploy_pod_spec_chamrs PASSED
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Model                           Controller  Cloud/Region        Version  SLA          Timestamp
test-pod-spec-and-sidecar-61ew  uk8s        microk8s/localhost  2.9.42   unsupported  13:46:41Z

App                Version                Status   Scale  Charm              Channel         Rev  Address         Exposed  Message
argo-server        res:oci-image@576d038  waiting      1  argo-server        3.3/stable      185                  no
minio              res:oci-image@1755999  waiting      1  minio              ckf-1.7/stable  186  10.152.183.89   no
training-operator                         active       1  training-operator  1.6/stable      215  10.152.183.191  no

Unit                  Workload  Agent  Address     Ports              Message
argo-server/0*        waiting   idle   10.1.15.22  2746/TCP           waiting for container
minio/0*              waiting   idle   10.1.15.23  9000/TCP,9001/TCP  waiting for container
training-operator/0*  active    idle   10.1.15.12
This behaviour is not observed in a model that contains only podspec charms or only sidecar charms. It also does not seem to happen when all units (podspec and sidecar) become active and idle before the timeout runs out.
An example in CI
Another much more complex example is:
I have this PR, where I'm interested in running integration tests. The first step is to build and deploy 10+ charms from this bundle definition. The second step is to wait for idle.
Eventually the execution continues past the wait_for_idle call, but if I check the juju status, some of the units are still in a waiting status.
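To make this visible in a test, a check along these lines could run right after the wait_for_idle call. It is only a sketch: unsettled_units is a hypothetical helper name, and it assumes the model object exposes applications, units, workload_status and agent_status the way python-libjuju's Model, Application and Unit objects do, which is worth verifying against the installed version.

```python
from typing import List


def unsettled_units(model) -> List[str]:
    """Return a description of every unit that is not active/idle.

    `model` is assumed to look like a python-libjuju Model:
    `model.applications` maps application names to objects with a
    `units` list, and each unit has `name`, `workload_status` and
    `agent_status` attributes.
    """
    problems = []
    for app in model.applications.values():
        for unit in app.units:
            if unit.workload_status != "active" or unit.agent_status != "idle":
                problems.append(
                    f"{unit.name}: {unit.workload_status}/{unit.agent_status}"
                )
    return problems


# Intended use inside an async test, after the wait returns:
#     await model.wait_for_idle(status="active", timeout=...)
#     leftover = unsettled_units(model)
#     assert not leftover, f"units still not active/idle: {leftover}"
```

With the status shown above, such an assertion would fail and list argo-server/0 and minio/0 instead of letting the test pass.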
Workaround
Use tenacity to iterate over all podspec Deployments and assert the replicas are ready and available. See here for more info.
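The idea behind the workaround can be sketched roughly as follows. This is not the code from the linked workaround: it uses a plain polling loop from the standard library instead of tenacity so that it runs without extra dependencies, and the names not_ready, wait_for_deployments and fetch are hypothetical. The dictionaries returned by fetch are an assumed shape; in a real test they would be filled from each Deployment's spec.replicas, status.readyReplicas and status.availableReplicas using whichever Kubernetes client the test suite already has.

```python
import time
from typing import Callable, Dict, Iterable, List


def not_ready(deployments: Iterable[Dict]) -> List[str]:
    """Return names of Deployments whose ready or available replicas
    are below the desired replica count."""
    lagging = []
    for dep in deployments:
        wanted = dep.get("replicas", 1)
        # Kubernetes omits these status fields while they are zero,
        # so treat a missing or None value as 0.
        ready = dep.get("ready_replicas") or 0
        available = dep.get("available_replicas") or 0
        if ready < wanted or available < wanted:
            lagging.append(dep["name"])
    return lagging


def wait_for_deployments(
    fetch: Callable[[], Iterable[Dict]],
    timeout: float = 600.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Poll `fetch` until every Deployment is ready and available.

    Raises TimeoutError naming the lagging Deployments if `timeout`
    seconds pass first.
    """
    deadline = clock() + timeout
    while True:
        lagging = not_ready(fetch())
        if not lagging:
            return
        if clock() >= deadline:
            raise TimeoutError(f"Deployments not ready: {', '.join(lagging)}")
        sleep(interval)
```

Calling wait_for_deployments after wait_for_idle makes the test block on the podspec workloads themselves, and a timeout names the Deployments that never became ready. The same loop can be expressed with tenacity's retry decorator if that library is already a test dependency.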
Urgency
Annoying bug in our test suite
Python-libjuju version
juju==2.9.42.4
Juju version
2.9.42-ubuntu-amd64
Reproduce / Test
Use the files in: https://gist.github.com/DnPlas/7090b044f2d52d9666bdac284a73a776
1. tox -ve combined
2. juju status --watch 5s
3. Observe that the training-operator becomes active and idle while the other charms remain in waiting status
4. Observe that once the training-operator is active and idle (and the timeout runs out), the test case passes