Skip to content

[Bug]: wait_for_idle does not wait for podspec charms when deployed along sidecar charms #900

@DnPlas

Description

@DnPlas

Description

It seems like when deploying sidecar and podspec charms, wait_for_idle does not wait for podspec charms to be active and idle.

In this case for instance, the training-operator is a sidecar charm and the other two are podspec.

test_pod_spec_and_sidecar.py::test_deploy_pod_spec_chamrs PASSED
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Model                           Controller  Cloud/Region        Version  SLA          Timestamp
test-pod-spec-and-sidecar-61ew  uk8s        microk8s/localhost  2.9.42   unsupported  13:46:41Z

App                Version                Status   Scale  Charm              Channel         Rev  Address         Exposed  Message
argo-server        res:oci-image@576d038  waiting      1  argo-server        3.3/stable      185                  no       
minio              res:oci-image@1755999  waiting      1  minio              ckf-1.7/stable  186  10.152.183.89   no       
training-operator                         active       1  training-operator  1.6/stable      215  10.152.183.191  no       

Unit                  Workload  Agent  Address     Ports              Message
argo-server/0*        waiting   idle   10.1.15.22  2746/TCP           waiting for container
minio/0*              waiting   idle   10.1.15.23  9000/TCP,9001/TCP  waiting for container
training-operator/0*  active    idle   10.1.15.12      

This behaviour is not observed in a model with only podspec or only sidecars. Also, it appears to not happen when all units (podspec and sidecar) are active and idle before the timeout runs out.

An example in CI

Another much more complex example is:
I have this PR, where I'm interested in running integration tests. The first step is to build and deploy 10+ charms from this bundle definition. The second step is to wait for idle.
Eventually the execution continues past the wait_for_idle call, but if I check the juju status, some of the units are still in a waiting status.

Workaround

Use tenacity to iterate over all podspec Deployments and assert the replicas are ready and available. See here for more info.

Urgency

Annoying bug in our test suite

Python-libjuju version

juju==2.9.42.4

Juju version

2.9.42-ubuntu-amd64

Reproduce / Test

Use the files in: https://gist.github.com/DnPlas/7090b044f2d52d9666bdac284a73a776

1. tox -ve combined
2. juju status --watch 5s
3. Observe the training-operator being active and idle, observe other charms in waiting status
4. Observe once the training-operator is active and idle (and the timeout runs out) the test case passes

Metadata

Metadata

Assignees

Labels

hint/2.9going on 2.9 branchkind/bugindicates a bug in the project

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions