`unit.run` output doesn't contain "results" field in v3.0.0, which breaks `run_on_unit` function in zaza #707

agileshaw · 2022-08-11T00:09:47Z

With python-libjuju 3.0.0, unit.run returns less information than before, notably the absent of the results field. I think the commit causes this change is a1c4f20 (when self.connection.is_using_old_client is true).

With 3.0.0

>>> print(action.data)
{'model-uuid': '12f9d804-4369-4071-86ee-3dfaaae00ed4', 'id': '26', 'receiver': 'ubuntu/0', 'name': 'juju-run', 'status': 'completed', 'message': '', 'enqueued': '2022-08-10T23:10:02Z', 'started': '2022-08-10T23:10:06Z', 'completed': '2022-08-10T23:10:06Z'}

With 2.9.11

>>> action = await unit.run("df -h", timeout=None)
>>> print(action.data)
{'model-uuid': '12f9d804-4369-4071-86ee-3dfaaae00ed4', 'id': '28', 'receiver': 'ubuntu/0', 'name': 'juju-run', 'parameters': {'command': 'df -h', 'timeout': 0, 'workload-context': False}, 'status': 'completed', 'message': '', 'results': {'Code': '0', 'Stdout': 'Filesystem      Size  Used Avail Use% Mounted on\nudev            975M     0  975M   0% /dev\ntmpfs           199M  1.1M  198M   1% /run\n/dev/vda1        20G  2.0G   18G  11% /\ntmpfs           992M     0  992M   0% /dev/shm\ntmpfs           5.0M     0  5.0M   0% /run/lock\ntmpfs           992M     0  992M   0% /sys/fs/cgroup\n/dev/loop0       62M   62M     0 100% /snap/core20/1518\n/dev/loop1       68M   68M     0 100% /snap/lxd/22753\n/dev/loop2       47M   47M     0 100% /snap/snapd/16292\n/dev/vda15      105M  5.2M  100M   5% /boot/efi\n/dev/vdb         40G   24K   38G   1% /mnt\ntmpfs           1.0M     0  1.0M   0% /var/snap/lxd/common/ns\n'}, 'enqueued': '2022-08-10T23:46:05Z', 'started': '2022-08-10T23:46:06Z', 'completed': '2022-08-10T23:46:07Z'}

This causes the zaza functional tests failure when executing run_on_unit function.

Additional information

>>> print(unit.connection.info["server-version"])
2.9.29

The text was updated successfully, but these errors were encountered:

cderici · 2022-08-11T00:27:44Z

Hi @agileshaw, thanks for your detailed input! Yeah this is a result of an upstream change in the coming juju 3.0 api, things got confused a little bit because we wanted pylibjuju to support both 2.9 and 3.0 at the same time.

I believe #706 should fix this. We'll keep working on it if it doesn't 👍

agileshaw · 2022-08-11T13:51:26Z

Hello @cderici . Thank you for the fast response. I would like to let you know that, unfortunately, v3.0.1 didn't solve the above-mentioned issue. It seems that the problem sits in the unit.run function, which was not in the scope of the newly merged PR .

cderici · 2022-08-11T17:48:52Z

Ah yes I thought I saw run_action there, sorry about that.

So this is actually a change in the juju api before the juju 3.0. The tldr is, instead of getting the data like action.data, you'll need to wait on it first, e.g. action = await action.wait(), and then access the results of running the action via action.results. The long story is below.

So, looks like you're using slightly older Juju from latest/stable. The problem is, the Juju you're using, the 2.9.29, is using the Action Facade v6, whereas the Juju latest/stable (2.9.33) (and later versions like 3.0) is using the Action Facade v7 (that uses a new Run function that's not anymore a synchronous function -- it queues operations and instead of the results we get the enqueued actions to later "wait" on to get the results).

These sorts of breaking changes (that we had to do to support the Juju 3.0) are actually why we had a major version bump in python-libjuju. However, we're still trying to support 2.9 track as well, so I think we should've made these clearer in the changelogs.

What I'm doing right now is that I'm cleaning up the run function (looks like it needs a bit of love too) and in the PR I'll make sure to add a test that shows the use of this new run version. Hope this helps 👍

agileshaw · 2022-08-11T19:04:18Z

Thanks for the detailed explanation.

In that case, I think zaza test would need to be updated to facilitate this API change once they unpin libjuju<3.0.0. I'm looking forward to the new PR and seeing the usage of the run function.

agileshaw · 2022-08-11T19:20:07Z

I was also wondering if it is possible to have the action = await action.wait() build into the run function when self.connection.is_using_old_client is evaluated to be true. This way the return object from this function would be consistent for both types of connections.

addyess · 2022-08-11T20:34:47Z

👀 i'm SUPER interested in this answer

addyess · 2022-08-11T21:04:09Z

OK, so more info

i'm getting different results dictionaries on a few different runs:

here's the setup

>>> import asyncio
>>> def do(c):
...     return asyncio.get_event_loop().run_until_complete(c)
...
>>> from juju.model import Model
>>> model = Model()

On Vsphere (addyess-vsphere k8s-vsphere/Boston 2.9.31)

>>> do(model.connect_model("addyess-vsphere:admin/default"))
>>> unit = model.units['etcd/0']
>>> action = do(unit.run("whoami"))
>>> result = do(action.wait())
>>> result.results
{'Code': '0', 'Stdout': 'root\n'}

on Azure (addyess-azure azure/centralus 2.9.33)

>>> do(model.connect_model("addyess-azure:admin/default"))
>>> unit = model.units['etcd/0']
>>> action = do(unit.run("whoami"))
>>> result = do(action.wait())
>>> result.results
{'return-code': 0, 'stdout': 'root\n'}

addyess · 2022-08-11T21:30:45Z

here's some fun antics i'm taking to "work-around"

async def juju_run(unit: Unit, cmd):
    action = await unit.run(cmd)
    result = await action.wait()
    code = str(result.results.get("Code") or result.results.get("return-code"))
    stdout = result.results.get("Stdout") or result.results.get("stdout")
    stderr = result.results.get("Stderr") or result.results.get("stderr")
    assert code == "0", f"{cmd} failed ({code}): {stderr or stdout}"
    return stdout

... regardless of the version of the client and/or facade being used Fixes juju#707

cderici · 2022-08-11T23:51:35Z

i'm getting different results dictionaries on a few different runs:

To see why the internals of the received objects are different, I'd need to look at Juju's side of things @addyess. It is possible that 2.9.33 to use ActionFacade v7, while the 2.9.31 is using v6 and whoever made that change in the Juju side (added v7) wasn't being consistent there. Regardless, this looks like an issue on the Juju side, rather than python-libjuju 👍

cderici · 2022-08-11T23:57:11Z

I was also wondering if it is possible to have the action = await action.wait() build into the run function when self.connection.is_using_old_client is evaluated to be true. This way the return object from this function would be consistent for both types of connections.

I talked about your request in the PR a little bit, @agileshaw . I think it's valid (and also a very easy/cheap way for me to keep the backwards compatibility), however, I think it would only make sense if we were releasing for the 2.9 track. I don't think it's a good idea to do that in the 3.0. You can see my explanation in #710. Hope this helps, and hope #710 solves your issue 👍

juju#707 I agree there's inconsistency from the juju api response there. Since `action.wait()` ultimately just fills out the `results` dict more, what we we add some decode logic to the `Action` object to handle the various facades? Action could have an attribute that returned an object like `subprocess.CompletedProcess`[https://docs.python.org/3/library/subprocess.html#subprocess.CompletedProcess] users of the unit.run class could do this: ```python response = await unit.run(cmd) response.completed.check_returncode() # raises a known exception for non-zero return codes stdout = response.completed.stdout ```

jameinel · 2022-08-12T15:48:32Z

I think it makes sense to do this, but from the other way around. If we see that we are connecting to the 2.9 facade, massage the fields to be consistent with the return values from the 3.0 request. The end user still needs to update their code to be compatible with the 3.0 API, but they don't have to carry the burden of 2.9 compatibility in their code tree as well.

…

On Thu, Aug 11, 2022 at 7:57 PM Caner Derici ***@***.***> wrote: I was also wondering if it is possible to have the action = await action.wait() build into the run function when self.connection.is_using_old_client is evaluated to be true. This way the return object from this function would be consistent for both types of connections. I talked about your request in the PR a little bit, @agileshaw <https://github.com/agileshaw> . I think it's valid (and also a very easy/cheap way for me to keep the backwards compatibility), however, I think it would only make sense if we were releasing for the 2.9 track. I don't think it's a good idea to do that in the 3.0. You can see my explanation in #710 <#710>. Hope this helps, and hope #710 <#710> solves your issue 👍 — Reply to this email directly, view it on GitHub <#707 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AABRQ7JQMSMOEAS7YPG3N6TVYWHOHANCNFSM56GLLYBA> . You are receiving this because you are subscribed to this thread.Message ID: ***@***.***>

agileshaw · 2022-08-12T21:46:23Z

@cderici Thanks for the explanation and PR #710. I tend to agree with the approach suggested by @jameinel . Since the unit.run function is already checking the connection type (Action Facade v6 vs v7), it seems redundant to let the user check it again on their side to determine what kind of results they are expecting. It would make much more sense if the returned result is consistent.

cderici · 2022-08-16T22:04:38Z

@agileshaw Now with the #710 landing the results should be consistent across the board. Also, see dcadb7a adds a little convenience flag to be used with the older versions, or for whoever wants to block until a result is there, hope it makes it easier for you to use 👍

#710 #### Description This is PR complements #706, and tries to address the issues being discussed in #707, by updating the `old_client`s with `2.9.34` schema and fixing the `unit.run()` function to be asynchronous, as it's (breakingly) changed to be in the `Juju 3.0`. This also brings `unit.run` and `unit.run_action` functions very close, the only difference is that the former is using `Action.Run` while the latter is using `Action.Enqueue` functions. It's plausible that one of these ways of doing the similar thing will disappear in the future since the newer versions of `Action.Run` is asynchronous anyway. Should fix the #707. #### QA Steps I updated the integration tests for `run()`, so basically all the tests in the following suite (that includes `test_run` and `test_run_action`) should pass: ```python tox -e integration -- tests/integration/test_unit.py ``` Additionally I added the example that I have been using myself to test this into the examples folder, so it should be running fine: ``` $ python examples/run_action.py ``` Both the integration tests (`test_run` & `test_run_action`) and the example module are tested in the following controllers: ``` Controller Model User Access Cloud/Region Models Nodes HA Version caner-k8s test admin superuser microk8s/localhost 2 - - 2.9.33 caner-k8s-2929 testhede admin superuser microk8s/localhost 2 - - 2.9.29 caner-k8s-2931 testhaylaz admin superuser microk8s/localhost 2 - - 2.9.31 caner-k8s-2934* haylazlar admin superuser microk8s/localhost 2 - - 2.9.34 caner-lxd-2.9 default admin superuser localhost/localhost 3 2 none 2.9.34.1 caner-lxd-2929 default admin superuser localhost/localhost 2 1 none 2.9.29 caner-lxd-2931 default admin superuser localhost/localhost 2 1 none 2.9.31 caner-lxd-2932 default admin superuser localhost/localhost 2 1 none 2.9.32 caner-lxd-3.0 canhiras admin superuser localhost/localhost 1 1 none 3.0-rc1.1 k8s-30 test4 admin superuser microk8s/localhost 2 - - 3.0-rc1 ``` Note that, all the clients that I used `<2.9.34` are using the `ActionFacade v6`, and the ones `>=2.9.34` are using the `ActionFacade v7` (regardless of `2.9` vs` 3.0`). #### Notes & Discussion One of the requests was to keep the `unit.run()` function synchronous for any client using `<ActionFacade v6`. While this is a valid request from the user perspective, it also means that we would have the same function returning different types (either a dictionary or an action object) depending on the facade being used, which would make it way harder to maintain. That's why I kept the change for all the versions.

nobuto-m mentioned this issue Aug 11, 2022

Possible regression - there is no tailed log from a failed unit openstack-charmers/zaza#531

Open

addyess mentioned this issue Aug 11, 2022

Update allow-privileged description to better describe situations it's necessary charmed-kubernetes/charm-kubernetes-control-plane#236

Merged

cderici added a commit to cderici/python-libjuju that referenced this issue Aug 11, 2022

Fix the unit.run function to conform to the juju 3.0 way of interaction

0d04d6e

... regardless of the version of the client and/or facade being used Fixes juju#707

cderici mentioned this issue Aug 11, 2022

[JUJU-1593] Fix unit.run() and update the old client codes #710

Merged

arturo-seijas mentioned this issue Aug 12, 2022

Fix failing integration test canonical/indico-operator#61

Merged

addyess mentioned this issue Aug 12, 2022

Possible solution for unit run inconsistencies #711

Closed

addyess mentioned this issue Aug 12, 2022

Bump juju from 3.0.0 to 3.0.1 charmed-kubernetes/jenkins#978

Merged

jujubot closed this as completed in c1b997f Aug 16, 2022

amandahla mentioned this issue Jun 1, 2023

fix(ISD-413): Upgrade celery-prometheus-exporter-image canonical/indico-operator#271

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`unit.run` output doesn't contain "results" field in v3.0.0, which breaks `run_on_unit` function in zaza #707

`unit.run` output doesn't contain "results" field in v3.0.0, which breaks `run_on_unit` function in zaza #707

agileshaw commented Aug 11, 2022

cderici commented Aug 11, 2022

agileshaw commented Aug 11, 2022

cderici commented Aug 11, 2022

agileshaw commented Aug 11, 2022

agileshaw commented Aug 11, 2022

addyess commented Aug 11, 2022

addyess commented Aug 11, 2022 •

edited

Loading

addyess commented Aug 11, 2022 •

edited

Loading

cderici commented Aug 11, 2022

cderici commented Aug 11, 2022

jameinel commented Aug 12, 2022 via email

agileshaw commented Aug 12, 2022

cderici commented Aug 16, 2022

unit.run output doesn't contain "results" field in v3.0.0, which breaks run_on_unit function in zaza #707

unit.run output doesn't contain "results" field in v3.0.0, which breaks run_on_unit function in zaza #707

Comments

agileshaw commented Aug 11, 2022

cderici commented Aug 11, 2022

agileshaw commented Aug 11, 2022

cderici commented Aug 11, 2022

agileshaw commented Aug 11, 2022

agileshaw commented Aug 11, 2022

addyess commented Aug 11, 2022

addyess commented Aug 11, 2022 • edited Loading

addyess commented Aug 11, 2022 • edited Loading

cderici commented Aug 11, 2022

cderici commented Aug 11, 2022

jameinel commented Aug 12, 2022 via email

agileshaw commented Aug 12, 2022

cderici commented Aug 16, 2022

`unit.run` output doesn't contain "results" field in v3.0.0, which breaks `run_on_unit` function in zaza #707

`unit.run` output doesn't contain "results" field in v3.0.0, which breaks `run_on_unit` function in zaza #707

addyess commented Aug 11, 2022 •

edited

Loading

addyess commented Aug 11, 2022 •

edited

Loading