Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unit.run output doesn't contain "results" field in v3.0.0, which breaks run_on_unit function in zaza #707

Closed
agileshaw opened this issue Aug 11, 2022 · 13 comments

Comments

@agileshaw
Copy link

With python-libjuju 3.0.0, unit.run returns less information than before, notably the absent of the results field. I think the commit causes this change is a1c4f20 (when self.connection.is_using_old_client is true).

With 3.0.0

>>> print(action.data)
{'model-uuid': '12f9d804-4369-4071-86ee-3dfaaae00ed4', 'id': '26', 'receiver': 'ubuntu/0', 'name': 'juju-run', 'status': 'completed', 'message': '', 'enqueued': '2022-08-10T23:10:02Z', 'started': '2022-08-10T23:10:06Z', 'completed': '2022-08-10T23:10:06Z'}

With 2.9.11

>>> action = await unit.run("df -h", timeout=None)
>>> print(action.data)
{'model-uuid': '12f9d804-4369-4071-86ee-3dfaaae00ed4', 'id': '28', 'receiver': 'ubuntu/0', 'name': 'juju-run', 'parameters': {'command': 'df -h', 'timeout': 0, 'workload-context': False}, 'status': 'completed', 'message': '', 'results': {'Code': '0', 'Stdout': 'Filesystem      Size  Used Avail Use% Mounted on\nudev            975M     0  975M   0% /dev\ntmpfs           199M  1.1M  198M   1% /run\n/dev/vda1        20G  2.0G   18G  11% /\ntmpfs           992M     0  992M   0% /dev/shm\ntmpfs           5.0M     0  5.0M   0% /run/lock\ntmpfs           992M     0  992M   0% /sys/fs/cgroup\n/dev/loop0       62M   62M     0 100% /snap/core20/1518\n/dev/loop1       68M   68M     0 100% /snap/lxd/22753\n/dev/loop2       47M   47M     0 100% /snap/snapd/16292\n/dev/vda15      105M  5.2M  100M   5% /boot/efi\n/dev/vdb         40G   24K   38G   1% /mnt\ntmpfs           1.0M     0  1.0M   0% /var/snap/lxd/common/ns\n'}, 'enqueued': '2022-08-10T23:46:05Z', 'started': '2022-08-10T23:46:06Z', 'completed': '2022-08-10T23:46:07Z'}

This causes the zaza functional tests failure when executing run_on_unit function.


Additional information

>>> print(unit.connection.info["server-version"])
2.9.29
@cderici
Copy link
Member

cderici commented Aug 11, 2022

Hi @agileshaw, thanks for your detailed input! Yeah this is a result of an upstream change in the coming juju 3.0 api, things got confused a little bit because we wanted pylibjuju to support both 2.9 and 3.0 at the same time.

I believe #706 should fix this. We'll keep working on it if it doesn't 👍

@agileshaw
Copy link
Author

Hello @cderici . Thank you for the fast response. I would like to let you know that, unfortunately, v3.0.1 didn't solve the above-mentioned issue. It seems that the problem sits in the unit.run function, which was not in the scope of the newly merged PR .

@cderici
Copy link
Member

cderici commented Aug 11, 2022

Ah yes I thought I saw run_action there, sorry about that.

So this is actually a change in the juju api before the juju 3.0. The tldr is, instead of getting the data like action.data, you'll need to wait on it first, e.g. action = await action.wait(), and then access the results of running the action via action.results. The long story is below.

So, looks like you're using slightly older Juju from latest/stable. The problem is, the Juju you're using, the 2.9.29, is using the Action Facade v6, whereas the Juju latest/stable (2.9.33) (and later versions like 3.0) is using the Action Facade v7 (that uses a new Run function that's not anymore a synchronous function -- it queues operations and instead of the results we get the enqueued actions to later "wait" on to get the results).

These sorts of breaking changes (that we had to do to support the Juju 3.0) are actually why we had a major version bump in python-libjuju. However, we're still trying to support 2.9 track as well, so I think we should've made these clearer in the changelogs.

What I'm doing right now is that I'm cleaning up the run function (looks like it needs a bit of love too) and in the PR I'll make sure to add a test that shows the use of this new run version. Hope this helps 👍

@agileshaw
Copy link
Author

Thanks for the detailed explanation.

In that case, I think zaza test would need to be updated to facilitate this API change once they unpin libjuju<3.0.0. I'm looking forward to the new PR and seeing the usage of the run function.

@agileshaw
Copy link
Author

I was also wondering if it is possible to have the action = await action.wait() build into the run function when self.connection.is_using_old_client is evaluated to be true. This way the return object from this function would be consistent for both types of connections.

@addyess
Copy link
Contributor

addyess commented Aug 11, 2022

👀 i'm SUPER interested in this answer

@addyess
Copy link
Contributor

addyess commented Aug 11, 2022

OK, so more info

i'm getting different results dictionaries on a few different runs:

here's the setup

>>> import asyncio
>>> def do(c):
...     return asyncio.get_event_loop().run_until_complete(c)
...
>>> from juju.model import Model
>>> model = Model()

On Vsphere (addyess-vsphere k8s-vsphere/Boston 2.9.31)

>>> do(model.connect_model("addyess-vsphere:admin/default"))
>>> unit = model.units['etcd/0']
>>> action = do(unit.run("whoami"))
>>> result = do(action.wait())
>>> result.results
{'Code': '0', 'Stdout': 'root\n'}

on Azure (addyess-azure azure/centralus 2.9.33)

>>> do(model.connect_model("addyess-azure:admin/default"))
>>> unit = model.units['etcd/0']
>>> action = do(unit.run("whoami"))
>>> result = do(action.wait())
>>> result.results
{'return-code': 0, 'stdout': 'root\n'}

@addyess
Copy link
Contributor

addyess commented Aug 11, 2022

here's some fun antics i'm taking to "work-around"

async def juju_run(unit: Unit, cmd):
    action = await unit.run(cmd)
    result = await action.wait()
    code = str(result.results.get("Code") or result.results.get("return-code"))
    stdout = result.results.get("Stdout") or result.results.get("stdout")
    stderr = result.results.get("Stderr") or result.results.get("stderr")
    assert code == "0", f"{cmd} failed ({code}): {stderr or stdout}"
    return stdout

cderici added a commit to cderici/python-libjuju that referenced this issue Aug 11, 2022
... regardless of the version of the client and/or facade being used

Fixes juju#707
@cderici
Copy link
Member

cderici commented Aug 11, 2022

i'm getting different results dictionaries on a few different runs:

To see why the internals of the received objects are different, I'd need to look at Juju's side of things @addyess. It is possible that 2.9.33 to use ActionFacade v7, while the 2.9.31 is using v6 and whoever made that change in the Juju side (added v7) wasn't being consistent there. Regardless, this looks like an issue on the Juju side, rather than python-libjuju 👍

@cderici
Copy link
Member

cderici commented Aug 11, 2022

I was also wondering if it is possible to have the action = await action.wait() build into the run function when self.connection.is_using_old_client is evaluated to be true. This way the return object from this function would be consistent for both types of connections.

I talked about your request in the PR a little bit, @agileshaw . I think it's valid (and also a very easy/cheap way for me to keep the backwards compatibility), however, I think it would only make sense if we were releasing for the 2.9 track. I don't think it's a good idea to do that in the 3.0. You can see my explanation in #710. Hope this helps, and hope #710 solves your issue 👍

addyess added a commit to addyess/python-libjuju that referenced this issue Aug 12, 2022
juju#707

I agree there's inconsistency from the juju api response there.  Since `action.wait()` ultimately just fills out the `results` dict more, what we we add some decode logic to the `Action` object to handle the various facades?  

Action could have an attribute that returned an object like `subprocess.CompletedProcess`[https://docs.python.org/3/library/subprocess.html#subprocess.CompletedProcess] 

users of the unit.run class could do this:

```python
    response = await unit.run(cmd)
    response.completed.check_returncode()  # raises a known exception for non-zero return codes
    stdout = response.completed.stdout
```
@jameinel
Copy link
Member

jameinel commented Aug 12, 2022 via email

@agileshaw
Copy link
Author

@cderici Thanks for the explanation and PR #710. I tend to agree with the approach suggested by @jameinel . Since the unit.run function is already checking the connection type (Action Facade v6 vs v7), it seems redundant to let the user check it again on their side to determine what kind of results they are expecting. It would make much more sense if the returned result is consistent.

@cderici
Copy link
Member

cderici commented Aug 16, 2022

@agileshaw Now with the #710 landing the results should be consistent across the board. Also, see dcadb7a adds a little convenience flag to be used with the older versions, or for whoever wants to block until a result is there, hope it makes it easier for you to use 👍

jujubot added a commit that referenced this issue Aug 16, 2022
#710

#### Description

This is PR complements #706, and tries to address the issues being discussed in #707, by updating the `old_client`s with `2.9.34` schema and fixing the `unit.run()` function to be asynchronous, as it's (breakingly) changed to be in the `Juju 3.0`. This also brings `unit.run` and `unit.run_action` functions very close, the only difference is that the former is using `Action.Run` while the latter is using `Action.Enqueue` functions. It's plausible that one of these ways of doing the similar thing will disappear in the future since the newer versions of `Action.Run` is asynchronous anyway.

Should fix the #707.

#### QA Steps

I updated the integration tests for `run()`, so basically all the tests in the following suite (that includes `test_run` and `test_run_action`) should pass:

```python
tox -e integration -- tests/integration/test_unit.py
``` 

Additionally I added the example that I have been using myself to test this into the examples folder, so it should be running fine:

```
 $ python examples/run_action.py
```

Both the integration tests (`test_run` & `test_run_action`) and the example module are tested in the following controllers:

```
Controller Model User Access Cloud/Region Models Nodes HA Version
caner-k8s test admin superuser microk8s/localhost 2 - - 2.9.33 
caner-k8s-2929 testhede admin superuser microk8s/localhost 2 - - 2.9.29 
caner-k8s-2931 testhaylaz admin superuser microk8s/localhost 2 - - 2.9.31 
caner-k8s-2934* haylazlar admin superuser microk8s/localhost 2 - - 2.9.34 
caner-lxd-2.9 default admin superuser localhost/localhost 3 2 none 2.9.34.1 
caner-lxd-2929 default admin superuser localhost/localhost 2 1 none 2.9.29 
caner-lxd-2931 default admin superuser localhost/localhost 2 1 none 2.9.31 
caner-lxd-2932 default admin superuser localhost/localhost 2 1 none 2.9.32 
caner-lxd-3.0 canhiras admin superuser localhost/localhost 1 1 none 3.0-rc1.1 
k8s-30 test4 admin superuser microk8s/localhost 2 - - 3.0-rc1 
```

Note that, all the clients that I used `<2.9.34` are using the `ActionFacade v6`, and the ones `>=2.9.34` are using the `ActionFacade v7` (regardless of `2.9` vs` 3.0`).

#### Notes & Discussion

One of the requests was to keep the `unit.run()` function synchronous for any client using `<ActionFacade v6`. While this is a valid request from the user perspective, it also means that we would have the same function returning different types (either a dictionary or an action object) depending on the facade being used, which would make it way harder to maintain. That's why I kept the change for all the versions.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants