Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invoke host-select via bash login shell #2555

Merged
merged 5 commits into from
Mar 18, 2022
Merged

Conversation

MetRonnie
Copy link
Contributor

@MetRonnie MetRonnie commented Mar 9, 2022

This closes #2552

Changes the command that gets run in the background, as follows:

 ssh -oBatchMode=yes -oStrictHostKeyChecking=no -oConnectTimeout=8 <hostname> \
-    env CYLC_VERSION=<version> rose host-select-client
+    env CYLC_VERSION=<version> CYLC_ENV_NAME=<$CYLC_ENV_NAME> bash -l -c 'rose host-select-client'

@MetRonnie MetRonnie added the bug label Mar 9, 2022
@MetRonnie MetRonnie added this to the 2.0rc2 milestone Mar 9, 2022
@MetRonnie MetRonnie self-assigned this Mar 9, 2022
Copy link
Member

@oliver-sanders oliver-sanders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just need to drag through the value of CYLC_ENV_NAME provided by the wrapper script.

metomi/rose/host_select.py Outdated Show resolved Hide resolved
@MetRonnie
Copy link
Contributor Author

Might have to use exec, let me see

@MetRonnie MetRonnie marked this pull request as draft March 9, 2022 15:19
@MetRonnie
Copy link
Contributor Author

@oliver-sanders Any idea why this fails for localhost? I tried using exec instead of shlex.quoteing the command but that didn't work either

$ rose host-select -vv localhost
[INFO] Rank method: load
[INFO] env CYLC_VERSION=8.0rc2.dev bash -l -c \'rose\ host-select-client\' <<'__STDIN__'
[INFO] 
[INFO] ***start**
[INFO] [["getloadavg"], ["cpu_count"]]
[INFO] **end**
[INFO] 
[INFO] __STDIN__
[WARN] <name-of-localhost>: (failed 127)
[FAIL] No hosts selected.

@oliver-sanders
Copy link
Member

Not sure, worth checking whether stdin is being passed through properly.

@dpmatthews
Copy link
Member

The escaping is the problem:

$ ssh $(hostname) env CYLC_ENV_NAME=cylc-8.0rc2.dev-1 bash -l -c \'rose\ version\'
rose 2.0rc2.dev
$ env CYLC_ENV_NAME=cylc-8.0rc2.dev-1 bash -l -c \'rose\ version\'
bash: rose version: command not found
$ env CYLC_ENV_NAME=cylc-8.0rc2.dev-1 bash -l -c 'rose version'
rose 2.0rc2.dev

@MetRonnie
Copy link
Contributor Author

The escaping is only in the reporting/logging, it doesn't affect the call to subprocess.Popen(). You can see this by applying

diff --git a/metomi/rose/popen.py b/metomi/rose/popen.py
index 6a5249e5..074ade25 100644
--- a/metomi/rose/popen.py
+++ b/metomi/rose/popen.py
@@ -132,7 +132,7 @@ class RosePopener:
     def list_to_shell_str(cls, args):
         if not args:
             return ""
-        return " ".join([re.sub(r"([\"'\s])", r"\\\1", arg) for arg in args])
+        return " ".join(args)
 
     def __init__(self, event_handler=None):
         self.event_handler = event_handler

and it still fails in the same way

@MetRonnie
Copy link
Contributor Author

Ah wait, I think I've got it

@MetRonnie MetRonnie marked this pull request as ready for review March 10, 2022 17:40
@MetRonnie MetRonnie marked this pull request as draft March 10, 2022 20:10
@MetRonnie
Copy link
Contributor Author

MetRonnie commented Mar 10, 2022

I've figured out why the tests are failing this time round. It has brought up a question: rose-app.conf[poll]all-files and rose-app.conf[poll]any-files seem to accept globs but this is not documented in https://metomi.github.io/rose/2.0rc1/html/api/configuration/application.html#rose:conf:rose-app.conf[poll]all-files

I have updated the docs

@MetRonnie MetRonnie marked this pull request as ready for review March 14, 2022 13:30
Copy link
Member

@dpmatthews dpmatthews left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tested this but not reviewed the change

@MetRonnie
Copy link
Contributor Author

MetRonnie commented Mar 14, 2022

MacOS test showing as failed but when you click on it the "re-run failed tests" step actually passed

@oliver-sanders
Copy link
Member

MacOS test showing as failed but when you click on it the "re-run failed tests" step actually passed

For the Rose repo we re-run tests in serial purely for debugging purposes, they are expected to pass first time around in parallel.

(in Cylc the functional tests are sufficiently flaky we've had no choice but to rerun them but this is really bad and something we should dispense with one day)

@MetRonnie
Copy link
Contributor Author

Actually the MacOS tests passed the first time around in the previous commit: https://github.com/metomi/rose/runs/5536961434?check_suite_focus=true

Most recent commit is only a change to documentation so shouldn't have affected it

@oliver-sanders
Copy link
Member

oliver-sanders commented Mar 15, 2022

(easier just to re-run)

(failed again, but only one test this time)

(kicked off a run against master to see if this issue has crept in https://github.com/metomi/rose/actions/runs/1986244844)

Copy link
Member

@oliver-sanders oliver-sanders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

metomi/rose/app_run.py Outdated Show resolved Hide resolved
setup.cfg Outdated Show resolved Hide resolved
@oliver-sanders
Copy link
Member

Tests passed first time on master https://github.com/metomi/rose/actions/runs/1986244844

Perhaps some change here made tests incompatible when running in parallel?

@MetRonnie
Copy link
Contributor Author

MetRonnie commented Mar 15, 2022

Reran on master, failed: https://github.com/metomi/rose/actions/runs/1986244844

@oliver-sanders
Copy link
Member

oliver-sanders commented Mar 16, 2022

The error from the failed jobs is genuine and looks the same: #2558

Copy link
Member

@oliver-sanders oliver-sanders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested, everything worked as intended.

Had a quick look to see why we weren't quoting commands in the Popen thinggy, I think this is perfectly sensible.

@oliver-sanders
Copy link
Member

oliver-sanders commented Mar 16, 2022

Strangely t/rose-host-select/01-timeout.t failed, but only for Python 3.7 on Linux?

--- -	2022-03-16 15:26:42.653472890 +0000
+++ mock-ssh.out.sorted	2022-03-16 15:26:42.646377238 +0000
@@ -1,2 +1 @@
-sleepy1 bash -l -c 'rose host-select-client'
 sleepy2 bash -l -c 'rose host-select-client'

https://github.com/metomi/rose/runs/5572332850?check_suite_focus=true#step:12:238

Copy link
Contributor

@wxtim wxtim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Read code
  • Functionally tested

@wxtim wxtim merged commit dbbe46c into metomi:master Mar 18, 2022
@MetRonnie MetRonnie deleted the host-select branch March 18, 2022 12:51
@wxtim wxtim mentioned this pull request Mar 18, 2022
28 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

host-select: invoke via bash -l -c exec ...
4 participants