Git file install: support for file creation from Git repositories #2754

Merged

Conversation

benfitzpatrick
Contributor

Work in progress!

Aims to address #1419.

I'm not that happy with the configuration syntax, although I am happy with its three main elements.

The mechanism itself uses filtering and either partial clones or proper sparse checkouts where the Git version allows.
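
To make the syntax under discussion concrete: judging by the example configurations posted later in this conversation, a source value appears to take the form git:REMOTE::PATH::REF. The following is only a minimal sketch of splitting such a value into its three elements. The names GitSource and split_git_source are hypothetical, and this is not the code in metomi/rose/loc_handlers/git.py.

```python
from typing import NamedTuple


class GitSource(NamedTuple):
    """The three elements of a git source value (illustrative names)."""
    remote: str
    path: str
    ref: str


def split_git_source(value: str) -> GitSource:
    """Split 'git:REMOTE::PATH::REF' into its three elements.

    Assumes the 'git:' scheme prefix and '::' separators seen in the
    examples in this conversation; raises ValueError otherwise.
    """
    scheme, sep, rest = value.partition(":")
    if scheme != "git" or not sep:
        raise ValueError(f"not a git source: {value}")
    parts = rest.split("::")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected REMOTE::PATH::REF, got: {rest}")
    return GitSource(*parts)
```

Splitting on the first single colon only removes the scheme, so remotes that themselves contain colons (git@github.com:... or https://...) survive intact.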

@benfitzpatrick benfitzpatrick force-pushed the 1419.git_file_creation_loc_handler branch from 28e76e2 to 689784e Compare January 17, 2024 14:26
@wxtim
Contributor

wxtim commented Jan 18, 2024

Let us know when you want us to review this.

@oliver-sanders oliver-sanders added this to the 2.3.0 milestone Jan 18, 2024
@benfitzpatrick benfitzpatrick marked this pull request as ready for review January 23, 2024 13:32
@benfitzpatrick
Contributor Author

Ready for review, not ready to go in IMO!

Member

@oliver-sanders oliver-sanders left a comment

LGTM

.github/workflows/test.yml Outdated
metomi/rose/loc_handlers/git.py Outdated
@MetRonnie MetRonnie self-requested a review February 23, 2024 11:46
@oliver-sanders
Member

Ready for review, not ready to go in IMO!

@benfitzpatrick, what's left to do?

@benfitzpatrick
Contributor Author

I'd like to do a bit of in-anger use-case testing, if we feel it's close enough as-is. Shouldn't take too long.

@benfitzpatrick
Contributor Author

I've done some testing against real cases I could find in our workflows, and found and fixed some issues in afddb2b.

I tested a case with 6 GitHub file installs in a single rose-app.conf, as we were worried about hitting a rate limit. It worked fine, and it looked like these happen sequentially, at least from the verbose output. Happy to demo offline.

Otherwise I'm happy

@oliver-sanders
Member

looked like these happen sequentially

Curious, I'll check that's because GitHub is serving the requests sequentially rather than Rose failing to run them in parallel.

@benfitzpatrick
Contributor Author

Little bump

@oliver-sanders
Member

looked like these happen sequentially

Curious, I'll check that's because GitHub is serving the requests sequentially rather than Rose failing to run them in parallel.

After a bit of digging I'm happy that these operations are being run concurrently, resulting in their subprocesses being run in parallel.

To track subprocesses, try the following diff:

diff --git a/metomi/rose/popen.py b/metomi/rose/popen.py
index 8993fb93..8ec00c94 100644
--- a/metomi/rose/popen.py
+++ b/metomi/rose/popen.py
@@ -372,6 +372,7 @@ class RosePopener:
                 proc = Popen(args[0], **kwargs)
             else:
                 command = ' '.join(map(shlex.quote, args))
+                print(f'# {command}')  # process started
                 proc = await asyncio.create_subprocess_shell(command, **kwargs)
         except OSError as exc:
             if exc.filename is None and args:
@@ -392,7 +393,9 @@ class RosePopener:
         if isinstance(kwargs.get("stdin"), str):
             stdin = kwargs.get("stdin")
         stdout, stderr = await proc.communicate(stdin)
+        command = ' '.join(map(shlex.quote, args))
         await proc.wait()
+        print(f'$ {command}')  # process returned
         return proc.returncode, stdout, stderr
 
     async def run_ok_async(self, *args, **kwargs):

Using this diff with the following config:

[command]
default=true

[file:cylc/scheduler.py]
source=git:git@github.com:cylc/cylc-flow.git::cylc/flow/scheduler.py::master

[file:cylc/task_proxy.py]
source=git:https://github.com/cylc/cylc-flow.git::cylc/flow/task_proxy.py::master

[file:cylc/async_util.py]
source=git:https://github.com/cylc/cylc-flow.git::cylc/flow/async_util.py::master

I get this output, showing up to three subprocesses running simultaneously (lines starting with # mark a process starting, lines starting with $ mark it returning):

# git --git-dir=/var/tmp/tmpqqkj_tvk/.git init
# git --git-dir=/var/tmp/tmpd4c71k8x/.git init
# git --git-dir=/var/tmp/tmp50f2vgwd/.git init
$ git --git-dir=/var/tmp/tmpqqkj_tvk/.git init
# git --git-dir=/var/tmp/tmpqqkj_tvk/.git remote add origin https://github.com/cylc/cylc-flow.git
$ git --git-dir=/var/tmp/tmp50f2vgwd/.git init
# git --git-dir=/var/tmp/tmp50f2vgwd/.git remote add origin https://github.com/cylc/cylc-flow.git
$ git --git-dir=/var/tmp/tmpd4c71k8x/.git init
# git --git-dir=/var/tmp/tmpd4c71k8x/.git remote add origin git@github.com:cylc/cylc-flow.git
$ git --git-dir=/var/tmp/tmpqqkj_tvk/.git remote add origin https://github.com/cylc/cylc-flow.git
# git --git-dir=/var/tmp/tmpqqkj_tvk/.git fetch --depth=1 origin 4b70574c9100e1cf830a226c254cb3159a558873
$ git --git-dir=/var/tmp/tmp50f2vgwd/.git remote add origin https://github.com/cylc/cylc-flow.git
# git --git-dir=/var/tmp/tmp50f2vgwd/.git fetch --depth=1 origin 4b70574c9100e1cf830a226c254cb3159a558873
$ git --git-dir=/var/tmp/tmpd4c71k8x/.git remote add origin git@github.com:cylc/cylc-flow.git
# git --git-dir=/var/tmp/tmpd4c71k8x/.git fetch --depth=1 origin 4b70574c9100e1cf830a226c254cb3159a558873
$ git --git-dir=/var/tmp/tmpqqkj_tvk/.git fetch --depth=1 origin 4b70574c9100e1cf830a226c254cb3159a558873
# git --git-dir=/var/tmp/tmpqqkj_tvk/.git --work-tree=/var/tmp/tmpqqkj_tvk checkout 4b70574c9100e1cf830a226c254cb3159a558873
$ git --git-dir=/var/tmp/tmp50f2vgwd/.git fetch --depth=1 origin 4b70574c9100e1cf830a226c254cb3159a558873
# git --git-dir=/var/tmp/tmp50f2vgwd/.git --work-tree=/var/tmp/tmp50f2vgwd checkout 4b70574c9100e1cf830a226c254cb3159a558873
$ git --git-dir=/var/tmp/tmp50f2vgwd/.git --work-tree=/var/tmp/tmp50f2vgwd checkout 4b70574c9100e1cf830a226c254cb3159a558873
# rsync -a '--exclude=.*' --timeout=1800 '--rsh=ssh -oBatchMode=yes -oStrictHostKeyChecking=no -oConnectTimeout=8' /var/tmp/tmp50f2vgwd/cylc/flow/async_util.py /var/tmp/tmph9lc16hm/f13f5dacdb0f632826821bf9278a10fc
$ git --git-dir=/var/tmp/tmpqqkj_tvk/.git --work-tree=/var/tmp/tmpqqkj_tvk checkout 4b70574c9100e1cf830a226c254cb3159a558873
# rsync -a '--exclude=.*' --timeout=1800 '--rsh=ssh -oBatchMode=yes -oStrictHostKeyChecking=no -oConnectTimeout=8' /var/tmp/tmpqqkj_tvk/cylc/flow/task_proxy.py /var/tmp/tmph9lc16hm/759c658589e37a9af232bfa8e03d01b0
$ rsync -a '--exclude=.*' --timeout=1800 '--rsh=ssh -oBatchMode=yes -oStrictHostKeyChecking=no -oConnectTimeout=8' /var/tmp/tmpqqkj_tvk/cylc/flow/task_proxy.py /var/tmp/tmph9lc16hm/759c658589e37a9af232bfa8e03d01b0
$ rsync -a '--exclude=.*' --timeout=1800 '--rsh=ssh -oBatchMode=yes -oStrictHostKeyChecking=no -oConnectTimeout=8' /var/tmp/tmp50f2vgwd/cylc/flow/async_util.py /var/tmp/tmph9lc16hm/f13f5dacdb0f632826821bf9278a10fc
$ git --git-dir=/var/tmp/tmpd4c71k8x/.git fetch --depth=1 origin 4b70574c9100e1cf830a226c254cb3159a558873
# git --git-dir=/var/tmp/tmpd4c71k8x/.git --work-tree=/var/tmp/tmpd4c71k8x checkout 4b70574c9100e1cf830a226c254cb3159a558873
$ git --git-dir=/var/tmp/tmpd4c71k8x/.git --work-tree=/var/tmp/tmpd4c71k8x checkout 4b70574c9100e1cf830a226c254cb3159a558873
# rsync -a '--exclude=.*' --timeout=1800 '--rsh=ssh -oBatchMode=yes -oStrictHostKeyChecking=no -oConnectTimeout=8' /var/tmp/tmpd4c71k8x/cylc/flow/scheduler.py /var/tmp/tmph9lc16hm/8a367e85378d4334d5133fa065d71116
$ rsync -a '--exclude=.*' --timeout=1800 '--rsh=ssh -oBatchMode=yes -oStrictHostKeyChecking=no -oConnectTimeout=8' /var/tmp/tmpd4c71k8x/cylc/flow/scheduler.py /var/tmp/tmph9lc16hm/8a367e85378d4334d5133fa065d71116

You can also test it like this:

diff --git a/metomi/rose/popen.py b/metomi/rose/popen.py
index 8993fb93..35bdbe27 100644
--- a/metomi/rose/popen.py
+++ b/metomi/rose/popen.py
@@ -371,7 +371,8 @@ class RosePopener:
             if kwargs.get("shell"):
                 proc = Popen(args[0], **kwargs)
             else:
-                command = ' '.join(map(shlex.quote, args))
+                # command = ' '.join(map(shlex.quote, args))
+                command = 'sleep 5'
                 proc = await asyncio.create_subprocess_shell(command, **kwargs)
         except OSError as exc:
             if exc.filename is None and args:

The time taken should increase linearly as the sleep is increased, but you should only see a small increase (parsing overheads) when more files are configured for installation.
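
The same check can be reproduced outside Rose. This is a minimal standalone sketch, not RosePopener itself: it uses asyncio.create_subprocess_exec directly, and the names run_traced, run_all and demo are hypothetical. It logs a # entry before each child process starts and a $ entry when it returns, mirroring the first diff above, and times the whole batch.

```python
import asyncio
import sys
import time


async def run_traced(label: str, delay: float, events: list) -> int:
    """Run a child process that sleeps, recording its start and return."""
    events.append(f"# {label}")  # process about to start
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", f"import time; time.sleep({delay})"
    )
    returncode = await proc.wait()
    events.append(f"$ {label}")  # process returned
    return returncode


async def run_all(n_procs: int, delay: float):
    """Run n_procs children concurrently; return (events, elapsed seconds)."""
    events: list = []
    start = time.monotonic()
    await asyncio.gather(
        *(run_traced(f"job{i}", delay, events) for i in range(n_procs))
    )
    return events, time.monotonic() - start


def demo(n_procs: int = 3, delay: float = 1.0):
    """Synchronous entry point for the sketch."""
    return asyncio.run(run_all(n_procs, delay))
```

If the children run in parallel, every # entry is logged before the first $ entry, and the elapsed time stays close to one delay rather than n_procs times the delay.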

sphinx/api/configuration/file-creation.rst Outdated
metomi/rose/loc_handlers/git.py Outdated
metomi/rose/loc_handlers/git.py
@benfitzpatrick
Contributor Author

Curious, I'll check that's because GitHub is serving the requests sequentially rather than Rose failing to run them in parallel.

After a bit of digging I'm happy that these operations are being run concurrently, resulting in their subprocesses being run in parallel.

Ah, I know why I thought they were sequential - the parsing is serial, the actual pulling can be concurrent. Sorry-thanks!

if loc.loc_type == "tree":
    dest += "/"
cmd = self.manager.popen.get_cmd("rsync", name, dest)
await self.manager.popen.run_ok_async(*cmd)
Contributor

I'm getting a failure here:

[INFO] 2024-06-03T11:56:18+0100 rsync -a '--exclude=.*' --timeout=1800 '--rsh=ssh -oBatchMode=yes -oStrictHostKeyChecking=no -oConnectTimeout=8' /var/tmp/tmpzpxb2lpj/README.md /var/tmp/tmpjul8aid7/e99d80506f38bfee39893fd042f2f5d6
[FAIL] 2024-06-03T11:56:19+0100 rsync -a '--exclude=.*' --timeout=1800 '--rsh=ssh -oBatchMode=yes -oStrictHostKeyChecking=no -oConnectTimeout=8' /var/tmp/tmpzpxb2lpj/README.md /var/tmp/tmpjul8aid7/e99d80506f38bfee39893fd042f2f5d6 # return-code=3, stderr=
[FAIL] 2024-06-03T11:56:19+0100 rsync: change_dir#3 "/var/tmp/tmpjul8aid7" failed: No such file or directory (2)
[FAIL] 2024-06-03T11:56:19+0100 rsync error: errors selecting input/output files, dirs (code 3) at main.c(694) [Receiver=3.1.2]

Contributor Author

(status update - we're finding it hard to independently reproduce this one, currently a mystery)

@benfitzpatrick benfitzpatrick force-pushed the 1419.git_file_creation_loc_handler branch from ad11e99 to 5a16399 Compare June 5, 2024 10:08
benfitzpatrick and others added 2 commits June 5, 2024 11:38
Co-authored-by: Oliver Sanders <oliver.sanders@metoffice.gov.uk>
Member

@oliver-sanders oliver-sanders left a comment

Code review - LGTM, some small comments.

@@ -856,7 +857,7 @@ def parse(self, loc, conf_tree):
 # Scheme specified in the configuration.
 handler = self.get_handler(loc.scheme)
 if handler is None:
-    raise ValueError(loc.name)
+    raise ValueError(f"don't support scheme {loc.scheme}")
Member

[minor] maybe a slightly cryptic error message for users to digest:

Suggested change
-raise ValueError(f"don't support scheme {loc.scheme}")
+raise ValueError(f"Rose doesn't support scheme {loc.scheme}")

@@ -865,7 +866,7 @@ def parse(self, loc, conf_tree):
 if handler is None:
     handler = self.guess_handler(loc)
 if handler is None:
-    raise ValueError(loc.name)
+    raise ValueError(f"don't know how to process {loc.name}")
Member

[minor] maybe a slightly cryptic error message for users to digest:

Suggested change
-raise ValueError(f"don't know how to process {loc.name}")
+raise ValueError(f"Rose doesn't know how to process {loc.name}")

# sparse-checkout available and suitable for this case.
await self.manager.popen.run_ok_async(
    "git", git_dir_opt, "sparse-checkout", "set", path,
    "--no-cone"
Member

I'm guessing this is required, but worth noting the documented warnings:

https://git-scm.com/docs/git-sparse-checkout#_commands

    "git", git_dir_opt, f"--work-tree={tmpdirname}", "checkout",
    loc.key
)
name = tmpdirname + "/" + path
Member

[minor] os.path.join would be nicer.
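
For reference, a small sketch of the difference on a POSIX system. The values of tmpdirname and path below are hypothetical stand-ins for the variables in the snippet under review.

```python
import os.path

# Hypothetical values standing in for the variables in the reviewed code.
tmpdirname = "/var/tmp/tmpexample"
path = "cylc/flow/scheduler.py"

# String concatenation, as written in the reviewed line:
name_concat = tmpdirname + "/" + path

# The suggested alternative:
name_joined = os.path.join(tmpdirname, path)

# Caveat: if the second argument is absolute, os.path.join discards
# the first, so `path` has to be relative for the two forms to agree.
name_absolute = os.path.join(tmpdirname, "/" + path)
```

For a relative path the two forms give the same result; they only diverge if path were ever absolute.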

sphinx/api/configuration/file-creation.rst Outdated
Comment on lines 61 to 63
You should set ``git config uploadpack.allowFilter true`` and
``git config uploadpack.allowAnySHA1InWant true`` on repositories
if you are setting them up to pull from.
Member

Maybe worth explaining that this only applies to repos within your management (e.g. local repos).

E.g. this doesn't apply to a GitHub-hosted repository (I think)?
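
For a repository you do manage yourself, the two settings from the documentation excerpt could be applied along these lines. This is a sketch only: the helper names are hypothetical, and it assumes git is on the PATH and that repo_dir is the repository being served.

```python
import subprocess
from typing import List

# The two settings named in the documentation excerpt under review.
UPLOADPACK_SETTINGS = {
    "uploadpack.allowFilter": "true",
    "uploadpack.allowAnySHA1InWant": "true",
}


def uploadpack_config_commands(repo_dir: str) -> List[List[str]]:
    """Build the `git config` commands for a repository you manage."""
    return [
        ["git", "-C", repo_dir, "config", key, value]
        for key, value in UPLOADPACK_SETTINGS.items()
    ]


def apply_uploadpack_config(repo_dir: str) -> None:
    """Run the commands; raises CalledProcessError if git fails."""
    for cmd in uploadpack_config_commands(repo_dir):
        subprocess.run(cmd, check=True)
```

Building the commands separately from running them makes it easy to print them for review before touching a shared repository.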

Member

@oliver-sanders oliver-sanders left a comment

🚀

@wxtim
Contributor

wxtim commented Jun 6, 2024

I've opened #2784 - it looks like the bug I found is unrelated to this work.

Co-authored-by: Oliver Sanders <oliver.sanders@metoffice.gov.uk>
metomi/rose/loc_handlers/git.py Outdated
metomi/rose/loc_handlers/git.py Outdated
metomi/rose/loc_handlers/git.py Outdated
metomi/rose/loc_handlers/git.py
benfitzpatrick and others added 2 commits June 13, 2024 08:48
Co-authored-by: Ronnie Dutta <61982285+MetRonnie@users.noreply.github.com>
@oliver-sanders
Member

Whilst working on #2785, I've noticed that "cannot connect to" type errors appear to surface with the following message:

ConfigProcessError: file:README.md=source=git:https@github.com:metomi/rose::README.md::master: ls-remote: could not find ref 'master' in 'git@github.com:metomi/rose'

The underlying cause here is that GitHub SSH access has not been configured.

@benfitzpatrick
Contributor Author

I could add an "or remote not contactable?" to the end of the error message?

@oliver-sanders
Member

If it's possible to differentiate between these cases, great, but if not that'll be fine.
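
One possible way to tell the cases apart, offered as a sketch rather than a description of what this PR does: git ls-remote has an --exit-code option that exits with status 2 when the remote was reached but no matching ref was found. Other failures, such as being unable to connect or authenticate, are fatal errors, for which git typically exits with 128. Whether the ls-remote call in this PR uses --exit-code is an assumption here, and both function names are hypothetical.

```python
import subprocess


def classify_ls_remote(returncode: int) -> str:
    """Map a `git ls-remote --exit-code` return code to a rough cause."""
    if returncode == 0:
        return "ref found"
    if returncode == 2:
        # Documented meaning of --exit-code: no matching refs found.
        return "ref not found"
    # Anything else: git typically exits 128 on fatal errors such as
    # failing to connect or authenticate.
    return "remote not contactable, or other git error"


def check_ref(remote: str, ref: str) -> str:
    """Run ls-remote and classify the outcome (hypothetical helper)."""
    proc = subprocess.run(
        ["git", "ls-remote", "--exit-code", remote, ref],
        capture_output=True,
        text=True,
    )
    return classify_ls_remote(proc.returncode)
```

If the return code is not available at the point where the message is built, appending "or remote not contactable?" as suggested above remains the simpler fallback.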

@oliver-sanders
Member

Waiting for tests to pass again, then ready to merge.

@oliver-sanders
Member

oliver-sanders commented Jun 18, 2024

Good news, the tests pass.

Bad news: if some of the tests skip, then the test fails, e.g.:

diff --git a/t/rose-app-run/28-git.t b/t/rose-app-run/28-git.t
index 1b4eaaa90..92a411611 100644
--- a/t/rose-app-run/28-git.t
+++ b/t/rose-app-run/28-git.t
@@ -72,12 +72,12 @@ remote_locations=("$HOSTNAME:$TEST_DIR/hellorepo/" "http://localhost:$GIT_WS_POR
for i in 0 1 2; do
     remote_mode="${remote_test_modes[$i]}"
     remote="${remote_locations[$i]}"
-    if [[ "$remote_mode" == "ssh" ]] && ! ssh -n -q -oBatchMode=yes $HOSTNAME true 1>'/dev/null' 2>/dev/null; then
+    if true; then
         skip 14 "cannot ssh to localhost $HOSTNAME"
         echo "Skip $remote" >/dev/tty        
         continue
     fi
-    if [[ "$remote_mode" == "http" ]] && ! curl --head --silent --fail $remote >/dev/null 2>&1; then
+    if true; then
         skip 14 "failed to launch http on localhost"
         echo "Skip $remote" >/dev/tty        
         continue

:(

[FAIL] ../config: rose-app.conf not found.
Dubious, test returned 1 (wstat 256, 0x100)
Failed 15/57 subtests 
	(less 42 skipped subtests: 0 okay)
Test Summary Report
-------------------
t/rose-app-run/28-git.t (Wstat: 256 Tests: 42 Failed: 0)
  Non-zero exit status: 1
  Parse errors: Bad plan.  You planned 57 tests but ran 42.

Not a release blocker. Suggest bumping this to future work.

Opened: #2789
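
One reading of the numbers in the report above, as a quick arithmetic check. It assumes each of the three loop iterations in the diff skips 14 tests and that the remaining tests never ran because the script exited early.

```python
# Numbers taken from the diff and the test report above.
planned = 57          # "You planned 57 tests"
skip_per_remote = 14  # "skip 14 ..." in the loop body
remotes = 3           # "for i in 0 1 2"

skipped = skip_per_remote * remotes  # tests accounted for via skip
never_run = planned - skipped        # tests in the plan that never ran
```

This gives 42 skipped and 15 unaccounted for, matching "ran 42" and "Failed 15/57 subtests" in the summary.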

@oliver-sanders
Member

@MetRonnie, could you check the last couple of commits and merge if you're happy with the tests being pushed to a bugfix release milestone?

@oliver-sanders
Member

Brill, that's fixed it (will close issue). Tested with all possible skip combinations, test passed each time.

@oliver-sanders
Member

(macOS tests failing due to gh-actions DNS issues, safe to ignore)

Contributor

@MetRonnie MetRonnie left a comment

merge if you're happy with the tests being pushed to a bugfix release milestone.

It's a minor release rather than a bugfix release though?

@oliver-sanders
Member

(the test fixes would have been on a bugfix release milestone, however, there is no need for a post-fix, Ben has sorted it on this PR)

@oliver-sanders oliver-sanders merged commit 3bd74d5 into metomi:master Jun 18, 2024
12 of 13 checks passed