Skip to content

Allow sync wait period to be truncated by an HUP signal.#664

Merged
k8s-ci-robot merged 1 commit intokubernetes:masterfrom
trulede:master
Feb 15, 2023
Merged

Allow sync wait period to be truncated by an HUP signal.#664
k8s-ci-robot merged 1 commit intokubernetes:masterfrom
trulede:master

Conversation

@trulede
Copy link
Copy Markdown
Contributor

@trulede trulede commented Jan 25, 2023

For discussion related to issue #660

The HUP signal is used so that normal use of SIGINT (i.e. CTRL-C) is not affected.

@linux-foundation-easycla
Copy link
Copy Markdown

linux-foundation-easycla Bot commented Jan 25, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. label Jan 25, 2023
@k8s-ci-robot
Copy link
Copy Markdown
Contributor

Welcome @trulede!

It looks like this is your first PR to kubernetes/git-sync 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/git-sync has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Jan 25, 2023
@trulede trulede marked this pull request as draft January 25, 2023 19:44
@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Jan 25, 2023
Copy link
Copy Markdown
Member

@thockin thockin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surprisingly simple.

This also needs an e2e test (test_e2e.sh).

Comment thread cmd/git-sync/main.go Outdated
Comment thread cmd/git-sync/main.go Outdated
Comment thread cmd/git-sync/main.go
Comment thread cmd/git-sync/main.go
@thockin
Copy link
Copy Markdown
Member

thockin commented Jan 29, 2023

still needs e2e, also

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jan 29, 2023
@thockin thockin added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Jan 30, 2023
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Feb 5, 2023
@trulede
Copy link
Copy Markdown
Contributor Author

trulede commented Feb 5, 2023

@thockin Unless, by some miracle, the e2e tests work ... I will need a bit more time to debug them. In the meantime, please let me know if you have any comments regarding test coverage.

Copy link
Copy Markdown
Member

@thockin thockin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test looks good, but they can't be the same name :)

Here's my accumulated diff, after which tests pass!

diff --git a/test_e2e.sh b/test_e2e.sh
index bc8184e..352e573 100755
--- a/test_e2e.sh
+++ b/test_e2e.sh
@@ -115,12 +115,12 @@ function docker_kill() {
     docker kill "$1" >/dev/null
 }
 
-function docker_signal_process() {
-    if [[ -z "$3" ]]; then
-        echo "usage: $0 <id> <signal> <name>"
+function docker_signal() {
+    if [[ -z "$1" || -z "$2" ]]; then
+        echo "usage: $0 <id> <signal>"
         return 1
     fi
-    docker exec -t $1 "killall -s $2 $3"
+    docker kill -s "$2" "$1" >/dev/null
 }
 
 # E2E_TAG is the tag used for docker builds.  This is needed because docker
@@ -1074,7 +1074,7 @@ function e2e::sync_slow_git_long_timeout() {
 ##############################################
 # Test sync-on-signal with SIGHUP
 ##############################################
-function e2e::sync_on_signal() {
+function e2e::sync_on_signal_name() {
      # First sync
     echo "$FUNCNAME 1" > "$REPO"/file
     git -C "$REPO" commit -qam "$FUNCNAME 1"
@@ -1098,9 +1098,9 @@ function e2e::sync_on_signal() {
     git -C "$REPO" commit -qam "$FUNCNAME 2"
     # Send signal (note --period is 100s)
     CTR=$(docker ps --filter label="git-sync-e2e=$RUNID" --format="{{.ID}}")
-    docker_signal_process "$CTR", SIGHUP, git-sync
+    docker_signal "$CTR" SIGHUP
     wait_for_sync 3
-    aassert_link_exists "$ROOT"/link
+    assert_link_exists "$ROOT"/link
     assert_file_exists "$ROOT"/link/file
     assert_file_eq "$ROOT"/link/file "$FUNCNAME 2"
 }
@@ -1108,7 +1108,7 @@ function e2e::sync_on_signal() {
 ##############################################
 # Test sync-on-signal with 1 (SIGHUP)
 ##############################################
-function e2e::sync_on_signal() {
+function e2e::sync_on_signal_num() {
      # First sync
     echo "$FUNCNAME 1" > "$REPO"/file
     git -C "$REPO" commit -qam "$FUNCNAME 1"
@@ -1132,9 +1132,9 @@ function e2e::sync_on_signal() {
     git -C "$REPO" commit -qam "$FUNCNAME 2"
     # Send signal (note --period is 100s)
     CTR=$(docker ps --filter label="git-sync-e2e=$RUNID" --format="{{.ID}}")
-    docker_signal_process "$CTR", 1, git-sync
+    docker_signal "$CTR" 1
     wait_for_sync 3
-    aassert_link_exists "$ROOT"/link
+    assert_link_exists "$ROOT"/link
     assert_file_exists "$ROOT"/link/file
     assert_file_eq "$ROOT"/link/file "$FUNCNAME 2"
 }

Comment thread cmd/git-sync/main.go Outdated
Comment thread cmd/git-sync/main.go Outdated

// Setup signal notify channel
sigChan := make(chan os.Signal, 1)
signal.Ignore(syncSig)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand the need to ignore the signal?

What I don't like is that a signal which arrives JUST BEFORE we go to sleep will be totally ignored, but a signal 0.0001 seconds later will be received.

Why not just leave the signal active?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I found here was that if the signal was not disabled at this point, then the git process would receive the signal (and terminate). I would like to find a better way of dealing with that ... possibly having a signal handler function (goroutine) catch the signal.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might touch on what is happening (https://stackoverflow.com/questions/33165530/prevent-ctrlc-from-interrupting-exec-command-in-golang).

Perhaps a change here: https://github.com/trulede/git-sync/blob/3bf26ac2be804b7337a4951191fd636f2a0a5325/pkg/cmd/cmd.go#L56 , possibly triggered by putting something in the context (ctx) so that its only active when sigChan is set

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The answer at thar StackOverflow seems promising:

diff --git a/pkg/cmd/cmd.go b/pkg/cmd/cmd.go
index bdadf7d..c869228 100644
--- a/pkg/cmd/cmd.go
+++ b/pkg/cmd/cmd.go
@@ -22,6 +22,7 @@ import (
 	"fmt"
 	"os/exec"
 	"strings"
+	"syscall"
 
 	"github.com/go-logr/logr"
 )
@@ -72,6 +73,9 @@ func runWithStdin(ctx context.Context, log logintf, cwd string, env []string, st
 	cmd.Stdout = outbuf
 	cmd.Stderr = errbuf
 	cmd.Stdin = bytes.NewBufferString(stdin)
+	cmd.SysProcAttr = &syscall.SysProcAttr{
+		Setpgid: true,
+	}
 
 	err := cmd.Run()
 	stdout := strings.TrimSpace(outbuf.String())

Can we craft an e2e case that forces this condition?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be possible to use the slow fetch script (see e2e::sync_fetch_skip_depth_1) and do something like this:

  1. Start the sync. Shell script for a "slow fetch" runs in own process, sleeps for 5, then runs the git command.
  2. Test sleeps 1 second, to be sure that the shell script has started.
  3. Test sends the signal. This signal should not be propagated to the shell script.
  4. Wait for the git-sync to complete, wait 4+3 seconds (remaining wait time + normal 3 seconds).

I will try it tomorrow.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had no luck in getting a signal to propagate from git-sync to the slow_git_clone.sh script. My last attempt looked like:

if [ "$1" != "clone" ]; then
  git "$@"
  exit $?
fi

set +m
signal_handler() {
  echo XXXXX SIGNAL YYYYY
#  exit
}

trap "signal_handler" HUP INT
sleep 5.1
git "$@"

Works perfectly fine from the shell.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

signals in shell are tricky - during an exec() the shell blocks signals. sleep is an exec(). Try this:

sleep 5.1 &
wait

wait is a shell builtin.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

signals in shell are tricky - during an exec() the shell blocks signals. sleep is an exec(). Try this:

sleep 5.1 &
wait

wait is a shell builtin.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cmd.SysProcAttr = &syscall.SysProcAttr{
+		Setpgid: true,
+	}

With Setpgid: false, and running this command:

$ bin/linux_amd64/git-sync --repo https://github.com/trulede/raspberry.docker.git --period 6s -v 5 --root /tmp/foo --sync-on-signal="SIGINT"
....
{"logger":"","ts":"2023-02-15 19:00:48.679291","caller":{"file":"main.go","line":794},"level":1,"msg":"next sync","waitTime":"6s"}
{"logger":"","ts":"2023-02-15 19:00:54.684689","caller":{"file":"main.go","line":715},"level":1,"msg":"syncing repo"}
{"logger":"","ts":"2023-02-15 19:00:54.684872","caller":{"file":"main.go","line":949},"level":0,"msg":"sanity-checking git repo","repo":"/tmp/foo"}
{"logger":"","ts":"2023-02-15 19:00:54.685222","caller":{"file":"cmd.go","line":55},"level":5,"msg":"running command","cwd":"/tmp/foo","cmd":"git rev-parse --show-toplevel"}
{"logger":"","ts":"2023-02-15 19:00:54.688410","caller":{"file":"cmd.go","line":55},"level":5,"msg":"running command","cwd":"/tmp/foo","cmd":"git fsck --no-progress --connectivity-only"}
{"logger":"","ts":"2023-02-15 19:00:54.704816","caller":{"file":"main.go","line":930},"level":0,"msg":"root directory is valid","path":"/tmp/foo"}
{"logger":"","ts":"2023-02-15 19:00:54.705000","caller":{"file":"cmd.go","line":55},"level":5,"msg":"running command","cwd":"/tmp/foo","cmd":"git rev-parse HEAD"}
{"logger":"","ts":"2023-02-15 19:00:54.708792","caller":{"file":"cmd.go","line":55},"level":5,"msg":"running command","cwd":"/tmp/foo","cmd":"git ls-remote -q https://github.com/trulede/raspberry.docker.git HEAD"}
^C{"logger":"","ts":"2023-02-15 19:00:54.988749","caller":{"file":"main.go","line":730},"msg":"too many failures, aborting","error":"Run(git ls-remote -q https://github.com/trulede/raspberry.docker.git HEAD): signal: interrupt: { stdout: \"\", stderr: \"\" }","failCount":1}

... it is possible to "CTRL-C" just as the next sync begins, killing the underlying command (git ls-remote in this case). As a result git-sync exits.

Setting Setpgid: true prevents that behaviour.

Copy link
Copy Markdown
Member

@thockin thockin Feb 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, ^C seems to confuse things. Try this instead

rm -rf /tmp/gs; bin/linux_amd64/git-sync --one-time --repo https://github.com/kubernetes/kubernetes -v 5 --root /tmp/gs --sync-on-signal="SIGINT"

I chose this repo because it is big and slow to clone, making timing easier.

Now hit ^C and see that git does get killed:

{"logger":"","ts":"2023-02-15 10:43:33.392142","caller":{"file":"cmd.go","line":54},"level":5,"msg":"running command","cwd":"","cmd":"git clone --no-checkout https://github.com/kubernetes/kubernetes /tmp/gs"}
^C{"logger":"","ts":"2023-02-15 10:43:34.768060","caller":{"file":"main.go","line":725},"msg":"too many failures, aborting","error":"Run(git clone --no-checkout https://github.com/kubernetes/kubernetes /tmp/gs): signal: interrupt: { stdout: \"\", stderr: \"Cloning into '/tmp/gs'...\\nfetch-pack: unexpected disconnect while reading sideband packet\" }","failCount":1}

But this is not really the representative case. Now try:

rm -rf /tmp/gs; bin/linux_amd64/git-sync --one-time --repo https://github.com/kubernetes/kubernetes -v 5 --root /tmp/gs --sync-on-signal="SIGINT" & sleep 2; echo -ne "\n\nSENDING SIGKILL\n\n"; kill -INT $!; wait

And see

{"logger":"","ts":"2023-02-15 10:47:48.278868","caller":{"file":"cmd.go","line":54},"level":5,"msg":"running command","cwd":"","cmd":"git clone --no-checkout https://github.com/kubernetes/kubernetes /tmp/gs"}


SENDING SIGKILL


It will eventually complete the sync (it takes well over a minute for me :)

So, I think "hit ^C" is actually doing the right thing - you are telling your shell to "stop this process" which REALLY does mean the whole PGroup. ^C is more than "send a signal", and doesn't represent how people will use this.

Comment thread cmd/git-sync/main.go Outdated
Comment thread test_e2e.sh Outdated
Comment thread test_e2e.sh Outdated
Comment thread test_e2e.sh Outdated
Comment thread test_e2e.sh Outdated
Comment thread test_e2e.sh Outdated
Comment thread test_e2e.sh Outdated
Comment thread test_e2e.sh Outdated
@thockin
Copy link
Copy Markdown
Member

thockin commented Feb 13, 2023

While playing to try to repro the signal thing I accumulated this diff - please apply if it makes sense

diff --git a/cmd/git-sync/main.go b/cmd/git-sync/main.go
index c94e0d0..7e6fa21 100644
--- a/cmd/git-sync/main.go
+++ b/cmd/git-sync/main.go
@@ -409,8 +409,12 @@ func main() {
                } else {
                        // sync-on-signal value is a name
                        syncSig = unix.SignalNum(*flSyncOnSignal)
+                       if syncSig == 0 {
+                               // last resort - maybe they said "HUP", meaning "SIGHUP"
+                               syncSig = unix.SignalNum("SIG" + *flSyncOnSignal)
+                       }
                }
-               if unix.SignalName(syncSig) == "" {
+               if syncSig == 0 {
                        handleConfigError(log, true, "ERROR: --sync-on-signal must be a valid signal name or number")
                }
        }
@@ -681,7 +685,10 @@ func main() {
 
        // Setup signal notify channel
        sigChan := make(chan os.Signal, 1)
-       signal.Ignore(syncSig)
+       if syncSig != 0 {
+               log.V(2).Info("installing signal handler", "signal", unix.SignalName(syncSig))
+               signal.Notify(sigChan, syncSig)
+       }
 
        // Craft a function that can be called to refresh credentials when needed.
        refreshCreds := func(ctx context.Context) error {
@@ -787,16 +794,14 @@ func main() {
                cancel()
 
                // Sleep until the next sync. If syncSig is set then the sleep may
-               // be interrupted by that signal. After sleeping, ignore the signal
-               // again (so that it does not interrupt the git sync).
-               signal.Notify(sigChan, syncSig)
+               // be interrupted by that signal.
                t := time.NewTimer(*flPeriod)
                select {
                case <-t.C:
                case <-sigChan:
+                       log.V(2).Info("caught signal", "signal", unix.SignalName(syncSig))
                        t.Stop()
                }
-               signal.Ignore(syncSig)
        }
 }
 

Comment thread cmd/git-sync/main.go Outdated
Copy link
Copy Markdown
Member

@thockin thockin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than these two points it looks good to me!

Comment thread pkg/cmd/cmd.go Outdated
Comment thread cmd/git-sync/main.go Outdated
@thockin
Copy link
Copy Markdown
Member

thockin commented Feb 15, 2023

Also, you probably want to squash down to 1 commit with a good message and un "draft" this PR :)

@k8s-ci-robot k8s-ci-robot added the do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. label Feb 15, 2023
@thockin
Copy link
Copy Markdown
Member

thockin commented Feb 15, 2023

It looks like your rebase accidentally pulled another commit in - not sure why? Try a full get fetch upstream; git rebase -i upstream/master ?

Operation:
git-sync --repo https://github.com/kubernetes/kubernetes --sync-on-signal SIGHUP
git-sync --repo https://github.com/kubernetes/kubernetes --sync-on-signal HUP
git-sync --repo https://github.com/kubernetes/kubernetes --sync-on-signal 1

Signals can be sent to docker containers with cmd:
docker kill -signal SIGHUP <Container ID>
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. label Feb 15, 2023
@trulede trulede marked this pull request as ready for review February 15, 2023 19:52
@k8s-ci-robot k8s-ci-robot requested a review from nan-yu February 15, 2023 19:52
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 15, 2023
Copy link
Copy Markdown
Member

@thockin thockin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 15, 2023
@k8s-ci-robot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: thockin, trulede

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 15, 2023
@thockin
Copy link
Copy Markdown
Member

thockin commented Feb 15, 2023

Should we close #660 now? Or leave it open in hopes that someone wants to do an HTTP version (which needs its own discussion :)

@k8s-ci-robot k8s-ci-robot merged commit 2b3f1bc into kubernetes:master Feb 15, 2023
@trulede
Copy link
Copy Markdown
Contributor Author

trulede commented Feb 15, 2023

Should we close #660 now? Or leave it open in hopes that someone wants to do an HTTP version (which needs its own discussion :)

I just closed it. Its easy enough to implement a sidecar to terminate HTTP and send a signal towards git-sync 💡

Thanks for all the help with the PR!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants