Allow sync wait period to be truncated by an HUP signal.#664
Allow sync wait period to be truncated by an HUP signal.#664k8s-ci-robot merged 1 commit intokubernetes:masterfrom
Conversation
|
Welcome @trulede! |
thockin
left a comment
There was a problem hiding this comment.
Surprisingly simple.
This also needs an e2e test (test_e2e.sh).
|
still needs e2e, also |
|
@thockin Unless, by some miracle, the e2e tests work ... I will need a bit more time to debug them. In the meantime, please let me know if you have any comments regarding test coverage. |
thockin
left a comment
There was a problem hiding this comment.
test looks good, but they can't be the same name :)
Here's my accumulated diff, after which tests pass!
diff --git a/test_e2e.sh b/test_e2e.sh
index bc8184e..352e573 100755
--- a/test_e2e.sh
+++ b/test_e2e.sh
@@ -115,12 +115,12 @@ function docker_kill() {
docker kill "$1" >/dev/null
}
-function docker_signal_process() {
- if [[ -z "$3" ]]; then
- echo "usage: $0 <id> <signal> <name>"
+function docker_signal() {
+ if [[ -z "$1" || -z "$2" ]]; then
+ echo "usage: $0 <id> <signal>"
return 1
fi
- docker exec -t $1 "killall -s $2 $3"
+ docker kill -s "$2" "$1" >/dev/null
}
# E2E_TAG is the tag used for docker builds. This is needed because docker
@@ -1074,7 +1074,7 @@ function e2e::sync_slow_git_long_timeout() {
##############################################
# Test sync-on-signal with SIGHUP
##############################################
-function e2e::sync_on_signal() {
+function e2e::sync_on_signal_name() {
# First sync
echo "$FUNCNAME 1" > "$REPO"/file
git -C "$REPO" commit -qam "$FUNCNAME 1"
@@ -1098,9 +1098,9 @@ function e2e::sync_on_signal() {
git -C "$REPO" commit -qam "$FUNCNAME 2"
# Send signal (note --period is 100s)
CTR=$(docker ps --filter label="git-sync-e2e=$RUNID" --format="{{.ID}}")
- docker_signal_process "$CTR", SIGHUP, git-sync
+ docker_signal "$CTR" SIGHUP
wait_for_sync 3
- aassert_link_exists "$ROOT"/link
+ assert_link_exists "$ROOT"/link
assert_file_exists "$ROOT"/link/file
assert_file_eq "$ROOT"/link/file "$FUNCNAME 2"
}
@@ -1108,7 +1108,7 @@ function e2e::sync_on_signal() {
##############################################
# Test sync-on-signal with 1 (SIGHUP)
##############################################
-function e2e::sync_on_signal() {
+function e2e::sync_on_signal_num() {
# First sync
echo "$FUNCNAME 1" > "$REPO"/file
git -C "$REPO" commit -qam "$FUNCNAME 1"
@@ -1132,9 +1132,9 @@ function e2e::sync_on_signal() {
git -C "$REPO" commit -qam "$FUNCNAME 2"
# Send signal (note --period is 100s)
CTR=$(docker ps --filter label="git-sync-e2e=$RUNID" --format="{{.ID}}")
- docker_signal_process "$CTR", 1, git-sync
+ docker_signal "$CTR" 1
wait_for_sync 3
- aassert_link_exists "$ROOT"/link
+ assert_link_exists "$ROOT"/link
assert_file_exists "$ROOT"/link/file
assert_file_eq "$ROOT"/link/file "$FUNCNAME 2"
}|
|
||
| // Setup signal notify channel | ||
| sigChan := make(chan os.Signal, 1) | ||
| signal.Ignore(syncSig) |
There was a problem hiding this comment.
I'm not sure I understand the need to ignore the signal?
What I don't like is that a signal which arrives JUST BEFORE we go to sleep will be totally ignored, but a signal 0.0001 seconds later will be received.
Why not just leave the signal active?
There was a problem hiding this comment.
What I found here was that if the signal was not disabled at this point, then the git process would receive the signal (and terminate). I would like to find a better way of dealing with that ... possibly having a signal handler function (goroutine) catch the signal.
There was a problem hiding this comment.
This might touch on what is happening (https://stackoverflow.com/questions/33165530/prevent-ctrlc-from-interrupting-exec-command-in-golang).
Perhaps a change here: https://github.com/trulede/git-sync/blob/3bf26ac2be804b7337a4951191fd636f2a0a5325/pkg/cmd/cmd.go#L56 , possibly triggered by putting something in the context (ctx) so that its only active when sigChan is set
There was a problem hiding this comment.
The answer at thar StackOverflow seems promising:
diff --git a/pkg/cmd/cmd.go b/pkg/cmd/cmd.go
index bdadf7d..c869228 100644
--- a/pkg/cmd/cmd.go
+++ b/pkg/cmd/cmd.go
@@ -22,6 +22,7 @@ import (
"fmt"
"os/exec"
"strings"
+ "syscall"
"github.com/go-logr/logr"
)
@@ -72,6 +73,9 @@ func runWithStdin(ctx context.Context, log logintf, cwd string, env []string, st
cmd.Stdout = outbuf
cmd.Stderr = errbuf
cmd.Stdin = bytes.NewBufferString(stdin)
+ cmd.SysProcAttr = &syscall.SysProcAttr{
+ Setpgid: true,
+ }
err := cmd.Run()
stdout := strings.TrimSpace(outbuf.String())Can we craft an e2e case that forces this condition?
There was a problem hiding this comment.
It might be possible to use the slow fetch script (see e2e::sync_fetch_skip_depth_1) and do something like this:
- Start the sync. Shell script for a "slow fetch" runs in own process, sleeps for 5, then runs the git command.
- Test sleeps 1 second, to be sure that the shell script has started.
- Test sends the signal. This signal should not be propagated to the shell script.
- Wait for the git-sync to complete, wait 4+3 seconds (remaining wait time + normal 3 seconds).
I will try it tomorrow.
There was a problem hiding this comment.
I had no luck in getting a signal to propagate from git-sync to the slow_git_clone.sh script. My last attempt looked like:
if [ "$1" != "clone" ]; then
git "$@"
exit $?
fi
set +m
signal_handler() {
echo XXXXX SIGNAL YYYYY
# exit
}
trap "signal_handler" HUP INT
sleep 5.1
git "$@"
Works perfectly fine from the shell.
There was a problem hiding this comment.
signals in shell are tricky - during an exec() the shell blocks signals. sleep is an exec(). Try this:
sleep 5.1 &
wait
wait is a shell builtin.
There was a problem hiding this comment.
signals in shell are tricky - during an exec() the shell blocks signals. sleep is an exec(). Try this:
sleep 5.1 &
wait
wait is a shell builtin.
There was a problem hiding this comment.
cmd.SysProcAttr = &syscall.SysProcAttr{ + Setpgid: true, + }
With Setpgid: false, and running this command:
$ bin/linux_amd64/git-sync --repo https://github.com/trulede/raspberry.docker.git --period 6s -v 5 --root /tmp/foo --sync-on-signal="SIGINT"
....
{"logger":"","ts":"2023-02-15 19:00:48.679291","caller":{"file":"main.go","line":794},"level":1,"msg":"next sync","waitTime":"6s"}
{"logger":"","ts":"2023-02-15 19:00:54.684689","caller":{"file":"main.go","line":715},"level":1,"msg":"syncing repo"}
{"logger":"","ts":"2023-02-15 19:00:54.684872","caller":{"file":"main.go","line":949},"level":0,"msg":"sanity-checking git repo","repo":"/tmp/foo"}
{"logger":"","ts":"2023-02-15 19:00:54.685222","caller":{"file":"cmd.go","line":55},"level":5,"msg":"running command","cwd":"/tmp/foo","cmd":"git rev-parse --show-toplevel"}
{"logger":"","ts":"2023-02-15 19:00:54.688410","caller":{"file":"cmd.go","line":55},"level":5,"msg":"running command","cwd":"/tmp/foo","cmd":"git fsck --no-progress --connectivity-only"}
{"logger":"","ts":"2023-02-15 19:00:54.704816","caller":{"file":"main.go","line":930},"level":0,"msg":"root directory is valid","path":"/tmp/foo"}
{"logger":"","ts":"2023-02-15 19:00:54.705000","caller":{"file":"cmd.go","line":55},"level":5,"msg":"running command","cwd":"/tmp/foo","cmd":"git rev-parse HEAD"}
{"logger":"","ts":"2023-02-15 19:00:54.708792","caller":{"file":"cmd.go","line":55},"level":5,"msg":"running command","cwd":"/tmp/foo","cmd":"git ls-remote -q https://github.com/trulede/raspberry.docker.git HEAD"}
^C{"logger":"","ts":"2023-02-15 19:00:54.988749","caller":{"file":"main.go","line":730},"msg":"too many failures, aborting","error":"Run(git ls-remote -q https://github.com/trulede/raspberry.docker.git HEAD): signal: interrupt: { stdout: \"\", stderr: \"\" }","failCount":1}
... it is possible to "CTRL-C" just as the next sync begins, killing the underlying command (git ls-remote in this case). As a result git-sync exits.
Setting Setpgid: true prevents that behaviour.
There was a problem hiding this comment.
Again, ^C seems to confuse things. Try this instead
rm -rf /tmp/gs; bin/linux_amd64/git-sync --one-time --repo https://github.com/kubernetes/kubernetes -v 5 --root /tmp/gs --sync-on-signal="SIGINT"
I chose this repo because it is big and slow to clone, making timing easier.
Now hit ^C and see that git does get killed:
{"logger":"","ts":"2023-02-15 10:43:33.392142","caller":{"file":"cmd.go","line":54},"level":5,"msg":"running command","cwd":"","cmd":"git clone --no-checkout https://github.com/kubernetes/kubernetes /tmp/gs"}
^C{"logger":"","ts":"2023-02-15 10:43:34.768060","caller":{"file":"main.go","line":725},"msg":"too many failures, aborting","error":"Run(git clone --no-checkout https://github.com/kubernetes/kubernetes /tmp/gs): signal: interrupt: { stdout: \"\", stderr: \"Cloning into '/tmp/gs'...\\nfetch-pack: unexpected disconnect while reading sideband packet\" }","failCount":1}
But this is not really the representative case. Now try:
rm -rf /tmp/gs; bin/linux_amd64/git-sync --one-time --repo https://github.com/kubernetes/kubernetes -v 5 --root /tmp/gs --sync-on-signal="SIGINT" & sleep 2; echo -ne "\n\nSENDING SIGKILL\n\n"; kill -INT $!; wait
And see
{"logger":"","ts":"2023-02-15 10:47:48.278868","caller":{"file":"cmd.go","line":54},"level":5,"msg":"running command","cwd":"","cmd":"git clone --no-checkout https://github.com/kubernetes/kubernetes /tmp/gs"}
SENDING SIGKILL
It will eventually complete the sync (it takes well over a minute for me :)
So, I think "hit ^C" is actually doing the right thing - you are telling your shell to "stop this process" which REALLY does mean the whole PGroup. ^C is more than "send a signal", and doesn't represent how people will use this.
|
While playing to try to repro the signal thing I accumulated this diff - please apply if it makes sense diff --git a/cmd/git-sync/main.go b/cmd/git-sync/main.go
index c94e0d0..7e6fa21 100644
--- a/cmd/git-sync/main.go
+++ b/cmd/git-sync/main.go
@@ -409,8 +409,12 @@ func main() {
} else {
// sync-on-signal value is a name
syncSig = unix.SignalNum(*flSyncOnSignal)
+ if syncSig == 0 {
+ // last resort - maybe they said "HUP", meaning "SIGHUP"
+ syncSig = unix.SignalNum("SIG" + *flSyncOnSignal)
+ }
}
- if unix.SignalName(syncSig) == "" {
+ if syncSig == 0 {
handleConfigError(log, true, "ERROR: --sync-on-signal must be a valid signal name or number")
}
}
@@ -681,7 +685,10 @@ func main() {
// Setup signal notify channel
sigChan := make(chan os.Signal, 1)
- signal.Ignore(syncSig)
+ if syncSig != 0 {
+ log.V(2).Info("installing signal handler", "signal", unix.SignalName(syncSig))
+ signal.Notify(sigChan, syncSig)
+ }
// Craft a function that can be called to refresh credentials when needed.
refreshCreds := func(ctx context.Context) error {
@@ -787,16 +794,14 @@ func main() {
cancel()
// Sleep until the next sync. If syncSig is set then the sleep may
- // be interrupted by that signal. After sleeping, ignore the signal
- // again (so that it does not interrupt the git sync).
- signal.Notify(sigChan, syncSig)
+ // be interrupted by that signal.
t := time.NewTimer(*flPeriod)
select {
case <-t.C:
case <-sigChan:
+ log.V(2).Info("caught signal", "signal", unix.SignalName(syncSig))
t.Stop()
}
- signal.Ignore(syncSig)
}
}
|
thockin
left a comment
There was a problem hiding this comment.
Other than these two points it looks good to me!
|
Also, you probably want to squash down to 1 commit with a good message and un "draft" this PR :) |
|
It looks like your rebase accidentally pulled another commit in - not sure why? Try a full |
Operation: git-sync --repo https://github.com/kubernetes/kubernetes --sync-on-signal SIGHUP git-sync --repo https://github.com/kubernetes/kubernetes --sync-on-signal HUP git-sync --repo https://github.com/kubernetes/kubernetes --sync-on-signal 1 Signals can be sent to docker containers with cmd: docker kill -signal SIGHUP <Container ID>
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: thockin, trulede The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
Should we close #660 now? Or leave it open in hopes that someone wants to do an HTTP version (which needs its own discussion :) |
I just closed it. Its easy enough to implement a sidecar to terminate HTTP and send a signal towards git-sync 💡 Thanks for all the help with the PR! |
For discussion related to issue #660
The HUP signal is used so that normal use of SIGINT (i.e. CTRL-C) is not affected.