Incorrect message "Auto packing the repository in background for optimum performance" #2221

Closed
YngveNPettersen opened this issue Jun 9, 2019 · 7 comments

Comments

@YngveNPettersen

  • [x] I was not able to find an open or closed issue matching what I'm seeing

Setup

  • Which version of Git for Windows are you using? Is it 32-bit or 64-bit?
git version 2.21.0.windows.1
cpu: x86_64
built from commit: 2481c4cbe949856f270a3ee80c802f5dd89381aa
sizeof-long: 4
sizeof-size_t: 8

  • Which version of Windows are you running? Vista, 7, 8, 10? Is it 32-bit or 64-bit?

Win 7 and Win10, both 64-bit


  • What options did you set as part of the installation? Or did you choose the
    defaults?

Editor Option: VIM
Custom Editor Path:
Path Option: BashOnly
Plink Path: C:\Program Files\PuTTY\plink.exe
SSH Option: Plink
CURL Option: OpenSSL
CRLF Option: CRLFAlways
Bash Terminal Option: MinTTY
Performance Tweaks FSCache: Enabled
Use Credential Manager: Disabled
Enable Symlinks: Disabled

  • Any other interesting things about your environment that might be related
    to the issue you're seeing?

Big repos

Details

  • Which terminal/shell are you running Git from? e.g Bash/CMD/PowerShell/other

Git Bash

Example: a rebase of a branch with a lot of commits; e.g from one chromium branch to another.
  • What did you expect to occur after running these commands?

Complete successfully

  • What actually happened instead?

Occasionally, especially after a big rebase, the command shell locks up for a while with the message "Auto packing the repository in background for optimum performance"

Considering that the current operation is blocked by this operation, and no new commands can be run, I think the phrase "in background" is incorrect and should be removed. Alternatively, the operation should be spawned into a true background operation.

  • If the problem was occurring with a specific repository, can you provide the
    URL to that repository to help us with testing?

I see this on most repos with any kind of local commit activity

@dscho
Member

dscho commented Oct 16, 2019

Considering that the current operation is blocked by this operation, and no new commands can be run, I think the phrase "in background" is incorrect and should be removed. Alternatively, the operation should be spawned into a true background operation.

I agree! And I would much prefer the true background operation.

Sadly, this seems to be not so trivial, as we cannot daemonize() on Windows: the daemonize() function uses the fork() syscall for which there is no equivalent in the Win32 API.

So what to do about it?

My best idea is to take a step back and reflect what the daemonize() thing accomplishes (see also this StackOverflow post):

  1. it spawns a new process
  2. crucially, it starts a new session, and with it a new process group, via setsid()

That second part is important: on Linux/Unix/macOS, this makes sure that if you terminate the calling process, it won't terminate the spawned process with it.
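The two steps can be sketched in a few lines of POSIX-only C. This is a minimal illustration, not Git's daemonize(): the helper name child_leaves_session() is made up, and the parent waits for the child only so that the outcome can be reported, whereas a daemonizing parent would simply exit.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child, let it call setsid(), and report
 * whether the child ended up in a session different from the parent's.
 * Returns 1 if it did, 0 if it did not, -1 on error.
 */
static int child_leaves_session(void)
{
    pid_t parent_sid = getsid(0);
    pid_t pid = fork();             /* step 1: spawn a new process */
    int status;

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* step 2: new session, and with it a new process group */
        if (setsid() < 0)
            _exit(2);
        _exit(getsid(0) == parent_sid ? 1 : 0);
    }
    /* Parent: collect the child's verdict from its exit code */
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    if (WEXITSTATUS(status) == 0)
        return 1;
    if (WEXITSTATUS(status) == 1)
        return 0;
    return -1;
}
```

setsid() succeeds here because a freshly forked child is never a process group leader; that is also why the fork() has to come first.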

On Windows, the behavior is completely different. If you call TerminateProcess() on a process that spawned a child process, that child process will not be terminated.

Note: there is a way to terminate all console processes that are attached to the same console, which is somewhat similar in spirit, but it does not help us here, as Git often spawns GUI processes (e.g. Git GUI), and the caller of Git is typically attached to the same console.

To work around Git's expectations to that end, we have specific code in exit-process.h (which has not made it into git.git yet, it exists only in Git for Windows) to traverse the process tree so as to simulate the same process group idea as on Linux.

In particular, the process ID and the parent process ID of each process are looked up, and when we want to terminate a process, the process tree starting at that given process is enumerated and terminated.

Now, what happens if a process spawns a child process which in turn spawns a grand child process, and then the child process exits without terminating the spawned grand child process? Then the process tree is broken, and that grand child process is sort of "daemonized".
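To make the "broken tree" effect concrete, here is a small, portable sketch of such an enumeration over a table of (PID, parent PID) pairs. It is not the code in exit-process.h; struct proc_entry and collect_tree() are names invented for this illustration, and a real implementation would first have to obtain the table from the operating system.

```c
#include <assert.h>
#include <stddef.h>

/* One row of a process snapshot: a process ID and its parent's ID. */
struct proc_entry {
    int pid;
    int ppid;
};

/* Return 1 if pid is already among the first count entries of list. */
static int already_listed(const int *list, size_t count, int pid)
{
    for (size_t i = 0; i < count; i++)
        if (list[i] == pid)
            return 1;
    return 0;
}

/*
 * Collect root and every process reachable from it through parent links
 * into out[] (at most cap entries). Returns the number collected.
 */
static size_t collect_tree(const struct proc_entry *table, size_t n,
                           int root, int *out, size_t cap)
{
    size_t count = 0, next = 0;

    assert(out != NULL || cap == 0);
    if (cap == 0)
        return 0;
    out[count++] = root;
    /* Breadth-first walk: out[] doubles as the work queue. */
    while (next < count) {
        int parent = out[next++];
        for (size_t i = 0; i < n && count < cap; i++)
            if (table[i].ppid == parent &&
                !already_listed(out, count, table[i].pid))
                out[count++] = table[i].pid;
    }
    return count;
}
```

With the rows (100, 1), (200, 100), (300, 200), a walk from 100 finds all three processes. Once process 200 has exited and its row is gone, the same walk finds only 100: process 300 is no longer reachable, which is the "daemonized" state described above.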

So this is my idea for how we could simulate daemonize() on Windows: spawn a new process with the exact same command-line parameters, except that --detach is removed, and then exit with success.

In other words,

  1. A copy of the original argc/argv needs to be preserved, for use by daemonize(). This can be easily accomplished by using a struct argv_array saved = ARGV_ARRAY_INIT; and then calling argv_array_pushl(&saved, "git", "-c", "gc.autodetach=0"); argv_array_pushv(&saved, argv); before parsing the arguments in cmd_gc(). (Note the gc.autodetach=0 to force the process not to detach.)
  2. In daemonize(), in a clause guarded by #ifdef GIT_WINDOWS_NATIVE, we have to call mingw_spawnvpe(), taking pains to imitate sanitize_stdfds() by passing appropriate file descriptors for stdin, stdout and stderr: fdin = open("/dev/null", O_RDONLY, 0); fdout = open("/dev/null", O_WRONLY, 0); fderr = open("/dev/null", O_WRONLY, 0); You really need to have three different ones, as the spawned process might close, say, stdin, but still might write to stderr.
  3. Call exit() after spawning, using an exit code that reflects whether the child process was spawned successfully.
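Steps 1 and 2 can be tried out without Git's internals. The sketch below replaces struct argv_array with a plain malloc()ed array and uses POSIX open(); build_respawn_argv() and open_null_fds() are hypothetical names, and the call to mingw_spawnvpe() itself is deliberately left out because its signature is not given here.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for step 1: return a NULL-terminated copy of
 * argv prefixed with "git -c gc.autodetach=0". Only the array is
 * allocated (the strings are shared); returns NULL on failure.
 */
static const char **build_respawn_argv(int argc, const char **argv)
{
    static const char *prefix[] = { "git", "-c", "gc.autodetach=0" };
    size_t nprefix = sizeof(prefix) / sizeof(prefix[0]);
    const char **saved;

    if (argc < 0)
        return NULL;
    saved = malloc((nprefix + (size_t)argc + 1) * sizeof(*saved));
    if (!saved)
        return NULL;
    memcpy(saved, prefix, sizeof(prefix));
    for (int i = 0; i < argc; i++)
        saved[nprefix + i] = argv[i];
    saved[nprefix + argc] = NULL;
    return saved;
}

/*
 * Hypothetical stand-in for the descriptors of step 2: open three
 * distinct descriptors on /dev/null, one each for stdin, stdout and
 * stderr. Returns 0 on success, -1 on failure.
 */
static int open_null_fds(int *fdin, int *fdout, int *fderr)
{
    *fdin = open("/dev/null", O_RDONLY);
    *fdout = open("/dev/null", O_WRONLY);
    *fderr = open("/dev/null", O_WRONLY);
    if (*fdin < 0 || *fdout < 0 || *fderr < 0) {
        /* Do not leak whichever descriptors did open */
        if (*fdin >= 0)
            close(*fdin);
        if (*fdout >= 0)
            close(*fdout);
        if (*fderr >= 0)
            close(*fderr);
        return -1;
    }
    return 0;
}
```

For an original argv of "gc", "--auto", the saved array reads git -c gc.autodetach=0 gc --auto, so the respawned process runs the same garbage collection but does not try to detach a second time.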

An alternative would be to construct a child_process with those new arguments and then call start_command(). However, we would then have to force the cleanup_children() function into not waiting for that process, which boils down to using the clean_on_exit/RUN_CLEAN_ON_EXIT flag.

Yet another, completely different alternative would be to change things on the caller's side. But that would be slightly ugly: all callers that run auto-gc would have to be touched, and we would rely on the different behavior of exit() on Windows vs Linux, I think, where it would wait for child processes to exit on Linux but not on Windows.

Another downside of that alternative would be that it would work only for auto-gc, not for git daemon.

Speaking of git daemon: it would need the exact same treatment, as it also calls daemonize(). In this case, however, we have to edit the parameters to prevent the spawned process from trying to daemonize, rather than defining a config setting. In other words, just before passing the saved arguments, any --detach needs to be removed. There is no convenient API function for that yet, so it would need to be done like this:

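int i, j;
/* Compact the array in place, skipping every exact "--detach" argument */
for (i = j = 0; i < saved.argc; i++)
    if (strcmp(saved.argv[i], "--detach"))
        saved.argv[j++] = saved.argv[i];
/* Keep the array NULL-terminated before it is passed on as an argv */
saved.argv[j] = NULL;
saved.argc = j;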

Thankfully, it is that easy because git daemon does all its command-line parsing itself (not using parse-options), and all parameters that take values are of the form --name=value (i.e. not --name value), therefore it is not possible that a parameter's value --detach would be mistaken for a parameter (think git grep -e --detach: in this case, --detach is not a parameter, but the value of the -e parameter).
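The filtering loop can be exercised on its own with a plain NULL-terminated array instead of struct argv_array. The function name strip_detach() is made up for this sketch; it also re-terminates the array, which matters as soon as the result is handed to a spawn function that scans for the trailing NULL.

```c
#include <assert.h>
#include <string.h>

/*
 * Remove every argument that is exactly "--detach" from a
 * NULL-terminated argv holding argc entries, in place.
 * Returns the new argument count.
 */
static int strip_detach(const char **argv, int argc)
{
    int i, j;

    for (i = j = 0; i < argc; i++)
        if (strcmp(argv[i], "--detach"))
            argv[j++] = argv[i];
    argv[j] = NULL; /* drop stale pointers behind the new end */
    return j;
}
```

Because strcmp() only matches the whole string, an argument such as --base-path=/srv/--detach is left alone; only a stand-alone --detach is dropped.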

Note: I suspect that even what I wrote above is not yet enough; we probably have to teach mingw_spawnvpe() a trick where it can optionally use the flag CREATE_NEW_PROCESS_GROUP and/or DETACHED_PROCESS to prevent a Ctrl+C in a CMD from terminating the spawned process (see https://docs.microsoft.com/en-us/windows/win32/procthread/process-creation-flags for a thorough explanation).
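A rough sketch of what passing those flags could look like follows. It does not touch mingw_spawnvpe(); detach_creation_flags() and spawn_detached() are hypothetical helpers, the Windows part calls CreateProcessA() directly, and the #else branch only repeats the documented flag values so that the flag helper also compiles on other platforms.

```c
#include <assert.h>

#ifdef _WIN32
#include <windows.h>
#else
/* Documented Win32 values, repeated so the helper builds elsewhere */
#define CREATE_NEW_PROCESS_GROUP 0x00000200
#define DETACHED_PROCESS         0x00000008
#endif

/* Hypothetical helper: combine the two process creation flags. */
static unsigned long detach_creation_flags(int new_group, int no_console)
{
    unsigned long flags = 0;

    if (new_group)   /* child becomes the root of a new process group */
        flags |= CREATE_NEW_PROCESS_GROUP;
    if (no_console)  /* child does not inherit the parent's console */
        flags |= DETACHED_PROCESS;
    return flags;
}

#ifdef _WIN32
/*
 * Hypothetical helper: start cmdline (which must be a writable buffer)
 * with the requested flags and do not wait for it.
 * Returns 0 on success, -1 on failure.
 */
static int spawn_detached(char *cmdline, int new_group, int no_console)
{
    STARTUPINFOA si;
    PROCESS_INFORMATION pi;

    ZeroMemory(&si, sizeof(si));
    si.cb = sizeof(si);
    if (!CreateProcessA(NULL, cmdline, NULL, NULL, FALSE,
                        (DWORD)detach_creation_flags(new_group, no_console),
                        NULL, NULL, &si, &pi))
        return -1;
    /* The handles are not needed; closing them does not stop the child */
    CloseHandle(pi.hThread);
    CloseHandle(pi.hProcess);
    return 0;
}
#endif
```

Whether one flag or both are needed for auto-gc would have to be found out by testing Ctrl+C in a CMD window, as suggested further down.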

I will be honest: this is a moderately challenging project even for me, and I do not currently have the resources to tackle it. Are you up for it, @YngveNPettersen?

If so, I would strongly recommend that you

  1. install Git for Windows' SDK,
  2. sdk cd git,
  3. build Git via make -j$(nproc)

to get started, then adding some code to git gc for testing this better. I could imagine, for example, that a warning("about to sleep"); sleep(10); warning("yawn!"); in the auto-gc, non-detached case would help with testing Ctrl+C in a CMD. You also might want to add another warning("spawned child"); sleep(10); in daemonize(), to have enough time to press Ctrl+C 😄

Speaking of CMD: if you want to test things in-place (i.e. without running make install and possibly messing up your SDK), just after building, you can use .\git --exec-path=C:\git-sdk-64\usr\src\git gc --auto.

@mfriedrich74

mfriedrich74 commented Oct 16, 2019 via email

@dscho
Member

dscho commented Oct 16, 2019

A similar topic was discussed here: https://stackoverflow.com/questions/604522/performing-equivalent-of-kill-process-tree-in-c-on-windows

There was a most crucial line in there: "You can enforce children processes to stay within the Job, by setting appropriate attribute."

In other words, you would have to change pretty much every caller.

The way the Windows port of Git works is that it tries to be minimally intrusive, often emulating POSIX functions using Win32 API functions. I don't think that would work here, in particular because we cannot control how git.exe itself is called. So that puts a pretty hard stop to the idea of switching to the Jobs API, I would think.

@mfriedrich74

mfriedrich74 commented Oct 16, 2019 via email

@mfriedrich74

mfriedrich74 commented Oct 16, 2019

The idea I had was to create a job at the central entry point (there aren't multiple, correct?) of the git wrapper. Then in daemonize() you break off from the job and create a new one.

(I do not know much about how the Git source is designed. I just wanted to contribute an idea. I posted this alternative idea because the above-mentioned parent PID handling worried me, as I know that it is not reliable. PIDs can be reused by Windows once a process is terminated and all process handles are closed. I think that to guarantee a PID belongs to the desired process you must hold a valid process handle to it. But... maybe the Git code considers all this already. So, please ignore me if that's the case.)

@dscho
Member

dscho commented Oct 18, 2019

You are right about all those PID problems, but I am afraid of the amount of complexity the jobs idea would bring. And I am certainly not willing to implement that.

The idea I presented here is much more contained, and changes Git's source code only minimally. It therefore has the best chance of giving us the solution we so desire.

@dscho
Member

dscho commented Oct 15, 2021

Closing this stale ticket.

@dscho dscho closed this as completed Oct 15, 2021