Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pull: release handles to pack files before potentially gc'ing #1032

Closed

Conversation

dscho
Copy link
Member

@dscho dscho commented Sep 8, 2021

It was reported in Git for Windows that there are still some code paths where .pack files are still open when a garbage collection is triggered. This time, the affected command is git pull.

The fix is relatively trivial but uncovers a bug in the commit-graph code, which we have to fix first.

Cc: Derrick Stolee dstolee@microsoft.com

The slab has information about the commit graph. That means that it is
meaningless (and even misleading) when the commit graph was closed.

This seems not to matter currently, but we're about to fix a
Windows-specific bug where `git pull` does not close the object store
before fetching (risking that an implicit auto-gc fails to remove the
now-obsolete pack file(s)), and once we have that bug fix in place, it
does matter: after that bug fix, we will open the object store, do some
stuff with it, then close it, fetch, and then open it again, and do more
stuff. If we close the commit graph without releasing the corresponding
slab, we're hit by a symptom like this in t5520.19:

	BUG: commit-reach.c:85: bad generation skip 9223372036854775807
	> 3 at 5cd378271655d43a3b4477520014f02213ad1546

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
On Windows, files cannot be removed nor renamed if there are still
handles held by a process. To remedy that, we try to release all open
handles to any `.pack` file before e.g. repacking (which would want to
remove the original `.pack` file(s) after it is done).

Since the `read_cache_unmerged()` and/or the `get_oid()` call in `git
pull` can cause `.pack` files to be opened, we need to release the open
handles before calling `git fetch`: the latter process might want to
spawn an auto-gc, which in turn might want to repack the objects.

This commit is similar in spirit to 5bdece0 (gc/repack: release
packs when needed, 2018-12-15).

This fixes git-for-windows#3336.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
@dscho
Copy link
Member Author

dscho commented Sep 8, 2021

/submit

@dscho dscho self-assigned this Sep 8, 2021
@gitgitgadget
Copy link

gitgitgadget bot commented Sep 8, 2021

Submitted as pull.1032.git.1631089771.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git pr-1032/dscho/release-packs-before-fetching-v1

To fetch this version to local tag pr-1032/dscho/release-packs-before-fetching-v1:

git fetch --no-tags https://github.com/gitgitgadget/git tag pr-1032/dscho/release-packs-before-fetching-v1

@@ -26,6 +26,7 @@
#include "wt-status.h"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> On Windows, files cannot be removed nor renamed if there are still
> handles held by a process. To remedy that, we try to release all open
> handles to any `.pack` file before e.g. repacking (which would want to
> remove the original `.pack` file(s) after it is done).
>
> Since the `read_cache_unmerged()` and/or the `get_oid()` call in `git
> pull` can cause `.pack` files to be opened, we need to release the open
> handles before calling `git fetch`: the latter process might want to
> spawn an auto-gc, which in turn might want to repack the objects.
>
> This commit is similar in spirit to 5bdece0d705 (gc/repack: release
> packs when needed, 2018-12-15).
>
> This fixes https://github.com/git-for-windows/git/issues/3336.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  builtin/pull.c | 2 ++
>  1 file changed, 2 insertions(+)

After run_fetch() returns, we then go on to access objects from our
object store (that's natural---after all, we fetched because we
wanted to access the objects we have plus objects they have to offer
to us) and the object store is transparently reopened for us.  Which
may make a bit confusing API to newbies, but is an easy one to use,
once we get used to it.

A few general comments.

 * Right now, run_fetch() does not do anything that needs to access
   objects, but there is no reason to expect that will continue to
   be the case, and once we added an call to an innocuous helper
   function that happens to access objects, the close_object_store()
   call made by the caller before run_fetch() was called becomes
   moot.  The more we can delay the call to close_object_store(),
   the better.  And the absolute last point we can defer the call to
   close_object_store() is where immediately before run_fetch() calls
   run_command_v_opt() to spawn "git fetch".

 * Which makes me wonder if we may be better off having a bit in the
   flags word the run_command() API takes to make a call to
   close_object_store() for us.  run_fetch() that uses the
   run_command API can use that bit without having to worry about
   making a call to close_object_store() itself and when.

 * Hits from "git grep -A2 close_object_store()" shows a notable
   pattern.  Before run_auto_maintenance(), we often see a call to
   it.  It almost feels (but I didn't dig it deeper) that a call to
   run_auto_maintenance() that does not call close_object_store()
   before doing so is a bug (there is one in builtin/commit.c).

 * Which in turn makes me wonder if these many calls to close before
   run_auto_maintenance() should be moved to run_auto_maintenance()
   itself (which in turn can use the new flags bit in the
   run_command() API).

Sprinkling yet another call to close_object_store() as we discover
need for doing so like this patch does is certainly OK, but as we
add new hooks and higher-level commands, it will get messier and
messier.  It probably may make sense to go in and clean it up,
hopefully guided by the above observations, either before this
"fix", or soon after it graduates before we forget.

Will queue, but will not merge down to 'next' until I hear an Ack on
the commit-graph stuff.

Thanks.



> diff --git a/builtin/pull.c b/builtin/pull.c
> index 3e13f810843..d9f0156d969 100644
> --- a/builtin/pull.c
> +++ b/builtin/pull.c
> @@ -26,6 +26,7 @@
>  #include "wt-status.h"
>  #include "commit-reach.h"
>  #include "sequencer.h"
> +#include "packfile.h"
>  
>  /**
>   * Parses the value of --rebase. If value is a false value, returns
> @@ -998,6 +999,7 @@ int cmd_pull(int argc, const char **argv, const char *prefix)
>  			oidclr(&rebase_fork_point);
>  	}
>  
> +	close_object_store(the_repository->objects);
>  	if (run_fetch(repo, refspecs))
>  		return 1;

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Junio,

On Wed, 8 Sep 2021, Junio C Hamano wrote:

> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
>
> > From: Johannes Schindelin <johannes.schindelin@gmx.de>
> >
> > On Windows, files cannot be removed nor renamed if there are still
> > handles held by a process. To remedy that, we try to release all open
> > handles to any `.pack` file before e.g. repacking (which would want to
> > remove the original `.pack` file(s) after it is done).
> >
> > Since the `read_cache_unmerged()` and/or the `get_oid()` call in `git
> > pull` can cause `.pack` files to be opened, we need to release the open
> > handles before calling `git fetch`: the latter process might want to
> > spawn an auto-gc, which in turn might want to repack the objects.
> >
> > This commit is similar in spirit to 5bdece0d705 (gc/repack: release
> > packs when needed, 2018-12-15).
> >
> > This fixes https://github.com/git-for-windows/git/issues/3336.
> >
> > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> > ---
> >  builtin/pull.c | 2 ++
> >  1 file changed, 2 insertions(+)
>
> After run_fetch() returns, we then go on to access objects from our
> object store (that's natural---after all, we fetched because we
> wanted to access the objects we have plus objects they have to offer
> to us) and the object store is transparently reopened for us.  Which
> may make a bit confusing API to newbies, but is an easy one to use,
> once we get used to it.
>
> A few general comments.
>
>  * Right now, run_fetch() does not do anything that needs to access
>    objects, but there is no reason to expect that will continue to
>    be the case, and once we added an call to an innocuous helper
>    function that happens to access objects, the close_object_store()
>    call made by the caller before run_fetch() was called becomes
>    moot.  The more we can delay the call to close_object_store(),
>    the better.  And the absolute last point we can defer the call to
>    close_object_store() is where immediately before run_fetch() calls
>    run_command_v_opt() to spawn "git fetch".
>
>  * Which makes me wonder if we may be better off having a bit in the
>    flags word the run_command() API takes to make a call to
>    close_object_store() for us.  run_fetch() that uses the
>    run_command API can use that bit without having to worry about
>    making a call to close_object_store() itself and when.
>
>  * Hits from "git grep -A2 close_object_store()" shows a notable
>    pattern.  Before run_auto_maintenance(), we often see a call to
>    it.  It almost feels (but I didn't dig it deeper) that a call to
>    run_auto_maintenance() that does not call close_object_store()
>    before doing so is a bug (there is one in builtin/commit.c).
>
>  * Which in turn makes me wonder if these many calls to close before
>    run_auto_maintenance() should be moved to run_auto_maintenance()
>    itself (which in turn can use the new flags bit in the
>    run_command() API).
>
> Sprinkling yet another call to close_object_store() as we discover
> need for doing so like this patch does is certainly OK, but as we
> add new hooks and higher-level commands, it will get messier and
> messier.  It probably may make sense to go in and clean it up,
> hopefully guided by the above observations, either before this
> "fix", or soon after it graduates before we forget.

I like those ideas, and submitted a follow-up patch series.

> Will queue, but will not merge down to 'next' until I hear an Ack on
> the commit-graph stuff.

Thank you.

For procedural reasons, I would like to keep the current patch series
as-is, because that will free some mental space for me maintaining Git for
Windows (where I already merged them, after a contributor verified the
fix).

Thanks,
Dscho

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 8, 2021

This branch is now known as js/pull-release-packs-before-fetching.

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 8, 2021

This patch series was integrated into seen via git@83b6f62.

@gitgitgadget gitgitgadget bot added the seen label Sep 8, 2021
@gitgitgadget
Copy link

gitgitgadget bot commented Sep 8, 2021

This patch series was integrated into seen via git@ca6ddd5.

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 9, 2021

This patch series was integrated into seen via git@03d36ec.

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 10, 2021

This patch series was integrated into seen via git@7f2ff35.

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 10, 2021

This patch series was integrated into seen via git@6c4965e.

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 10, 2021

This patch series was integrated into seen via git@e7de65d.

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 10, 2021

There was a status update in the "New Topics" section about the branch js/pull-release-packs-before-fetching on the Git mailing list:

"git pull" was taught to close open file descriptors to the
packfiles before spawning "git fetch" to help auto-gc that may be
invoked by it.

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 12, 2021

This patch series was integrated into seen via git@bdfb5cd.

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 13, 2021

This patch series was integrated into seen via git@441d4de.

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 13, 2021

This patch series was integrated into next via git@97c3614.

@gitgitgadget gitgitgadget bot added the next label Sep 13, 2021
@gitgitgadget
Copy link

gitgitgadget bot commented Sep 14, 2021

There was a status update in the "Cooking" section about the branch js/pull-release-packs-before-fetching on the Git mailing list:

"git pull" was taught to close open file descriptors to the
packfiles before spawning "git fetch" to help auto-gc that may be
invoked by it.

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 14, 2021

This patch series was integrated into seen via git@289d3f0.

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 15, 2021

This patch series was integrated into seen via git@78d3300.

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 17, 2021

This patch series was integrated into seen via git@8a8b98e.

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 17, 2021

This patch series was integrated into next via git@05cdcab.

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 17, 2021

There was a status update in the "Cooking" section about the branch js/pull-release-packs-before-fetching on the Git mailing list:

"git pull" was taught to close open file descriptors to the
packfiles before spawning "git fetch" to help auto-gc that may be
invoked by it.

Will merge to 'master'.

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 20, 2021

This patch series was integrated into seen via git@c042ad5.

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 20, 2021

This patch series was integrated into master via git@c042ad5.

@gitgitgadget gitgitgadget bot added the master label Sep 20, 2021
@gitgitgadget gitgitgadget bot closed this Sep 20, 2021
@gitgitgadget
Copy link

gitgitgadget bot commented Sep 20, 2021

Closed via c042ad5.

@dscho dscho deleted the release-packs-before-fetching branch October 4, 2021 13:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant