Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace "git pack-refs" with C function refs_pack_refs #1695

Open
wants to merge 17 commits into
base: master
Choose a base branch
from

Conversation

vkabc
Copy link

@vkabc vkabc commented Mar 27, 2024

Thanks for taking the time to contribute to Git! Please be advised that the
Git community does not use github.com for their contributions. Instead, we use
a mailing list (git@vger.kernel.org) for code submissions, code reviews, and
bug reports. Nevertheless, you can use GitGitGadget (https://gitgitgadget.github.io/)
to conveniently send your Pull Requests commits to our mailing list.

For a single-commit pull request, please leave the pull request description
empty
: your commit message itself should describe your changes.

Please read the "guidelines for contributing" linked above!

gitster and others added 17 commits March 18, 2024 13:24
* ps/reftable-stack-tempfile:
  reftable/stack: register compacted tables as tempfiles
  reftable/stack: register lockfiles during compaction
  reftable/stack: register new tables as tempfiles
  lockfile: report when rollback fails
In `reftable_stack_init_addition()` we call `stack_uptodate()` after
having created the lockfile to check whether the stack was modified
concurrently, which is indicated by a positive return code from the
latter function. If so, we return a `REFTABLE_LOCK_ERROR` to the caller
and abort the addition.

The error handling has an off-by-one though because we check whether the
error code is `> 1` instead of `> 0`. Thus, instead of returning the
locking error, we would return a positive value. One of the callers of
`reftable_stack_init_addition()` works around this bug by repeating the
error code check without the off-by-one. But other callers are subtly
broken by this bug.

Fix this by checking for `err > 0` instead. This has the consequence
that `reftable_stack_init_addition()` won't ever return a positive error
code anymore, but will instead return `REFTABLE_LOCK_ERROR` now. Thus,
we can drop the check for a positive error code in `stack_try_add()`
now.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We currently throw two different errors into a similar-but-different
error code:

  - Errors when trying to lock the reftable stack.

  - Errors when trying to write to the reftable stack which has been
    modified concurrently.

This results in unclear error handling and user-visible error messages.

Create a new `REFTABLE_OUTDATED_ERROR` so that those error conditions
can be clearly told apart from each other. Adjust users of the old
`REFTABLE_LOCK_ERROR` to use the new error code as required.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Compaction of a reftable stack may fail gracefully when there is a
concurrent process that writes to the reftable stack and which has thus
locked either the "tables.list" file or one of the tables. This is
expected and can be handled gracefully by some of the callers which
invoke compaction. Thus, to indicate this situation to our callers, we
return a positive return code from `stack_compact_range()` and bubble it
up to the caller.

This kind of error handling is somewhat awkward though as many callers
in the call chain never even think of handling positive return values.
Thus, the result is either that such errors are swallowed by accident,
or that we abort operations with an unhelpful error message.

Make the code more robust by always using negative error codes when
compaction fails, with `REFTABLE_LOCK_ERROR` for the described benign
error case.

Note that only a single callsite knew to handle positive error codes
gracefully in the first place. Subsequent commits will touch up some of
the other sites to handle those errors better.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Whenever we commit a new table to the reftable stack we will end up
invoking auto-compaction of the stack to keep the total number of tables
at bay. This auto-compaction may fail though in case at least one of the
tables which we are about to compact is locked. This is indicated by the
compaction function returning `REFTABLE_LOCK_ERROR`. We do not handle
this case though, and thus bubble that return value up the calling
chain, which will ultimately cause a failure.

Fix this bug by ignoring `REFTABLE_LOCK_ERROR`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When git-pack-refs(1) fails in the reftable backend we end up printing
no error message at all, leaving the caller puzzled as to why compaction
has failed. Fix this.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The test helper provides a "ref-store <store> pack-refs" wrapper that
more or less directly invokes `refs_pack_refs()`. This helper is only
used in a single test with the "PACK_REFS_PRUNE" and "PACK_REFS_ALL"
flags. Both of these flags can directly be accessed via git-pack-refs(1)
though via the `--all` and `--prune` flags, which makes the helper
superfluous.

Refactor the test to use git-pack-refs(1) instead of the test helper.
Drop the now-unused test helper command.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The declaration of `struct pack_refs_opts` is in a seemingly random
place. Move it so that it's located right next to its flags and
functions that use it.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The intent of the `PACK_REFS_ALL` flag is to ask the backend to compact
all refs instead of only a subset of them. Thus, this flag gets passed
down to `refs_pack_refs()` via `struct pack_refs_opts::flags`.

But starting with 4fe42f3 (pack-refs: teach pack-refs --include
option, 2023-05-12), the flag's semantics have changed. Instead of being
handled by the respective backends, this flag is now getting handled by
the callers of `refs_pack_refs()` which will add a single glob ("*") to
the list of refs-to-be-packed. Thus, the flag serves no purpose to the
ref backends anymore.

Remove the flag and replace it with a local variable.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Under normal circumstances, the "reftable" backend will automatically
perform compaction after appending to the stack. It is thus not
necessary and may even be considered wasteful to run git-pack-refs(1) in
"reftable"-backed repositories as it will cause the backend to compact
all tables into a single one. We do exactly that though when running
`git maintenance run --auto` or `git gc --auto`, which gets spawned by
Git after running some specific commands.

The `--auto` mode is typically only executing optimizations as needed.
To do so, we already use several heuristics for the various different
data structures in Git to determine whether to optimize them or not.
We do not use any heuristics for refs though and instead always optimize
them.

Introduce a new `PACK_REFS_AUTO` flag that can be passed to the backend.
When not handled by the backend we will continue to behave the exact
same as we do right now, that is we optimize refs unconditionally. This
is done for the "files" backend for now to retain current behaviour,
even though we may eventually also want to introduce heuristics here.
For the "reftable" backend though we already do have auto-compaction, so
we can easily reuse that logic to implement the new auto-packing flag.

Note that under normal circumstances, this should always end up being a
no-op. After all, we already invoke the code for every single addition
to the stack. But there are special cases where it can still be helpful
to execute the auto-compaction code explicitly:

  - Concurrent writers may cause compaction to not run due to locks.

  - Callers may decide to disable compaction altogether and then pack
    refs at a later point due to various reasons.

  - Other implementations of the reftable format may do compaction
    differently or even not at all.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some of the command line options in `cmd_pack_refs()` require us to
allocate memory. This memory is never released and thus leaking, but we
paper over this leak by declaring the respective variables as `static`
function-level variables, which is somewhat awkward.

Refactor the code to release the allocated memory and drop the `static`
declaration. While at it, remove the useless `flags` variable.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Calling git-pack-refs(1) will unconditionally cause it to pack all
requested refs regardless of the current state of the ref database. For
example:

  - With the "files" backend we will end up rewriting the complete
    "packed-refs" file even if only a single ref would require
    compaction.

  - With the "reftable" backend we will end up always compacting all
    tables into a single table.

This behaviour can be completely unnecessary depending on the backend
and is thus wasteful.

With the introduction of the `PACK_REFS_AUTO` flag in the preceding
commit we can improve this and let the backends decide for themselves
whether to pack refs in the first place. Expose this functionality via a
new "--auto" flag in git-pack-refs(1), which mirrors the same flag in
both git-gc(1) and git-maintenance(1).

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We're about to start using `struct maintenance_run_opts` in
`maintenance_task_pack_refs()`. Move its definition up to prepare for
this.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The ".git/obects/17/" shard is somewhat special because it is used by
git-gc(1) to estimate how many objects there are by extrapolating the
number of objects in that shard, only. In t6500 we thus have a hard
coded set of data that, when written to the object database, result in
blobs starting with that prefix.

We are about to need such "17"-prefixed objects in another test suite.
Extract them into "t/oid-info/hash-info" so that they can be reused by
other tests.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Forward the `--auto` flag to git-pack-refs(1) when it has been invoked
with this flag itself. This does not change anything for the "files"
backend, which will continue to eagerly pack refs. But it does ensure
that the "reftable" backend only compacts refs as required.

This change does not impact git-maintenance(1) because this command will
in fact never run the pack-refs task when run with `--auto`. This issue
will be addressed in a subsequent commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When running `git maintenance run --auto`, then the various subtasks
will only run as needed. Thus, we for example end up only packing loose
objects if we hit a certain threshold.

Interestingly enough, the "pack-refs" task is actually _never_ executed
when the auto-flag is set because it does not have a condition at all.
As 41abfe1 (maintenance: add pack-refs task, 2021-02-09) mentions:

    The 'auto_condition' function pointer is left NULL for now. We could
    extend this in the future to have a condition check if pack-refs
    should be run during 'git maintenance run --auto'.

It is not quite clear from that quote whether it is actually intended
that the task doesn't run at all in this mode. Also, no test was added
to verify this behaviour. Ultimately though, it feels quite surprising
that `git maintenance run --auto --task=pack-refs` would quietly never
do anything at all.

In any case, now that we do have the logic in place to let ref backends
decide whether or not to repack refs, it does make sense to wire it up
accordingly. With the "reftable" backend we will thus now perform
auto-compaction, which optimizes the refdb as needed.

But for the "files" backend we now unconditionally pack refs as it does
not yet know to handle the "auto" flag. Arguably, this can be seen as a
bug fix given that previously the task never did anything at all.
Eventually though we should amend the "files" backend to use some
heuristics for auto compaction, as well.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
maintenance_task_pack_refs() calls "git pack-refs" which
spawns an external process. C function refs_pack_refs can
be called directly, hence saving an external process.
@vkabc
Copy link
Author

vkabc commented Mar 27, 2024

not sure if this is correct, I'm following https://lore.kernel.org/git/ZgO-W3E-CeT3n7vl@tanuki/ to build on top of ps/pack-refs-auto

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants