Skip to content

[pull] master from git:master#115

Merged
pull[bot] merged 14 commits intomakesoftwaresafe:masterfrom
git:master
Apr 29, 2023
Merged

[pull] master from git:master#115
pull[bot] merged 14 commits intomakesoftwaresafe:masterfrom
git:master

Conversation

@pull
Copy link
Copy Markdown

@pull pull bot commented Apr 29, 2023

See Commits and Changes for more details.


Created by pull[bot]

Can you help keep this open source service alive? 💖 Please sponsor : )

ttaylorr and others added 14 commits April 18, 2023 14:56
The function `stage_tmp_packfiles()` generates a filename to use for
staging the contents of what will become the pack's ".mtimes" file.

The name is generated in `write_mtimes_file()` and the result is
returned back to `stage_tmp_packfiles()` which uses it to rename the
temporary file into place via `rename_tmp_packfiles()`.

`write_mtimes_file()` returns a `const char *`, indicating that callers
are not expected to free its result (similar to, e.g., `oid_to_hex()`).
But callers are expected to free its result, so this return type is
incorrect.

Change the function's signature to return a non-const `char *`, and free
it at the end of `stage_tmp_packfiles()`.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When cruft packs were originally being developed, `-C` was designated as
the short-form for `--cruft` (as in `git repack -C`).

This was dropped due to confusion with Git's top-level `-C` option
before submitting to the list. But the reference to it in
`--cruft-expiration`'s help text was never updated. Fix that dangling
reference in this patch.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When cruft packs were implemented, we never adjusted the code for `git
gc`'s `--keep-largest-pack` and `gc.bigPackThreshold` to ignore cruft
packs. This option and configuration option share a common
implementation, but including cruft packs is wrong in both cases:

  - Running `git gc --keep-largest-pack` in a repository where the
    largest pack is the cruft pack itself will make it impossible for
    `git gc` to prune objects, since the cruft pack itself is kept.

  - The same is true for `gc.bigPackThreshold`, if the size of the cruft
    pack exceeds the limit set by the caller.

In the future, it is possible that `gc.bigPackThreshold` could be used
to write a separate cruft pack containing any new unreachable objects
that entered the repository since the last time a cruft pack was
written.

There are some complexities to doing so, mainly around handling
pruning objects that are in an existing cruft pack that is above the
threshold (which would either need to be rewritten, or else delay
pruning). Rewriting a substantially similar cruft pack isn't ideal, but
it is significantly better than the status-quo.

If users have large cruft packs that they don't want to rewrite, they
can mark them as `*.keep` packs. But in general, if a repository has a
cruft pack that is so large it is slowing down GC's, it should probably
be pruned anyway.

In the meantime, ignore cruft packs in the common implementation for
both of these options, and add a pair of tests to prevent any future
regressions here.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many of the tests in t5304 run `git gc`, and rely on its behavior that
unreachable-but-recent objects are written out loose. This is sensible,
since t5304 deals specifically with this kind of pruning.

If left unattended, however, this test would break when the default
behavior of a bare "git gc" is adjusted to generate a cruft pack by
default.

Ensure that these tests continue to work as-is (and continue to provide
coverage of loose object pruning) by passing `--no-cruft` explicitly.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a similar spirit as previous commits, prepare for `gc --cruft`
becoming the default by ensuring that the tests in t6501 explicitly
cover the case of freshening loose objects not using cruft packs.

We could run this test twice, once with `--cruft` and once with
`--no-cruft`, but doing so is unnecessary, since we already test object
rescuing, freshening, and dealing with corrupt parts of the unreachable
object graph extensively via t5329.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 12253ab (gc: add tests for --cruft and friends, 2022-10-26), we
added a handful of tests to t6500 to ensure that `git gc` respected the
value of `--cruft` and `gc.cruftPacks`.

Then, in c695592 (config: let feature.experimental imply
gc.cruftPacks=true, 2022-10-26), another set of similar tests was added
to ensure that `feature.experimental` correctly implied enabling cruft
pack generation (or not).

These tests are similar and could be consolidated. Do so in this patch
to prepare for expanding the set of command-line invocations that enable
or disable writing cruft packs. This makes it possible to easily test
more combinations of arguments without being overly repetitive.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the last commit, we refactored some of the tests in t6500 to make
clearer when cruft packs will and won't be generated by `git gc`.

Add the remaining cases not covered by the previous patch into this one,
which enumerates all possible combinations of arguments that will
produce (or not produce) a cruft pack.

This prepares us for a future commit which will change the default value
of `gc.cruftPacks` by ensuring that we understand which invocations do
and do not change as a result.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a similar fashion as previous commits, adjust the fast-import tests
to prepare for "git gc" generating a cruft pack by default.

This adjustment is slightly different, however. Instead of relying on us
writing out the objects loose, and then calling `git prune` to remove
them, t9300 needs to be prepared to drop objects that would be moved
into cruft packs.

To do this, we can combine the `git gc` invocation with `git prune` into
one `git gc --prune`, which handles pruning both loose objects, and
objects that would otherwise be written to a cruft pack.

Likely this pattern of "git gc && git prune" started all the way back in
03db452 (Support gitlinks in fast-import., 2008-07-19), which
happened after deprecating `git gc --prune` in 9e7d501 (builtin-gc.c:
deprecate --prune, it now really has no effect, 2008-05-09).

After `--prune` was un-deprecated in 58e9d9d (gc: make --prune useful
again by accepting an optional parameter, 2009-02-14), this script got a
handful of new "git gc && git prune" instances via via 4cedb78
(fast-import: add input format tests, 2011-08-11). These could have been
`git gc --prune`, but weren't (likely taking after 03db452).

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Back in 5b92477 (builtin/gc.c: conditionally avoid pruning objects
via loose, 2022-05-20), `git gc` learned the `--cruft` option and
`gc.cruftPacks` configuration to opt-in to writing cruft packs when
collecting or pruning unreachable objects.

Cruft packs were introduced with the merge in a50036d (Merge branch
'tb/cruft-packs', 2022-06-03). They address the problem of "loose object
explosions", where Git will write out many individual loose objects when
there is a large number of unreachable objects that have not yet aged
past `--prune=<date>`.

Instead of keeping track of those unreachable yet recent objects via
their loose object file's mtime, cruft packs collect all unreachable
objects into a single pack with a corresponding `*.mtimes` file that
acts as a table to store the mtimes of all unreachable objects. This
prevents the need to store unreachable objects as loose as they age out
of the repository, and avoids the problem of loose object explosions.

Beyond avoiding loose object explosions, cruft packs also act as a more
efficient mechanism to store unreachable objects as they age out of a
repository. This is because pairs of similar unreachable objects serve
as delta bases for one another.

In 5b92477, the feature was introduced as experimental. Since then,
GitHub has been running these patches in every repository generating
hundreds of millions of cruft packs along the way. The feature is
battle-tested, and avoids many pathological cases such as above. Users
who either run `git gc` manually, or via `git maintenance` can benefit
from having cruft packs.

As such, enable cruft pack generation to take place by default (by
making `gc.cruftPacks` have the default of "true" rather than "false).

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As of the previous commit, all callers that need to read the value of
`gc.cruftPacks` do so outside without using the `repo_settings` struct,
making its `gc_cruft_packs` unused. Drop it accordingly.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In check_signature(), we initialize the trust_level field to "-1", with
the idea that if gpg does not return a trust level at all (if there is
no signature, or if the signature is made by an unknown key), we'll
use that value. But this has two problems:

  1. Since the field is an enum, it's up to the compiler to decide what
     underlying storage to use, and it only has to fit the values we've
     declared. So we may not be able to store "-1" at all. And indeed,
     on my system (linux with gcc), the resulting enum is an unsigned
     32-bit value, and -1 becomes 4294967295.

     The difference may seem academic (and you even get "-1" if you pass
     it to printf("%d")), but it means that code like this:

       status |= sigc->trust_level < configured_min_trust_level;

     does not necessarily behave as expected. This turns out not to be a
     bug in practice, though, because we keep the "-1" only when gpg did
     not report a signature from a known key, in which case the line
     above:

       status |= sigc->result != 'G';

     would always set status to non-zero anyway. So only a 'G' signature
     with no parsed trust level would cause a problem, which doesn't
     seem likely to trigger (outside of unexpected gpg behavior).

  2. When using the "%GT" format placeholder, we pass the value to
     gpg_trust_level_to_str(), which complains that the value is out of
     range with a BUG(). This behavior was introduced by 803978d
     (gpg-interface: add function for converting trust level to string,
     2022-07-11). Before that, we just did a switch() on the enum, and
     anything that wasn't matched would end up as the empty string.

     Curiously, solving this by naively doing:

       if (level < 0)
               return "";

     in that function isn't sufficient. Because of (1) above, the
     compiler can (and does in my case) actually remove that conditional
     as dead code!

We can solve both by representing this state as an enum value. We could
do this by adding a new "unknown" value. But this really seems to match
the existing "undefined" level well. GPG describes this as "Not enough
information for calculation".

We have tests in t7510 that trigger this case (verifying a signature
from a key that we don't have, and then checking various %G
placeholders), but they didn't notice the BUG() because we didn't look
at %GT for that case! Let's make sure we check all %G placeholders for
each case in the formatting tests.

The interesting ones here are "show unknown signature with custom
format" and "show lack of signature with custom format", both of which
would BUG() before, and now turn %GT into "undefined". Prior to
803978d they would have turned it into the empty string, but I think
saying "undefined" consistently is a reasonable outcome, and probably
makes life easier for anyone parsing the output (and any such parser had
to be ready to see "undefined" already).

The other modified tests produce the same output before and after this
patch, but now we're consistently checking both %G? and %GT in all of
them.

Signed-off-by: Jeff King <peff@peff.net>
Reported-by: Rolf Eike Beer <eb@emlix.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When "gc" needs to retain unreachable objects, packing them into
cruft packs (instead of exploding them into loose object files) has
been offered as a more efficient option for some time.  Now the use
of cruft packs has been made the default and no longer considered
an experimental feature.

* tb/enable-cruft-packs-by-default:
  repository.h: drop unused `gc_cruft_packs`
  builtin/gc.c: make `gc.cruftPacks` enabled by default
  t/t9300-fast-import.sh: prepare for `gc --cruft` by default
  t/t6500-gc.sh: add additional test cases
  t/t6500-gc.sh: refactor cruft pack tests
  t/t6501-freshen-objects.sh: prepare for `gc --cruft` by default
  t/t5304-prune.sh: prepare for `gc --cruft` by default
  builtin/gc.c: ignore cruft packs with `--keep-largest-pack`
  builtin/repack.c: fix incorrect reference to '-C'
  pack-write.c: plug a leak in stage_tmp_packfiles()
The "%GT" placeholder for the "--format" option of "git log" and
friends caused BUG() to trigger on a commit signed with an unknown
key, which has been corrected.

* jk/gpg-trust-level-fix:
  gpg-interface: set trust level of missing key to "undefined"
Signed-off-by: Junio C Hamano <gitster@pobox.com>
@pull pull bot added the ⤵️ pull label Apr 29, 2023
@pull pull bot merged commit 48d89b5 into makesoftwaresafe:master Apr 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants