Comparing changes

base repository: git/git
base: 223a1bfb5821387981c700654e4edd2443c5a7fc
...
head repository: git/git
compare: e861b0963626dd2732f7efbf2a187a85b060d9cb
  • 10 commits
  • 14 files changed
  • 2 contributors

Commits on Sep 29, 2021

  1. midx: expose write_midx_file_only() publicly

    Expose a variant of the write_midx_file() function which ignores packs
    that aren't included in an explicit "allow" list.
    
    This will be used in an upcoming patch to power a new `--stdin-packs`
    mode of `git multi-pack-index write` for callers that only want to
    include certain packs in a MIDX (and ignore any packs which may have
    happened to enter the repository independently, e.g., from pushes).
    
    Those patches will provide test coverage for this new function.
    
    Signed-off-by: Taylor Blau <me@ttaylorr.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    ttaylorr authored and gitster committed Sep 29, 2021
    56d863e
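
The allow-list behaviour described in this commit can be pictured with a small sketch. This is not the C implementation of write_midx_file_only(); the function name `select_packs_for_midx` and the pack names are hypothetical and exist only for illustration.

```python
def select_packs_for_midx(packs_on_disk, allowed=None):
    """Return the packs a MIDX writer would cover.

    Hypothetical helper: with allowed=None every pack is kept (the plain
    "write" behaviour); otherwise packs missing from the allow list are
    ignored, even though they exist in the repository.
    """
    if allowed is None:
        return sorted(packs_on_disk)
    allowed = set(allowed)
    return sorted(p for p in packs_on_disk if p in allowed)
```

A pack that arrived independently (for example from a push) is simply absent from the allow list and therefore left out.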
  2. builtin/multi-pack-index.c: support --stdin-packs mode

    To power a new `--write-midx` mode, `git repack` will want to write a
    multi-pack index containing a certain set of packs in the repository.
    
    This new option will be used by `git repack` to write a MIDX which
    contains only the packs which will survive after the repack (that is, it
    will exclude any packs which are about to be deleted).
    
    This patch effectively exposes the function implemented in the previous
    commit via the `git multi-pack-index` builtin. An alternative approach
    would have been to call that function from the `git repack` builtin
    directly, but this introduces awkward problems around closing and
    reopening the object store, so the MIDX will be written out-of-process.
    
    Signed-off-by: Taylor Blau <me@ttaylorr.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    ttaylorr authored and gitster committed Sep 29, 2021
    6fb22ca
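
As a rough sketch of the out-of-process approach, a caller could prepare the command and its standard input as below. The helper name `build_midx_write_request` is hypothetical, and the one-pack-name-per-line input layout is an assumption that this commit message does not spell out.

```python
def build_midx_write_request(pack_names):
    """Build the argv and stdin payload for a MIDX write limited to pack_names.

    Hypothetical helper. Assumes the packs are passed one name per line on
    standard input.
    """
    argv = ["git", "multi-pack-index", "write", "--stdin-packs"]
    stdin_text = "".join(name + "\n" for name in pack_names)
    return argv, stdin_text

# Possible usage inside a repository (not executed here):
#   import subprocess
#   argv, payload = build_midx_write_request(["pack-1234.idx"])
#   subprocess.run(argv, input=payload, text=True, check=True)
```

Running the writer as a separate process sidesteps the object-store reopening problem the message mentions, at the cost of one extra process spawn.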
  3. midx: preliminary support for --refs-snapshot

    To figure out which commits we can write a bitmap for, the multi-pack
    index/bitmap code does a reachability traversal, marking any commit
    which can be found in the MIDX as eligible to receive a bitmap.
    
    This approach will cause a problem when multi-pack bitmaps are able to
    be generated from `git repack`, since the reference tips can change
    during the repack. Even though we ignore commits that don't exist in
    the MIDX (when doing a scan of the ref tips), it's possible that a
    commit in the MIDX reaches something that isn't.
    
    This can happen when a multi-pack index contains some pack which refers
    to loose objects. For example, a pack may be pushed after the repack
    starts but before the MIDX is generated, and that pack may depend on an
    object which is stored as loose in the repository (and which, by
    definition, isn't included in the multi-pack index).
    
    By taking a snapshot of the references before we start repacking, we can
    close that race window. In the above scenario (where we have a packed
    object pointing at a loose one), we'll either (a) take a snapshot of the
    references before seeing the packed one, or (b) take it after, at which
    point we can guarantee that the loose object will be packed and included
    in the MIDX.
    
    This patch does just that. It writes a temporary "reference snapshot",
    which is a list of OIDs that are at the ref tips before writing a
    multi-pack bitmap. References that are "preferred" (i.e., are a suffix
    of at least one value of the 'pack.preferBitmapTips' configuration) are
    marked with a special '+'.
    
    The format is simple: one line per commit at each tip, with an optional
    '+' at the beginning (for preferred references, as described above).
    
    When provided, the reference snapshot is used to drive bitmap selection
    instead of the MIDX code doing its own traversal. When it isn't
    provided, the usual traversal takes place instead.
    
    Signed-off-by: Taylor Blau <me@ttaylorr.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    ttaylorr authored and gitster committed Sep 29, 2021
    08944d1
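
The snapshot format described above (one object ID per line, with an optional leading '+' for preferred tips) is simple enough to sketch. The function names below are hypothetical and the object IDs are shortened placeholders; this illustrates the format only, not the code in midx.c.

```python
def format_snapshot(tips):
    """Serialize (oid, preferred) pairs, one per line, '+' marking preferred."""
    return "".join(("+" if preferred else "") + oid + "\n"
                   for oid, preferred in tips)

def parse_snapshot(text):
    """Parse snapshot text back into a list of (oid, preferred) pairs."""
    tips = []
    for line in text.splitlines():
        if not line:
            continue  # tolerate blank lines
        preferred = line.startswith("+")
        oid = line[1:] if preferred else line
        tips.append((oid, preferred))
    return tips
```

Because the file is written before repacking starts, a reader of the snapshot sees a fixed set of tips regardless of any pushes that land later.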
  4. builtin/repack.c: keep track of existing packs unconditionally

    In order to be able to write a multi-pack index during repacking, `git
    repack` must keep track of which packs it wants to write into the MIDX.
    This set is the union of existing packs which will not be deleted,
    new pack(s) generated as a result of the repack, and .keep packs.
    
    Prior to this patch, `git repack` populated the list of existing packs
    only when repacking all-into-one (i.e., with `-A` or `-a`), but we will
    soon need this list when writing a MIDX while repacking without
    a-i-o.
    
    Populate the list of existing packs unconditionally, and guard removing
    packs from that list only when repacking a-i-o.
    
    Additionally, keep track of filenames of kept packs separately, since
    this, too, will be used in an upcoming patch.
    
    Signed-off-by: Taylor Blau <me@ttaylorr.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    ttaylorr authored and gitster committed Sep 29, 2021
    90f838b
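
The set described in the first paragraph of this commit is a plain union. A minimal sketch, assuming packs are identified by name and using a hypothetical function name:

```python
def packs_to_include_in_midx(existing_nonkept, existing_kept, new_packs, to_delete):
    """Union of surviving existing packs, newly written packs, and .keep packs.

    Hypothetical helper; all four arguments are iterables of pack names.
    """
    survivors = set(existing_nonkept) - set(to_delete)
    return sorted(survivors | set(new_packs) | set(existing_kept))
```

Tracking kept and non-kept packs as separate collections, as the commit does, keeps the "may this pack be deleted" question confined to the non-kept set.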
  5. builtin/repack.c: rename variables that deal with non-kept packs

    The new variable `existing_kept_packs` (and corresponding parameter
    `fname_kept_list`) added by the previous patch make it seem like
    `existing_packs` and `fname_list` are each subsets of the other two
    respectively.
    
    In reality, each pair is disjoint: one stores the packs without .keep
    files, and the other stores the packs with .keep files. Rename each to
    more clearly reflect this.
    
    Suggested-by: Jonathan Tan <jonathantanmy@google.com>
    Signed-off-by: Taylor Blau <me@ttaylorr.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    ttaylorr authored and gitster committed Sep 29, 2021
    a169166
  6. builtin/repack.c: extract showing progress to a variable

    We only ask whether stderr is a tty before calling
    'prune_packed_objects()', but the subsequent patch will add another use.
    
    Extract this check into a variable so that both can use it without
    having to call 'isatty()' twice.
    
    Signed-off-by: Taylor Blau <me@ttaylorr.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    ttaylorr authored and gitster committed Sep 29, 2021
    5f18e31
  7. builtin/repack.c: support writing a MIDX while repacking

    Teach `git repack` a new `--write-midx` option for callers that wish to
    persist a multi-pack index in their repository while repacking.
    
    There are two existing alternatives to this new flag, but they don't
    cover our particular use-case. These alternatives are:
    
      - Call 'git multi-pack-index write' after running 'git repack', or
    
      - Set 'GIT_TEST_MULTI_PACK_INDEX=1' in your environment when running
        'git repack'.
    
    The former works, but introduces a gap in bitmap coverage between
    repacking and writing a new MIDX (since the repack may have deleted a
    pack included in the existing MIDX, invalidating it altogether).
    
    Setting the 'GIT_TEST_' environment variable is obviously unsupported.
    In fact, even if it were supported officially, it still wouldn't work,
    because it generates the MIDX *after* redundant packs have been dropped,
    leading to the same issue as above.
    
    Introduce a new option which eliminates this race by teaching `git
    repack` to generate the MIDX at the critical point: after the new packs
    have been written and moved into place, but before the redundant packs
    have been removed.
    
    This option is compatible with `git repack`'s '--bitmap' option (it
    changes the interpretation to be: "write a bitmap corresponding to the
    MIDX after one has been generated").
    
    There is a little bit of additional noise in the patch below to avoid
    repeating ourselves when selecting which packs to delete. Instead of a
    single loop as before (where we iterate over 'existing_packs', decide if
    a pack is worth deleting, and if so, delete it), we have two loops (the
    first where we decide which ones are worth deleting, and the second
    where we actually do the deleting). This makes it so we have a single
    check we can make consistently when (1) telling the MIDX which packs we
    want to exclude, and (2) actually unlinking the redundant packs.
    
    There is also a tiny change to short-circuit the body of
    write_midx_included_packs() when no packs remain in the case of an empty
    repository. The MIDX code does not handle this, so avoid trying to
    generate a MIDX covering zero packs in the first place.
    
    Signed-off-by: Taylor Blau <me@ttaylorr.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    ttaylorr authored and gitster committed Sep 29, 2021
    1d89d88
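
The ordering this commit argues for (decide what is redundant, write the MIDX, and only then unlink) can be sketched as follows. The function and its callbacks are hypothetical stand-ins, and the rule used to pick redundant packs is a simplification, not the logic in builtin/repack.c.

```python
def repack_with_midx(existing_packs, new_packs, delete_redundant, write_midx, unlink):
    """Two-pass sketch: mark redundant packs, write the MIDX, then delete.

    write_midx and unlink are caller-supplied callbacks (hypothetical).
    """
    new_packs = set(new_packs)
    # Pass 1: only decide which existing packs are redundant; delete nothing yet.
    to_delete = {p for p in existing_packs
                 if delete_redundant and p not in new_packs}
    # Write the MIDX over the survivors while every pack is still on disk.
    included = sorted((set(existing_packs) - to_delete) | new_packs)
    if included:  # skip entirely when no packs remain (empty repository)
        write_midx(included)
    # Pass 2: now it is safe to drop the redundant packs.
    for pack in sorted(to_delete):
        unlink(pack)
    return included, sorted(to_delete)
```

Using the same `to_delete` set for both the MIDX exclusion and the unlink step is what keeps the two decisions consistent.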
  8. builtin/repack.c: make largest pack preferred

    When repacking into a geometric series and writing a multi-pack bitmap,
    it is beneficial to have the largest resulting pack be the preferred
    object source in the bitmap's MIDX, since selecting the large packs can
    lead to fewer broken delta chains and better compression.
    
    Teach 'git repack' to identify this pack and pass it to the MIDX write
    machinery in order to mark it as preferred.
    
    Signed-off-by: Taylor Blau <me@ttaylorr.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    ttaylorr authored and gitster committed Sep 29, 2021
    6d08b9d
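
Picking the preferred pack amounts to taking a maximum. In the sketch below the function name is hypothetical, and the measure of "largest" (an integer size per pack) is an assumption, since the commit message does not say which measure is used.

```python
def pick_preferred_pack(pack_sizes):
    """Return the name of the largest pack, or None when there are no packs.

    pack_sizes maps pack name -> size (hypothetical measure).
    """
    if not pack_sizes:
        return None
    return max(pack_sizes, key=pack_sizes.get)
```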

Commits on Oct 1, 2021

  1. builtin/repack.c: pass --refs-snapshot when writing bitmaps

    To prevent the race described in an earlier patch, generate and pass a
    reference snapshot to the multi-pack bitmap code, if we are writing one
    from `git repack`.
    
    This patch is mostly limited to creating a temporary file, and then
    calling for_each_ref(). The one wrinkle is that we try to minimize
    duplicates, since doing so can drastically reduce the size in
    network-of-forks style
    repositories. In the kernel's fork network (the repository containing
    all objects from the kernel and all its forks), deduplicating the
    references drops the snapshot size from 934 MB to just 12 MB.
    
    But since we're handling duplicates in this way, we have to make sure
    that we visit preferred references (those listed in
    pack.preferBitmapTips) before non-preferred ones (to avoid recording an
    object which is pointed at by a preferred tip as non-preferred).
    
    We accomplish this by doing separate passes over the references: first
    visiting each prefix in pack.preferBitmapTips, and then over the rest of
    the references.
    
    Signed-off-by: Taylor Blau <me@ttaylorr.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    ttaylorr authored and gitster committed Oct 1, 2021
    324efc9
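
The two-pass deduplication described in this commit can be sketched like this. It is an illustration rather than the code in builtin/repack.c: the function name is hypothetical, refs are modelled as (refname, oid) pairs, and matching a reference against pack.preferBitmapTips is simplified to a string-prefix test.

```python
def build_snapshot_lines(refs, preferred_prefixes):
    """Emit each tip OID once, visiting preferred references first.

    refs is a list of (refname, oid) pairs; preferred_prefixes stands in
    for the values of pack.preferBitmapTips.
    """
    seen = set()
    lines = []
    # Pass 1: preferred references, so a shared OID is recorded with '+'.
    for refname, oid in refs:
        if any(refname.startswith(p) for p in preferred_prefixes) and oid not in seen:
            seen.add(oid)
            lines.append("+" + oid)
    # Pass 2: everything else, skipping OIDs already written.
    for refname, oid in refs:
        if oid not in seen:
            seen.add(oid)
            lines.append(oid)
    return lines
```

If the passes were merged into one, an OID reached first through a non-preferred reference would be written without its '+', which is the mistake the commit message warns about.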

Commits on Oct 7, 2021

  1. test-read-midx: fix leak of bitmap_index struct

    In read_midx_preferred_pack(), we open the bitmap index but never free
    it. This isn't a big deal since this is just a test helper, and we exit
    immediately after, but since we're trying to keep our leak-checking tidy
    now, it's worth fixing.
    
    Signed-off-by: Jeff King <peff@peff.net>
    Acked-by: Taylor Blau <me@ttaylorr.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    peff authored and gitster committed Oct 7, 2021
    e861b09