pack-objects: keep track of pack_start for each reuse pack
When reusing objects from a pack, we keep track of a set of one or more
`reused_chunk`s, corresponding to sections of one or more object(s) from
a source pack that we are reusing. Each chunk contains two pieces of
information:

  - the offset of the first object in the source pack (relative to the
    beginning of the source pack)
  - the difference between that offset, and the corresponding offset in
    the pack we're generating
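
As a rough illustration of the two fields listed above, the following C sketch records chunks in a fixed-size array. The names `chunk_sketch`, `record_chunk_sketch`, and `MAX_CHUNKS`, as well as the fixed capacity, are assumptions made for this example and are not taken from builtin/pack-objects.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a reused chunk; field names are illustrative. */
struct chunk_sketch {
	int64_t original;   /* offset of the chunk's first object in the source pack */
	int64_t difference; /* source-pack offset minus offset in the generated pack */
};

#define MAX_CHUNKS 64 /* arbitrary capacity chosen for the sketch */

static struct chunk_sketch chunks[MAX_CHUNKS];
static size_t chunk_nr;

/* Append one chunk; callers are assumed to record chunks in offset order. */
static void record_chunk_sketch(int64_t original, int64_t difference)
{
	assert(chunk_nr < MAX_CHUNKS);
	chunks[chunk_nr].original = original;
	chunks[chunk_nr].difference = difference;
	chunk_nr++;
}
```

Recording chunks in increasing offset order keeps the array sorted, which is what makes a later lookup by source offset cheap.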

The purpose of keeping track of these is so that we can patch any
OFS_DELTAs that cross over a section of the reuse pack that we didn't
take.

For instance, consider a hypothetical pack as shown below:

                                                (chunk #2)
                                                __________...
                                               /
                                              /
      +--------+---------+-------------------+---------+
  ... | <base> | <other> |      (unused)     | <delta> | ...
      +--------+---------+-------------------+---------+
       \                /
        \______________/
           (chunk #1)

Suppose that we are sending objects "base", "other", and "delta", that
the "delta" object is stored as an OFS_DELTA, and that its base is
"base". If we don't send any objects in the "(unused)" range, we can't
copy the delta'd object verbatim: its delta offset spans a range of the
pack that we didn't copy, so we have to account for that difference
when patching and reassembling the delta.
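
To make the patching step concrete, here is a minimal C sketch of how the recorded differences could be used to shrink a delta's base distance. It assumes chunks are sorted by source offset and that an offset before the first chunk has a difference of zero; the helper names are hypothetical and this is not the code in builtin/pack-objects.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical chunk record: source offset and (source - generated) offset. */
struct chunk_sketch {
	int64_t original;
	int64_t difference;
};

/* Binary search for the last chunk starting at or before `where`. */
static int64_t find_difference_sketch(const struct chunk_sketch *chunks,
				      size_t nr, int64_t where)
{
	size_t lo = 0, hi = nr;

	assert(nr > 0);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		if (chunks[mid].original <= where)
			lo = mid + 1;
		else
			hi = mid;
	}
	/* lo now counts the chunks that start at or before `where`. */
	return lo ? chunks[lo - 1].difference : 0; /* assumption: 0 before any chunk */
}

/* Distance from delta to base in the generated pack, after skipping gaps. */
static int64_t patched_distance_sketch(const struct chunk_sketch *chunks,
				       size_t nr, int64_t delta_offset,
				       int64_t base_offset)
{
	int64_t fixup = find_difference_sketch(chunks, nr, delta_offset) -
			find_difference_sketch(chunks, nr, base_offset);

	return (delta_offset - base_offset) - fixup;
}
```

With made-up numbers for the diagram (base at 100, delta at 500, and a 300-byte unused range), chunk #1 is `{100, 0}` and chunk #2 is `{500, 300}`. The distance stored in the source pack is 400, and the patched distance is 100, which matches a delta written at offset 200 pointing back to a base at offset 100.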

In order to compute this value correctly, we need to know not only where
we are in the packfile we're assembling (with `hashfile_total(f)`) but
also the position of the first byte of the packfile that we are
currently reusing. Currently, this works just fine, since when reusing
only a single pack those two values are always identical (because
verbatim reuse is the first thing pack-objects does when enabled after
writing the pack header).

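But when reusing multiple packs which have one or more gaps, those two
values diverge, so we need to track them separately: `hashfile_total(f)`
for our position in the pack being written, and `pack_start` for the
position in that pack corresponding to the first byte of the pack we
are currently reusing.

Together, these two allow us to compute the reused chunk's offset
difference relative to the start of the reused pack, as desired.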
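
The arithmetic can be sketched in C as follows. The 12-byte constant reflects the size of a pack header (signature, version, and entry count, four bytes each); the function names and the byte counts in the usage note are assumptions for illustration only.

```c
#include <assert.h>
#include <stdint.h>

#define PACK_HEADER_SIZE_SKETCH 12 /* 4-byte signature, version, entry count */

/*
 * Position in the output that corresponds to byte 0 of the reused pack,
 * given how many bytes have been written when reuse of that pack begins.
 */
static int64_t pack_start_sketch(int64_t total_written)
{
	assert(total_written >= PACK_HEADER_SIZE_SKETCH);
	return total_written - PACK_HEADER_SIZE_SKETCH;
}

/* Offset difference of a chunk, measured relative to the reused pack's start. */
static int64_t chunk_difference_sketch(int64_t source_offset,
				       int64_t total_written,
				       int64_t pack_start)
{
	return source_offset - (total_written - pack_start);
}
```

When a single pack is reused right after the header, `pack_start_sketch(12)` is 0 and the difference reduces to `source_offset - total_written`, matching the old behavior. If, hypothetically, 1000 bytes from an earlier pack had already been written, reuse of the next pack would begin at 1012 bytes, `pack_start` would be 1000, and an object at source offset 12 written at that point would get a difference of 0 rather than -1000.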

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
ttaylorr authored and gitster committed Dec 14, 2023
1 parent 5e29c3f commit d1d701e
Showing 1 changed file with 8 additions and 3 deletions.
11 changes: 8 additions & 3 deletions builtin/pack-objects.c
@@ -1015,6 +1015,7 @@ static off_t find_reused_offset(off_t where)

 static void write_reused_pack_one(struct packed_git *reuse_packfile,
                                   size_t pos, struct hashfile *out,
+                                  off_t pack_start,
                                   struct pack_window **w_curs)
 {
         off_t offset, next, cur;
@@ -1024,7 +1025,8 @@ static void write_reused_pack_one(struct packed_git *reuse_packfile,
         offset = pack_pos_to_offset(reuse_packfile, pos);
         next = pack_pos_to_offset(reuse_packfile, pos + 1);

-        record_reused_object(offset, offset - hashfile_total(out));
+        record_reused_object(offset,
+                             offset - (hashfile_total(out) - pack_start));

         cur = offset;
         type = unpack_object_header(reuse_packfile, w_curs, &cur, &size);
@@ -1094,6 +1096,7 @@ static void write_reused_pack_one(struct packed_git *reuse_packfile,

 static size_t write_reused_pack_verbatim(struct packed_git *reuse_packfile,
                                          struct hashfile *out,
+                                         off_t pack_start UNUSED,
                                          struct pack_window **w_curs)
 {
         size_t pos = 0;
@@ -1125,10 +1128,12 @@ static void write_reused_pack(struct packed_git *reuse_packfile,
 {
         size_t i = 0;
         uint32_t offset;
+        off_t pack_start = hashfile_total(f) - sizeof(struct pack_header);
         struct pack_window *w_curs = NULL;

         if (allow_ofs_delta)
-                i = write_reused_pack_verbatim(reuse_packfile, f, &w_curs);
+                i = write_reused_pack_verbatim(reuse_packfile, f, pack_start,
+                                               &w_curs);

         for (; i < reuse_packfile_bitmap->word_alloc; ++i) {
                 eword_t word = reuse_packfile_bitmap->words[i];
@@ -1145,7 +1150,7 @@ static void write_reused_pack(struct packed_git *reuse_packfile,
                          * for why.
                          */
                         write_reused_pack_one(reuse_packfile, pos + offset, f,
-                                              &w_curs);
+                                              pack_start, &w_curs);
                         display_progress(progress_state, ++written);
                 }
         }
