Especially with the new index format to come, it is more appropriate to encapsulate more into check_packed_git_idx() and assume less of the index format in struct packed_git. To that effect, the index_base is renamed to index_data with void * type so it is not used directly but other pointers initialized with it. This allows for a couple pointer cast removal, as well as providing a better generic name to grep for when adding support for new index versions or formats. And index_data is declared const too while at it. Signed-off-by: Nicolas Pitre <email@example.com> Signed-off-by: Junio C Hamano <firstname.lastname@example.org>
Signed-off-by: Junio C Hamano <email@example.com>
Plain integer types without a fixed size can vary between platforms. Even though all common platforms use 32-bit ints, there is no guarantee that this won't change at some point. Furthermore, specifying an integer type with explicit size makes the definition of structures more obvious. Signed-off-by: Simon 'corecode' Schubert <firstname.lastname@example.org> Signed-off-by: Junio C Hamano <email@example.com>
Way back when Junio developed the 64 bit index topic he came up with a means of changing the .idx file format so that older Git clients would recognize that they don't understand the file and refuse to read it, while newer clients could tell the difference between the old-style and new-style .idx files. Unfortunately this wasn't recorded anywhere. This change documents how we might go about changing the .idx file format by using a special signature in the first four bytes. Credit (and possible blame) goes completely to Junio for thinking up this technique. The change also modifies the error message of the current Git code so that users get a recommendation to upgrade their Git software should this version or later encounter a new-style .idx which it cannot process. We already do this for the .pack files, but since we usually process the .idx files first its important that these files are recognized and encourage an upgrade. Signed-off-by: Shawn O. Pearce <firstname.lastname@example.org> Signed-off-by: Junio C Hamano <email@example.com>
* np/pack: add the capability for index-pack to read from a stream index-pack: compare only the first 20-bytes of the key. git-repack: repo.usedeltabaseoffset pack-objects: document --delta-base-offset option allow delta data reuse even if base object is a preferred base zap a debug remnant let the GIT native protocol use offsets to delta base when possible make pack data reuse compatible with both delta types make git-pack-objects able to create deltas with offset to base teach git-index-pack about deltas with offset to base teach git-unpack-objects about deltas with offset to base introduce delta objects with offset to base
This reverts commit 1685457. Git as recent as v1.1.6 do not understand version 3 delta. v1.2.0 is Ok and I personally would say it is old enough, but the improvement between version 2 and version 3 delta is not bit enough to justify breaking older clients. We should resurrect this later, but when we do so we shold make it conditional. Signed-off-by: Junio C Hamano <firstname.lastname@example.org>
This is the missing part to git-pack-objects allowing it to reuse delta data to/from any of the two delta types. It can reuse delta from any type, and it outputs base offsets when --allow-delta-base-offset is provided and the base is also included in the pack. Otherwise it outputs base sha1 references just like it always did. Signed-off-by: Nicolas Pitre <email@example.com> Signed-off-by: Junio C Hamano <firstname.lastname@example.org>
It's been quite a while now that GIT is able to read version 3 packs. Let's create them at last. Signed-off-by: Nicolas Pitre <email@example.com> Signed-off-by: Junio C Hamano <firstname.lastname@example.org>
This updates the type-enumeration constants introduced to reduce the memory footprint of "struct object" to match the type bits already used in the packfile format, by removing the former (i.e. TYPE_* constant macros) and using the latter (i.e. enum object_type) throughout the code for consistency. Eventually we can stop passing around the "type strings" entirely, and this will help - no confusion about two different integer enumeration. Signed-off-by: Linus Torvalds <email@example.com> Signed-off-by: Junio C Hamano <firstname.lastname@example.org>
When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <email@example.com>
After experimenting with code to add the ability to encode a delta against part of the deltified file, it turns out that resulting packs are _bigger_ than when this ability is not used. The raw delta output might be smaller, but it doesn't compress as well using gzip with a negative net saving on average. Said bit would in fact be more useful to allow for encoding the copying of chunks larger than 64KB providing more savings with large files. This will correspond to packs version 3. While the current code still produces packs version 2, it is made future proof so pack versions 2 and 3 are accepted. Any pack version 2 are compatible with version 3 since the redefined bit was never used before. When enough time has passed, code to use that bit to produce version 3 packs could be added. Signed-off-by: Nicolas Pitre <firstname.lastname@example.org> Signed-off-by: Junio C Hamano <email@example.com>
Nico pointed out that having verify_pack.c and verify-pack.c was confusing. Rename verify_pack.c to pack-check.c as suggested, and enhances the verification done quite a bit. - Built-in sha1_file unpacking knows that a base object of a deltified object _must_ be in the same pack, and takes advantage of that fact. - Earlier verify-pack command only checked the SHA1 sum for the entire pack file and did not look into its contents. It now checks everything idx file claims to have unpacks correctly. - It now has a hook to give more detailed information for objects contained in the pack under -v flag. Signed-off-by: Junio C Hamano <firstname.lastname@example.org> Signed-off-by: Linus Torvalds <email@example.com>
Given a list of <pack>.idx files, this command validates the index file and the corresponding .pack file for consistency. This patch also uses the same validation mechanism in fsck-cache when the --full flag is used. During normal operation, sha1_file.c verifies that a given .idx file matches the .pack file by comparing the SHA1 checksum stored in .idx file and .pack file as a minimum sanity check. We may further want to check the pack signature and version when we map the pack, but that would be a separate patch. Earlier, errors to map a pack file was not flagged fatal but led to a random fatal error later. This version explicitly die()s when such an error is detected. Signed-off-by: Junio C Hamano <firstname.lastname@example.org> Signed-off-by: Linus Torvalds <email@example.com>
This makes it match the new delta encoding, and admittedly makes the code easier to follow. This also updates the PACK file version to 2, since this (and the delta encoding change in the previous commit) are incompatible with the old format.
This also adds a header with a signature, version info, and the number of objects to the pack file. It also encodes the file length and type more efficiently.