integrated feedback from the git list to packfiles
schacon committed Sep 5, 2008
1 parent 640afaf commit fef851a
Showing 6 changed files with 50 additions and 15 deletions.
Binary file modified assets/images/figure/packfile-format.png
Binary file modified assets/images/figure/packfile-index.png
3 changes: 2 additions & 1 deletion script/html.rb
@@ -5,7 +5,8 @@
require 'rdiscount'
require "uv"

-MIN_SIZE = 1200
+#MIN_SIZE = 1200
+MIN_SIZE = 800

def do_replacements(html, type = :html)

@@ -26,9 +26,9 @@ tip of the other branch, which is stored temporarily in MERGE_HEAD.
During the merge, the index holds three versions of each file. Each of
these three "file stages" represents a different version of the file:

-$ git show :1:file.txt # the file in a common ancestor of both branches
-$ git show :2:file.txt # the version from HEAD.
-$ git show :3:file.txt # the version from MERGE_HEAD.
+$ git show :1:file.txt # the file in a common ancestor of both branches
+$ git show :2:file.txt # the version from HEAD.
+$ git show :3:file.txt # the version from MERGE_HEAD.

When you ask linkgit:git-diff[1] to show the conflicts, it runs a
three-way diff between the conflicted merge results in the work tree with
@@ -1,7 +1,5 @@
### Multiway Merge ###

-
-
You can merge several heads at one time by simply listing them on the same
linkgit:git-merge[1] command. For instance,

54 changes: 45 additions & 9 deletions text/52_The_Packfile/0_The_Packfile.markdown
@@ -10,7 +10,8 @@ bookmarks into a packfile.

There are two versions of the packfile index - version one, which is the default
in versions of Git earlier than 1.6, and version two, which is the default
-from 1.6 forward, but which can be read by Git versions going back to 1.5.2.
+from 1.6 forward, but which can be read by Git versions going back to 1.5.2, and
+has been further backported to 1.4.4.5 if you are still on the 1.4 series.

Version 2 also includes a CRC checksum of each object so compressed data
can be copied directly from pack to pack during repacking without
@@ -20,8 +21,15 @@ larger than 4 Gb.
[fig:packfile-index]

In both formats, the fanout table is simply a way to find the offset of a
-particular sha faster within the index file. In version 1, the offsets and
-shas are in the same space, where in version two, there are seperate tables
+particular sha faster within the index file. The offset/sha1[]
+tables are sorted by sha1[] values (this is to allow binary search of this
+table), and the fanout[] table points at the offset/sha1[] table in a specific
+way (so that the part of the latter table that covers all hashes that start
+with a given byte can be found to avoid 8 iterations of the binary
+search).
+
+In version 1, the offsets and shas are in the same space, where in version two,
+there are separate tables
for the shas, crc checksums and offsets. At the end of both files are
checksum shas for both the index file and the packfile it references.
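
As a rough illustration of the fanout lookup described above, the following Ruby sketch builds a fanout table over a sorted list of SHAs and uses it to bound a binary search. It assumes the SHAs are already in memory as 40-character hex strings rather than read from an index file. In this sketch each fanout entry holds the number of SHAs whose first byte is less than or equal to that byte value. The helper names `build_fanout` and `find_sha` are hypothetical and not part of Git.

```ruby
# Build a 256-entry fanout table from a sorted list of hex SHA strings.
# fanout[b] is the number of entries whose first byte is <= b.
def build_fanout(sorted_shas)
  counts = Array.new(256, 0)
  sorted_shas.each { |sha| counts[sha[0, 2].hex] += 1 }
  total = 0
  counts.map { |count| total += count }
end

# Return the position of sha in sorted_shas, or nil if it is absent.
# The fanout table narrows the search to entries sharing the first byte.
def find_sha(sorted_shas, fanout, sha)
  first = sha[0, 2].hex
  lo = first.zero? ? 0 : fanout[first - 1]
  hi = fanout[first] # exclusive upper bound
  while lo < hi
    mid = (lo + hi) / 2
    case sha <=> sorted_shas[mid]
    when 0 then return mid
    when -1 then hi = mid
    else lo = mid + 1
    end
  end
  nil
end

shas = ["00" + "a" * 38, "00" + "b" * 38, "7f" + "0" * 38, "ff" + "1" * 38]
fanout = build_fanout(shas)
position = find_sha(shas, fanout, "7f" + "0" * 38) # position is 2 in this list
```

Because the fanout entries give both ends of the range for a first byte, the binary search starts on a slice that is typically a small fraction of the full table.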

@@ -33,12 +41,25 @@ a pack. The packfile format is used in upload-pack and receive-pack programs

### The Packfile Format ###

-The packfile itself is a very simple format. The first four bytes is the
-string 'PACK', which is sort of used to make sure you're getting the start
-of the packfile correctly. After that, you get a series of packed objects,
+The packfile itself is a very simple format. There is a header, a series of
+packed objects (each with its own header and body) and then a checksum trailer.
+The first four bytes are the string 'PACK', which is used to make sure
+you're getting the start of the packfile correctly. This is followed by a 4-byte
+packfile version number and then a 4-byte number of entries in that file. In
+Ruby, you might read the header data like this:
+
+    ruby
+    def read_pack_header
+      sig = @session.recv(4)                    # 4-byte 'PACK' signature
+      ver = @session.recv(4).unpack("N")[0]     # 32-bit big-endian version
+      entries = @session.recv(4).unpack("N")[0] # 32-bit big-endian entry count
+      [sig, ver, entries]
+    end
+
+After that, you get a series of packed objects, in order of their SHAs
which each consist of an object header and object contents. At the end
-of the packfile is a SHA1 sum of all the shas (in sorted order) in that
-packfile.
+of the packfile is a 20-byte SHA1 sum of all the shas (in sorted order) in that
+packfile.

[fig:packfile-format]
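
The header layout described above (a 4-byte 'PACK' signature, a 4-byte version and a 4-byte entry count) can also be checked against an in-memory string, which is handy when no socket is involved. This is only a sketch: `parse_pack_header` is a hypothetical helper, and the sample header is built by hand rather than taken from a real packfile.

```ruby
# Parse the 12-byte pack header from a String (hypothetical helper).
# "a4" reads the 4-byte signature; each "N" reads a 32-bit big-endian integer.
def parse_pack_header(bytes)
  raise ArgumentError, "header needs 12 bytes" if bytes.bytesize < 12
  sig, ver, entries = bytes.unpack("a4NN")
  raise ArgumentError, "bad signature: #{sig.inspect}" unless sig == "PACK"
  [sig, ver, entries]
end

# Hand-built sample header: version 2, three entries.
sample = ["PACK", 2, 3].pack("a4NN")
p parse_pack_header(sample) # prints ["PACK", 2, 3]
```

Rejecting a wrong signature early means a reader fails fast instead of misinterpreting arbitrary bytes as a version and an entry count.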

@@ -64,4 +85,19 @@ It is important to note that the size specified in the header data is not
the size of the data that actually follows, but the size of that data *when
expanded*. This is why the offsets in the packfile index are so useful,
otherwise you have to expand every object just to tell when the next header
-starts.
+starts.
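
To make the point about expanded versus stored size concrete, here is a small Ruby sketch using the standard `zlib` library. It inflates one zlib stream from the front of a buffer and reports how many compressed bytes the stream actually occupied. The buffer here is fabricated for illustration (a deflated string followed by a stand-in for the next object), and `inflate_with_length` is a hypothetical helper name.

```ruby
require "zlib"

# Inflate one zlib stream from the front of data.
# Returns the expanded bytes and the number of compressed bytes consumed.
def inflate_with_length(data)
  zstream = Zlib::Inflate.new
  expanded = zstream.inflate(data)
  consumed = zstream.total_in # bytes of input the stream actually used
  zstream.close
  [expanded, consumed]
end

body = "hello packfile " * 20
packed = Zlib::Deflate.deflate(body)
buffer = packed + "NEXT-OBJECT" # stand-in for whatever follows in the pack

expanded, consumed = inflate_with_length(buffer)
# expanded.bytesize matches body.bytesize (the "expanded" size),
# while consumed matches packed.bytesize (where the next header would start).
```

Without an offset table, a reader has to run the inflater like this on every object just to learn where the following one begins.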

+The data part is just a zlib stream for non-delta object types; for the two
+delta object representations, the data portion contains something that
+identifies which base object this delta representation depends on, and the
+delta to apply on the base object to resurrect this object. <code>ref-delta</code>
+uses the 20-byte hash of the base object at the beginning of data, while
+<code>ofs-delta</code> stores an offset within the same packfile to identify the base
+object. In either case, two important constraints a reimplementor must
+adhere to are:
+
+* the delta representation must be based on some other object within the same
+packfile;
+
+* the base object must be of the same underlying type (blob, tree, commit
+or tag).
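
As a hedged sketch of the <code>ref-delta</code> case described above, the Ruby below splits the data portion into the 20-byte base hash and the bytes that follow it, then checks the first constraint against a list of SHAs known to be in the pack. The names `split_ref_delta` and `base_in_pack?` are hypothetical, the list of pack SHAs is assumed to be available as hex strings, and the <code>ofs-delta</code> offset encoding is not covered here.

```ruby
# Split ref-delta data into the base object's SHA (as hex) and the
# remaining bytes, which carry the delta itself.
def split_ref_delta(data)
  raise ArgumentError, "ref-delta data shorter than 20 bytes" if data.bytesize < 20
  base_sha = data.byteslice(0, 20).unpack("H40")[0]
  rest = data.byteslice(20, data.bytesize - 20)
  [base_sha, rest]
end

# First constraint: the base object must live in the same packfile.
# pack_shas is assumed to be an Array of hex SHA strings for this pack.
def base_in_pack?(base_sha, pack_shas)
  pack_shas.include?(base_sha)
end

# Fabricated sample: a base hash of twenty 0xab bytes plus placeholder delta bytes.
raw_base = ["ab" * 20].pack("H40")
base_sha, delta = split_ref_delta(raw_base + "delta-bytes")
base_in_pack?(base_sha, ["ab" * 20, "cd" * 20]) # true for this sample
```

The second constraint (matching underlying type) would need the base object's type as well, which this sketch does not model.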
