integrated feedback from the git list to packfiles

schacon · Sep 5, 2008 · fef851a
1 parent 640afaf
Showing 6 changed files with 50 additions and 15 deletions.
diff --git a/assets/images/figure/packfile-format.png b/assets/images/figure/packfile-format.png
diff --git a/assets/images/figure/packfile-index.png b/assets/images/figure/packfile-index.png
diff --git a/script/html.rb b/script/html.rb
@@ -5,7 +5,8 @@
 require 'rdiscount'
 require "uv"
 
-MIN_SIZE = 1200
+#MIN_SIZE = 1200
+MIN_SIZE = 800
 
 def do_replacements(html, type = :html)
 

diff --git a/text/26_Advanced_Branching_And_Merging/0_Advanced_Branching_And_Merging.markdown b/text/26_Advanced_Branching_And_Merging/0_Advanced_Branching_And_Merging.markdown
@@ -26,9 +26,9 @@ tip of the other branch, which is stored temporarily in MERGE_HEAD.
 During the merge, the index holds three versions of each file.  Each of
 these three "file stages" represents a different version of the file:
 
-$ git show :1:file.txt	# the file in a common ancestor of both branches
-$ git show :2:file.txt	# the version from HEAD.
-$ git show :3:file.txt	# the version from MERGE_HEAD.
+	$ git show :1:file.txt	# the file in a common ancestor of both branches
+	$ git show :2:file.txt	# the version from HEAD.
+	$ git show :3:file.txt	# the version from MERGE_HEAD.
 
 When you ask linkgit:git-diff[1] to show the conflicts, it runs a
 three-way diff between the conflicted merge results in the work tree with

diff --git a/...vanced_Branching_And_Merging/1_Advanced_Merging_Multiway_Merge_Subtree.markdown b/...vanced_Branching_And_Merging/1_Advanced_Merging_Multiway_Merge_Subtree.markdown
@@ -1,7 +1,5 @@
 ### Multiway Merge ###
 
-
-
 You can merge several heads at one time by simply listing them on the same 
 linkgit:git-merge[1] command.  For instance,
 

diff --git a/text/52_The_Packfile/0_The_Packfile.markdown b/text/52_The_Packfile/0_The_Packfile.markdown
@@ -10,7 +10,8 @@ bookmarks into a packfile.
 
 There are two versions of the packfile index - version one, which is the default
 in versions of Git earlier than 1.6, and version two, which is the default
-from 1.6 forward, but which can be read by Git versions going back to 1.5.2. 
+from 1.6 forward, but which can be read by Git versions going back to 1.5.2, and
+has been further backported to 1.4.4.5 if you are still on the 1.4 series.
 
 Version 2 also includes a CRC checksum of each object so compressed data 
 can be copied directly from pack to pack during repacking without 
@@ -20,8 +21,15 @@ larger than 4 Gb.
 [fig:packfile-index]
 
 In both formats, the fanout table is simply a way to find the offset of a
-particular sha faster within the index file.  In version 1, the offsets and
-shas are in the same space, where in version two, there are seperate tables
+particular sha faster within the index file.  The offset/sha1[]
+tables are sorted by sha1[] values (this is to allow binary search of this
+table), and fanout[] table points at the offset/sha1[] table in a specific
+way (so that part of the latter table that covers all hashes that start
+with a given byte can be found to avoid 8 iterations of the binary
+search).
+
+In version 1, the offsets and shas are in the same space, whereas in version two, 
+there are separate tables
 for the shas, crc checksums and offsets.  At the end of both files are 
 checksum shas for both the index file and the packfile it references.
 
@@ -33,12 +41,25 @@ a pack.  The packfile format is used in upload-pack and receieve-pack programs
 
 ### The Packfile Format ###
 
-The packfile itself is a very simple format.  The first four bytes is the 
-string 'PACK', which is sort of used to make sure you're getting the start 
-of the packfile correctly.  After that, you get a series of packed objects,
+The packfile itself is a very simple format.  There is a header, a series of
+packed objects (each with its own header and body) and then a checksum trailer.
+The first four bytes are the string 'PACK', which lets you check that you are 
+reading from the start of the packfile.  This is followed by a 4-byte
+packfile version number and then a 4-byte number of entries in that file.  In
+Ruby, you might read the header data like this:
+
+	ruby
+	def read_pack_header
+	  sig = @session.recv(4)
+	  ver = @session.recv(4).unpack("N")[0]
+	  entries = @session.recv(4).unpack("N")[0]
+	  [sig, ver, entries]
+	end
+
+After that, you get a series of packed objects, in order of their SHAs,
 which each consist of an object header and object contents.  At the end
-of the packfile is a SHA1 sum of all the shas (in sorted order) in that
-packfile.
+of the packfile is a 20-byte SHA1 sum of all the shas (in sorted order) in that
+packfile. 
 
 [fig:packfile-format]
 
@@ -64,4 +85,19 @@ It is important to note that the size specified in the header data is not
 the size of the data that actually follows, but the size of that data *when 
 expanded*. This is why the offsets in the packfile index are so useful, 
 otherwise you have to expand every object just to tell when the next header 
-starts.
+starts.
+
+The data part is just a zlib stream for non-delta object types; for the two
+delta object representations, the data portion contains something that
+identifies which base object this delta representation depends on, and the
+delta to apply on the base object to resurrect this object.  <code>ref-delta</code>
+uses the 20-byte hash of the base object at the beginning of data, while
+<code>ofs-delta</code> stores an offset within the same packfile to identify the base
+object.  In either case, two important constraints a reimplementor must
+adhere to are:
+
+* a delta representation must be based on some other object within the same
+  packfile;
+
+* the base object must be of the same underlying type (blob, tree, commit
+  or tag).
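
The two delta constraints added at the end of the packfile hunk can be sketched as a small check. This is only an illustration under stated assumptions: `PackObject` and `delta_violations` are hypothetical names, and the in-memory model (an id, the object's underlying type, and an optional base id) is a simplification that is not part of the commit or of Git itself.

```ruby
# Hypothetical in-memory model of the objects a writer intends to pack.
# base_id is nil for objects stored whole (non-delta).
PackObject = Struct.new(:id, :type, :base_id)

# Returns a list of human-readable problems; an empty list means both
# constraints hold for every deltified object.
def delta_violations(objects)
  by_id = objects.each_with_object({}) { |obj, index| index[obj.id] = obj }

  objects.each_with_object([]) do |obj, problems|
    next if obj.base_id.nil? # stored whole, nothing to check

    base = by_id[obj.base_id]
    if base.nil?
      # Constraint 1: the base must live in the same packfile.
      problems << "#{obj.id}: base #{obj.base_id} is not in this packfile"
    elsif base.type != obj.type
      # Constraint 2: the base must have the same underlying type.
      problems << "#{obj.id}: base #{base.id} is a #{base.type}, not a #{obj.type}"
    end
  end
end
```

A writer could run a check like this before emitting any delta entries and fall back to storing an object whole whenever its intended base fails either test.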