Documentation: add Packfile URIs design doc
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
jonathantanmy authored and gitster committed Jun 11, 2020
1 parent fd194dd commit cd8402e
Showing 2 changed files with 109 additions and 1 deletion.
78 changes: 78 additions & 0 deletions Documentation/technical/packfile-uri.txt
@@ -0,0 +1,78 @@
Packfile URIs
=============

This feature allows servers to serve part of their packfile response as URIs.
This allows server designs that improve scalability in bandwidth and CPU usage
(for example, by serving some data through a CDN), and (in the future) provides
some measure of resumability to clients.

This feature is available only in protocol version 2.

Protocol
--------

The server advertises the `packfile-uris` capability.

If the client then communicates which protocols (HTTPS, etc.) it supports with
a `packfile-uris` argument, the server MAY send a `packfile-uris` section
directly before the `packfile` section (right after `wanted-refs` if it is
sent) containing URIs of any of the given protocols. The URIs point to
packfiles that use only features that the client has declared that it supports
(e.g. ofs-delta and thin-pack). See protocol-v2.txt for the documentation of
this section.
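As a rough illustration of the request side, the sketch below frames a
`packfile-uris` argument as a pkt-line (four hex digits giving the total
length, followed by the payload). The helper names `pkt_line` and
`packfile_uris_arg` are hypothetical, and the trailing LF is an assumption;
protocol-v2.txt remains the authority on the wire format.

```python
def pkt_line(payload: str) -> bytes:
    """Frame a text payload as a pkt-line: 4 hex digits of total length, then data."""
    data = payload.encode("ascii")
    total = len(data) + 4  # the length prefix counts its own four bytes
    if total > 65520:
        raise ValueError("payload too long for a single pkt-line")
    return f"{total:04x}".encode("ascii") + data


def packfile_uris_arg(protocols: list[str]) -> bytes:
    """Build the client's `packfile-uris` argument (hypothetical helper).

    Assumption: the argument ends with LF, as is conventional for textual
    pkt-lines.
    """
    if not protocols:
        raise ValueError("at least one protocol is required")
    return pkt_line("packfile-uris " + ",".join(protocols) + "\n")


print(packfile_uris_arg(["https"]))  # b'0018packfile-uris https\n'
```

A client that supports several protocols would pass them all, for example
`packfile_uris_arg(["http", "https"])`, which yields a single comma-separated
argument.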

Clients should then download and index all the given URIs (in addition to
downloading and indexing the packfile given in the `packfile` section of the
response) before performing the connectivity check.
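The ordering described above could be sketched as follows. This is only an
outline under stated assumptions: `download`, `index_pack` and
`check_connectivity` are hypothetical callables standing in for the real
transport, `git index-pack` and the connectivity check, and the comparison
against the expected hash assumes the client uses the hash that accompanies
each URI in the `packfile-uris` section.

```python
from typing import Callable, Iterable


def fetch_with_uris(
    uris: Iterable[tuple[str, str]],
    download: Callable[[str], bytes],
    index_pack: Callable[[bytes], str],
    inline_pack: bytes,
    check_connectivity: Callable[[], None],
) -> list[str]:
    """Index every pack (from URIs and inline) before the connectivity check.

    `uris` yields (expected_hash, uri) pairs. All callables are hypothetical
    stand-ins; `index_pack` returns the hash of the pack it indexed.
    """
    indexed = []
    for expected_hash, uri in uris:
        actual = index_pack(download(uri))
        if actual != expected_hash:
            raise ValueError(
                f"pack from {uri} has hash {actual}, expected {expected_hash}"
            )
        indexed.append(actual)
    # The packfile from the `packfile` section is indexed as usual.
    indexed.append(index_pack(inline_pack))
    # Only after all packs are indexed can connectivity be verified.
    check_connectivity()
    return indexed
```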

Server design
-------------

The server can be trivially made compatible with the proposed protocol by
having it advertise `packfile-uris`, tolerating the client sending
`packfile-uris`, and never sending any `packfile-uris` section. But we should
include some sort of non-trivial implementation in the Minimum Viable Product,
at least so that we can test the client.

This is the implementation: a feature, marked experimental, that allows the
server to be configured by one or more `uploadpack.blobPackfileUri=<sha1>
<uri>` entries. Whenever the list of objects to be sent is assembled, all such
blobs are excluded and replaced with URIs. The client will download those URIs,
expecting each to point to a packfile containing a single blob.
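A minimal sketch of that exclusion step is shown below. It is not the
upload-pack implementation: `plan_response` and its parameters are
hypothetical names, the entries are assumed to be "<sha1> <uri>" strings as in
the config format above, and the check against the client's protocols follows
the Protocol section's requirement that only URIs of the given protocols be
sent.

```python
from urllib.parse import urlsplit


def plan_response(object_ids, blob_uri_entries, client_protocols):
    """Split objects into those sent inline and those replaced by URIs.

    `blob_uri_entries` holds "<sha1> <uri>" strings, mirroring the value
    format of `uploadpack.blobPackfileUri`.
    """
    uri_for = dict(entry.split(" ", 1) for entry in blob_uri_entries)
    allowed = {p.lower() for p in client_protocols}
    inline, uris = [], []
    for oid in object_ids:
        uri = uri_for.get(oid)
        if uri is not None and urlsplit(uri).scheme in allowed:
            uris.append(uri)  # excluded from the packfile, sent as a URI
        else:
            inline.append(oid)  # stays in the regular packfile
    return inline, uris
```

With an empty `client_protocols` list every object stays inline, which matches
the trivially compatible server described at the start of this section.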

Client design
-------------

The client has a config variable `fetch.uriprotocols` that determines which
protocols the end user is willing to use. By default, this is empty.
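One way to picture this gating, as a sketch only: the function name
`requested_protocols` is hypothetical, and treating the config value as a
comma-separated list is an assumption made to mirror the argument format in
protocol-v2.txt.

```python
def requested_protocols(config_value, server_capabilities):
    """Return the protocols to request, or [] if the feature stays off.

    Assumption: `fetch.uriprotocols` holds a comma-separated list such as
    "https" or "http,https"; an unset or empty value disables the feature.
    """
    if "packfile-uris" not in server_capabilities:
        return []  # the server never advertised the capability
    value = config_value or ""
    return [p.strip().lower() for p in value.split(",") if p.strip()]
```

An empty result means the client sends no `packfile-uris` argument at all, so
the default configuration leaves the fetch unchanged.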

When the client downloads the given URIs, it should store them with "keep"
files, just like it does with the packfile in the `packfile` section. These
additional "keep" files can only be removed after the refs have been updated -
just like the "keep" file for the packfile in the `packfile` section.

The division of work (initial fetch + additional URIs) introduces convenient
points for resumption of an interrupted clone - such resumption can be done
after the Minimum Viable Product (see "Future work").

Future work
-----------

The protocol design allows some evolution of the server and client without any
need for protocol changes, so only a small-scoped design is included here to
form the MVP. For example, the following can be done:

* On the server, more sophisticated means of excluding objects (e.g. by
specifying a commit to represent that commit and all objects that it
references).
* On the client, resumption of clone. If a clone is interrupted, information
could be recorded in the repository's config and a "clone-resume" command
can resume the clone in progress. (Resumption of subsequent fetches is more
difficult because that must deal with the user wanting to use the repository
even after the fetch was interrupted.)

There are some possible features that will require a change in protocol:

* Additional HTTP headers (e.g. authentication)
* Byte range support
* Different file formats referenced by URIs (e.g. raw object)
32 changes: 31 additions & 1 deletion Documentation/technical/protocol-v2.txt
@@ -323,13 +323,26 @@ included in the client's request:
indicating its sideband (1, 2, or 3), and the server may send "0005\2"
(a PKT-LINE of sideband 2 with no payload) as a keepalive packet.

If the 'packfile-uris' feature is advertised, the client can include the
following argument in its request, and the server's response may contain
a 'packfile-uris' section, as explained below.

packfile-uris <comma-separated list of protocols>
Indicates to the server that the client is willing to receive
URIs of any of the given protocols in place of objects in the
sent packfile. Before performing the connectivity check, the
client should download from all given URIs. Currently, the
protocols supported are "http" and "https".

The response of `fetch` is broken into a number of sections separated by
delimiter packets (0001), with each section beginning with its section
header. Most sections are sent only when the packfile is sent.

output = acknowledgements flush-pkt |
[acknowledgments delim-pkt] [shallow-info delim-pkt]
[wanted-refs delim-pkt] packfile flush-pkt
[wanted-refs delim-pkt] [packfile-uris delim-pkt]
packfile flush-pkt

acknowledgments = PKT-LINE("acknowledgments" LF)
(nak | *ack)
@@ -347,6 +360,9 @@ header. Most sections are sent only when the packfile is sent.
*PKT-LINE(wanted-ref LF)
wanted-ref = obj-id SP refname

packfile-uris = PKT-LINE("packfile-uris" LF) *packfile-uri
packfile-uri = PKT-LINE(40*(HEXDIGIT) SP *%x20-ff LF)

packfile = PKT-LINE("packfile" LF)
*PKT-LINE(%x01-03 *%x00-ff)

@@ -418,6 +434,20 @@ header. Most sections are sent only when the packfile is sent.
* The server MUST NOT send any refs which were not requested
using 'want-ref' lines.

packfile-uris section
* This section is only included if the client sent
'packfile-uris' and the server has at least one such URI to
send.

* Always begins with the section header "packfile-uris".

* For each URI the server sends, it sends a hash of the pack's
contents (as output by git index-pack) followed by the URI.

* The hashes are 40 hex characters long. When Git upgrades to a new
hash algorithm, this might need to be updated. (It should match
whatever index-pack outputs after "pack\t" or "keep\t".)
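
For readers following the grammar, a rough parser for one `packfile-uri`
payload (the pkt-line data after the length prefix) might look like the sketch
below. The function names are hypothetical; accepting uppercase hex digits,
tolerating a missing LF, and decoding the URI as ASCII are assumptions rather
than requirements stated here.

```python
import re

# One packfile-uri payload: 40 hex digits, SP, URI bytes in %x20-ff, LF.
_PACKFILE_URI = re.compile(rb"([0-9a-fA-F]{40}) ([\x20-\xff]*)\n?")


def parse_packfile_uri(payload: bytes) -> tuple[str, str]:
    """Return (pack_hash, uri) from one packfile-uri payload."""
    match = _PACKFILE_URI.fullmatch(payload)
    if match is None:
        raise ValueError(f"malformed packfile-uri line: {payload!r}")
    pack_hash = match.group(1).decode("ascii").lower()
    uri = match.group(2).decode("ascii")  # assumption: URIs are plain ASCII
    return pack_hash, uri


def hash_from_index_pack(output: str) -> str:
    """Extract the hash that index-pack prints after "pack\\t" or "keep\\t"."""
    kind, sep, value = output.strip().partition("\t")
    if kind not in ("pack", "keep") or not sep:
        raise ValueError(f"unexpected index-pack output: {output!r}")
    return value
```

A client could compare the two results to confirm that a downloaded pack is
the one the server announced.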

packfile section
* This section is only included if the client has sent 'want'
lines in its request and either requested that no more