Commit 9877046

convert: add "status=delayed" to filter process protocol

Some `clean` / `smudge` filters may require a significant amount of time to process a single blob (e.g. the Git LFS smudge filter might perform network requests). During this process the Git checkout operation is blocked and Git needs to wait until the filter is done to continue with the checkout.

Teach the filter process protocol, introduced in edcc858 ("convert: add filter.<driver>.process option", 2016-10-16), to accept the status "delayed" as response to a filter request. Upon this response Git continues with the checkout operation. After the checkout operation Git calls "finish_delayed_checkout" which queries the filter for remaining blobs. If the filter is still working on the completion, then the filter is expected to block. If the filter has completed all remaining blobs then an empty response is expected.

Git has multiple code paths that check out a blob. Support delayed checkouts only in `clone` (in unpack-trees.c) and `checkout` operations for now. The optimization is most effective in these code paths as all files of the tree are processed.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>

1 parent ed60c84 commit 9877046

9 files changed, 602 insertions(+), 102 deletions(-)

Documentation/gitattributes.txt

Lines changed: 65 additions & 4 deletions
@@ -425,8 +425,8 @@ packet: git< capability=clean
 packet: git< capability=smudge
 packet: git< 0000
 ------------------------
-Supported filter capabilities in version 2 are "clean" and
-"smudge".
+Supported filter capabilities in version 2 are "clean", "smudge",
+and "delay".
 
 Afterwards Git sends a list of "key=value" pairs terminated with
 a flush packet. The list will contain at least the filter command
@@ -512,12 +512,73 @@ the protocol then Git will stop the filter process and restart it
 with the next file that needs to be processed. Depending on the
 `filter.<driver>.required` flag Git will interpret that as error.
 
-After the filter has processed a blob it is expected to wait for
-the next "key=value" list containing a command. Git will close
+After the filter has processed a command it is expected to wait for
+a "key=value" list containing the next command. Git will close
 the command pipe on exit. The filter is expected to detect EOF
 and exit gracefully on its own. Git will wait until the filter
 process has stopped.
 
+Delay
+^^^^^
+
+If the filter supports the "delay" capability, then Git can send the
+flag "can-delay" after the filter command and pathname. This flag
+denotes that the filter can delay filtering the current blob (e.g. to
+compensate network latencies) by responding with no content but with
+the status "delayed" and a flush packet.
+------------------------
+packet: git> command=smudge
+packet: git> pathname=path/testfile.dat
+packet: git> can-delay=1
+packet: git> 0000
+packet: git> CONTENT
+packet: git> 0000
+packet: git< status=delayed
+packet: git< 0000
+------------------------
+
+If the filter supports the "delay" capability then it must support the
+"list_available_blobs" command. If Git sends this command, then the
+filter is expected to return a list of pathnames representing blobs
+that have been delayed earlier and are now available.
+The list must be terminated with a flush packet followed
+by a "success" status that is also terminated with a flush packet. If
+no blobs for the delayed paths are available, yet, then the filter is
+expected to block the response until at least one blob becomes
+available. The filter can tell Git that it has no more delayed blobs
+by sending an empty list. As soon as the filter responds with an empty
+list, Git stops asking. All blobs that Git has not received at this
+point are considered missing and will result in an error.
+
+------------------------
+packet: git> command=list_available_blobs
+packet: git> 0000
+packet: git< pathname=path/testfile.dat
+packet: git< pathname=path/otherfile.dat
+packet: git< 0000
+packet: git< status=success
+packet: git< 0000
+------------------------
+
+After Git received the pathnames, it will request the corresponding
+blobs again. These requests contain a pathname and an empty content
+section. The filter is expected to respond with the smudged content
+in the usual way as explained above.
+------------------------
+packet: git> command=smudge
+packet: git> pathname=path/testfile.dat
+packet: git> 0000
+packet: git> 0000 # empty content!
+packet: git< status=success
+packet: git< 0000
+packet: git< SMUDGED_CONTENT
+packet: git< 0000
+packet: git< 0000 # empty list, keep "status=success" unchanged!
+------------------------
+
+Example
+^^^^^^^
+
 A long running filter demo implementation can be found in
 `contrib/long-running-filter/example.pl` located in the Git
 core repository. If you develop your own long running filter

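For readers implementing a filter, the "delayed" reply documented in the hunk above can be sketched at the byte level. This is a minimal illustration, not code from this commit: `pkt_line` and `delayed_response` are hypothetical helper names, and the sketch assumes the usual pkt-line framing (four hex digits giving the total length including those four bytes, `0000` as the flush packet) and a trailing newline on the status line.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of Git): format one pkt-line into `out`.
 * The 4-digit hex prefix counts itself plus the payload.
 * Returns the number of bytes written (excluding the NUL), or -1 if
 * `out` is too small.
 */
static int pkt_line(char *out, size_t outsz, const char *payload)
{
	size_t len = strlen(payload);

	/* The length must fit into four hex digits. */
	assert(len + 4 <= 0xffff);
	if (outsz < len + 5)
		return -1;
	snprintf(out, outsz, "%04zx%s", len + 4, payload);
	return (int)(len + 4);
}

/*
 * Hypothetical helper: build the reply a filter would send to postpone
 * a blob, i.e. "status=delayed" followed by a flush packet and no content.
 * Assumption: the status line carries a trailing newline.
 */
static int delayed_response(char *out, size_t outsz)
{
	int n = pkt_line(out, outsz, "status=delayed\n");

	if (n < 0 || outsz - (size_t)n < 5)
		return -1;
	memcpy(out + n, "0000", 5); /* flush packet plus NUL terminator */
	return n + 4;
}
```

A real filter would write these bytes to its stdout and only do so when Git sent `can-delay=1` for the request; the string form here is just convenient for inspection.
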
builtin/checkout.c

Lines changed: 3 additions & 0 deletions
@@ -376,6 +376,8 @@ static int checkout_paths(const struct checkout_opts *opts,
 	state.force = 1;
 	state.refresh_cache = 1;
 	state.istate = &the_index;
+
+	enable_delayed_checkout(&state);
 	for (pos = 0; pos < active_nr; pos++) {
 		struct cache_entry *ce = active_cache[pos];
 		if (ce->ce_flags & CE_MATCHED) {
@@ -390,6 +392,7 @@ static int checkout_paths(const struct checkout_opts *opts,
 			pos = skip_same_name(ce, pos) - 1;
 		}
 	}
+	errs |= finish_delayed_checkout(&state);
 
 	if (write_locked_index(&the_index, lock_file, COMMIT_LOCK))
 		die(_("unable to write new index file"));

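The hunks above bracket the checkout loop with `enable_delayed_checkout()` and `finish_delayed_checkout()`. The body of `finish_delayed_checkout()` is not part of this excerpt, so the following is only a rough model of the documented behaviour: keep asking the filter for available blobs until it returns an empty list, then count every path that was never delivered as an error. All names here (`delayed_paths`, `mock_filter`, `finish_delayed`) are hypothetical, the filter is mocked in memory, and treating an unannounced path as an error is a choice of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PATHS 8

/* Hypothetical stand-in for the set of paths whose smudge was delayed. */
struct delayed_paths {
	const char *path[MAX_PATHS];
	int nr;
};

/* Hypothetical mock filter: hands out one finished path per query. */
struct mock_filter {
	const char *pending[MAX_PATHS];
	int nr;
};

/* Models "list_available_blobs"; returning 0 models the empty list. */
static int mock_list_available(struct mock_filter *f, const char **out)
{
	if (!f->nr)
		return 0;
	out[0] = f->pending[--f->nr];
	return 1;
}

/* Remove `p` from the delayed set; returns 1 if it was present. */
static int remove_path(struct delayed_paths *d, const char *p)
{
	for (int i = 0; i < d->nr; i++) {
		if (!strcmp(d->path[i], p)) {
			d->path[i] = d->path[--d->nr];
			return 1;
		}
	}
	return 0;
}

/* Returns the number of errors after draining the filter. */
static int finish_delayed(struct delayed_paths *d, struct mock_filter *f)
{
	const char *avail[MAX_PATHS];
	int errs = 0, n;

	while ((n = mock_list_available(f, avail)) > 0) {
		for (int i = 0; i < n; i++) {
			/*
			 * Git would re-send "command=smudge" with an empty
			 * content section here and write out the result.
			 */
			if (!remove_path(d, avail[i]))
				errs++; /* sketch choice: unknown path is an error */
		}
	}
	/* Empty list received: whatever is still delayed is missing. */
	errs += d->nr;
	return errs;
}
```

With two delayed paths and a mock filter that only ever delivers one of them, `finish_delayed` reports one error, which mirrors the rule that blobs not received by the time of the empty list are considered missing.
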
cache.h

Lines changed: 3 additions & 0 deletions
@@ -1544,6 +1544,7 @@ struct checkout {
 	struct index_state *istate;
 	const char *base_dir;
 	int base_dir_len;
+	struct delayed_checkout *delayed_checkout;
 	unsigned force:1,
 		 quiet:1,
 		 not_new:1,
@@ -1553,6 +1554,8 @@ struct checkout {
 
 #define TEMPORARY_FILENAME_LENGTH 25
 extern int checkout_entry(struct cache_entry *ce, const struct checkout *state, char *topath);
+extern void enable_delayed_checkout(struct checkout *state);
+extern int finish_delayed_checkout(struct checkout *state);
 
 struct cache_def {
 	struct strbuf path;
