
Commit 5b124cd

convert: add "status=delayed" to filter process protocol
Some `clean` / `smudge` filters may require a significant amount of time to process a single blob (e.g. the Git LFS smudge filter might perform network requests). During this process the Git checkout operation is blocked, and Git has to wait until the filter is done before it can continue with the checkout.

Teach the filter process protocol, introduced in edcc858 ("convert: add filter.<driver>.process option", 2016-10-16), to accept the status "delayed" as a response to a filter request. Upon this response Git continues with the checkout operation. After the checkout operation Git calls "finish_delayed_checkout", which queries the filter for the remaining blobs. If the filter is still working on them, it is expected to block. If the filter has completed all remaining blobs, an empty response is expected.

Git has multiple code paths that check out a blob. Support delayed checkouts only in `clone` (in unpack-trees.c) and `checkout` operations for now. The optimization is most effective in these code paths because all files of the tree are processed.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
1 parent dd7ff54 commit 5b124cd
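In protocol terms, the commit message asks the filter to do three things: answer "delayed" instead of sending content, report delayed blobs as they become ready, and send an empty list only once nothing is left. The following C sketch shows bookkeeping a filter might keep for that. It is illustrative only: `struct delay_queue`, `delay_blob()`, `mark_ready()` and `list_available_blobs()` are hypothetical names (not part of Git or Git LFS), the table size is arbitrary, and the case where a real filter would have to block is reported as -1 instead of actually waiting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_DELAYED 16 /* arbitrary limit for this sketch */

struct delayed_blob {
	const char *pathname;
	bool ready;    /* background work (e.g. a download) has finished */
	bool reported; /* pathname was already sent to Git */
};

struct delay_queue {
	struct delayed_blob blobs[MAX_DELAYED];
	size_t nr;
};

/* Remember a blob that was answered with "status=delayed". */
int delay_blob(struct delay_queue *q, const char *pathname)
{
	if (q->nr == MAX_DELAYED)
		return -1;
	q->blobs[q->nr].pathname = pathname;
	q->blobs[q->nr].ready = false;
	q->blobs[q->nr].reported = false;
	q->nr++;
	return 0;
}

/* Called when the slow work for one delayed blob has completed. */
void mark_ready(struct delay_queue *q, const char *pathname)
{
	for (size_t i = 0; i < q->nr; i++)
		if (!strcmp(q->blobs[i].pathname, pathname))
			q->blobs[i].ready = true;
}

/*
 * Answer one "list_available_blobs" request by copying ready, not yet
 * reported pathnames into out[].
 *   > 0 : number of pathnames to send
 *     0 : empty list, i.e. "no more delayed blobs"
 *    -1 : blobs are pending but none is ready; a real filter must
 *         block here instead of answering
 */
int list_available_blobs(struct delay_queue *q, const char **out, size_t max)
{
	size_t n = 0, pending = 0;

	assert(max > 0);
	for (size_t i = 0; i < q->nr; i++) {
		struct delayed_blob *b = &q->blobs[i];

		if (b->reported)
			continue;
		if (b->ready && n < max) {
			out[n++] = b->pathname;
			b->reported = true;
		} else {
			pending++;
		}
	}
	if (n == 0 && pending > 0)
		return -1;
	return (int)n;
}
```

The -1 case matters: according to the documentation added by this commit, Git stops asking after an empty list and treats every blob it has not received as missing, so a filter should not answer with an empty list while work is still pending.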


9 files changed: 579 additions & 91 deletions

Documentation/gitattributes.txt

Lines changed: 65 additions & 4 deletions
@@ -425,8 +425,8 @@ packet: git< capability=clean
 packet: git< capability=smudge
 packet: git< 0000
 ------------------------
-Supported filter capabilities in version 2 are "clean" and
-"smudge".
+Supported filter capabilities in version 2 are "clean", "smudge",
+and "delay".
 
 Afterwards Git sends a list of "key=value" pairs terminated with
 a flush packet. The list will contain at least the filter command
@@ -512,12 +512,73 @@ the protocol then Git will stop the filter process and restart it
 with the next file that needs to be processed. Depending on the
 `filter.<driver>.required` flag Git will interpret that as error.
 
-After the filter has processed a blob it is expected to wait for
-the next "key=value" list containing a command. Git will close
+After the filter has processed a command it is expected to wait for
+a "key=value" list containing the next command. Git will close
 the command pipe on exit. The filter is expected to detect EOF
 and exit gracefully on its own. Git will wait until the filter
 process has stopped.
 
+Delay
+^^^^^
+
+If the filter supports the "delay" capability, then Git can send the
+flag "can-delay" after the filter command and pathname. This flag
+denotes that the filter can delay filtering the current blob (e.g. to
+compensate network latencies) by responding with no content but with
+the status "delayed" and a flush packet.
+------------------------
+packet: git> command=smudge
+packet: git> pathname=path/testfile.dat
+packet: git> can-delay=1
+packet: git> 0000
+packet: git> CONTENT
+packet: git> 0000
+packet: git< status=delayed
+packet: git< 0000
+------------------------
+
+If the filter supports the "delay" capability then it must support the
+"list_available_blobs" command. If Git sends this command, then the
+filter is expected to return a list of pathnames representing blobs
+that have been delayed earlier and are now available.
+The list must be terminated with a flush packet followed
+by a "success" status that is also terminated with a flush packet. If
+no blobs for the delayed paths are available, yet, then the filter is
+expected to block the response until at least one blob becomes
+available. The filter can tell Git that it has no more delayed blobs
+by sending an empty list. As soon as the filter responds with an empty
+list, Git stops asking. All blobs that Git has not received at this
+point are considered missing and will result in an error.
+
+------------------------
+packet: git> command=list_available_blobs
+packet: git> 0000
+packet: git< pathname=path/testfile.dat
+packet: git< pathname=path/otherfile.dat
+packet: git< 0000
+packet: git< status=success
+packet: git< 0000
+------------------------
+
+After Git received the pathnames, it will request the corresponding
+blobs again. These requests contain a pathname and an empty content
+section. The filter is expected to respond with the smudged content
+in the usual way as explained above.
+------------------------
+packet: git> command=smudge
+packet: git> pathname=path/testfile.dat
+packet: git> 0000
+packet: git> 0000 # empty content!
+packet: git< status=success
+packet: git< 0000
+packet: git< SMUDGED_CONTENT
+packet: git< 0000
+packet: git< 0000 # empty list, keep "status=success" unchanged!
+------------------------
+
+Example
+^^^^^^^
+
 A long running filter demo implementation can be found in
 `contrib/long-running-filter/example.pl` located in the Git
 core repository. If you develop your own long running filter
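The protocol examples above show one `packet:` line per pkt-line. On the wire, each packet starts with four hexadecimal digits giving its total length, counting those four header bytes, and a flush packet is the literal `0000`. The following C sketch shows how a filter could encode the "delayed" reply and a `list_available_blobs` reply. It assumes text-only payloads (no embedded NUL bytes); `pkt_append()`, `pkt_flush()`, `write_delayed_reply()` and `write_available_reply()` are hypothetical helpers, and the 65520-byte limit mirrors Git's `LARGE_PACKET_MAX`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PKT_HEADER_LEN 4
#define PKT_MAX_LEN 65520 /* same value as Git's LARGE_PACKET_MAX */

/*
 * Append one data packet to buf (which stays NUL-terminated).
 * Returns the packet length written, or -1 if it does not fit.
 */
int pkt_append(char *buf, size_t cap, size_t *used, const char *payload)
{
	size_t total = PKT_HEADER_LEN + strlen(payload);

	if (total > PKT_MAX_LEN || *used + total + 1 > cap)
		return -1;
	/* "%04zx": four lowercase hex digits counting header + payload */
	snprintf(buf + *used, cap - *used, "%04zx%s", total, payload);
	*used += total;
	return (int)total;
}

/* Append a flush packet, the literal "0000". */
int pkt_flush(char *buf, size_t cap, size_t *used)
{
	if (*used + PKT_HEADER_LEN + 1 > cap)
		return -1;
	memcpy(buf + *used, "0000", PKT_HEADER_LEN + 1); /* copies the NUL too */
	*used += PKT_HEADER_LEN;
	return PKT_HEADER_LEN;
}

/* Reply to a smudge request that carried "can-delay=1". */
int write_delayed_reply(char *buf, size_t cap)
{
	size_t used = 0;

	assert(cap > 0);
	buf[0] = '\0';
	if (pkt_append(buf, cap, &used, "status=delayed\n") < 0 ||
	    pkt_flush(buf, cap, &used) < 0)
		return -1;
	return (int)used;
}

/* Reply to "command=list_available_blobs"; nr == 0 yields the empty list. */
int write_available_reply(char *buf, size_t cap, const char **paths, size_t nr)
{
	char line[PKT_MAX_LEN];
	size_t used = 0;

	assert(cap > 0);
	buf[0] = '\0';
	for (size_t i = 0; i < nr; i++) {
		int n = snprintf(line, sizeof(line), "pathname=%s\n", paths[i]);

		if (n < 0 || (size_t)n >= sizeof(line) ||
		    pkt_append(buf, cap, &used, line) < 0)
			return -1;
	}
	if (pkt_flush(buf, cap, &used) < 0 ||
	    pkt_append(buf, cap, &used, "status=success\n") < 0 ||
	    pkt_flush(buf, cap, &used) < 0)
		return -1;
	return (int)used;
}
```

For example, "status=delayed\n" is 15 bytes, so with the 4-byte header the packet length is 19 (0x13) and `write_delayed_reply()` produces `0013status=delayed\n0000`.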

builtin/checkout.c

Lines changed: 3 additions & 0 deletions
@@ -376,6 +376,8 @@ static int checkout_paths(const struct checkout_opts *opts,
 	state.force = 1;
 	state.refresh_cache = 1;
 	state.istate = &the_index;
+
+	enable_delayed_checkout(&state);
 	for (pos = 0; pos < active_nr; pos++) {
 		struct cache_entry *ce = active_cache[pos];
 		if (ce->ce_flags & CE_MATCHED) {
@@ -390,6 +392,7 @@ static int checkout_paths(const struct checkout_opts *opts,
 			pos = skip_same_name(ce, pos) - 1;
 		}
 	}
+	errs |= finish_delayed_checkout(&state);
 
 	if (write_locked_index(&the_index, lock_file, COMMIT_LOCK))
 		die(_("unable to write new index file"));
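The checkout.c change brackets the existing loop with `enable_delayed_checkout()` and `finish_delayed_checkout()`. The self-contained C model below illustrates that two-phase shape under the rules documented in gitattributes.txt: delayed paths are remembered during the first pass, re-requested without `can-delay` once the filter lists them, and anything still missing after the filter sends an empty list counts as an error. Everything here (`checkout_all()`, the `mock_*` filter, `struct checkout_run`) is a hypothetical stand-in; it does not reproduce Git's implementation of those two functions, which is not part of this excerpt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_PATHS 8

enum smudge_status { SMUDGE_SUCCESS, SMUDGE_DELAYED };

/* ---- hypothetical mock filter ---- */
static const char *mock_queue[MAX_PATHS]; /* delayed, to be listed later */
static size_t mock_queue_nr;
static const char *mock_lost_path;        /* a blob the mock never delivers */

static bool ends_with(const char *s, const char *suffix)
{
	size_t ls = strlen(s), lx = strlen(suffix);

	return ls >= lx && !strcmp(s + ls - lx, suffix);
}

/* Delays "*.dat" blobs, but only if Git sent can-delay=1. */
static enum smudge_status mock_smudge(const char *path, bool can_delay)
{
	if (!can_delay || !ends_with(path, ".dat"))
		return SMUDGE_SUCCESS;
	if (!mock_lost_path || strcmp(path, mock_lost_path)) {
		assert(mock_queue_nr < MAX_PATHS);
		mock_queue[mock_queue_nr++] = path;
	}
	return SMUDGE_DELAYED;
}

/* list_available_blobs: one ready blob per call, then the empty list. */
static size_t mock_list_available(const char **out)
{
	if (!mock_queue_nr)
		return 0;
	out[0] = mock_queue[0];
	mock_queue_nr--;
	memmove(mock_queue, mock_queue + 1, mock_queue_nr * sizeof(*mock_queue));
	return 1;
}

/* ---- Git-side model ---- */
struct checkout_run {
	const char *delayed[MAX_PATHS]; /* paths answered with "delayed" */
	bool received[MAX_PATHS];
	size_t delayed_nr;
	size_t written;                 /* files written to the worktree */
};

/* Returns the number of delayed paths that never arrived. */
int checkout_all(const char **paths, size_t nr, struct checkout_run *run)
{
	int missing = 0;

	memset(run, 0, sizeof(*run));
	mock_queue_nr = 0;

	/* Phase 1: the regular loop, with delaying enabled. */
	for (size_t i = 0; i < nr; i++) {
		if (mock_smudge(paths[i], true) == SMUDGE_DELAYED) {
			assert(run->delayed_nr < MAX_PATHS);
			run->delayed[run->delayed_nr++] = paths[i];
		} else {
			run->written++;
		}
	}

	/* Phase 2: ask for available blobs until the filter sends an empty list. */
	for (;;) {
		const char *avail[MAX_PATHS];
		size_t n = mock_list_available(avail);

		if (!n)
			break;
		for (size_t i = 0; i < n; i++)
			for (size_t j = 0; j < run->delayed_nr; j++) {
				if (run->received[j] || strcmp(avail[i], run->delayed[j]))
					continue;
				/* Re-request without can-delay; the content arrives now. */
				enum smudge_status st = mock_smudge(avail[i], false);
				assert(st == SMUDGE_SUCCESS);
				(void)st;
				run->received[j] = true;
				run->written++;
			}
	}

	/* Whatever was not received is missing and counts as an error. */
	for (size_t j = 0; j < run->delayed_nr; j++)
		if (!run->received[j])
			missing++;
	return missing;
}
```

Re-requesting only the paths that the filter listed, and without `can-delay`, follows the second `command=smudge` exchange shown in the gitattributes.txt hunk.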

cache.h

Lines changed: 3 additions & 0 deletions
@@ -1544,6 +1544,7 @@ struct checkout {
 	struct index_state *istate;
 	const char *base_dir;
 	int base_dir_len;
+	struct delayed_checkout *delayed_checkout;
 	unsigned force:1,
 		 quiet:1,
 		 not_new:1,
@@ -1553,6 +1554,8 @@ struct checkout {
 
 #define TEMPORARY_FILENAME_LENGTH 25
 extern int checkout_entry(struct cache_entry *ce, const struct checkout *state, char *topath);
+extern void enable_delayed_checkout(struct checkout *state);
+extern int finish_delayed_checkout(struct checkout *state);
 
 struct cache_def {
 	struct strbuf path;
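The `delayed_checkout` pointer added to `struct checkout` is what opts a caller in. In the convert.c changes of this commit, every clean path and the plain `convert_to_working_tree()` pass `NULL` for it, and `can-delay=1` is only sent when the filter advertised the "delay" capability and the checkout state allows delaying. The C sketch below restates that gate and how the first status reply is then interpreted. The names `struct model_dco`, `MODEL_CAN_DELAY`, `send_can_delay()` and `classify_status()` are hypothetical stand-ins for Git's `struct delayed_checkout`, `CE_CAN_DELAY` and the inline checks in `apply_multi_file_filter()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define CAP_DELAY (1u<<2) /* same bit as in convert.c */

/* Hypothetical stand-ins for Git's delay state and struct delayed_checkout. */
enum model_delay_state { MODEL_NO_DELAY, MODEL_CAN_DELAY };

struct model_dco {
	enum model_delay_state state;
};

/* Send "can-delay=1" only if the filter advertised "delay" and the caller opted in. */
bool send_can_delay(unsigned int filter_caps, const struct model_dco *dco)
{
	return (filter_caps & CAP_DELAY) && dco && dco->state == MODEL_CAN_DELAY;
}

enum reply_kind { REPLY_CONTENT, REPLY_DELAYED, REPLY_ERROR };

/* Interpret the first status the filter sends after the request. */
enum reply_kind classify_status(bool can_delay, const char *status)
{
	assert(status);
	if (can_delay && !strcmp(status, "delayed"))
		return REPLY_DELAYED; /* no content follows */
	if (!strcmp(status, "success"))
		return REPLY_CONTENT; /* content packets and a final status follow */
	return REPLY_ERROR;       /* includes "delayed" when can-delay was not sent */
}
```

One consequence worth noting: a filter that answers "delayed" to a request without `can-delay=1` is treated as a failure, because convert.c then compares the status against "success".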

convert.c

Lines changed: 98 additions & 17 deletions
@@ -496,6 +496,7 @@ static int apply_single_file_filter(const char *path, const char *src, size_t le
 
 #define CAP_CLEAN (1u<<0)
 #define CAP_SMUDGE (1u<<1)
+#define CAP_DELAY (1u<<2)
 
 struct cmd2process {
 	struct subprocess_entry subprocess; /* must be the first member! */
@@ -533,7 +534,8 @@ static int start_multi_file_filter_fn(struct subprocess_entry *subprocess)
 	if (err)
 		goto done;
 
-	err = packet_writel(process->in, "capability=clean", "capability=smudge", NULL);
+	err = packet_writel(process->in,
+		"capability=clean", "capability=smudge", "capability=delay", NULL);
 
 	for (;;) {
 		cap_buf = packet_read_line(process->out, NULL);
@@ -549,6 +551,8 @@ static int start_multi_file_filter_fn(struct subprocess_entry *subprocess)
 			entry->supported_capabilities |= CAP_CLEAN;
 		} else if (!strcmp(cap_name, "smudge")) {
 			entry->supported_capabilities |= CAP_SMUDGE;
+		} else if (!strcmp(cap_name, "delay")) {
+			entry->supported_capabilities |= CAP_DELAY;
 		} else {
 			warning(
 				"external filter '%s' requested unsupported filter capability '%s'",
@@ -590,9 +594,11 @@ static void handle_filter_error(const struct strbuf *filter_status,
 
 static int apply_multi_file_filter(const char *path, const char *src, size_t len,
 				   int fd, struct strbuf *dst, const char *cmd,
-				   const unsigned int wanted_capability)
+				   const unsigned int wanted_capability,
+				   struct delayed_checkout *dco)
 {
 	int err;
+	int can_delay = 0;
 	struct cmd2process *entry;
 	struct child_process *process;
 	struct strbuf nbuf = STRBUF_INIT;
@@ -647,6 +653,14 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
 	if (err)
 		goto done;
 
+	if ((entry->supported_capabilities & CAP_DELAY) &&
+	    dco && dco->state == CE_CAN_DELAY) {
+		can_delay = 1;
+		err = packet_write_fmt_gently(process->in, "can-delay=1\n");
+		if (err)
+			goto done;
+	}
+
 	err = packet_flush_gently(process->in);
 	if (err)
 		goto done;
@@ -662,14 +676,74 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
 	if (err)
 		goto done;
 
-	err = strcmp(filter_status.buf, "success");
+	if (can_delay && !strcmp(filter_status.buf, "delayed")) {
+		dco->is_delayed = 1;
+		string_list_insert(&dco->filters, cmd);
+		string_list_insert(&dco->paths, path);
+	} else {
+		/* The filter got the blob and wants to send us a response. */
+		err = strcmp(filter_status.buf, "success");
+		if (err)
+			goto done;
+
+		err = read_packetized_to_strbuf(process->out, &nbuf) < 0;
+		if (err)
+			goto done;
+
+		err = subprocess_read_status(process->out, &filter_status);
+		if (err)
+			goto done;
+
+		err = strcmp(filter_status.buf, "success");
+	}
+
+done:
+	sigchain_pop(SIGPIPE);
+
+	if (err)
+		handle_filter_error(&filter_status, entry, wanted_capability);
+	else
+		strbuf_swap(dst, &nbuf);
+	strbuf_release(&nbuf);
+	return !err;
+}
+
+
+int async_query_available_blobs(const char *cmd, struct string_list *available_paths)
+{
+	int err;
+	char *line;
+	struct cmd2process *entry;
+	struct child_process *process;
+	struct strbuf filter_status = STRBUF_INIT;
+
+	assert(subprocess_map_initialized);
+	entry = (struct cmd2process *)subprocess_find_entry(&subprocess_map, cmd);
+	if (!entry) {
+		error("external filter '%s' is not available anymore although "
+		      "not all paths have been filtered", cmd);
+		return 0;
+	}
+	process = &entry->subprocess.process;
+	sigchain_push(SIGPIPE, SIG_IGN);
+
+	err = packet_write_fmt_gently(
+		process->in, "command=list_available_blobs\n");
 	if (err)
 		goto done;
 
-	err = read_packetized_to_strbuf(process->out, &nbuf) < 0;
+	err = packet_flush_gently(process->in);
 	if (err)
 		goto done;
 
+	while ((line = packet_read_line(process->out, NULL))) {
+		const char *path;
+		if (skip_prefix(line, "pathname=", &path))
+			string_list_insert(available_paths, xstrdup(path));
+		else
+			; /* ignore unknown keys */
+	}
+
 	err = subprocess_read_status(process->out, &filter_status);
 	if (err)
 		goto done;
@@ -680,10 +754,7 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
 	sigchain_pop(SIGPIPE);
 
 	if (err)
-		handle_filter_error(&filter_status, entry, wanted_capability);
-	else
-		strbuf_swap(dst, &nbuf);
-	strbuf_release(&nbuf);
+		handle_filter_error(&filter_status, entry, 0);
 	return !err;
 }
 
@@ -698,7 +769,8 @@ static struct convert_driver {
 
 static int apply_filter(const char *path, const char *src, size_t len,
 			int fd, struct strbuf *dst, struct convert_driver *drv,
-			const unsigned int wanted_capability)
+			const unsigned int wanted_capability,
+			struct delayed_checkout *dco)
 {
 	const char *cmd = NULL;
 
@@ -716,7 +788,8 @@ static int apply_filter(const char *path, const char *src, size_t len,
 	if (cmd && *cmd)
 		return apply_single_file_filter(path, src, len, fd, dst, cmd);
 	else if (drv->process && *drv->process)
-		return apply_multi_file_filter(path, src, len, fd, dst, drv->process, wanted_capability);
+		return apply_multi_file_filter(path, src, len, fd, dst,
+			drv->process, wanted_capability, dco);
 
 	return 0;
 }
@@ -1057,7 +1130,7 @@ int would_convert_to_git_filter_fd(const char *path)
 	if (!ca.drv->required)
 		return 0;
 
-	return apply_filter(path, NULL, 0, -1, NULL, ca.drv, CAP_CLEAN);
+	return apply_filter(path, NULL, 0, -1, NULL, ca.drv, CAP_CLEAN, NULL);
 }
 
 const char *get_convert_attr_ascii(const char *path)
@@ -1094,7 +1167,7 @@ int convert_to_git(const char *path, const char *src, size_t len,
 
 	convert_attrs(&ca, path);
 
-	ret |= apply_filter(path, src, len, -1, dst, ca.drv, CAP_CLEAN);
+	ret |= apply_filter(path, src, len, -1, dst, ca.drv, CAP_CLEAN, NULL);
 	if (!ret && ca.drv && ca.drv->required)
 		die("%s: clean filter '%s' failed", path, ca.drv->name);
 
@@ -1119,7 +1192,7 @@ void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst,
 	assert(ca.drv);
 	assert(ca.drv->clean || ca.drv->process);
 
-	if (!apply_filter(path, NULL, 0, fd, dst, ca.drv, CAP_CLEAN))
+	if (!apply_filter(path, NULL, 0, fd, dst, ca.drv, CAP_CLEAN, NULL))
 		die("%s: clean filter '%s' failed", path, ca.drv->name);
 
 	crlf_to_git(path, dst->buf, dst->len, dst, ca.crlf_action, checksafe);
@@ -1128,7 +1201,7 @@ void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst,
 
 static int convert_to_working_tree_internal(const char *path, const char *src,
 					    size_t len, struct strbuf *dst,
-					    int normalizing)
+					    int normalizing, struct delayed_checkout *dco)
 {
 	int ret = 0, ret_filter = 0;
 	struct conv_attrs ca;
@@ -1153,21 +1226,29 @@ static int convert_to_working_tree_internal(const char *path, const char *src,
 		}
 	}
 
-	ret_filter = apply_filter(path, src, len, -1, dst, ca.drv, CAP_SMUDGE);
+	ret_filter = apply_filter(
+		path, src, len, -1, dst, ca.drv, CAP_SMUDGE, dco);
 	if (!ret_filter && ca.drv && ca.drv->required)
 		die("%s: smudge filter %s failed", path, ca.drv->name);
 
 	return ret | ret_filter;
 }
 
+int async_convert_to_working_tree(const char *path, const char *src,
+				  size_t len, struct strbuf *dst,
+				  void *dco)
+{
+	return convert_to_working_tree_internal(path, src, len, dst, 0, dco);
+}
+
 int convert_to_working_tree(const char *path, const char *src, size_t len, struct strbuf *dst)
 {
-	return convert_to_working_tree_internal(path, src, len, dst, 0);
+	return convert_to_working_tree_internal(path, src, len, dst, 0, NULL);
 }
 
 int renormalize_buffer(const char *path, const char *src, size_t len, struct strbuf *dst)
 {
-	int ret = convert_to_working_tree_internal(path, src, len, dst, 1);
+	int ret = convert_to_working_tree_internal(path, src, len, dst, 1, NULL);
 	if (ret) {
 		src = dst->buf;
 		len = dst->len;
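`async_query_available_blobs()` reads `pathname=` packets until a flush, ignores unknown keys, and then reads the trailing status list. The capability handshake in `start_multi_file_filter_fn()` follows the same key=value pattern. The standalone C sketch below implements both parsers over an in-memory pkt-line buffer, assuming text-only payloads and the message order described in gitattributes.txt. `pkt_read_line()`, `read_capabilities()` and `read_available_blobs()` are hypothetical helpers, not Git's `packet_read_line()` or `subprocess_read_status()`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CAP_CLEAN (1u<<0)
#define CAP_SMUDGE (1u<<1)
#define CAP_DELAY (1u<<2)
#define PATH_CAP 256 /* arbitrary limit for this sketch */

/* Parse the 4 hex digits of a pkt-line header; -1 if not hex. */
static int hex4(const char *s, unsigned int *out)
{
	unsigned int v = 0;

	for (int i = 0; i < 4; i++) {
		char c = s[i];

		v <<= 4;
		if (c >= '0' && c <= '9')
			v |= (unsigned int)(c - '0');
		else if (c >= 'a' && c <= 'f')
			v |= (unsigned int)(c - 'a' + 10);
		else if (c >= 'A' && c <= 'F')
			v |= (unsigned int)(c - 'A' + 10);
		else
			return -1; /* also stops at the terminating NUL */
	}
	*out = v;
	return 0;
}

/*
 * Read one packet from *p into line (trailing LF removed) and advance *p.
 * Returns 1 for data, 0 for a flush packet, -1 for malformed input.
 */
int pkt_read_line(const char **p, char *line, size_t cap)
{
	unsigned int len;

	assert(p && *p && line && cap > 0);
	if (hex4(*p, &len) < 0)
		return -1;
	if (len == 0) {
		*p += 4;
		return 0;
	}
	if (len < 4 || len - 4 >= cap || strlen(*p) < len)
		return -1;
	memcpy(line, *p + 4, len - 4);
	line[len - 4] = '\0';
	if (len > 4 && line[len - 5] == '\n')
		line[len - 5] = '\0';
	*p += len;
	return 1;
}

/* Capability list: "capability=<name>" packets up to a flush. */
int read_capabilities(const char **p, unsigned int *caps)
{
	char line[256];
	int r;

	*caps = 0;
	while ((r = pkt_read_line(p, line, sizeof(line))) == 1) {
		if (!strcmp(line, "capability=clean"))
			*caps |= CAP_CLEAN;
		else if (!strcmp(line, "capability=smudge"))
			*caps |= CAP_SMUDGE;
		else if (!strcmp(line, "capability=delay"))
			*caps |= CAP_DELAY;
		/* anything else is ignored in this sketch */
	}
	return r; /* 0 after the flush, -1 on error */
}

/* list_available_blobs reply: pathnames, flush, status list, flush. */
int read_available_blobs(const char **p, char (*paths)[PATH_CAP],
			 size_t max, size_t *nr)
{
	char line[PATH_CAP + 16];
	int r, success = 0;

	*nr = 0;
	while ((r = pkt_read_line(p, line, sizeof(line))) == 1) {
		if (strncmp(line, "pathname=", 9))
			continue; /* ignore unknown keys */
		if (*nr == max || strlen(line + 9) >= PATH_CAP)
			return -1;
		strcpy(paths[(*nr)++], line + 9);
	}
	if (r < 0)
		return -1;
	/* Status list; the last "status=" key wins in this sketch. */
	while ((r = pkt_read_line(p, line, sizeof(line))) == 1)
		if (!strncmp(line, "status=", 7))
			success = !strcmp(line + 7, "success");
	return (r == 0 && success) ? 0 : -1;
}
```

An empty path list (a flush immediately followed by the status list) parses successfully with `*nr == 0`, which is the "no more delayed blobs" signal described in the documentation.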
