New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cat-file: force flush of stdout on empty string #1124
base: next
Are you sure you want to change the base?
Conversation
"git fetch --quiet" optimization to avoid useless computation of info that will never be displayed. * ps/fetch-omit-formatting-under-quiet: fetch: skip formatting updated refs with `--quiet`
Code simplification with reduced memory usage. * tb/add-objects-in-unpacked-packs-simplify: builtin/pack-objects.c: remove duplicate hash lookup builtin/pack-objects.c: simplify add_objects_in_unpacked_packs() object-store.h: teach for_each_packed_object to ignore kept packs
Typofix. * es/walken-tutorial-fix: doc: fix syntax error and the format of printf
The sparse-index support can corrupt the index structure by storing a stale and/or uninitialized data, which has been corrected. * jh/sparse-index-resize-fix: sparse-index: copy dir_hash in ensure_full_index()
Buggy tests could damage repositories outside the throw-away test area we created. We now by default export GIT_CEILING_DIRECTORIES to limit the damage from such a stray test. * sg/set-ceiling-during-tests: test-lib: set GIT_CEILING_DIRECTORIES to protect the surrounding repository
Code simplification. * rs/more-fspathcmp: merge-recursive: use fspathcmp() in path_hashmap_cmp()
Even when running "git send-email" without its own threaded discussion support, a threading related header in one message is carried over to the subsequent message to result in an unwanted threading, which has been corrected. * mh/send-email-reset-in-reply-to: send-email: avoid incorrect header propagation
Fixes on usage message from "git commit-graph". * ab/commit-graph-usage: commit-graph: show "unexpected subcommand" error commit-graph: show usage on "commit-graph [write|verify] garbage" commit-graph: early exit to "usage" on !argc multi-pack-index: refactor "goto usage" pattern commit-graph: use parse_options_concat() commit-graph: remove redundant handling of -h commit-graph: define common usage with a macro
Leakfix. * ba/object-info: protocol-caps.c: fix memory leak in send_info()
The output from "git fast-export", when its anonymization feature is in use, showed an annotated tag incorrectly. * tk/fast-export-anonymized-tag-fix: fast-export: fix anonymized tag using original length
Update the userdiff pattern for PHP. * uk/userdiff-php-enum: userdiff: support enum keyword in PHP hunk header
A pathname in an advice message has been made cut-and-paste ready. * ab/gc-log-rephrase: gc: remove trailing dot from "gc.log" line
Leakfix. * ab/mailmap-leakfix: mailmap.c: fix a memory leak in free_mailap_{info,entry}()
Use ssh public crypto for object and push-cert signing. * fs/ssh-signing: ssh signing: test that gpg fails for unknown keys ssh signing: tests for logs, tags & push certs ssh signing: duplicate t7510 tests for commits ssh signing: verify signatures using ssh-keygen ssh signing: provide a textual signing_key_id ssh signing: retrieve a default key from ssh-agent ssh signing: add ssh key format and signing code ssh signing: add test prereqs ssh signing: preliminary refactoring and clean-up
After "git clone --recurse-submodules", all submodules are cloned but they are not by default recursed into by other commands. With submodule.stickyRecursiveClone configuration set, submodule.recurse configuration is set to true in a repository created by "clone" with "--recurse-submodules" option. * mk/clone-recurse-submodules: clone: set submodule.recurse=true if submodule.stickyRecursiveClone enabled
Code clean up to migrate callers from older advice_config[] based API to newer advice_if_enabled() and advice_enabled() API. * ab/retire-advice-config: advice: move advice.graftFileDeprecated squashing to commit.[ch] advice: remove use of global advice_add_embedded_repo advice: remove read uses of most global `advice_` variables advice: add enum variants for missing advice variables
Build fix. * cb/remote-ndebug-fix: remote: avoid -Wunused-but-set-variable in gcc with -DNDEBUG
Doc update plus improved error reporting. * jk/log-warn-on-bogus-encoding: docs: use "character encoding" to refer to commit-object encoding logmsg_reencode(): warn when iconv() fails
Code cleanup. * rs/show-branch-simplify: show-branch: simplify rev_is_head()
Code cleanup. * rs/archive-use-object-id: archive: convert queue_directory to struct object_id
Optimize code that handles large number of refs in the "git fetch" code path. * ps/fetch-optim: fetch: avoid second connectivity check if we already have all objects fetch: merge fetching and consuming refs fetch: refactor fetch refs to be more extendable fetch-pack: optimize loading of refs via commit graph connected: refactor iterator to return next object ID directly fetch: avoid unpacking headers in object existence check fetch: speed up lookup of want refs via commit-graph
The reachability bitmap file used to be generated only for a single pack, but now we've learned to generate bitmaps for history that span across multiple packfiles. * tb/multi-pack-bitmaps: (27 commits) p5326: perf tests for MIDX bitmaps p5310: extract full and partial bitmap tests midx: respect 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' t7700: update to work with MIDX bitmap test knob t5319: don't write MIDX bitmaps in t5319 t5310: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP t0410: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP t5326: test multi-pack bitmap behavior t/helper/test-read-midx.c: add --checksum mode t5310: move some tests to lib-bitmap.sh pack-bitmap: write multi-pack bitmaps pack-bitmap: read multi-pack bitmaps pack-bitmap.c: avoid redundant calls to try_partial_reuse pack-bitmap.c: introduce 'bitmap_is_preferred_refname()' pack-bitmap.c: introduce 'nth_bitmap_object_oid()' pack-bitmap.c: introduce 'bitmap_num_objects()' midx: avoid opening multiple MIDXs when writing midx: close linked MIDXs, avoid leaking memory midx: infer preferred pack when not given one midx: reject empty `--preferred-pack`'s ...
Recent "diff -m" changes broke "gitk", which has been corrected. * so/diff-index-regression-fix: diff-index: restore -c/--cc options handling
Regression fix. * ab/send-email-config-fix: send-email: fix a "first config key wins" regression in v2.33.0
Build simplification. * ab/no-more-check-bindir: Makefile: remove the check_bindir script
Docfix. * bs/doc-bugreport-outdir: Documentation: fix default directory of git bugreport -o
Tie-break branches that point at the same object in the list of branches on GitWeb to show the one pointed at by HEAD early. * gh/gitweb-branch-sort: gitweb: use HEAD as secondary sort key in git_get_heads_list()
The code to make "git grep" recurse into submodules has been updated to migrate away from the "add submodule's object store as an alternate object store" mechanism (which is suboptimal). * jt/grep-wo-submodule-odb-as-alternate: t7814: show lack of alternate ODB-adding submodule-config: pass repo upon blob config read grep: add repository to OID grep sources grep: allocate subrepos on heap grep: read submodule entry with explicit repo grep: typesafe versions of grep_source_init grep: use submodule-ODB-as-alternate lazy-addition submodule: lazily add submodule ODBs as alternates
* rs/ssh-signing-fix: gpg-interface: avoid buffer overrun in parse_ssh_output() gpg-interface: handle missing " with " gracefully in parse_ssh_output()
* jx/message-fixes: i18n: fix typos found during l10n for git 2.34.0
"git pull --no-verify" did not affect the underlying "git merge". * ar/fix-git-pull-no-verify: pull: honor --no-verify and do not call the commit-msg hook
Doc update. * ar/no-verify-doc: Document positive variant of commit and merge option "--no-verify"
The compatibility implementation for unsetenv(3) were written to mimic ancient, non-POSIX, variant seen in an old glibc; it has been changed to return an integer to match the more modern era. * jc/unsetenv-returns-an-int: unsetenv(3) returns int, not void
The command line complation for "git send-email" options have been tweaked to make it easier to keep it in sync with the command itself. * tp/send-email-completion: send-email docs: add format-patch options send-email: programmatically generate bash completions
Leakfix. * tb/plug-pack-bitmap-leaks: pack-bitmap.c: more aggressively free in free_bitmap_index() pack-bitmap.c: don't leak type-level bitmaps midx.c: write MIDX filenames to strbuf builtin/multi-pack-index.c: don't leak concatenated options builtin/repack.c: avoid leaking child arguments builtin/pack-objects.c: don't leak memory via arguments t/helper/test-read-midx.c: free MIDX within read_midx_file() midx.c: don't leak MIDX from verify_midx_file midx.c: clean up chunkfile after reading the MIDX
Code simplification. * rd/http-backend-code-simplification: http-backend: remove a duplicated code branch
72e3ebc
to
70f967d
Compare
When in --buffer mode, it is very useful for the caller to have control over when the buffer is flushed. Currently there is no convenient way to signal for the buffer to be flushed. One workaround is to provide any nonexisting commit to git-cat-file's stdin, in which case the buffer will be flushed and a "$FOO missing" message will be displayed. However, this is not an ideal workaround. Instead, this commit teaches git-cat-file to look for an empty string in stdin, which will trigger a flush of stdout. Signed-off-by: John Cai <johncai86@gmail.com>
When an empty string is entered into stdin, git-cat-file --buffer will flush stdout immediately. This commit updates the man page accordingly. Signed-off-by: John Cai <johncai86@gmail.com>
70f967d
to
18d6c28
Compare
/preview |
Preview email sent as pull.1124.git.git.1636147086.gitgitgadget@gmail.com |
/submit |
Submitted as pull.1124.git.git.1636149400.gitgitgadget@gmail.com To fetch this version into
To fetch this version to local tag
|
@@ -405,6 +405,11 @@ static void batch_one_object(const char *obj_name, | |||
int flags = opt->follow_symlinks ? GET_OID_FOLLOW_SYMLINKS : 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
"John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:
> @@ -405,6 +405,11 @@ static void batch_one_object(const char *obj_name,
> int flags = opt->follow_symlinks ? GET_OID_FOLLOW_SYMLINKS : 0;
> enum get_oid_result result;
>
> + if (opt->buffer_output && obj_name[0] == '\0') {
> + fflush(stdout);
> + return;
> + }
> +
This might work in practice, but it a bad design taste to add this
change here. The function is designed to take an object name
string, and it even prepares a flag variable needed to make a call
to turn that object name into object data. We do not need to
contaminate the interface with "usually this takes an object name,
but there are these other special cases ...". The higher in the
callchain we place special cases, the better the lower level
functions become, as that allows them to concentrate on doing one
single thing well.
> result = get_oid_with_context(the_repository, obj_name,
> flags, &data->oid, &ctx);
> if (result != FOUND) {
> @@ -609,7 +614,11 @@ static int batch_objects(struct batch_options *opt)
> data.rest = p;
> }
>
> - batch_one_object(input.buf, &output, opt, &data);
> + /*
> + * When in buffer mode and input.buf is an empty string,
> + * flush to stdout.
> + */
Checking "do we have the flush instruction (in which case we'd do
the flush here), or do we have textual name of an object (in which
case we'd call batch_one_object())?" here would be far cleaner and
results in an easier-to-explain code. With a cleanly written code
to do so, it probably does not even need a new comment here.
This brings up another issue. Is "flushing" the *ONLY* special
thing we would ever do in this codepath in the future? I doubt so.
Squatting on an "empty string" is a selfish design that hurts those
who will come after you in the future, as they need to find other
ways to ask for a "special thing".
If we are inventing a special syntax that allows us to spell
commands that are distinguishable from a validly-spelled object name
to cause something special (like "flushing the output stream"),
perhaps we want to use a bit more extensible and explicit syntax and
use it from day one?
For example, if no string that begins with three dots can ever be a
valid way to spell an object name, perhaps "...flush" might be a
better "please do this special thing" syntax than an empty string.
It is easily extensible (the next special thing can follow suit to
say "...$verb" to tell the machinery to $verb the input). When we
compare between an empty string and "...flush", the latter clearly
is more descriptive, too.
Note that I offhand do not know if "a valid string that name an
object would never begin with three-dot" is true. Please check
if that is true if you choose to use it, or you can find and use
another convention that allows us to clearly distinguish the
"special" instruction and object names.
Thanks.
> + batch_one_object(input.buf, &output, opt, &data);
> }
>
> strbuf_release(&input);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):
On Fri, Nov 05 2021, Junio C Hamano wrote:
> "John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> @@ -405,6 +405,11 @@ static void batch_one_object(const char *obj_name,
>> int flags = opt->follow_symlinks ? GET_OID_FOLLOW_SYMLINKS : 0;
>> enum get_oid_result result;
>>
>> + if (opt->buffer_output && obj_name[0] == '\0') {
>> + fflush(stdout);
>> + return;
>> + }
>> +
>
> This might work in practice, but it a bad design taste to add this
> change here. The function is designed to take an object name
> string, and it even prepares a flag variable needed to make a call
> to turn that object name into object data. We do not need to
> contaminate the interface with "usually this takes an object name,
> but there are these other special cases ...". The higher in the
> callchain we place special cases, the better the lower level
> functions become, as that allows them to concentrate on doing one
> single thing well.
>
>> result = get_oid_with_context(the_repository, obj_name,
>> flags, &data->oid, &ctx);
>> if (result != FOUND) {
>> @@ -609,7 +614,11 @@ static int batch_objects(struct batch_options *opt)
>> data.rest = p;
>> }
>>
>> - batch_one_object(input.buf, &output, opt, &data);
>> + /*
>> + * When in buffer mode and input.buf is an empty string,
>> + * flush to stdout.
>> + */
>
> Checking "do we have the flush instruction (in which case we'd do
> the flush here), or do we have textual name of an object (in which
> case we'd call batch_one_object())?" here would be far cleaner and
> results in an easier-to-explain code. With a cleanly written code
> to do so, it probably does not even need a new comment here.
>
> This brings up another issue. Is "flushing" the *ONLY* special
> thing we would ever do in this codepath in the future? I doubt so.
> Squatting on an "empty string" is a selfish design that hurts those
> who will come after you in the future, as they need to find other
> ways to ask for a "special thing".
>
> If we are inventing a special syntax that allows us to spell
> commands that are distinguishable from a validly-spelled object name
> to cause something special (like "flushing the output stream"),
> perhaps we want to use a bit more extensible and explicit syntax and
> use it from day one?
>
> For example, if no string that begins with three dots can ever be a
> valid way to spell an object name, perhaps "...flush" might be a
> better "please do this special thing" syntax than an empty string.
> It is easily extensible (the next special thing can follow suit to
> say "...$verb" to tell the machinery to $verb the input). When we
> compare between an empty string and "...flush", the latter clearly
> is more descriptive, too.
>
> Note that I offhand do not know if "a valid string that name an
> object would never begin with three-dot" is true. Please check
> if that is true if you choose to use it, or you can find and use
> another convention that allows us to clearly distinguish the
> "special" instruction and object names.
I had much the same thought, this is a useful feature, but let's not
squat on the one bit of open syntax we have.
John: I think a better direction here is to add a mode to cat-file to
emulate what "git update-ref --stdin" supports. Here's a demo of that
(also quoted below):
https://github.com/git/git/commit/7794f6cfdbdca0dd6bab0dea16193ebf018b86a9
That's on top of some general UI improvements to cat-file I've got
locally:
https://github.com/git/git/compare/master...avar:avar/cat-file-usage-and-options-handling
That WIP patch on top follows below, of course it's a *lot* more initial
scaffolding, but I think once we get past that initial step it's a much
better path forward. As noted the code is also almost entirely
copy/pasted from update-ref.c, and perhaps some of the shared parts
could be moved to some library both could use.
I couldn't think of a better name than --stdin-cmd, suggestions most
welcome.
From 7794f6cfdbdca0dd6bab0dea16193ebf018b86a9 Mon Sep 17 00:00:00 2001
Message-Id: <patch-1.1-7794f6cfdbd-20211106T040307Z-avarab@gmail.com>
From: =?UTF-8?q?=C3=86var=20Arnfj=C3=B6r=C3=B0=20Bjarmason?=
<avarab@gmail.com>
Date: Sat, 6 Nov 2021 04:54:04 +0100
Subject: [PATCH] WIP cat-file: add a --stdin-cmd mode
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
This WIP patch is mostly stealing code from builtin/update-ref.c and
implementing the same sort of prefixed command-mode that it
supports. I.e. in addition to --batch now supporting:
<object> LF
It'll support with --stdin-cmd, with and without -z, respectively:
object <object> NL
object <object> NUL
The plus being that we can now implement additional commands. Let's
start that by scratching the itch John Cai wanted to address in [1]
and implement a (with and without -z):
fflush NL
fflush NUL
That command simply calls fflush(stdout), which could be done as an
emergent effect before by feeding the input a "NL".
I think this will be useful for other things, e.g. I've observed in
the past that a not-trivial part of "cat-file --batch" time is spent
on parsing its <object> argument and seeing if it's a revision, ref
etc.
So we could e.g. add a command that only accepts a full-length 40
character SHA-1, or switch the --format output mid-request etc.
1. https://lore.kernel.org/git/pull.1124.git.git.1636149400.gitgitgadget@gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
builtin/cat-file.c | 116 ++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 115 insertions(+), 1 deletion(-)
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index b76f2a00046..afdb976c6e7 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -26,7 +26,10 @@ struct batch_options {
int unordered;
int cmdmode; /* may be 'w' or 'c' for --filters or --textconv */
const char *format;
+ int stdin_cmd;
+ int end_null;
};
+static char line_termination = '\n';
static const char *force_path;
@@ -507,6 +510,106 @@ static int batch_unordered_packed(const struct object_id *oid,
data);
}
+enum batch_state {
+ /* Non-transactional state open for commands. */
+ BATCH_STATE_OPEN,
+};
+
+static void parse_cmd_object(struct batch_options *opt,
+ const char *next, const char *end,
+ struct strbuf *output,
+ struct expand_data *data)
+{
+ size_t len = end - next - 1;
+ char *p = (char *)next;
+ char old = p[len];
+
+ p[len] = '\0';
+ batch_one_object(next, output, opt, data);
+ p[len] = old;
+}
+
+static void parse_cmd_fflush(struct batch_options *opt,
+ const char *next, const char *end,
+ struct strbuf *output,
+ struct expand_data *data)
+{
+ if (*next != line_termination)
+ die("fflush: extra input: %s", next);
+ fflush(stdout);
+}
+
+static const struct parse_cmd {
+ const char *prefix;
+ void (*fn)(struct batch_options *, const char *, const char *, struct strbuf *, struct expand_data *);
+ unsigned args;
+ enum batch_state state;
+} command[] = {
+ { "object", parse_cmd_object, 1, BATCH_STATE_OPEN },
+ { "fflush", parse_cmd_fflush, 0, BATCH_STATE_OPEN },
+};
+
+static void batch_objects_stdin_cmd(struct batch_options *opt,
+ struct strbuf *output,
+ struct expand_data *data)
+{
+ struct strbuf input = STRBUF_INIT;
+ enum batch_state state = BATCH_STATE_OPEN;
+
+ /* Read each line dispatch its command */
+ while (!strbuf_getwholeline(&input, stdin, line_termination)) {
+ size_t i, j;
+ const struct parse_cmd *cmd = NULL;
+
+ if (*input.buf == line_termination)
+ die("empty command in input");
+ else if (isspace(*input.buf))
+ die("whitespace before command: %s", input.buf);
+
+ for (i = 0; i < ARRAY_SIZE(command); i++) {
+ const char *prefix = command[i].prefix;
+ char c;
+
+ if (!starts_with(input.buf, prefix))
+ continue;
+
+ /*
+ * If the command has arguments, verify that it's
+ * followed by a space. Otherwise, it shall be followed
+ * by a line terminator.
+ */
+ c = command[i].args ? ' ' : line_termination;
+ if (input.buf[strlen(prefix)] != c)
+ continue;
+
+ cmd = &command[i];
+ break;
+ }
+ if (!cmd)
+ die("unknown command: %s", input.buf);
+
+ /*
+ * Read additional arguments if NUL-terminated. Do not raise an
+ * error in case there is an early EOF to let the command
+ * handle missing arguments with a proper error message.
+ */
+ for (j = 1; line_termination == '\0' && j < cmd->args; j++)
+ if (strbuf_appendwholeline(&input, stdin, line_termination))
+ break;
+
+ switch (state) {
+ case BATCH_STATE_OPEN:
+ /* TODO: command state management */
+ break;
+ }
+
+ cmd->fn(opt, input.buf + strlen(cmd->prefix) + !!cmd->args,
+ input.buf + input.len, output, data);
+ }
+
+ strbuf_release(&input);
+}
+
static int batch_objects(struct batch_options *opt)
{
struct strbuf input = STRBUF_INIT;
@@ -514,6 +617,7 @@ static int batch_objects(struct batch_options *opt)
struct expand_data data;
int save_warning;
int retval = 0;
+ const int stdin_cmd = opt->stdin_cmd;
if (!opt->format)
opt->format = "%(objectname) %(objecttype) %(objectsize)";
@@ -589,7 +693,8 @@ static int batch_objects(struct batch_options *opt)
save_warning = warn_on_object_refname_ambiguity;
warn_on_object_refname_ambiguity = 0;
- while (strbuf_getline(&input, stdin) != EOF) {
+ while (!stdin_cmd &&
+ strbuf_getline(&input, stdin) != EOF) {
if (data.split_on_whitespace) {
/*
* Split at first whitespace, tying off the beginning
@@ -607,6 +712,9 @@ static int batch_objects(struct batch_options *opt)
batch_one_object(input.buf, &output, opt, &data);
}
+ if (stdin_cmd)
+ batch_objects_stdin_cmd(opt, &output, &data);
+
strbuf_release(&input);
strbuf_release(&output);
warn_on_object_refname_ambiguity = save_warning;
@@ -684,6 +792,10 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
batch_option_callback),
OPT_CMDMODE(0, "batch-all-objects", &opt,
N_("with --batch[-check]: ignores stdin, batches all known objects"), 'b'),
+ OPT_BOOL(0, "stdin-cmd", &batch.stdin_cmd,
+ N_("with --batch[-check]: enters stdin 'command mode")),
+ OPT_BOOL('z', NULL, &batch.end_null, N_("with --stdin-cmd, use NUL termination")),
+
/* Batch-specific options */
OPT_GROUP(N_("Change or optimize batch output")),
OPT_BOOL(0, "buffer", &batch.buffer_output, N_("buffer --batch output")),
@@ -737,6 +849,8 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
/* Batch defaults */
if (batch.buffer_output < 0)
batch.buffer_output = batch.all_objects;
+ if (batch.end_null)
+ line_termination = '\0';
/* Return early if we're in batch mode? */
if (batch.enabled) {
--
2.34.0.rc1.741.gab7bfd97031
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, John Cai wrote (reply to this):
O Sat, Nov 06, 2021 at 05:01:10AM +0100, �var Arnfj�r� Bjarmason wrote:
>
> On Fri, Nov 05 2021, Junio C Hamano wrote:
>
> > "John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:
> >
> >> @@ -405,6 +405,11 @@ static void batch_one_object(const char *obj_name,
> >> int flags = opt->follow_symlinks ? GET_OID_FOLLOW_SYMLINKS : 0;
> >> enum get_oid_result result;
> >>
> >> + if (opt->buffer_output && obj_name[0] == '\0') {
> >> + fflush(stdout);
> >> + return;
> >> + }
> >> +
> >
> > This might work in practice, but it a bad design taste to add this
> > change here. The function is designed to take an object name
> > string, and it even prepares a flag variable needed to make a call
> > to turn that object name into object data. We do not need to
> > contaminate the interface with "usually this takes an object name,
> > but there are these other special cases ...". The higher in the
> > callchain we place special cases, the better the lower level
> > functions become, as that allows them to concentrate on doing one
> > single thing well.
> >
> >> result = get_oid_with_context(the_repository, obj_name,
> >> flags, &data->oid, &ctx);
> >> if (result != FOUND) {
> >> @@ -609,7 +614,11 @@ static int batch_objects(struct batch_options *opt)
> >> data.rest = p;
> >> }
> >>
> >> - batch_one_object(input.buf, &output, opt, &data);
> >> + /*
> >> + * When in buffer mode and input.buf is an empty string,
> >> + * flush to stdout.
> >> + */
> >
> > Checking "do we have the flush instruction (in which case we'd do
> > the flush here), or do we have textual name of an object (in which
> > case we'd call batch_one_object())?" here would be far cleaner and
> > results in an easier-to-explain code. With a cleanly written code
> > to do so, it probably does not even need a new comment here.
> >
> > This brings up another issue. Is "flushing" the *ONLY* special
> > thing we would ever do in this codepath in the future? I doubt so.
> > Squatting on an "empty string" is a selfish design that hurts those
> > who will come after you in the future, as they need to find other
> > ways to ask for a "special thing".
> >
> > If we are inventing a special syntax that allows us to spell
> > commands that are distinguishable from a validly-spelled object name
> > to cause something special (like "flushing the output stream"),
> > perhaps we want to use a bit more extensible and explicit syntax and
> > use it from day one?
> >
> > For example, if no string that begins with three dots can ever be a
> > valid way to spell an object name, perhaps "...flush" might be a
> > better "please do this special thing" syntax than an empty string.
> > It is easily extensible (the next special thing can follow suit to
> > say "...$verb" to tell the machinery to $verb the input). When we
> > compare between an empty string and "...flush", the latter clearly
> > is more descriptive, too.
> >
> > Note that I offhand do not know if "a valid string that name an
> > object would never begin with three-dot" is true. Please check
> > if that is true if you choose to use it, or you can find and use
> > another convention that allows us to clearly distinguish the
> > "special" instruction and object names.
>
> I had much the same thought, this is a useful feature, but let's not
> squat on the one bit of open syntax we have.
>
> John: I think a better direction here is to add a mode to cat-file to
> emulate what "git update-ref --stdin" supports. Here's a demo of that
> (also quoted below):
> https://github.com/git/git/commit/7794f6cfdbdca0dd6bab0dea16193ebf018b86a9
>
> That's on top of some general UI improvements to cat-file I've got
> locally:
> https://github.com/git/git/compare/master...avar:avar/cat-file-usage-and-options-handling
>
> That WIP patch on top follows below, of course it's a *lot* more initial
> scaffolding, but I think once we get past that initial step it's a much
> better path forward. As noted the code is also almost entirely
> copy/pasted from update-ref.c, and perhaps some of the shared parts
> could be moved to some library both could use.
>
> I couldn't think of a better name than --stdin-cmd, suggestions most
> welcome.
>
> From 7794f6cfdbdca0dd6bab0dea16193ebf018b86a9 Mon Sep 17 00:00:00 2001
> Message-Id: <patch-1.1-7794f6cfdbd-20211106T040307Z-avarab@gmail.com>
> From: =?UTF-8?q?=C3=86var=20Arnfj=C3=B6r=C3=B0=20Bjarmason?=
> <avarab@gmail.com>
> Date: Sat, 6 Nov 2021 04:54:04 +0100
> Subject: [PATCH] WIP cat-file: add a --stdin-cmd mode
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
>
> This WIP patch is mostly stealing code from builtin/update-ref.c and
> implementing the same sort of prefixed command-mode that it
> supports. I.e. in addition to --batch now supporting:
>
> <object> LF
>
> It'll support with --stdin-cmd, with and without -z, respectively:
>
> object <object> NL
> object <object> NUL
>
> The plus being that we can now implement additional commands. Let's
> start that by scratching the itch John Cai wanted to address in [1]
> and implement a (with and without -z):
>
> fflush NL
> fflush NUL
>
> That command simply calls fflush(stdout), which could be done as an
> emergent effect before by feeding the input a "NL".
>
> I think this will be useful for other things, e.g. I've observed in
> the past that a not-trivial part of "cat-file --batch" time is spent
> on parsing its <object> argument and seeing if it's a revision, ref
> etc.
>
> So we could e.g. add a command that only accepts a full-length 40
> character SHA-1, or switch the --format output mid-request etc.
>
> 1. https://lore.kernel.org/git/pull.1124.git.git.1636149400.gitgitgadget@gmail.com/
>
> Signed-off-by: �var Arnfj�r� Bjarmason <avarab@gmail.com>
> ---
> builtin/cat-file.c | 116 ++++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 115 insertions(+), 1 deletion(-)
>
> diff --git a/builtin/cat-file.c b/builtin/cat-file.c
> index b76f2a00046..afdb976c6e7 100644
> --- a/builtin/cat-file.c
> +++ b/builtin/cat-file.c
> @@ -26,7 +26,10 @@ struct batch_options {
> int unordered;
> int cmdmode; /* may be 'w' or 'c' for --filters or --textconv */
> const char *format;
> + int stdin_cmd;
> + int end_null;
> };
> +static char line_termination = '\n';
>
> static const char *force_path;
>
> @@ -507,6 +510,106 @@ static int batch_unordered_packed(const struct object_id *oid,
> data);
> }
>
> +enum batch_state {
> + /* Non-transactional state open for commands. */
> + BATCH_STATE_OPEN,
> +};
> +
> +static void parse_cmd_object(struct batch_options *opt,
> + const char *next, const char *end,
> + struct strbuf *output,
> + struct expand_data *data)
> +{
> + size_t len = end - next - 1;
> + char *p = (char *)next;
> + char old = p[len];
> +
> + p[len] = '\0';
> + batch_one_object(next, output, opt, data);
> + p[len] = old;
> +}
> +
> +static void parse_cmd_fflush(struct batch_options *opt,
> + const char *next, const char *end,
> + struct strbuf *output,
> + struct expand_data *data)
> +{
> + if (*next != line_termination)
> + die("fflush: extra input: %s", next);
> + fflush(stdout);
> +}
> +
> +static const struct parse_cmd {
> + const char *prefix;
> + void (*fn)(struct batch_options *, const char *, const char *, struct strbuf *, struct expand_data *);
> + unsigned args;
> + enum batch_state state;
> +} command[] = {
> + { "object", parse_cmd_object, 1, BATCH_STATE_OPEN },
> + { "fflush", parse_cmd_fflush, 0, BATCH_STATE_OPEN },
> +};
I think overall this approach is cleaner and makes sense. My only
question is, are there more commands in the future that will need some
special command syntax? Just wondering whether YAGNI applies here.
> +
> +static void batch_objects_stdin_cmd(struct batch_options *opt,
> + struct strbuf *output,
> + struct expand_data *data)
> +{
> + struct strbuf input = STRBUF_INIT;
> + enum batch_state state = BATCH_STATE_OPEN;
> +
> + /* Read each line dispatch its command */
> + while (!strbuf_getwholeline(&input, stdin, line_termination)) {
> + size_t i, j;
> + const struct parse_cmd *cmd = NULL;
> +
> + if (*input.buf == line_termination)
> + die("empty command in input");
> + else if (isspace(*input.buf))
> + die("whitespace before command: %s", input.buf);
> +
> + for (i = 0; i < ARRAY_SIZE(command); i++) {
> + const char *prefix = command[i].prefix;
> + char c;
> +
> + if (!starts_with(input.buf, prefix))
> + continue;
> +
> + /*
> + * If the command has arguments, verify that it's
> + * followed by a space. Otherwise, it shall be followed
> + * by a line terminator.
> + */
> + c = command[i].args ? ' ' : line_termination;
> + if (input.buf[strlen(prefix)] != c)
> + continue;
> +
> + cmd = &command[i];
> + break;
> + }
> + if (!cmd)
> + die("unknown command: %s", input.buf);
> +
> + /*
> + * Read additional arguments if NUL-terminated. Do not raise an
> + * error in case there is an early EOF to let the command
> + * handle missing arguments with a proper error message.
> + */
> + for (j = 1; line_termination == '\0' && j < cmd->args; j++)
> + if (strbuf_appendwholeline(&input, stdin, line_termination))
> + break;
> +
> + switch (state) {
> + case BATCH_STATE_OPEN:
> + /* TODO: command state management */
> + break;
> + }
> +
> + cmd->fn(opt, input.buf + strlen(cmd->prefix) + !!cmd->args,
> + input.buf + input.len, output, data);
> + }
> +
> + strbuf_release(&input);
> +}
> +
> static int batch_objects(struct batch_options *opt)
> {
> struct strbuf input = STRBUF_INIT;
> @@ -514,6 +617,7 @@ static int batch_objects(struct batch_options *opt)
> struct expand_data data;
> int save_warning;
> int retval = 0;
> + const int stdin_cmd = opt->stdin_cmd;
>
> if (!opt->format)
> opt->format = "%(objectname) %(objecttype) %(objectsize)";
> @@ -589,7 +693,8 @@ static int batch_objects(struct batch_options *opt)
> save_warning = warn_on_object_refname_ambiguity;
> warn_on_object_refname_ambiguity = 0;
>
> - while (strbuf_getline(&input, stdin) != EOF) {
> + while (!stdin_cmd &&
> + strbuf_getline(&input, stdin) != EOF) {
> if (data.split_on_whitespace) {
> /*
> * Split at first whitespace, tying off the beginning
> @@ -607,6 +712,9 @@ static int batch_objects(struct batch_options *opt)
> batch_one_object(input.buf, &output, opt, &data);
> }
>
> + if (stdin_cmd)
> + batch_objects_stdin_cmd(opt, &output, &data);
> +
> strbuf_release(&input);
> strbuf_release(&output);
> warn_on_object_refname_ambiguity = save_warning;
> @@ -684,6 +792,10 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
> batch_option_callback),
> OPT_CMDMODE(0, "batch-all-objects", &opt,
> N_("with --batch[-check]: ignores stdin, batches all known objects"), 'b'),
> + OPT_BOOL(0, "stdin-cmd", &batch.stdin_cmd,
> + N_("with --batch[-check]: enters stdin 'command mode")),
> + OPT_BOOL('z', NULL, &batch.end_null, N_("with --stdin-cmd, use NUL termination")),
> +
> /* Batch-specific options */
> OPT_GROUP(N_("Change or optimize batch output")),
> OPT_BOOL(0, "buffer", &batch.buffer_output, N_("buffer --batch output")),
> @@ -737,6 +849,8 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
> /* Batch defaults */
> if (batch.buffer_output < 0)
> batch.buffer_output = batch.all_objects;
> + if (batch.end_null)
> + line_termination = '\0';
>
> /* Return early if we're in batch mode? */
> if (batch.enabled) {
> --
> 2.34.0.rc1.741.gab7bfd97031
>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):
On Sun, Nov 07 2021, John Cai wrote:
> O Sat, Nov 06, 2021 at 05:01:10AM +0100, Ævar Arnfjörð Bjarmason wrote:
>>
>> On Fri, Nov 05 2021, Junio C Hamano wrote:
>>
>> > "John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:
>> >
>> >> @@ -405,6 +405,11 @@ static void batch_one_object(const char *obj_name,
>> >> int flags = opt->follow_symlinks ? GET_OID_FOLLOW_SYMLINKS : 0;
>> >> enum get_oid_result result;
>> >>
>> >> + if (opt->buffer_output && obj_name[0] == '\0') {
>> >> + fflush(stdout);
>> >> + return;
>> >> + }
>> >> +
>> >
>> > This might work in practice, but it a bad design taste to add this
>> > change here. The function is designed to take an object name
>> > string, and it even prepares a flag variable needed to make a call
>> > to turn that object name into object data. We do not need to
>> > contaminate the interface with "usually this takes an object name,
>> > but there are these other special cases ...". The higher in the
>> > callchain we place special cases, the better the lower level
>> > functions become, as that allows them to concentrate on doing one
>> > single thing well.
>> >
>> >> result = get_oid_with_context(the_repository, obj_name,
>> >> flags, &data->oid, &ctx);
>> >> if (result != FOUND) {
>> >> @@ -609,7 +614,11 @@ static int batch_objects(struct batch_options *opt)
>> >> data.rest = p;
>> >> }
>> >>
>> >> - batch_one_object(input.buf, &output, opt, &data);
>> >> + /*
>> >> + * When in buffer mode and input.buf is an empty string,
>> >> + * flush to stdout.
>> >> + */
>> >
>> > Checking "do we have the flush instruction (in which case we'd do
>> > the flush here), or do we have textual name of an object (in which
>> > case we'd call batch_one_object())?" here would be far cleaner and
>> > results in an easier-to-explain code. With a cleanly written code
>> > to do so, it probably does not even need a new comment here.
>> >
>> > This brings up another issue. Is "flushing" the *ONLY* special
>> > thing we would ever do in this codepath in the future? I doubt so.
>> > Squatting on an "empty string" is a selfish design that hurts those
>> > who will come after you in the future, as they need to find other
>> > ways to ask for a "special thing".
>> >
>> > If we are inventing a special syntax that allows us to spell
>> > commands that are distinguishable from a validly-spelled object name
>> > to cause something special (like "flushing the output stream"),
>> > perhaps we want to use a bit more extensible and explicit syntax and
>> > use it from day one?
>> >
>> > For example, if no string that begins with three dots can ever be a
>> > valid way to spell an object name, perhaps "...flush" might be a
>> > better "please do this special thing" syntax than an empty string.
>> > It is easily extensible (the next special thing can follow suit to
>> > say "...$verb" to tell the machinery to $verb the input). When we
>> > compare between an empty string and "...flush", the latter clearly
>> > is more descriptive, too.
>> >
>> > Note that I offhand do not know if "a valid string that name an
>> > object would never begin with three-dot" is true. Please check
>> > if that is true if you choose to use it, or you can find and use
>> > another convention that allows us to clearly distinguish the
>> > "special" instruction and object names.
>>
>> I had much the same thought, this is a useful feature, but let's not
>> squat on the one bit of open syntax we have.
>>
>> John: I think a better direction here is to add a mode to cat-file to
>> emulate what "git update-ref --stdin" supports. Here's a demo of that
>> (also quoted below):
>> https://github.com/git/git/commit/7794f6cfdbdca0dd6bab0dea16193ebf018b86a9
>>
>> That's on top of some general UI improvements to cat-file I've got
>> locally:
>> https://github.com/git/git/compare/master...avar:avar/cat-file-usage-and-options-handling
>>
>> That WIP patch on top follows below, of course it's a *lot* more initial
>> scaffolding, but I think once we get past that initial step it's a much
>> better path forward. As noted the code is also almost entirely
>> copy/pasted from update-ref.c, and perhaps some of the shared parts
>> could be moved to some library both could use.
>>
>> I couldn't think of a better name than --stdin-cmd, suggestions most
>> welcome.
>>
>> From 7794f6cfdbdca0dd6bab0dea16193ebf018b86a9 Mon Sep 17 00:00:00 2001
>> Message-Id: <patch-1.1-7794f6cfdbd-20211106T040307Z-avarab@gmail.com>
>> From: =?UTF-8?q?=C3=86var=20Arnfj=C3=B6r=C3=B0=20Bjarmason?=
>> <avarab@gmail.com>
>> Date: Sat, 6 Nov 2021 04:54:04 +0100
>> Subject: [PATCH] WIP cat-file: add a --stdin-cmd mode
>> MIME-Version: 1.0
>> Content-Type: text/plain; charset=UTF-8
>> Content-Transfer-Encoding: 8bit
>>
>> This WIP patch is mostly stealing code from builtin/update-ref.c and
>> implementing the same sort of prefixed command-mode that it
>> supports. I.e. in addition to --batch now supporting:
>>
>> <object> LF
>>
>> It'll support with --stdin-cmd, with and without -z, respectively:
>>
>> object <object> NL
>> object <object> NUL
>>
>> The plus being that we can now implement additional commands. Let's
>> start that by scratching the itch John Cai wanted to address in [1]
>> and implement a (with and without -z):
>>
>> fflush NL
>> fflush NUL
>>
>> That command simply calls fflush(stdout), which could be done as an
>> emergent effect before by feeding the input a "NL".
>>
>> I think this will be useful for other things, e.g. I've observed in
>> the past that a not-trivial part of "cat-file --batch" time is spent
>> on parsing its <object> argument and seeing if it's a revision, ref
>> etc.
>>
>> So we could e.g. add a command that only accepts a full-length 40
>> character SHA-1, or switch the --format output mid-request etc.
>>
>> 1. https://lore.kernel.org/git/pull.1124.git.git.1636149400.gitgitgadget@gmail.com/
>>
>> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>> ---
>> builtin/cat-file.c | 116 ++++++++++++++++++++++++++++++++++++++++++++-
>> 1 file changed, 115 insertions(+), 1 deletion(-)
>>
>> diff --git a/builtin/cat-file.c b/builtin/cat-file.c
>> index b76f2a00046..afdb976c6e7 100644
>> --- a/builtin/cat-file.c
>> +++ b/builtin/cat-file.c
>> @@ -26,7 +26,10 @@ struct batch_options {
>> int unordered;
>> int cmdmode; /* may be 'w' or 'c' for --filters or --textconv */
>> const char *format;
>> + int stdin_cmd;
>> + int end_null;
>> };
>> +static char line_termination = '\n';
>>
>> static const char *force_path;
>>
>> @@ -507,6 +510,106 @@ static int batch_unordered_packed(const struct object_id *oid,
>> data);
>> }
>>
>> +enum batch_state {
>> + /* Non-transactional state open for commands. */
>> + BATCH_STATE_OPEN,
>> +};
>> +
>> +static void parse_cmd_object(struct batch_options *opt,
>> + const char *next, const char *end,
>> + struct strbuf *output,
>> + struct expand_data *data)
>> +{
>> + size_t len = end - next - 1;
>> + char *p = (char *)next;
>> + char old = p[len];
>> +
>> + p[len] = '\0';
>> + batch_one_object(next, output, opt, data);
>> + p[len] = old;
>> +}
>> +
>> +static void parse_cmd_fflush(struct batch_options *opt,
>> + const char *next, const char *end,
>> + struct strbuf *output,
>> + struct expand_data *data)
>> +{
>> + if (*next != line_termination)
>> + die("fflush: extra input: %s", next);
>> + fflush(stdout);
>> +}
>> +
>> +static const struct parse_cmd {
>> + const char *prefix;
>> + void (*fn)(struct batch_options *, const char *, const char *, struct strbuf *, struct expand_data *);
>> + unsigned args;
>> + enum batch_state state;
>> +} command[] = {
>> + { "object", parse_cmd_object, 1, BATCH_STATE_OPEN },
>> + { "fflush", parse_cmd_fflush, 0, BATCH_STATE_OPEN },
>> +};
> I think overall this approach is cleaner and makes sense. My only
> question is, are there more commands in the future that will need some
> special command syntax? Just wondering whether YAGNI applies here.
An obvious addition is to at least add the ability to set the various
options on the fly, i.e. now you need to use --batch-check, and then
kill it and restart if you'd like the content with --batch, ditto for
--textconv.
E.g. the gitaly backend for gitlab.com keeps two cat-filfe processes
around just to flip-flop between those two, sometimes you want the
content, sometimes you're just checking if the object exists.
I'd also like to add something to expose the likes of -e and -t
directly, i.e. even with --batch-check you often want to just check
existence, but get the size too, you could supply a format, but like the
above you sometimes want the size or whatever, and killing/starting a
new process just for that is a hassle...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>> I think overall this approach is cleaner and makes sense. My only
>> question is, are there more commands in the future that will need some
>> special command syntax? Just wondering whether YAGNI applies here.
>
> An obvious addition is to at least add the ability to set the various
> options on the fly, i.e. now you need to use --batch-check, and then
> kill it and restart if you'd like the content with --batch, ditto for
> --textconv.
>
> E.g. the gitaly backend for gitlab.com keeps two cat-filfe processes
> around just to flip-flop between those two, sometimes you want the
> content, sometimes you're just checking if the object exists.
>
> I'd also like to add something to expose the likes of -e and -t
> directly, i.e. even with --batch-check you often want to just check
> existence, but get the size too, you could supply a format, but like the
> above you sometimes want the size or whatever, and killing/starting a
> new process just for that is a hassle...
Yeah, with "plug" and "unplug" instruction you do not have to keep
issuing "flush" when you want to go interactive, and other things
become easy to do, so even though it would make it a bit more
verbose to require "object " prefix for the kind of lines that were
historically the only ones accepted by the command, I think it is a
good direction to go in.
Thanks.
User |
User |
User |
15d354c
to
377b9f9
Compare
When in --buffer mode, it is very useful for the caller to have control
over when the buffer is flushed. Currently there is no convenient way to
signal for the buffer to be flushed. One workaround is to provide any
nonexisting commit to git-cat-file's stdin, in which case the buffer
will be flushed and a "$FOO missing" message will be displayed. However,
this is not an ideal workaround.
Instead, this commit teaches git-cat-file to look for an empty string in
stdin, which will trigger a flush of stdout.
cc: Ævar Arnfjörð Bjarmason avarab@gmail.com
cc: John Cai jcai@gitlab.com
cc: Ævar Arnfjörð Bjarmason avarab@gmail.com