[GSOC][RFC] cat-file: reuse ref-filter logic #980

gitgitgadget · 2021-06-21T05:58:13Z

On the Git mailing list, Christian Couder wrote (reply to this):

On Sat, Jun 19, 2021 at 9:03 AM ZheNing Hu via GitGitGadget <gitgitgadget@gmail.com> wrote: > > From: ZheNing Hu <adlternative@gmail.com> > > In order to let cat-file use ref-filter logic, the following > methods are used: Maybe: s/the following methods are used/let's do the following/ > 1. Add `cat_file_mode` member in struct `ref_format`, this can > help us reject atoms in verify_ref_format() which cat-file > cannot use, e.g. `%(refname)`, `%(push)`, `%(upstream)`... > 2. Change the type of member `format` in struct `batch_options` > to `ref_format`, We can add format data in it. Not sure what "We can add format data in it." means. > 3. Let `batch_objects()` add atoms to format, and use > `verify_ref_format()` to check atoms. > 4. Use `has_object_file()` in `batch_one_object()` to check > whether the input object exists. > 5. Let get_object() return 1 and print "<oid> missing" instead > of returning -1 and printing "missing object <oid> for <refname>", > this can help `format_ref_array_item()` just report that the > object is missing without letting Git exit. > 6. Use `format_ref_array_item()` in `batch_object_write()` to > get the formatted data corresponding to the object. If the > return value of `format_ref_array_item()` is equals to zero, > use `batch_write()` to print object data; else if the return > value less than zero, use `die()` to print the error message > and exit; else return value greater than zero, only print the s/else return value greater/else if the return value is greater/ > error message, but not exit. s/not exit/don't exit/ > 7. Use free_ref_array_item_value() to free ref_array_item's > value. That looks like a lot of changes in a single commit. I wonder if this commit could be split. > Most of the atoms in `for-each-ref --format` are now supported, > such as `%(tree)`, `%(parent)`, `%(author)`, `%(tagger)`, `%(if)`, > `%(then)`, `%(else)`, `%(end)`. But these atoms will be rejected: > `%(refname)`, `%(symref)`, `%(upstream)`, `%(push)`, `%(worktreepath)`, > `%(flag)`, `%(HEAD)`, because our objects don't have refname. s/refname/a refname/ It might be worth talking a bit about possible performance changes. [...] > + ret = format_ref_array_item(&item, &opt->format, scratch, &err); > + if (!ret) { > + strbuf_addch(scratch, '\n'); > + batch_write(opt, scratch->buf, scratch->len); > + } else if (ret < 0) { > + die("%s\n", err.buf); This if (ret < 0) could be checked first. > + } else { > + /* when ret > 0 , don't call die and print the err to stdout*/ I think it would be more helpful to tell what ret > 0 means, rather than what we do below (which can easily be seen). > + printf("%s\n", err.buf); > + fflush(stdout); > } For example: if (ret < 0) { die("%s\n", err.buf); if (ret) { /* ret > 0 means ... */ printf("%s\n", err.buf); fflush(stdout); } else { strbuf_addch(scratch, '\n'); batch_write(opt, scratch->buf, scratch->len); } > + free_ref_array_item_value(&item); > + strbuf_release(&err); > } [...] > +static int batch_objects(struct batch_options *opt, const struct option *options) It's unfortunate that one argument is called "opt" and the other one "options". I wonder if the first one could be called "batch" as it seems to be called this way somewhere else. > + if (!opt->format.format) > + strbuf_addstr(&format, "%(objectname) %(objecttype) %(objectsize)"); > + else > + strbuf_addstr(&format, opt->format.format); If there is no reason for the condition to be (!X) over just (X), I think the latter is a bit better. > if (opt->print_contents) > - data.info.typep = &data.type; > + strbuf_addstr(&format, "\n%(raw)"); > + opt->format.format = format.buf; I wonder if this should be: opt->format.format = strbuf_detach(&format, NULL); > + if (verify_ref_format(&opt->format)) > + usage_with_options(cat_file_usage, options); [...] > @@ -86,7 +87,7 @@ struct ref_format { > int need_color_reset_at_eol; > }; > > -#define REF_FORMAT_INIT { NULL, NULL, 0, 0, -1 } > +#define REF_FORMAT_INIT { NULL, NULL, 0, 0, 0, -1 } Maybe this can already be changed to a designated initializer, like Ævar suggested recently. [...] > +test_expect_success 'cat-file refs/heads/main refs/tags/testtag %(rest)' ' If this test is about checking that %(rest) works with both a branch and a tag, it might be better to say it more explicitly. > + cat >expected <<-EOF && > + 123 commit 123 > + 456 tag 456 > + EOF > + git cat-file --batch-check="%(rest) %(objecttype) %(rest)" >actual <<-EOF && > + refs/heads/main 123 > + refs/tags/testtag 456 > + EOF > + test_cmp expected actual > +'

On the Git mailing list, ZheNing Hu wrote (reply to this):

Christian Couder <christian.couder@gmail.com> 于2021年6月21日周一下午1:55写道： > > > 1. Add `cat_file_mode` member in struct `ref_format`, this can > > help us reject atoms in verify_ref_format() which cat-file > > cannot use, e.g. `%(refname)`, `%(push)`, `%(upstream)`... > > 2. Change the type of member `format` in struct `batch_options` > > to `ref_format`, We can add format data in it. > > Not sure what "We can add format data in it." means. Well, there is something wrong with the expression here. What I want to express is that we can fill its member "format" with the atoms like "%(objectname) %(refname)", and then pass it to the ref-filter. > > 7. Use free_ref_array_item_value() to free ref_array_item's > > value. > > That looks like a lot of changes in a single commit. I wonder if this > commit could be split. > Yeah, But I don’t know if I should take it apart step by step, If taken apart, those intermediate commits are likely to fail the test. > > Most of the atoms in `for-each-ref --format` are now supported, > > such as `%(tree)`, `%(parent)`, `%(author)`, `%(tagger)`, `%(if)`, > > `%(then)`, `%(else)`, `%(end)`. But these atoms will be rejected: > > `%(refname)`, `%(symref)`, `%(upstream)`, `%(push)`, `%(worktreepath)`, > > `%(flag)`, `%(HEAD)`, because our objects don't have refname. > > s/refname/a refname/ > > It might be worth talking a bit about possible performance changes. > Makes sense. The performance has indeed deteriorated. > [...] > > > + ret = format_ref_array_item(&item, &opt->format, scratch, &err); > > + if (!ret) { > > + strbuf_addch(scratch, '\n'); > > + batch_write(opt, scratch->buf, scratch->len); > > + } else if (ret < 0) { > > + die("%s\n", err.buf); > > This if (ret < 0) could be checked first. Yes, it is better to put error checking first. > > > + } else { > > + /* when ret > 0 , don't call die and print the err to stdout*/ > > I think it would be more helpful to tell what ret > 0 means, rather > than what we do below (which can easily be seen). > Ah, There is indeed only one situation for ret > 0 for the time being: Show "<oid> missing" without exiting Git. > > + printf("%s\n", err.buf); > > + fflush(stdout); > > } > > For example: > > if (ret < 0) { > die("%s\n", err.buf); > if (ret) { > /* ret > 0 means ... */ > printf("%s\n", err.buf); > fflush(stdout); > } else { > strbuf_addch(scratch, '\n'); > batch_write(opt, scratch->buf, scratch->len); > } > Yes, this might be better. > > + free_ref_array_item_value(&item); > > + strbuf_release(&err); > > } > > [...] > > > +static int batch_objects(struct batch_options *opt, const struct option *options) > > It's unfortunate that one argument is called "opt" and the other one > "options". I wonder if the first one could be called "batch" as it > seems to be called this way somewhere else. > OK. > > + if (!opt->format.format) > > + strbuf_addstr(&format, "%(objectname) %(objecttype) %(objectsize)"); > > + else > > + strbuf_addstr(&format, opt->format.format); > > If there is no reason for the condition to be (!X) over just (X), I > think the latter is a bit better. > Although I think it doesn’t matter which one to use first, I will still follow your suggestions. > > if (opt->print_contents) > > - data.info.typep = &data.type; > > + strbuf_addstr(&format, "\n%(raw)"); > > + opt->format.format = format.buf; > > I wonder if this should be: > > opt->format.format = strbuf_detach(&format, NULL); > No. here our opt->format.format will not be changed, it would be better for us to use `strbuf_release(&format)` for resource recovery. (strbuf_detach() will forced to let us free opt->format.format) > > + if (verify_ref_format(&opt->format)) > > + usage_with_options(cat_file_usage, options); > > [...] > > > @@ -86,7 +87,7 @@ struct ref_format { > > int need_color_reset_at_eol; > > }; > > > > -#define REF_FORMAT_INIT { NULL, NULL, 0, 0, -1 } > > +#define REF_FORMAT_INIT { NULL, NULL, 0, 0, 0, -1 } > > Maybe this can already be changed to a designated initializer, like > Ævar suggested recently. > I agree. > [...] > > > +test_expect_success 'cat-file refs/heads/main refs/tags/testtag %(rest)' ' > > If this test is about checking that %(rest) works with both a branch > and a tag, it might be better to say it more explicitly. > OK. Thanks. -- ZheNing Hu

gitgitgadget · 2021-06-26T17:30:06Z

On the Git mailing list, Hariom verma wrote (reply to this):

Hi, On Fri, Jun 25, 2021 at 9:32 PM ZheNing Hu via GitGitGadget <gitgitgadget@gmail.com> wrote: > > From: ZheNing Hu <adlternative@gmail.com> > > + if (ret < 0) { > + die("%s\n", err.buf); > + } if (ret) { > + /* ret > 0 means when the object corresponding to oid > + * cannot be found in format_ref_array_item(), we only print > + * the error message. > + */ > + printf("%s\n", err.buf); > + fflush(stdout); > + } else { > + strbuf_addch(scratch, '\n'); > + batch_write(opt, scratch->buf, scratch->len); > } > + free_ref_array_item_value(&item); > + strbuf_release(&err); > } I think you can get rid of braces in condition `ret < 0`: ``` if (ret < 0) die("%s\n", err->buf); if (ret) { /* ret > 0 means when the object corresponding to oid * cannot be found in format_ref_array_item(), we only print * the error message. */ printf("%s\n", err->buf); fflush(stdout); } else { strbuf_addch(scratch, '\n'); batch_write(opt, scratch->buf, scratch->len); } ``` Thanks, Hariom.

On the Git mailing list, ZheNing Hu wrote (reply to this):

Hariom verma <hariom18599@gmail.com> 于2021年6月27日周日上午1:27写道： > > Hi, > > On Fri, Jun 25, 2021 at 9:32 PM ZheNing Hu via GitGitGadget > <gitgitgadget@gmail.com> wrote: > > > > From: ZheNing Hu <adlternative@gmail.com> > > > > + if (ret < 0) { > > + die("%s\n", err.buf); > > + } if (ret) { > > + /* ret > 0 means when the object corresponding to oid > > + * cannot be found in format_ref_array_item(), we only print > > + * the error message. > > + */ > > + printf("%s\n", err.buf); > > + fflush(stdout); > > + } else { > > + strbuf_addch(scratch, '\n'); > > + batch_write(opt, scratch->buf, scratch->len); > > } > > + free_ref_array_item_value(&item); > > + strbuf_release(&err); > > } > > I think you can get rid of braces in condition `ret < 0`: > Make sences. ;-) > ``` > if (ret < 0) > die("%s\n", err->buf); > if (ret) { > /* ret > 0 means when the object corresponding to oid > * cannot be found in format_ref_array_item(), we only print > * the error message. > */ > printf("%s\n", err->buf); > fflush(stdout); > } else { > strbuf_addch(scratch, '\n'); > batch_write(opt, scratch->buf, scratch->len); > } > ``` > > Thanks, > Hariom. Thanks, ZheNing Hu

gitgitgadget · 2021-06-26T18:11:44Z

On the Git mailing list, Hariom verma wrote (reply to this):

On Fri, Jun 25, 2021 at 9:32 PM ZheNing Hu via GitGitGadget <gitgitgadget@gmail.com> wrote: > > From: ZheNing Hu <adlternative@gmail.com> > > static void batch_object_write(const char *obj_name, > struct strbuf *scratch, > struct batch_options *opt, > struct expand_data *data) > { > - if (!data->skip_object_info && > - oid_object_info_extended(the_repository, &data->oid, &data->info, > - OBJECT_INFO_LOOKUP_REPLACE) < 0) { > - printf("%s missing\n", > - obj_name ? obj_name : oid_to_hex(&data->oid)); > - fflush(stdout); > - return; > - } > + int ret = 0; No need to initialize `ret` with 0. Later we are going to assign it with the return value of `format_ref_array_item()` anyway. > + struct strbuf err = STRBUF_INIT; > + struct ref_array_item item = { data->oid, data->rest }; > > strbuf_reset(scratch); > - strbuf_expand(scratch, opt->format, expand_format, data); > - strbuf_addch(scratch, '\n'); > - batch_write(opt, scratch->buf, scratch->len); > > - if (opt->print_contents) { > - print_object_or_die(opt, data); > - batch_write(opt, "\n", 1); > + ret = format_ref_array_item(&item, &opt->format, scratch, &err); Here. -- Hariom

On the Git mailing list, ZheNing Hu wrote (reply to this):

Hariom verma <hariom18599@gmail.com> 于2021年6月27日周日上午2:08写道： > > On Fri, Jun 25, 2021 at 9:32 PM ZheNing Hu via GitGitGadget > <gitgitgadget@gmail.com> wrote: > > > > From: ZheNing Hu <adlternative@gmail.com> > > > > static void batch_object_write(const char *obj_name, > > struct strbuf *scratch, > > struct batch_options *opt, > > struct expand_data *data) > > { > > - if (!data->skip_object_info && > > - oid_object_info_extended(the_repository, &data->oid, &data->info, > > - OBJECT_INFO_LOOKUP_REPLACE) < 0) { > > - printf("%s missing\n", > > - obj_name ? obj_name : oid_to_hex(&data->oid)); > > - fflush(stdout); > > - return; > > - } > > + int ret = 0; > > No need to initialize `ret` with 0. Later we are going to assign it > with the return value of `format_ref_array_item()` anyway. > I agree. It is worth noting that there are similar `int ret = 0` in ref-filter.c, they should be changed too. > > + struct strbuf err = STRBUF_INIT; > > + struct ref_array_item item = { data->oid, data->rest }; > > > > strbuf_reset(scratch); > > - strbuf_expand(scratch, opt->format, expand_format, data); > > - strbuf_addch(scratch, '\n'); > > - batch_write(opt, scratch->buf, scratch->len); > > > > - if (opt->print_contents) { > > - print_object_or_die(opt, data); > > - batch_write(opt, "\n", 1); > > + ret = format_ref_array_item(&item, &opt->format, scratch, &err); > > Here. > > -- > Hariom -- ZheNing Hu

gitgitgadget · 2021-06-28T07:49:26Z

On the Git mailing list, Hariom verma wrote (reply to this):

Hi, On Sun, Jun 27, 2021 at 6:06 PM ZheNing Hu via GitGitGadget <gitgitgadget@gmail.com> wrote: > > From: ZheNing Hu <adlternative@gmail.com> > > In order to let cat-file use ref-filter logic, let's do the > following: > > 1. Change the type of member `format` in struct `batch_options` > to `ref_format`, we will pass it to ref-filter later. > 2. Let `batch_objects()` add atoms to format, and use > `verify_ref_format()` to check atoms. > 3. Use `format_ref_array_item()` in `batch_object_write()` to > get the formatted data corresponding to the object. If the > return value of `format_ref_array_item()` is equals to zero, > use `batch_write()` to print object data; else if the return > value is less than zero, use `die()` to print the error message > and exit; else if return value is greater than zero, only print > the error message, but don't exit. > 4. Use free_ref_array_item_value() to free ref_array_item's > value. > > Most of the atoms in `for-each-ref --format` are now supported, > such as `%(tree)`, `%(parent)`, `%(author)`, `%(tagger)`, `%(if)`, > `%(then)`, `%(else)`, `%(end)`. But these atoms will be rejected: > `%(refname)`, `%(symref)`, `%(upstream)`, `%(push)`, `%(worktreepath)`, > `%(flag)`, `%(HEAD)`, because our objects don't have a refname. It's not clear why some atoms are rejected? Are we going to support them in later commits? (or sometime in the future) OR We are never going to support them. Because they make no sense to cat-file? (or whatever the reason) Whatever is the reason, I think it's a good idea to include it in the commit message. -- Hariom

On the Git mailing list, ZheNing Hu wrote (reply to this):

Hi, Hariom verma <hariom18599@gmail.com> 于2021年6月28日周一下午3:47写道： > > Hi, > > > > > Most of the atoms in `for-each-ref --format` are now supported, > > such as `%(tree)`, `%(parent)`, `%(author)`, `%(tagger)`, `%(if)`, > > `%(then)`, `%(else)`, `%(end)`. But these atoms will be rejected: > > `%(refname)`, `%(symref)`, `%(upstream)`, `%(push)`, `%(worktreepath)`, > > `%(flag)`, `%(HEAD)`, because our objects don't have a refname. > > It's not clear why some atoms are rejected? > > Are we going to support them in later commits? (or sometime in the future) > OR > We are never going to support them. Because they make no sense to > cat-file? (or whatever the reason) > Because in "git for-each-ref"'s "family", ref_array_item is generated by filter_refs(), which uses ref_filter_handler() to fill ref_array_item with ref's data. In "git cat-file", we care about the object, not the ref. Therefore, ref_array_item is only filled with {oid, rest} in batch_object_write() in cat-file.c. We cannot represent some specific ref-related data in "git cat-file", so we cannot have some atoms in ref_filter. Yes, we probably won't support them in the future. From an object-oriented point of view, the atom supported by "cat-file" should be a parent class, "for-each-ref"',"branch","tag"... they have more specific object details (ref), their supported atom should be a derived class, they can support more atoms. > Whatever is the reason, I think it's a good idea to include it in the > commit message. > Yeah. The sentence "because our objects don't have a refname." may not correctly express the reason for rejecting these atoms. I will add more descriptions. > -- > Hariom Thanks. -- ZheNing Hu

gitgitgadget · 2021-06-17T07:20:24Z

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):

On Sat, Jun 12 2021, ZheNing Hu via GitGitGadget wrote: > From: ZheNing Hu <adlternative@gmail.com> > > Add new formatting option `%(raw)`, which will print the raw > object data without any changes. It will help further to migrate > all cat-file formatting logic from cat-file to ref-filter. Nice goal and feature to have. > The raw data of blob, tree objects may contain '\0', but most of > the logic in `ref-filter` depends on the output of the atom being > text (specifically, no embedded NULs in it). > > E.g. `quote_formatting()` use `strbuf_addstr()` or `*._quote_buf()` > add the data to the buffer. The raw data of a tree object is > `100644 one\0...`, only the `100644 one` will be added to the buffer, > which is incorrect. > > Therefore, add a new member in `struct atom_value`: `s_size`, which > can record raw object size, it can help us add raw object data to > the buffer or compare two buffers which contain raw object data. Most of the functions that deal with this already use a strbuf in some way, before we had a const char *, now there's a size_t to go along with it, why not simply use a strbuf in the struct for the data? You'll then get the size and \0 handling for free, and any functions to deal with conversion can stick to the strbuf API, there seems to be a lot of back and forth now. > Beyond, `--format=%(raw)` cannot be used with `--python`, `--shell`, > `--tcl`, `--perl` because if our binary raw data is passed to a variable > in the host language, the host language may not support arbitrary binary > data in the variables of its string type. Perl at least deals with that just fine, and to the extent that it doesn't any new problems here would have nothing to do with \0 being in the data. Perl doesn't have a notion of "binary has \0 in it", it always supports \0, it has a notion of "is it utf-8 or not?", so any encoding problems wouldn't be new. I'd think that the same would be true of Python, but I'm not sure. > +test_expect_success 'basic atom: refs/tags/testtag *raw' ' > + git cat-file commit refs/tags/testtag^{} >expected && > + git for-each-ref --format="%(*raw)" refs/tags/testtag >actual && > + sanitize_pgp <expected >expected.clean && > + sanitize_pgp <actual >actual.clean && > + echo "" >>expected.clean && Just "echo" will do, ditto for the rest. Also odd to go back and forth between populating expected.clean & actual.clean. > +test_expect_success 'set up refs pointing to binary blob' ' > + printf "a\0b\0c" >blob1 && > + printf "a\0c\0b" >blob2 && > + printf "\0a\0b\0c" >blob3 && > + printf "abc" >blob4 && > + printf "\0 \0 \0 " >blob5 && > + printf "\0 \0a\0 " >blob6 && > + printf " " >blob7 && > + >blob8 && > + git hash-object blob1 -w | xargs git update-ref refs/myblobs/blob1 && > + git hash-object blob2 -w | xargs git update-ref refs/myblobs/blob2 && > + git hash-object blob3 -w | xargs git update-ref refs/myblobs/blob3 && > + git hash-object blob4 -w | xargs git update-ref refs/myblobs/blob4 && > + git hash-object blob5 -w | xargs git update-ref refs/myblobs/blob5 && > + git hash-object blob6 -w | xargs git update-ref refs/myblobs/blob6 && > + git hash-object blob7 -w | xargs git update-ref refs/myblobs/blob7 && > + git hash-object blob8 -w | xargs git update-ref refs/myblobs/blob8 Hrm, xargs just to avoid: git update-ref ... $(git hash-object) ? > +test_expect_success '%(raw) with --python must failed' ' > + test_must_fail git for-each-ref --format="%(raw)" --python > +' > + > +test_expect_success '%(raw) with --tcl must failed' ' > + test_must_fail git for-each-ref --format="%(raw)" --tcl > +' > + > +test_expect_success '%(raw) with --perl must failed' ' > + test_must_fail git for-each-ref --format="%(raw)" --perl > +' > + > +test_expect_success '%(raw) with --shell must failed' ' > + test_must_fail git for-each-ref --format="%(raw)" --shell > +' > + > +test_expect_success '%(raw) with --shell and --sort=raw must failed' ' > + test_must_fail git for-each-ref --format="%(raw)" --sort=raw --shell > +' s/must failed/must fail/, but see question above about encoding in these languages...

On the Git mailing list, Junio C Hamano wrote (reply to this):

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: >> Beyond, `--format=%(raw)` cannot be used with `--python`, `--shell`, >> `--tcl`, `--perl` because if our binary raw data is passed to a variable >> in the host language, the host language may not support arbitrary binary >> data in the variables of its string type. > > Perl at least deals with that just fine, and to the extent that it > doesn't any new problems here would have nothing to do with \0 being in > the data. Perl doesn't have a notion of "binary has \0 in it", it always > supports \0, it has a notion of "is it utf-8 or not?", so any encoding > problems wouldn't be new. I'd think that the same would be true of > Python, but I'm not sure. During an earlier iteration long time ago, as we knew Perl is capable of handling sequence of bytes including NUL, it was decided to start more strict by rejecting binary for all languages, which can later be loosened, to limit the scope of the initial implementation. >> + git hash-object blob1 -w | xargs git update-ref refs/myblobs/blob1 && >> + git hash-object blob2 -w | xargs git update-ref refs/myblobs/blob2 && >> + git hash-object blob3 -w | xargs git update-ref refs/myblobs/blob3 && >> + git hash-object blob4 -w | xargs git update-ref refs/myblobs/blob4 && >> + git hash-object blob5 -w | xargs git update-ref refs/myblobs/blob5 && >> + git hash-object blob6 -w | xargs git update-ref refs/myblobs/blob6 && >> + git hash-object blob7 -w | xargs git update-ref refs/myblobs/blob7 && >> + git hash-object blob8 -w | xargs git update-ref refs/myblobs/blob8 > > Hrm, xargs just to avoid: > > git update-ref ... $(git hash-object) ? That's horrible. Thanks for noticing. We'd want to catch segfaults from both hash-object and update-ref. One way to do so may be O=$(git hash-object -w blob1) && git update-ref refs/myblobs/blob1 "$O" Thanks for a review.

On the Git mailing list, ZheNing Hu wrote (reply to this):

Ævar Arnfjörð Bjarmason <avarab@gmail.com> 于2021年6月17日周四下午3:16写道： > > > The raw data of blob, tree objects may contain '\0', but most of > > the logic in `ref-filter` depends on the output of the atom being > > text (specifically, no embedded NULs in it). > > > > E.g. `quote_formatting()` use `strbuf_addstr()` or `*._quote_buf()` > > add the data to the buffer. The raw data of a tree object is > > `100644 one\0...`, only the `100644 one` will be added to the buffer, > > which is incorrect. > > > > Therefore, add a new member in `struct atom_value`: `s_size`, which > > can record raw object size, it can help us add raw object data to > > the buffer or compare two buffers which contain raw object data. > > Most of the functions that deal with this already use a strbuf in some > way, before we had a const char *, now there's a size_t to go along with > it, why not simply use a strbuf in the struct for the data? You'll then > get the size and \0 handling for free, and any functions to deal with > conversion can stick to the strbuf API, there seems to be a lot of back > and forth now. > Yes, strbuf is a suitable choice when using <str,len> pair. But if replace v->s with strbuf, the possible changes will be larger. > > Beyond, `--format=%(raw)` cannot be used with `--python`, `--shell`, > > `--tcl`, `--perl` because if our binary raw data is passed to a variable > > in the host language, the host language may not support arbitrary binary > > data in the variables of its string type. > > Perl at least deals with that just fine, and to the extent that it > doesn't any new problems here would have nothing to do with \0 being in > the data. Perl doesn't have a notion of "binary has \0 in it", it always > supports \0, it has a notion of "is it utf-8 or not?", so any encoding > problems wouldn't be new. I'd think that the same would be true of > Python, but I'm not sure. > Not python safe. See [1]. Regarding the perl language, I support Junio's point of view: it can be re-supported in the future. > > > +test_expect_success 'basic atom: refs/tags/testtag *raw' ' > > + git cat-file commit refs/tags/testtag^{} >expected && > > + git for-each-ref --format="%(*raw)" refs/tags/testtag >actual && > > + sanitize_pgp <expected >expected.clean && > > + sanitize_pgp <actual >actual.clean && > > + echo "" >>expected.clean && > > Just "echo" will do, ditto for the rest. Also odd to go back and forth > between populating expected.clean & actual.clean. > Are you saying that sanitize_pgp is not needed? > > > +test_expect_success 'set up refs pointing to binary blob' ' > > + printf "a\0b\0c" >blob1 && > > + printf "a\0c\0b" >blob2 && > > + printf "\0a\0b\0c" >blob3 && > > + printf "abc" >blob4 && > > + printf "\0 \0 \0 " >blob5 && > > + printf "\0 \0a\0 " >blob6 && > > + printf " " >blob7 && > > + >blob8 && > > + git hash-object blob1 -w | xargs git update-ref refs/myblobs/blob1 && > > + git hash-object blob2 -w | xargs git update-ref refs/myblobs/blob2 && > > + git hash-object blob3 -w | xargs git update-ref refs/myblobs/blob3 && > > + git hash-object blob4 -w | xargs git update-ref refs/myblobs/blob4 && > > + git hash-object blob5 -w | xargs git update-ref refs/myblobs/blob5 && > > + git hash-object blob6 -w | xargs git update-ref refs/myblobs/blob6 && > > + git hash-object blob7 -w | xargs git update-ref refs/myblobs/blob7 && > > + git hash-object blob8 -w | xargs git update-ref refs/myblobs/blob8 > > Hrm, xargs just to avoid: > > git update-ref ... $(git hash-object) ? > I didn’t think about it, just for convenience. > > +test_expect_success '%(raw) with --python must failed' ' > > + test_must_fail git for-each-ref --format="%(raw)" --python > > +' > > + > > +test_expect_success '%(raw) with --tcl must failed' ' > > + test_must_fail git for-each-ref --format="%(raw)" --tcl > > +' > > + > > +test_expect_success '%(raw) with --perl must failed' ' > > + test_must_fail git for-each-ref --format="%(raw)" --perl > > +' > > + > > +test_expect_success '%(raw) with --shell must failed' ' > > + test_must_fail git for-each-ref --format="%(raw)" --shell > > +' > > + > > +test_expect_success '%(raw) with --shell and --sort=raw must failed' ' > > + test_must_fail git for-each-ref --format="%(raw)" --sort=raw --shell > > +' > > s/must failed/must fail/, but see question above about encoding in these > languages... [1]: https://lore.kernel.org/git/CAOLTT8QR_GRm4TYk0E_eazQ+unVQODc-3L+b4V5JUN5jtZR8uA@mail.gmail.com/ Thanks for a review. -- ZheNing Hu

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):

On Thu, Jun 17 2021, ZheNing Hu wrote: > Ævar Arnfjörð Bjarmason <avarab@gmail.com> 于2021年6月17日周四下午3:16写道： >> >> > The raw data of blob, tree objects may contain '\0', but most of >> > the logic in `ref-filter` depends on the output of the atom being >> > text (specifically, no embedded NULs in it). >> > >> > E.g. `quote_formatting()` use `strbuf_addstr()` or `*._quote_buf()` >> > add the data to the buffer. The raw data of a tree object is >> > `100644 one\0...`, only the `100644 one` will be added to the buffer, >> > which is incorrect. >> > >> > Therefore, add a new member in `struct atom_value`: `s_size`, which >> > can record raw object size, it can help us add raw object data to >> > the buffer or compare two buffers which contain raw object data. >> >> Most of the functions that deal with this already use a strbuf in some >> way, before we had a const char *, now there's a size_t to go along with >> it, why not simply use a strbuf in the struct for the data? You'll then >> get the size and \0 handling for free, and any functions to deal with >> conversion can stick to the strbuf API, there seems to be a lot of back >> and forth now. >> > > Yes, strbuf is a suitable choice when using <str,len> pair. > But if replace v->s with strbuf, the possible changes will be larger. I for one would like to see it done that way, those changes are usually easy to read. Also it seems a large part of 2/8 is extra new code because we didn't do that, e.g. getting length differently if something is a strbuf or not, passing char*/size_t pairs to new functions etc. >> > Beyond, `--format=%(raw)` cannot be used with `--python`, `--shell`, >> > `--tcl`, `--perl` because if our binary raw data is passed to a variable >> > in the host language, the host language may not support arbitrary binary >> > data in the variables of its string type. >> >> Perl at least deals with that just fine, and to the extent that it >> doesn't any new problems here would have nothing to do with \0 being in >> the data. Perl doesn't have a notion of "binary has \0 in it", it always >> supports \0, it has a notion of "is it utf-8 or not?", so any encoding >> problems wouldn't be new. I'd think that the same would be true of >> Python, but I'm not sure. >> > > Not python safe. See [1]. > Regarding the perl language, I support Junio's point of view: it can be > re-supported in the future. Ah, I'd missed that. Anyway, if it's easy it seems you discovered that Perl deals with it correctly, so we could just have it support this. >> >> > +test_expect_success 'basic atom: refs/tags/testtag *raw' ' >> > + git cat-file commit refs/tags/testtag^{} >expected && >> > + git for-each-ref --format="%(*raw)" refs/tags/testtag >actual && >> > + sanitize_pgp <expected >expected.clean && >> > + sanitize_pgp <actual >actual.clean && >> > + echo "" >>expected.clean && >> >> Just "echo" will do, ditto for the rest. Also odd to go back and forth >> between populating expected.clean & actual.clean. >> > > Are you saying that sanitize_pgp is not needed? No that instead of: echo "" >x You can do: echo >x And also that going back and forth between populating different files is confusing, i.e. this: echo a >x echo c >y echo b >>x is better as: echo a >x echo b >>x echo c >y >> >> > +test_expect_success 'set up refs pointing to binary blob' ' >> > + printf "a\0b\0c" >blob1 && >> > + printf "a\0c\0b" >blob2 && >> > + printf "\0a\0b\0c" >blob3 && >> > + printf "abc" >blob4 && >> > + printf "\0 \0 \0 " >blob5 && >> > + printf "\0 \0a\0 " >blob6 && >> > + printf " " >blob7 && >> > + >blob8 && >> > + git hash-object blob1 -w | xargs git update-ref refs/myblobs/blob1 && >> > + git hash-object blob2 -w | xargs git update-ref refs/myblobs/blob2 && >> > + git hash-object blob3 -w | xargs git update-ref refs/myblobs/blob3 && >> > + git hash-object blob4 -w | xargs git update-ref refs/myblobs/blob4 && >> > + git hash-object blob5 -w | xargs git update-ref refs/myblobs/blob5 && >> > + git hash-object blob6 -w | xargs git update-ref refs/myblobs/blob6 && >> > + git hash-object blob7 -w | xargs git update-ref refs/myblobs/blob7 && >> > + git hash-object blob8 -w | xargs git update-ref refs/myblobs/blob8 >> >> Hrm, xargs just to avoid: >> >> git update-ref ... $(git hash-object) ? >> > > I didn’t think about it, just for convenience. *nod*, Junio had a good suggestion. >> > +test_expect_success '%(raw) with --python must failed' ' >> > + test_must_fail git for-each-ref --format="%(raw)" --python >> > +' >> > + >> > +test_expect_success '%(raw) with --tcl must failed' ' >> > + test_must_fail git for-each-ref --format="%(raw)" --tcl >> > +' >> > + >> > +test_expect_success '%(raw) with --perl must failed' ' >> > + test_must_fail git for-each-ref --format="%(raw)" --perl >> > +' >> > + >> > +test_expect_success '%(raw) with --shell must failed' ' >> > + test_must_fail git for-each-ref --format="%(raw)" --shell >> > +' >> > + >> > +test_expect_success '%(raw) with --shell and --sort=raw must failed' ' >> > + test_must_fail git for-each-ref --format="%(raw)" --sort=raw --shell >> > +' >> >> s/must failed/must fail/, but see question above about encoding in these >> languages... > > > [1]: https://lore.kernel.org/git/CAOLTT8QR_GRm4TYk0E_eazQ+unVQODc-3L+b4V5JUN5jtZR8uA@mail.gmail.com/ > > Thanks for a review.

On the Git mailing list, ZheNing Hu wrote (reply to this):

Ævar Arnfjörð Bjarmason <avarab@gmail.com> 于2021年6月17日周四下午10:45写道： > > > On Thu, Jun 17 2021, ZheNing Hu wrote: > > > > > Yes, strbuf is a suitable choice when using <str,len> pair. > > But if replace v->s with strbuf, the possible changes will be larger. > > I for one would like to see it done that way, those changes are usually > easy to read. Also it seems a large part of 2/8 is extra new code > because we didn't do that, e.g. getting length differently if something > is a strbuf or not, passing char*/size_t pairs to new functions etc. > After some refactoring, I found that there are two problems: 1. There are a lot of codes like this in ref-filter to fill v->s: v->s = show_ref(...) v->s = copy_email(...) It is very difficult to modify here: We know that show_ref() or copy_email() will allocate a block of memory to v->s, but if v->s is a strbuf, what should we do? In copy_email(), we can pass the v->s to copy_email() and use strbuf_add()/strbuf_addstr() instead of xstrdup() and xmemdupz(). But show_ref() will call external functions like shorten_unambiguous_ref(), we don’t know whether it will return us NULL or a dynamically allocated memory. If continue to pass v->s to the inner function, it is not a feasible method. Or we can use strbuf_attach() + strlen(), I'm not sure this is a good method. 2. See: - for (i = 0; i < used_atom_cnt; i++) { + for (i = 0; i < used_atom_cnt; i++) { struct atom_value *v = &ref->value[i]; - if (v->s == NULL && used_atom[i].source == SOURCE_NONE) + if (v->s.len == 0 && used_atom[i].source == SOURCE_NONE) return strbuf_addf_ret(err, -1, _("missing object %s for %s"), oid_to_hex(&ref->objectname), ref->refname); } In the case of using strbuf, I don’t know how to distinguish between an empty strbuf and NULL. It can be easily distinguished by using c-style "const char*". > > > > Not python safe. See [1]. > > Regarding the perl language, I support Junio's point of view: it can be > > re-supported in the future. > > Ah, I'd missed that. Anyway, if it's easy it seems you discovered that > Perl deals with it correctly, so we could just have it support this. > Well, it's ok, support for perl will be put in a separate commit. > >> > >> > +test_expect_success 'basic atom: refs/tags/testtag *raw' ' > >> > + git cat-file commit refs/tags/testtag^{} >expected && > >> > + git for-each-ref --format="%(*raw)" refs/tags/testtag >actual && > >> > + sanitize_pgp <expected >expected.clean && > >> > + sanitize_pgp <actual >actual.clean && > >> > + echo "" >>expected.clean && > >> > >> Just "echo" will do, ditto for the rest. Also odd to go back and forth > >> between populating expected.clean & actual.clean. > >> > > > > Are you saying that sanitize_pgp is not needed? > > No that instead of: > > echo "" >x > > You can do: > > echo >x > > And also that going back and forth between populating different files is > confusing, i.e. this: > > > echo a >x > echo c >y > echo b >>x > > is better as: > > echo a >x > echo b >>x > echo c >y > > Thanks, I get what you meant now. > >> > >> > +test_expect_success 'set up refs pointing to binary blob' ' > >> > + printf "a\0b\0c" >blob1 && > >> > + printf "a\0c\0b" >blob2 && > >> > + printf "\0a\0b\0c" >blob3 && > >> > + printf "abc" >blob4 && > >> > + printf "\0 \0 \0 " >blob5 && > >> > + printf "\0 \0a\0 " >blob6 && > >> > + printf " " >blob7 && > >> > + >blob8 && > >> > + git hash-object blob1 -w | xargs git update-ref refs/myblobs/blob1 && > >> > + git hash-object blob2 -w | xargs git update-ref refs/myblobs/blob2 && > >> > + git hash-object blob3 -w | xargs git update-ref refs/myblobs/blob3 && > >> > + git hash-object blob4 -w | xargs git update-ref refs/myblobs/blob4 && > >> > + git hash-object blob5 -w | xargs git update-ref refs/myblobs/blob5 && > >> > + git hash-object blob6 -w | xargs git update-ref refs/myblobs/blob6 && > >> > + git hash-object blob7 -w | xargs git update-ref refs/myblobs/blob7 && > >> > + git hash-object blob8 -w | xargs git update-ref refs/myblobs/blob8 > >> > >> Hrm, xargs just to avoid: > >> > >> git update-ref ... $(git hash-object) ? > >> > > > > I didn’t think about it, just for convenience. > > *nod*, Junio had a good suggestion. > ok. Thanks. -- ZheNing Hu

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):

On Fri, Jun 18 2021, ZheNing Hu wrote: > Ævar Arnfjörð Bjarmason <avarab@gmail.com> 于2021年6月17日周四下午10:45写道： >> >> >> On Thu, Jun 17 2021, ZheNing Hu wrote: >> >> > >> > Yes, strbuf is a suitable choice when using <str,len> pair. >> > But if replace v->s with strbuf, the possible changes will be larger. >> >> I for one would like to see it done that way, those changes are usually >> easy to read. Also it seems a large part of 2/8 is extra new code >> because we didn't do that, e.g. getting length differently if something >> is a strbuf or not, passing char*/size_t pairs to new functions etc. >> > > After some refactoring, I found that there are two problems: > 1. There are a lot of codes like this in ref-filter to fill v->s: > > v->s = show_ref(...) > v->s = copy_email(...) > > It is very difficult to modify here: We know that show_ref() > or copy_email() will allocate a block of memory to v->s, but > if v->s is a strbuf, what should we do? In copy_email(), we > can pass the v->s to copy_email() and use strbuf_add()/strbuf_addstr() > instead of xstrdup() and xmemdupz(). But show_ref() will call > external functions like shorten_unambiguous_ref(), we don’t know > whether it will return us NULL or a dynamically allocated memory. > If continue to pass v->s to the inner function, it is not a feasible > method. Or we can use strbuf_attach() + strlen(), I'm not sure > this is a good method. > > 2. See: > > - for (i = 0; i < used_atom_cnt; i++) { > + for (i = 0; i < used_atom_cnt; i++) { > struct atom_value *v = &ref->value[i]; > - if (v->s == NULL && used_atom[i].source == SOURCE_NONE) > + if (v->s.len == 0 && used_atom[i].source == SOURCE_NONE) > return strbuf_addf_ret(err, -1, _("missing > object %s for %s"), > > oid_to_hex(&ref->objectname), ref->refname); > } > > In the case of using strbuf, I don’t know how to distinguish between an empty > strbuf and NULL. It can be easily distinguished by using c-style "const char*". Yes, sometimes it's just too much of a hassle, looking at shorten_unambiguous_ref() which returns a xstrdup()'d value that could indeed be strbuf_attach'd. I haven't tried the conversion myself, perhaps it's too much hassle. Just a suggestion from reading your patch in isolation. >> > >> > Not python safe. See [1]. >> > Regarding the perl language, I support Junio's point of view: it can be >> > re-supported in the future. >> >> Ah, I'd missed that. Anyway, if it's easy it seems you discovered that >> Perl deals with it correctly, so we could just have it support this. >> > > Well, it's ok, support for perl will be put in a separate commit. > >> >> >> >> > +test_expect_success 'basic atom: refs/tags/testtag *raw' ' >> >> > + git cat-file commit refs/tags/testtag^{} >expected && >> >> > + git for-each-ref --format="%(*raw)" refs/tags/testtag >actual && >> >> > + sanitize_pgp <expected >expected.clean && >> >> > + sanitize_pgp <actual >actual.clean && >> >> > + echo "" >>expected.clean && >> >> >> >> Just "echo" will do, ditto for the rest. Also odd to go back and forth >> >> between populating expected.clean & actual.clean. >> >> >> > >> > Are you saying that sanitize_pgp is not needed? >> >> No that instead of: >> >> echo "" >x >> >> You can do: >> >> echo >x >> >> And also that going back and forth between populating different files is >> confusing, i.e. this: >> >> >> echo a >x >> echo c >y >> echo b >>x >> >> is better as: >> >> echo a >x >> echo b >>x >> echo c >y >> >> > > Thanks, I get what you meant now. > >> >> >> >> > +test_expect_success 'set up refs pointing to binary blob' ' >> >> > + printf "a\0b\0c" >blob1 && >> >> > + printf "a\0c\0b" >blob2 && >> >> > + printf "\0a\0b\0c" >blob3 && >> >> > + printf "abc" >blob4 && >> >> > + printf "\0 \0 \0 " >blob5 && >> >> > + printf "\0 \0a\0 " >blob6 && >> >> > + printf " " >blob7 && >> >> > + >blob8 && >> >> > + git hash-object blob1 -w | xargs git update-ref refs/myblobs/blob1 && >> >> > + git hash-object blob2 -w | xargs git update-ref refs/myblobs/blob2 && >> >> > + git hash-object blob3 -w | xargs git update-ref refs/myblobs/blob3 && >> >> > + git hash-object blob4 -w | xargs git update-ref refs/myblobs/blob4 && >> >> > + git hash-object blob5 -w | xargs git update-ref refs/myblobs/blob5 && >> >> > + git hash-object blob6 -w | xargs git update-ref refs/myblobs/blob6 && >> >> > + git hash-object blob7 -w | xargs git update-ref refs/myblobs/blob7 && >> >> > + git hash-object blob8 -w | xargs git update-ref refs/myblobs/blob8 >> >> >> >> Hrm, xargs just to avoid: >> >> >> >> git update-ref ... $(git hash-object) ? >> >> >> > >> > I didn’t think about it, just for convenience. >> >> *nod*, Junio had a good suggestion. >> > > ok. > > Thanks.

On the Git mailing list, Christian Couder wrote (reply to this):

On Fri, Jun 18, 2021 at 12:51 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > On Fri, Jun 18 2021, ZheNing Hu wrote: > > After some refactoring, I found that there are two problems: > > 1. There are a lot of codes like this in ref-filter to fill v->s: > > > > v->s = show_ref(...) > > v->s = copy_email(...) > > > > It is very difficult to modify here: We know that show_ref() > > or copy_email() will allocate a block of memory to v->s, but > > if v->s is a strbuf, what should we do? In copy_email(), we > > can pass the v->s to copy_email() and use strbuf_add()/strbuf_addstr() > > instead of xstrdup() and xmemdupz(). But show_ref() will call > > external functions like shorten_unambiguous_ref(), we don’t know > > whether it will return us NULL or a dynamically allocated memory. > > If continue to pass v->s to the inner function, it is not a feasible > > method. Or we can use strbuf_attach() + strlen(), I'm not sure > > this is a good method. If you resend this patch, it might be a good idea to add a short version of the above explanations into the commit message. [...] > > In the case of using strbuf, I don’t know how to distinguish between an empty > > strbuf and NULL. It can be easily distinguished by using c-style "const char*". Maybe this could also be part of the explanation. > Yes, sometimes it's just too much of a hassle, looking at > shorten_unambiguous_ref() which returns a xstrdup()'d value that could > indeed be strbuf_attach'd. I haven't tried the conversion myself, > perhaps it's too much hassle. > > Just a suggestion from reading your patch in isolation. Yeah, thanks for the review anyway!

gitgitgadget · 2021-06-24T04:18:17Z

On the Git mailing list, Bagas Sanjaya wrote (reply to this):

On 22/06/21 10.20, ZheNing Hu via GitGitGadget wrote: > Beyond, `--format=%(raw)` cannot be used with `--python`, `--shell`, > `--tcl`, `--perl` because if our binary raw data is passed to a > variable in the host language, the host language may not support > arbitrary binary data in the variables of its string type. Better say: "Note that `--format=%(raw)` cannot be used with `--python`, `--shell`, `--tcl`, and `--perl` because if the binary raw data is passed to a variable in such languages, these may not support arbitrary binary data in their string variable type." -- An old man doll... just what I always wanted! - Clara

On the Git mailing list, ZheNing Hu wrote (reply to this):

Bagas Sanjaya <bagasdotme@gmail.com> 于2021年6月24日周四下午12:14写道： > > On 22/06/21 10.20, ZheNing Hu via GitGitGadget wrote: > > Beyond, `--format=%(raw)` cannot be used with `--python`, `--shell`, > > `--tcl`, `--perl` because if our binary raw data is passed to a > > variable in the host language, the host language may not support > > arbitrary binary data in the variables of its string type. > > Better say: > > "Note that `--format=%(raw)` cannot be used with `--python`, `--shell`, > `--tcl`, and `--perl` because if the binary raw data is passed to a > variable in such languages, these may not support arbitrary binary data > in their string variable type." > Thanks, Bagas Sanjaya, I will change all of them. > -- > An old man doll... just what I always wanted! - Clara -- ZheNing Hu

gitgitgadget · 2021-06-26T00:44:51Z

On the Git mailing list, Bagas Sanjaya wrote (reply to this):

On 25/06/21 23.02, ZheNing Hu via GitGitGadget wrote: > Note that `--format=%(raw)` cannot be used with `--python`, `--shell`, > `--tcl`, and `--perl` because if the binary raw data is passed to a > variable in such languages, these may not support arbitrary binary data > in their string variable type. > Commit message looks OK, but... > +Note that `--format=%(raw)` can not be used with `--python`, `--shell`, `--tcl`, > +`--perl` because the host language may not support arbitrary binary data in the > +variables of its string type. > + Seems like out of sync between commit message and the docs change above. Did you mean the (unsupported) host languages are Python, BASH script, TCL/TK, and Perl respectively? If so, the docs should say: "Note that `--format=%(raw) can not be used with `--python`, `--shell`, `-tcl`, and `--perl` because such languages may not support arbitrary binary data in their string variable type." Thanks. -- An old man doll... just what I always wanted! - Clara

On the Git mailing list, ZheNing Hu wrote (reply to this):

Bagas Sanjaya <bagasdotme@gmail.com> 于2021年6月26日周六上午8:42写道： > > On 25/06/21 23.02, ZheNing Hu via GitGitGadget wrote: > > Note that `--format=%(raw)` cannot be used with `--python`, `--shell`, > > `--tcl`, and `--perl` because if the binary raw data is passed to a > > variable in such languages, these may not support arbitrary binary data > > in their string variable type. > > > > Commit message looks OK, but... > > > +Note that `--format=%(raw)` can not be used with `--python`, `--shell`, `--tcl`, > > +`--perl` because the host language may not support arbitrary binary data in the > > +variables of its string type. > > + > > Seems like out of sync between commit message and the docs change above. > Did you mean the (unsupported) host languages are Python, BASH script, > TCL/TK, and Perl respectively? If so, the docs should say: > Indeed so. I will change them too. > "Note that `--format=%(raw) can not be used with `--python`, `--shell`, > `-tcl`, and `--perl` because such languages may not support arbitrary > binary data in their string variable type." > > Thanks. > > -- > An old man doll... just what I always wanted! - Clara Thanks. -- ZheNing Hu

gitgitgadget · 2021-06-28T06:52:28Z

On the Git mailing list, Bagas Sanjaya wrote (reply to this):

On 27/06/21 19.35, ZheNing Hu via GitGitGadget wrote: > +Note that `--format=%(raw)` can not be used with `--python`, `--shell`, `--tcl`, > +`--perl` because the such language may not support arbitrary binary data in their > +string variable type. > + s/the such language/such languages/ Thanks. -- An old man doll... just what I always wanted! - Clara

-Original file line number
+Diff line change
@@ Expand Up / @@ -109,6 +109,18 @@ UI, Workflows & Features @@
      * "git subtree" updates.
+     * It is now documented that "format-patch" skips merges.
+     * Options to "git pack-objects" that take numeric values like
+       --window and --depth should not accept negative values; the input
+       validation has been tightened.
+     * The way the command line specified by the trailer.<token>.command
+       configuration variable receives the end-user supplied value was
+       both error prone and misleading.  An alternative to achieve the
+       same goal in a safer and more intuitive way has been added, as
+       the trailer.<token>.cmd configuration variable, to replace it.
     Performance, Internal Implementation, Development Support etc.
@@ Expand Down Expand Up / @@ -319,6 +331,20 @@ Fixes since v2.31 @@
      * "git repack -A -d" in a partial clone unnecessarily loosened
        objects in promisor pack.
+     * "git bisect skip" when custom words are used for new/old did not
+       work, which has been corrected.
+     * A few variants of informational message "Already up-to-date" has
+       been rephrased.
+       (merge ad9322da03 js/merge-already-up-to-date-message-reword later to maint).
+     * "git submodule update --quiet" did not propagate the quiet option
+       down to underlying "git fetch", which has been corrected.
+       (merge 62af4bdd42 nc/submodule-update-quiet later to maint).
+     * Document that our test can use "local" keyword.
+       (merge a84fd3bcc6 jc/test-allows-local later to maint).
      * Other code cleanup, docfix, build fix, etc.
        (merge f451960708 dl/cat-file-doc-cleanup later to maint).
        (merge 12604a8d0c sv/t9801-test-path-is-file-cleanup later to maint).
@@ Expand Down @@

-Original file line number
+Diff line change
@@ Expand Up / @@ -226,6 +226,12 @@ newline. The available atoms are: @@
     	after that first run of whitespace (i.e., the "rest" of the
     	line) are output in place of the `%(rest)` atom.
+    Note that most of the atoms in `for-each-ref --format` are now supported,
+    such as `%(tree)`, `%(parent)`, `%(author)`, `%(tagger)`, `%(if)`,
+    `%(then)`, `%(else)`, `%(end)`. But these atoms will be rejected:
+    `%(refname)`, `%(symref)`, `%(upstream)`, `%(push)`, `%(worktreepath)`,
+    `%(flag)`, `%(HEAD)`. See linkgit:git-for-each-ref[1].
     If no format is specified, the default format is `%(objectname)
     %(objecttype) %(objectsize)`.
@@ Expand Down @@

-Original file line number
+Diff line change
@@ Expand Up @@
     without angle brackets, and `:localpart` to get the part before the `@` symbol
     out of the trimmed email.
+    The raw data in an object is `raw`.
+    raw:size::
+    	The raw data size of the object.
+    Note that `--format=%(raw)` can not be used with `--python`, `--shell`, `--tcl`,
+    because such language may not support arbitrary binary data in their
+    string variable type.
     The message in a commit or a tag object is `contents`, from which
     `contents:<part>` can be used to extract various parts out of:
@@ Expand Down @@

-Original file line number
+Diff line change
@@ Expand Up / @@ -232,25 +232,38 @@ trailer.<token>.ifmissing:: @@
     	that option for trailers with the specified <token>.
     trailer.<token>.command::
-    	This option can be used to specify a shell command that will
-    	be called to automatically add or modify a trailer with the
-    	specified <token>.
+    	This option behaves in the same way as 'trailer.<token>.cmd', except
+    	that it doesn't pass anything as argument to the specified command.
+    	Instead the first occurrence of substring $ARG is replaced by the
+    	value that would be passed as argument.
     +
-    When this option is specified, the behavior is as if a special
-    '<token>=<value>' argument were added at the beginning of the command
-    line, where <value> is taken to be the standard output of the
-    specified command with any leading and trailing whitespace trimmed
-    off.
+    The 'trailer.<token>.command' option has been deprecated in favor of
+    'trailer.<token>.cmd' due to the fact that $ARG in the user's command is
+    only replaced once and that the original way of replacing $ARG is not safe.
     +
-    If the command contains the `$ARG` string, this string will be
-    replaced with the <value> part of an existing trailer with the same
-    <token>, if any, before the command is launched.
+    When both 'trailer.<token>.cmd' and 'trailer.<token>.command' are given
+    for the same <token>, 'trailer.<token>.cmd' is used and
+    'trailer.<token>.command' is ignored.
+    trailer.<token>.cmd::
+    	This option can be used to specify a shell command that will be called:
+    	once to automatically add a trailer with the specified <token>, and then
+    	each time a '--trailer <token>=<value>' argument to modify the <value> of
+    	the trailer that this option would produce.
     +
-    If some '<token>=<value>' arguments are also passed on the command
-    line, when a 'trailer.<token>.command' is configured, the command will
-    also be executed for each of these arguments. And the <value> part of
-    these arguments, if any, will be used to replace the `$ARG` string in
-    the command.
+    When the specified command is first called to add a trailer
+    with the specified <token>, the behavior is as if a special
+    '--trailer <token>=<value>' argument was added at the beginning
+    of the "git interpret-trailers" command, where <value>
+    is taken to be the standard output of the command with any
+    leading and trailing whitespace trimmed off.
+    +
+    If some '--trailer <token>=<value>' arguments are also passed
+    on the command line, the command is called again once for each
+    of these arguments with the same <token>. And the <value> part
+    of these arguments, if any, will be passed to the command as its
+    first argument. This way the command can produce a <value> computed
+    from the <value> passed in the '--trailer <token>=<value>' argument.
     EXAMPLES
     --------
@@ Expand Down Expand Up / @@ -333,6 +346,55 @@ subject @@
     Fix #42
     ------------
+    * Configure a 'help' trailer with a cmd use a script `glog-find-author`
+      which search specified author identity from git log in git repository
+      and show how it works:
+    +
+    ------------
+    $ cat ~/bin/glog-find-author
+    #!/bin/sh
+    test -n "$1" && git log --author="$1" --pretty="%an <%ae>" -1 || true
+    $ git config trailer.help.key "Helped-by: "
+    $ git config trailer.help.ifExists "addIfDifferentNeighbor"
+    $ git config trailer.help.cmd "~/bin/glog-find-author"
+    $ git interpret-trailers --trailer="help:Junio" --trailer="help:Couder" <<EOF
+    > subject
+    >
+    > message
+    >
+    > EOF
+    subject
+    message
+    Helped-by: Junio C Hamano <gitster@pobox.com>
+    Helped-by: Christian Couder <christian.couder@gmail.com>
+    ------------
+    * Configure a 'ref' trailer with a cmd use a script `glog-grep`
+      to grep last relevant commit from git log in the git repository
+      and show how it works:
+    +
+    ------------
+    $ cat ~/bin/glog-grep
+    #!/bin/sh
+    test -n "$1" && git log --grep "$1" --pretty=reference -1 || true
+    $ git config trailer.ref.key "Reference-to: "
+    $ git config trailer.ref.ifExists "replace"
+    $ git config trailer.ref.cmd "~/bin/glog-grep"
+    $ git interpret-trailers --trailer="ref:Add copyright notices." <<EOF
+    > subject
+    >
+    > message
+    >
+    > EOF
+    subject
+    message
+    Reference-to: 8bc9a0c769 (Add copyright notices., 2005-04-07)
+    ------------
     * Configure a 'see' trailer with a command to show the subject of a
       commit that is related, and show how it works:
     +
@@ Expand Down @@

-Original file line number
+Diff line change
@@ Expand Up / @@ -1247,6 +1247,12 @@ to: @@
     [attr]binary -diff -merge -text
     ------------
+    NOTES
+    -----
+    Git does not follow symbolic links when accessing a `.gitattributes`
+    file in the working tree. This keeps behavior consistent when the file
+    is accessed from the index or a tree versus from the filesystem.
     EXAMPLES
     --------
@@ Expand Down @@

-Original file line number
+Diff line change
@@ Expand Up / @@ -98,6 +98,14 @@ submodule.<name>.shallow:: @@
     	shallow clone (with a history depth of 1) unless the user explicitly
     	asks for a non-shallow clone.
+    NOTES
+    -----
+    Git does not allow the `.gitmodules` file within a working tree to be a
+    symbolic link, and will refuse to check out such a tree entry. This
+    keeps behavior consistent when the file is accessed from the index or a
+    tree versus from the filesystem, and helps Git reliably enforce security
+    checks of the file contents.
     EXAMPLES
     --------
@@ Expand Down @@

[GSOC][RFC] cat-file: reuse ref-filter logic #980

Uh oh!

[GSOC][RFC] cat-file: reuse ref-filter logic #980

Uh oh!

Uh oh!

Diff view

Diff view

There are no files selected for viewing

gitgitgadget bot Jun 21, 2021

Choose a reason for hiding this comment

Uh oh!

gitgitgadget bot Jun 21, 2021

Choose a reason for hiding this comment

Uh oh!

gitgitgadget bot Jun 26, 2021

Choose a reason for hiding this comment

Uh oh!

gitgitgadget bot Jun 27, 2021

Choose a reason for hiding this comment

Uh oh!

gitgitgadget bot Jun 26, 2021

Choose a reason for hiding this comment

Uh oh!

gitgitgadget bot Jun 27, 2021

Choose a reason for hiding this comment

Uh oh!

gitgitgadget bot Jun 28, 2021

Choose a reason for hiding this comment

Uh oh!

gitgitgadget bot Jun 28, 2021

Choose a reason for hiding this comment

Uh oh!

gitgitgadget bot Jun 17, 2021

Choose a reason for hiding this comment

Uh oh!

gitgitgadget bot Jun 17, 2021

Choose a reason for hiding this comment

Uh oh!

gitgitgadget bot Jun 17, 2021

Choose a reason for hiding this comment

Uh oh!

gitgitgadget bot Jun 17, 2021

Choose a reason for hiding this comment

Uh oh!

gitgitgadget bot Jun 17, 2021

Choose a reason for hiding this comment

Uh oh!

gitgitgadget bot Jun 18, 2021

Choose a reason for hiding this comment

Uh oh!

gitgitgadget bot Jun 18, 2021

Choose a reason for hiding this comment

Uh oh!

gitgitgadget bot Jun 24, 2021

Choose a reason for hiding this comment

Uh oh!

gitgitgadget bot Jun 24, 2021

Choose a reason for hiding this comment

Uh oh!

gitgitgadget bot Jun 26, 2021

Choose a reason for hiding this comment

Uh oh!

gitgitgadget bot Jun 27, 2021

Choose a reason for hiding this comment

Uh oh!

gitgitgadget bot Jun 28, 2021

Choose a reason for hiding this comment

Uh oh!

Uh oh!