Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
31 commits
Select commit Hold shift + click to select a range
31dd24e
t7415: remove out-dated comment about translation
peff May 1, 2021
ecc13a8
fsck_tree(): fix shadowed variable
peff May 1, 2021
e5f803e
fsck_tree(): wrap some long lines
peff May 1, 2021
903741f
t7415: rename to expand scope
peff May 1, 2021
1eb72ef
t7450: test verify_path() handling of gitmodules
peff May 3, 2021
8f9eaaf
t7450: test .gitmodules symlink matching against obscured names
peff May 3, 2021
ef23c95
t0060: test ntfs/hfs-obscured dotfiles
peff May 3, 2021
2ed21cc
fsck: warn about symlinked dotfiles we'll open with O_NOFOLLOW
peff May 3, 2021
274d4c3
docs: document symlink restrictions for dot-files
peff May 3, 2021
c5ed6cf
docs: correct descript of trailer.<token>.command
adlternative May 3, 2021
fcae595
trailer: add new .cmd config option
adlternative May 3, 2021
4713029
The sixteenth batch
gitster May 11, 2021
423767e
ref-filter: add objectsize to used_atom
adlternative May 13, 2021
c60e72e
ref-filter: introduce enum atom_type
adlternative May 13, 2021
6be0a72
cat-file: handle trivial --batch format with --batch-all-objects
adlternative Jun 3, 2021
3f886be
cat-file: merge two block into one
adlternative Jun 3, 2021
fd12fe8
[GSOC] ref-filter: add obj-type check in grab contents
adlternative Jun 5, 2021
d703156
[GSOC] ref-filter: add %(raw) atom
adlternative Jun 9, 2021
a091df8
[GSOC] ref-filter: --format=%(raw) re-support --perl
adlternative Jun 19, 2021
1729a5f
[GSOC] ref-filter: use non-const ref_format in *_atom_parser()
adlternative Jun 9, 2021
55f8b27
[GSOC] ref-filter: add %(rest) atom
adlternative Jun 9, 2021
f02e7c3
[GSOC] ref-filter: pass get_object() return value to their callers
adlternative Jun 12, 2021
d3a986d
[GSOC] ref-filter: introduce free_ref_array_item_value() function
adlternative Jun 15, 2021
4ae142a
[GSOC] ref-filter: add cat_file_mode in struct ref_format
adlternative Jun 21, 2021
1bb0754
[GSOC] ref-filter: modify the error message and value in get_object
adlternative Jun 21, 2021
1a668d8
[GSOC] cat-file: add has_object_file() check
adlternative Jun 21, 2021
8f6239d
[GSOC] cat-file: change batch_objects parameter name
adlternative Jun 22, 2021
8c2ebaf
[GSOC] cat-file: reuse ref-filter logic
adlternative Jun 10, 2021
13e9468
[GSOC] cat-file: reuse err buf in batch_object_write()
adlternative Jun 12, 2021
60bf953
[GSOC] cat-file: re-implement --textconv, --filters options
adlternative Jun 10, 2021
c3b60b9
[GSOC] ref-filter: remove grab_oid() function
adlternative Jun 25, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
26 changes: 26 additions & 0 deletions Documentation/RelNotes/2.32.0.txt
Original file line number Diff line number Diff line change
Expand Up @@ -109,6 +109,18 @@ UI, Workflows & Features

* "git subtree" updates.

* It is now documented that "format-patch" skips merges.

* Options to "git pack-objects" that take numeric values like
--window and --depth should not accept negative values; the input
validation has been tightened.

* The way the command line specified by the trailer.<token>.command
configuration variable receives the end-user supplied value was
both error prone and misleading. An alternative to achieve the
same goal in a safer and more intuitive way has been added, as
the trailer.<token>.cmd configuration variable, to replace it.


Performance, Internal Implementation, Development Support etc.

Expand Down Expand Up @@ -319,6 +331,20 @@ Fixes since v2.31
* "git repack -A -d" in a partial clone unnecessarily loosened
objects in promisor pack.

* "git bisect skip" when custom words are used for new/old did not
work, which has been corrected.

* A few variants of informational message "Already up-to-date" has
been rephrased.
(merge ad9322da03 js/merge-already-up-to-date-message-reword later to maint).

* "git submodule update --quiet" did not propagate the quiet option
down to underlying "git fetch", which has been corrected.
(merge 62af4bdd42 nc/submodule-update-quiet later to maint).

* Document that our test can use "local" keyword.
(merge a84fd3bcc6 jc/test-allows-local later to maint).

* Other code cleanup, docfix, build fix, etc.
(merge f451960708 dl/cat-file-doc-cleanup later to maint).
(merge 12604a8d0c sv/t9801-test-path-is-file-cleanup later to maint).
Expand Down
6 changes: 6 additions & 0 deletions Documentation/git-cat-file.txt
Original file line number Diff line number Diff line change
Expand Up @@ -226,6 +226,12 @@ newline. The available atoms are:
after that first run of whitespace (i.e., the "rest" of the
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Christian Couder wrote (reply to this):

On Sat, Jun 19, 2021 at 9:03 AM ZheNing Hu via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: ZheNing Hu <adlternative@gmail.com>
>
> In order to let cat-file use ref-filter logic, the following
> methods are used:

Maybe: s/the following methods are used/let's do the following/

> 1. Add `cat_file_mode` member in struct `ref_format`, this can
> help us reject atoms in verify_ref_format() which cat-file
> cannot use, e.g. `%(refname)`, `%(push)`, `%(upstream)`...
> 2. Change the type of member `format` in struct `batch_options`
> to `ref_format`, We can add format data in it.

Not sure what "We can add format data in it." means.

> 3. Let `batch_objects()` add atoms to format, and use
> `verify_ref_format()` to check atoms.
> 4. Use `has_object_file()` in `batch_one_object()` to check
> whether the input object exists.
> 5. Let get_object() return 1 and print "<oid> missing" instead
> of returning -1 and printing "missing object <oid> for <refname>",
> this can help `format_ref_array_item()` just report that the
> object is missing without letting Git exit.
> 6. Use `format_ref_array_item()` in `batch_object_write()` to
> get the formatted data corresponding to the object. If the
> return value of `format_ref_array_item()` is equals to zero,
> use `batch_write()` to print object data; else if the return
> value less than zero, use `die()` to print the error message
> and exit; else return value greater than zero, only print the

s/else return value greater/else if the return value is greater/

> error message, but not exit.

s/not exit/don't exit/

> 7. Use free_ref_array_item_value() to free ref_array_item's
> value.

That looks like a lot of changes in a single commit. I wonder if this
commit could be split.

> Most of the atoms in `for-each-ref --format` are now supported,
> such as `%(tree)`, `%(parent)`, `%(author)`, `%(tagger)`, `%(if)`,
> `%(then)`, `%(else)`, `%(end)`. But these atoms will be rejected:
> `%(refname)`, `%(symref)`, `%(upstream)`, `%(push)`, `%(worktreepath)`,
> `%(flag)`, `%(HEAD)`, because our objects don't have refname.

s/refname/a refname/

It might be worth talking a bit about possible performance changes.

[...]

> +       ret = format_ref_array_item(&item, &opt->format, scratch, &err);
> +       if (!ret) {
> +               strbuf_addch(scratch, '\n');
> +               batch_write(opt, scratch->buf, scratch->len);
> +       } else if (ret < 0) {
> +               die("%s\n", err.buf);

This if (ret < 0) could be checked first.

> +       } else {
> +               /* when ret > 0 , don't call die and print the err to stdout*/

I think it would be more helpful to tell what ret > 0 means, rather
than what we do below (which can easily be seen).

> +               printf("%s\n", err.buf);
> +               fflush(stdout);
>         }

For example:

       if (ret < 0) {
               die("%s\n", err.buf);
       if (ret) {
               /* ret > 0 means ... */
               printf("%s\n", err.buf);
               fflush(stdout);
       } else {
               strbuf_addch(scratch, '\n');
               batch_write(opt, scratch->buf, scratch->len);
       }

> +       free_ref_array_item_value(&item);
> +       strbuf_release(&err);
>  }

[...]

> +static int batch_objects(struct batch_options *opt, const struct option *options)

It's unfortunate that one argument is called "opt" and the other one
"options". I wonder if the first one could be called "batch" as it
seems to be called this way somewhere else.

> +       if (!opt->format.format)
> +               strbuf_addstr(&format, "%(objectname) %(objecttype) %(objectsize)");
> +       else
> +               strbuf_addstr(&format, opt->format.format);

If there is no reason for the condition to be (!X) over just (X), I
think the latter is a bit better.

>         if (opt->print_contents)
> -               data.info.typep = &data.type;
> +               strbuf_addstr(&format, "\n%(raw)");
> +       opt->format.format = format.buf;

I wonder if this should be:

      opt->format.format = strbuf_detach(&format, NULL);

> +       if (verify_ref_format(&opt->format))
> +               usage_with_options(cat_file_usage, options);

[...]

> @@ -86,7 +87,7 @@ struct ref_format {
>         int need_color_reset_at_eol;
>  };
>
> -#define REF_FORMAT_INIT { NULL, NULL, 0, 0, -1 }
> +#define REF_FORMAT_INIT { NULL, NULL, 0, 0, 0, -1 }

Maybe this can already be changed to a designated initializer, like
Ævar suggested recently.

[...]

> +test_expect_success 'cat-file refs/heads/main refs/tags/testtag %(rest)' '

If this test is about checking that %(rest) works with both a branch
and a tag, it might be better to say it more explicitly.

> +       cat >expected <<-EOF &&
> +       123 commit 123
> +       456 tag 456
> +       EOF
> +       git cat-file --batch-check="%(rest) %(objecttype) %(rest)" >actual <<-EOF &&
> +       refs/heads/main 123
> +       refs/tags/testtag 456
> +       EOF
> +       test_cmp expected actual
> +'

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, ZheNing Hu wrote (reply to this):

Christian Couder <christian.couder@gmail.com> 于2021年6月21日周一 下午1:55写道:
>
> > 1. Add `cat_file_mode` member in struct `ref_format`, this can
> > help us reject atoms in verify_ref_format() which cat-file
> > cannot use, e.g. `%(refname)`, `%(push)`, `%(upstream)`...
> > 2. Change the type of member `format` in struct `batch_options`
> > to `ref_format`, We can add format data in it.
>
> Not sure what "We can add format data in it." means.

Well, there is something wrong with the expression here. What I want
to express is that we can fill its member "format" with the atoms like
"%(objectname) %(refname)", and then pass it to the ref-filter.

> > 7. Use free_ref_array_item_value() to free ref_array_item's
> > value.
>
> That looks like a lot of changes in a single commit. I wonder if this
> commit could be split.
>

Yeah, But I don’t know if I should take it apart step by step, If taken apart,
those intermediate commits are likely to fail the test.

> > Most of the atoms in `for-each-ref --format` are now supported,
> > such as `%(tree)`, `%(parent)`, `%(author)`, `%(tagger)`, `%(if)`,
> > `%(then)`, `%(else)`, `%(end)`. But these atoms will be rejected:
> > `%(refname)`, `%(symref)`, `%(upstream)`, `%(push)`, `%(worktreepath)`,
> > `%(flag)`, `%(HEAD)`, because our objects don't have refname.
>
> s/refname/a refname/
>
> It might be worth talking a bit about possible performance changes.
>

Makes sense. The performance has indeed deteriorated.

> [...]
>
> > +       ret = format_ref_array_item(&item, &opt->format, scratch, &err);
> > +       if (!ret) {
> > +               strbuf_addch(scratch, '\n');
> > +               batch_write(opt, scratch->buf, scratch->len);
> > +       } else if (ret < 0) {
> > +               die("%s\n", err.buf);
>
> This if (ret < 0) could be checked first.

Yes, it is better to put error checking first.

>
> > +       } else {
> > +               /* when ret > 0 , don't call die and print the err to stdout*/
>
> I think it would be more helpful to tell what ret > 0 means, rather
> than what we do below (which can easily be seen).
>

Ah, There is indeed only one situation for ret > 0 for the time being:
Show "<oid> missing" without exiting Git.

> > +               printf("%s\n", err.buf);
> > +               fflush(stdout);
> >         }
>
> For example:
>
>        if (ret < 0) {
>                die("%s\n", err.buf);
>        if (ret) {
>                /* ret > 0 means ... */
>                printf("%s\n", err.buf);
>                fflush(stdout);
>        } else {
>                strbuf_addch(scratch, '\n');
>                batch_write(opt, scratch->buf, scratch->len);
>        }
>

Yes, this might be better.

> > +       free_ref_array_item_value(&item);
> > +       strbuf_release(&err);
> >  }
>
> [...]
>
> > +static int batch_objects(struct batch_options *opt, const struct option *options)
>
> It's unfortunate that one argument is called "opt" and the other one
> "options". I wonder if the first one could be called "batch" as it
> seems to be called this way somewhere else.
>

OK.

> > +       if (!opt->format.format)
> > +               strbuf_addstr(&format, "%(objectname) %(objecttype) %(objectsize)");
> > +       else
> > +               strbuf_addstr(&format, opt->format.format);
>
> If there is no reason for the condition to be (!X) over just (X), I
> think the latter is a bit better.
>

Although I think it doesn’t matter which one to use first,
I will still follow your suggestions.

> >         if (opt->print_contents)
> > -               data.info.typep = &data.type;
> > +               strbuf_addstr(&format, "\n%(raw)");
> > +       opt->format.format = format.buf;
>
> I wonder if this should be:
>
>       opt->format.format = strbuf_detach(&format, NULL);
>

No. here our opt->format.format will not be changed, it would
be better for us to use `strbuf_release(&format)` for resource
recovery. (strbuf_detach() will forced to let us free opt->format.format)

> > +       if (verify_ref_format(&opt->format))
> > +               usage_with_options(cat_file_usage, options);
>
> [...]
>
> > @@ -86,7 +87,7 @@ struct ref_format {
> >         int need_color_reset_at_eol;
> >  };
> >
> > -#define REF_FORMAT_INIT { NULL, NULL, 0, 0, -1 }
> > +#define REF_FORMAT_INIT { NULL, NULL, 0, 0, 0, -1 }
>
> Maybe this can already be changed to a designated initializer, like
> Ævar suggested recently.
>

I agree.

> [...]
>
> > +test_expect_success 'cat-file refs/heads/main refs/tags/testtag %(rest)' '
>
> If this test is about checking that %(rest) works with both a branch
> and a tag, it might be better to say it more explicitly.
>

OK.

Thanks.
--
ZheNing Hu

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Hariom verma wrote (reply to this):

Hi,

On Fri, Jun 25, 2021 at 9:32 PM ZheNing Hu via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: ZheNing Hu <adlternative@gmail.com>
>
> +       if (ret < 0) {
> +               die("%s\n", err.buf);
> +       } if (ret) {
> +               /* ret > 0 means when the object corresponding to oid
> +                * cannot be found in format_ref_array_item(), we only print
> +                * the error message.
> +                */
> +               printf("%s\n", err.buf);
> +               fflush(stdout);
> +       } else {
> +               strbuf_addch(scratch, '\n');
> +               batch_write(opt, scratch->buf, scratch->len);
>         }
> +       free_ref_array_item_value(&item);
> +       strbuf_release(&err);
>  }

I think you can get rid of braces in condition `ret < 0`:

```
        if (ret < 0)
                die("%s\n", err->buf);
        if (ret) {
                /* ret > 0 means when the object corresponding to oid
                 * cannot be found in format_ref_array_item(), we only print
                 * the error message.
                 */
                printf("%s\n", err->buf);
                fflush(stdout);
        } else {
                strbuf_addch(scratch, '\n');
                batch_write(opt, scratch->buf, scratch->len);
        }
```

Thanks,
Hariom.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, ZheNing Hu wrote (reply to this):

Hariom verma <hariom18599@gmail.com> 于2021年6月27日周日 上午1:27写道:
>
> Hi,
>
> On Fri, Jun 25, 2021 at 9:32 PM ZheNing Hu via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> >
> > From: ZheNing Hu <adlternative@gmail.com>
> >
> > +       if (ret < 0) {
> > +               die("%s\n", err.buf);
> > +       } if (ret) {
> > +               /* ret > 0 means when the object corresponding to oid
> > +                * cannot be found in format_ref_array_item(), we only print
> > +                * the error message.
> > +                */
> > +               printf("%s\n", err.buf);
> > +               fflush(stdout);
> > +       } else {
> > +               strbuf_addch(scratch, '\n');
> > +               batch_write(opt, scratch->buf, scratch->len);
> >         }
> > +       free_ref_array_item_value(&item);
> > +       strbuf_release(&err);
> >  }
>
> I think you can get rid of braces in condition `ret < 0`:
>

Make sences. ;-)

> ```
>         if (ret < 0)
>                 die("%s\n", err->buf);
>         if (ret) {
>                 /* ret > 0 means when the object corresponding to oid
>                  * cannot be found in format_ref_array_item(), we only print
>                  * the error message.
>                  */
>                 printf("%s\n", err->buf);
>                 fflush(stdout);
>         } else {
>                 strbuf_addch(scratch, '\n');
>                 batch_write(opt, scratch->buf, scratch->len);
>         }
> ```
>
> Thanks,
> Hariom.

Thanks,
ZheNing Hu

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Hariom verma wrote (reply to this):

On Fri, Jun 25, 2021 at 9:32 PM ZheNing Hu via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: ZheNing Hu <adlternative@gmail.com>
>
>  static void batch_object_write(const char *obj_name,
>                                struct strbuf *scratch,
>                                struct batch_options *opt,
>                                struct expand_data *data)
>  {
> -       if (!data->skip_object_info &&
> -           oid_object_info_extended(the_repository, &data->oid, &data->info,
> -                                    OBJECT_INFO_LOOKUP_REPLACE) < 0) {
> -               printf("%s missing\n",
> -                      obj_name ? obj_name : oid_to_hex(&data->oid));
> -               fflush(stdout);
> -               return;
> -       }
> +       int ret = 0;

No need to initialize `ret` with 0. Later we are going to assign it
with the return value of `format_ref_array_item()` anyway.

> +       struct strbuf err = STRBUF_INIT;
> +       struct ref_array_item item = { data->oid, data->rest };
>
>         strbuf_reset(scratch);
> -       strbuf_expand(scratch, opt->format, expand_format, data);
> -       strbuf_addch(scratch, '\n');
> -       batch_write(opt, scratch->buf, scratch->len);
>
> -       if (opt->print_contents) {
> -               print_object_or_die(opt, data);
> -               batch_write(opt, "\n", 1);
> +       ret = format_ref_array_item(&item, &opt->format, scratch, &err);

Here.

-- 
Hariom

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, ZheNing Hu wrote (reply to this):

Hariom verma <hariom18599@gmail.com> 于2021年6月27日周日 上午2:08写道:
>
> On Fri, Jun 25, 2021 at 9:32 PM ZheNing Hu via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> >
> > From: ZheNing Hu <adlternative@gmail.com>
> >
> >  static void batch_object_write(const char *obj_name,
> >                                struct strbuf *scratch,
> >                                struct batch_options *opt,
> >                                struct expand_data *data)
> >  {
> > -       if (!data->skip_object_info &&
> > -           oid_object_info_extended(the_repository, &data->oid, &data->info,
> > -                                    OBJECT_INFO_LOOKUP_REPLACE) < 0) {
> > -               printf("%s missing\n",
> > -                      obj_name ? obj_name : oid_to_hex(&data->oid));
> > -               fflush(stdout);
> > -               return;
> > -       }
> > +       int ret = 0;
>
> No need to initialize `ret` with 0. Later we are going to assign it
> with the return value of `format_ref_array_item()` anyway.
>

I agree. It is worth noting that there are similar `int ret = 0` in
ref-filter.c,
they should be changed too.

> > +       struct strbuf err = STRBUF_INIT;
> > +       struct ref_array_item item = { data->oid, data->rest };
> >
> >         strbuf_reset(scratch);
> > -       strbuf_expand(scratch, opt->format, expand_format, data);
> > -       strbuf_addch(scratch, '\n');
> > -       batch_write(opt, scratch->buf, scratch->len);
> >
> > -       if (opt->print_contents) {
> > -               print_object_or_die(opt, data);
> > -               batch_write(opt, "\n", 1);
> > +       ret = format_ref_array_item(&item, &opt->format, scratch, &err);
>
> Here.
>
> --
> Hariom

--
ZheNing Hu

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Hariom verma wrote (reply to this):

Hi,

On Sun, Jun 27, 2021 at 6:06 PM ZheNing Hu via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: ZheNing Hu <adlternative@gmail.com>
>
> In order to let cat-file use ref-filter logic, let's do the
> following:
>
> 1. Change the type of member `format` in struct `batch_options`
> to `ref_format`, we will pass it to ref-filter later.
> 2. Let `batch_objects()` add atoms to format, and use
> `verify_ref_format()` to check atoms.
> 3. Use `format_ref_array_item()` in `batch_object_write()` to
> get the formatted data corresponding to the object. If the
> return value of `format_ref_array_item()` is equals to zero,
> use `batch_write()` to print object data; else if the return
> value is less than zero, use `die()` to print the error message
> and exit; else if return value is greater than zero, only print
> the error message, but don't exit.
> 4. Use free_ref_array_item_value() to free ref_array_item's
> value.
>
> Most of the atoms in `for-each-ref --format` are now supported,
> such as `%(tree)`, `%(parent)`, `%(author)`, `%(tagger)`, `%(if)`,
> `%(then)`, `%(else)`, `%(end)`. But these atoms will be rejected:
> `%(refname)`, `%(symref)`, `%(upstream)`, `%(push)`, `%(worktreepath)`,
> `%(flag)`, `%(HEAD)`, because our objects don't have a refname.

It's not clear why some atoms are rejected?

Are we going to support them in later commits? (or sometime in the future)
OR
We are never going to support them. Because they make no sense to
cat-file? (or whatever the reason)

Whatever is the reason, I think it's a good idea to include it in the
commit message.

-- 
Hariom

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, ZheNing Hu wrote (reply to this):

Hi,

Hariom verma <hariom18599@gmail.com> 于2021年6月28日周一 下午3:47写道:
>
> Hi,
>
> >
> > Most of the atoms in `for-each-ref --format` are now supported,
> > such as `%(tree)`, `%(parent)`, `%(author)`, `%(tagger)`, `%(if)`,
> > `%(then)`, `%(else)`, `%(end)`. But these atoms will be rejected:
> > `%(refname)`, `%(symref)`, `%(upstream)`, `%(push)`, `%(worktreepath)`,
> > `%(flag)`, `%(HEAD)`, because our objects don't have a refname.
>
> It's not clear why some atoms are rejected?
>
> Are we going to support them in later commits? (or sometime in the future)
> OR
> We are never going to support them. Because they make no sense to
> cat-file? (or whatever the reason)
>

Because in "git for-each-ref"'s "family", ref_array_item is generated
by filter_refs(),
which uses ref_filter_handler() to fill ref_array_item with ref's
data. In "git cat-file",
we care about the object, not the ref. Therefore, ref_array_item is
only filled with
{oid, rest} in batch_object_write() in cat-file.c. We cannot represent
some specific
ref-related data in "git cat-file", so we cannot have some atoms in ref_filter.
Yes, we probably won't support them in the future.

From an object-oriented point of view, the atom supported by
"cat-file" should be a
parent class, "for-each-ref"',"branch","tag"... they have more
specific object details (ref),
their supported atom should be a derived class, they can support more atoms.

> Whatever is the reason, I think it's a good idea to include it in the
> commit message.
>

Yeah.  The sentence "because our objects don't have a refname." may not
correctly express the reason for rejecting these atoms.  I will add
more descriptions.

> --
> Hariom

Thanks.
--
ZheNing Hu

line) are output in place of the `%(rest)` atom.

Note that most of the atoms in `for-each-ref --format` are now supported,
such as `%(tree)`, `%(parent)`, `%(author)`, `%(tagger)`, `%(if)`,
`%(then)`, `%(else)`, `%(end)`. But these atoms will be rejected:
`%(refname)`, `%(symref)`, `%(upstream)`, `%(push)`, `%(worktreepath)`,
`%(flag)`, `%(HEAD)`. See linkgit:git-for-each-ref[1].

If no format is specified, the default format is `%(objectname)
%(objecttype) %(objectsize)`.

Expand Down
9 changes: 9 additions & 0 deletions Documentation/git-for-each-ref.txt
Original file line number Diff line number Diff line change
Expand Up @@ -235,6 +235,15 @@ and `date` to extract the named component. For email fields (`authoremail`,
without angle brackets, and `:localpart` to get the part before the `@` symbol
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):


On Sat, Jun 12 2021, ZheNing Hu via GitGitGadget wrote:

> From: ZheNing Hu <adlternative@gmail.com>
>
> Add new formatting option `%(raw)`, which will print the raw
> object data without any changes. It will help further to migrate
> all cat-file formatting logic from cat-file to ref-filter.

Nice goal and feature to have.

> The raw data of blob, tree objects may contain '\0', but most of
> the logic in `ref-filter` depends on the output of the atom being
> text (specifically, no embedded NULs in it).
>
> E.g. `quote_formatting()` use `strbuf_addstr()` or `*._quote_buf()`
> add the data to the buffer. The raw data of a tree object is
> `100644 one\0...`, only the `100644 one` will be added to the buffer,
> which is incorrect.
>
> Therefore, add a new member in `struct atom_value`: `s_size`, which
> can record raw object size, it can help us add raw object data to
> the buffer or compare two buffers which contain raw object data.

Most of the functions that deal with this already use a strbuf in some
way, before we had a const char *, now there's a size_t to go along with
it, why not simply use a strbuf in the struct for the data? You'll then
get the size and \0 handling for free, and any functions to deal with
conversion can stick to the strbuf API, there seems to be a lot of back
and forth now.

> Beyond, `--format=%(raw)` cannot be used with `--python`, `--shell`,
> `--tcl`, `--perl` because if our binary raw data is passed to a variable
> in the host language, the host language may not support arbitrary binary
> data in the variables of its string type.

Perl at least deals with that just fine, and to the extent that it
doesn't any new problems here would have nothing to do with \0 being in
the data. Perl doesn't have a notion of "binary has \0 in it", it always
supports \0, it has a notion of "is it utf-8 or not?", so any encoding
problems wouldn't be new. I'd think that the same would be true of
Python, but I'm not sure.


> +test_expect_success 'basic atom: refs/tags/testtag *raw' '
> +	git cat-file commit refs/tags/testtag^{} >expected &&
> +	git for-each-ref --format="%(*raw)" refs/tags/testtag >actual &&
> +	sanitize_pgp <expected >expected.clean &&
> +	sanitize_pgp <actual >actual.clean &&
> +	echo "" >>expected.clean &&

Just "echo" will do, ditto for the rest. Also odd to go back and forth
between populating expected.clean & actual.clean.


> +test_expect_success 'set up refs pointing to binary blob' '
> +	printf "a\0b\0c" >blob1 &&
> +	printf "a\0c\0b" >blob2 &&
> +	printf "\0a\0b\0c" >blob3 &&
> +	printf "abc" >blob4 &&
> +	printf "\0 \0 \0 " >blob5 &&
> +	printf "\0 \0a\0 " >blob6 &&
> +	printf "  " >blob7 &&
> +	>blob8 &&
> +	git hash-object blob1 -w | xargs git update-ref refs/myblobs/blob1 &&
> +	git hash-object blob2 -w | xargs git update-ref refs/myblobs/blob2 &&
> +	git hash-object blob3 -w | xargs git update-ref refs/myblobs/blob3 &&
> +	git hash-object blob4 -w | xargs git update-ref refs/myblobs/blob4 &&
> +	git hash-object blob5 -w | xargs git update-ref refs/myblobs/blob5 &&
> +	git hash-object blob6 -w | xargs git update-ref refs/myblobs/blob6 &&
> +	git hash-object blob7 -w | xargs git update-ref refs/myblobs/blob7 &&
> +	git hash-object blob8 -w | xargs git update-ref refs/myblobs/blob8

Hrm, xargs just to avoid:

    git update-ref ... $(git hash-object) ?

> +test_expect_success '%(raw) with --python must failed' '
> +	test_must_fail git for-each-ref --format="%(raw)" --python
> +'
> +
> +test_expect_success '%(raw) with --tcl must failed' '
> +	test_must_fail git for-each-ref --format="%(raw)" --tcl
> +'
> +
> +test_expect_success '%(raw) with --perl must failed' '
> +	test_must_fail git for-each-ref --format="%(raw)" --perl
> +'
> +
> +test_expect_success '%(raw) with --shell must failed' '
> +	test_must_fail git for-each-ref --format="%(raw)" --shell
> +'
> +
> +test_expect_success '%(raw) with --shell and --sort=raw must failed' '
> +	test_must_fail git for-each-ref --format="%(raw)" --sort=raw --shell
> +'

s/must failed/must fail/, but see question above about encoding in these
languages...

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

>> Beyond, `--format=%(raw)` cannot be used with `--python`, `--shell`,
>> `--tcl`, `--perl` because if our binary raw data is passed to a variable
>> in the host language, the host language may not support arbitrary binary
>> data in the variables of its string type.
>
> Perl at least deals with that just fine, and to the extent that it
> doesn't any new problems here would have nothing to do with \0 being in
> the data. Perl doesn't have a notion of "binary has \0 in it", it always
> supports \0, it has a notion of "is it utf-8 or not?", so any encoding
> problems wouldn't be new. I'd think that the same would be true of
> Python, but I'm not sure.

During an earlier iteration long time ago, as we knew Perl is
capable of handling sequence of bytes including NUL, it was decided
to start more strict by rejecting binary for all languages, which
can later be loosened, to limit the scope of the initial
implementation.

>> +	git hash-object blob1 -w | xargs git update-ref refs/myblobs/blob1 &&
>> +	git hash-object blob2 -w | xargs git update-ref refs/myblobs/blob2 &&
>> +	git hash-object blob3 -w | xargs git update-ref refs/myblobs/blob3 &&
>> +	git hash-object blob4 -w | xargs git update-ref refs/myblobs/blob4 &&
>> +	git hash-object blob5 -w | xargs git update-ref refs/myblobs/blob5 &&
>> +	git hash-object blob6 -w | xargs git update-ref refs/myblobs/blob6 &&
>> +	git hash-object blob7 -w | xargs git update-ref refs/myblobs/blob7 &&
>> +	git hash-object blob8 -w | xargs git update-ref refs/myblobs/blob8
>
> Hrm, xargs just to avoid:
>
>     git update-ref ... $(git hash-object) ?

That's horrible.  Thanks for noticing.

We'd want to catch segfaults from both hash-object and update-ref.
One way to do so may be

	O=$(git hash-object -w blob1) &&
	git update-ref refs/myblobs/blob1 "$O"

Thanks for a review.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, ZheNing Hu wrote (reply to this):

Ævar Arnfjörð Bjarmason <avarab@gmail.com> 于2021年6月17日周四 下午3:16写道:
>
> > The raw data of blob, tree objects may contain '\0', but most of
> > the logic in `ref-filter` depends on the output of the atom being
> > text (specifically, no embedded NULs in it).
> >
> > E.g. `quote_formatting()` use `strbuf_addstr()` or `*._quote_buf()`
> > add the data to the buffer. The raw data of a tree object is
> > `100644 one\0...`, only the `100644 one` will be added to the buffer,
> > which is incorrect.
> >
> > Therefore, add a new member in `struct atom_value`: `s_size`, which
> > can record raw object size, it can help us add raw object data to
> > the buffer or compare two buffers which contain raw object data.
>
> Most of the functions that deal with this already use a strbuf in some
> way, before we had a const char *, now there's a size_t to go along with
> it, why not simply use a strbuf in the struct for the data? You'll then
> get the size and \0 handling for free, and any functions to deal with
> conversion can stick to the strbuf API, there seems to be a lot of back
> and forth now.
>

Yes, strbuf is a suitable choice when using <str,len> pair.
But if replace v->s with strbuf, the possible changes will be larger.

> > Beyond, `--format=%(raw)` cannot be used with `--python`, `--shell`,
> > `--tcl`, `--perl` because if our binary raw data is passed to a variable
> > in the host language, the host language may not support arbitrary binary
> > data in the variables of its string type.
>
> Perl at least deals with that just fine, and to the extent that it
> doesn't any new problems here would have nothing to do with \0 being in
> the data. Perl doesn't have a notion of "binary has \0 in it", it always
> supports \0, it has a notion of "is it utf-8 or not?", so any encoding
> problems wouldn't be new. I'd think that the same would be true of
> Python, but I'm not sure.
>

Not python safe. See [1].
Regarding the perl language, I support Junio's point of view: it can be
re-supported in the future.

>
> > +test_expect_success 'basic atom: refs/tags/testtag *raw' '
> > +     git cat-file commit refs/tags/testtag^{} >expected &&
> > +     git for-each-ref --format="%(*raw)" refs/tags/testtag >actual &&
> > +     sanitize_pgp <expected >expected.clean &&
> > +     sanitize_pgp <actual >actual.clean &&
> > +     echo "" >>expected.clean &&
>
> Just "echo" will do, ditto for the rest. Also odd to go back and forth
> between populating expected.clean & actual.clean.
>

Are you saying that sanitize_pgp is not needed?

>
> > +test_expect_success 'set up refs pointing to binary blob' '
> > +     printf "a\0b\0c" >blob1 &&
> > +     printf "a\0c\0b" >blob2 &&
> > +     printf "\0a\0b\0c" >blob3 &&
> > +     printf "abc" >blob4 &&
> > +     printf "\0 \0 \0 " >blob5 &&
> > +     printf "\0 \0a\0 " >blob6 &&
> > +     printf "  " >blob7 &&
> > +     >blob8 &&
> > +     git hash-object blob1 -w | xargs git update-ref refs/myblobs/blob1 &&
> > +     git hash-object blob2 -w | xargs git update-ref refs/myblobs/blob2 &&
> > +     git hash-object blob3 -w | xargs git update-ref refs/myblobs/blob3 &&
> > +     git hash-object blob4 -w | xargs git update-ref refs/myblobs/blob4 &&
> > +     git hash-object blob5 -w | xargs git update-ref refs/myblobs/blob5 &&
> > +     git hash-object blob6 -w | xargs git update-ref refs/myblobs/blob6 &&
> > +     git hash-object blob7 -w | xargs git update-ref refs/myblobs/blob7 &&
> > +     git hash-object blob8 -w | xargs git update-ref refs/myblobs/blob8
>
> Hrm, xargs just to avoid:
>
>     git update-ref ... $(git hash-object) ?
>

I didn’t think about it, just for convenience.

> > +test_expect_success '%(raw) with --python must failed' '
> > +     test_must_fail git for-each-ref --format="%(raw)" --python
> > +'
> > +
> > +test_expect_success '%(raw) with --tcl must failed' '
> > +     test_must_fail git for-each-ref --format="%(raw)" --tcl
> > +'
> > +
> > +test_expect_success '%(raw) with --perl must failed' '
> > +     test_must_fail git for-each-ref --format="%(raw)" --perl
> > +'
> > +
> > +test_expect_success '%(raw) with --shell must failed' '
> > +     test_must_fail git for-each-ref --format="%(raw)" --shell
> > +'
> > +
> > +test_expect_success '%(raw) with --shell and --sort=raw must failed' '
> > +     test_must_fail git for-each-ref --format="%(raw)" --sort=raw --shell
> > +'
>
> s/must failed/must fail/, but see question above about encoding in these
> languages...


[1]: https://lore.kernel.org/git/CAOLTT8QR_GRm4TYk0E_eazQ+unVQODc-3L+b4V5JUN5jtZR8uA@mail.gmail.com/

Thanks for a review.
--
ZheNing Hu

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):


On Thu, Jun 17 2021, ZheNing Hu wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> 于2021年6月17日周四 下午3:16写道:
>>
>> > The raw data of blob, tree objects may contain '\0', but most of
>> > the logic in `ref-filter` depends on the output of the atom being
>> > text (specifically, no embedded NULs in it).
>> >
>> > E.g. `quote_formatting()` use `strbuf_addstr()` or `*._quote_buf()`
>> > add the data to the buffer. The raw data of a tree object is
>> > `100644 one\0...`, only the `100644 one` will be added to the buffer,
>> > which is incorrect.
>> >
>> > Therefore, add a new member in `struct atom_value`: `s_size`, which
>> > can record raw object size, it can help us add raw object data to
>> > the buffer or compare two buffers which contain raw object data.
>>
>> Most of the functions that deal with this already use a strbuf in some
>> way, before we had a const char *, now there's a size_t to go along with
>> it, why not simply use a strbuf in the struct for the data? You'll then
>> get the size and \0 handling for free, and any functions to deal with
>> conversion can stick to the strbuf API, there seems to be a lot of back
>> and forth now.
>>
>
> Yes, strbuf is a suitable choice when using <str,len> pair.
> But if replace v->s with strbuf, the possible changes will be larger.

I for one would like to see it done that way, those changes are usually
easy to read. Also it seems a large part of 2/8 is extra new code
because we didn't do that, e.g. getting length differently if something
is a strbuf or not, passing char*/size_t pairs to new functions etc.

>> > Beyond, `--format=%(raw)` cannot be used with `--python`, `--shell`,
>> > `--tcl`, `--perl` because if our binary raw data is passed to a variable
>> > in the host language, the host language may not support arbitrary binary
>> > data in the variables of its string type.
>>
>> Perl at least deals with that just fine, and to the extent that it
>> doesn't any new problems here would have nothing to do with \0 being in
>> the data. Perl doesn't have a notion of "binary has \0 in it", it always
>> supports \0, it has a notion of "is it utf-8 or not?", so any encoding
>> problems wouldn't be new. I'd think that the same would be true of
>> Python, but I'm not sure.
>>
>
> Not python safe. See [1].
> Regarding the perl language, I support Junio's point of view: it can be
> re-supported in the future.

Ah, I'd missed that. Anyway, if it's easy it seems you discovered that
Perl deals with it correctly, so we could just have it support this.

>>
>> > +test_expect_success 'basic atom: refs/tags/testtag *raw' '
>> > +     git cat-file commit refs/tags/testtag^{} >expected &&
>> > +     git for-each-ref --format="%(*raw)" refs/tags/testtag >actual &&
>> > +     sanitize_pgp <expected >expected.clean &&
>> > +     sanitize_pgp <actual >actual.clean &&
>> > +     echo "" >>expected.clean &&
>>
>> Just "echo" will do, ditto for the rest. Also odd to go back and forth
>> between populating expected.clean & actual.clean.
>>
>
> Are you saying that sanitize_pgp is not needed?

No that instead of:

    echo "" >x

You can do:

    echo >x

And also that going back and forth between populating different files is
confusing, i.e. this:


    echo a >x
    echo c >y
    echo b >>x

is better as:

    echo a >x
    echo b >>x
    echo c >y


>>
>> > +test_expect_success 'set up refs pointing to binary blob' '
>> > +     printf "a\0b\0c" >blob1 &&
>> > +     printf "a\0c\0b" >blob2 &&
>> > +     printf "\0a\0b\0c" >blob3 &&
>> > +     printf "abc" >blob4 &&
>> > +     printf "\0 \0 \0 " >blob5 &&
>> > +     printf "\0 \0a\0 " >blob6 &&
>> > +     printf "  " >blob7 &&
>> > +     >blob8 &&
>> > +     git hash-object blob1 -w | xargs git update-ref refs/myblobs/blob1 &&
>> > +     git hash-object blob2 -w | xargs git update-ref refs/myblobs/blob2 &&
>> > +     git hash-object blob3 -w | xargs git update-ref refs/myblobs/blob3 &&
>> > +     git hash-object blob4 -w | xargs git update-ref refs/myblobs/blob4 &&
>> > +     git hash-object blob5 -w | xargs git update-ref refs/myblobs/blob5 &&
>> > +     git hash-object blob6 -w | xargs git update-ref refs/myblobs/blob6 &&
>> > +     git hash-object blob7 -w | xargs git update-ref refs/myblobs/blob7 &&
>> > +     git hash-object blob8 -w | xargs git update-ref refs/myblobs/blob8
>>
>> Hrm, xargs just to avoid:
>>
>>     git update-ref ... $(git hash-object) ?
>>
>
> I didn’t think about it, just for convenience.

*nod*, Junio had a good suggestion.

>> > +test_expect_success '%(raw) with --python must failed' '
>> > +     test_must_fail git for-each-ref --format="%(raw)" --python
>> > +'
>> > +
>> > +test_expect_success '%(raw) with --tcl must failed' '
>> > +     test_must_fail git for-each-ref --format="%(raw)" --tcl
>> > +'
>> > +
>> > +test_expect_success '%(raw) with --perl must failed' '
>> > +     test_must_fail git for-each-ref --format="%(raw)" --perl
>> > +'
>> > +
>> > +test_expect_success '%(raw) with --shell must failed' '
>> > +     test_must_fail git for-each-ref --format="%(raw)" --shell
>> > +'
>> > +
>> > +test_expect_success '%(raw) with --shell and --sort=raw must failed' '
>> > +     test_must_fail git for-each-ref --format="%(raw)" --sort=raw --shell
>> > +'
>>
>> s/must failed/must fail/, but see question above about encoding in these
>> languages...
>
>
> [1]: https://lore.kernel.org/git/CAOLTT8QR_GRm4TYk0E_eazQ+unVQODc-3L+b4V5JUN5jtZR8uA@mail.gmail.com/
>
> Thanks for a review.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, ZheNing Hu wrote (reply to this):

Ævar Arnfjörð Bjarmason <avarab@gmail.com> 于2021年6月17日周四 下午10:45写道:
>
>
> On Thu, Jun 17 2021, ZheNing Hu wrote:
>
> >
> > Yes, strbuf is a suitable choice when using <str,len> pair.
> > But if replace v->s with strbuf, the possible changes will be larger.
>
> I for one would like to see it done that way, those changes are usually
> easy to read. Also it seems a large part of 2/8 is extra new code
> because we didn't do that, e.g. getting length differently if something
> is a strbuf or not, passing char*/size_t pairs to new functions etc.
>

After some refactoring, I found that there are two problems:
1. There are a lot of codes like this in ref-filter to fill v->s:

v->s = show_ref(...)
v->s = copy_email(...)

It is very difficult to modify here: We know that show_ref()
or copy_email() will allocate a block of memory to v->s, but
if v->s is a strbuf, what should we do? In copy_email(), we
can pass the v->s to copy_email() and use strbuf_add()/strbuf_addstr()
instead of xstrdup() and xmemdupz(). But show_ref() will call
external functions like shorten_unambiguous_ref(), we don’t know
whether it will return us NULL or a dynamically allocated memory.
If continue to pass v->s to the inner function, it is not a feasible
method. Or we can use strbuf_attach() + strlen(), I'm not sure
this is a good method.

2. See:

-       for (i = 0; i < used_atom_cnt; i++) {
+       for (i = 0; i < used_atom_cnt; i++) {
                struct atom_value *v = &ref->value[i];
-               if (v->s == NULL && used_atom[i].source == SOURCE_NONE)
+               if (v->s.len == 0 && used_atom[i].source == SOURCE_NONE)
                        return strbuf_addf_ret(err, -1, _("missing
object %s for %s"),

oid_to_hex(&ref->objectname), ref->refname);
        }

In the case of using strbuf, I don’t know how to distinguish between an empty
strbuf and NULL. It can be easily distinguished by using c-style "const char*".

> >
> > Not python safe. See [1].
> > Regarding the perl language, I support Junio's point of view: it can be
> > re-supported in the future.
>
> Ah, I'd missed that. Anyway, if it's easy it seems you discovered that
> Perl deals with it correctly, so we could just have it support this.
>

Well, it's ok, support for perl will be put in a separate commit.

> >>
> >> > +test_expect_success 'basic atom: refs/tags/testtag *raw' '
> >> > +     git cat-file commit refs/tags/testtag^{} >expected &&
> >> > +     git for-each-ref --format="%(*raw)" refs/tags/testtag >actual &&
> >> > +     sanitize_pgp <expected >expected.clean &&
> >> > +     sanitize_pgp <actual >actual.clean &&
> >> > +     echo "" >>expected.clean &&
> >>
> >> Just "echo" will do, ditto for the rest. Also odd to go back and forth
> >> between populating expected.clean & actual.clean.
> >>
> >
> > Are you saying that sanitize_pgp is not needed?
>
> No that instead of:
>
>     echo "" >x
>
> You can do:
>
>     echo >x
>
> And also that going back and forth between populating different files is
> confusing, i.e. this:
>
>
>     echo a >x
>     echo c >y
>     echo b >>x
>
> is better as:
>
>     echo a >x
>     echo b >>x
>     echo c >y
>
>

Thanks, I get what you meant now.

> >>
> >> > +test_expect_success 'set up refs pointing to binary blob' '
> >> > +     printf "a\0b\0c" >blob1 &&
> >> > +     printf "a\0c\0b" >blob2 &&
> >> > +     printf "\0a\0b\0c" >blob3 &&
> >> > +     printf "abc" >blob4 &&
> >> > +     printf "\0 \0 \0 " >blob5 &&
> >> > +     printf "\0 \0a\0 " >blob6 &&
> >> > +     printf "  " >blob7 &&
> >> > +     >blob8 &&
> >> > +     git hash-object blob1 -w | xargs git update-ref refs/myblobs/blob1 &&
> >> > +     git hash-object blob2 -w | xargs git update-ref refs/myblobs/blob2 &&
> >> > +     git hash-object blob3 -w | xargs git update-ref refs/myblobs/blob3 &&
> >> > +     git hash-object blob4 -w | xargs git update-ref refs/myblobs/blob4 &&
> >> > +     git hash-object blob5 -w | xargs git update-ref refs/myblobs/blob5 &&
> >> > +     git hash-object blob6 -w | xargs git update-ref refs/myblobs/blob6 &&
> >> > +     git hash-object blob7 -w | xargs git update-ref refs/myblobs/blob7 &&
> >> > +     git hash-object blob8 -w | xargs git update-ref refs/myblobs/blob8
> >>
> >> Hrm, xargs just to avoid:
> >>
> >>     git update-ref ... $(git hash-object) ?
> >>
> >
> > I didn’t think about it, just for convenience.
>
> *nod*, Junio had a good suggestion.
>

ok.

Thanks.
--
ZheNing Hu

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):


On Fri, Jun 18 2021, ZheNing Hu wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> 于2021年6月17日周四 下午10:45写道:
>>
>>
>> On Thu, Jun 17 2021, ZheNing Hu wrote:
>>
>> >
>> > Yes, strbuf is a suitable choice when using <str,len> pair.
>> > But if replace v->s with strbuf, the possible changes will be larger.
>>
>> I for one would like to see it done that way, those changes are usually
>> easy to read. Also it seems a large part of 2/8 is extra new code
>> because we didn't do that, e.g. getting length differently if something
>> is a strbuf or not, passing char*/size_t pairs to new functions etc.
>>
>
> After some refactoring, I found that there are two problems:
> 1. There are a lot of codes like this in ref-filter to fill v->s:
>
> v->s = show_ref(...)
> v->s = copy_email(...)
>
> It is very difficult to modify here: We know that show_ref()
> or copy_email() will allocate a block of memory to v->s, but
> if v->s is a strbuf, what should we do? In copy_email(), we
> can pass the v->s to copy_email() and use strbuf_add()/strbuf_addstr()
> instead of xstrdup() and xmemdupz(). But show_ref() will call
> external functions like shorten_unambiguous_ref(), we don’t know
> whether it will return us NULL or a dynamically allocated memory.
> If continue to pass v->s to the inner function, it is not a feasible
> method. Or we can use strbuf_attach() + strlen(), I'm not sure
> this is a good method.
>
> 2. See:
>
> -       for (i = 0; i < used_atom_cnt; i++) {
> +       for (i = 0; i < used_atom_cnt; i++) {
>                 struct atom_value *v = &ref->value[i];
> -               if (v->s == NULL && used_atom[i].source == SOURCE_NONE)
> +               if (v->s.len == 0 && used_atom[i].source == SOURCE_NONE)
>                         return strbuf_addf_ret(err, -1, _("missing
> object %s for %s"),
>
> oid_to_hex(&ref->objectname), ref->refname);
>         }
>
> In the case of using strbuf, I don’t know how to distinguish between an empty
> strbuf and NULL. It can be easily distinguished by using c-style "const char*".

Yes, sometimes it's just too much of a hassle, looking at
shorten_unambiguous_ref() which returns a xstrdup()'d value that could
indeed be strbuf_attach'd. I haven't tried the conversion myself,
perhaps it's too much hassle.

Just a suggestion from reading your patch in isolation.


>> >
>> > Not python safe. See [1].
>> > Regarding the perl language, I support Junio's point of view: it can be
>> > re-supported in the future.
>>
>> Ah, I'd missed that. Anyway, if it's easy it seems you discovered that
>> Perl deals with it correctly, so we could just have it support this.
>>
>
> Well, it's ok, support for perl will be put in a separate commit.
>
>> >>
>> >> > +test_expect_success 'basic atom: refs/tags/testtag *raw' '
>> >> > +     git cat-file commit refs/tags/testtag^{} >expected &&
>> >> > +     git for-each-ref --format="%(*raw)" refs/tags/testtag >actual &&
>> >> > +     sanitize_pgp <expected >expected.clean &&
>> >> > +     sanitize_pgp <actual >actual.clean &&
>> >> > +     echo "" >>expected.clean &&
>> >>
>> >> Just "echo" will do, ditto for the rest. Also odd to go back and forth
>> >> between populating expected.clean & actual.clean.
>> >>
>> >
>> > Are you saying that sanitize_pgp is not needed?
>>
>> No that instead of:
>>
>>     echo "" >x
>>
>> You can do:
>>
>>     echo >x
>>
>> And also that going back and forth between populating different files is
>> confusing, i.e. this:
>>
>>
>>     echo a >x
>>     echo c >y
>>     echo b >>x
>>
>> is better as:
>>
>>     echo a >x
>>     echo b >>x
>>     echo c >y
>>
>>
>
> Thanks, I get what you meant now.
>
>> >>
>> >> > +test_expect_success 'set up refs pointing to binary blob' '
>> >> > +     printf "a\0b\0c" >blob1 &&
>> >> > +     printf "a\0c\0b" >blob2 &&
>> >> > +     printf "\0a\0b\0c" >blob3 &&
>> >> > +     printf "abc" >blob4 &&
>> >> > +     printf "\0 \0 \0 " >blob5 &&
>> >> > +     printf "\0 \0a\0 " >blob6 &&
>> >> > +     printf "  " >blob7 &&
>> >> > +     >blob8 &&
>> >> > +     git hash-object blob1 -w | xargs git update-ref refs/myblobs/blob1 &&
>> >> > +     git hash-object blob2 -w | xargs git update-ref refs/myblobs/blob2 &&
>> >> > +     git hash-object blob3 -w | xargs git update-ref refs/myblobs/blob3 &&
>> >> > +     git hash-object blob4 -w | xargs git update-ref refs/myblobs/blob4 &&
>> >> > +     git hash-object blob5 -w | xargs git update-ref refs/myblobs/blob5 &&
>> >> > +     git hash-object blob6 -w | xargs git update-ref refs/myblobs/blob6 &&
>> >> > +     git hash-object blob7 -w | xargs git update-ref refs/myblobs/blob7 &&
>> >> > +     git hash-object blob8 -w | xargs git update-ref refs/myblobs/blob8
>> >>
>> >> Hrm, xargs just to avoid:
>> >>
>> >>     git update-ref ... $(git hash-object) ?
>> >>
>> >
>> > I didn’t think about it, just for convenience.
>>
>> *nod*, Junio had a good suggestion.
>>
>
> ok.
>
> Thanks.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Christian Couder wrote (reply to this):

On Fri, Jun 18, 2021 at 12:51 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:

> On Fri, Jun 18 2021, ZheNing Hu wrote:

> > After some refactoring, I found that there are two problems:
> > 1. There are a lot of codes like this in ref-filter to fill v->s:
> >
> > v->s = show_ref(...)
> > v->s = copy_email(...)
> >
> > It is very difficult to modify here: We know that show_ref()
> > or copy_email() will allocate a block of memory to v->s, but
> > if v->s is a strbuf, what should we do? In copy_email(), we
> > can pass the v->s to copy_email() and use strbuf_add()/strbuf_addstr()
> > instead of xstrdup() and xmemdupz(). But show_ref() will call
> > external functions like shorten_unambiguous_ref(), we don’t know
> > whether it will return us NULL or a dynamically allocated memory.
> > If continue to pass v->s to the inner function, it is not a feasible
> > method. Or we can use strbuf_attach() + strlen(), I'm not sure
> > this is a good method.

If you resend this patch, it might be a good idea to add a short
version of the above explanations into the commit message.

[...]

> > In the case of using strbuf, I don’t know how to distinguish between an empty
> > strbuf and NULL. It can be easily distinguished by using c-style "const char*".

Maybe this could also be part of the explanation.

> Yes, sometimes it's just too much of a hassle, looking at
> shorten_unambiguous_ref() which returns a xstrdup()'d value that could
> indeed be strbuf_attach'd. I haven't tried the conversion myself,
> perhaps it's too much hassle.
>
> Just a suggestion from reading your patch in isolation.

Yeah, thanks for the review anyway!

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Bagas Sanjaya wrote (reply to this):

On 22/06/21 10.20, ZheNing Hu via GitGitGadget wrote:
> Beyond, `--format=%(raw)` cannot be used with `--python`, `--shell`,
> `--tcl`, `--perl` because if our binary raw data is passed to a
> variable in the host language, the host language may not support
> arbitrary binary data in the variables of its string type.

Better say:

"Note that `--format=%(raw)` cannot be used with `--python`, `--shell`, 
`--tcl`, and `--perl` because if the binary raw data is passed to a 
variable in such languages, these may not support arbitrary binary data 
in their string variable type."

-- 
An old man doll... just what I always wanted! - Clara

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, ZheNing Hu wrote (reply to this):

Bagas Sanjaya <bagasdotme@gmail.com> 于2021年6月24日周四 下午12:14写道:
>
> On 22/06/21 10.20, ZheNing Hu via GitGitGadget wrote:
> > Beyond, `--format=%(raw)` cannot be used with `--python`, `--shell`,
> > `--tcl`, `--perl` because if our binary raw data is passed to a
> > variable in the host language, the host language may not support
> > arbitrary binary data in the variables of its string type.
>
> Better say:
>
> "Note that `--format=%(raw)` cannot be used with `--python`, `--shell`,
> `--tcl`, and `--perl` because if the binary raw data is passed to a
> variable in such languages, these may not support arbitrary binary data
> in their string variable type."
>

Thanks, Bagas Sanjaya, I will change all of them.


> --
> An old man doll... just what I always wanted! - Clara

--
ZheNing Hu

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Bagas Sanjaya wrote (reply to this):

On 25/06/21 23.02, ZheNing Hu via GitGitGadget wrote:
> Note that `--format=%(raw)` cannot be used with `--python`, `--shell`,
> `--tcl`, and `--perl` because if the binary raw data is passed to a
> variable in such languages, these may not support arbitrary binary data
> in their string variable type.
> 

Commit message looks OK, but...

> +Note that `--format=%(raw)` can not be used with `--python`, `--shell`, `--tcl`,
> +`--perl` because the host language may not support arbitrary binary data in the
> +variables of its string type.
> +

Seems like out of sync between commit message and the docs change above. 
Did you mean the (unsupported) host languages are Python, BASH script, 
TCL/TK, and Perl respectively? If so, the docs should say:

"Note that `--format=%(raw) can not be used with `--python`, `--shell`, 
`-tcl`, and `--perl` because such languages may not support arbitrary 
binary data in their string variable type."

Thanks.

-- 
An old man doll... just what I always wanted! - Clara

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, ZheNing Hu wrote (reply to this):

Bagas Sanjaya <bagasdotme@gmail.com> 于2021年6月26日周六 上午8:42写道:
>
> On 25/06/21 23.02, ZheNing Hu via GitGitGadget wrote:
> > Note that `--format=%(raw)` cannot be used with `--python`, `--shell`,
> > `--tcl`, and `--perl` because if the binary raw data is passed to a
> > variable in such languages, these may not support arbitrary binary data
> > in their string variable type.
> >
>
> Commit message looks OK, but...
>
> > +Note that `--format=%(raw)` can not be used with `--python`, `--shell`, `--tcl`,
> > +`--perl` because the host language may not support arbitrary binary data in the
> > +variables of its string type.
> > +
>
> Seems like out of sync between commit message and the docs change above.
> Did you mean the (unsupported) host languages are Python, BASH script,
> TCL/TK, and Perl respectively? If so, the docs should say:
>

Indeed so. I will change them too.

> "Note that `--format=%(raw) can not be used with `--python`, `--shell`,
> `-tcl`, and `--perl` because such languages may not support arbitrary
> binary data in their string variable type."
>
> Thanks.
>
> --
> An old man doll... just what I always wanted! - Clara

Thanks.
--
ZheNing Hu

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Bagas Sanjaya wrote (reply to this):

On 27/06/21 19.35, ZheNing Hu via GitGitGadget wrote:
> +Note that `--format=%(raw)` can not be used with `--python`, `--shell`, `--tcl`,
> +`--perl` because the such language may not support arbitrary binary data in their
> +string variable type.
> +

s/the such language/such languages/

Thanks.

-- 
An old man doll... just what I always wanted! - Clara

out of the trimmed email.

The raw data in an object is `raw`.

raw:size::
The raw data size of the object.

Note that `--format=%(raw)` can not be used with `--python`, `--shell`, `--tcl`,
because such language may not support arbitrary binary data in their
string variable type.

The message in a commit or a tag object is `contents`, from which
`contents:<part>` can be used to extract various parts out of:

Expand Down
94 changes: 78 additions & 16 deletions Documentation/git-interpret-trailers.txt
Original file line number Diff line number Diff line change
Expand Up @@ -232,25 +232,38 @@ trailer.<token>.ifmissing::
that option for trailers with the specified <token>.

trailer.<token>.command::
This option can be used to specify a shell command that will
be called to automatically add or modify a trailer with the
specified <token>.
This option behaves in the same way as 'trailer.<token>.cmd', except
that it doesn't pass anything as argument to the specified command.
Instead the first occurrence of substring $ARG is replaced by the
value that would be passed as argument.
+
When this option is specified, the behavior is as if a special
'<token>=<value>' argument were added at the beginning of the command
line, where <value> is taken to be the standard output of the
specified command with any leading and trailing whitespace trimmed
off.
The 'trailer.<token>.command' option has been deprecated in favor of
'trailer.<token>.cmd' due to the fact that $ARG in the user's command is
only replaced once and that the original way of replacing $ARG is not safe.
+
If the command contains the `$ARG` string, this string will be
replaced with the <value> part of an existing trailer with the same
<token>, if any, before the command is launched.
When both 'trailer.<token>.cmd' and 'trailer.<token>.command' are given
for the same <token>, 'trailer.<token>.cmd' is used and
'trailer.<token>.command' is ignored.

trailer.<token>.cmd::
This option can be used to specify a shell command that will be called:
once to automatically add a trailer with the specified <token>, and then
each time a '--trailer <token>=<value>' argument to modify the <value> of
the trailer that this option would produce.
+
If some '<token>=<value>' arguments are also passed on the command
line, when a 'trailer.<token>.command' is configured, the command will
also be executed for each of these arguments. And the <value> part of
these arguments, if any, will be used to replace the `$ARG` string in
the command.
When the specified command is first called to add a trailer
with the specified <token>, the behavior is as if a special
'--trailer <token>=<value>' argument was added at the beginning
of the "git interpret-trailers" command, where <value>
is taken to be the standard output of the command with any
leading and trailing whitespace trimmed off.
+
If some '--trailer <token>=<value>' arguments are also passed
on the command line, the command is called again once for each
of these arguments with the same <token>. And the <value> part
of these arguments, if any, will be passed to the command as its
first argument. This way the command can produce a <value> computed
from the <value> passed in the '--trailer <token>=<value>' argument.

EXAMPLES
--------
Expand Down Expand Up @@ -333,6 +346,55 @@ subject
Fix #42
------------

* Configure a 'help' trailer with a cmd use a script `glog-find-author`
which search specified author identity from git log in git repository
and show how it works:
+
------------
$ cat ~/bin/glog-find-author
#!/bin/sh
test -n "$1" && git log --author="$1" --pretty="%an <%ae>" -1 || true
$ git config trailer.help.key "Helped-by: "
$ git config trailer.help.ifExists "addIfDifferentNeighbor"
$ git config trailer.help.cmd "~/bin/glog-find-author"
$ git interpret-trailers --trailer="help:Junio" --trailer="help:Couder" <<EOF
> subject
>
> message
>
> EOF
subject

message

Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Christian Couder <christian.couder@gmail.com>
------------

* Configure a 'ref' trailer with a cmd use a script `glog-grep`
to grep last relevant commit from git log in the git repository
and show how it works:
+
------------
$ cat ~/bin/glog-grep
#!/bin/sh
test -n "$1" && git log --grep "$1" --pretty=reference -1 || true
$ git config trailer.ref.key "Reference-to: "
$ git config trailer.ref.ifExists "replace"
$ git config trailer.ref.cmd "~/bin/glog-grep"
$ git interpret-trailers --trailer="ref:Add copyright notices." <<EOF
> subject
>
> message
>
> EOF
subject

message

Reference-to: 8bc9a0c769 (Add copyright notices., 2005-04-07)
------------

* Configure a 'see' trailer with a command to show the subject of a
commit that is related, and show how it works:
+
Expand Down
6 changes: 6 additions & 0 deletions Documentation/gitattributes.txt
Original file line number Diff line number Diff line change
Expand Up @@ -1247,6 +1247,12 @@ to:
[attr]binary -diff -merge -text
------------

NOTES
-----

Git does not follow symbolic links when accessing a `.gitattributes`
file in the working tree. This keeps behavior consistent when the file
is accessed from the index or a tree versus from the filesystem.

EXAMPLES
--------
Expand Down
4 changes: 4 additions & 0 deletions Documentation/gitignore.txt
Original file line number Diff line number Diff line change
Expand Up @@ -149,6 +149,10 @@ not tracked by Git remain untracked.
To stop tracking a file that is currently tracked, use
'git rm --cached'.

Git does not follow symbolic links when accessing a `.gitignore` file in
the working tree. This keeps behavior consistent when the file is
accessed from the index or a tree versus from the filesystem.

EXAMPLES
--------

Expand Down
7 changes: 7 additions & 0 deletions Documentation/gitmailmap.txt
Original file line number Diff line number Diff line change
Expand Up @@ -55,6 +55,13 @@ this would also match the 'Commit Name <commit&#64;email.xx>' above:
Proper Name <proper@email.xx> CoMmIt NaMe <CoMmIt@EmAiL.xX>
--

NOTES
-----

Git does not follow symbolic links when accessing a `.mailmap` file in
the working tree. This keeps behavior consistent when the file is
accessed from the index or a tree versus from the filesystem.

EXAMPLES
--------

Expand Down
8 changes: 8 additions & 0 deletions Documentation/gitmodules.txt
Original file line number Diff line number Diff line change
Expand Up @@ -98,6 +98,14 @@ submodule.<name>.shallow::
shallow clone (with a history depth of 1) unless the user explicitly
asks for a non-shallow clone.

NOTES
-----

Git does not allow the `.gitmodules` file within a working tree to be a
symbolic link, and will refuse to check out such a tree entry. This
keeps behavior consistent when the file is accessed from the index or a
tree versus from the filesystem, and helps Git reliably enforce security
checks of the file contents.

EXAMPLES
--------
Expand Down
Loading