Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GSOC][RFC] ref-filter: add %(raw:textconv) and %(raw:filters) #972

Conversation

adlternative
Copy link

@adlternative adlternative commented Jun 5, 2021

In order to let git cat-file --batch reuse ref-filter logic,
This patch, %(rest), %(raw:textconv), %(raw:filters) atoms
and --rest=<rest> option are added to ref-filter.

  • %(rest) int the format will be replaced by the <rest>
    in --rest=<rest>.
  • the <rest> in --rest=<rest> can also be used as
    the <path> for %(raw:textconv) and %(raw:filters).
  • %(raw:textconv) can show the object's contents as
    transformed by a textconv filter.
  • %(raw:filters) can show the content as converted
    by the filters configured in the current working tree for
    the given <path> (i.e. smudge filters, end-of-line
    conversion, etc).

The current series is based on 0efed94 ([GSOC] ref-filter: add %(raw) atom)
https://lore.kernel.org/git/pull.966.v2.git.1622808751.gitgitgadget@gmail.com/
If necessary, "%(rest)" part can be an independent patch later.

cc: Junio C Hamano gitster@pobox.com
cc: Christian Couder christian.couder@gmail.com
cc: Hariom Verma hariom18599@gmail.com
cc: Bagas Sanjaya bagasdotme@gmail.com

Only tag and commit objects use `grab_sub_body_contents()` to grab
object contents in the current codebase.  We want to teach the
function to also handle blobs and trees to get their raw data,
without parsing a blob (whose contents looks like a commit or a tag)
incorrectly as a commit or a tag.

Skip the block of code that is specific to handling commits and tags
early when the given object is of a wrong type to help later
addition to handle other types of objects in this function.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Hariom Verma <hariom18599@gmail.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
Add new formatting option `%(raw)`, which will print the raw
object data without any changes. It will help further to migrate
all cat-file formatting logic from cat-file to ref-filter.

The raw data of blob, tree objects may contain '\0', but most of
the logic in `ref-filter` depands on the output of the atom being
text (specifically, no embedded NULs in it).

E.g. `quote_formatting()` use `strbuf_addstr()` or `*._quote_buf()`
add the data to the buffer. The raw data of a tree object is
`100644 one\0...`, only the `100644 one` will be added to the buffer,
which is incorrect.

Therefore, add a new member in `struct atom_value`: `s_size`, which
can record raw object size, it can help us add raw object data to
the buffer or compare two buffers which contain raw object data.

Beyond, `--format=%(raw)` cannot be used with `--python`, `--shell`,
`--tcl`, `--perl` because if our binary raw data is passed to a variable
in the host language, the host language may not support arbitrary binary
data in the variables of its string type.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Hariom Verma <hariom18599@gmail.com>
Helped-by: Felipe Contreras <felipe.contreras@gmail.com>
Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Helped-by: Junio C Hamano <gitster@pobox.com>
Based-on-patch-by: Olga Telezhnaya <olyatelezhnaya@gmail.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
Use non-const ref_format in *_atom_parser(), which can help us
modify the members of ref_format in *_atom_parser().

Signed-off-by: ZheNing Hu <adlternative@gmail.com>
In order to let "cat-file --batch=%(rest)" use the ref-filter
interface, add %(rest) atom for ref-filter and --rest option
for "git for-each-ref", "git branch", "git tag" and "git verify-tag".
`--rest` specify a string to replace %(rest) placeholders of
the --format option.

Signed-off-by: ZheNing Hu <adlternative@gmail.com>
Teach grab_sub_body_contents() return value and err can help us
report more useful error messages when when add %(raw:textconv)
and %(raw:filter) later.

Signed-off-by: ZheNing Hu <adlternative@gmail.com>
In order to let `git cat-file --batch --filter` and
`git cat-file --batch --textconv` use ref-filter
interface, `%(raw:textconv)` and `%(raw:filters)` are
added to ref-filter.

`--rest` contents is used as the `<path>` for them.

`%(raw:textconv)` can show the object' contents as transformed
by a textconv filter.

`%(raw:filters)` can show the content as converted by the filters
configured in the current working tree for the given `<path>`
(i.e. smudge filters, end-of-line conversion, etc).

In addition, they cannot be used with `--python`, `--tcl`,
`--shell`, `--perl`.

Signed-off-by: ZheNing Hu <adlternative@gmail.com>
@adlternative
Copy link
Author

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 5, 2021

Submitted as pull.972.git.1622884415.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git pr-972/adlternative/ref-filter-texconv-filters-v1

To fetch this version to local tag pr-972/adlternative/ref-filter-texconv-filters-v1:

git fetch --no-tags https://github.com/gitgitgadget/git tag pr-972/adlternative/ref-filter-texconv-filters-v1

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 5, 2021

On the Git mailing list, Bagas Sanjaya wrote (reply to this):

Hi,

On 05/06/21 16.13, ZheNing Hu via GitGitGadget wrote:
> In order to let git cat-file --batch reuse ref-filter logic, This patch,
> %(rest), %(raw:textconv), %(raw:filters) atoms and --rest=<rest> option are
> added to ref-filter.
> 

Better say "Add ... atoms and --rest=<rest> option to ref-filter, in 
order to let git cat-file reuse ref-filter logic."

>   * %(rest) int the format will be replaced by the <rest> in --rest=<rest>.
>   * the <rest> in --rest=<rest> can also be used as the <path> for
>     %(raw:textconv) and %(raw:filters).

s/int/in/

Did you mean that %(rest) atom can also be used as <path> for the latter?

-- 
An old man doll... just what I always wanted! - Clara

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 5, 2021

User Bagas Sanjaya <bagasdotme@gmail.com> has been added to the cc: list.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 5, 2021

On the Git mailing list, ZheNing Hu wrote (reply to this):

Bagas Sanjaya <bagasdotme@gmail.com> 于2021年6月5日周六 下午6:29写道:
>
> Hi,
>
> On 05/06/21 16.13, ZheNing Hu via GitGitGadget wrote:
> > In order to let git cat-file --batch reuse ref-filter logic, This patch,
> > %(rest), %(raw:textconv), %(raw:filters) atoms and --rest=<rest> option are
> > added to ref-filter.
> >
>
> Better say "Add ... atoms and --rest=<rest> option to ref-filter, in
> order to let git cat-file reuse ref-filter logic."
>

OK.

> >   * %(rest) int the format will be replaced by the <rest> in --rest=<rest>.
> >   * the <rest> in --rest=<rest> can also be used as the <path> for
> >     %(raw:textconv) and %(raw:filters).
>
> s/int/in/
>
> Did you mean that %(rest) atom can also be used as <path> for the latter?
>

No. just the <rest> in `--rest=<rest>` will be treated as <path> for
%(raw:textconv)
and %(raw:filters).

> --
> An old man doll... just what I always wanted! - Clara

Thanks.
--
ZheNing Hu

@@ -15,7 +15,7 @@ SYNOPSIS
[--contains [<commit>]] [--no-contains [<commit>]]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Hariom verma wrote (reply to this):

Hi,

On Sat, Jun 5, 2021 at 2:43 PM ZheNing Hu via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: ZheNing Hu <adlternative@gmail.com>
>
> --- a/builtin/branch.c
> +++ b/builtin/branch.c
> @@ -670,6 +674,7 @@ int cmd_branch(int argc, const char **argv, const char *prefix)
>                         N_("print only branches of the object"), parse_opt_object_name),
>                 OPT_BOOL('i', "ignore-case", &icase, N_("sorting and filtering are case insensitive")),
>                 OPT_STRING(  0 , "format", &format.format, N_("format"), N_("format to use for the output")),
> +               OPT_STRING(0, "rest", &format.rest, N_("rest"), N_("specify %(rest) contents")),
>                 OPT_END(),
>         };
>

Although it's not related to this patch. But I just noticed an unusual
extra space(s) before the first argument of `OPT_STRING()`. (above the
line you added)

> --- a/builtin/for-each-ref.c
> +++ b/builtin/for-each-ref.c
> @@ -37,6 +37,7 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
>
>                 OPT_GROUP(""),
>                 OPT_INTEGER( 0 , "count", &maxcount, N_("show only <n> matched refs")),
> +               OPT_STRING(  0 , "rest", &format.rest, N_("rest"), N_("specify %(rest) contents")),
>                 OPT_STRING(  0 , "format", &format.format, N_("format"), N_("format to use for the output")),
>                 OPT__COLOR(&format.use_color, N_("respect format colors")),
>                 OPT_REF_SORT(sorting_tail),

Here too in `OPT_INTEGER()` and `OPT_INTEGER()`.

Also, I don't think these extra space(s) are intended. So you don't
need to imitate them.

> --- a/builtin/tag.c
> +++ b/builtin/tag.c
> @@ -481,6 +486,7 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
>                 OPT_STRING(  0 , "format", &format.format, N_("format"),
>                            N_("format to use for the output")),
>                 OPT__COLOR(&format.use_color, N_("respect format colors")),
> +               OPT_STRING(0, "rest", &format.rest, N_("rest"), N_("specify %(rest) contents")),
>                 OPT_BOOL('i', "ignore-case", &icase, N_("sorting and filtering are case insensitive")),
>                 OPT_END()
>         };

Here too in the first line.

Thanks,
Hariom

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, ZheNing Hu wrote (reply to this):

Hi,

Hariom verma <hariom18599@gmail.com> 于2021年6月5日周六 下午11:20写道:
>
> Hi,
>
> On Sat, Jun 5, 2021 at 2:43 PM ZheNing Hu via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> >
> > From: ZheNing Hu <adlternative@gmail.com>
> >
> > --- a/builtin/branch.c
> > +++ b/builtin/branch.c
> > @@ -670,6 +674,7 @@ int cmd_branch(int argc, const char **argv, const char *prefix)
> >                         N_("print only branches of the object"), parse_opt_object_name),
> >                 OPT_BOOL('i', "ignore-case", &icase, N_("sorting and filtering are case insensitive")),
> >                 OPT_STRING(  0 , "format", &format.format, N_("format"), N_("format to use for the output")),
> > +               OPT_STRING(0, "rest", &format.rest, N_("rest"), N_("specify %(rest) contents")),
> >                 OPT_END(),
> >         };
> >
>
> Although it's not related to this patch. But I just noticed an unusual
> extra space(s) before the first argument of `OPT_STRING()`. (above the
> line you added)
>

Yeah, I noticed it too.

> > --- a/builtin/for-each-ref.c
> > +++ b/builtin/for-each-ref.c
> > @@ -37,6 +37,7 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
> >
> >                 OPT_GROUP(""),
> >                 OPT_INTEGER( 0 , "count", &maxcount, N_("show only <n> matched refs")),
> > +               OPT_STRING(  0 , "rest", &format.rest, N_("rest"), N_("specify %(rest) contents")),
> >                 OPT_STRING(  0 , "format", &format.format, N_("format"), N_("format to use for the output")),
> >                 OPT__COLOR(&format.use_color, N_("respect format colors")),
> >                 OPT_REF_SORT(sorting_tail),
>
> Here too in `OPT_INTEGER()` and `OPT_INTEGER()`.
>
> Also, I don't think these extra space(s) are intended. So you don't
> need to imitate them.
>

Maybe... I find it's begin at c3428da8 (Make builtin-for-each-ref.c
use parse-opts.)
This is something from 2007. So It may be wrong to imitate its format.
But this format fix work may be can put in good-first-issue. :)

> > --- a/builtin/tag.c
> > +++ b/builtin/tag.c
> > @@ -481,6 +486,7 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
> >                 OPT_STRING(  0 , "format", &format.format, N_("format"),
> >                            N_("format to use for the output")),
> >                 OPT__COLOR(&format.use_color, N_("respect format colors")),
> > +               OPT_STRING(0, "rest", &format.rest, N_("rest"), N_("specify %(rest) contents")),
> >                 OPT_BOOL('i', "ignore-case", &icase, N_("sorting and filtering are case insensitive")),
> >                 OPT_END()
> >         };
>
> Here too in the first line.
>
> Thanks,
> Hariom

Thanks.
--
ZheNing Hu

@@ -15,7 +15,7 @@ SYNOPSIS
[--contains [<commit>]] [--no-contains [<commit>]]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: ZheNing Hu <adlternative@gmail.com>
>
> In order to let "cat-file --batch=%(rest)" use the ref-filter
> interface, add %(rest) atom for ref-filter and --rest option
> for "git for-each-ref", "git branch", "git tag" and "git verify-tag".
> `--rest` specify a string to replace %(rest) placeholders of
> the --format option.

I cannot think of a sane reason why we need to allow "%(rest)" in
anithing but "cat-file --batch", where a natural source of %(rest)
exists in its input stream (i.e. each input record begins with an
object name to be processed, and the rest of the record can become
"%(rest)").

The "cat-file --batch" thing is much more understandable.  You could
for example:

    git ls-files -s |
    sed -e 's/^[0-7]* \([0-9a-f]*\) [0-3]	/\1 /' |
    git cat-file --batch='%(objectname) %(objecttype) %(rest)'

to massage output from "ls-files -s" like this

    100644 c2f5fe385af1bbc161f6c010bdcf0048ab6671ed 0	.cirrus.yml
    100644 c592dda681fecfaa6bf64fb3f539eafaf4123ed8 0	.clang-format
    100644 f9d819623d832113014dd5d5366e8ee44ac9666a 0	.editorconfig
    ...

into recods of "<objectname> <path>", and each output record will
replay the <path> part from each corresponding input record.

Unless for-each-ref family of commands read the list of refs that it
shows from their standard input (they do not, and I do not think it
makes any sense to teach them to), there is no place to feed the
"rest" information that is associated with each output record.  The
only thing the commands taught about %(rest) by this patch can do is
to parrot the same string into each and every output record.  I am
not seeing what this new feature is attempting to give us.

If anything, I would imagine that it would be a very useful addition
to teach the ref-filter machinery an ability to optionally error out
depending on the caller when the caller attempts to use certain
placeholder.  Then, we can reject "git branch --sort=rest" sensibly,
instead of accepting "git branch --sort=rest --rest=constant", which
is not technically wrong per-se, but smells like a total nonsense from
practical usefulness's point of view.

> -	[--list] [<pattern>...]
> +	[--list] [<pattern>...] [--rest=<rest>]
>  'git branch' [--track | --no-track] [-f] <branchname> [<start-point>]
>  'git branch' (--set-upstream-to=<upstream> | -u <upstream>) [<branchname>]
>  'git branch' --unset-upstream [<branchname>]
> @@ -298,6 +298,10 @@ start-point is either a local or remote-tracking branch.
>  	and the object it points at.  The format is the same as
>  	that of linkgit:git-for-each-ref[1].
>  
> +--rest=<rest>::
> +	If given, the `%(rest)` placeholders in the `--format` option
> +	will be replaced.

If not given, what happens?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, ZheNing Hu wrote (reply to this):

Junio C Hamano <gitster@pobox.com> 于2021年6月7日周一 下午1:52写道:
>
> "ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > From: ZheNing Hu <adlternative@gmail.com>
> >
> > In order to let "cat-file --batch=%(rest)" use the ref-filter
> > interface, add %(rest) atom for ref-filter and --rest option
> > for "git for-each-ref", "git branch", "git tag" and "git verify-tag".
> > `--rest` specify a string to replace %(rest) placeholders of
> > the --format option.
>
> I cannot think of a sane reason why we need to allow "%(rest)" in
> anithing but "cat-file --batch", where a natural source of %(rest)
> exists in its input stream (i.e. each input record begins with an
> object name to be processed, and the rest of the record can become
> "%(rest)").
>

First of all, although %(rest) is meaningless in ordinary circumstances,
ref-filter must learn %(rest), it is impossible for us to leave the parsing
of %(rest) in cat-file.c alone.

Then, `--rest` is a strategy that make %(rest) can use in `git for-each-ref`
or `git branch -l`. As you said, it is just a boring placeholder used for string
replacement. We can make it output only empty content, If we really don’t
need `--rest`.

> The "cat-file --batch" thing is much more understandable.  You could
> for example:
>
>     git ls-files -s |
>     sed -e 's/^[0-7]* \([0-9a-f]*\) [0-3]       /\1 /' |
>     git cat-file --batch='%(objectname) %(objecttype) %(rest)'
>

s/[0-3]       /[0-3]\t/

> to massage output from "ls-files -s" like this
>
>     100644 c2f5fe385af1bbc161f6c010bdcf0048ab6671ed 0   .cirrus.yml
>     100644 c592dda681fecfaa6bf64fb3f539eafaf4123ed8 0   .clang-format
>     100644 f9d819623d832113014dd5d5366e8ee44ac9666a 0   .editorconfig
>     ...
>
> into recods of "<objectname> <path>", and each output record will
> replay the <path> part from each corresponding input record.
>

Yeah, the <path> in the input will be treated as "rest".

> Unless for-each-ref family of commands read the list of refs that it
> shows from their standard input (they do not, and I do not think it
> makes any sense to teach them to), there is no place to feed the
> "rest" information that is associated with each output record.  The
> only thing the commands taught about %(rest) by this patch can do is
> to parrot the same string into each and every output record.  I am
> not seeing what this new feature is attempting to give us.
>

"parrot the same string"? I think we should use an empty string here,
"parrot the same string" more like what the "git log --format" family does.

> If anything, I would imagine that it would be a very useful addition
> to teach the ref-filter machinery an ability to optionally error out
> depending on the caller when the caller attempts to use certain
> placeholder.  Then, we can reject "git branch --sort=rest" sensibly,
> instead of accepting "git branch --sort=rest --rest=constant", which
> is not technically wrong per-se, but smells like a total nonsense from
> practical usefulness's point of view.
>

This sounds like it might help `cat-file` to reject some useless atoms
like %(refname). So something like:

$ git for-each-ref --format="%(objectname) %(objectsize)"
--refject-atoms="%(objectsize) %(objectname)"

will fail.

"git for-each-ref" family use hardcoded to reject %(rest).
I can try to achieve this function.

> > -     [--list] [<pattern>...]
> > +     [--list] [<pattern>...] [--rest=<rest>]
> >  'git branch' [--track | --no-track] [-f] <branchname> [<start-point>]
> >  'git branch' (--set-upstream-to=<upstream> | -u <upstream>) [<branchname>]
> >  'git branch' --unset-upstream [<branchname>]
> > @@ -298,6 +298,10 @@ start-point is either a local or remote-tracking branch.
> >       and the object it points at.  The format is the same as
> >       that of linkgit:git-for-each-ref[1].
> >
> > +--rest=<rest>::
> > +     If given, the `%(rest)` placeholders in the `--format` option
> > +     will be replaced.
>
> If not given, what happens?

Will output an empty string.

Hope we can reach an agreement:
delete `--rest` and add `--reject-atoms`. ;-)

Thanks.
--
ZheNing Hu

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, ZheNing Hu wrote (reply to this):

ZheNing Hu <adlternative@gmail.com> 于2021年6月7日周一 下午9:02写道:
>
> Hope we can reach an agreement:
> delete `--rest` and add `--reject-atoms`. ;-)
>

I forget one thing: %(raw:textconv) and %(raw:filters) can use
the value of "--rest" as their <path>. But now if we want delete --rest,
they can not be used for "for-each-ref" family, Git will die with
"missing path for 'xxx'".

> Thanks.
> --
> ZheNing Hu

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, ZheNing Hu wrote (reply to this):

ZheNing Hu <adlternative@gmail.com> 于2021年6月7日周一 下午9:18写道:
>
> ZheNing Hu <adlternative@gmail.com> 于2021年6月7日周一 下午9:02写道:
> >
> > Hope we can reach an agreement:
> > delete `--rest` and add `--reject-atoms`. ;-)
> >
>
> I forget one thing: %(raw:textconv) and %(raw:filters) can use
> the value of "--rest" as their <path>. But now if we want delete --rest,
> they can not be used for "for-each-ref" family, Git will die with
> "missing path for 'xxx'".
>

If we actually delete "--rest", we will have no way to test %(raw:textconv)
and %(raw:filters)... So now I think we can keep --rest (or use
another name --path)
and let "git for-each-ref" family reject %(rest) by default.

Thanks.
--
ZheNing Hu

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

ZheNing Hu <adlternative@gmail.com> writes:

> First of all, although %(rest) is meaningless in ordinary circumstances,
> ref-filter must learn %(rest), it is impossible for us to leave the parsing
> of %(rest) in cat-file.c alone.

Oh, there is no question about that.

> Then, `--rest` is a strategy that make %(rest) can use in `git for-each-ref`
> or `git branch -l`.

If there is no need to expose %(rest) to the users who write
--format for these two commands, it would be much better to detect
attempted use of %(rest) and error out.

> This sounds like it might help `cat-file` to reject some useless atoms
> like %(refname).

Yes.

> So something like:
>
> $ git for-each-ref --format="%(objectname) %(objectsize)"
> --refject-atoms="%(objectsize) %(objectname)"
>
> will fail.

I don't understand.  Why do you even need to add --reject?  Why
would any user would want to use it with for-each-ref?

Without any end-user input, %(rest) for for-each-ref would not make
sense, and %(refname) for cat-file --batch would not make sense, I
would imagine, so there is no need to be able to tell --reject=rest
to for-each-ref.  It is not like giving --no-reject=rest to for-each-ref
and make it interpolate to an empty string is a useful feature anyway,
so I do not see a need for such an option.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

ZheNing Hu <adlternative@gmail.com> writes:

> ZheNing Hu <adlternative@gmail.com> 于2021年6月7日周一 下午9:18写道:
>>
>> ZheNing Hu <adlternative@gmail.com> 于2021年6月7日周一 下午9:02写道:
>> >
>> > Hope we can reach an agreement:
>> > delete `--rest` and add `--reject-atoms`. ;-)
>> >
>>
>> I forget one thing: %(raw:textconv) and %(raw:filters) can use
>> the value of "--rest" as their <path>. But now if we want delete --rest,
>> they can not be used for "for-each-ref" family, Git will die with
>> "missing path for 'xxx'".
>>
>
> If we actually delete "--rest", we will have no way to test %(raw:textconv)
> and %(raw:filters)... So now I think we can keep --rest (or use
> another name --path)
> and let "git for-each-ref" family reject %(rest) by default.

I didn't read beyond the %(rest) thing, but do we even need
%(raw:textconv) to begin with?  It is totally useless in the context
of for-each-ref because textconv by its nature is tied to attributes
that by definition needs a blob that is sitting at a path, but the
objects for-each-ref and friends visit are mostly commits and tags,
and even for refs that point at a blob, there isn't any "path"
information to pull attribute for.

Is that what you want to add to give "cat-file --batch"?  Even in
the context of "cat-file --batch", you can throw an object name for
a blob to the command, but there is no path for the blob (a blob can
appear at different places in different trees---think "rename), so I
am not sure what benefit you are trying to derive from it.

Thanks.


Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, ZheNing Hu wrote (reply to this):

Junio C Hamano <gitster@pobox.com> 于2021年6月8日周二 下午2:50写道:
>
> ZheNing Hu <adlternative@gmail.com> writes:
>
> > First of all, although %(rest) is meaningless in ordinary circumstances,
> > ref-filter must learn %(rest), it is impossible for us to leave the parsing
> > of %(rest) in cat-file.c alone.
>
> Oh, there is no question about that.
>

OK.

>
> I don't understand.  Why do you even need to add --reject?  Why
> would any user would want to use it with for-each-ref?
>
> Without any end-user input, %(rest) for for-each-ref would not make
> sense, and %(refname) for cat-file --batch would not make sense, I
> would imagine, so there is no need to be able to tell --reject=rest
> to for-each-ref.  It is not like giving --no-reject=rest to for-each-ref
> and make it interpolate to an empty string is a useful feature anyway,
> so I do not see a need for such an option.

Yeah... I might have thought it complicated before. We can reject %(rest) in
verify_ref_format() easily. It's like we refused --<lang> before. :)

Thanks.
--
ZheNing Hu

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, ZheNing Hu wrote (reply to this):

Junio C Hamano <gitster@pobox.com> 于2021年6月8日周二 下午2:59写道:
>
> >
> > If we actually delete "--rest", we will have no way to test %(raw:textconv)
> > and %(raw:filters)... So now I think we can keep --rest (or use
> > another name --path)
> > and let "git for-each-ref" family reject %(rest) by default.
>
> I didn't read beyond the %(rest) thing, but do we even need
> %(raw:textconv) to begin with?  It is totally useless in the context
> of for-each-ref because textconv by its nature is tied to attributes
> that by definition needs a blob that is sitting at a path, but the
> objects for-each-ref and friends visit are mostly commits and tags,
> and even for refs that point at a blob, there isn't any "path"
> information to pull attribute for.
>

After thinking about your words, now I think maybe we can leave
%(raw:textconv) and %(raw:filter) after cat-file --batch start using
ref-filter logic, so that we can provide them with suitable tests,
and we don't need `--rest` anymore.

> Is that what you want to add to give "cat-file --batch"?  Even in
> the context of "cat-file --batch", you can throw an object name for
> a blob to the command, but there is no path for the blob (a blob can
> appear at different places in different trees---think "rename), so I
> am not sure what benefit you are trying to derive from it.
>

So I will remove the last two commits.

> Thanks.
>
>

Thanks.
--
ZheNing Hu

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Junio C Hamano <gitster@pobox.com> writes:

> Is that what you want to add to give "cat-file --batch"?  Even in
> the context of "cat-file --batch", you can throw an object name for
> a blob to the command, but there is no path for the blob (a blob can
> appear at different places in different trees---think "rename), so I
> am not sure what benefit you are trying to derive from it.

I think I kind-of see what is going on here.  There is

    git cat-file blob --textconv --path="$path" "$blob_object_name"

that allows a blob to be fed to the command, pretend as if it
appears at $path in a tree object and grab attribute for it, and
show the blob contents converted using the textconv filter.  If we
were to mimic it by extending the format based substitutions, a
design consistent with the behaviour is to teach --format=%(raw)
to show the contents after applying the textconv filter instead of
the raw blob contents.

And there is a corresponding

    git cat-file --batch --textconv

The "--path=$path" parameter is omitted when using --batch, as each
object would sit at different path in the tree (so the input stream
would be given as a run of "<blob> <path>" to give each item its own
path).

So to answer my question in the previous message, yes, this is an
attempt to support the "cat-file --textconv".  So in the context of
that command, something may need to be added.  But I do not think it
makes any sense to expose that to for-each-ref and friends, even if
we were to share the internal machinery (after all, sharing of the
internal machinery is a mere means to an end that is to make it
easier to give the same syntax and same behaviour to end users and
is not a goal itself; "because we use the same machinery, the users
have to tolerate that irrelevant %(atoms) are accepted by the parser"
is not making a good excuse for a sloppy implementation).

Having said all that, I somehow doubt that the "--batch=<format>"
was designed to interact sensibly with the "--textconv" option.
builtin/cat-file.c::expand_atom() does not know anything at all that
the data could be modified from the raw contents of the blob, so
--batch="%(contents) %(size)" --textconv, if existed, may show the
conveted contents with size of blob before conversion, or something
incoherent like that.  And if your rewrite using the shared internal
machinery results in a more coherent behaviour, that would be
excellent.  For example, we could imagine that the machinery, when
textconv (or filter) is in use, would first grab the blob contents
and run the requested conversion, and then work solely on that
conveted contents when computing what to fill with %(raw:size) and
other blob-related atoms.

Thanks.


Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, ZheNing Hu wrote (reply to this):

Junio C Hamano <gitster@pobox.com> 于2021年6月9日周三 下午3:00写道:
>
> I think I kind-of see what is going on here.  There is
>
>     git cat-file blob --textconv --path="$path" "$blob_object_name"
>
> that allows a blob to be fed to the command, pretend as if it
> appears at $path in a tree object and grab attribute for it, and
> show the blob contents converted using the textconv filter.  If we
> were to mimic it by extending the format based substitutions, a
> design consistent with the behaviour is to teach --format=%(raw)
> to show the contents after applying the textconv filter instead of
> the raw blob contents.
>

Yes, this is exactly what cat-file --textconv does.

> And there is a corresponding
>
>     git cat-file --batch --textconv
>
> The "--path=$path" parameter is omitted when using --batch, as each
> object would sit at different path in the tree (so the input stream
> would be given as a run of "<blob> <path>" to give each item its own
> path).
>

Just like let --batch omitted --path, --rest is meaningless for "for-ech-ref".

> So to answer my question in the previous message, yes, this is an
> attempt to support the "cat-file --textconv".  So in the context of
> that command, something may need to be added.  But I do not think it
> makes any sense to expose that to for-each-ref and friends, even if
> we were to share the internal machinery (after all, sharing of the
> internal machinery is a mere means to an end that is to make it
> easier to give the same syntax and same behaviour to end users and
> is not a goal itself; "because we use the same machinery, the users
> have to tolerate that irrelevant %(atoms) are accepted by the parser"
> is not making a good excuse for a sloppy implementation).
>

Because "git cat-file --batch" will only print the contents of the object once,
so when implements the function of textconv/filters in ref-filter,
we should really consider whether we should let something like
"%(raw) %(raw) %(raw) %(raw:size)" all pass the conversion of textconv/filters.
If it is my previous %(raw:textconv) or %(raw:filter), they can only print
the converted content separately, and we need:

$ git for-ecah-ref --format="%(raw:filters) %(raw:filters)
%(raw:filters) %(raw:filters:size)"

As you said it might be too complicated for the user....

> Having said all that, I somehow doubt that the "--batch=<format>"
> was designed to interact sensibly with the "--textconv" option.
> builtin/cat-file.c::expand_atom() does not know anything at all that
> the data could be modified from the raw contents of the blob, so
> --batch="%(contents) %(size)" --textconv, if existed, may show the
> conveted contents with size of blob before conversion, or something
> incoherent like that.  And if your rewrite using the shared internal
> machinery results in a more coherent behaviour, that would be
> excellent.  For example, we could imagine that the machinery, when
> textconv (or filter) is in use, would first grab the blob contents
> and run the requested conversion, and then work solely on that
> conveted contents when computing what to fill with %(raw:size) and
> other blob-related atoms.
>

And with your --textconv/--filters, we only need:

$ git for-ecah-ref --format="%(raw) %(raw) %(raw) %(raw:size)" --filters

This will be more concise for users. I will try to build --filters, --textconv
for ref-filter . But as stated in the previous reply, it needs to be placed
after the transplant of cat-file --batch (Because --path is useless for
for-each-ref, and at the same time we need proper testing for
--filter/--textconv.)

Thanks, your reply is very reasonable,
--
ZheNing Hu

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 7, 2021

On the Git mailing list, Junio C Hamano wrote (reply to this):

"ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes:

> The current series is based on 0efed9435 ([GSOC] ref-filter: add %(raw)
> atom)

I do not have that commit object, but these six patches include the
two commits that are %(raw) and %(raw:size), so I'll just discard
the old round that wasn't based on the atom-type stuff and queue
these six as a single series.

As I already said, I am not sure how %(rest) makes any sense outside
the context of "cat-file --batch"; I suspect it would make more sense
to make it easier to arrange certain placeholders to error out when
used in a context where they do not make sense (e.g. use of --rest
in "git branch --list").

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 7, 2021

On the Git mailing list, ZheNing Hu wrote (reply to this):

Junio C Hamano <gitster@pobox.com> 于2021年6月7日周一 下午1:56写道:
>
> "ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > The current series is based on 0efed9435 ([GSOC] ref-filter: add %(raw)
> > atom)
>
> I do not have that commit object, but these six patches include the
> two commits that are %(raw) and %(raw:size), so I'll just discard
> the old round that wasn't based on the atom-type stuff and queue
> these six as a single series.
>

Well, it is a commit that has not been sent to the mailing list. But it’s okay
to treat the new 6 commits as a new patch series.

> As I already said, I am not sure how %(rest) makes any sense outside
> the context of "cat-file --batch"; I suspect it would make more sense
> to make it easier to arrange certain placeholders to error out when
> used in a context where they do not make sense (e.g. use of --rest
> in "git branch --list").
>

I agree.

--
ZheNing Hu

@@ -235,6 +235,15 @@ and `date` to extract the named component. For email fields (`authoremail`,
without angle brackets, and `:localpart` to get the part before the `@` symbol
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes:

>  static int cmp_ref_sorting(struct ref_sorting *s, struct ref_array_item *a, struct ref_array_item *b)
>  {
>  	struct atom_value *va, *vb;
> @@ -2389,10 +2452,30 @@ static int cmp_ref_sorting(struct ref_sorting *s, struct ref_array_item *a, stru
>  	} else if (s->sort_flags & REF_SORTING_VERSION) {
>  		cmp = versioncmp(va->s, vb->s);
>  	} else if (cmp_type == FIELD_STR) {
> -		int (*cmp_fn)(const char *, const char *);
> -		cmp_fn = s->sort_flags & REF_SORTING_ICASE
> -			? strcasecmp : strcmp;
> -		cmp = cmp_fn(va->s, vb->s);
> +		if (va->s_size == ATOM_VALUE_S_SIZE_INIT &&
> +		    vb->s_size == ATOM_VALUE_S_SIZE_INIT) {
> +			int (*cmp_fn)(const char *, const char *);
> +			cmp_fn = s->sort_flags & REF_SORTING_ICASE
> +				? strcasecmp : strcmp;
> +			cmp = cmp_fn(va->s, vb->s);
> +		} else {
> +			int (*cmp_fn)(const void *, const void *, size_t);
> +			cmp_fn = s->sort_flags & REF_SORTING_ICASE
> +				? memcasecmp : memcmp;
> +			size_t a_size = va->s_size == ATOM_VALUE_S_SIZE_INIT ?
> +					strlen(va->s) : va->s_size;
> +			size_t b_size = vb->s_size == ATOM_VALUE_S_SIZE_INIT ?
> +					strlen(vb->s) : vb->s_size;

This breaks -Wdecl-after-stmt.  A possible fix below.

diff --git a/ref-filter.c b/ref-filter.c
index 46aec291de..648f9cabff 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2459,13 +2459,13 @@ static int cmp_ref_sorting(struct ref_sorting *s, struct ref_array_item *a, stru
 				? strcasecmp : strcmp;
 			cmp = cmp_fn(va->s, vb->s);
 		} else {
-			int (*cmp_fn)(const void *, const void *, size_t);
-			cmp_fn = s->sort_flags & REF_SORTING_ICASE
+			size_t a_size = va->s_size == ATOM_VALUE_S_SIZE_INIT
+					? strlen(va->s) : va->s_size;
+			size_t b_size = vb->s_size == ATOM_VALUE_S_SIZE_INIT
+					? strlen(vb->s) : vb->s_size;
+			int (*cmp_fn)(const void *, const void *, size_t) =
+				s->sort_flags & REF_SORTING_ICASE
 				? memcasecmp : memcmp;
-			size_t a_size = va->s_size == ATOM_VALUE_S_SIZE_INIT ?
-					strlen(va->s) : va->s_size;
-			size_t b_size = vb->s_size == ATOM_VALUE_S_SIZE_INIT ?
-					strlen(vb->s) : vb->s_size;
 
 			cmp = cmp_fn(va->s, vb->s, b_size > a_size ?
 				     a_size : b_size);

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, ZheNing Hu wrote (reply to this):

Junio C Hamano <gitster@pobox.com> 于2021年6月8日周二 下午1:07写道:
>
> This breaks -Wdecl-after-stmt.  A possible fix below.
>
> diff --git a/ref-filter.c b/ref-filter.c
> index 46aec291de..648f9cabff 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -2459,13 +2459,13 @@ static int cmp_ref_sorting(struct ref_sorting *s, struct ref_array_item *a, stru
>                                 ? strcasecmp : strcmp;
>                         cmp = cmp_fn(va->s, vb->s);
>                 } else {
> -                       int (*cmp_fn)(const void *, const void *, size_t);
> -                       cmp_fn = s->sort_flags & REF_SORTING_ICASE
> +                       size_t a_size = va->s_size == ATOM_VALUE_S_SIZE_INIT
> +                                       ? strlen(va->s) : va->s_size;
> +                       size_t b_size = vb->s_size == ATOM_VALUE_S_SIZE_INIT
> +                                       ? strlen(vb->s) : vb->s_size;
> +                       int (*cmp_fn)(const void *, const void *, size_t) =
> +                               s->sort_flags & REF_SORTING_ICASE
>                                 ? memcasecmp : memcmp;
> -                       size_t a_size = va->s_size == ATOM_VALUE_S_SIZE_INIT ?
> -                                       strlen(va->s) : va->s_size;
> -                       size_t b_size = vb->s_size == ATOM_VALUE_S_SIZE_INIT ?
> -                                       strlen(vb->s) : vb->s_size;
>
>                         cmp = cmp_fn(va->s, vb->s, b_size > a_size ?
>                                      a_size : b_size);

You are right.

Thanks.
--
ZheNing Hu

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 8, 2021

This branch is now known as zh/ref-filter-raw-data.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 8, 2021

This patch series was integrated into seen via git@418aa44.

@gitgitgadget gitgitgadget bot added the seen label Jun 8, 2021
@gitgitgadget
Copy link

gitgitgadget bot commented Jun 8, 2021

On the Git mailing list, Junio C Hamano wrote (reply to this):

"ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes:

> ZheNing Hu (6):
>   [GSOC] ref-filter: add obj-type check in grab contents
>   [GSOC] ref-filter: add %(raw) atom
>   [GSOC] ref-filter: use non-const ref_format in *_atom_parser()
>   [GSOC] ref-filter: add %(rest) atom and --rest option
>   [GSOC] ref-filter: teach grab_sub_body_contents() return value and err
>   [GSOC] ref-filter: add %(raw:textconv) and %(raw:filters)

I haven't gotten around looking at anything after the %(rest) one,
but 

https://github.com/git/git/runs/2770688471?check_suite_focus=true

seems to tell us that there is "size_t *" vs "ulong *" type
confusion, possibly around the textconv thing.


@gitgitgadget
Copy link

gitgitgadget bot commented Jun 8, 2021

On the Git mailing list, ZheNing Hu wrote (reply to this):

Junio C Hamano <gitster@pobox.com> 于2021年6月8日周二 下午2:42写道:
>
> "ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > ZheNing Hu (6):
> >   [GSOC] ref-filter: add obj-type check in grab contents
> >   [GSOC] ref-filter: add %(raw) atom
> >   [GSOC] ref-filter: use non-const ref_format in *_atom_parser()
> >   [GSOC] ref-filter: add %(rest) atom and --rest option
> >   [GSOC] ref-filter: teach grab_sub_body_contents() return value and err
> >   [GSOC] ref-filter: add %(raw:textconv) and %(raw:filters)
>
> I haven't gotten around looking at anything after the %(rest) one,
> but
>
> https://github.com/git/git/runs/2770688471?check_suite_focus=true
>
> seems to tell us that there is "size_t *" vs "ulong *" type
> confusion, possibly around the textconv thing.
>
>

ok, I will change it when we use %(raw:textconv) next time.

Thanks.
--
ZheNing Hu

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 30, 2021

There was a status update in the "Graduated to 'master'" section about the branch zh/ref-filter-raw-data on the Git mailing list:

Originally merged to 'next' on 2021-08-04

Prepare the "ref-filter" machinery that drives the "--format"
option of "git for-each-ref" and its friends to be used in "git
cat-file --batch".

@dscho dscho removed the seen label Dec 14, 2021
@dscho dscho closed this May 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants