[GSOC][RFC] cat-file: reuse ref-filter logic #980

adlternative · 2021-06-12T10:31:16Z

This patch series make cat-file reuse ref-filter logic.

Change from last version:

Amend part of the description of git for-each-ref.txt.
Modify commit messages.
Rebase on zh/cat-file-batch-fix.

cc: Junio C Hamano gitster@pobox.com
cc: Christian Couder christian.couder@gmail.com
cc: Hariom Verma hariom18599@gmail.com
cc: Bagas Sanjaya bagasdotme@gmail.com
cc: Jeff King peff@peff.net
cc: Ævar Arnfjörð Bjarmason avarab@gmail.com

adlternative · 2021-06-12T11:13:07Z

/submit

gitgitgadget · 2021-06-12T11:14:46Z

Submitted as pull.980.git.1623496458.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git pr-980/adlternative/cat-file-batch-refactor-v1

To fetch this version to local tag pr-980/adlternative/cat-file-batch-refactor-v1:

git fetch --no-tags https://github.com/gitgitgadget/git tag pr-980/adlternative/cat-file-batch-refactor-v1

gitgitgadget · 2021-06-12T20:11:53Z

ref-filter.c

@@ -144,6 +144,7 @@ enum atom_type {
 	ATOM_BODY,


On the Git mailing list, Christian Couder wrote (reply to this):

On Sat, Jun 12, 2021 at 1:14 PM ZheNing Hu via GitGitGadget <gitgitgadget@gmail.com> wrote: > > From: ZheNing Hu <adlternative@gmail.com> > > Let `populate_value()`, `get_ref_atom_value()` and > `format_ref_array_item()` get the return value of `get_value()` > correctly. This can help us later let `cat-file --batch` get the > correct error message and return value of `get_value()`. Is it get_object() or get_value()?

On the Git mailing list, ZheNing Hu wrote (reply to this):

Christian Couder <christian.couder@gmail.com> 于2021年6月13日周日上午4:09写道： > > On Sat, Jun 12, 2021 at 1:14 PM ZheNing Hu via GitGitGadget > <gitgitgadget@gmail.com> wrote: > > > > From: ZheNing Hu <adlternative@gmail.com> > > > > Let `populate_value()`, `get_ref_atom_value()` and > > `format_ref_array_item()` get the return value of `get_value()` > > correctly. This can help us later let `cat-file --batch` get the > > correct error message and return value of `get_value()`. > > Is it get_object() or get_value()? Oh, it's get_object(). Thanks for pointing out:) -- ZheNing Hu

adlternative · 2021-06-15T13:27:43Z

/submit

gitgitgadget · 2021-06-15T13:29:39Z

Submitted as pull.980.v2.git.1623763746.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git pr-980/adlternative/cat-file-batch-refactor-v2

To fetch this version to local tag pr-980/adlternative/cat-file-batch-refactor-v2:

git fetch --no-tags https://github.com/gitgitgadget/git tag pr-980/adlternative/cat-file-batch-refactor-v2

gitgitgadget · 2021-06-16T07:33:08Z

On the Git mailing list, Junio C Hamano wrote (reply to this):

"ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes:

> This patch series make cat-file reuse ref-filter logic, which based on
> 5a5b5f78 ([GSOC] ref-filter: add %(rest) atom)

Hmph, does anybody have 5a5b5f78?

The way to deal with this and avoid resending the same patches
(assuming that this is not a 9-patch series, but only 5 of them are
new) is to rebase your topic on 723bc66d (ref-filter: add %(rest)
atom, 2021-06-09), which will allow you to discard the 4 earlier
patches, and force push, with base set to 723bc66d, I think, but I
am not a GGG user, so there may need an extra step or two on top of
that.

gitgitgadget · 2021-06-16T07:39:04Z

ref-filter.c

@@ -144,6 +144,7 @@ enum atom_type {
 	ATOM_BODY,


On the Git mailing list, Junio C Hamano wrote (reply to this):

"ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: ZheNing Hu <adlternative@gmail.com> > > Let `populate_value()`, `get_ref_atom_value()` and > `format_ref_array_item()` get the return value of `get_object()` > correctly. The "get" the value correctly, I think. What you are teaching them is to pass the return value from get_object() through the callchain to their callers. The readers will be helped if you say what kind of errors get_object() wants to tell its callers, not just "-1" is for error, which is what populate_value() assumes to be sufficient. In other words, which non-zero returns from get_object() are interesting and why? > @@ -1997,9 +1997,11 @@ static int populate_value(struct ref_array_item *ref, struct strbuf *err) > static int get_ref_atom_value(struct ref_array_item *ref, int atom, > struct atom_value **v, struct strbuf *err) > { > + int ret = 0; > + > if (!ref->value) { > - if (populate_value(ref, err)) > - return -1; > + if ((ret = populate_value(ref, err))) > + return ret; The new variable only needs to be in this scope, and does not have to be shown to the entire function. > @@ -2573,6 +2575,7 @@ int format_ref_array_item(struct ref_array_item *info, > { > const char *cp, *sp, *ep; > struct ref_formatting_state state = REF_FORMATTING_STATE_INIT; > + int ret = 0; This is dubious... > state.quote_style = format->quote_style; > push_stack_element(&state.stack); > @@ -2585,10 +2588,10 @@ int format_ref_array_item(struct ref_array_item *info, > if (cp < sp) > append_literal(cp, sp, &state); > pos = parse_ref_filter_atom(format, sp + 2, ep, error_buf); > - if (pos < 0 || get_ref_atom_value(info, pos, &atomv, error_buf) || > + if (pos < 0 || (ret = get_ref_atom_value(info, pos, &atomv, error_buf)) || Here, if "ret" gets assigned any non-zero value, the condition is satisfied, and ... > atomv->handler(atomv, &state, error_buf)) { > pop_stack_element(&state.stack); > - return -1; > + return ret ? ret : -1; ... the control flow will leave this function. Therefore, ... > } > } > if (*cp) { > @@ -2610,7 +2613,7 @@ int format_ref_array_item(struct ref_array_item *info, > } > strbuf_addbuf(final_buf, &state.stack->output); > pop_stack_element(&state.stack); > - return 0; > + return ret; ... at this point, "ret" can never be anything other than zero. Am I misreading the patch? If I am not misreading the patch, then "ret" does not have to be globally visible in this function---it can have the same scope as "pos". > } > > void pretty_print_ref(const char *name, const struct object_id *oid,

On the Git mailing list, ZheNing Hu wrote (reply to this):

Junio C Hamano <gitster@pobox.com> 于2021年6月16日周三下午3:36写道： > > "ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes: > > > From: ZheNing Hu <adlternative@gmail.com> > > > > Let `populate_value()`, `get_ref_atom_value()` and > > `format_ref_array_item()` get the return value of `get_object()` > > correctly. > > The "get" the value correctly, I think. What you are teaching them > is to pass the return value from get_object() through the callchain > to their callers. > Yes, this is exactly what I meant. > The readers will be helped if you say what kind of errors > get_object() wants to tell its callers, not just "-1" is for error, > which is what populate_value() assumes to be sufficient. In other > words, which non-zero returns from get_object() are interesting and > why? > As stated in 765337a, We can just print the error without exiting if the return value of format_ref_array_item() is greater than 0. Therefore, the current patch is to make get_object() return a value other than -1 when an error occurs. > > @@ -1997,9 +1997,11 @@ static int populate_value(struct ref_array_item *ref, struct strbuf *err) > > static int get_ref_atom_value(struct ref_array_item *ref, int atom, > > struct atom_value **v, struct strbuf *err) > > { > > + int ret = 0; > > + > > if (!ref->value) { > > - if (populate_value(ref, err)) > > - return -1; > > + if ((ret = populate_value(ref, err))) > > + return ret; > > The new variable only needs to be in this scope, and does not have > to be shown to the entire function. > Makes sense. > > @@ -2573,6 +2575,7 @@ int format_ref_array_item(struct ref_array_item *info, > > { > > const char *cp, *sp, *ep; > > struct ref_formatting_state state = REF_FORMATTING_STATE_INIT; > > + int ret = 0; > > This is dubious... > > > state.quote_style = format->quote_style; > > push_stack_element(&state.stack); > > @@ -2585,10 +2588,10 @@ int format_ref_array_item(struct ref_array_item *info, > > if (cp < sp) > > append_literal(cp, sp, &state); > > pos = parse_ref_filter_atom(format, sp + 2, ep, error_buf); > > - if (pos < 0 || get_ref_atom_value(info, pos, &atomv, error_buf) || > > + if (pos < 0 || (ret = get_ref_atom_value(info, pos, &atomv, error_buf)) || > > Here, if "ret" gets assigned any non-zero value, the condition is > satisfied, and ... > > > atomv->handler(atomv, &state, error_buf)) { > > pop_stack_element(&state.stack); > > - return -1; > > + return ret ? ret : -1; > > ... the control flow will leave this function. Therefore, ... > > > } > > } > > if (*cp) { > > @@ -2610,7 +2613,7 @@ int format_ref_array_item(struct ref_array_item *info, > > } > > strbuf_addbuf(final_buf, &state.stack->output); > > pop_stack_element(&state.stack); > > - return 0; > > + return ret; > > ... at this point, "ret" can never be anything other than zero. Am > I misreading the patch? > > If I am not misreading the patch, then "ret" does not have to be > globally visible in this function---it can have the same scope as > "pos". > You are right, It is correct to only return 0 here at the moment. Thanks. -- ZheNing Hu

gitgitgadget · 2021-06-16T07:51:16Z

ref-filter.c

@@ -144,6 +144,7 @@ enum atom_type {
 	ATOM_BODY,


On the Git mailing list, Junio C Hamano wrote (reply to this):

"ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: ZheNing Hu <adlternative@gmail.com> > > Introduce free_array_item_internal() for freeing ref_array_item value. > It will be called internally by free_array_item(), and it will help > `cat-file --batch` free ref_array_item's memory later. As a file local static function, the horrible name free_array_item() was tolerable. But before exposing a name like that to the outside world, think twice if that is specific enough, and it is not. There are 47 different kinds of "array"s we use in the system, but this new helper function only works with ref_array_item and not on items in any other kinds of arrays. > -/* Free memory allocated for a ref_array_item */ > -static void free_array_item(struct ref_array_item *item) > +void free_array_item_internal(struct ref_array_item *item) > { And "internal" is a horrible name to have as an external name. You probably can come up with a more appropriate name when you imagine yourself explaining to somebody who is relatively new to this part of the codebase what the difference between free_array_item() and this new helper is, where the difference comes from, why the symref member (and no other member) is so special, etc. I _think_ what is special is not the .symref but is the .value field, IOW, you are trying to come up with an interface to free the value part of ref_array_item without touching other things. But it is not helpful at all to readers if you do not explain why you want to do so. Why is the .value member so special? The ability to clear only the .value member without touching other members is useful because ...? In any case, assuming that you'd establish why the .value member is so special to deserve an externally callable function, when external callers do not have to be able to free the item as a whole (i.e. free_array_item() is still file-scope static), in the proposed log message in an updated patch, I would imagine that free_ref_array_item_value() would be a more suitable name than the _internal thing. When it happens, you might want to rename the static one to free_ref_array_item() to match, even if it does not have external callers.

On the Git mailing list, ZheNing Hu wrote (reply to this):

Junio C Hamano <gitster@pobox.com> 于2021年6月16日周三下午3:49写道： > > "ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes: > > > From: ZheNing Hu <adlternative@gmail.com> > > > > Introduce free_array_item_internal() for freeing ref_array_item value. > > It will be called internally by free_array_item(), and it will help > > `cat-file --batch` free ref_array_item's memory later. > > As a file local static function, the horrible name free_array_item() > was tolerable. But before exposing a name like that to the outside > world, think twice if that is specific enough, and it is not. There > are 47 different kinds of "array"s we use in the system, but this > new helper function only works with ref_array_item and not on items > in any other kinds of arrays. > Well, this is indeed ignored by me. So free_array_item() --> free_ref_array_item() and free_array_item() --> free_ref_array_item(). > > -/* Free memory allocated for a ref_array_item */ > > -static void free_array_item(struct ref_array_item *item) > > +void free_array_item_internal(struct ref_array_item *item) > > { > > And "internal" is a horrible name to have as an external name. You > probably can come up with a more appropriate name when you imagine > yourself explaining to somebody who is relatively new to this part > of the codebase what the difference between free_array_item() and > this new helper is, where the difference comes from, why the symref > member (and no other member) is so special, etc. > Yeah, "internal" is an incorrect naming. The main purpose here is to only free a ref_array_item's value. > I _think_ what is special is not the .symref but is the .value > field, IOW, you are trying to come up with an interface to free the > value part of ref_array_item without touching other things. But it > is not helpful at all to readers if you do not explain why you want > to do so. Why is the .value member so special? The ability to > clear only the .value member without touching other members is useful > because ...? > Because batch_object_write() use a ref_array_item which is allocated on the stack, and it's member symref is not used at all. ref_array_item's symref and itself do not need free. The original free_array_item() will free all dynamically allocated content in ref_array_item. It cannot meet our requirements at this time: only free the ref_array_item's value. > In any case, assuming that you'd establish why the .value member is > so special to deserve an externally callable function, when external > callers do not have to be able to free the item as a whole (i.e. > free_array_item() is still file-scope static), in the proposed log > message in an updated patch, I would imagine that > > free_ref_array_item_value() > > would be a more suitable name than the _internal thing. When it > happens, you might want to rename the static one to > free_ref_array_item() to match, even if it does not have external > callers. Make sence. -- ZheNing Hu

gitgitgadget · 2021-06-17T06:11:27Z

On the Git mailing list, ZheNing Hu wrote (reply to this):

Junio C Hamano <gitster@pobox.com> 于2021年6月16日周三 下午3:29写道：
>
> "ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > This patch series make cat-file reuse ref-filter logic, which based on
> > 5a5b5f78 ([GSOC] ref-filter: add %(rest) atom)
>
> Hmph, does anybody have 5a5b5f78?
>
> The way to deal with this and avoid resending the same patches
> (assuming that this is not a 9-patch series, but only 5 of them are
> new) is to rebase your topic on 723bc66d (ref-filter: add %(rest)
> atom, 2021-06-09), which will allow you to discard the 4 earlier
> patches, and force push, with base set to 723bc66d, I think, but I
> am not a GGG user, so there may need an extra step or two on top of
> that.

Oh, 5a5b5f78 is in gitgitgadget.git and 723bc66d in git.git, their commit
messages are different. Later I will rebase my new patch to zh/xxx.

Thanks.
--
ZheNing Hu

gitgitgadget · 2021-06-17T07:08:27Z

ref-filter.c

@@ -1356,11 +1356,12 @@ static void append_lines(struct strbuf *out, const char *buf, unsigned long size
 }


On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):

On Sat, Jun 12 2021, ZheNing Hu via GitGitGadget wrote: > From: ZheNing Hu <adlternative@gmail.com> > > Only tag and commit objects use `grab_sub_body_contents()` to grab > object contents in the current codebase. We want to teach the > function to also handle blobs and trees to get their raw data, > without parsing a blob (whose contents looks like a commit or a tag) > incorrectly as a commit or a tag. > > Skip the block of code that is specific to handling commits and tags > early when the given object is of a wrong type to help later > addition to handle other types of objects in this function. > > Mentored-by: Christian Couder <christian.couder@gmail.com> > Mentored-by: Hariom Verma <hariom18599@gmail.com> > Helped-by: Junio C Hamano <gitster@pobox.com> > Signed-off-by: ZheNing Hu <adlternative@gmail.com> > --- > ref-filter.c | 24 +++++++++++++++--------- > 1 file changed, 15 insertions(+), 9 deletions(-) > > diff --git a/ref-filter.c b/ref-filter.c > index 4db0e40ff4c6..5cee6512fbaf 100644 > --- a/ref-filter.c > +++ b/ref-filter.c > @@ -1356,11 +1356,12 @@ static void append_lines(struct strbuf *out, const char *buf, unsigned long size > } > > /* See grab_values */ > -static void grab_sub_body_contents(struct atom_value *val, int deref, void *buf) > +static void grab_sub_body_contents(struct atom_value *val, int deref, struct expand_data *data) > { > int i; > const char *subpos = NULL, *bodypos = NULL, *sigpos = NULL; > size_t sublen = 0, bodylen = 0, nonsiglen = 0, siglen = 0; > + void *buf = data->content; > > for (i = 0; i < used_atom_cnt; i++) { > struct used_atom *atom = &used_atom[i]; > @@ -1371,10 +1372,13 @@ static void grab_sub_body_contents(struct atom_value *val, int deref, void *buf) > continue; > if (deref) > name++; > - if (strcmp(name, "body") && > - !starts_with(name, "subject") && > - !starts_with(name, "trailers") && > - !starts_with(name, "contents")) > + > + if ((data->type != OBJ_TAG && > + data->type != OBJ_COMMIT) || > + (strcmp(name, "body") && > + !starts_with(name, "subject") && > + !starts_with(name, "trailers") && > + !starts_with(name, "contents"))) We have 4 "real" object types, commit, tree, blob, tag. Do you really mean "not tag or commit" here, don't you mean "is tree or blob" instead? I.e. do we really want to pass OBJ_NONE etc. here? > continue; > if (!subpos) > find_subpos(buf, > @@ -1438,17 +1442,19 @@ static void fill_missing_values(struct atom_value *val) > * pointed at by the ref itself; otherwise it is the object the > * ref (which is a tag) refers to. > */ > -static void grab_values(struct atom_value *val, int deref, struct object *obj, void *buf) > +static void grab_values(struct atom_value *val, int deref, struct object *obj, struct expand_data *data) > { > + void *buf = data->content; > + > switch (obj->type) { > case OBJ_TAG: > grab_tag_values(val, deref, obj); > - grab_sub_body_contents(val, deref, buf); > + grab_sub_body_contents(val, deref, data); > grab_person("tagger", val, deref, buf); > break; > case OBJ_COMMIT: > grab_commit_values(val, deref, obj); > - grab_sub_body_contents(val, deref, buf); > + grab_sub_body_contents(val, deref, data); > grab_person("author", val, deref, buf); > grab_person("committer", val, deref, buf); > break; > @@ -1678,7 +1684,7 @@ static int get_object(struct ref_array_item *ref, int deref, struct object **obj > return strbuf_addf_ret(err, -1, _("parse_object_buffer failed on %s for %s"), > oid_to_hex(&oi->oid), ref->refname); > } > - grab_values(ref->value, deref, *obj, oi->content); > + grab_values(ref->value, deref, *obj, oi); > } > > grab_common_values(ref->value, deref, oi);

On the Git mailing list, Junio C Hamano wrote (reply to this):

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: >> - if (strcmp(name, "body") && >> - !starts_with(name, "subject") && >> - !starts_with(name, "trailers") && >> - !starts_with(name, "contents")) >> + >> + if ((data->type != OBJ_TAG && >> + data->type != OBJ_COMMIT) || >> + (strcmp(name, "body") && >> + !starts_with(name, "subject") && >> + !starts_with(name, "trailers") && >> + !starts_with(name, "contents"))) > > We have 4 "real" object types, commit, tree, blob, tag. Do you really > mean "not tag or commit" here, don't you mean "is tree or blob" instead? > I.e. do we really want to pass OBJ_NONE etc. here? > >> continue; If somebody throws OBJ_NONE at us by mistake, we do not want to handle such an object and try to extract the subject member from it anyway, no? The intent of the code here, before the patch, is that "what we do after the control flow passes this point is about the body, subject, trailers, and contents request, so everybody else should go to the next iteration". The caller used to give us an object compatible with these four types of requests, now the caller may throw others, hence "by the way, we know these four kinds of requests make sense only for tags and commits, so everybody else should go to the next iteration, too" would be a natural thing to add. So in that sense, I prefer it over "we know these four types of requests do not make sense for blobs and trees".

gitgitgadget · 2021-06-17T07:08:30Z

User Ævar Arnfjörð Bjarmason <avarab@gmail.com> has been added to the cc: list.

gitgitgadget · 2021-06-17T07:20:24Z

Documentation/git-for-each-ref.txt

@@ -235,6 +235,15 @@ and `date` to extract the named component.  For email fields (`authoremail`,
 without angle brackets, and `:localpart` to get the part before the `@` symbol


On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):

On Sat, Jun 12 2021, ZheNing Hu via GitGitGadget wrote: > From: ZheNing Hu <adlternative@gmail.com> > > Add new formatting option `%(raw)`, which will print the raw > object data without any changes. It will help further to migrate > all cat-file formatting logic from cat-file to ref-filter. Nice goal and feature to have. > The raw data of blob, tree objects may contain '\0', but most of > the logic in `ref-filter` depends on the output of the atom being > text (specifically, no embedded NULs in it). > > E.g. `quote_formatting()` use `strbuf_addstr()` or `*._quote_buf()` > add the data to the buffer. The raw data of a tree object is > `100644 one\0...`, only the `100644 one` will be added to the buffer, > which is incorrect. > > Therefore, add a new member in `struct atom_value`: `s_size`, which > can record raw object size, it can help us add raw object data to > the buffer or compare two buffers which contain raw object data. Most of the functions that deal with this already use a strbuf in some way, before we had a const char *, now there's a size_t to go along with it, why not simply use a strbuf in the struct for the data? You'll then get the size and \0 handling for free, and any functions to deal with conversion can stick to the strbuf API, there seems to be a lot of back and forth now. > Beyond, `--format=%(raw)` cannot be used with `--python`, `--shell`, > `--tcl`, `--perl` because if our binary raw data is passed to a variable > in the host language, the host language may not support arbitrary binary > data in the variables of its string type. Perl at least deals with that just fine, and to the extent that it doesn't any new problems here would have nothing to do with \0 being in the data. Perl doesn't have a notion of "binary has \0 in it", it always supports \0, it has a notion of "is it utf-8 or not?", so any encoding problems wouldn't be new. I'd think that the same would be true of Python, but I'm not sure. > +test_expect_success 'basic atom: refs/tags/testtag *raw' ' > + git cat-file commit refs/tags/testtag^{} >expected && > + git for-each-ref --format="%(*raw)" refs/tags/testtag >actual && > + sanitize_pgp <expected >expected.clean && > + sanitize_pgp <actual >actual.clean && > + echo "" >>expected.clean && Just "echo" will do, ditto for the rest. Also odd to go back and forth between populating expected.clean & actual.clean. > +test_expect_success 'set up refs pointing to binary blob' ' > + printf "a\0b\0c" >blob1 && > + printf "a\0c\0b" >blob2 && > + printf "\0a\0b\0c" >blob3 && > + printf "abc" >blob4 && > + printf "\0 \0 \0 " >blob5 && > + printf "\0 \0a\0 " >blob6 && > + printf " " >blob7 && > + >blob8 && > + git hash-object blob1 -w | xargs git update-ref refs/myblobs/blob1 && > + git hash-object blob2 -w | xargs git update-ref refs/myblobs/blob2 && > + git hash-object blob3 -w | xargs git update-ref refs/myblobs/blob3 && > + git hash-object blob4 -w | xargs git update-ref refs/myblobs/blob4 && > + git hash-object blob5 -w | xargs git update-ref refs/myblobs/blob5 && > + git hash-object blob6 -w | xargs git update-ref refs/myblobs/blob6 && > + git hash-object blob7 -w | xargs git update-ref refs/myblobs/blob7 && > + git hash-object blob8 -w | xargs git update-ref refs/myblobs/blob8 Hrm, xargs just to avoid: git update-ref ... $(git hash-object) ? > +test_expect_success '%(raw) with --python must failed' ' > + test_must_fail git for-each-ref --format="%(raw)" --python > +' > + > +test_expect_success '%(raw) with --tcl must failed' ' > + test_must_fail git for-each-ref --format="%(raw)" --tcl > +' > + > +test_expect_success '%(raw) with --perl must failed' ' > + test_must_fail git for-each-ref --format="%(raw)" --perl > +' > + > +test_expect_success '%(raw) with --shell must failed' ' > + test_must_fail git for-each-ref --format="%(raw)" --shell > +' > + > +test_expect_success '%(raw) with --shell and --sort=raw must failed' ' > + test_must_fail git for-each-ref --format="%(raw)" --sort=raw --shell > +' s/must failed/must fail/, but see question above about encoding in these languages...

On the Git mailing list, Junio C Hamano wrote (reply to this):

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: >> Beyond, `--format=%(raw)` cannot be used with `--python`, `--shell`, >> `--tcl`, `--perl` because if our binary raw data is passed to a variable >> in the host language, the host language may not support arbitrary binary >> data in the variables of its string type. > > Perl at least deals with that just fine, and to the extent that it > doesn't any new problems here would have nothing to do with \0 being in > the data. Perl doesn't have a notion of "binary has \0 in it", it always > supports \0, it has a notion of "is it utf-8 or not?", so any encoding > problems wouldn't be new. I'd think that the same would be true of > Python, but I'm not sure. During an earlier iteration long time ago, as we knew Perl is capable of handling sequence of bytes including NUL, it was decided to start more strict by rejecting binary for all languages, which can later be loosened, to limit the scope of the initial implementation. >> + git hash-object blob1 -w | xargs git update-ref refs/myblobs/blob1 && >> + git hash-object blob2 -w | xargs git update-ref refs/myblobs/blob2 && >> + git hash-object blob3 -w | xargs git update-ref refs/myblobs/blob3 && >> + git hash-object blob4 -w | xargs git update-ref refs/myblobs/blob4 && >> + git hash-object blob5 -w | xargs git update-ref refs/myblobs/blob5 && >> + git hash-object blob6 -w | xargs git update-ref refs/myblobs/blob6 && >> + git hash-object blob7 -w | xargs git update-ref refs/myblobs/blob7 && >> + git hash-object blob8 -w | xargs git update-ref refs/myblobs/blob8 > > Hrm, xargs just to avoid: > > git update-ref ... $(git hash-object) ? That's horrible. Thanks for noticing. We'd want to catch segfaults from both hash-object and update-ref. One way to do so may be O=$(git hash-object -w blob1) && git update-ref refs/myblobs/blob1 "$O" Thanks for a review.

On the Git mailing list, ZheNing Hu wrote (reply to this):

Ævar Arnfjörð Bjarmason <avarab@gmail.com> 于2021年6月17日周四下午3:16写道： > > > The raw data of blob, tree objects may contain '\0', but most of > > the logic in `ref-filter` depends on the output of the atom being > > text (specifically, no embedded NULs in it). > > > > E.g. `quote_formatting()` use `strbuf_addstr()` or `*._quote_buf()` > > add the data to the buffer. The raw data of a tree object is > > `100644 one\0...`, only the `100644 one` will be added to the buffer, > > which is incorrect. > > > > Therefore, add a new member in `struct atom_value`: `s_size`, which > > can record raw object size, it can help us add raw object data to > > the buffer or compare two buffers which contain raw object data. > > Most of the functions that deal with this already use a strbuf in some > way, before we had a const char *, now there's a size_t to go along with > it, why not simply use a strbuf in the struct for the data? You'll then > get the size and \0 handling for free, and any functions to deal with > conversion can stick to the strbuf API, there seems to be a lot of back > and forth now. > Yes, strbuf is a suitable choice when using <str,len> pair. But if replace v->s with strbuf, the possible changes will be larger. > > Beyond, `--format=%(raw)` cannot be used with `--python`, `--shell`, > > `--tcl`, `--perl` because if our binary raw data is passed to a variable > > in the host language, the host language may not support arbitrary binary > > data in the variables of its string type. > > Perl at least deals with that just fine, and to the extent that it > doesn't any new problems here would have nothing to do with \0 being in > the data. Perl doesn't have a notion of "binary has \0 in it", it always > supports \0, it has a notion of "is it utf-8 or not?", so any encoding > problems wouldn't be new. I'd think that the same would be true of > Python, but I'm not sure. > Not python safe. See [1]. Regarding the perl language, I support Junio's point of view: it can be re-supported in the future. > > > +test_expect_success 'basic atom: refs/tags/testtag *raw' ' > > + git cat-file commit refs/tags/testtag^{} >expected && > > + git for-each-ref --format="%(*raw)" refs/tags/testtag >actual && > > + sanitize_pgp <expected >expected.clean && > > + sanitize_pgp <actual >actual.clean && > > + echo "" >>expected.clean && > > Just "echo" will do, ditto for the rest. Also odd to go back and forth > between populating expected.clean & actual.clean. > Are you saying that sanitize_pgp is not needed? > > > +test_expect_success 'set up refs pointing to binary blob' ' > > + printf "a\0b\0c" >blob1 && > > + printf "a\0c\0b" >blob2 && > > + printf "\0a\0b\0c" >blob3 && > > + printf "abc" >blob4 && > > + printf "\0 \0 \0 " >blob5 && > > + printf "\0 \0a\0 " >blob6 && > > + printf " " >blob7 && > > + >blob8 && > > + git hash-object blob1 -w | xargs git update-ref refs/myblobs/blob1 && > > + git hash-object blob2 -w | xargs git update-ref refs/myblobs/blob2 && > > + git hash-object blob3 -w | xargs git update-ref refs/myblobs/blob3 && > > + git hash-object blob4 -w | xargs git update-ref refs/myblobs/blob4 && > > + git hash-object blob5 -w | xargs git update-ref refs/myblobs/blob5 && > > + git hash-object blob6 -w | xargs git update-ref refs/myblobs/blob6 && > > + git hash-object blob7 -w | xargs git update-ref refs/myblobs/blob7 && > > + git hash-object blob8 -w | xargs git update-ref refs/myblobs/blob8 > > Hrm, xargs just to avoid: > > git update-ref ... $(git hash-object) ? > I didn’t think about it, just for convenience. > > +test_expect_success '%(raw) with --python must failed' ' > > + test_must_fail git for-each-ref --format="%(raw)" --python > > +' > > + > > +test_expect_success '%(raw) with --tcl must failed' ' > > + test_must_fail git for-each-ref --format="%(raw)" --tcl > > +' > > + > > +test_expect_success '%(raw) with --perl must failed' ' > > + test_must_fail git for-each-ref --format="%(raw)" --perl > > +' > > + > > +test_expect_success '%(raw) with --shell must failed' ' > > + test_must_fail git for-each-ref --format="%(raw)" --shell > > +' > > + > > +test_expect_success '%(raw) with --shell and --sort=raw must failed' ' > > + test_must_fail git for-each-ref --format="%(raw)" --sort=raw --shell > > +' > > s/must failed/must fail/, but see question above about encoding in these > languages... [1]: https://lore.kernel.org/git/CAOLTT8QR_GRm4TYk0E_eazQ+unVQODc-3L+b4V5JUN5jtZR8uA@mail.gmail.com/ Thanks for a review. -- ZheNing Hu

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):

On Thu, Jun 17 2021, ZheNing Hu wrote: > Ævar Arnfjörð Bjarmason <avarab@gmail.com> 于2021年6月17日周四下午3:16写道： >> >> > The raw data of blob, tree objects may contain '\0', but most of >> > the logic in `ref-filter` depends on the output of the atom being >> > text (specifically, no embedded NULs in it). >> > >> > E.g. `quote_formatting()` use `strbuf_addstr()` or `*._quote_buf()` >> > add the data to the buffer. The raw data of a tree object is >> > `100644 one\0...`, only the `100644 one` will be added to the buffer, >> > which is incorrect. >> > >> > Therefore, add a new member in `struct atom_value`: `s_size`, which >> > can record raw object size, it can help us add raw object data to >> > the buffer or compare two buffers which contain raw object data. >> >> Most of the functions that deal with this already use a strbuf in some >> way, before we had a const char *, now there's a size_t to go along with >> it, why not simply use a strbuf in the struct for the data? You'll then >> get the size and \0 handling for free, and any functions to deal with >> conversion can stick to the strbuf API, there seems to be a lot of back >> and forth now. >> > > Yes, strbuf is a suitable choice when using <str,len> pair. > But if replace v->s with strbuf, the possible changes will be larger. I for one would like to see it done that way, those changes are usually easy to read. Also it seems a large part of 2/8 is extra new code because we didn't do that, e.g. getting length differently if something is a strbuf or not, passing char*/size_t pairs to new functions etc. >> > Beyond, `--format=%(raw)` cannot be used with `--python`, `--shell`, >> > `--tcl`, `--perl` because if our binary raw data is passed to a variable >> > in the host language, the host language may not support arbitrary binary >> > data in the variables of its string type. >> >> Perl at least deals with that just fine, and to the extent that it >> doesn't any new problems here would have nothing to do with \0 being in >> the data. Perl doesn't have a notion of "binary has \0 in it", it always >> supports \0, it has a notion of "is it utf-8 or not?", so any encoding >> problems wouldn't be new. I'd think that the same would be true of >> Python, but I'm not sure. >> > > Not python safe. See [1]. > Regarding the perl language, I support Junio's point of view: it can be > re-supported in the future. Ah, I'd missed that. Anyway, if it's easy it seems you discovered that Perl deals with it correctly, so we could just have it support this. >> >> > +test_expect_success 'basic atom: refs/tags/testtag *raw' ' >> > + git cat-file commit refs/tags/testtag^{} >expected && >> > + git for-each-ref --format="%(*raw)" refs/tags/testtag >actual && >> > + sanitize_pgp <expected >expected.clean && >> > + sanitize_pgp <actual >actual.clean && >> > + echo "" >>expected.clean && >> >> Just "echo" will do, ditto for the rest. Also odd to go back and forth >> between populating expected.clean & actual.clean. >> > > Are you saying that sanitize_pgp is not needed? No that instead of: echo "" >x You can do: echo >x And also that going back and forth between populating different files is confusing, i.e. this: echo a >x echo c >y echo b >>x is better as: echo a >x echo b >>x echo c >y >> >> > +test_expect_success 'set up refs pointing to binary blob' ' >> > + printf "a\0b\0c" >blob1 && >> > + printf "a\0c\0b" >blob2 && >> > + printf "\0a\0b\0c" >blob3 && >> > + printf "abc" >blob4 && >> > + printf "\0 \0 \0 " >blob5 && >> > + printf "\0 \0a\0 " >blob6 && >> > + printf " " >blob7 && >> > + >blob8 && >> > + git hash-object blob1 -w | xargs git update-ref refs/myblobs/blob1 && >> > + git hash-object blob2 -w | xargs git update-ref refs/myblobs/blob2 && >> > + git hash-object blob3 -w | xargs git update-ref refs/myblobs/blob3 && >> > + git hash-object blob4 -w | xargs git update-ref refs/myblobs/blob4 && >> > + git hash-object blob5 -w | xargs git update-ref refs/myblobs/blob5 && >> > + git hash-object blob6 -w | xargs git update-ref refs/myblobs/blob6 && >> > + git hash-object blob7 -w | xargs git update-ref refs/myblobs/blob7 && >> > + git hash-object blob8 -w | xargs git update-ref refs/myblobs/blob8 >> >> Hrm, xargs just to avoid: >> >> git update-ref ... $(git hash-object) ? >> > > I didn’t think about it, just for convenience. *nod*, Junio had a good suggestion. >> > +test_expect_success '%(raw) with --python must failed' ' >> > + test_must_fail git for-each-ref --format="%(raw)" --python >> > +' >> > + >> > +test_expect_success '%(raw) with --tcl must failed' ' >> > + test_must_fail git for-each-ref --format="%(raw)" --tcl >> > +' >> > + >> > +test_expect_success '%(raw) with --perl must failed' ' >> > + test_must_fail git for-each-ref --format="%(raw)" --perl >> > +' >> > + >> > +test_expect_success '%(raw) with --shell must failed' ' >> > + test_must_fail git for-each-ref --format="%(raw)" --shell >> > +' >> > + >> > +test_expect_success '%(raw) with --shell and --sort=raw must failed' ' >> > + test_must_fail git for-each-ref --format="%(raw)" --sort=raw --shell >> > +' >> >> s/must failed/must fail/, but see question above about encoding in these >> languages... > > > [1]: https://lore.kernel.org/git/CAOLTT8QR_GRm4TYk0E_eazQ+unVQODc-3L+b4V5JUN5jtZR8uA@mail.gmail.com/ > > Thanks for a review.

On the Git mailing list, ZheNing Hu wrote (reply to this):

Ævar Arnfjörð Bjarmason <avarab@gmail.com> 于2021年6月17日周四下午10:45写道： > > > On Thu, Jun 17 2021, ZheNing Hu wrote: > > > > > Yes, strbuf is a suitable choice when using <str,len> pair. > > But if replace v->s with strbuf, the possible changes will be larger. > > I for one would like to see it done that way, those changes are usually > easy to read. Also it seems a large part of 2/8 is extra new code > because we didn't do that, e.g. getting length differently if something > is a strbuf or not, passing char*/size_t pairs to new functions etc. > After some refactoring, I found that there are two problems: 1. There are a lot of codes like this in ref-filter to fill v->s: v->s = show_ref(...) v->s = copy_email(...) It is very difficult to modify here: We know that show_ref() or copy_email() will allocate a block of memory to v->s, but if v->s is a strbuf, what should we do? In copy_email(), we can pass the v->s to copy_email() and use strbuf_add()/strbuf_addstr() instead of xstrdup() and xmemdupz(). But show_ref() will call external functions like shorten_unambiguous_ref(), we don’t know whether it will return us NULL or a dynamically allocated memory. If continue to pass v->s to the inner function, it is not a feasible method. Or we can use strbuf_attach() + strlen(), I'm not sure this is a good method. 2. See: - for (i = 0; i < used_atom_cnt; i++) { + for (i = 0; i < used_atom_cnt; i++) { struct atom_value *v = &ref->value[i]; - if (v->s == NULL && used_atom[i].source == SOURCE_NONE) + if (v->s.len == 0 && used_atom[i].source == SOURCE_NONE) return strbuf_addf_ret(err, -1, _("missing object %s for %s"), oid_to_hex(&ref->objectname), ref->refname); } In the case of using strbuf, I don’t know how to distinguish between an empty strbuf and NULL. It can be easily distinguished by using c-style "const char*". > > > > Not python safe. See [1]. > > Regarding the perl language, I support Junio's point of view: it can be > > re-supported in the future. > > Ah, I'd missed that. Anyway, if it's easy it seems you discovered that > Perl deals with it correctly, so we could just have it support this. > Well, it's ok, support for perl will be put in a separate commit. > >> > >> > +test_expect_success 'basic atom: refs/tags/testtag *raw' ' > >> > + git cat-file commit refs/tags/testtag^{} >expected && > >> > + git for-each-ref --format="%(*raw)" refs/tags/testtag >actual && > >> > + sanitize_pgp <expected >expected.clean && > >> > + sanitize_pgp <actual >actual.clean && > >> > + echo "" >>expected.clean && > >> > >> Just "echo" will do, ditto for the rest. Also odd to go back and forth > >> between populating expected.clean & actual.clean. > >> > > > > Are you saying that sanitize_pgp is not needed? > > No that instead of: > > echo "" >x > > You can do: > > echo >x > > And also that going back and forth between populating different files is > confusing, i.e. this: > > > echo a >x > echo c >y > echo b >>x > > is better as: > > echo a >x > echo b >>x > echo c >y > > Thanks, I get what you meant now. > >> > >> > +test_expect_success 'set up refs pointing to binary blob' ' > >> > + printf "a\0b\0c" >blob1 && > >> > + printf "a\0c\0b" >blob2 && > >> > + printf "\0a\0b\0c" >blob3 && > >> > + printf "abc" >blob4 && > >> > + printf "\0 \0 \0 " >blob5 && > >> > + printf "\0 \0a\0 " >blob6 && > >> > + printf " " >blob7 && > >> > + >blob8 && > >> > + git hash-object blob1 -w | xargs git update-ref refs/myblobs/blob1 && > >> > + git hash-object blob2 -w | xargs git update-ref refs/myblobs/blob2 && > >> > + git hash-object blob3 -w | xargs git update-ref refs/myblobs/blob3 && > >> > + git hash-object blob4 -w | xargs git update-ref refs/myblobs/blob4 && > >> > + git hash-object blob5 -w | xargs git update-ref refs/myblobs/blob5 && > >> > + git hash-object blob6 -w | xargs git update-ref refs/myblobs/blob6 && > >> > + git hash-object blob7 -w | xargs git update-ref refs/myblobs/blob7 && > >> > + git hash-object blob8 -w | xargs git update-ref refs/myblobs/blob8 > >> > >> Hrm, xargs just to avoid: > >> > >> git update-ref ... $(git hash-object) ? > >> > > > > I didn’t think about it, just for convenience. > > *nod*, Junio had a good suggestion. > ok. Thanks. -- ZheNing Hu

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):

On Fri, Jun 18 2021, ZheNing Hu wrote: > Ævar Arnfjörð Bjarmason <avarab@gmail.com> 于2021年6月17日周四下午10:45写道： >> >> >> On Thu, Jun 17 2021, ZheNing Hu wrote: >> >> > >> > Yes, strbuf is a suitable choice when using <str,len> pair. >> > But if replace v->s with strbuf, the possible changes will be larger. >> >> I for one would like to see it done that way, those changes are usually >> easy to read. Also it seems a large part of 2/8 is extra new code >> because we didn't do that, e.g. getting length differently if something >> is a strbuf or not, passing char*/size_t pairs to new functions etc. >> > > After some refactoring, I found that there are two problems: > 1. There are a lot of codes like this in ref-filter to fill v->s: > > v->s = show_ref(...) > v->s = copy_email(...) > > It is very difficult to modify here: We know that show_ref() > or copy_email() will allocate a block of memory to v->s, but > if v->s is a strbuf, what should we do? In copy_email(), we > can pass the v->s to copy_email() and use strbuf_add()/strbuf_addstr() > instead of xstrdup() and xmemdupz(). But show_ref() will call > external functions like shorten_unambiguous_ref(), we don’t know > whether it will return us NULL or a dynamically allocated memory. > If continue to pass v->s to the inner function, it is not a feasible > method. Or we can use strbuf_attach() + strlen(), I'm not sure > this is a good method. > > 2. See: > > - for (i = 0; i < used_atom_cnt; i++) { > + for (i = 0; i < used_atom_cnt; i++) { > struct atom_value *v = &ref->value[i]; > - if (v->s == NULL && used_atom[i].source == SOURCE_NONE) > + if (v->s.len == 0 && used_atom[i].source == SOURCE_NONE) > return strbuf_addf_ret(err, -1, _("missing > object %s for %s"), > > oid_to_hex(&ref->objectname), ref->refname); > } > > In the case of using strbuf, I don’t know how to distinguish between an empty > strbuf and NULL. It can be easily distinguished by using c-style "const char*". Yes, sometimes it's just too much of a hassle, looking at shorten_unambiguous_ref() which returns a xstrdup()'d value that could indeed be strbuf_attach'd. I haven't tried the conversion myself, perhaps it's too much hassle. Just a suggestion from reading your patch in isolation. >> > >> > Not python safe. See [1]. >> > Regarding the perl language, I support Junio's point of view: it can be >> > re-supported in the future. >> >> Ah, I'd missed that. Anyway, if it's easy it seems you discovered that >> Perl deals with it correctly, so we could just have it support this. >> > > Well, it's ok, support for perl will be put in a separate commit. > >> >> >> >> > +test_expect_success 'basic atom: refs/tags/testtag *raw' ' >> >> > + git cat-file commit refs/tags/testtag^{} >expected && >> >> > + git for-each-ref --format="%(*raw)" refs/tags/testtag >actual && >> >> > + sanitize_pgp <expected >expected.clean && >> >> > + sanitize_pgp <actual >actual.clean && >> >> > + echo "" >>expected.clean && >> >> >> >> Just "echo" will do, ditto for the rest. Also odd to go back and forth >> >> between populating expected.clean & actual.clean. >> >> >> > >> > Are you saying that sanitize_pgp is not needed? >> >> No that instead of: >> >> echo "" >x >> >> You can do: >> >> echo >x >> >> And also that going back and forth between populating different files is >> confusing, i.e. this: >> >> >> echo a >x >> echo c >y >> echo b >>x >> >> is better as: >> >> echo a >x >> echo b >>x >> echo c >y >> >> > > Thanks, I get what you meant now. > >> >> >> >> > +test_expect_success 'set up refs pointing to binary blob' ' >> >> > + printf "a\0b\0c" >blob1 && >> >> > + printf "a\0c\0b" >blob2 && >> >> > + printf "\0a\0b\0c" >blob3 && >> >> > + printf "abc" >blob4 && >> >> > + printf "\0 \0 \0 " >blob5 && >> >> > + printf "\0 \0a\0 " >blob6 && >> >> > + printf " " >blob7 && >> >> > + >blob8 && >> >> > + git hash-object blob1 -w | xargs git update-ref refs/myblobs/blob1 && >> >> > + git hash-object blob2 -w | xargs git update-ref refs/myblobs/blob2 && >> >> > + git hash-object blob3 -w | xargs git update-ref refs/myblobs/blob3 && >> >> > + git hash-object blob4 -w | xargs git update-ref refs/myblobs/blob4 && >> >> > + git hash-object blob5 -w | xargs git update-ref refs/myblobs/blob5 && >> >> > + git hash-object blob6 -w | xargs git update-ref refs/myblobs/blob6 && >> >> > + git hash-object blob7 -w | xargs git update-ref refs/myblobs/blob7 && >> >> > + git hash-object blob8 -w | xargs git update-ref refs/myblobs/blob8 >> >> >> >> Hrm, xargs just to avoid: >> >> >> >> git update-ref ... $(git hash-object) ? >> >> >> > >> > I didn’t think about it, just for convenience. >> >> *nod*, Junio had a good suggestion. >> > > ok. > > Thanks.

On the Git mailing list, Christian Couder wrote (reply to this):

On Fri, Jun 18, 2021 at 12:51 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > On Fri, Jun 18 2021, ZheNing Hu wrote: > > After some refactoring, I found that there are two problems: > > 1. There are a lot of codes like this in ref-filter to fill v->s: > > > > v->s = show_ref(...) > > v->s = copy_email(...) > > > > It is very difficult to modify here: We know that show_ref() > > or copy_email() will allocate a block of memory to v->s, but > > if v->s is a strbuf, what should we do? In copy_email(), we > > can pass the v->s to copy_email() and use strbuf_add()/strbuf_addstr() > > instead of xstrdup() and xmemdupz(). But show_ref() will call > > external functions like shorten_unambiguous_ref(), we don’t know > > whether it will return us NULL or a dynamically allocated memory. > > If continue to pass v->s to the inner function, it is not a feasible > > method. Or we can use strbuf_attach() + strlen(), I'm not sure > > this is a good method. If you resend this patch, it might be a good idea to add a short version of the above explanations into the commit message. [...] > > In the case of using strbuf, I don’t know how to distinguish between an empty > > strbuf and NULL. It can be easily distinguished by using c-style "const char*". Maybe this could also be part of the explanation. > Yes, sometimes it's just too much of a hassle, looking at > shorten_unambiguous_ref() which returns a xstrdup()'d value that could > indeed be strbuf_attach'd. I haven't tried the conversion myself, > perhaps it's too much hassle. > > Just a suggestion from reading your patch in isolation. Yeah, thanks for the review anyway!

gitgitgadget · 2021-06-17T07:29:24Z

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):

On Tue, Jun 15 2021, ZheNing Hu via GitGitGadget wrote:

> This patch series make cat-file reuse ref-filter logic, which based on
> 5a5b5f78 ([GSOC] ref-filter: add %(rest) atom)
>
> Change from last version:
>
>  1. Use free_array_item_internal() to solve the memory leak problem.
>  2. Change commit message of ([GSOC] ref-filter: teach get_object() return
>     useful value).

I left some comments, but saw after the fact that I'd replied to the v1
E-Mails by accident, but anyway, the comments were all on things that
are also in v2, so it worked out in the end. Sorry about the confusion.

gitgitgadget · 2021-06-17T10:06:39Z

On the Git mailing list, ZheNing Hu wrote (reply to this):

Ævar Arnfjörð Bjarmason <avarab@gmail.com> 于2021年6月17日周四 下午3:27写道：
>
>
> On Tue, Jun 15 2021, ZheNing Hu via GitGitGadget wrote:
>
> > This patch series make cat-file reuse ref-filter logic, which based on
> > 5a5b5f78 ([GSOC] ref-filter: add %(rest) atom)
> >
> > Change from last version:
> >
> >  1. Use free_array_item_internal() to solve the memory leak problem.
> >  2. Change commit message of ([GSOC] ref-filter: teach get_object() return
> >     useful value).
>
> I left some comments, but saw after the fact that I'd replied to the v1
> E-Mails by accident, but anyway, the comments were all on things that
> are also in v2, so it worked out in the end. Sorry about the confusion.

It's okay. Your comments are very useful.

Thanks.
--
ZheNing Hu

gitgitgadget · 2021-06-18T13:50:10Z

User Christian Couder <christian.couder@gmail.com> has been added to the cc: list.

adlternative · 2021-06-19T07:01:30Z

/submit

gitgitgadget · 2021-06-19T07:03:32Z

Submitted as pull.980.v3.git.1624086181.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git pr-980/adlternative/cat-file-batch-refactor-v3

To fetch this version to local tag pr-980/adlternative/cat-file-batch-refactor-v3:

git fetch --no-tags https://github.com/gitgitgadget/git tag pr-980/adlternative/cat-file-batch-refactor-v3

gitgitgadget · 2021-06-21T05:58:13Z

Documentation/git-cat-file.txt

@@ -226,6 +226,12 @@ newline. The available atoms are:
 	after that first run of whitespace (i.e., the "rest" of the


On the Git mailing list, Christian Couder wrote (reply to this):

On Sat, Jun 19, 2021 at 9:03 AM ZheNing Hu via GitGitGadget <gitgitgadget@gmail.com> wrote: > > From: ZheNing Hu <adlternative@gmail.com> > > In order to let cat-file use ref-filter logic, the following > methods are used: Maybe: s/the following methods are used/let's do the following/ > 1. Add `cat_file_mode` member in struct `ref_format`, this can > help us reject atoms in verify_ref_format() which cat-file > cannot use, e.g. `%(refname)`, `%(push)`, `%(upstream)`... > 2. Change the type of member `format` in struct `batch_options` > to `ref_format`, We can add format data in it. Not sure what "We can add format data in it." means. > 3. Let `batch_objects()` add atoms to format, and use > `verify_ref_format()` to check atoms. > 4. Use `has_object_file()` in `batch_one_object()` to check > whether the input object exists. > 5. Let get_object() return 1 and print "<oid> missing" instead > of returning -1 and printing "missing object <oid> for <refname>", > this can help `format_ref_array_item()` just report that the > object is missing without letting Git exit. > 6. Use `format_ref_array_item()` in `batch_object_write()` to > get the formatted data corresponding to the object. If the > return value of `format_ref_array_item()` is equals to zero, > use `batch_write()` to print object data; else if the return > value less than zero, use `die()` to print the error message > and exit; else return value greater than zero, only print the s/else return value greater/else if the return value is greater/ > error message, but not exit. s/not exit/don't exit/ > 7. Use free_ref_array_item_value() to free ref_array_item's > value. That looks like a lot of changes in a single commit. I wonder if this commit could be split. > Most of the atoms in `for-each-ref --format` are now supported, > such as `%(tree)`, `%(parent)`, `%(author)`, `%(tagger)`, `%(if)`, > `%(then)`, `%(else)`, `%(end)`. But these atoms will be rejected: > `%(refname)`, `%(symref)`, `%(upstream)`, `%(push)`, `%(worktreepath)`, > `%(flag)`, `%(HEAD)`, because our objects don't have refname. s/refname/a refname/ It might be worth talking a bit about possible performance changes. [...] > + ret = format_ref_array_item(&item, &opt->format, scratch, &err); > + if (!ret) { > + strbuf_addch(scratch, '\n'); > + batch_write(opt, scratch->buf, scratch->len); > + } else if (ret < 0) { > + die("%s\n", err.buf); This if (ret < 0) could be checked first. > + } else { > + /* when ret > 0 , don't call die and print the err to stdout*/ I think it would be more helpful to tell what ret > 0 means, rather than what we do below (which can easily be seen). > + printf("%s\n", err.buf); > + fflush(stdout); > } For example: if (ret < 0) { die("%s\n", err.buf); if (ret) { /* ret > 0 means ... */ printf("%s\n", err.buf); fflush(stdout); } else { strbuf_addch(scratch, '\n'); batch_write(opt, scratch->buf, scratch->len); } > + free_ref_array_item_value(&item); > + strbuf_release(&err); > } [...] > +static int batch_objects(struct batch_options *opt, const struct option *options) It's unfortunate that one argument is called "opt" and the other one "options". I wonder if the first one could be called "batch" as it seems to be called this way somewhere else. > + if (!opt->format.format) > + strbuf_addstr(&format, "%(objectname) %(objecttype) %(objectsize)"); > + else > + strbuf_addstr(&format, opt->format.format); If there is no reason for the condition to be (!X) over just (X), I think the latter is a bit better. > if (opt->print_contents) > - data.info.typep = &data.type; > + strbuf_addstr(&format, "\n%(raw)"); > + opt->format.format = format.buf; I wonder if this should be: opt->format.format = strbuf_detach(&format, NULL); > + if (verify_ref_format(&opt->format)) > + usage_with_options(cat_file_usage, options); [...] > @@ -86,7 +87,7 @@ struct ref_format { > int need_color_reset_at_eol; > }; > > -#define REF_FORMAT_INIT { NULL, NULL, 0, 0, -1 } > +#define REF_FORMAT_INIT { NULL, NULL, 0, 0, 0, -1 } Maybe this can already be changed to a designated initializer, like Ævar suggested recently. [...] > +test_expect_success 'cat-file refs/heads/main refs/tags/testtag %(rest)' ' If this test is about checking that %(rest) works with both a branch and a tag, it might be better to say it more explicitly. > + cat >expected <<-EOF && > + 123 commit 123 > + 456 tag 456 > + EOF > + git cat-file --batch-check="%(rest) %(objecttype) %(rest)" >actual <<-EOF && > + refs/heads/main 123 > + refs/tags/testtag 456 > + EOF > + test_cmp expected actual > +'

On the Git mailing list, ZheNing Hu wrote (reply to this):

Christian Couder <christian.couder@gmail.com> 于2021年6月21日周一下午1:55写道： > > > 1. Add `cat_file_mode` member in struct `ref_format`, this can > > help us reject atoms in verify_ref_format() which cat-file > > cannot use, e.g. `%(refname)`, `%(push)`, `%(upstream)`... > > 2. Change the type of member `format` in struct `batch_options` > > to `ref_format`, We can add format data in it. > > Not sure what "We can add format data in it." means. Well, there is something wrong with the expression here. What I want to express is that we can fill its member "format" with the atoms like "%(objectname) %(refname)", and then pass it to the ref-filter. > > 7. Use free_ref_array_item_value() to free ref_array_item's > > value. > > That looks like a lot of changes in a single commit. I wonder if this > commit could be split. > Yeah, But I don’t know if I should take it apart step by step, If taken apart, those intermediate commits are likely to fail the test. > > Most of the atoms in `for-each-ref --format` are now supported, > > such as `%(tree)`, `%(parent)`, `%(author)`, `%(tagger)`, `%(if)`, > > `%(then)`, `%(else)`, `%(end)`. But these atoms will be rejected: > > `%(refname)`, `%(symref)`, `%(upstream)`, `%(push)`, `%(worktreepath)`, > > `%(flag)`, `%(HEAD)`, because our objects don't have refname. > > s/refname/a refname/ > > It might be worth talking a bit about possible performance changes. > Makes sense. The performance has indeed deteriorated. > [...] > > > + ret = format_ref_array_item(&item, &opt->format, scratch, &err); > > + if (!ret) { > > + strbuf_addch(scratch, '\n'); > > + batch_write(opt, scratch->buf, scratch->len); > > + } else if (ret < 0) { > > + die("%s\n", err.buf); > > This if (ret < 0) could be checked first. Yes, it is better to put error checking first. > > > + } else { > > + /* when ret > 0 , don't call die and print the err to stdout*/ > > I think it would be more helpful to tell what ret > 0 means, rather > than what we do below (which can easily be seen). > Ah, There is indeed only one situation for ret > 0 for the time being: Show "<oid> missing" without exiting Git. > > + printf("%s\n", err.buf); > > + fflush(stdout); > > } > > For example: > > if (ret < 0) { > die("%s\n", err.buf); > if (ret) { > /* ret > 0 means ... */ > printf("%s\n", err.buf); > fflush(stdout); > } else { > strbuf_addch(scratch, '\n'); > batch_write(opt, scratch->buf, scratch->len); > } > Yes, this might be better. > > + free_ref_array_item_value(&item); > > + strbuf_release(&err); > > } > > [...] > > > +static int batch_objects(struct batch_options *opt, const struct option *options) > > It's unfortunate that one argument is called "opt" and the other one > "options". I wonder if the first one could be called "batch" as it > seems to be called this way somewhere else. > OK. > > + if (!opt->format.format) > > + strbuf_addstr(&format, "%(objectname) %(objecttype) %(objectsize)"); > > + else > > + strbuf_addstr(&format, opt->format.format); > > If there is no reason for the condition to be (!X) over just (X), I > think the latter is a bit better. > Although I think it doesn’t matter which one to use first, I will still follow your suggestions. > > if (opt->print_contents) > > - data.info.typep = &data.type; > > + strbuf_addstr(&format, "\n%(raw)"); > > + opt->format.format = format.buf; > > I wonder if this should be: > > opt->format.format = strbuf_detach(&format, NULL); > No. here our opt->format.format will not be changed, it would be better for us to use `strbuf_release(&format)` for resource recovery. (strbuf_detach() will forced to let us free opt->format.format) > > + if (verify_ref_format(&opt->format)) > > + usage_with_options(cat_file_usage, options); > > [...] > > > @@ -86,7 +87,7 @@ struct ref_format { > > int need_color_reset_at_eol; > > }; > > > > -#define REF_FORMAT_INIT { NULL, NULL, 0, 0, -1 } > > +#define REF_FORMAT_INIT { NULL, NULL, 0, 0, 0, -1 } > > Maybe this can already be changed to a designated initializer, like > Ævar suggested recently. > I agree. > [...] > > > +test_expect_success 'cat-file refs/heads/main refs/tags/testtag %(rest)' ' > > If this test is about checking that %(rest) works with both a branch > and a tag, it might be better to say it more explicitly. > OK. Thanks. -- ZheNing Hu

adlternative · 2021-06-22T03:18:50Z

/submit

gitgitgadget · 2021-06-22T03:21:30Z

Submitted as pull.980.v4.git.1624332054.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git pr-980/adlternative/cat-file-batch-refactor-v4

To fetch this version to local tag pr-980/adlternative/cat-file-batch-refactor-v4:

git fetch --no-tags https://github.com/gitgitgadget/git tag pr-980/adlternative/cat-file-batch-refactor-v4

gitgitgadget · 2021-06-24T04:06:20Z

ref-filter.c

@@ -144,6 +144,7 @@ enum atom_type {
 	ATOM_BODY,


On the Git mailing list, Bagas Sanjaya wrote (reply to this):

On 22/06/21 10.20, ZheNing Hu via GitGitGadget wrote: > From: ZheNing Hu <adlternative@gmail.com> > > Since in the refactor of `git cat-file --batch` later, > oid_object_info_extended() in get_object() will be used to obtain > the info of an object with it's oid. When the object cannot be > obtained in the git repository, `cat-file --batch` expects to output > "<oid> missing" and continue the next oid query instead of letting > Git exit. In other error conditions, Git should exit normally. So we > can achieve this function by passing the return value of get_object(). > s/Since/Because/ -- An old man doll... just what I always wanted! - Clara

gitgitgadget · 2021-06-24T04:08:56Z

builtin/cat-file.c

@@ -428,6 +428,13 @@ static void batch_one_object(const char *obj_name,
 		return;


On the Git mailing list, Bagas Sanjaya wrote (reply to this):

On 22/06/21 10.20, ZheNing Hu via GitGitGadget wrote: > From: ZheNing Hu <adlternative@gmail.com> > > Since cat-file reuses ref-filter logic later will add the > formal parameter "const struct option *options" to > batch_objects(), the two synonymous parameters of "opt" > and "options" may confuse readers, so change batch_options > parameter of batch_objects() from "opt" to "batch". > Better say "Because later cat-file reuses ref-filter logic that will add parameter ... ". -- An old man doll... just what I always wanted! - Clara

In the original ref-filter design, it will copy the parsed atom's name and attributes to `used_atom[i].name` in the atom's parsing step, and use it again for string matching in the later specific ref attributes filling step. It use a lot of string matching to determine which atom we need. Introduce the enum "atom_type", each enum value is named as `ATOM_*`, which is the index of each corresponding valid_atom entry. In the first step of the atom parsing, `used_atom.atom_type` will record corresponding enum value from valid_atom entry index, and then in specific reference attribute filling step, only need to compare the value of the `used_atom[i].atom_type` to check the atom type. Helped-by: Junio C Hamano <gitster@pobox.com> Helped-by: Christian Couder <christian.couder@gmail.com> Signed-off-by: ZheNing Hu <adlternative@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>

The --batch code to print an object assumes we found out the type of the object from calling oid_object_info_extended(). This is true for the default format, but even in a custom format, we manually modify the object_info struct to ask for the type. This assumption was broken by 845de33 (cat-file: avoid noop calls to sha1_object_info_extended, 2016-05-18). That commit skips the call to oid_object_info_extended() entirely when --batch-all-objects is in use, and the custom format does not include any placeholders that require calling it. Or when the custom format only include placeholders like %(objectname) or %(rest), oid_object_info_extended() will not get the type of the object. This results in an error when we try to confirm that the type didn't change: $ git cat-file --batch=batman --batch-all-objects batman fatal: object 0000239 changed type!? and also has other subtle effects (e.g., we'd fail to stream a blob, since we don't realize it's a blob in the first place). We can fix this by flipping the order of the setup. The check for "do we need to get the object info" must come _after_ we've decided whether we need to look up the type. Helped-by: Jeff King <peff@peff.net> Signed-off-by: ZheNing Hu <adlternative@gmail.com> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>

There are two "if (opt->all_objects)" blocks next to each other, merge them into one to provide better readability. Helped-by: Jeff King <peff@peff.net> Signed-off-by: ZheNing Hu <adlternative@gmail.com> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>

Only tag and commit objects use `grab_sub_body_contents()` to grab object contents in the current codebase. We want to teach the function to also handle blobs and trees to get their raw data, without parsing a blob (whose contents looks like a commit or a tag) incorrectly as a commit or a tag. Skip the block of code that is specific to handling commits and tags early when the given object is of a wrong type to help later addition to handle other types of objects in this function. Mentored-by: Christian Couder <christian.couder@gmail.com> Mentored-by: Hariom Verma <hariom18599@gmail.com> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: ZheNing Hu <adlternative@gmail.com>

Add new formatting option `%(raw)`, which will print the raw object data without any changes. It will help further to migrate all cat-file formatting logic from cat-file to ref-filter. The raw data of blob, tree objects may contain '\0', but most of the logic in `ref-filter` depends on the output of the atom being text (specifically, no embedded NULs in it). E.g. `quote_formatting()` use `strbuf_addstr()` or `*._quote_buf()` add the data to the buffer. The raw data of a tree object is `100644 one\0...`, only the `100644 one` will be added to the buffer, which is incorrect. Therefore, we need to find a way to record the length of the atom_value's member `s`. Although strbuf can already record the string and its length, if we want to replace the type of atom_value's member `s` with strbuf, many places in ref-filter that are filled with dynamically allocated mermory in `v->s` are not easy to replace. At the same time, we need to check if `v->s == NULL` in populate_value(), and strbuf cannot easily distinguish NULL and empty strings, but c-style "const char *" can do it. So add a new member in `struct atom_value`: `s_size`, which can record raw object size, it can help us add raw object data to the buffer or compare two buffers which contain raw object data. Note that `--format=%(raw)` cannot be used with `--python`, `--shell`, `--tcl`, and `--perl` because if the binary raw data is passed to a variable in such languages, these may not support arbitrary binary data in their string variable type. Mentored-by: Christian Couder <christian.couder@gmail.com> Mentored-by: Hariom Verma <hariom18599@gmail.com> Helped-by: Bagas Sanjaya <bagasdotme@gmail.com> Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Helped-by: Felipe Contreras <felipe.contreras@gmail.com> Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk> Helped-by: Junio C Hamano <gitster@pobox.com> Based-on-patch-by: Olga Telezhnaya <olyatelezhnaya@gmail.com> Signed-off-by: ZheNing Hu <adlternative@gmail.com>

Because the perl language can handle binary data correctly, add the function perl_quote_buf_with_len(), which can specify the length of the data and prevent the data from being truncated at '\0' to help `--format="%(raw)"` re-support `--perl`. Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: ZheNing Hu <adlternative@gmail.com>

Use non-const ref_format in *_atom_parser(), which can help us modify the members of ref_format in *_atom_parser(). Mentored-by: Christian Couder <christian.couder@gmail.com> Mentored-by: Hariom Verma <hariom18599@gmail.com> Signed-off-by: ZheNing Hu <adlternative@gmail.com>

In order to let "cat-file --batch=%(rest)" use the ref-filter interface, add %(rest) atom for ref-filter. "git for-each-ref", "git branch", "git tag" and "git verify-tag" will reject %(rest) by default. Mentored-by: Christian Couder <christian.couder@gmail.com> Mentored-by: Hariom Verma <hariom18599@gmail.com> Signed-off-by: ZheNing Hu <adlternative@gmail.com>

Because in the refactor of `git cat-file --batch` later, oid_object_info_extended() in get_object() will be used to obtain the info of an object with it's oid. When the object cannot be obtained in the git repository, `cat-file --batch` expects to output "<oid> missing" and continue the next oid query instead of letting Git exit. In other error conditions, Git should exit normally. So we can achieve this function by passing the return value of get_object(). Mentored-by: Christian Couder <christian.couder@gmail.com> Mentored-by: Hariom Verma <hariom18599@gmail.com> Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: ZheNing Hu <adlternative@gmail.com>

When we use ref_array_item which is not dynamically allocated and want to free the space of its member "value" after the end of use, free_array_item() does not meet our needs, because it tries to free ref_array_item itself and its member "symref". Introduce free_ref_array_item_value() for freeing ref_array_item value. It will be called internally by free_array_item(), and it will help `cat-file --batch` free ref_array_item's value memory later. Helped-by: Junio C Hamano <gitster@pobox.com> Mentored-by: Christian Couder <christian.couder@gmail.com> Mentored-by: Hariom Verma <hariom18599@gmail.com> Signed-off-by: ZheNing Hu <adlternative@gmail.com>

Add `cat_file_mode` member in struct `ref_format`, when `cat-file --batch` use ref-filter logic later, it can help us reject atoms in verify_ref_format() which cat-file cannot use, e.g. `%(refname)`, `%(push)`, `%(upstream)`... Mentored-by: Christian Couder <christian.couder@gmail.com> Mentored-by: Hariom Verma <hariom18599@gmail.com> Signed-off-by: ZheNing Hu <adlternative@gmail.com>

Let get_object() return 1 and print "<oid> missing" instead of returning -1 and printing "missing object <oid> for <refname>" if oid_object_info_extended() unable to find the data corresponding to oid. When `cat-file --batch` use ref-filter logic later it can help `format_ref_array_item()` just report that the object is missing without letting Git exit. Mentored-by: Christian Couder <christian.couder@gmail.com> Mentored-by: Hariom Verma <hariom18599@gmail.com> Signed-off-by: ZheNing Hu <adlternative@gmail.com>

Use `has_object_file()` in `batch_one_object()` to check whether the input object exists. This can help us reject the missing oid when we let `cat-file --batch` use ref-filter logic later. Mentored-by: Christian Couder <christian.couder@gmail.com> Mentored-by: Hariom Verma <hariom18599@gmail.com> Signed-off-by: ZheNing Hu <adlternative@gmail.com>

Because later cat-file reuses ref-filter logic that will add parameter "const struct option *options" to batch_objects(), the two synonymous parameters of "opt" and "options" may confuse readers, so change batch_options parameter of batch_objects() from "opt" to "batch". Mentored-by: Christian Couder <christian.couder@gmail.com> Mentored-by: Hariom Verma <hariom18599@gmail.com> Signed-off-by: ZheNing Hu <adlternative@gmail.com>

In order to let cat-file use ref-filter logic, let's do the following: 1. Change the type of member `format` in struct `batch_options` to `ref_format`, we will pass it to ref-filter later. 2. Let `batch_objects()` add atoms to format, and use `verify_ref_format()` to check atoms. 3. Use `format_ref_array_item()` in `batch_object_write()` to get the formatted data corresponding to the object. If the return value of `format_ref_array_item()` is equals to zero, use `batch_write()` to print object data; else if the return value is less than zero, use `die()` to print the error message and exit; else if return value is greater than zero, only print the error message, but don't exit. 4. Use free_ref_array_item_value() to free ref_array_item's value. Most of the atoms in `for-each-ref --format` are now supported, such as `%(tree)`, `%(parent)`, `%(author)`, `%(tagger)`, `%(if)`, `%(then)`, `%(else)`, `%(end)`. But these atoms will be rejected: `%(refname)`, `%(symref)`, `%(upstream)`, `%(push)`, `%(worktreepath)`, `%(flag)`, `%(HEAD)`, because these atoms are unique to those objects that pointed to by a ref, "for-each-ref"'s family can naturally use these atoms, but not all objects are pointed to be a ref, so "cat-file" will not be able to use them. The performance for `git cat-file --batch-all-objects --batch-check` on the Git repository itself with performance testing tool `hyperfine` changes from 669.4 ms ± 31.1 ms to 1.134 s ± 0.063 s. The performance for `git cat-file --batch-all-objects --batch >/dev/null` on the Git repository itself with performance testing tool `time` change from "27.37s user 0.29s system 98% cpu 28.089 total" to "33.69s user 1.54s system 87% cpu 40.258 total". Mentored-by: Christian Couder <christian.couder@gmail.com> Mentored-by: Hariom Verma <hariom18599@gmail.com> Signed-off-by: ZheNing Hu <adlternative@gmail.com>

Reuse the `err` buffer in batch_object_write(), as the buffer `scratch` does. This will reduce the overhead of multiple allocations of memory of the err buffer. Mentored-by: Christian Couder <christian.couder@gmail.com> Mentored-by: Hariom Verma <hariom18599@gmail.com> Signed-off-by: ZheNing Hu <adlternative@gmail.com>

After cat-file reuses the ref-filter logic, we re-implement the functions of --textconv and --filters options. Add members `use_textconv` and `use_filters` in struct `ref_format`, and use global variables `use_filters` and `use_textconv` in `ref-filter.c`, so that we can filter the content of the object in get_object(). Use `actual_oi` to record the real expand_data: it may point to the original `oi` or the `act_oi` processed by `textconv_object()` or `convert_to_working_tree()`. `grab_values()` will grab the contents of `actual_oi` and `grab_common_values()` to grab the contents of origin `oi`, this ensures that `%(objectsize)` still uses the size of the unfiltered data. In `get_object()`, we made an optimization: Firstly, get the size and type of the object instead of directly getting the object data. If using --textconv, after successfully obtaining the filtered object data, an extra oid_object_info_extended() will be skipped, which can reduce the cost of object data copy; If using --filter, the data of the object first will be getted first, and then convert_to_working_tree() will be used to get the filtered object data. Mentored-by: Christian Couder <christian.couder@gmail.com> Mentored-by: Hariom Verma <hariom18599@gmail.com> Signed-off-by: ZheNing Hu <adlternative@gmail.com>

Because "atom_type == ATOM_OBJECTNAME" implies the condition of `starts_with(name, "objectname")`, "atom_type == ATOM_TREE" implies the condition of `starts_with(name, "tree")`, so the check for `starts_with(name, field)` in grab_oid() is redundant. So Remove the grab_oid() from ref-filter, to reduce repeated check. Mentored-by: Christian Couder <christian.couder@gmail.com> Mentored-by: Hariom Verma <hariom18599@gmail.com> Signed-off-by: ZheNing Hu <adlternative@gmail.com>

gitgitgadget · 2021-07-01T14:21:25Z

On the Git mailing list, Junio C Hamano wrote (reply to this):

ZheNing Hu <adlternative@gmail.com> writes:

> Junio C Hamano <gitster@pobox.com> 于2021年7月1日周四 上午6:04写道：
>>
>> "ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>
>> > This patch series make cat-file reuse ref-filter logic.
>>
>> Unfortunately this seems to interact with your own
>> zh/cat-file-batch-fix rather badly.
>>
>
> Well, it's because I didn't base this patch on it.
> That should be easy to achieve.

It is preferrable for contributors try merging their individual
topics with the rest of 'seen' to see if there are potential
conflicts (either textual or semantic) before sending their topics
out.  Not all topics need to build on top of other topics (in fact,
the fewer inter-dependencies they have, the better), but in this
case, I think it makes sense to build one on top of the other.

gitgitgadget · 2021-07-01T21:24:27Z

This branch is now known as zh/ref-filter-raw-data.

gitgitgadget · 2021-07-01T21:24:28Z

This patch series was integrated into seen via git@3c525b5.

gitgitgadget · 2021-07-02T21:33:23Z

This patch series was integrated into seen via git@ce080dd.

gitgitgadget · 2021-07-03T00:35:56Z

This patch series was integrated into seen via git@c8db213.

gitgitgadget · 2021-07-09T10:07:14Z

On the Git mailing list, ZheNing Hu wrote (reply to this):

Hi, Junio,

Junio C Hamano <gitster@pobox.com> 于2021年7月1日周四 下午10:17写道：
>
> ZheNing Hu <adlternative@gmail.com> writes:
>
> > Junio C Hamano <gitster@pobox.com> 于2021年7月1日周四 上午6:04写道：
> >>
> >> "ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes:
> >>
> >> > This patch series make cat-file reuse ref-filter logic.
> >>
> >> Unfortunately this seems to interact with your own
> >> zh/cat-file-batch-fix rather badly.
> >>
> >
> > Well, it's because I didn't base this patch on it.
> > That should be easy to achieve.
>
> It is preferrable for contributors try merging their individual
> topics with the rest of 'seen' to see if there are potential
> conflicts (either textual or semantic) before sending their topics
> out.  Not all topics need to build on top of other topics (in fact,
> the fewer inter-dependencies they have, the better), but in this
> case, I think it makes sense to build one on top of the other.
>

I have a "rebase" trouble:

My new feature branch "cat-file-batch-refactor-rebase-version" should base on
zh/cat-file-batch-fix and  zh/ref-filter-atom-type, so last time I choice
(bb9a3a8f77 Merge branch 'zh/cat-file-batch-fix' into jch) as the patch base.

But github only allow me base the patch on a branch, so I choice
"gitgitgadget:seen"
as my github PR base. It causes that some merge commit include in it. [1]

So In order to prevent these "merge" commits from being sent, the GGG mechanism
is modified to reject their merge commits.

Now I can't choice a good branch as my patch base... Have any ideas?

Thanks.

[1]: https://github.com/gitgitgadget/git/pull/989

--
ZheNing Hu

gitgitgadget bot reviewed Jun 12, 2021

View reviewed changes

adlternative force-pushed the cat-file-batch-refactor branch from 0004d5b to cbf7d51 Compare June 15, 2021 12:56

gitgitgadget bot reviewed Jun 16, 2021

View reviewed changes

gitgitgadget bot reviewed Jun 17, 2021

View reviewed changes

adlternative force-pushed the cat-file-batch-refactor branch from cbf7d51 to 86ac3bc Compare June 19, 2021 05:38

adlternative changed the base branch from zh/ref-filter-atom-type to zh/ref-filter-raw-data June 19, 2021 05:55

adlternative changed the base branch from zh/ref-filter-raw-data to zh/ref-filter-atom-type June 19, 2021 05:56

adlternative changed the base branch from zh/ref-filter-atom-type to zh/ref-filter-raw-data June 19, 2021 06:12

adlternative changed the base branch from zh/ref-filter-raw-data to zh/ref-filter-atom-type June 19, 2021 06:13

gitgitgadget bot reviewed Jun 21, 2021

View reviewed changes

adlternative force-pushed the cat-file-batch-refactor branch 2 times, most recently from 0bb7727 to d1114a2 Compare June 22, 2021 02:41

gitgitgadget bot reviewed Jun 24, 2021

View reviewed changes

adlternative added 18 commits July 1, 2021 21:12

adlternative force-pushed the cat-file-batch-refactor branch from 891d62f to c3b60b9 Compare July 1, 2021 13:48

adlternative changed the base branch from zh/ref-filter-atom-type to zh/cat-file-batch-fix July 1, 2021 13:51

adlternative changed the base branch from zh/cat-file-batch-fix to zh/ref-filter-raw-data July 1, 2021 13:58

adlternative changed the base branch from zh/ref-filter-raw-data to zh/ref-filter-atom-type July 1, 2021 13:59

gitgitgadget bot added the seen label Jul 1, 2021

adlternative closed this Jul 4, 2021

		@@ -1356,11 +1356,12 @@ static void append_lines(struct strbuf out, const char buf, unsigned long size
		}

		@@ -235,6 +235,15 @@ and `date` to extract the named component. For email fields (`authoremail`,
		without angle brackets, and `:localpart` to get the part before the `@` symbol

		@@ -226,6 +226,12 @@ newline. The available atoms are:
		after that first run of whitespace (i.e., the "rest" of the

		@@ -428,6 +428,13 @@ static void batch_one_object(const char *obj_name,
		return;

[GSOC][RFC] cat-file: reuse ref-filter logic #980

[GSOC][RFC] cat-file: reuse ref-filter logic #980

Conversation

adlternative commented Jun 12, 2021 • edited Loading

adlternative commented Jun 12, 2021

gitgitgadget bot commented Jun 12, 2021

gitgitgadget bot Jun 12, 2021

Choose a reason for hiding this comment

gitgitgadget bot Jun 13, 2021

Choose a reason for hiding this comment

adlternative commented Jun 15, 2021

gitgitgadget bot commented Jun 15, 2021

gitgitgadget bot commented Jun 16, 2021

gitgitgadget bot Jun 16, 2021

Choose a reason for hiding this comment

gitgitgadget bot Jun 17, 2021

Choose a reason for hiding this comment

gitgitgadget bot Jun 16, 2021

Choose a reason for hiding this comment

gitgitgadget bot Jun 17, 2021

Choose a reason for hiding this comment

gitgitgadget bot commented Jun 17, 2021

gitgitgadget bot Jun 17, 2021

Choose a reason for hiding this comment

gitgitgadget bot Jun 17, 2021

Choose a reason for hiding this comment

gitgitgadget bot commented Jun 17, 2021

gitgitgadget bot Jun 17, 2021

Choose a reason for hiding this comment

gitgitgadget bot Jun 17, 2021

Choose a reason for hiding this comment

gitgitgadget bot Jun 17, 2021

Choose a reason for hiding this comment

gitgitgadget bot Jun 17, 2021

Choose a reason for hiding this comment

gitgitgadget bot Jun 17, 2021

Choose a reason for hiding this comment

gitgitgadget bot Jun 18, 2021

Choose a reason for hiding this comment

gitgitgadget bot Jun 18, 2021

Choose a reason for hiding this comment

gitgitgadget bot commented Jun 17, 2021

gitgitgadget bot commented Jun 17, 2021

gitgitgadget bot commented Jun 18, 2021

adlternative commented Jun 19, 2021

gitgitgadget bot commented Jun 19, 2021

gitgitgadget bot Jun 21, 2021

Choose a reason for hiding this comment

gitgitgadget bot Jun 21, 2021

Choose a reason for hiding this comment

adlternative commented Jun 22, 2021

gitgitgadget bot commented Jun 22, 2021

gitgitgadget bot Jun 24, 2021

Choose a reason for hiding this comment

gitgitgadget bot Jun 24, 2021

Choose a reason for hiding this comment

gitgitgadget bot commented Jul 1, 2021

gitgitgadget bot commented Jul 1, 2021

gitgitgadget bot commented Jul 1, 2021

gitgitgadget bot commented Jul 2, 2021

gitgitgadget bot commented Jul 3, 2021

gitgitgadget bot commented Jul 9, 2021

adlternative commented Jun 12, 2021 •

edited

Loading