New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix two rebase bugs related to total_nr
#1531
Conversation
The `total_nr` field in the `todo_list` structure merely serves display purposes, and should only be used when generating the progress message. In these two instances, however, we want to loop over all of the commands in the parsed rebase script. The loop limit therefore needs to be `nr`, which refers to the count of commands in the current `todo_list`. This is important because the two numbers, `nr` and `total_nr` can differ wildly, e.g. due to `total_nr` _not_ counting comments or empty lines, while `nr` skips any commands that already moved from the `git-rebase-todo` file to the `done` file. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
b3b7ecf
to
d12d546
Compare
/submit |
Submitted as pull.1531.git.1683759338.gitgitgadget@gmail.com To fetch this version into
To fetch this version to local tag
|
@@ -4202,7 +4202,7 @@ void todo_list_filter_update_refs(struct repository *r, | |||
if (!is_null_oid(&rec->after)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> The `total_nr` field in the `todo_list` structure merely serves display
> purposes, and should only be used when generating the progress message.
This is a good distinction to keep in mind.
I notice that sequencer.h does not do a very good job at giving
guidance on how these members are to be used.
struct todo_list {
struct strbuf buf;
struct todo_item *items;
int nr, alloc, current;
int done_nr, total_nr;
};
The <nr,alloc> tuple lets readers to guess they are tied to the
items[] array, so perhaps it is sufficient to give a comment to
total_nr member and probably done_nr while we are at it.
> In these two instances, however, we want to loop over all of the
> commands in the parsed rebase script. The loop limit therefore needs to
> be `nr`, which refers to the count of commands in the current
> `todo_list`.
Yes.
> This is important because the two numbers, `nr` and `total_nr` can
> differ wildly, e.g. due to `total_nr` _not_ counting comments or empty
> lines, while `nr` skips any commands that already moved from the
> `git-rebase-todo` file to the `done` file.
OK. The items[] array has not just executable insn but also holds
comments and NOOP, and <nr,alloc> tuple is used to control its
sizing in the usual ALLOC_GROW() way. Because total_nr is used only
for progress, it naturally excludes the no-ops. Elements of items[]
array are consumed in core by incrementing the current pointer and
nr will not update while that is happening, but when the sequencer
gives control the user and then takes the control back upon resuming,
items[] would contain only the insns that have not been moved to the
done list, meaning that 'nr' would shrink. total_nr is compensated
by reading the done list and adds its size to 'nr'.
OK, that all makes sense. The whole arrangement sounds like a bit
more error prone than necessary (an obvious alternative is to just
always keep the whole todo list with "done up to here" pointner) but
changing that is not in the scope of these fixes, because such an
arrangement wouldn't have prevented this particular bug from
happening, as total_nr and nr could still be different due to
no-ops.
> diff --git a/sequencer.c b/sequencer.c
> index 5f22b7cd377..f5d89abdc5e 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -4202,7 +4202,7 @@ void todo_list_filter_update_refs(struct repository *r,
> if (!is_null_oid(&rec->after))
> continue;
>
> - for (j = 0; !found && j < todo_list->total_nr; j++) {
> + for (j = 0; !found && j < todo_list->nr; j++) {
> struct todo_item *item = &todo_list->items[j];
The .total_nr member could be smaller (because it does not count
noops) or larger (because it counts already done steps that are not
in the items[]) than the .nr member, and the old code could have
made out-of-bounds access into the items[] array. It is now
corrected. Excellent.
> const char *arg = todo_list->buf.buf + item->arg_offset;
>
> @@ -4232,7 +4232,7 @@ void todo_list_filter_update_refs(struct repository *r,
> * For each todo_item, check if its ref is in the update_refs list.
> * If not, then add it as an un-updated ref.
> */
> - for (i = 0; i < todo_list->total_nr; i++) {
> + for (i = 0; i < todo_list->nr; i++) {
> struct todo_item *item = &todo_list->items[i];
> const char *arg = todo_list->buf.buf + item->arg_offset;
> int j, found = 0;
Ditto. Will queue.
Thanks.
@@ -2609,7 +2609,7 @@ int todo_list_parse_insn_buffer(struct repository *r, char *buf, | |||
char *p = buf, *next_p; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:
> diff --git a/t/t3430-rebase-merges.sh b/t/t3430-rebase-merges.sh
> index f351701fec2..8da99a075dc 100755
> --- a/t/t3430-rebase-merges.sh
> +++ b/t/t3430-rebase-merges.sh
> @@ -517,4 +517,11 @@ test_expect_success '--rebase-merges with message matched with onto label' '
> EOF
> '
>
> +test_expect_success 'progress shows the correct total' '
> + git checkout -b progress H &&
> + git rebase --rebase-merges --force-rebase --verbose A 2> err &&
> + grep "^Rebasing.*14.$" err >progress &&
> + test_line_count = 14 progress
> +'
This is trying to find output lines that say "Rebasing (NN/14)";
Clarifying that we mean the denominator by matching
"^Rebasing.*/14)$"
or even
"^Rebasing ([0-9]*/14)$"
would have made it easier to grok (even though I know we do not have
114 steps to rebase in this one ;-).
Looks good. Will queue. Thanks.
This branch is now known as |
This patch series was integrated into seen via git@af8d571. |
@@ -2609,7 +2609,7 @@ int todo_list_parse_insn_buffer(struct repository *r, char *buf, | |||
char *p = buf, *next_p; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Phillip Wood wrote (reply to this):
Hi Dscho
On 10/05/2023 23:55, Johannes Schindelin via GitGitGadget wrote:
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
> > For regular, non-`--rebase-merges` runs, there is very little work to do
> for the parser when determining the total number of commands in a rebase
> script: it is simply the number of lines after stripping the commented
> lines and then trimming the trailing empty line, if any.
> > The `--rebase-merges` mode complicates things by introducing empty lines
> and comments in the middle of the script. These should _not_ be counted
> as commands, and indeed, when an interactive rebase is interrupted and
> subsequently resumed, the total number of commands can magically shrink,
> sometimes dramatically.
> > The reason for this strange behavior is that empty lines _are_ counted
> in `edit_todo_list()` (but not the comments, as they are stripped via
> `strbuf_stripspace(..., 1)`, which is a bug.
> > Let's fix this so that the correct total number is shown from the
> get-go, by carefully adjusting it according to what's in the rebase
> script. Extra care needs to be taken in case the user edits the script:
> the number of commands might be different after the user edited than
> beforehand.
> > Note: Even though commented lines are skipped in `edit_todo_list()`, we
> still need to handle `TODO_COMMENT` items by decrementing the
> already-incremented `total_nr` again: empty lines are also marked as
> `TODO_COMMENT`.
> > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
> sequencer.c | 12 ++++++++----
> t/t3430-rebase-merges.sh | 7 +++++++
> 2 files changed, 15 insertions(+), 4 deletions(-)
> > diff --git a/sequencer.c b/sequencer.c
> index f5d89abdc5e..46dd07df0f2 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -2609,7 +2609,7 @@ int todo_list_parse_insn_buffer(struct repository *r, char *buf,
> char *p = buf, *next_p;
> int i, res = 0, fixup_okay = file_exists(rebase_path_done());
> > - todo_list->current = todo_list->nr = 0;
> + todo_list->current = todo_list->nr = todo_list->total_nr = 0;
> > for (i = 1; *p; i++, p = next_p) {
> char *eol = strchrnul(p, '\n');
> @@ -2628,7 +2628,8 @@ int todo_list_parse_insn_buffer(struct repository *r, char *buf,
> item->arg_offset = p - buf;
> item->arg_len = (int)(eol - p);
> item->commit = NULL;
> - }
> + } else if (item->command == TODO_COMMENT)
> + todo_list->total_nr--;
This feels a bit fragile, I think it would be better to count the commands properly in the first place rather than adjusting the total here. We could do that by not incrementing "todo_list->total_nr" in append_new_todo() and then doing
if (item->command != TODO_COMMENT)
todo_list->total_nr++;
here. We may want to stop counting invalid commands as well by only counting commands whre "item->command < TODO_COMMENT".
> > if (fixup_okay)
> ; /* do nothing */
> @@ -6039,7 +6040,7 @@ int complete_action(struct repository *r, struct replay_opts *opts, unsigned fla
> struct todo_list new_todo = TODO_LIST_INIT;
> struct strbuf *buf = &todo_list->buf, buf2 = STRBUF_INIT;
> struct object_id oid = onto->object.oid;
> - int res;
> + int new_total_nr, res;
> > find_unique_abbrev_r(shortonto, &oid, DEFAULT_ABBREV);
> > @@ -6066,6 +6067,7 @@ int complete_action(struct repository *r, struct replay_opts *opts, unsigned fla
> return error(_("nothing to do"));
> }
> > + new_total_nr = todo_list->total_nr - count_commands(todo_list);
> res = edit_todo_list(r, todo_list, &new_todo, shortrevisions,
> shortonto, flags);
> if (res == -1)
> @@ -6088,11 +6090,13 @@ int complete_action(struct repository *r, struct replay_opts *opts, unsigned fla
> return -1;
> }
> > + new_total_nr += count_commands(&new_todo);
> + new_todo.total_nr = new_total_nr;
> +
> /* Expand the commit IDs */
> todo_list_to_strbuf(r, &new_todo, &buf2, -1, 0);
> strbuf_swap(&new_todo.buf, &buf2);
> strbuf_release(&buf2);
> - new_todo.total_nr -= new_todo.nr;
I think if we just change this line to
new_todo.total_nr = 0;
we don't need any other changes to this function. This is because complete_action() is only called at the start of a rebase so we don't need to worry about "total_nr" including commands that have already been executed. The reason we need to set it to zero here is that we re-parse the todo list below which increments "total_nr" by the number of commands parsed.
Thanks for working on this.
Best Wishes
Phillip
> if (todo_list_parse_insn_buffer(r, new_todo.buf.buf, &new_todo) < 0)
> BUG("invalid todo list after expanding IDs:\n%s",
> new_todo.buf.buf);
> diff --git a/t/t3430-rebase-merges.sh b/t/t3430-rebase-merges.sh
> index f351701fec2..8da99a075dc 100755
> --- a/t/t3430-rebase-merges.sh
> +++ b/t/t3430-rebase-merges.sh
> @@ -517,4 +517,11 @@ test_expect_success '--rebase-merges with message matched with onto label' '
> EOF
> '
> > +test_expect_success 'progress shows the correct total' '
> + git checkout -b progress H &&
> + git rebase --rebase-merges --force-rebase --verbose A 2> err &&
> + grep "^Rebasing.*14.$" err >progress &&
> + test_line_count = 14 progress
> +'
> +
> test_done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
Phillip Wood <phillip.wood123@gmail.com> writes:
>> @@ -2609,7 +2609,7 @@ int todo_list_parse_insn_buffer(struct repository *r, char *buf,
>> char *p = buf, *next_p;
>> int i, res = 0, fixup_okay = file_exists(rebase_path_done());
>> - todo_list->current = todo_list->nr = 0;
>> + todo_list->current = todo_list->nr = todo_list->total_nr = 0;
>> for (i = 1; *p; i++, p = next_p) {
>> char *eol = strchrnul(p, '\n');
>> @@ -2628,7 +2628,8 @@ int todo_list_parse_insn_buffer(struct repository *r, char *buf,
>> item->arg_offset = p - buf;
>> item->arg_len = (int)(eol - p);
>> item->commit = NULL;
>> - }
>> + } else if (item->command == TODO_COMMENT)
>> + todo_list->total_nr--;
>
> This feels a bit fragile, I think it would be better to count the
> commands properly in the first place rather than adjusting the total
> here. We could do that by not incrementing "todo_list->total_nr" in
> append_new_todo() and then doing
>
> if (item->command != TODO_COMMENT)
> todo_list->total_nr++;
>
> here. We may want to stop counting invalid commands as well by only
> counting commands whre "item->command < TODO_COMMENT".
OK.
>> @@ -6088,11 +6090,13 @@ int complete_action(struct repository *r, struct replay_opts *opts, unsigned fla
>> return -1;
>> }
>> + new_total_nr += count_commands(&new_todo);
>> + new_todo.total_nr = new_total_nr;
>> +
>> /* Expand the commit IDs */
>> todo_list_to_strbuf(r, &new_todo, &buf2, -1, 0);
>> strbuf_swap(&new_todo.buf, &buf2);
>> strbuf_release(&buf2);
>> - new_todo.total_nr -= new_todo.nr;
>
> I think if we just change this line to
>
> new_todo.total_nr = 0;
>
> we don't need any other changes to this function. This is because
> complete_action() is only called at the start of a rebase so we don't
> need to worry about "total_nr" including commands that have already
> been executed. The reason we need to set it to zero here is that we
> re-parse the todo list below which increments "total_nr" by the number
> of commands parsed.
>
> Thanks for working on this.
> Best Wishes
>
> Phillip
Thanks both for working on the fix and reviewing the patch.
User |
This patch series was integrated into seen via git@da98e7c. |
There was a status update in the "New Topics" section about the branch A few bugs in the sequencer machinery that results in miscounting the steps have been corrected. Waiting for responses to reviews. cf. <b481d212-89c5-5c8e-c99f-6df3909523c7@gmail.com> source: <pull.1531.git.1683759338.gitgitgadget@gmail.com> |
This patch series was integrated into seen via git@17cf13e. |
For regular, non-`--rebase-merges` runs, there is very little work to do for the parser when determining the total number of commands in a rebase script: it is simply the number of lines after stripping the commented lines and then trimming the trailing empty line, if any. The `--rebase-merges` mode complicates things by introducing empty lines and comments in the middle of the script. These should _not_ be counted as commands, and indeed, when an interactive rebase is interrupted and subsequently resumed, the total number of commands can magically shrink, sometimes dramatically. The reason for this strange behavior is that empty lines _are_ counted in `edit_todo_list()` (but not the comments, as they are stripped via `strbuf_stripspace(..., 1)`, which is a bug. Let's fix this so that the correct total number is shown from the get-go, by carefully adjusting it according to what's in the rebase script. Extra care needs to be taken in case the user edits the script: the number of commands might be different after the user edited than beforehand. Note: Even though commented lines are skipped in `edit_todo_list()`, we still need to handle `TODO_COMMENT` items by decrementing the already-incremented `total_nr` again: empty lines are also marked as `TODO_COMMENT`. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
d12d546
to
cac809b
Compare
/submit |
Submitted as pull.1531.v2.git.1683965487.gitgitgadget@gmail.com To fetch this version into
To fetch this version to local tag
|
On the Git mailing list, Phillip Wood wrote (reply to this): Hi Dscho
On 13/05/2023 09:11, Johannes Schindelin via GitGitGadget wrote:
> I recently picked up work where I regularly rebase large branch thickets
> consisting of thousands of commits. During those rebases, I could not fail
> to notice that the progress initially showed a total number around 2,100,
> when the actual number of commands was more like 1,850. And indeed, when
> resuming the rebase after being interrupted due to a break command or due to
> a merge conflict, the progress showed the correct number!
> > So I set out to fix this, stumbling over an incorrect use of total_nr in the
> --update-refs code, so I fixed that, too.
> > Note: These patches apply cleanly on top of ds/rebase-update-ref as well as
> on top of the current main branch.
> > Changes since v1:
> > * Clarified the pattern expected by the test case in the progress output,
> as suggested by Junio.
> * Simplified the code as suggested by Phillipp, based on the insight that
> complete_action() (naming is hard!) is only called at the very beginning
> of a rebase, and therefore there cannot be any already-done commands in
> the todo list.
This version looks good to me, thanks for re-rolling
Phillip
> Johannes Schindelin (2):
> rebase --update-refs: fix loops
> rebase -r: fix the total number shown in the progress
> > sequencer.c | 13 ++++++++-----
> t/t3430-rebase-merges.sh | 8 ++++++++
> 2 files changed, 16 insertions(+), 5 deletions(-)
> > > base-commit: 4611884ea883908a9638cafbd824c401c41cf7f6
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1531%2Fdscho%2Ffix-rebase-i-progress-v2
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1531/dscho/fix-rebase-i-progress-v2
> Pull-Request: https://github.com/gitgitgadget/git/pull/1531
> > Range-diff vs v1:
> > 1: 2ac7c7a7c61 = 1: 2ac7c7a7c61 rebase --update-refs: fix loops
> 2: d12d5469f8c ! 2: cac809bcffd rebase -r: fix the total number shown in the progress
> @@ Commit message
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> > ## sequencer.c ##
> +@@ sequencer.c: void todo_list_release(struct todo_list *todo_list)
> + static struct todo_item *append_new_todo(struct todo_list *todo_list)
> + {
> + ALLOC_GROW(todo_list->items, todo_list->nr + 1, todo_list->alloc);
> +- todo_list->total_nr++;
> + return todo_list->items + todo_list->nr++;
> + }
> +
> @@ sequencer.c: int todo_list_parse_insn_buffer(struct repository *r, char *buf,
> char *p = buf, *next_p;
> int i, res = 0, fixup_okay = file_exists(rebase_path_done());
> @@ sequencer.c: int todo_list_parse_insn_buffer(struct repository *r, char *buf,
> for (i = 1; *p; i++, p = next_p) {
> char *eol = strchrnul(p, '\n');
> @@ sequencer.c: int todo_list_parse_insn_buffer(struct repository *r, char *buf,
> - item->arg_offset = p - buf;
> - item->arg_len = (int)(eol - p);
> item->commit = NULL;
> -- }
> -+ } else if (item->command == TODO_COMMENT)
> -+ todo_list->total_nr--;
> + }
> > ++ if (item->command != TODO_COMMENT)
> ++ todo_list->total_nr++;
> ++
> if (fixup_okay)
> ; /* do nothing */
> + else if (is_fixup(item->command))
> @@ sequencer.c: int complete_action(struct repository *r, struct replay_opts *opts, unsigned fla
> - struct todo_list new_todo = TODO_LIST_INIT;
> - struct strbuf *buf = &todo_list->buf, buf2 = STRBUF_INIT;
> - struct object_id oid = onto->object.oid;
> -- int res;
> -+ int new_total_nr, res;
> -
> - find_unique_abbrev_r(shortonto, &oid, DEFAULT_ABBREV);
> -
> -@@ sequencer.c: int complete_action(struct repository *r, struct replay_opts *opts, unsigned fla
> - return error(_("nothing to do"));
> - }
> -
> -+ new_total_nr = todo_list->total_nr - count_commands(todo_list);
> - res = edit_todo_list(r, todo_list, &new_todo, shortrevisions,
> - shortonto, flags);
> - if (res == -1)
> -@@ sequencer.c: int complete_action(struct repository *r, struct replay_opts *opts, unsigned fla
> - return -1;
> - }
> -
> -+ new_total_nr += count_commands(&new_todo);
> -+ new_todo.total_nr = new_total_nr;
> -+
> - /* Expand the commit IDs */
> todo_list_to_strbuf(r, &new_todo, &buf2, -1, 0);
> strbuf_swap(&new_todo.buf, &buf2);
> strbuf_release(&buf2);
> - new_todo.total_nr -= new_todo.nr;
> ++ /* Nothing is done yet, and we're reparsing, so let's reset the count */
> ++ new_todo.total_nr = 0;
> if (todo_list_parse_insn_buffer(r, new_todo.buf.buf, &new_todo) < 0)
> BUG("invalid todo list after expanding IDs:\n%s",
> new_todo.buf.buf);
> @@ t/t3430-rebase-merges.sh: test_expect_success '--rebase-merges with message matc
> +test_expect_success 'progress shows the correct total' '
> + git checkout -b progress H &&
> + git rebase --rebase-merges --force-rebase --verbose A 2> err &&
> -+ grep "^Rebasing.*14.$" err >progress &&
> ++ # Expecting "Rebasing (N/14)" here, no bogus total number
> ++ grep "^Rebasing.*/14.$" err >progress &&
> + test_line_count = 14 progress
> +'
> +
> |
This patch series was integrated into seen via git@c50b864. |
This patch series was integrated into seen via git@1556499. |
This patch series was integrated into seen via git@c4fe1bc. |
This patch series was integrated into seen via git@0817253. |
There was a status update in the "Cooking" section about the branch A few bugs in the sequencer machinery that results in miscounting the steps have been corrected. Will merge to 'next'. source: <pull.1531.v2.git.1683965487.gitgitgadget@gmail.com> |
This patch series was integrated into seen via git@31365c6. |
This patch series was integrated into seen via git@48dad36. |
This patch series was integrated into seen via git@0a4dc1d. |
This patch series was integrated into seen via git@736b9f9. |
This patch series was integrated into seen via git@19770fa. |
This patch series was integrated into next via git@6503a9a. |
This patch series was integrated into seen via git@cacc15e. |
This patch series was integrated into master via git@cacc15e. |
This patch series was integrated into next via git@cacc15e. |
Closed via cacc15e. |
I recently picked up work where I regularly rebase large branch thickets consisting of thousands of commits. During those rebases, I could not fail to notice that the progress initially showed a total number around 2,100, when the actual number of commands was more like 1,850. And indeed, when resuming the rebase after being interrupted due to a
break
command or due to a merge conflict, the progress showed the correct number!So I set out to fix this, stumbling over an incorrect use of
total_nr
in the--update-refs
code, so I fixed that, too.Note: These patches apply cleanly on top of
ds/rebase-update-ref
as well as on top of the currentmain
branch.Changes since v1:
complete_action()
(naming is hard!) is only called at the very beginning of a rebase, and therefore there cannot be any already-done commands in the todo list.Cc: Derrick Stolee derrickstolee@github.com
cc: Phillip Wood phillip.wood123@gmail.com