This fixes a minor memory leak (detected by LeakSanitizer) in git merge #1577
40a55c3 to 64b00e4
Welcome to GitGitGadget
Hi @kevinbackhouse, and welcome to GitGitGadget, the GitHub App to send patch series to the Git mailing list from GitHub Pull Requests.
Please make sure that your Pull Request has a good description, as it will be used as cover letter.
You can CC potential reviewers by adding a footer to the PR description with the following syntax:
Also, it is a good idea to review the commit messages one last time, as the Git project expects them in a quite specific form:
It is in general a good idea to await the automated test ("Checks") in this Pull Request before contributing the patches, e.g. to avoid trivial issues such as unportable code.
Contributing the patches
Before you can contribute the patches, your GitHub username needs to be added to the list of permitted users. Any already-permitted user can do that, by adding a comment to your PR of the form
Both the person who commented
An alternative is the channel
Once on the list of permitted usernames, you can contribute the patches to the Git mailing list by adding a PR comment
If you want to see what email(s) would be sent for a
After you submit, GitGitGadget will respond with another comment that contains the link to the cover letter mail in the Git mailing list archive. Please make sure to monitor the discussion in that thread and to address comments and suggestions (while the comments and suggestions will be mirrored into the PR by GitGitGadget, you will still want to reply via mail).
If you do not want to subscribe to the Git mailing list just to be able to respond to a mail, you can download the mbox from the Git mailing list archive (click the
curl -g --user "<EMailAddress>:<Password>" \
    --url "imaps://imap.gmail.com/INBOX" -T /path/to/raw.txt
To iterate on your change, i.e. send a revised patch or patch series, you will first want to (force-)push to the same branch. You probably also want to modify your Pull Request description (or title). It is a good idea to summarize the revision by adding something like this to the cover letter (read: by editing the first comment on the PR, i.e. the PR description):
To send a new iteration, just add another PR comment with the contents:
Need help?
New contributors who want advice are encouraged to join git-mentoring@googlegroups.com, where volunteers who regularly contribute to Git are willing to answer newbie questions, give advice, or otherwise provide mentoring to interested contributors. You must join in order to post or view messages, but anyone can join.
You may also be able to find help in real time in the developer IRC channel,
/allow
Error: User kevinbackhouse is not yet permitted to use GitGitGadget
/allow
User kevinbackhouse is now allowed to use GitGitGadget.
/submit
Submitted as pull.1577.git.1692389061490.gitgitgadget@gmail.com
To fetch this version into
To fetch this version to local tag
On the Git mailing list, Junio C Hamano wrote (reply to this):
"Kevin Backhouse via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Kevin Backhouse <kevinbackhouse@github.com>
>
> To reproduce (with an ASAN build):
>
> ```
> mkdir test
> cd test
> git init
> echo x > x.txt
> git add .
> git commit -m "WIP"
> git checkout -b dev
> echo y > x.txt
> git add .
> git commit -m "WIP"
> git checkout main
> echo z > x.txt
> git add .
> git commit -m "WIP"
> echo a > x.txt
> git add .
> git merge dev
> ```
We'd rather not see the above in the proposed log message; can't
we add (a variation of) it to our test suite?
> The fix is to call free_commit_list(merge_bases) when an error occurs.
We usually have the description of what the problem is and give an
analysis on why/how it happens, before presenting a solution. Write
it more like:
The caller of merge_ort_recursive() expects the commit list
passed in as the merge_bases parameter to be fully consumed by
the function and does not free it when the function returns. In
normal cases, the commit list does get consumed, but when the
function returns early upon encountering an error, it forgets to
clean it up.
Fix this by freeing the list in the code paths for error returns.
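The ownership rule in the suggested log message can be sketched in isolation. The following is a minimal, self-contained illustration and not Git's code: `struct node`, `push()`, `free_list()`, `consume_bases()` and `demo_ownership()` are hypothetical stand-ins (the list type plays the role of `struct commit_list`, and `free_list()` the role of `free_commit_list()`). It only shows that a function which takes ownership of a list has to release it on the early error return as well as on the normal path.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a commit list; not Git's real type. */
struct node {
	int item;
	struct node *next;
};

static int live_nodes; /* nodes allocated and not yet freed */

static struct node *push(struct node *head, int item)
{
	struct node *n = malloc(sizeof(*n));
	if (!n)
		abort();
	n->item = item;
	n->next = head;
	live_nodes++;
	return n;
}

/* Plays the role of free_commit_list(): frees every node in the list. */
static void free_list(struct node *head)
{
	while (head) {
		struct node *next = head->next;
		free(head);
		live_nodes--;
		head = next;
	}
}

/*
 * Takes ownership of `bases`: the caller never frees it, so every
 * return path, including the early error return, must release it.
 */
static int consume_bases(struct node *bases, int precondition_failed)
{
	if (precondition_failed) {
		free_list(bases); /* without this call the list would leak */
		return -1;
	}
	/* normal path: the list is used here, then released */
	free_list(bases);
	return 0;
}

/* Runs both paths and returns the number of nodes still allocated. */
static int demo_ownership(void)
{
	consume_bases(push(push(NULL, 1), 2), 0); /* success path */
	consume_bases(push(push(NULL, 3), 4), 1); /* error path */
	return live_nodes;
}
```

If the `free_list()` call in the error branch is deleted, `demo_ownership()` reports two outstanding nodes, which is the kind of leak LeakSanitizer flags.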
> merge-ort-wrappers.c | 4 +++-
> merge-ort.c | 4 +++-
These two places and their fixes seem OK, but I have to wonder if
these are complete fixes.
> diff --git a/merge-ort-wrappers.c b/merge-ort-wrappers.c
> index 4acedf3c338..aeb56c9970c 100644
> --- a/merge-ort-wrappers.c
> +++ b/merge-ort-wrappers.c
> @@ -54,8 +54,10 @@ int merge_ort_recursive(struct merge_options *opt,
> struct tree *head = repo_get_commit_tree(opt->repo, side1);
> struct merge_result tmp;
>
> - if (unclean(opt, head))
> + if (unclean(opt, head)) {
> + free_commit_list(merge_bases);
> return -1;
> + }
>
> memset(&tmp, 0, sizeof(tmp));
> merge_incore_recursive(opt, merge_bases, side1, side2, &tmp);
The function before this hunk appears to have very similar code
structure. Does it need the same fix, or if not why not?
> diff --git a/merge-ort.c b/merge-ort.c
> index 8631c997002..a0eb91fb011 100644
> --- a/merge-ort.c
> +++ b/merge-ort.c
> @@ -5070,8 +5070,10 @@ static void merge_ort_internal(struct merge_options *opt,
> opt->branch1 = "Temporary merge branch 1";
> opt->branch2 = "Temporary merge branch 2";
> merge_ort_internal(opt, NULL, prev, next, result);
> - if (result->clean < 0)
> + if (result->clean < 0) {
> + free_commit_list(merge_bases);
> return;
> + }
Before this function, there is a comment that this came from another
function and it seems to still have a very similar code structure.
Does the other function need the same fix, or if not why not?
Thanks.
64b00e4 to 7f06cf2
There are issues in commit 7d1202b:
7f06cf2 to edf3107
There are issues in commit d751eee:
Signed-off-by: Kevin Backhouse <kevinbackhouse@github.com>
edf3107 to db220a3
Signed-off-by: Kevin Backhouse <kevinbackhouse@github.com>
db220a3 to e81af34
Signed-off-by: Kevin Backhouse <kevinbackhouse@github.com>
0466052 to 7fff1ca
There are issues in commit 7fff1ca:
7fff1ca to f5af0c7
Signed-off-by: Kevin Backhouse <kevinbackhouse@github.com>
The callers of merge_recursive() and merge_ort_recursive() expects the commit list passed in as the merge_bases parameter to be fully consumed by the function and does not free it when the function returns. In normal cases, the commit list does get consumed, but when the function returns early upon encountering an error, it forgets to clean it up.
Fix this by freeing the list in the code paths for error returns.
Signed-off-by: Kevin Backhouse <kevinbackhouse@github.com>
f5af0c7 to 353e196
/submit
Submitted as pull.1577.v2.git.1692886365.gitgitgadget@gmail.com
To fetch this version into
To fetch this version to local tag
@@ -0,0 +1,40 @@
#!/bin/sh
On the Git mailing list, Junio C Hamano wrote (reply to this):
"Kevin Backhouse via GitGitGadget" <gitgitgadget@gmail.com> writes:
> Subject: Re: [PATCH v2 1/2] Regression test for https://github.com/gitgitgadget/git/pull/1577
We try to come up with titles that are helpful to readers when seen
in "git shortlog --since=6.months --no-merges", and the above does
not exactly do that.
> From: Kevin Backhouse <kevinbackhouse@github.com>
>
> Signed-off-by: Kevin Backhouse <kevinbackhouse@github.com>
> ---
> t/t9904-merge-leak.sh | 40 ++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 40 insertions(+)
> create mode 100755 t/t9904-merge-leak.sh
>
> diff --git a/t/t9904-merge-leak.sh b/t/t9904-merge-leak.sh
> new file mode 100755
> index 00000000000..09a4474fd73
> --- /dev/null
> +++ b/t/t9904-merge-leak.sh
> @@ -0,0 +1,40 @@
> +#!/bin/sh
> +#
> +
> +test_description='regression test for memory leak in git merge'
> +
> +GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
> +export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
> +
> +. ./lib-bash.sh
> +
> +# test-lib.sh disables LeakSanitizer by default, but we want it enabled
> +# for this test
> +ASAN_OPTIONS=
> +export ASAN_OPTIONS
You do not want to do this.
We have CI jobs that run everybody under asan, ubsan etc., so it is
sufficient and much more preferable to just add a reproduction
recipe to an _existing_ test that is about "git merge" (or if we
have "ort" specific one, "git merge -s ort"). Of course they would
not fail in jobs that do not enable asan, and that is expected and
perfectly OK.
Also, please check Documentation/CodingGuidelines for shell style
issues.
> +. "$GIT_BUILD_DIR/contrib/completion/git-prompt.sh"
Is this about testing prompts, or does the bug/leak appear only when
the prompt support is in use? Could you explain why this is needed?
> +test_expect_success 'Merge fails due to local changes' '
> + git init &&
> + echo x > x.txt &&
> + git add . &&
> + git commit -m "WIP" &&
> + git checkout -b dev &&
> + echo y > x.txt &&
> + git add . &&
> + git commit -m "WIP" &&
> + git checkout main &&
> + echo z > x.txt &&
> + git add . &&
> + git commit -m "WIP" &&
> + echo a > x.txt &&
> + git add . &&
> + echo "error: ''Your local changes to the following files would be overwritten by merge:''" >expected &&
> + echo " x.txt" >>expected &&
> + echo "Merge with strategy ort failed." >>expected &&
> + test_must_fail git merge -s ort dev 2>actual &&
> + test_cmp expected actual
> +'
If this 1/2 adds a new test that is expected to fail without leak
fix, which has to wait until 2/2, it breaks the bisection. In this
case, since it will be a simple addition to an existing test script,
having both tests and code changes in a single patch is the most
appropriate.
Thank you for working on this.
> +
> +test_done
@@ -54,8 +54,10 @@ int merge_ort_recursive(struct merge_options *opt,
struct tree *head = repo_get_commit_tree(opt->repo, side1);
On the Git mailing list, Junio C Hamano wrote (reply to this):
"Kevin Backhouse via GitGitGadget" <gitgitgadget@gmail.com> writes:
> Subject: Re: [PATCH v2 2/2] Fix minor memory leak found by LeakSanitizer.
Continuing the review for the previous step, perhaps
Subject: [PATCH] merge: free list of merge bases upon failure
or something?
> From: Kevin Backhouse <kevinbackhouse@github.com>
>
> The callers of merge_recursive() and merge_ort_recursive() expects the
"expects" -> "expect"
> commit list passed in as the merge_bases parameter to be fully
> consumed by the function and does not free it when the function
"does not" -> "do not".
> returns. In normal cases, the commit list does get consumed, but when
> the function returns early upon encountering an error, it forgets to
> clean it up.
>
> Fix this by freeing the list in the code paths for error returns.
>
> Signed-off-by: Kevin Backhouse <kevinbackhouse@github.com>
> ---
Well written to be understandable. Nicely done.
> merge-ort-wrappers.c | 4 +++-
> merge-ort.c | 4 +++-
> merge-recursive.c | 32 ++++++++++++++++++++++----------
> 3 files changed, 28 insertions(+), 12 deletions(-)
>
> diff --git a/merge-ort-wrappers.c b/merge-ort-wrappers.c
> index 4acedf3c338..aeb56c9970c 100644
> --- a/merge-ort-wrappers.c
> +++ b/merge-ort-wrappers.c
> @@ -54,8 +54,10 @@ int merge_ort_recursive(struct merge_options *opt,
> struct tree *head = repo_get_commit_tree(opt->repo, side1);
> struct merge_result tmp;
>
> - if (unclean(opt, head))
> + if (unclean(opt, head)) {
> + free_commit_list(merge_bases);
> return -1;
> + }
OK.
> diff --git a/merge-ort.c b/merge-ort.c
> index 8631c997002..a0eb91fb011 100644
> --- a/merge-ort.c
> +++ b/merge-ort.c
> @@ -5070,8 +5070,10 @@ static void merge_ort_internal(struct merge_options *opt,
> opt->branch1 = "Temporary merge branch 1";
> opt->branch2 = "Temporary merge branch 2";
> merge_ort_internal(opt, NULL, prev, next, result);
> - if (result->clean < 0)
> + if (result->clean < 0) {
> + free_commit_list(merge_bases);
> return;
> + }
OK.
> diff --git a/merge-recursive.c b/merge-recursive.c
> index 6a4081bb0f5..49e54d3722f 100644
> --- a/merge-recursive.c
> +++ b/merge-recursive.c
> @@ -3652,14 +3652,18 @@ static int merge_recursive_internal(struct merge_options *opt,
> opt->branch1 = "Temporary merge branch 1";
> opt->branch2 = "Temporary merge branch 2";
> if (merge_recursive_internal(opt, merged_merge_bases, iter->item,
> - NULL, &merged_merge_bases) < 0)
> - return -1;
> + NULL, &merged_merge_bases) < 0) {
> + clean = -1;
> + goto out;
> + }
> opt->branch1 = saved_b1;
> opt->branch2 = saved_b2;
> opt->priv->call_depth--;
>
> - if (!merged_merge_bases)
> - return err(opt, _("merge returned no commit"));
> + if (!merged_merge_bases) {
> + clean = err(opt, _("merge returned no commit"));
> + goto out;
> + }
> }
>
> /*
> @@ -3682,8 +3686,11 @@ static int merge_recursive_internal(struct merge_options *opt,
> repo_get_commit_tree(opt->repo,
> merged_merge_bases),
> &result_tree);
> +
> +out:
> strbuf_release(&merge_base_abbrev);
> opt->ancestor = NULL; /* avoid accidental re-use of opt->ancestor */
> + free_commit_list(merge_bases);
> if (clean < 0) {
> flush_output(opt);
> return clean;
Hmph, so the proposed log message made it sound like the merge_bases
list is consumed fully in the normal non-error case, but even the
normal case was leaky on the "-s recursive" side? Or was the
recursive side OK and the caller had different expectations, in
which case we may be breaking them, but you poked at these codepaths
long enough to produce this patch, so I doubt it. The proposed log
message needs to be updated to explain the findings on this side,
too, if the situation is different from the "ort" side.
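The restructuring in the merge_recursive_internal() hunk above follows the common single-exit cleanup idiom: error paths set the return value and jump to one label, so the release code exists in exactly one place. A minimal sketch of that idiom, under the assumption that it captures the intent of the hunk; `grab()`, `drop()`, `run_steps()` and `demo_goto_out()` are hypothetical names, not Git functions.

```c
#include <assert.h>
#include <stdlib.h>

static int outstanding; /* buffers allocated but not yet freed */

static char *grab(size_t n)
{
	char *p = malloc(n);
	if (!p)
		abort();
	outstanding++;
	return p;
}

static void drop(char *p)
{
	if (p) {
		free(p);
		outstanding--;
	}
}

/*
 * Hypothetical worker: `fail_at` selects which step reports an error
 * (0 means no step fails). Every path leaves through the `out` label.
 */
static int run_steps(int fail_at)
{
	int clean = 0;
	char *bases = grab(16); /* resource this function owns */
	char *scratch = NULL;

	if (fail_at == 1) {
		clean = -1;
		goto out;
	}
	scratch = grab(16);
	if (fail_at == 2) {
		clean = -1;
		goto out;
	}
	clean = 1;
out:
	/* single cleanup block shared by success and error paths */
	drop(scratch);
	drop(bases);
	return clean;
}

/* Returns the outstanding buffer count, or -999 on unexpected results. */
static int demo_goto_out(void)
{
	int a = run_steps(0);
	int b = run_steps(1);
	int c = run_steps(2);

	if (a != 1 || b != -1 || c != -1)
		return -999;
	return outstanding;
}
```

Because the cleanup also runs on the success path, the sketch frees `bases` in the normal case too, which mirrors the question raised above about whether the normal path previously released the list.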
> @@ -3729,6 +3736,9 @@ static int merge_start(struct merge_options *opt, struct tree *head)
> assert(!opt->record_conflict_msgs_as_headers);
> assert(!opt->msg_header_prefix);
>
> + CALLOC_ARRAY(opt->priv, 1);
> + string_list_init_dup(&opt->priv->df_conflict_file_set);
This move, what it does, why it is needed, and what breaks without
it, is not explained in the proposed log message.
> /* Sanity check on repo state; index must match head */
> if (repo_index_has_changes(opt->repo, head, &sb)) {
> err(opt, _("Your local changes to the following files would be overwritten by merge:\n %s"),
> @@ -3737,16 +3747,13 @@ static int merge_start(struct merge_options *opt, struct tree *head)
> return -1;
> }
>
> - CALLOC_ARRAY(opt->priv, 1);
> - string_list_init_dup(&opt->priv->df_conflict_file_set);
> return 0;
> }
> static void merge_finalize(struct merge_options *opt)
> {
> flush_output(opt);
> - if (!opt->priv->call_depth && opt->buffer_output < 2)
> - strbuf_release(&opt->obuf);
> + strbuf_release(&opt->obuf);
Ditto. Unconditional release here may help the new caller in
merge_trees() that failed merge_start(), but is the change safe for
other existing callers and if so why/how?
In any case, this needs a review by somebody more familiar with the
recursive backend machinery than myself. Any takers?
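One plausible reading of the two changes questioned above, which the proposed log message does not confirm: once merge_finalize() is called after a failed merge_start(), anything merge_finalize() dereferences (such as opt->priv, visible in the quoted context) has to be allocated before merge_start() can return early. A minimal sketch of that ordering constraint; `struct options`, `struct priv_state`, `start()`, `finalize()` and `demo_start_finalize()` are hypothetical and do not reproduce Git's structures.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical private state, standing in for opt->priv. */
struct priv_state {
	int needed_limit;
};

struct options {
	struct priv_state *priv;
};

/* Allocates private state first, then runs a check that may fail. */
static int start(struct options *opt, int index_dirty)
{
	opt->priv = calloc(1, sizeof(*opt->priv));
	if (!opt->priv)
		abort();
	if (index_dirty)
		return -1; /* early return; priv is already valid */
	return 0;
}

/* Cleanup that reads through priv, so priv must not be NULL here. */
static int finalize(struct options *opt)
{
	int limit = opt->priv->needed_limit;

	free(opt->priv);
	opt->priv = NULL;
	return limit;
}

/* Returns 0 when the failure path cleaned up without touching NULL. */
static int demo_start_finalize(void)
{
	struct options opt = { NULL };

	if (start(&opt, 1)) {
		finalize(&opt); /* safe only because start() allocated first */
		return opt.priv == NULL ? 0 : 1;
	}
	finalize(&opt);
	return 2;
}
```

If the allocation in `start()` came after the early return, as in the pre-patch ordering, the `finalize()` call on the failure path would dereference a NULL pointer.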
> if (show(opt, 2))
> diff_warn_rename_limit("merge.renamelimit",
> opt->priv->needed_rename_limit, 0);
> @@ -3763,8 +3770,10 @@ int merge_trees(struct merge_options *opt,
>
> assert(opt->ancestor != NULL);
>
> - if (merge_start(opt, head))
> + if (merge_start(opt, head)) {
> + merge_finalize(opt);
> return -1;
> + }
> clean = merge_trees_internal(opt, head, merge, merge_base, &ignored);
> merge_finalize(opt);
>
> @@ -3785,8 +3794,11 @@ int merge_recursive(struct merge_options *opt,
> prepare_repo_settings(opt->repo);
> opt->repo->settings.command_requires_full_index = 1;
>
> - if (merge_start(opt, repo_get_commit_tree(opt->repo, h1)))
> + if (merge_start(opt, repo_get_commit_tree(opt->repo, h1))) {
> + free_commit_list(merge_bases);
> + merge_finalize(opt);
> return -1;
> + }
I suspect that the way leaks happen is different between "ort" and
"recursive", and what is in the proposed log message may have been
the right description of the problem back when the patch was only
about fixing "ort" but no longer is sufficient now that we also fix
the "recursive" side.
> clean = merge_recursive_internal(opt, h1, h2, merge_bases, result);
> merge_finalize(opt);
Hmph, but this does expect merge_bases is consumed in normal
codepath. Now I am confused, sorry.
Thanks for working on this.
On the Git mailing list, Elijah Newren wrote (reply to this):
On Fri, Aug 18, 2023 at 2:41 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Kevin Backhouse via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > From: Kevin Backhouse <kevinbackhouse@github.com>
> >
> > To reproduce (with an ASAN build):
> >
> > ```
> > mkdir test
> > cd test
> > git init
> > echo x > x.txt
> > git add .
> > git commit -m "WIP"
> > git checkout -b dev
> > echo y > x.txt
> > git add .
> > git commit -m "WIP"
> > git checkout main
> > echo z > x.txt
> > git add .
> > git commit -m "WIP"
> > echo a > x.txt
> > git add .
> > git merge dev
> > ```
>
> We'd rather not see the above in the proposed log message; can't
> we add (a variation of) it to our test suite?
>
> > The fix is to call free_commit_list(merge_bases) when an error occurs.
>
> We usually have the description of what the problem is and give an
> analysis on why/how it happens, before presenting a solution. Write
> it more like:
>
> The caller of merge_ort_recursive() expects the commit list
> passed in as the merge_bases parameter to be fully consumed by
> the function and does not free it when the function returns. In
> normal cases, the commit list does get consumed, but when the
> function returns early upon encountering an error, it forgets to
> clean it up.
>
> Fix this by freeing the list in the code paths for error returns.
>
> > merge-ort-wrappers.c | 4 +++-
> > merge-ort.c | 4 +++-
>
> These two places and their fixes seem OK, but I have to wonder if
> these are complete fixes.
>
> > diff --git a/merge-ort-wrappers.c b/merge-ort-wrappers.c
> > index 4acedf3c338..aeb56c9970c 100644
> > --- a/merge-ort-wrappers.c
> > +++ b/merge-ort-wrappers.c
> > @@ -54,8 +54,10 @@ int merge_ort_recursive(struct merge_options *opt,
> > struct tree *head = repo_get_commit_tree(opt->repo, side1);
> > struct merge_result tmp;
> >
> > - if (unclean(opt, head))
> > + if (unclean(opt, head)) {
> > + free_commit_list(merge_bases);
> > return -1;
> > + }
> >
> > memset(&tmp, 0, sizeof(tmp));
> > merge_incore_recursive(opt, merge_bases, side1, side2, &tmp);
>
> The function before this hunk appears to have very similar code
> structure. Does it need the same fix, or if not why not?
>
> > diff --git a/merge-ort.c b/merge-ort.c
> > index 8631c997002..a0eb91fb011 100644
> > --- a/merge-ort.c
> > +++ b/merge-ort.c
> > @@ -5070,8 +5070,10 @@ static void merge_ort_internal(struct merge_options *opt,
> > opt->branch1 = "Temporary merge branch 1";
> > opt->branch2 = "Temporary merge branch 2";
> > merge_ort_internal(opt, NULL, prev, next, result);
> > - if (result->clean < 0)
> > + if (result->clean < 0) {
> > + free_commit_list(merge_bases);
> > return;
> > + }
>
> Before this function, there is a comment that this came from another
> function and it seems to still have a very similar code structure.
> Does the other function need the same fix, or if not why not?
The other function would need a more involved fix, which would
basically involve porting a59b8dd94f (merge-ort: fix memory leak in
merge_ort_internal(), 2022-01-20) to merge-recursive as a preparatory
step. This particular cleanup cannot be ported in its current form to
merge-recursive.c until then.
Hi Junio,
Thank you for your comments. As you suggested, I have added similar fixes in merge-recursive.c and updated the commit message. I have also added a test.
Thanks,
Kev
cc: Elijah Newren <newren@gmail.com>