-
Notifications
You must be signed in to change notification settings - Fork 127
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
midx: reduce memory pressure while writing bitmaps #1292
Conversation
There are issues in commit 9bb826c: |
9bb826c
to
9104bc5
Compare
/submit |
Submitted as pull.1292.git.1658176565751.gitgitgadget@gmail.com To fetch this version into
To fetch this version to local tag
|
On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this): On Mon, Jul 18 2022, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <derrickstolee@github.com>
> [...]
> It is unfortunate that the lifetime of the 'entries' array is less
> clear. To make this simpler, I added a few things to try and prevent an
> accidental reference:
>
> 1. Using FREE_AND_NULL() we will at least get a segfault from reading a
> NULL pointer instead of a use-after-free.
>
> 2. 'entries_nr' is also set to zero to make any loop that would iterate
> over the entries be trivial.
>
> 3. Set the 'ctx' pointer to NULL within write_midx_bitmap() so it does
> not get another reference later. This requires adding a local copy
> of 'pack_order' giving us a reference that we can use later in the
> method.
>
> 4. Add significant comments in write_midx_bitmap() and
> write_midx_internal() to add warnings for future authors who might
> accidentally add references to this cleared memory.
> [...]
> + /*
> + * Remove the ctx.entries to reduce memory pressure.
> + * Nullify 'ctx' to help avoid adding new references to ctx->entries.
> + */
> + FREE_AND_NULL(ctx->entries);
> + ctx->entries_nr = 0;
> + pack_order = ctx->pack_order;
> + ctx = NULL;
After this change this is a ~70 line function, but only 3 lines at the
top actually use ctx for anything:
/* the bug check for ctx.nr... */
prepare_midx_packing_data(&pdata, ctx);
commits = find_commits_for_midx_bitmap(&commits_nr, refs_snapshot, ctx);
Did you consider just splitting it up so that that there's a "prepare
write" function? Then you don't need to worry about the scoping of ctx.
I'd think that would be better, then you also wouldn't need to implement
your own free-ing, nothing after this seems to use ctx->entries_nr (but
I just skimmed it), so it could just fall through to the free() at the
end of write_midx_internal() (the only caller), couldn't it? |
User |
This branch is now known as |
This patch series was integrated into seen via git@b1a51ed. |
The next change will use a const array when calling this method. There is no need for the non-const version, so let's do this cleanup quickly. Signed-off-by: Derrick Stolee <derrickstolee@github.com>
The write_midx_bitmap() method is a long method that does a lot of steps. It requires the write_midx_context struct for use in prepare_midx_packing_data() and find_commits_for_midx_bitmap(), but after that only needs the pack_order array. This is a messy, but completely non-functional refactoring. The code is only being moved around to reduce visibility of the write_midx_context during the longest part of computing reachability bitmaps. Signed-off-by: Derrick Stolee <derrickstolee@github.com>
We noticed that some 'git multi-pack-index write --bitmap' processes were running with very high memory. It turns out that a lot of this memory is required to store a list of every object in the written multi-pack-index, with a second copy that has additional information used for the bitmap writing logic. Using 'valgrind --tool=massif' before this change, the following chart shows how memory load increased and was maintained throughout the process: GB 4.102^ :: | @ @::@@::@@::::::::@::::::@@:#:::::::::::::@@:: : | :::::@@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | :::: :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | :::: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | : :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | : :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | @ :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | @ :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : 0 +---------------------------------------------------------------> It turns out that the 'struct write_midx_context' data is persisting through the life of the process, including the 'entries' array. This array is used last inside find_commits_for_midx_bitmap() within write_midx_bitmap(). If we free (and nullify) the array at that point, we can free a decent chunk of memory before the bitmap logic adds more to the memory footprint. Here is the massif memory load chart after this change: GB 3.111^ # | # :::::::::::@::::::::::::::@ | # ::::::::::::::::::::::::: : :: : @:: ::::: :: ::@ | @# :::::::::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | :::@#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | :: @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | :: @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | :: @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ 0 +---------------------------------------------------------------> The previous change introduced a refactoring of write_midx_bitmap() to make it more clear how much of the 'struct write_midx_context' instance is needed at different parts of the process. In addition, the following defensive programming measures were put in place: 1. Using FREE_AND_NULL() we will at least get a segfault from reading a NULL pointer instead of a use-after-free. 2. 'entries_nr' is also set to zero to make any loop that would iterate over the entries be trivial. 3. Add significant comments in write_midx_internal() to add warnings for future authors who might accidentally add references to this cleared memory. Signed-off-by: Derrick Stolee <derrickstolee@github.com>
9104bc5
to
98e72f7
Compare
On the Git mailing list, Derrick Stolee wrote (reply to this): On 7/18/2022 5:47 PM, Ævar Arnfjörð Bjarmason wrote:
>
> On Mon, Jul 18 2022, Derrick Stolee via GitGitGadget wrote:
>
>> From: Derrick Stolee <derrickstolee@github.com>
>> [...]
>> It is unfortunate that the lifetime of the 'entries' array is less
>> clear. To make this simpler, I added a few things to try and prevent an
>> accidental reference:
>>
>> 1. Using FREE_AND_NULL() we will at least get a segfault from reading a
>> NULL pointer instead of a use-after-free.
>>
>> 2. 'entries_nr' is also set to zero to make any loop that would iterate
>> over the entries be trivial.
>>
>> 3. Set the 'ctx' pointer to NULL within write_midx_bitmap() so it does
>> not get another reference later. This requires adding a local copy
>> of 'pack_order' giving us a reference that we can use later in the
>> method.
>>
>> 4. Add significant comments in write_midx_bitmap() and
>> write_midx_internal() to add warnings for future authors who might
>> accidentally add references to this cleared memory.
>> [...]
>> + /*
>> + * Remove the ctx.entries to reduce memory pressure.
>> + * Nullify 'ctx' to help avoid adding new references to ctx->entries.
>> + */
>> + FREE_AND_NULL(ctx->entries);
>> + ctx->entries_nr = 0;
>> + pack_order = ctx->pack_order;
>> + ctx = NULL;
>
> After this change this is a ~70 line function, but only 3 lines at the
> top actually use ctx for anything:
>
> /* the bug check for ctx.nr... */
> prepare_midx_packing_data(&pdata, ctx);
> commits = find_commits_for_midx_bitmap(&commits_nr, refs_snapshot, ctx);
>
> Did you consider just splitting it up so that that there's a "prepare
> write" function? Then you don't need to worry about the scoping of ctx.
I did not, and that's a good suggestion. Extracting these prepare steps
into write_midx_internal() works to reduce the complexity and make the
early free()ing more clear.
> I'd think that would be better, then you also wouldn't need to implement
> your own free-ing, nothing after this seems to use ctx->entries_nr (but
> I just skimmed it), so it could just fall through to the free() at the
> end of write_midx_internal() (the only caller), couldn't it?
I think this paragraph misunderstands the point. The bitmaps are being
computed and written before the MIDX lock file completes, so the free()
of the entries array is after the bitmaps are computed. To reduce the
memory pressure (by ~25%) by freeing early is the point of this patch.
We still want that free(ctx.entries) after the cleanup: label for the
error cases, but for the "happy path" we can free early.
By doing the refactoring, this point of having an earlier free() makes
things more clear.
Thanks,
-Stolee |
/submit |
Submitted as pull.1292.v2.git.1658244366.gitgitgadget@gmail.com To fetch this version into
To fetch this version to local tag
|
@@ -1053,40 +1053,35 @@ static struct commit **find_commits_for_midx_bitmap(uint32_t *indexed_commits_nr | |||
return cb.commits; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> diff --git a/midx.c b/midx.c
> index e2dd808b35d..772ab7d2944 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -1451,6 +1451,15 @@ static int write_midx_internal(const char *object_dir,
>
> commits = find_commits_for_midx_bitmap(&commits_nr, refs_snapshot, &ctx);
>
> + /*
> + * The previous steps translated the information from
> + * 'entries' into information suitable for constructing
> + * bitmaps. We no longer need that array, so clear it to
> + * reduce memory pressure.
> + */
> + FREE_AND_NULL(ctx.entries);
> + ctx.entries_nr = 0;
> +
> if (write_midx_bitmap(midx_name.buf, midx_hash, &pdata,
> commits, commits_nr, ctx.pack_order,
> refs_snapshot, flags) < 0) {
As the reduced helper, thanks to step [1/3], only takes the
pack_order[] array, without being even aware of other members in the
ctx struct, it is immediately obvious that this early freeing is
safe for this call. It is a bit messy.
I've been staring at the code and was wondering if we can just get
rid of pack_order member from the context, and make pack_order a
separate local variable that belong to this function. The separate
variable needs to be packaged together with ctx back to please the
chunk-format API, so it may require more boilerplate code and may
not be an overall win.
> @@ -1459,6 +1468,10 @@ static int write_midx_internal(const char *object_dir,
> goto cleanup;
> }
> }
> + /*
> + * NOTE: Do not use ctx.entries beyond this point, since it might
> + * have been freed in the previous if block.
> + */
OK.
> if (ctx.m)
> close_object_store(the_repository->objects);
This patch series was integrated into seen via git@134fbc6. |
This patch series was integrated into seen via git@8dc290c. |
There was a status update in the "New Topics" section about the branch The codepath to write multi-pack index has been taught to release a large chunk of memory that holds an array of objects in the packs, as soon as it is done with the array, to reduce memory consumption. Will merge to 'next'? source: <pull.1292.v2.git.1658244366.gitgitgadget@gmail.com> |
This patch series was integrated into seen via git@96374cc. |
This patch series was integrated into seen via git@19229e3. |
This patch series was integrated into next via git@250d257. |
We noticed an instance where writing multi-pack-index bitmaps was taking a lot of memory. This small change can reduce the memory pressure slightly (~25%), but more will be needed to significantly reduce the memory pressure. Such a change would require updating the bitmap writing code to use on-disk data structures instead. This is particularly tricky when the multi-pack-index has not been fully written, because we don't want a point in time where the object store has a new multi-pack-index without a reachability bitmap.
Updates in v2
To reduce confusion on the lifetime of 'ctx.entries', some refactoring patches are inserted to first extract the use of 'ctx' out of write_midx_bitmap() and into write_midx_internal(). This makes the FREE_AND_NULL() stand out more clearly.
Thanks,
-Stolee
cc: gitster@pobox.com
cc: me@ttaylorr.com
cc: vdye@github.com
cc: chakrabortyabhradeep79@gmail.com
cc: Ævar Arnfjörð Bjarmason avarab@gmail.com