-
Notifications
You must be signed in to change notification settings - Fork 132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
midx: apply gitconfig to midx repack #626
Conversation
Welcome to GitGitGadgetHi @sluongng, and welcome to GitGitGadget, the GitHub App to send patch series to the Git mailing list from GitHub Pull Requests. Please make sure that your Pull Request has a good description, as it will be used as cover letter. Also, it is a good idea to review the commit messages one last time, as the Git project expects them in a quite specific form:
It is in general a good idea to await the automated test ("Checks") in this Pull Request before contributing the patches, e.g. to avoid trivial issues such as unportable code. Contributing the patchesBefore you can contribute the patches, your GitHub username needs to be added to the list of permitted users. Any already-permitted user can do that, by adding a comment to your PR of the form Both the person who commented An alternative is the channel
Once on the list of permitted usernames, you can contribute the patches to the Git mailing list by adding a PR comment If you want to see what email(s) would be sent for a After you submit, GitGitGadget will respond with another comment that contains the link to the cover letter mail in the Git mailing list archive. Please make sure to monitor the discussion in that thread and to address comments and suggestions (while the comments and suggestions will be mirrored into the PR by GitGitGadget, you will still want to reply via mail). If you do not want to subscribe to the Git mailing list just to be able to respond to a mail, you can download the mbox from the Git mailing list archive (click the curl -g --user "<EMailAddress>:<Password>" \
--url "imaps://imap.gmail.com/INBOX" -T /path/to/raw.txt To iterate on your change, i.e. send a revised patch or patch series, you will first want to (force-)push to the same branch. You probably also want to modify your Pull Request description (or title). It is a good idea to summarize the revision by adding something like this to the cover letter (read: by editing the first comment on the PR, i.e. the PR description):
To send a new iteration, just add another PR comment with the contents: Need help?New contributors who want advice are encouraged to join git-mentoring@googlegroups.com, where volunteers who regularly contribute to Git are willing to answer newbie questions, give advice, or otherwise provide mentoring to interested contributors. You must join in order to post or view messages, but anyone can join. You may also be able to find help in real time in the developer IRC channel, |
There is an issue in commit 87b41e08dbb9cdd9093b71f8103984e76fd1d914: |
a89952d
to
215c882
Compare
I tried the IRC but it largely inactive / AFK @dscho I recall you are a bit active checking the PRs in this project. Could you please /allow me? 🙇 |
/allow |
User sluongng is now allowed to use GitGitGadget. |
/preview |
Preview email sent as pull.626.git.1588682549235.gitgitgadget@gmail.com |
/preview |
Preview email sent as pull.626.git.1588683676309.gitgitgadget@gmail.com |
/submit |
Submitted as pull.626.git.1588684003766.gitgitgadget@gmail.com |
On the Git mailing list, Derrick Stolee wrote (reply to this):
|
On the Git mailing list, Son Luong Ngoc wrote (reply to this):
|
On the Git mailing list, Son Luong Ngoc wrote (reply to this):
|
215c882
to
4e855e6
Compare
/preview |
Preview email sent as pull.626.v2.git.1588757046.gitgitgadget@gmail.com |
4e855e6
to
9c8623b
Compare
/preview |
Preview email sent as pull.626.v2.git.1588757619.gitgitgadget@gmail.com |
9c8623b
to
3d7b334
Compare
/preview |
Preview email sent as pull.626.v2.git.1588758050.gitgitgadget@gmail.com |
/submit |
Submitted as pull.626.v2.git.1588758194.gitgitgadget@gmail.com |
midx.c
Outdated
@@ -1369,6 +1369,8 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size, | |||
struct child_process cmd = CHILD_PROCESS_INIT; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Derrick Stolee wrote (reply to this):
On 5/6/2020 5:43 AM, Son Luong Ngoc via GitGitGadget wrote:
> From: Son Luong Ngoc <sluongng@gmail.com>
...
> - `repack.writeBitmaps` when `--batch-size=0` was NOT adopted here as it
> requires `--all` to be passed onto `git pack-objects`, which is very
> slow. I think it would be nice to have this in a future patch.
Just my two cents here: the reachability bitmaps are really tied to the
idea of a single pack right now. To create bitmaps, I would currently
suggest using the 'git repack' builtin with the proper options. That
command deletes the multi-pack-index, unfortunately, but it also produces
a single pack and deletes the others (when creating bitmaps).
You are right that the `--all` option required to pack-objects is not
appropriate to add inside `git multi-pack-index repack` as that changes
the pattern. It requires loading all reachable objects, even if they are
not already in packs covered by the multi-pack-index. This at minimum
violates expectations with the --batch-size argument.
Integrating reachability bitmaps more closely with the multi-pack-index
is certainly on our radar, but is a large endeavor.
This new patch looks good to me.
Thanks,
-Stolee
midx.c
Outdated
@@ -1369,6 +1369,8 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size, | |||
struct child_process cmd = CHILD_PROCESS_INIT; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
"Son Luong Ngoc via GitGitGadget" <gitgitgadget@gmail.com> writes:
> Multi-Pack-Index repack is an incremental, repack solutions
> that allows user to consolidate multiple packfiles in a non-disruptive
> way. However the new packfile could be created without some of the
> capabilities of a packfile that is created by calling `git repack`.
It may be clear to you who wrote the patch, but it is quite unclear
to readers how `repack` gets into the picture. The first sentence
talks about what "git multi-pack-index repack" subcommand. Unless
you mention that that "git multi-pack-index repack" subcommand calls
"git repack" under the hood in order to create a new packfile, the
second paragraph can be read as if you are pointing out a problem if
the user did
$ git multi-pack-index repack
$ git repack
and the explicit "repack" initiated by the user may create a
packfile that is somehow incompatible with what the previous repack
wanted to do, or something like that.
> This is because with `git repack`, there are configuration that would
> enable different flags to be passed down to `git pack-objects` plumbing.
And this does not help to clear the possible confusion, either.
I think all of the above is clearer if you rewrite the above
(including the title) like so:
midx: teach "git multi-pack-index repack" honor "git repack" configuration
When the "repack" subcommand of "git multi-pack-index" command
creates new packfile(s), it does not call the "git repack"
command but instead directly calls the "git pack-objects"
command, and the configuration variables meant for the "git
repack" command, like "repack.usedaeltabaseoffset", are ignored.
Now the problem description is behind us, let's see the description
of proposed solution. We write this part in imperative mood, as if
we are giving an order to the codebase to "become like so". We do
not say "I do X, I do Y".
> In this patch, I applies those flags into `git multi-pack-index repack`
> so that it respect the `repack.*` config series.
Check the configuration variables used by "git repack" ourselves
and pass the corresponding options to underlying "git pack-objects"
in this codepath.
> Note:
> - `repack.packKeptObjects` will be addressed by Derrick Stolee in
> the following patch
This definitely does not belong to the commit log message. It would
make a helpful note meant for the reviewers if written below the
three-dash line, though.
> - `repack.writeBitmaps` when `--batch-size=0` was NOT adopted here as it
> requires `--all` to be passed onto `git pack-objects`, which is very
> slow. I think it would be nice to have this in a future patch.
The phrasing makes it hard to grok. Do you want to say that the
repack.writeBitmaps configuration variable is ignored?
I think Derrick gave you the reason why bitmaps is not compatible
with midx in general, and that would be a better rationale to record
why the configuration is ignored. Perhaps like
Note that `repack.writeBitmaps` configuration is ignored, as the
pack bitmap faciility is useful only with a single packfile.
or something like that?
Do we need to worry about the configuration variables understood by
the "git pack-objects" command to get in the way, by the way?
"pack.packsizelimit" may cause "git repack" to produce more than one
packfile, and if this codepath wants to avoid it (I do not know if
that is the case), it may have to override it from the command line,
for example.
> Signed-off-by: Son Luong Ngoc <sluongng@gmail.com>
> ---
> midx.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/midx.c b/midx.c
> index 9a61d3b37d9..3348f8e569b 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -1369,6 +1369,8 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
> struct child_process cmd = CHILD_PROCESS_INIT;
> struct strbuf base_name = STRBUF_INIT;
> struct multi_pack_index *m = load_multi_pack_index(object_dir, 1);
> + int delta_base_offset = 1;
By default we use delta-base-offset, so if repo_config_get_bool()
did not see the repack.usedeltabaseoffset configuration defined in
any configuration file, we still want to see 1 after it returns.
> + int use_delta_islands;
What is the reason why it is safe to leave this uninitialized? Did
you mean
int use_delta_islands = 0;
here?
> @@ -1381,12 +1383,20 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
> } else if (fill_included_packs_all(m, include_pack))
> goto cleanup;
>
> + repo_config_get_bool(r, "repack.usedeltabaseoffset", &delta_base_offset);
> + repo_config_get_bool(r, "repack.usedeltaislands", &use_delta_islands);
> +
> argv_array_push(&cmd.args, "pack-objects");
>
> strbuf_addstr(&base_name, object_dir);
> strbuf_addstr(&base_name, "/pack/pack");
> argv_array_push(&cmd.args, base_name.buf);
>
> + if (delta_base_offset)
> + argv_array_push(&cmd.args, "--delta-base-offset");
> + if (use_delta_islands)
> + argv_array_push(&cmd.args, "--delta-islands");
> +
These look like good changes.
> if (flags & MIDX_PROGRESS)
> argv_array_push(&cmd.args, "--progress");
> else
Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Son Luong Ngoc wrote (reply to this):
Hi Junio,
Thanks for the feedbacks
> On May 6, 2020, at 19:03, Junio C Hamano <gitster@pobox.com> wrote:
...
> We write this part in imperative mood, as if
> we are giving an order to the codebase to "become like so". We do
> not say "I do X, I do Y".
This is a great feedback.
I will try to include all of your suggestions and edit the message
before submitting V3.
>> Note:
>> - `repack.packKeptObjects` will be addressed by Derrick Stolee in
>> the following patch
>
> This definitely does not belong to the commit log message. It would
> make a helpful note meant for the reviewers if written below the
> three-dash line, though.
Duly noted.
> Do we need to worry about the configuration variables understood by
> the "git pack-objects" command to get in the way, by the way?
> "pack.packsizelimit" may cause "git repack" to produce more than one
> packfile, and if this codepath wants to avoid it (I do not know if
> that is the case), it may have to override it from the command line,
> for example.
I dont think we want to avoid the packsizelimit here.
The point of repacking with midx is to help
end users consolidate multiple packfile in a non-disruptive way.
If you wish to put a constraint (i.e. packsizelimit, packKeptObjects) on this process,
you should be able to.
>> Signed-off-by: Son Luong Ngoc <sluongng@gmail.com>
>> ---
>> midx.c | 10 ++++++++++
>> 1 file changed, 10 insertions(+)
>>
>> diff --git a/midx.c b/midx.c
>> index 9a61d3b37d9..3348f8e569b 100644
>> --- a/midx.c
>> +++ b/midx.c
>> @@ -1369,6 +1369,8 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
>> struct child_process cmd = CHILD_PROCESS_INIT;
>> struct strbuf base_name = STRBUF_INIT;
>> struct multi_pack_index *m = load_multi_pack_index(object_dir, 1);
>> + int delta_base_offset = 1;
>
> By default we use delta-base-offset, so if repo_config_get_bool()
> did not see the repack.usedeltabaseoffset configuration defined in
> any configuration file, we still want to see 1 after it returns.
>
>> + int use_delta_islands;
>
> What is the reason why it is safe to leave this uninitialized? Did
> you mean
>
> int use_delta_islands = 0;
>
> here?
I think I totally misread how repo_config_get_bool() supposed to work
Your comment here made me re-read it and things got a lot clearer.
Will set the default value to 0 in next version.
> Thanks.
Much appreciate,
Son Luong.
39178b7
to
efeb3d7
Compare
/preview |
Preview email sent as pull.626.v3.git.1589034075.gitgitgadget@gmail.com |
/submit |
Submitted as pull.626.v3.git.1589034270.gitgitgadget@gmail.com |
@@ -56,6 +56,9 @@ repack:: | |||
file is created, rewrite the multi-pack-index to reference the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Đoàn Trần Công Danh wrote (reply to this):
On 2020-05-09 14:24:29+0000, Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote:
> From: Derrick Stolee <dstolee@microsoft.com>
>
> +test_expect_success 'repack respects repack.packKeptObjects=false' '
> + test_when_finished rm -f dup/.git/objects/pack/*keep &&
> + (
> + cd dup &&
> + ls .git/objects/pack/*idx >idx-list &&
I think ls(1) is an overkill.
I think:
echo .git/objects/pack/*idx
is more efficient.
> + test_line_count = 5 idx-list &&
> + ls .git/objects/pack/*.pack | sed "s/\.pack/.keep/" >keep-list &&
Likewise.
> + for keep in $(cat keep-list)
> + do
> + touch $keep || return 1
Is this intended?
Since touch(1) accepts multiple files as argument.
> + done &&
> + git multi-pack-index repack --batch-size=0 &&
> + ls .git/objects/pack/*idx >idx-list &&
> + test_line_count = 5 idx-list &&
> + test-tool read-midx .git/objects | grep idx >midx-list &&
> + test_line_count = 5 midx-list &&
> + THIRD_SMALLEST_SIZE=$(test-tool path-utils file-size .git/objects/pack/*pack | sort -n | head -n 3 | tail -n 1) &&
This line is overly long.
Should we write test-tool's output to temp file and process it?
And I think either
sed -n '3{p;q}'
or:
sed -n 3p
is cleaner than
head -n 3 | tail -n 1
> + BATCH_SIZE=$(($THIRD_SMALLEST_SIZE + 1)) &&
I think we're better to make this correct in this patch instead of
spend a dollar here, than take it back in the next patch.
> + git multi-pack-index repack --batch-size=$BATCH_SIZE &&
> + ls .git/objects/pack/*idx >idx-list &&
--
Danh
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
Đoàn Trần Công Danh <congdanhqx@gmail.com> writes:
> On 2020-05-09 14:24:29+0000, Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote:
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> +test_expect_success 'repack respects repack.packKeptObjects=false' '
>> + test_when_finished rm -f dup/.git/objects/pack/*keep &&
>> + (
>> + cd dup &&
>> + ls .git/objects/pack/*idx >idx-list &&
>
> I think ls(1) is an overkill.
> I think:
>
> echo .git/objects/pack/*idx
>
> is more efficient.
When there is no file whose name ends with idx, what happens?
$ ls *idx && echo OK
ls: cannot access '*idx': No such file or directory
$ echo *idx && echo OK
*idx
OK
>> + test_line_count = 5 idx-list &&
>> + ls .git/objects/pack/*.pack | sed "s/\.pack/.keep/" >keep-list &&
>
> Likewise.
Likewise.
>> + for keep in $(cat keep-list)
>> + do
>> + touch $keep || return 1
>
> Is this intended?
> Since touch(1) accepts multiple files as argument.
Good suggestion, but doesn't .keep file record why the pack is kept
in real life (i.e. not an empty file)?
>> + done &&
>> + git multi-pack-index repack --batch-size=0 &&
>> + ls .git/objects/pack/*idx >idx-list &&
>> + test_line_count = 5 idx-list &&
>> + test-tool read-midx .git/objects | grep idx >midx-list &&
>> + test_line_count = 5 midx-list &&
>> + THIRD_SMALLEST_SIZE=$(test-tool path-utils file-size .git/objects/pack/*pack | sort -n | head -n 3 | tail -n 1) &&
>
> This line is overly long.
> Should we write test-tool's output to temp file and process it?
>
> And I think either
>
> sed -n '3{p;q}'
>
> or:
>
> sed -n 3p
>
> is cleaner than
>
> head -n 3 | tail -n 1
"sed -n 3p" is the only valid way to write it ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Đoàn Trần Công Danh wrote (reply to this):
On 2020-05-09 10:33:30-0700, Junio C Hamano <gitster@pobox.com> wrote:
> Đoàn Trần Công Danh <congdanhqx@gmail.com> writes:
>
> > On 2020-05-09 14:24:29+0000, Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote:
> >> From: Derrick Stolee <dstolee@microsoft.com>
> >>
> >> +test_expect_success 'repack respects repack.packKeptObjects=false' '
> >> + test_when_finished rm -f dup/.git/objects/pack/*keep &&
> >> + (
> >> + cd dup &&
> >> + ls .git/objects/pack/*idx >idx-list &&
> >
> > I think ls(1) is an overkill.
> > I think:
> >
> > echo .git/objects/pack/*idx
> >
> > is more efficient.
>
> When there is no file whose name ends with idx, what happens?
>
> $ ls *idx && echo OK
> ls: cannot access '*idx': No such file or directory
> $ echo *idx && echo OK
> *idx
> OK
Yes, but I think the next line is checking for the number of lines.
This is better to fail faster.
(My suggestion was wrong anyway, it should be "printf "%s\\n" *idx)
> >> + test_line_count = 5 idx-list &&
> >> + for keep in $(cat keep-list)
> >> + do
> >> + touch $keep || return 1
> >
> > Is this intended?
> > Since touch(1) accepts multiple files as argument.
>
> Good suggestion, but doesn't .keep file record why the pack is kept
> in real life (i.e. not an empty file)?
Yes, in real life, we usually provide a reason in this .keep file.
But, we also allow empty file with git-index-pack --keep
I think simple touch is fine for this test.
Missing piece for my previous command:
if `keep-list` is empty, we may want to fail fast,
touch with empty list will error out (at least in my system).
--
Danh
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Son Luong Ngoc wrote (reply to this):
Hi,
Thanks Danh and Junio for the testing improvement suggestions.
I think these are the points I will adopt into next version:
- Remove the 3rd patch and keep the removal of dollar sign locally
inside `repack respects repack.packKeptObjects=false`.
- Change `head -n -3 | tail -n -1` to `sed -n 3p`
- Apply test_line_count on keep-list for failing fast (before touch)
Cheers,
Son Luong.
midx.c
Outdated
@@ -1369,6 +1369,8 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size, | |||
struct child_process cmd = CHILD_PROCESS_INIT; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
"Son Luong Ngoc via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Son Luong Ngoc <sluongng@gmail.com>
>
> Previously, when the "repack" subcommand of "git multi-pack-index" command
> creates new packfile(s), it does not call the "git repack" command but
> instead directly calls the "git pack-objects" command, and the
> configuration variables meant for the "git repack" command, like
> "repack.usedaeltabaseoffset", are ignored.
When we talk about the current state of the code (i.e. before
applying this patch), we do not say "previously". It's not like you
are complaining about a recent breakage, e.g. "previously X worked
like this but since change Y, it instead works like that, which
breaks Z".
> This patch ensured "git multi-pack-index" checks the configuration
> variables used by "git repack" and passes the corresponding options to
> the underlying "git pack-objects" command.
We write this part in imperative mood, as if we are giving an order
to the codebase to "become like so". We do not give an observation
about the patch or the author ("This patch does X, this patch also
does Y", "I do X, I do Y").
Taking these two together, perhaps like:
When the "repack" subcommand of "git multi-pack-index" command
creates new packfile(s), it does not call the "git repack"
command but instead directly calls the "git pack-objects"
command, and the configuration variables meant for the "git
repack" command, like "repack.usedaeltabaseoffset", are ignored.
Check the configuration variables used by "git repack" ourselves
in "git multi-index-pack" and pass the corresponding options to
underlying "git pack-objects".
> Note that `repack.writeBitmaps` configuration is ignored, as the
> pack bitmap facility is useful only with a single packfile.
Good.
> + int delta_base_offset = 1;
> + int use_delta_islands = 0;
These give the default values for two configurations and over there
builtin/repack.c has these lines:
17 static int delta_base_offset = 1;
18 static int pack_kept_objects = -1;
19 static int write_bitmaps = -1;
20 static int use_delta_islands;
21 static char *packdir, *packtmp;
When somebody is tempted to update these to change the default used
by "git repack", it should be easy to notice that such a change must
be accompanied by a matching change to the lines you are introducing
in this patch, or we'll be out of sync.
The easiest way to avoid such a problem may be to stop bypassing
"git repack" and calling "pack-objects" ourselves. That is the
reason why the configuration variables honored by "git repack" are
ignored in this codepath in the first place. But that is not the
approach we are taking, so we need a reasonable way to tell those
who update this file and builtin/repack.c to make matching changes.
At the very least, perhaps we should give a comment above these two
lines in this file, e.g.
/*
* when updating the default for these configuration
* variables in builtin/repack.c, these must be adjusted
* to match.
*/
int delta_base_offset = 1;
int use_delta_islands = 0;
or something like that.
With that, the rest of the patch makes sense.
Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Son Luong Ngoc wrote (reply to this):
On Sat, May 09, 2020 at 09:51:08AM -0700, Junio C Hamano wrote:
> "Son Luong Ngoc via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > From: Son Luong Ngoc <sluongng@gmail.com>
> >
> > Previously, when the "repack" subcommand of "git multi-pack-index" command
> > creates new packfile(s), it does not call the "git repack" command but
> > instead directly calls the "git pack-objects" command, and the
> > configuration variables meant for the "git repack" command, like
> > "repack.usedaeltabaseoffset", are ignored.
>
> When we talk about the current state of the code (i.e. before
> applying this patch), we do not say "previously". It's not like you
> are complaining about a recent breakage, e.g. "previously X worked
> like this but since change Y, it instead works like that, which
> breaks Z".
>
> > This patch ensured "git multi-pack-index" checks the configuration
> > variables used by "git repack" and passes the corresponding options to
> > the underlying "git pack-objects" command.
>
> We write this part in imperative mood, as if we are giving an order
> to the codebase to "become like so". We do not give an observation
> about the patch or the author ("This patch does X, this patch also
> does Y", "I do X, I do Y").
>
> Taking these two together, perhaps like:
>
> When the "repack" subcommand of "git multi-pack-index" command
> creates new packfile(s), it does not call the "git repack"
> command but instead directly calls the "git pack-objects"
> command, and the configuration variables meant for the "git
> repack" command, like "repack.usedaeltabaseoffset", are ignored.
>
> Check the configuration variables used by "git repack" ourselves
> in "git multi-index-pack" and pass the corresponding options to
> underlying "git pack-objects".
Thanks for this, it will take me a bit to adjust to this style of
writing but I do find it to be a lot clearer and practical.
Will update in next version.
>
> > Note that `repack.writeBitmaps` configuration is ignored, as the
> > pack bitmap facility is useful only with a single packfile.
>
> Good.
>
> > + int delta_base_offset = 1;
> > + int use_delta_islands = 0;
>
> These give the default values for two configurations and over there
> builtin/repack.c has these lines:
>
> 17 static int delta_base_offset = 1;
> 18 static int pack_kept_objects = -1;
> 19 static int write_bitmaps = -1;
> 20 static int use_delta_islands;
> 21 static char *packdir, *packtmp;
>
> When somebody is tempted to update these to change the default used
> by "git repack", it should be easy to notice that such a change must
> be accompanied by a matching change to the lines you are introducing
> in this patch, or we'll be out of sync.
>
> The easiest way to avoid such a problem may be to stop bypassing
> "git repack" and calling "pack-objects" ourselves. That is the
> reason why the configuration variables honored by "git repack" are
> ignored in this codepath in the first place. But that is not the
> approach we are taking, so we need a reasonable way to tell those
> who update this file and builtin/repack.c to make matching changes.
> At the very least, perhaps we should give a comment above these two
> lines in this file, e.g.
>
> /*
> * when updating the default for these configuration
> * variables in builtin/repack.c, these must be adjusted
> * to match.
> */
> int delta_base_offset = 1;
> int use_delta_islands = 0;
>
> or something like that.
Will add the comments in next version.
>
> With that, the rest of the patch makes sense.
>
> Thanks.
Cheers,
Son Luong
t/t5319-multi-pack-index.sh
Outdated
@@ -62,8 +62,8 @@ generate_objects () { | |||
} >wide_delta_$iii && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
"Son Luong Ngoc via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Son Luong Ngoc <sluongng@gmail.com>
>
> As the old versions of dash is deprecated, dollar-sign inside
> artihmetic expansion is no longer needed.
> This ensures t5319 follows the coding guideline updated
> in 'jk/arith-expansion-coding-guidelines' 6d4bf5813cd2c1a3b93fd4f0b231733f82133cce.
That does not match my understanding of the guideline. By removing
the "dollar required" rule and not adding a new "dollar forbidden"
rule, we pretty much declared that "we do not care much either way"
[*1*].
Even if we cared, "Once it _is_ in the tree, it's not really worth
the patch noise to go and fix it up." rule from the guidelines
applies here.
Thanks.
[Reference]
*1* https://lore.kernel.org/git/20200505210741.GB645290@coredump.intra.peff.net/
…ations When the "repack" subcommand of "git multi-pack-index" command creates new packfile(s), it does not call the "git repack" command but instead directly calls the "git pack-objects" command, and the configuration variables meant for the "git repack" command, like "repack.usedaeltabaseoffset", are ignored. Check the configuration variables used by "git repack" ourselves in "git multi-index-pack" and pass the corresponding options to underlying "git pack-objects". Note that `repack.writeBitmaps` configuration is ignored, as the pack bitmap facility is useful only with a single packfile. Signed-off-by: Son Luong Ngoc <sluongng@gmail.com>
When selecting a batch of pack-files to repack in the "git multi-pack-index repack" command, Git should respect the repack.packKeptObjects config option. When false, this option says that the pack-files with an associated ".keep" file should not be repacked. This config value is "false" by default. There are two cases for selecting a batch of objects. The first is the case where the input batch-size is zero, which specifies "repack everything". The second is with a non-zero batch size, which selects pack-files using a greedy selection criteria. Both of these cases are updated and tested. Reported-by: Son Luong Ngoc <sluongng@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
efeb3d7
to
192fc78
Compare
/preview |
Preview email sent as pull.626.v4.git.1589126605.gitgitgadget@gmail.com |
/submit |
Submitted as pull.626.v4.git.1589126855.gitgitgadget@gmail.com |
This branch is now known as |
This patch series was integrated into pu via git@225a130. |
This patch series was integrated into pu via git@6a450b0. |
This patch series was integrated into next via git@d73f8f5. |
This patch series was integrated into pu via git@431cacf. |
This patch series was integrated into pu via git@6baba94. |
This patch series was integrated into next via git@6baba94. |
This patch series was integrated into master via git@6baba94. |
Closed via 6baba94. |
Midx repack has largely been used in Microsoft Scalar on the client side
to optimize the repository multiple packs state. However when I tried to
apply this onto the server-side, I realized that there are certain
features that were lacking compare to
git repack
. Most of thesefeatures are highly desirable on the server-side to create the most
optimized pack possible.
One of the example is delta_base_offset, comparing an midx repack
with/without delta_base_offset, we can observe significant size
differences.
Note that
repack.writeBitmaps
configuration is ignored, as thepack bitmap facility is useful only with a single packfile.
Derrick Stolee's following patch will address
repack.packKeptObjects
support.