multi-pack-index: repack batches below --batch-size #698
The --batch-size=<size> option of 'git multi-pack-index repack' is intended to limit the amount of work done by the repack. In the case of a large repository, this command should repack a number of small pack-files but leave the large pack-files alone. Most often, the repository has one large pack-file from a 'git clone' operation and a number of smaller pack-files from incremental 'git fetch' operations.

The issue with '--batch-size' is that it also _prevents_ the repack from happening if the expected size of the resulting pack-file is too small. This was intended as a way to avoid frequent churn of small pack-files, but it has mostly caused confusion when a repository is of "medium" size: not enormous like the Windows OS repository, but also not so small that this incremental repack isn't valuable.

The solution presented here is to collect pack-files for repack if their expected size is smaller than the batch-size parameter, until either the total expected size exceeds the batch-size or all pack-files have been considered. If at least two pack-files are collected, they are combined into a new pack-file whose size should not be much larger than the batch-size.

This new strategy should succeed in keeping the number of pack-files small in these "medium" size repositories. The concern about churn is likely not significant, as the real control over that is the frequency with which the repack command is run.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
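The selection rule described in the commit message can be sketched in a few lines of Python. This is a hypothetical illustration under stated assumptions, not Git's implementation (which is written in C): the `Pack` type, the `select_batch`, `should_repack_old`, and `should_repack_new` helpers, and the sizes in the usage example are all invented for illustration, and the order in which packs are considered is assumed to be decided by the caller.

```python
from dataclasses import dataclass


@dataclass
class Pack:
    """Hypothetical, simplified view of a pack-file for this sketch only."""
    name: str
    expected_size: int  # estimated size this pack would contribute to a new pack


def select_batch(packs: list[Pack], batch_size: int) -> list[Pack]:
    """Collect small packs until their total expected size exceeds batch_size.

    Packs whose expected size is at least batch_size are skipped, so large
    packs (such as the one from 'git clone') are left alone.
    """
    selected: list[Pack] = []
    total = 0
    for pack in packs:
        if total > batch_size:
            break  # the batch is already large enough
        if pack.expected_size >= batch_size:
            continue  # leave large packs untouched
        selected.append(pack)
        total += pack.expected_size
    return selected


def should_repack_old(packs: list[Pack], batch_size: int) -> bool:
    """Previous behavior (simplified): the batch also had to reach batch_size."""
    batch = select_batch(packs, batch_size)
    return len(batch) >= 2 and sum(p.expected_size for p in batch) >= batch_size


def should_repack_new(packs: list[Pack], batch_size: int) -> bool:
    """Proposed behavior: any batch of at least two packs is repacked."""
    return len(select_batch(packs, batch_size)) >= 2


if __name__ == "__main__":
    # Hypothetical sizes in arbitrary units: one clone pack, three fetch packs.
    packs = [Pack("clone", 900), Pack("fetch-1", 10),
             Pack("fetch-2", 20), Pack("fetch-3", 30)]
    print([p.name for p in select_batch(packs, 100)])  # ['fetch-1', 'fetch-2', 'fetch-3']
    print(should_repack_old(packs, 100), should_repack_new(packs, 100))  # False True
```

Here the three fetch packs total only 60 against a batch size of 100, so the old rule declines to repack them, while the new rule combines them and still leaves the clone pack untouched. When small packs do add up past the limit, collection stops as soon as the running total exceeds the batch size, which is why the new pack should not end up much larger than that size.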
/submit
Submitted as pull.698.git.1597159818457.gitgitgadget@gmail.com
On the Git mailing list, Taylor Blau replied.
On the Git mailing list, Junio C Hamano replied.
This branch is now known as
This patch series was integrated into seen via git@97098f7.
This patch series was integrated into seen via git@94085fd.
This patch series was integrated into seen via git@e174ce4.
This patch series was integrated into seen via git@012bd21.
This patch series was integrated into seen via git@ea3c5c9.
This patch series was integrated into seen via git@dbcd970.
This patch series was integrated into next via git@eee9463.
As reported [1], the 'git multi-pack-index repack' command has some unexpected behavior due to the nature of "expected size" for un-thinned fetch packs and the requirement that the total expected size of a batch be at least as large as the batch-size. By removing this minimum size restriction, we will repack more frequently and prevent these "many pack-files" problems.
[1] https://lore.kernel.org/git/6FA8F54A-C92D-497B-895F-AC6E8287AACD@gmail.com/
Cc: sluongng@gmail.com