Migration to Git LFS inflates repository multiple times #3374
Comments
As Ævar suggested on the Git mailing list, the most likely culprit here is the lack of delta compression on the LFS objects after they're extracted. Each large object is still individually compressed, so they don't use the total space Rather than a simple Have you tried doing a |
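To make the delta-compression point above a bit more concrete, here is a toy sketch in Python. It is not how Git packfiles or Git LFS actually store data; it only assumes that successive versions of a binary are mostly identical, and uses zlib's preset dictionary (`zdict`) as a rough stand-in for "compress this version against the previous one". The helper names (`make_versions`, `size_individually`, `size_with_deltas`) are made up for illustration.

```python
import random
import zlib


def make_versions(count=10, size=8192, changes=16, seed=0):
    """Build `count` near-identical pseudo-random blobs (hypothetical test data)."""
    rng = random.Random(seed)
    data = bytearray(rng.getrandbits(8) for _ in range(size))
    versions = []
    for _version in range(count):
        # Each new version differs from the previous one in a handful of bytes.
        for _change in range(changes):
            data[rng.randrange(size)] = rng.getrandbits(8)
        versions.append(bytes(data))
    return versions


def size_individually(versions):
    """Total size when every version is compressed on its own."""
    return sum(len(zlib.compress(v)) for v in versions)


def size_with_deltas(versions):
    """Total size when each version is compressed against its predecessor.

    The previous version is passed as a zlib preset dictionary, so bytes
    shared with it can be encoded as back-references.
    """
    total = len(zlib.compress(versions[0]))
    for prev, cur in zip(versions, versions[1:]):
        comp = zlib.compressobj(zdict=prev)
        total += len(comp.compress(cur) + comp.flush())
    return total


versions = make_versions()
individual_total = size_individually(versions)
delta_total = size_with_deltas(versions)
print(f"individually compressed: {individual_total} bytes")
print(f"compressed against previous version: {delta_total} bytes")
```

The blobs are kept small on purpose: zlib only looks back 32 KiB, so this trick would not scale to real SDK binaries. The point is only that storing many similar versions independently costs far more than storing them relative to each other.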
@bturner Thanks for the hints.
I am aware of the different SHAs, but my Git fu is still lacking. Does git-sizer count objects managed by Git LFS? For the moment, I'll assume it does NOT. Then, I have written this
Here are the results for
proj.git (BARE:master) $ git-sizer
Processing blobs: 1107392
Processing trees: 178226
Processing commits: 29412
Matching commits to trees: 29412
Processing annotated tags: 0
Processing references: 24
| Name | Value | Level of concern |
| ---------------------------- | --------- | ------------------------------ |
| Overall repository size | | |
| * Blobs | | |
| * Total size | 12.8 GiB | * |
| | | |
| Biggest objects | | |
| * Trees | | |
| * Maximum entries [1] | 1.96 k | * |
| * Blobs | | |
| * Maximum size [2] | 113 MiB | *********** |
| | | |
| Biggest checkouts | | |
| * Number of directories [3] | 13.3 k | ****** |
| * Maximum path depth [4] | 18 | * |
| * Maximum path length [5] | 232 B | ** |
| * Number of files [6] | 910 k | ****************** |
| * Total size of files [7] | 3.37 GiB | *** |
proj.git (BARE:master) $ python git_lfs_calculate_size_by_type.py
Git LFS objects summary:
.lib: count: 1111 size: 8764.66 MB
.dll: count: 749 size: 1427.98 MB
.pdb: count: 612 size: 2814.09 MB
.exe: count: 786 size: 2005.72 MB
.zip: count: 24 size: 1153.65 MB
Total: count: 3282 size: 16166.11 MB
Then, would
I've just tried that and here is what I got using
check.git (BARE:master) $ du -sh
2.4G .
check.git (BARE:master) $ git-sizer
Processing blobs: 1107392
Processing trees: 178226
Processing commits: 29412
Matching commits to trees: 29412
Processing annotated tags: 0
Processing references: 24
| Name | Value | Level of concern |
| ---------------------------- | --------- | ------------------------------ |
| Overall repository size | | |
| * Blobs | | |
| * Total size | 12.8 GiB | * |
| | | |
| Biggest objects | | |
| * Trees | | |
| * Maximum entries [1] | 1.96 k | * |
| * Blobs | | |
| * Maximum size [2] | 113 MiB | *********** |
| | | |
| Biggest checkouts | | |
| * Number of directories [3] | 13.3 k | ****** |
| * Maximum path depth [4] | 18 | * |
| * Maximum path length [5] | 232 B | ** |
| * Number of files [6] | 910 k | ****************** |
| * Total size of files [7] | 3.37 GiB | *** |
proj.git (BARE:master) $ python git_lfs_calculate_size_by_type.py
Git LFS objects summary:
.lib: count: 1111 size: 8764.66 MB
.dll: count: 749 size: 1427.98 MB
.pdb: count: 612 size: 2814.09 MB
.exe: count: 786 size: 2005.72 MB
.zip: count: 24 size: 1153.65 MB
Total: count: 3282 size: 16166.11 MB
Interestingly, The
and
Hmm, does that mean the whole set of content is compressed into a single pack file? That would explain this, wouldn't it?
check.git (BARE:master) $ du -sh
2.4G . |
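For anyone wanting to see where the space reported by `du -sh` actually goes, a rough Python sketch follows. It assumes the usual layout of a bare repository, with Git's own objects under `objects/` (packs in `objects/pack/`) and downloaded Git LFS objects under `lfs/objects/`. The function names are hypothetical, and it sums apparent file sizes, so the numbers can differ slightly from what `du` prints.

```python
import os


def dir_size(path):
    """Sum the sizes of all regular files below `path` (0 if it does not exist)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def storage_breakdown(repo_dir):
    """Split a bare repository's size into Git objects, packs and LFS objects.

    Assumes the conventional bare-repo layout; adjust the paths for a
    non-bare clone, where everything lives under `.git/`.
    """
    return {
        "git objects (loose + packed)": dir_size(os.path.join(repo_dir, "objects")),
        "pack files only": dir_size(os.path.join(repo_dir, "objects", "pack")),
        "lfs objects": dir_size(os.path.join(repo_dir, "lfs", "objects")),
    }


if __name__ == "__main__":
    # Run from inside the bare repository, e.g. proj.git or check.git.
    for label, size in storage_breakdown(".").items():
        print(f"{label}: {size / (1024 * 1024):.2f} MiB")
```

If nearly all of the Git object size sits in `objects/pack`, the history has indeed been packed; a large `lfs/objects` next to it would be the uncompressed LFS payload.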
Looking at your The two together imply the work tree of a repository that cloned your I walked through your steps locally (with a trivial, throwaway repository; it's not the data contents I was interested in). Based on having done so, in your
My expectation is that your |
Right, I made a mistake, the total is not Git blobs + Git LFS (
AFAICT, it is how you expect:
By the way, for the
Thank you very much for walking me through and helping to understand the issues. (Closing the issue as it's been answered for me. Thanks a lot!) |
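The per-extension Git LFS summary quoted earlier in this thread came from a script that is not shown here. As a minimal sketch of how such a summary could be computed (not the actual `git_lfs_calculate_size_by_type.py`), the Python below parses LFS pointer text, which carries a `size <bytes>` line, and groups the sizes by file extension. How the `(path, pointer_text)` pairs are collected from the repository is left out, the sample paths and oids are invented, and treating 1 MB as 1024 * 1024 bytes is an assumption.

```python
import os
from collections import defaultdict

POINTER_PREFIX = "version https://git-lfs.github.com/spec/"


def parse_pointer_size(pointer_text):
    """Return the object size in bytes from LFS pointer text, or None if not a pointer."""
    lines = pointer_text.splitlines()
    if not lines or not lines[0].startswith(POINTER_PREFIX):
        return None
    for line in lines[1:]:
        key, _, value = line.partition(" ")
        if key == "size" and value.isdigit():
            return int(value)
    return None


def summarize_by_extension(pointers):
    """Map extension -> (count, total_bytes) for an iterable of (path, pointer_text)."""
    totals = defaultdict(lambda: [0, 0])
    for path, text in pointers:
        size = parse_pointer_size(text)
        if size is None:
            continue  # a regular blob, not an LFS pointer
        ext = os.path.splitext(path)[1].lower() or "(none)"
        totals[ext][0] += 1
        totals[ext][1] += size
    return {ext: (count, size) for ext, (count, size) in totals.items()}


def make_pointer(oid, size):
    """Build pointer text for the sample data below (hypothetical values)."""
    return f"version https://git-lfs.github.com/spec/v1\noid sha256:{oid}\nsize {size}\n"


sample = [
    ("sdk/a.lib", make_pointer("aa" * 32, 1048576)),
    ("sdk/b.LIB", make_pointer("bb" * 32, 2097152)),
    ("sdk/c.dll", make_pointer("cc" * 32, 524288)),
    ("README.txt", "just a normal text file\n"),
]

summary = summarize_by_extension(sample)
print("Git LFS objects summary:")
for ext, (count, size) in sorted(summary.items()):
    print(f"{ext}: count: {count} size: {size / (1024 * 1024):.2f} MB")
```

Note that a total computed this way counts each pointer occurrence it is given, so feeding it every revision of every file and feeding it only unique oids will give different answers.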
(I originally sent this to the Git mailing list, in the thread "Migration to Git LFS inflates repository multiple times",
but I hope asking the Git LFS team directly is fine too.)
TL;DR: Is it normal for a repository migrated to Git LFS to inflate several times over,
and how do I deal with it?
I'm migrating a big SVN repository to Git.
In SVN, a collection of third-party SDKs is maintained alongside the codebase.
Many of the third-party libraries come in binary form.
So I'm migrating those binary files to Git LFS.
I'm following the Git LFS tutorial, section Migrating existing repository data to LFS.
First, I ran the initial translation of the SVN repo into Git.
The new repository is a Git bare repository.
There are 5 branches and 10+ tags in the proj.git repo.
It is quite large:
Next, I performed the following sequence of steps to optimise it
and migrate to Git LFS:
.git directory after migration to Git LFS and
Now, I'm looking for answers to the following questions:
Is the procedure presented above correct for migrating (SVN ->) Git -> Git LFS?
Given that the initial translation to Git generated a 19 GB repo (optimised to 11 GB), is it normal for the Git LFS migration to inflate the repository to 47 GB (optimised to 39 GB)?
Why does the inflation happen? Is it a function of the number of branches? How should I understand the jump from 11 GB to 39 GB?
How can I optimise the repository to cut the size down further?
My next step is to somehow push the fat pig into GitHub, Bitbucket or Azure DevOps ;-)
I've used Git for a few years, but I'm pretty much a newbie when it comes to low-level or administration tasks, so I might have made basic errors.
I'll be thankful for any feedback.