
Migrate large binary files to git lfs? #29

Closed
mikeage opened this issue Feb 3, 2021 · 14 comments
Labels
needs more info Further information is requested

Comments

@mikeage (Member) commented Feb 3, 2021

I’m a bit conflicted about this one, but in general, git is not the best place to store large binary files. Git-lfs is the best way to integrate them (and is supported by GitHub), but an effective migration would involve rewriting history which would affect forks. On the other hand, it greatly shrinks the size of the repo and makes many operations faster. I see three options here:

  1. Ignore it.
  2. Migrate ASAP and get the pain over with now (of having forks needing to be updated and any in progress PRs rebased)
  3. Wait until development settles down and then migrate (but the number of forks will just go up over time, even if the number of active forks drops)

Any thoughts from others who’ve worked on large git repos?

@TimAidley (Contributor)

We always used git when developing Tilt Brush; there are a few large files in there, but it doesn't seem to me that there's enough to warrant LFS. The large files in there are unlikely to change much (they're mainly some .psd files for the backgrounds and some .dlls etc).

I feel like lfs would add some extra complexity for no gain.

@mikeage (Member, Author) commented Feb 3, 2021

I thought of it when I saw the ffmpeg.exe commit.

It does add a bit of complexity, especially the first time you use it, but it's pretty straightforward once you get used to it. Though arguably, that's the exact definition of Stockholm Syndrome!

@TimAidley (Contributor)

My understanding is that its main benefit is that you don't end up storing multiple copies of large files in the .git repository. However, if the large files are unlikely to change, it doesn't seem to me that you really gain anything. (I have never used git-lfs)

@mikeage (Member, Author) commented Feb 3, 2021

The clones are faster as well. Between .exe, .psd, .dll, and .png, it looks like we have 353MB worth (git ls-files | grep -e '\.dll$' -e '\.exe$' -e '\.psd$' -e '\.png$' | xargs du -cmh)
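(For anyone who wants to reproduce the measurement: the pipeline below is a sketch of the same idea, run against a scratch repo with stand-in files so it is self-contained. The filenames are invented, and it assumes GNU grep/du; `-z`/`-0` keep paths with spaces intact.)

```shell
# Scratch repo standing in for the real one; the pipeline at the end is the
# part you would run inside the actual checkout.
tmp=$(mktemp -d) && cd "$tmp"
git init -q demo && cd demo
git config user.email demo@example.com && git config user.name demo
dd if=/dev/zero of=ffmpeg.exe bs=1024 count=2048 2>/dev/null      # ~2 MB stand-in
dd if=/dev/zero of=background.psd bs=1024 count=1024 2>/dev/null  # ~1 MB stand-in
git add -A && git commit -qm "add binaries"

# Total size (in MB) of tracked files matched by extension
total=$(git ls-files -z \
  | grep -z -e '\.dll$' -e '\.exe$' -e '\.psd$' -e '\.png$' \
  | xargs -0 du -cm | tail -n 1 | cut -f1)
echo "$total MB of binaries tracked directly in git"
```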

It's not a huge deal, but if we do want to switch, it'll need to rewrite the history.

@mikeage (Member, Author) commented Feb 3, 2021

(If you want to see what it'd look like after a conversion, I pushed a version to https://github.com/mikeage/open-brush-lfs for comparison. To create it, I did the following:

# Track large/binary files
git lfs track '*.psd'
git lfs track '*.exe'
git lfs track '*.dll'
git lfs track '*.png'
git lfs track '*.prefab'
git add .gitattributes
# Readd the ones we have
find . \( -iname '*dll' -or -iname '*exe' -or -iname '*psd' -or -iname '*png'  -or -iname '*.prefab' \) -print0 | xargs -0 git add
git commit -m "Switch to git-lfs for png, exe, dll, psd, and .prefab"
# Rewrite history so that they've always been in git-lfs
git lfs migrate import --everything --include="*.psd,*.exe,*.dll,*.png,*.prefab"
git push origin HEAD -f

)
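(A way to sanity-check a migration like the one above, using only plain git: scan history for large blobs. After a successful `git lfs migrate import`, big binaries should disappear from this listing, because only tiny pointer files remain in history. The demo repo and its `big.psd` are stand-ins.)

```shell
tmp=$(mktemp -d) && cd "$tmp"
git init -q demo && cd demo
git config user.email demo@example.com && git config user.name demo
dd if=/dev/zero of=big.psd bs=1024 count=2048 2>/dev/null  # ~2 MB binary
git add -A && git commit -qm "large binary"

# Largest blob still stored directly in git history. After `git lfs migrate
# import` this list should contain no big binaries, only small text blobs.
biggest=$(git rev-list --objects --all \
  | git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' \
  | awk '$1 == "blob" {print $2, $3}' \
  | sort -rn | head -n 1)
echo "$biggest"
```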

@mikeskydev (Member)

If we are to do this, I don't think we should be tracking *.prefab in lfs as they're essentially just YAML and can benefit from line by line comparison when merging, and there's also a specialised tool that ships with Unity for dealing with prefab merge conflicts. How does the clone speed compare when prefabs are moved out of lfs?
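(For reference, Unity's prefab merge tool is UnityYAMLMerge, and the usual way to hook it up is through git's mergetool config. A sketch of that setup follows; the `UNITY_TOOLS` path is a placeholder that varies by Unity version and OS, so treat it as an assumption to adjust.)

```shell
tmp=$(mktemp -d)
git init -q "$tmp/proj" && cd "$tmp/proj"

# Hypothetical install path; on Windows it typically lives under
# <Unity install>\Editor\Data\Tools\UnityYAMLMerge.exe
UNITY_TOOLS="/path/to/Unity/Editor/Data/Tools"

git config merge.tool unityyamlmerge
git config mergetool.unityyamlmerge.trustExitCode false
git config mergetool.unityyamlmerge.cmd \
  "\"$UNITY_TOOLS/UnityYAMLMerge\" merge -p \"\$BASE\" \"\$REMOTE\" \"\$LOCAL\" \"\$MERGED\""
```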

@mikeage (Member, Author) commented Feb 3, 2021

Gotcha. I did it because on the first push without them, git complained about a large file (Assets/Prefabs/Intro/powered_by_tiltbrush_full.prefab), and I didn't even think that 66MB could be a YAML file! Let me try updating the repo without them, and I'll compare. (It looks like the download speed for LFS blobs is faster, but I don't really know why.)

P.S. One other argument against doing this is that the quota for LFS files is apparently shared by forks (!). I did not know that (I use LFS at work with a hosted Github Enterprise version, so I haven't really thought about costs and limitations)

@mikeage (Member, Author) commented Feb 3, 2021

Updated.

Having reviewed the docs, perhaps it's not really as critical if they're never modified; even the large files aren't really getting that close to GitHub's 100MB limit. There are only 3 files over 30MB, and only 1 over 50MB (and it's the YAML above).

So while I think it's a good discussion, maybe it's not worth it. Force pushing is always a challenge for a popular project.
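(The per-threshold count above can be reproduced with a one-liner over the working tree; here is a self-contained sketch using a 1 MB cutoff and invented stand-in files, since GitHub's real bands are 50 MB and 100 MB.)

```shell
tmp=$(mktemp -d) && cd "$tmp"
git init -q demo && cd demo
git config user.email demo@example.com && git config user.name demo
dd if=/dev/zero of=intro.prefab bs=1024 count=2048 2>/dev/null  # ~2 MB stand-in
dd if=/dev/zero of=icon.png bs=1024 count=100 2>/dev/null       # well under 1 MB
git add -A && git commit -qm "mixed sizes"

# Tracked files above a threshold (1 MB here; substitute 30 or 50 to match
# the bands discussed above)
over=$(git ls-files -z | xargs -0 du -m | awk '$1 > 1 {print $2}')
echo "$over"
```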

@billyquith
Does GitHub do LFS hosting? (I haven't tried it)

Those files are compressed when stored, and if they don't change often, it's probably not worth the complication of moving them elsewhere. There are many projects with 400MB+ repos, and you can use --depth to limit the size/history locally.
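(A quick demonstration of the --depth point, using a scratch repo; the file:// URL matters because git ignores --depth for plain local-path clones.)

```shell
tmp=$(mktemp -d) && cd "$tmp"
git init -q src && cd src
git config user.email demo@example.com && git config user.name demo
for i in 1 2 3; do
  echo "revision $i" > notes.txt
  git add -A && git commit -qm "rev $i"
done
cd "$tmp"

# file:// forces the network transport, so --depth is honoured locally too
git clone -q --depth 1 "file://$tmp/src" shallow
count=$(git -C shallow rev-list --count HEAD)
echo "history depth in shallow clone: $count"
```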

Another option is to move large/specific files into a "tools/resources" (or PC?) subproject which you only pull in when you need it.
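(One common shape for that subproject idea is a git submodule, which is only fetched when asked for. A minimal sketch, with hypothetical repo names and layout; `protocol.file.allow` is needed on newer git to allow file:// submodules in a local demo.)

```shell
tmp=$(mktemp -d) && cd "$tmp"

# Stand-alone repo holding only the heavyweight assets (hypothetical layout)
git init -q assets && cd assets
git config user.email demo@example.com && git config user.name demo
dd if=/dev/zero of=tool.exe bs=1024 count=512 2>/dev/null
git add -A && git commit -qm "big tools"
cd "$tmp"

git init -q main && cd main
git config user.email demo@example.com && git config user.name demo
git commit -qm "root" --allow-empty
# Pull the assets in only when you actually need them
git -c protocol.file.allow=always \
  submodule add "file://$tmp/assets" tools/resources >/dev/null 2>&1
git commit -qm "add assets as submodule"
test -f tools/resources/tool.exe && echo "assets checked out"
```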

@mikeage (Member, Author) commented Feb 3, 2021

GitHub does. See https://docs.github.com/en/github/managing-large-files . Above 100MB you must use it, and above 50MB you get a warning when you push.

@billyquith

> Does GitHub do LFS hosting? (I haven't tried it)

Seems it does, but I believe pulling the files out and moving them to LFS would mean restarting the repo history, making everyone's clones invalid.

> GitHub does. See https://docs.github.com/en/github/managing-large-files . Above 100MB you must use it, and above 50MB you get a warning when you push.

@mikeage Thanks, I didn't know that. Haven't encountered it yet, but good to know.

@mikeage (Member, Author) commented Feb 3, 2021

Correct. It'd rewrite the history and invalidate existing clones. Rebasing a fork afterwards would be straightforward but annoying, and any open PRs would be a mess if not rebased. That's why, if it's done, it should be done either during a long dry spell or ASAP.

The longer this thread goes, and the more I think about it, the more I feel like the cost is too high, which is a bit ironic since it's only been about a week. But still.

@mikeskydev mikeskydev added the needs more info Further information is requested label Apr 10, 2021
@andybak (Contributor) commented Aug 23, 2021

I suggest closing this, as it sounds like we've decided to do nothing for the moment. Fewer open issues is always a good look.

@mikeskydev (Member)

Agreed; closing, but it's good to have this noted in case we decide to revisit in the future!
