Roadmap: local storage management #490

Closed
25 of 26 tasks
sinbad opened this issue Jul 22, 2015 · 14 comments

Comments

@sinbad
Contributor

sinbad commented Jul 22, 2015

This is a meta issue for the roadmap. Comments should be geared towards the addition and deletion of tasks. Leave detailed discussion to issues linked in tasks. If a task doesn't have an issue next to it, create one and leave a comment here so we can update the list.

Tasks

Prune Command

Deletes objects which fall outside retention conditions in order to save disk space on the local clone. Can be invoked specifically (git lfs prune) or via the --prune option on fetch. The following must be retained:

  • Objects referenced in the current checkout (required)
  • Objects referenced by commits which have not been pushed, either to a named remote, or ANY remote, based on config (required - data only stored locally must never be deleted)
  • Other recent objects that you may want to use again without re-downloading (optional)
    • Objects referenced at recent branches/tags (current date - N days, configurable, 0 to ignore)
    • Objects referenced in other recent commits on HEAD (HEAD commit date - N days, 0 to ignore)
    • Objects referenced in other recent commits of recent branches (latest commit - N days, 0 to ignore)
  • Take git worktree into account, so that objects referenced by the HEAD of any linked worktree are never pruned

The reason for so many configurable options here is that deleting something too aggressively may result in large downloads which will frustrate the user. The prune command has to be useful enough to be used regularly to keep local storage at a reasonable level, without annoying the user with repeat downloads if they happen to switch branches or want to look at recent changes. Dropping these options and keeping only the current HEAD plus unpushed objects would probably lead a lot of people to stop using it once they realised they had to re-download other recent branches.

All options should be kept in gitconfig so the user can specify their general preferences for retention and override them for specific repositories if they want. Having them as config options rather than command line options also means they only have to think about them once, making using the command much easier.
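Git already layers configuration files (system, then global, then repository-local, with later files winning), which is what makes per-repository overrides come for free. A minimal Go sketch of that lookup order, assuming the config has already been read into maps; the LayeredConfig type and the key names are hypothetical:

```go
package main

import (
	"fmt"
	"strconv"
)

// LayeredConfig models gitconfig precedence for a single repository:
// repository-local values override global ones, which override defaults.
type LayeredConfig struct {
	Global map[string]string // stands in for ~/.gitconfig
	Local  map[string]string // stands in for .git/config
}

// Int returns the first parseable value, checking local then global, and
// falls back to def. In this sketch an unparseable local value falls
// through to the global layer instead of raising an error.
func (c LayeredConfig) Int(key string, def int) int {
	for _, layer := range []map[string]string{c.Local, c.Global} {
		if v, ok := layer[key]; ok {
			if n, err := strconv.Atoi(v); err == nil {
				return n
			}
		}
	}
	return def
}

func main() {
	cfg := LayeredConfig{
		Global: map[string]string{"lfs.recent-refs-days": "7"},
		Local:  map[string]string{"lfs.recent-refs-days": "30"},
	}
	fmt.Println(cfg.Int("lfs.recent-refs-days", 0))    // 30 (local override)
	fmt.Println(cfg.Int("lfs.recent-commits-days", 3)) // 3 (default)
}
```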

Fetch command extensions

Update the fetch command so that it has the following features:

  1. Fetch objects referenced by a list of refs provided as arguments (currently only 1 supported)
  2. Fetch objects for the current checkout (no arguments) - as present
  3. Fetch 'recent' objects in addition to either the list of refs or the current checkout, say via a '--recent' option, which could also be set on by default via gitconfig
  4. Accept a --prune option to invoke git lfs prune afterwards
  5. Accept --all to download all objects ever referenced in the history, for the purposes of migration between remotes. Incompatible with the --prune option.
  6. Make 'fetch' only download objects, not alter the working copy. Introduce a 'checkout' command to ensure working copy files contain the correct content, and a 'pull' command which combines 'fetch' and 'checkout'.

Here, 'recent' objects closely mirror the prune command's retention periods.
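The split proposed in item 6 can be sketched as two independent steps over local storage. This is a hypothetical in-memory model in Go (Store, Repo, Fetch, Checkout and Pull are illustrative names, not Git LFS code); it shows that fetch never touches the working copy, checkout never contacts the remote, and pull is simply one followed by the other.

```go
package main

import "fmt"

// Store maps an object ID to its content; a stand-in for an LFS server or
// for .git/lfs/objects. Everything here is a hypothetical in-memory model.
type Store map[string]string

type Repo struct {
	Remote   Store
	Local    Store
	Pointers map[string]string // path -> OID the current checkout references
	Worktree map[string]string // path -> file content on disk
}

// Fetch downloads missing objects into local storage only; the working
// copy is left untouched.
func (r *Repo) Fetch() int {
	fetched := 0
	for _, oid := range r.Pointers {
		if _, have := r.Local[oid]; have {
			continue
		}
		if data, ok := r.Remote[oid]; ok {
			r.Local[oid] = data
			fetched++
		}
	}
	return fetched
}

// Checkout writes content from local storage into the working copy and
// never contacts the remote.
func (r *Repo) Checkout() int {
	written := 0
	for path, oid := range r.Pointers {
		if data, ok := r.Local[oid]; ok {
			r.Worktree[path] = data
			written++
		}
	}
	return written
}

// Pull combines the two steps.
func (r *Repo) Pull() (fetched, written int) {
	fetched = r.Fetch()
	written = r.Checkout()
	return fetched, written
}

func main() {
	r := &Repo{
		Remote:   Store{"oid1": "big texture", "oid2": "big model"},
		Local:    Store{},
		Pointers: map[string]string{"a.png": "oid1", "b.fbx": "oid2"},
		Worktree: map[string]string{},
	}
	fmt.Println(r.Fetch(), len(r.Worktree)) // 2 0: objects stored, files unchanged
	fmt.Println(r.Checkout())               // 2
}
```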

  • Initially rename 'fetch' to 'pull'
  • Split 'pull' implementation into 2 separate commands, 'fetch' and 'checkout' which can be run separately
  • Allow git lfs fetch to accept multiple refs
  • Progress reporting on separate checkout command
  • The separate checkout command should interpret '.', './' and '.\' arguments correctly, filtering to the current directory
  • Add a remote as first parameter to fetch and pull to be specific about where to fetch from
  • Add new default remote derivation (tracking branch, origin)
  • Accept a --recent option (always done after named refs or current checkout)
    • Objects referenced at recent branches/tags (current date - N days, configurable, 0 to ignore)
    • Objects referenced in other recent commits on HEAD (HEAD commit date - N days, 0 to ignore)
    • Objects referenced in other recent commits of recent branches (latest commit - N days, 0 to ignore)
  • Add gitconfig option to always assume --recent when no arguments supplied
  • Add a dedicated man page listing all config options
  • Accept --all to download all objects referenced by any reachable commit
  • Accept a --prune option to trim storage post-fetch

The main purpose of this is to allow users to bulk fetch objects at a time of their choosing, even if they may not have those objects checked out right now. This is useful for running just before travelling, or downloading in the background while you work on other things so that the objects are already there when you come to check out another branch or review a recent change, meaning you don't have to wait.

Include / exclude paths

This feature will allow people to fetch only files in specific paths, or exclude files in specific paths. This is helpful in big repos where a user only needs a subset of the large files to perform their work and would rather not spend the download time or local disk space on any others.

This involves two settings in gitconfig called lfs.include-paths and lfs.exclude-paths, both of which can contain comma-separated lists of relative repository paths, with wildcard matching as in gitignore. If either is specified, it affects the behaviour of git lfs fetch based on the path of the file that references the object in the commit being fetched.

  • Add lfs.include-paths and lfs.exclude-paths to config
  • In fetch, if include-paths is specified, only file paths which match one or more include-paths items will be fetched
  • In fetch, if exclude-paths is specified, file paths which match any exclude-paths items will not be fetched
  • This could be extended to prune too at a later date to allow objects to be retrospectively tidied by path to save disk space
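A minimal Go sketch of the include/exclude rules listed above. It approximates gitignore-style wildcards with the standard library's path.Match (a pattern may match the whole path, a leading directory, or, if it contains no '/', any single path component); real gitignore matching has more cases, such as '**'. The function names are hypothetical.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// splitPatterns parses a comma-separated setting such as lfs.include-paths,
// trimming spaces and any trailing '/'.
func splitPatterns(s string) []string {
	var out []string
	for _, p := range strings.Split(s, ",") {
		if p = strings.TrimSpace(p); p != "" {
			out = append(out, strings.TrimSuffix(p, "/"))
		}
	}
	return out
}

// matches approximates gitignore-style matching: the pattern may match the
// whole path or any leading directory of it, or, when it contains no '/',
// any single path component.
func matches(pattern, file string) bool {
	parts := strings.Split(file, "/")
	for i := range parts {
		if ok, _ := path.Match(pattern, strings.Join(parts[:i+1], "/")); ok {
			return true
		}
		if !strings.Contains(pattern, "/") {
			if ok, _ := path.Match(pattern, parts[i]); ok {
				return true
			}
		}
	}
	return false
}

func matchesAny(patterns []string, file string) bool {
	for _, p := range patterns {
		if matches(p, file) {
			return true
		}
	}
	return false
}

// allowed: with include patterns set, the path must match at least one;
// a match against any exclude pattern always rejects it.
func allowed(file string, include, exclude []string) bool {
	if len(include) > 0 && !matchesAny(include, file) {
		return false
	}
	return !matchesAny(exclude, file)
}

func main() {
	include := splitPatterns("textures, audio/*.wav")
	exclude := splitPatterns("textures/raw")
	for _, f := range []string{"textures/hero.png", "textures/raw/hero.psd", "audio/theme.wav", "models/ship.fbx"} {
		fmt.Println(f, allowed(f, include, exclude))
	}
}
```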
@technoweenie
Contributor

I'd also like to essentially rename the current git lfs fetch to git lfs pull, and then break out a new fetch that only downloads objects from the LFS server, and a new checkout that only smudges files in the working directory from the local .git/lfs/objects storage.

I don't think breaking the command up is necessary right away though. I'd be happy with just renaming fetch to pull for Git LFS v0.6.x.

One other feature that I'd like for the new git lfs fetch is a way to download everything for a repo for the purposes of migrating LFS servers. I'm assuming that if no --recent flag or config is set, git lfs fetch/pull only downloads the current checkout.

$ git lfs fetch # lfs objects referenced in current commit.
$ git lfs fetch --recent # lfs objects in commit AND recent commits
$ git lfs fetch --all # ALL lfs objects for the current ref
$ git lfs fetch --all --prune # ERROR: prune flag can't coexist with all flag
$ git lfs fetch --all --recent # ERROR: recent flag can't coexist with all flag
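The flag combinations in the listing above can be rejected up front, before any network work happens. A minimal Go sketch using the standard flag package; FetchOptions and parseFetchArgs are hypothetical names, and only the two conflicts shown (--all with --prune, --all with --recent) are checked.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
)

// FetchOptions holds the parsed options for a hypothetical fetch command.
type FetchOptions struct {
	All, Recent, Prune bool
	Refs               []string // remaining positional arguments
}

// parseFetchArgs parses flags, then rejects the conflicting combinations.
// Go's flag package stops at the first non-flag argument, so in this
// sketch flags must precede refs.
func parseFetchArgs(args []string) (FetchOptions, error) {
	fs := flag.NewFlagSet("fetch", flag.ContinueOnError)
	fs.SetOutput(io.Discard)
	all := fs.Bool("all", false, "fetch objects referenced anywhere in history")
	recent := fs.Bool("recent", false, "also fetch recent refs and commits")
	prune := fs.Bool("prune", false, "prune old objects after fetching")
	if err := fs.Parse(args); err != nil {
		return FetchOptions{}, err
	}
	opts := FetchOptions{All: *all, Recent: *recent, Prune: *prune, Refs: fs.Args()}
	if opts.All && opts.Prune {
		return opts, errors.New("prune flag can't coexist with all flag")
	}
	if opts.All && opts.Recent {
		return opts, errors.New("recent flag can't coexist with all flag")
	}
	return opts, nil
}

func main() {
	for _, args := range [][]string{{"--recent", "main", "dev"}, {"--all", "--prune"}} {
		opts, err := parseFetchArgs(args)
		fmt.Println(opts.Refs, err)
	}
}
```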

@rubyist
Contributor

rubyist commented Jul 23, 2015

I'd also like to essentially rename the current git lfs fetch to git lfs pull, and then break out a new fetch that only downloads objects from the LFS server, and a new checkout that only smudges files in the working directory from the local .git/lfs/objects storage.

👍 to this. In the current implementation of fetch, the download and smudge steps are already quite separate in the code, so splitting them up will be straightforward.

@sinbad
Contributor Author

sinbad commented Jul 23, 2015

I'd also like to essentially rename the current git lfs fetch to git lfs pull

Ah yes, I forgot to include this, updated. There's one task to rename fetch to pull, which can be done first to stabilise that interface, and another to add separate fetch & checkout commands, but I'll probably get them done around the same time anyway.

One other feature that I'd like for the new git lfs fetch is a way to download everything for a repo for the purposes of migrating LFS servers.

Good call, added. Your list of command variants was correct (no params = current checkout unless lfs.fetch-recent-always gitconfig setting is enabled), the only thing not on your list was the option to specify 1 or more explicit refs to fetch for (currently only 1 is supported), useful for people wanting to script it.

@sinbad
Contributor Author

sinbad commented Jul 29, 2015

First PR: #527

@technoweenie
Contributor

@sinbad: I think all of the prune items are done now too. Can you confirm?

@sinbad
Contributor Author

sinbad commented Apr 6, 2016

Yep, all the prune items are done, forgot to circle back, sorry.

@technoweenie
Contributor

🤘

@abbiekressner

Include / exclude paths

This feature will allow people to fetch only files in specific paths, or exclude files in specific paths.
...
[x] Add lfs.include-paths and lfs.exclude-paths to config
[x] In fetch, if include-paths is specified, only file paths which match one or more include-paths items will be fetched
[x] In fetch, if exclude-paths is specified, file paths which match any exclude-paths items will not be fetched
[ ] This could be extended to prune too at a later date to allow objects to be retrospectively tidied by path to save disk space

Is there any way to accomplish retrospective pruning of excluded paths currently?

@ttaylorr
Copy link
Contributor

Is there any way to accomplish retrospective pruning of excluded paths currently?

@abbiekressner not currently, and I don't have this on my immediate roadmap. That being said, I'd be happy to help a contributor looking to get started in working with LFS by offering guidance through opening a pull request.

@abbiekressner
Copy link

@ttaylorr I would love to help out with this, but I am rather busy at the moment. I may have more time to look into it in a month or so, but I'm a bit of a newbie with actually contributing to projects like this so I would indeed need some guidance on where to start.

@ttaylorr
Contributor

ttaylorr commented Mar 28, 2017

@abbiekressner that's great! Here are a few places you could look to get started:

  1. The existing prune code. There are several goroutines that aggregate different types of objects to prune: stale objects, orphaned objects, etc. The actual prune() function could be taught to accept a filepathfilter, which would filter these results and only prune the ones that match a given -I or -X (include, exclude) flag.
  2. Other commands which make use of the filepathfilter code, like fetch, pull, or clone. There's a buildFilepathFilter helper func in the commands package which would be helpful here as well.
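A minimal Go sketch of the first suggestion: take the objects prune has already selected and keep only those that pass an include/exclude filter. Candidate, PathFilter and filterPrune are hypothetical stand-ins, not the real prune or filepathfilter code, and the filter uses plain path.Match on the full path. It assumes the filter is only applied when -I or -X is given, and it deliberately skips an object if any path referencing it is filtered out, so a narrow filter never deletes an object that other paths still share.

```go
package main

import (
	"fmt"
	"path"
	"sort"
)

// Candidate is an object the existing prune logic has already selected
// (stale, orphaned, etc.), with the paths that referenced it.
type Candidate struct {
	OID   string
	Paths []string
}

// PathFilter is a simplified include/exclude check on the full path.
type PathFilter struct{ Include, Exclude []string }

func anyMatch(patterns []string, p string) bool {
	for _, pat := range patterns {
		if ok, _ := path.Match(pat, p); ok {
			return true
		}
	}
	return false
}

func (f PathFilter) Allows(p string) bool {
	if len(f.Include) > 0 && !anyMatch(f.Include, p) {
		return false
	}
	return !anyMatch(f.Exclude, p)
}

// filterPrune returns the sorted OIDs whose every referencing path passes
// the filter. Objects with no known path cannot be matched, so they are
// kept whenever a filter is in use.
func filterPrune(cands []Candidate, f PathFilter) []string {
	var oids []string
	for _, c := range cands {
		ok := len(c.Paths) > 0
		for _, p := range c.Paths {
			if !f.Allows(p) {
				ok = false
				break
			}
		}
		if ok {
			oids = append(oids, c.OID)
		}
	}
	sort.Strings(oids)
	return oids
}

func main() {
	cands := []Candidate{
		{"a", []string{"textures/x.png"}},
		{"b", []string{"models/y.fbx"}},
		{"c", []string{"textures/z.png", "models/z.fbx"}},
		{"d", nil},
	}
	fmt.Println(filterPrune(cands, PathFilter{Include: []string{"textures/*"}})) // [a]
}
```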

@abbiekressner
Copy link

Great! Thanks, @ttaylorr! I will let you know if I manage to get anywhere with it.

I have one more question for you: Does including an .lfsconfig file in the repository with fetchexclude=* in the contents actually do anything? I read somewhere that this will prevent new clones from fetching files, but this did not work for me.

@ttaylorr
Copy link
Contributor

I have one more question for you: Does including an .lfsconfig file in the repository with fetchexclude=* in the contents actually do anything?

It does 😄, but only if you invoke it through git lfs clone instead of git clone. This is because those flags are checked in the git lfs clone code, but not in the filter-process code. This would be another great contribution 👍.

@abbiekressner

Thanks for all the quick responses, @ttaylorr! Keep up the great work!
