Add command to undo fetch and checkout #1189

Open
shoelzer opened this issue Apr 28, 2016 · 24 comments

@shoelzer shoelzer commented Apr 28, 2016

Git LFS provides control over which files to fetch/cache (lfs.fetchinclude, lfs.fetchexclude) and which files to put in the working dir (checkout <filespec>). But once files are in the cache or working dir there is no (easy) way to undo those operations. I would like a command to change files into placeholders in the working dir and delete cached files.

The goal of this command is to free disk space used by LFS files. It goes much further than prune. I should be able to remove all LFS files from the working dir and cache. By default, the command should protect against data loss by verifying that files exist on the LFS server before deleting, but there should also be an option to skip that check.

I don't know if this should be a new command, multiple new commands, or options added to existing commands. Some ideas...

  • Single new command: scrub, clear, free
  • Multiple new commands: uncheckout and unfetch
  • New options for commands: checkout --placeholder and fetch --clear-cache
@technoweenie

Member

@technoweenie technoweenie commented Apr 28, 2016

I think this sounds like a great idea. I think git lfs prune could get some kind of --all option that just basically does rm -rf .git/lfs/objects. "Uncheckout" (the act of replacing LFS tracked files in the working directory with the LFS pointer) should probably be a new command though. I don't like the idea of a command like git lfs checkout doing the opposite given a new flag like git lfs checkout --clear.

@sinbad

Contributor

@sinbad sinbad commented Apr 28, 2016

Thanks for raising this, it's a solid point.

In terms of deleting fetched content in the .git/lfs/objects store, I think that's a valid extension to git lfs prune: it could remove anything that would have been omitted at fetch time because of lfs.fetchinclude and lfs.fetchexclude. That could be the default behaviour; the rest of prune is just an inverse of fetch with a little date padding to avoid thrashing.

As for resetting what's currently in the working copy back to pointers, you can actually achieve this now using git reset --hard if the object isn't already fetched into .git/lfs/objects. The smudge filter will get invoked and if the object data isn't local already and the fetch is suppressed due to include/exclude settings then it will write the pointer to the working copy instead, over the top of any current content. The only reason this doesn't work as a solution right now is that you have no real way to easily remove all the objects you've already fetched that now match those settings, which the changes above for prune would do.

So I think this doesn't need any extra commands, just an enhancement to the default behaviour of git lfs prune.
[edit]Ha @technoweenie beat me while I was typing 😆

@technoweenie

Member

@technoweenie technoweenie commented Apr 28, 2016

FWIW, an "uncheckout" command was requested in #944 (comment) too. I think I'd prefer having our own documented/tested command, instead of encouraging git reset --hard with caveats that the user has to worry about.

@shoelzer

Author

@shoelzer shoelzer commented Apr 28, 2016

Thanks for the tips on git reset --hard. I can use that right now.

I like the idea of extending git lfs prune and adding "uncheckout". Perhaps git lfs checkin?

@sinbad

Contributor

@sinbad sinbad commented Apr 29, 2016

I'd favour something like git lfs checkout --clean since it's not really doing the opposite of what checkout normally does, it's just doing it again from scratch with the latest settings.

@swordfly

@swordfly swordfly commented Aug 11, 2017

Hi all, I've recently been evaluating LFS for large binary test data, and I really like the idea of reverting locally checked-out LFS files to pointer files. Is this available in the latest LFS version? We have a lot of large test data, around 2 GB per file, so we want to revert all LFS files to pointer files even when they are at the latest version. Thanks!

@farleylai

@farleylai farleylai commented Nov 28, 2017

I think people are starting to ask more of Git LFS: serving large files on demand, like Google Drive File Stream or Dropbox Smart Sync, despite limited local storage.
Until then, a clumsy way to manually convert a large file back to its pointer is to use git lfs clean as follows:

$> mv largefile.bin largefile.bin.bak
$> cat largefile.bin.bak | git lfs clean > largefile.bin
$> rm largefile.bin.bak

Those lines can be wrapped in a script.
Unfortunately, the clean filter has to process the entire file to compute the pointer, so it runs slowly.
Any suggestions for improving the performance would be much appreciated.
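
For context on why this is slow: a pointer file records the SHA-256 of the file's contents and its size in bytes, so producing one means reading the whole file. Below is a minimal sketch of building the three-line pointer by hand, assuming GNU `sha256sum` is on the PATH and no LFS extensions are configured. `lfs_pointer_for` is a hypothetical helper name, not part of Git LFS.

```shell
#!/bin/sh
# Print the LFS pointer text for one file: version line, SHA-256 oid, byte size.
# Assumption: `sha256sum` (GNU coreutils) is installed.
# lfs_pointer_for is a hypothetical helper name used for illustration only.
lfs_pointer_for() {
  file=$1
  # Hashing reads every byte of the file; this is the slow step for multi-GB files.
  oid=$(sha256sum < "$file" | cut -d' ' -f1)
  # Some wc implementations pad the count with spaces, so strip them.
  size=$(wc -c < "$file" | tr -d ' ')
  printf 'version https://git-lfs.github.com/spec/v1\noid sha256:%s\nsize %s\n' "$oid" "$size"
}
```

For example, `lfs_pointer_for largefile.bin` prints the pointer to stdout. It is no faster than git lfs clean, since the hash is the expensive part either way.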

@ttaylorr

Member

@ttaylorr ttaylorr commented Nov 29, 2017

@farleylai Without more serious workarounds, I think that this is the "best" solution for now. That said, I really like the idea of a git-lfs-stash command. "stash" is perhaps confusing as it has other connotations, but I mean something that would un-checkout the object and potentially prune it from your local cache.

This is a reasonably sized project, but I would be more than happy to guide you or anyone else through it as an OSS contribution. That said, if nobody takes this on, I'd be happy to add it myself within the next few releases (cc @technoweenie).

@farleylai

@farleylai farleylai commented Nov 29, 2017

@ttaylorr and @swordfly I just figured out a way to check out the pointer file as is, without recomputing it, by temporarily untracking the file type:

git lfs untrack '*.bin'
git checkout largefile.bin
git lfs track '*.bin'

However, I still imagine a handy flag that works as follows:
When the flag is set to true, pointer files are always checked out as is by git checkout/fetch/pull. The user must explicitly download the large files with git lfs pull. Recovering the pointers is then simply a matter of checking out again.
Otherwise, git-lfs always materializes the pointers implicitly with git checkout/fetch/pull.
In effect, this controls the filtering by keeping tracking on for add/push and turning it off for checkout/pull.
So far, the closest option is running git lfs install --skip-smudge, but it only takes effect on the initial clone.
A little more flexibility would be appreciated.

@ttaylorr

Member

@ttaylorr ttaylorr commented Dec 1, 2017

However, I still imagine a handy flag works as follows:

If I'm understanding your proposal correctly, I think that this is largely accomplishable with the --include and --exclude flags that are provided in Git LFS. If what you're looking for are ways to by default not check an LFS object out into the working tree, I think that this should be left to scripting within the repository.

@technoweenie

Member

@technoweenie technoweenie commented Dec 1, 2017

If what you're looking for are ways to by default not check an LFS object out into the working tree, I think that this should be left to scripting within the repository.

You can do this by running:

# in your working directory
$ git config --file=.lfsconfig lfs.fetchexclude "*"

Add that .lfsconfig file to your repository, and the default exclude value will be used if no alternate is given (user git config, arguments to git lfs clone or git lfs pull, etc).

@farleylai

@farleylai farleylai commented Dec 2, 2017

@ttaylorr Not exactly. I am aligned with the OP. The requirement is for git to transparently check out (recover) the pointer files exactly as they are in the repo when the files are unchanged or absent. Sure, specifying --exclude or lfs.fetchexclude can serve as a hint, but git-lfs does not seem to actually recover the pointer files.

@ttaylorr

Member

@ttaylorr ttaylorr commented Dec 2, 2017

@farleylai Sorry, can you explain what you mean by the phrase "recover the pointer file(s)"? Thanks.

@technoweenie

Member

@technoweenie technoweenie commented Dec 4, 2017

I think he means the reverse of smudge. It sounds like he has a repository with the pointer files already replaced by the actual large files via the smudge filter, and wants to reclaim a little local disk space by removing them.

You can script it, or do it manually right now:

# all LFS files are real
$ git lfs ls-files
9252a75c94 * bin/again.bin
0263829989 * bin/b.bin
98ea6e4f21 * bin/hi.bin
b9f86fab47 * gif/atom-undo.gif
d1c8fab514 * gif/droidtocat.gif
d1c8fab514 * gif/dupe.gif
55d51edb30 * png/render.png

$ git config lfs.fetchexclude '*'
$ git show HEAD:bin/again.bin > bin/again.bin
$ git lfs pull # no-op because of lfs.fetchexclude

# bin/again.bin is just a pointer
$ git lfs ls-files
9252a75c94 - bin/again.bin
0263829989 * bin/b.bin
98ea6e4f21 * bin/hi.bin
b9f86fab47 * gif/atom-undo.gif
d1c8fab514 * gif/droidtocat.gif
d1c8fab514 * gif/dupe.gif
55d51edb30 * png/render.png

That's pretty cumbersome, and doesn't remove the file from .git/lfs/objects.
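
In the git lfs ls-files output above, `*` marks a file with real content in the working tree and `-` marks a pointer. A script can make the same distinction by looking at the start of the file. Here is a minimal sketch, assuming `head -c` is available (it is common but not strictly POSIX). `is_lfs_pointer` is a hypothetical helper name, not a Git LFS command.

```shell
#!/bin/sh
# Succeed if the file at $1 starts with the LFS pointer version line.
# is_lfs_pointer is a hypothetical helper name used for illustration only.
is_lfs_pointer() {
  [ -f "$1" ] || return 1
  # Read only the first 64 bytes so multi-GB binaries are not scanned;
  # drop NUL bytes, which the shell cannot hold in a variable.
  prefix=$(head -c 64 "$1" 2>/dev/null | tr -d '\0')
  case $prefix in
    "version https://git-lfs.github.com/spec/v1"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `is_lfs_pointer bin/again.bin && echo pointer` would report whether the manual steps above took effect for that file.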

@farleylai

@farleylai farleylai commented Dec 4, 2017

...
$ git show HEAD:bin/again.bin > bin/again.bin
...

is exactly the key command for getting the pointer file back, but it seems to require a path that matches what git lfs ls-files lists. Alternatively, temporarily turning off LFS tracking and running git checkout, as shown earlier, works in general for files and directories relative to the cwd. Ultimately, removing the corresponding LFS objects from .git/lfs/objects would also be welcome. So the follow-up question is how to get the path in .git/lfs/objects that corresponds to the sha256 oid. Is it sufficient to just delete it?

@technoweenie

Member

@technoweenie technoweenie commented Dec 4, 2017

Yes, LFS will happily re-download the files if they're not in .git/lfs/objects.
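
On the path question: the local store shards objects by the leading hex characters of the oid, so an object lives at .git/lfs/objects/<first two chars>/<next two chars>/<full oid>. Below is a minimal sketch of that mapping. `lfs_object_path` is a hypothetical helper name, and it needs the full 64-character oid (for example from git lfs ls-files --long or from the pointer's oid line), not the shortened one printed by default.

```shell
#!/bin/sh
# Map a full SHA-256 oid to its location in the local LFS object store.
# $1 = 64-character oid, $2 = git directory (defaults to .git).
# lfs_object_path is a hypothetical helper name used for illustration only.
lfs_object_path() {
  oid=$1
  git_dir=${2:-.git}
  # The store is sharded by characters 1-2 and 3-4 of the oid.
  first=$(printf '%s' "$oid" | cut -c1-2)
  second=$(printf '%s' "$oid" | cut -c3-4)
  printf '%s/lfs/objects/%s/%s/%s\n' "$git_dir" "$first" "$second" "$oid"
}
```

With that, `rm -f "$(lfs_object_path "$oid")"` deletes a single cached object. An object that was never pushed to the remote cannot be re-downloaded, so it is worth checking that before deleting.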

@hannwong

@hannwong hannwong commented Feb 28, 2018

@technoweenie Any effort started on this? Not that I need this urgently; I have a script that users run to empty the entire .git/lfs/objects folder and restore every pointer into the working directory. I even have a script for git lfs pull origin <some-large-file>.

This feature isn't urgent because non-tech users don't have huge files (Word documents less than 5MB). Tech users who need to flit between huge files (ISOs, binaries, etc) can already revert huge files to text pointers on their own.

@ttaylorr

Member

@ttaylorr ttaylorr commented Feb 28, 2018

@hannwong no, but this is something that I think would be worth considering for the forthcoming v2.5.0 release.

@zzhang2019

@zzhang2019 zzhang2019 commented Feb 19, 2019

I'm looking for this feature as well, and Google led me to this thread. Is it supported yet?

@ttaylorr

Member

@ttaylorr ttaylorr commented Feb 19, 2019

do we it supported ready?

Not yet, but we will make sure to update this issue if/when we do.

@fstefanov

@fstefanov fstefanov commented Mar 15, 2019

I use this one from my git root:

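# Replace every LFS-tracked file in the working tree with its pointer from HEAD.
# Reading line by line keeps paths that contain spaces intact.
git lfs ls-files -n | while IFS= read -r file; do
  git cat-file -e "HEAD:${file}" && git cat-file -p "HEAD:${file}" > "$file"
done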
@rokroskar

@rokroskar rokroskar commented Jun 5, 2019

@fstefanov this will write back the pointer but not delete the cached objects in .git/lfs/objects which still use up disk space. It also makes git think that the file has changed, at least with my version of git (2.21.0).

@cardoso-neto

@cardoso-neto cardoso-neto commented Aug 3, 2019

no, but this is something that I think would be worth considering for the forthcoming v2.5.0 release.

We're past 2.7 already and we're still using scripts we hacked together to accomplish this.
Any way you guys could devise your own git-lfs certified and approved command for the next release?

My team ❤️ git-lfs's ease of use and it'd be great to have these pruning features out-of-the-box.

@8ctopus

@8ctopus 8ctopus commented Oct 10, 2019

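Great feature, I look forward to it being implemented. In the meantime, here's the cleanup script I use, based on @fstefanov's:
#!/bin/bash
# Restore the pointer for every LFS file, then delete the local object cache.
# Reading line by line keeps paths that contain spaces intact.
git lfs ls-files -n | while IFS= read -r file; do
  git cat-file -e "HEAD:${file}" && git cat-file -p "HEAD:${file}" > "$file"
done
# Caution: this also deletes cached objects that have not been pushed yet.
rm -rf .git/lfs/objects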
