
Feature request: "lfs worktree add" command that's similar to "lfs clone" #1401

Closed
dakotahawkins opened this issue Jul 28, 2016 · 16 comments

@dakotahawkins
Contributor

dakotahawkins commented Jul 28, 2016

Since using worktrees is my MO, and since it's a suggested workaround for #766, it would be great to have an explicit command that does as much LFS work as possible in bulk when you want to add a new worktree.

From recent experience, git worktree add took more time than a git lfs clone plus a git lfs fetch --all, so I suspect there are savings to be had there very similar to the savings you're getting from git lfs clone when cloning a repo.

In truth, it probably needs to be little more than a wrapper around worktree add (sans filters), cd <worktree dir>, and lfs pull.
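
For illustration, such a wrapper could look roughly like the shell function below. This is a hedged sketch, not an existing LFS command: the function name is made up, and it assumes the GIT_LFS_SKIP_SMUDGE environment variable is available to bypass the smudge filter.

```sh
# Hypothetical sketch of an "lfs worktree add"-style wrapper.
lfs_worktree_add() {
    path="$1"; ref="$2"
    # 1. Create the worktree with LFS smudging skipped, so checkout only
    #    writes the small pointer files instead of downloading per file.
    GIT_LFS_SKIP_SMUDGE=1 git worktree add "$path" "$ref" || return 1
    # 2. Download and check out the real content in one batched pass.
    (cd "$path" && git lfs pull)
}
```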

@ttaylorr
Contributor

Worktree support would be something I'd like to see in LFS, though I'm not sure where it fits within our current roadmap. I'm going to CC in some other LFS folks and try and get some more 👀 on this.

/cc @technoweenie @rubyist @sinbad @larsxschneider

@dakotahawkins
Contributor Author

Actually using the worktrees once you have them works great. It's especially good for LFS because git lfs fetch does store all the large files in one place, so all your worktrees can share them.
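
For context (paths below are examples, not from this thread): the sharing works because linked worktrees resolve to the same common git directory, which is where LFS keeps its object store.

```sh
# Every linked worktree resolves to the same common git dir, so they all
# share one LFS object store under <common-git-dir>/lfs/objects.
git rev-parse --git-common-dir                    # e.g. /work/repo/.git
ls "$(git rev-parse --git-common-dir)/lfs/objects"
```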

Creating a new worktree, however, is slow in the same way that a plain git clone is slower than a git lfs clone.

I have a git alias that does what I'm suggesting, but I think it would make for a simple command and would make a lot of sense alongside lfs clone.

@larsxschneider
Member

@dakotahawkins What does your "git alias" do? I suspect it:

  1. Creates a worktree with the LFS filter disabled
  2. Runs git lfs pull

Is this right? If yes, then I believe #1382 (WIP: long running clean/smudge filter protocol) could speed this up a lot 😄
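
For reference, a minimal sketch of what a two-step alias like that could look like (the alias name and the filter override are illustrative, not the actual alias discussed here):

```sh
# Hypothetical alias: create the worktree with the smudge filter overridden
# to a passthrough, then batch-download the real content with git lfs pull.
git config --global alias.lfs-wt-add \
  '!f() { git -c filter.lfs.smudge=cat -c filter.lfs.required=false worktree add "$1" "$2" && cd "$1" && git lfs pull; }; f'

# Example usage (path and branch are placeholders):
#   git lfs-wt-add ../feature-x feature-x
```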

@dakotahawkins
Contributor Author

Yeah, that's pretty much what it does. #1382 would speed up the lfs checkout side of things, but an lfs worktree add would still speed up the download by doing it in batch, for anybody who either doesn't want to or doesn't know how to make an alias to do that.

@dakotahawkins
Contributor Author

I'm also going to try speeding up checkout by writing information directly to the index file, but #1382 would still be necessary for a lot of other uses of the filters!

@sinbad
Contributor

sinbad commented Jul 29, 2016

Yeah, I added worktree support a while ago, but as with git checkout it still suffers from the multiple-smudge-process-fork performance issue. In the past my suggestion for this has been that we add more wrapper functions, mainly for checkout, pull, and merge since those are more heavily used, but worktree obviously suffers just as badly as clone; it's just that fewer people use it.

I do believe #1382 will basically make all of the wrapper commands redundant. You mention that it won't address the batching, but actually it might be able to if we're smart about it.

Although the main benefit of the first iteration of the filter streaming would just be to re-use the filter process, there's no reason why we can't refine this and cheat: we could pretend to produce the real LFS content when called, but actually just write the pointer to begin with and remember the list of files. When the filter process is terminated, we could then perform the batch calls to grab the real data, effectively an internal git lfs pull specifically for those files/oids. It would require the update-index optimisations too to attain top speed, but that would address the batching issue. We could potentially start calling batches as we hit 100 items in the filter and start fetching the data, but the checkout would have to wait until the end of the process to avoid clashing with other index updates.

We'd have to examine practical issues, like how reliable it is that we'll get a terminate event we can do this 'mop up' with, and make sure that git doesn't have any unwritten index data at that point which would screw up any modifications we make to the working copy at the end of our filter stream process (maybe you can confirm that, @larsxschneider?), but I think there are lots of opportunities opened up by this model.
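
As a toy illustration of that "write the pointer now, mop up later" idea, here is a single-shot smudge script rather than the #1382 long-running process; the script, file name, and mop-up command are assumptions for the sketch, not LFS code.

```sh
#!/bin/sh
# defer-smudge.sh - hypothetically configured as:
#   git config filter.lfs.smudge "/path/to/defer-smudge.sh %f"
gitdir=$(git rev-parse --git-dir)
cat                                            # emit the pointer unchanged, keeping checkout fast
printf '%s\n' "$1" >> "$gitdir/lfs-deferred"   # remember which path still needs real content

# A later mop-up pass (run once git has finished with the index) would fetch
# and check out everything in one batch, roughly an internal git lfs pull
# restricted to the recorded paths, e.g.:
#   git lfs pull --include="$(paste -sd, "$gitdir/lfs-deferred")"
```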

For that reason I think spending time on more wrapper functions would be a mistake at the moment.

@dakotahawkins
Contributor Author

That's probably true, in which case close this with "no plans to implement".

I'd add a more generic concern that all of that seems like it will require locking the index for a relatively long time while a bunch of various operations must all succeed (file I/O, network transfers, etc) in order to wind up in a good state. If just part of one of those operations fails for any reason, you'd be hosed!

@dakotahawkins
Contributor Author

Does LFS do any kind of global safety operation with the index (like copying it to a temp location and copying it back if part of something that involves index-writing fails)? That way it might still have failed, but your index would at least be back where it started.

@sinbad
Contributor

sinbad commented Jul 29, 2016

> I'd add a more generic concern that all of that seems like it will require locking the index for a relatively long time while a bunch of various operations must all succeed (file I/O, network transfers, etc) in order to wind up in a good state.

I don't think so. As I say, you can write the pointer data to the working copy when the filter request initially comes through, so if the process is halted in the middle the state is stable and clean. The clean filter does a passthrough on pointer files in the working copy, not only because of this kind of problem but also because you can exclude folders you don't want actual LFS content for.

When it comes to updating the index at the end it would be the same as git lfs checkout.

> Does LFS do any kind of global safety operation with the index (like copying it to a temp location and copying it back if part of something that involves index-writing fails)? That way it might still have failed, but your index would at least be back where it started.

Since we only ever update the index via git update-index, we don't have to. Obviously any future direct writing to the index for performance will need to be a lot more careful.
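
For context, updating via git update-index looks roughly like the following (a hedged sketch; the exact invocation LFS uses may differ, and the paths are examples):

```sh
# After rewriting pointer files with their real content, refresh their index
# entries in one pass so git does not report them as modified.
printf '%s\n' assets/big-texture.png media/intro.mp4 |
    git update-index -q --refresh --stdin
```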

@larsxschneider
Member

@sinbad I like your batching idea. However, the big problem is triggering the last batch for Git LFS. I already tried to send a "git is done" event in my filter protocol, but this is not easy, as Git can die in any function. Maybe I could send a "git is done and no errors happened" event instead - that should be easier.

@sinbad
Contributor

sinbad commented Jul 29, 2016

@larsxschneider Yeah, that makes sense. I think it would be OK for this event to only happen on successful completion, because the intermediate state (pointers in the working copy) is still 'safe'. Also, only doing the checkout at the end on success reduces the number of possible state combinations we have to handle. My main concern is that the 'terminate' message should happen only when git is completely done modifying both the working copy and the index, so we're free to do any post-processing.

@larsxschneider
Member

Sending a "git is done" event is difficult because Git would shut down the filter process shortly afterwards anyway. Can we intercept this shutdown? Could that be our trigger for the last batch?

@larsxschneider
Member

@sinbad
Contributor

sinbad commented Jul 29, 2016

Can't git send a graceful termination message and then wait for the filter to exit? This is what the custom transfers system in git-lfs itself does, for example, when talking to external processes.

@larsxschneider
Member

@larsxschneider
Member

With Git 2.15 (*) and Git LFS 2.3.0 (via #2511) this issue should be fixed!

(*) I recommend using at least Git 2.16, as the Git LFS download progress report was broken in 2.15.
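
For anyone landing here later, the fix means a plain worktree add goes through the single long-running LFS filter process that `git lfs install` configures (path and branch below are examples):

```sh
# Check that the long-running filter is configured (git lfs install sets it):
git config --get filter.lfs.process     # expected: git-lfs filter-process
# A plain worktree add now smudges through one LFS process instead of one per file.
git worktree add ../feature-x feature-x
```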
