Sync only a subset of an LFS enabled repo #1894

Closed
tjaffri opened this issue Oct 17, 2018 · 9 comments

@tjaffri

tjaffri commented Oct 17, 2018

Have you tried troubleshooting?

Troubleshooting doc
Yes

Agent Version and Platform

vsts-agent-linux-x64-2.140.2.tar.gz

OS of the machine running the agent? Hosted Linux Preview

VSTS Type and Version

VisualStudio.com

What's not working?

We have a repo with a lot of large files. We use git LFS to make this manageable.

I am creating a CI build to run a script on some of the files, using a hosted Linux agent. My problem is that the very first step, Get Sources, seems to offer only two choices:

  1. I can select "Checkout Files from LFS", which works, but the agent runs out of disk space when trying to pull absolutely everything in the repo from LFS. In any case, this is wasteful, since I don't need all the LFS files, just a couple of them.

  2. I can of course leave "Checkout Files from LFS" unselected (the default), with the thought that I could write a manual step to run git lfs pull --include="foo.tar.gz" later. However, in this case the entire source sync step fails with a cryptic LFS auth error (see below), specifically "Git credentials not found". So I can't even do a basic non-LFS git sync on an LFS-enabled repo. It seems that #1 above is required to give LFS the right credentials?

What's the correct way to pull only a couple of LFS files without sucking down absolutely everything?

Agent and Worker's Diagnostic Logs

2018-10-16T23:08:05.9219670Z Checking out files: 100% (1850/1850), done.
2018-10-16T23:08:05.9985720Z Downloading ***/****.tar (235 MB)
2018-10-16T23:08:06.0410400Z Error downloading object:***/****.tar (2e6ca68): Smudge error: Error downloading ***/****.tar (2e6ca684fd05580a0ca8439309aef7f505510ef442991622f40776c0fc42eb1b): batch response: Git credentials for https://****.visualstudio.com/data/_git/data not found.
2018-10-16T23:08:06.0432190Z 
2018-10-16T23:08:06.0432940Z Errors logged to /Users/vsts/agent/2.140.2/work/1/s/.git/lfs/logs/20181016T230806.040862.log
2018-10-16T23:08:06.0434050Z Use `git lfs logs last` to view the log.
2018-10-16T23:08:06.0445040Z error: external filter 'git-lfs filter-process' failed
2018-10-16T23:08:06.0445270Z fatal: ***/****.tar: smudge filter lfs failed
2018-10-16T23:08:06.1318140Z ##[error]System.InvalidOperationException: Git checkout failed with exit code: 128
   at Agent.Plugins.Repository.GitSourceProvider.<GetSourceAsync>d__10.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Agent.Plugins.Repository.CheckoutTask.<RunAsync>d__2.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Agent.PluginHost.Program.Main(String[] args)
2018-10-16T23:08:06.1360000Z ##[section]Finishing: Checkout
@TingluoHuang
Contributor

@tjaffri for your scenario, I would recommend going with option 2) if you want to customize the LFS fetch process.
You need to provide credentials to git/git-lfs yourself; you can learn the command lines you might need from the agent's output.
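A hedged sketch of what "providing credentials yourself" could look like in a script step. It assumes authentication by passing an `AUTHORIZATION: bearer` header through git's `http.extraheader` setting (which git-lfs also reads; check the agent's own checkout output for the exact form it uses), and it assumes the job's OAuth token is exposed to scripts as `SYSTEM_ACCESSTOKEN`. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: pull selected LFS objects using the pipeline's OAuth token.
# Assumptions: SYSTEM_ACCESSTOKEN holds a valid token, the current directory
# is the checked-out repo, and the helper names below are hypothetical.

# Build the header value passed to git via -c http.extraheader.
bearer_header() {
  printf 'AUTHORIZATION: bearer %s' "$1"
}

# Run `git lfs pull` for a comma-separated include list, sending the
# bearer token with every HTTP request made by git and git-lfs.
lfs_pull_with_token() {
  local include=$1
  if [[ -z "${SYSTEM_ACCESSTOKEN:-}" ]]; then
    echo "SYSTEM_ACCESSTOKEN is not set" >&2
    return 1
  fi
  git -c http.extraheader="$(bearer_header "$SYSTEM_ACCESSTOKEN")" \
    lfs pull --include="$include"
}
```

For example, `lfs_pull_with_token "foo.tar.gz"` would fetch just that one object. Using `git -c` keeps the token out of the repository's stored config, so it is not left behind on the agent after the step finishes.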

@tjaffri
Author

tjaffri commented Oct 17, 2018

@TingluoHuang thank you. I considered that but I have two concerns:

A. First, no sources sync at all with option 2 above for the LFS-enabled repo because of the error (not even the plain git files, since LFS insists on running because it is enabled on the remote, yet fails due to missing credentials; I clarified this above). So I have to turn off source syncing entirely for LFS-enabled repos. That seems like a straight-up bug, right? Isn't the CI process completely broken for LFS-enabled repos if you have to sync all files (or do a complicated custom sync)? By definition, LFS is for repos with large files that you don't want to sync in their entirety. So I was hoping to report this as a bug. What's the process for doing that?

B. Second, I guess I could do a manual git clone and I did look at the commands the agent is doing. But it seems to be doing complex git magic like merging PRs in the case of CI builds. I wanted to benefit from all that and not have to duplicate all that as a custom bash script. Is there a recommendation or template for how to correctly write a custom source sync?

Thanks so much for your time!

@TingluoHuang
Contributor

@tjaffri without git lfs install, git checkout shouldn't trigger an LFS object download. I need to check the git config on the hosted Linux pool. BTW, have you tried the Hosted Ubuntu 1604 pool instead of Hosted Linux Preview?

@tjaffri
Author

tjaffri commented Oct 17, 2018

Thanks! I ran on Hosted Ubuntu 1604, as well as Mac OSX. Same error, specifically when Get Sources does NOT have "Checkout files from LFS" selected it is still trying to do some stuff with git lfs (same smudge error reported originally).

On my local machine for the same remote repo (on Ubuntu as well as Mac OS) I can simply do a git clone and that does not trigger git-lfs. I have to separately do a git lfs pull for files I want to pull from git lfs.

I researched this a bit, and it seems that when you install git-lfs on the agent image you need to call git lfs install --skip-smudge (once, to set up git lfs correctly). This avoids pulling down every LFS object when syncing a git-lfs-enabled repo. There also seems to be a way to configure this via an environment variable: setting GIT_LFS_SKIP_SMUDGE=1 as a pipeline variable seems to work, and only the git files are cloned, not the LFS files. From there, I think it is straightforward to write a custom bash script that runs git lfs pull --include for any specific files I want.
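A minimal sketch of this workaround run by hand, assuming a bash shell with git and git-lfs installed: clone with `GIT_LFS_SKIP_SMUDGE=1` so checkout leaves small LFS pointer files in place, then fetch only the objects you name. `git lfs pull --include` takes a comma-separated list of paths or patterns; the helper names and example paths here are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: clone an LFS-enabled repo without downloading LFS content,
# then pull only selected LFS files. Helper names are hypothetical.

# Join the arguments into the comma-separated form used by --include.
lfs_include_list() {
  local IFS=,
  printf '%s' "$*"
}

# Usage: clone_lfs_subset <url> <dir> <path-or-pattern>...
clone_lfs_subset() {
  local url=$1 dir=$2
  shift 2
  # With smudging skipped, checkout writes pointer files, not LFS content.
  GIT_LFS_SKIP_SMUDGE=1 git clone "$url" "$dir" || return
  # Download and check out only the requested LFS objects.
  git -C "$dir" lfs pull --include="$(lfs_include_list "$@")"
}
```

Inside a pipeline, Get Sources already performs the clone once `GIT_LFS_SKIP_SMUDGE=1` is set as a pipeline variable, so a later script step would only need the `git lfs pull --include=...` part, and git-lfs still needs credentials for that download.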

... I think I am unblocked with this workaround, but I'm sure others will run into this bug. It is quite confusing. What do you suggest as a fix?

My proposal:

  1. When "Checkout files from LFS" is checked, keep the current behavior for those who want it. I doubt its value, since it defeats the entire purpose of using LFS, but some may want it.

  2. When "Checkout files from LFS" is not checked (the default), the GIT_LFS_SKIP_SMUDGE=1 environment variable should be set. In addition, there should be a link to documentation showing users how to run an explicit git lfs pull to download any files they want.

I'm happy to send a PR if you can point me in the right direction. Sorry, I'm new to Pipelines.

@TingluoHuang
Contributor

@tjaffri I remember that git changed its default git-lfs behavior last year: git now checks out all git-lfs objects by default; it's part of git's system config.
In the Windows agent, we package MinGit with the agent, and we change the default LFS option back to not checking out LFS objects.
However, since we can't package git for Linux, we don't have that control there, and I guess the recent hosted image update finally picked up a git version that checks out LFS objects by default.

We need to figure out a way for Linux/macOS. :)
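To check whether a given machine's git setup will download LFS content on checkout, you can inspect the `filter.lfs.*` keys and where they come from (`--show-origin` reports whether a value lives in the system, global, or repo config). `git lfs install --skip-smudge` writes filter commands that include `--skip`, and `GIT_LFS_SKIP_SMUDGE=1` has the same effect for a single process. This is a hypothetical diagnostic sketch, not part of the agent; it only treats `1` or `true` as setting the variable.

```bash
#!/usr/bin/env bash
# Hypothetical diagnostics for LFS smudge behavior on an agent machine.

# Print every filter.lfs.* setting together with the file it comes from.
show_lfs_filter_config() {
  git config --show-origin --get-regexp '^filter\.lfs\.'
}

# Return 0 if a checkout would download LFS content, 1 otherwise.
lfs_smudge_enabled() {
  # The environment variable overrides config for any git-lfs process.
  case "${GIT_LFS_SKIP_SMUDGE:-}" in
    1|true) return 1 ;;
  esac
  local filter
  # git prefers the long-running filter.lfs.process over filter.lfs.smudge;
  # if neither is configured, LFS does not run during checkout at all.
  filter=$(git config --get filter.lfs.process 2>/dev/null ||
           git config --get filter.lfs.smudge 2>/dev/null) || return 1
  # `git lfs install --skip-smudge` adds --skip to these filter commands.
  [[ $filter != *--skip* ]]
}
```

Running `show_lfs_filter_config` on a hosted image would show whether the LFS filter comes from the system config, as described above.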

@tjaffri
Author

tjaffri commented Oct 19, 2018

@TingluoHuang for linux/macOS maybe just set that env variable and that's it? All you need to do is set GIT_LFS_SKIP_SMUDGE=1 and that will make lfs not checkout the object (this is what I am doing to work around this issue... works great).

LMK if there's a place I can PR this to. I bet you guys have a config somewhere setting env vars for the agent...

@TingluoHuang
Contributor

#1901

@moswald
Member

moswald commented Nov 14, 2018

@TingluoHuang it looks like your PR has fixed this, yes? Can you close the issue, or is there still more work to be done?

@tjaffri
Author

tjaffri commented Nov 15, 2018

I haven't verified the fix, but I suspect you're right @moswald we should be able to close this issue. My workaround is exactly what @TingluoHuang implemented in the PR.

@moswald moswald closed this as completed Nov 19, 2018