Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JENKINS-35687] Add simple git lfs support #232

Merged
merged 5 commits into from Feb 27, 2017

Conversation

@creste
Copy link
Contributor

commented Dec 29, 2016

This is a refactoring of @matthauck's changes from #206. I could not get his changes to work because git lfs fetch was failing with this error: can't resolve ref: "HEAD" (See jenkinsci/git-plugin#414 (comment)). After reading git-lfs/git-lfs#1307, I realized that git lfs fetch requires the code to be checked out before it can know which large files to download. The original code was attempting to run git lfs fetch after git fetch but before git checkout, which wasn't working because there wasn't a local HEAD yet. This means the changes proposed by @estyrke in jenkinsci/git-plugin#414 (comment) also won't work. The fix was to remove the call to git lfs fetch and add a call to git lfs pull after the code was checked out.

Next, I ran into problems with doing git lfs pull after the git checkout when credentials are required by the remote repository. The checkout command doesn't accept a set of credentials so there wasn't an obvious way to know which credentials should be used for the git lfs pull. I decided to retrieve the credentials based on the remote repository URL. That also was problematic because if the local repository has multiple remotes then there isn't a good way to know which remote should be used for git lfs pull. This is a known issue with git lfs and is documented at git-lfs/git-lfs#436. If no remote is provided to git lfs pull then git-lfs has some fallback logic that eventually defaults to origin. This fallback logic is different from the fallback logic in CLIGitAPIImpl::getDefaultRemote(). Since I already needed the remote URL to know which credentials to use for git lfs pull, I decided to use getDefaultRemote() to know which remote to pull from. In most cases it will pull from origin, which is very similar to the default behavior of git lfs pull. This could cause problems in the future. If it does, then we can add an additional parameter for users to specify the name of the remote to use with git lfs pull.

I tested these changes locally. I tried to add unit tests, but couldn't find any existing tests to act as a guideline. If unit tests are required then guidance would be greatly appreciated.

@MarkEWaite, I noticed this comment in #206:

Wouldn't it be better to check that LFS is available in the git version being used, rather than just throw an exception when it fails?

I agree with @matthauck that an explicit setting is needed. There are use cases where a repository has files stored using git-lfs but the user doesn't want to pull those files during the initial checkout. Making the git lfs pull an option gives users flexibility to decide when/if they want to pull the files.

A separate PR for jenkinsci/git-plugin is forthcoming.

@MarkEWaite

This comment has been minimized.

Copy link
Contributor

commented Dec 30, 2016

I'd really like an automated test for this. I think you could follow the pattern of the submodule tests. They rely on the existence of separate branches in this repository, tests/getSubmodules and tests/notSubmodules. Those repositories provide the "source data" that can be tested in the tests on the main branch.

You can refer to GitClientTest for an example that I recently added which uses the tests/getSubmodules branch as source data for a test.

@creste creste force-pushed the creste:JENKINS-35687 branch from e432224 to 35e3747 Jan 3, 2017

@creste

This comment has been minimized.

Copy link
Contributor Author

commented Jan 3, 2017

Thank you for the guidance. I created a test branch called tests/largeFileSupport but I cannot push files stored using git LFS to GitHub because GitHub wants to charge me $5/month to store them. See https://help.github.com/articles/purchasing-additional-storage-and-bandwidth-for-a-personal-account/ for more information.

Is there a way for the test branch to be created in this repository so that I don't have to sign up for GitHub's subscription service? I don't typically use GitHub except when necessary to contribute to projects like this.

I was planning to create a single .txt file in the repository, similar to this:

echo "5e7733d8acc94636850cb466aec524e4" > uuid.txt
git add uuid.txt
git commit -m "Added uuid.txt"

Then I was going to create a unit test to clone the repository with git LFS enabled and read the text file to make sure it contained the UUID instead of an LFS file pointer.

@creste creste force-pushed the creste:JENKINS-35687 branch from 35e3747 to 43e15c2 Jan 3, 2017

@creste

This comment has been minimized.

Copy link
Contributor Author

commented Jan 3, 2017

I added a unit test. The unit test will fail until the tests/LargeFileSupport branch is created in this repository. I verified that the unit test passes when used with my personal git server (not GitHub) that has the tests/LargeFileSupport branch.

The unit test also requires git-lfs to be installed on the system that runs the unit test.

I had to add support for specifying the name of the remote to pull from because the test repository created in GitClientTest always has two remotes: origin and upstream. origin is the local path to the plugin repository, which does not contain the files stored with LFS. The upstream repository points to this GitHub project which should have the LFS files in the tests/LargeFileSupport branch. The CheckoutCommand was defaulting to origin instead of upstream, so I had to provide a way for the unit test to specify which remote to use.

@creste creste force-pushed the creste:JENKINS-35687 branch from 43e15c2 to 87a2501 Jan 3, 2017

@MarkEWaite

This comment has been minimized.

Copy link
Contributor

commented Jan 3, 2017

If you'll provide me the instructions for creating the branch (and uploading the LFS binary), I'm happy to do that upload and cover the GitHub added cost (if they charge me for it). I use GitHub all the time, and was likely to need to pay them for LFS soon anyway.

I'll need to automate the deployment of LFS to the machines in my Docker based cluster. I've already automated the deployment of cmake, JDK 6, JDK 7, JDK 8, Ant, and Maven, so I don't expect that automated LFS deployment will be too painful.

I see in the JGit mailing list that JGit now supports LFS on the client. I'm not yet sure what changes will be required in the Jenkins plugins to support LFS from JGit, but it sounds promising.

@creste

This comment has been minimized.

Copy link
Contributor Author

commented Jan 3, 2017

Sure, here are the instructions for creating the test branch:

  1. Install https://git-lfs.github.com/. Make sure you run git lfs install after installing it. You can verify everything is configured correctly by checking git config filter.lfs.smudge to make sure it says something like git-lfs smudge -- %f.
  2. Clone my fork of this repository:
git clone https://github.com/creste/git-client-plugin.git git-client-plugin
  1. Change to the cloned repository:
cd git-client-plugin
  1. Check out the tests/largeFileSupport branch that I created in my fork:
git checkout tests/largeFileSupport
  1. Commit uuid.txt:
echo "5e7733d8acc94636850cb466aec524e4" > uuid.txt
git commit -m "Added uuid.txt" uuid.txt
  1. Add this repository as a remote:
git remote add upstream https://github.com/jenkinsci/git-client-plugin.git
  1. Push the branch to this repository:
git push upstream tests/largeFileSupport

If at this point you get an error from GitHub about not enough space to store the file then they want you to pay $5/month to store files using LFS.

Regarding JGit - you're right! Looks like JGit 4.6 supports LFS without having git-lfs installed.

@MarkEWaite

This comment has been minimized.

Copy link
Contributor

commented Jan 3, 2017

I wasn't thinking clearly enough about the impact of the repository being hosted in the jenkinsci organization. I have purchased a data pack ($5 per month, low price considering how much I use GitHub repositories for Jenkins plugin automated tests), but it appears that purchase applies to my personal account not to the jenkinsci organization.

I'll need to discuss with @rtyler to see what options are available to enable git LFS for the jenkinsci organization.

I attempted to push to my personal repository, and it also continues to report that I need to purchase a data pack. I've purchased the data pack, so I'm hopeful that within an hour or two that purchase will allow me to upload to my fork.

@creste

This comment has been minimized.

Copy link
Contributor Author

commented Jan 3, 2017

I did some more research and discovered these git-lfs bugs:

Here is the key quote:

On GitHub.com, you can't push LFS assets to a public fork unless the original repo already has LFS objects, or you have push access to the original repo. This is a server side rule to prevent abuse. Once the public repository has LFS objects, anyone should be able to push LFS objects to their forks. If you see an issue here, let me know the repository fork and the owner trying to push so we can look into it.

My repository is a fork of a fork of this repository, which is public. However, I'm not seeing the "can not upload new objects to public fork" error message. But maybe the above issues are related in some way. To test that theory, I created a repository at creste/git-lfs-test and was able to push an LFS file to it. Therefore, I think we are both affected by the above issues and the fix is to somehow enable LFS for this repo so that our public forks can work with LFS.

@MarkEWaite

This comment has been minimized.

Copy link
Contributor

commented Jan 3, 2017

Thanks! I've sent a message to GitHub support asking for their help to explain why I can't push to my repository. I think your research shows why I can't push, and that we will need their help before a push can be successful.

@MarkEWaite

This comment has been minimized.

Copy link
Contributor

commented Jan 4, 2017

GitHub support detected some errors in the metrics they were gathering. They've corrected those errors and I was able to push the tests/largeFileSupport branch to my personal repo and to this repo. Can you try your tests now so that we can see if that is enough?

I installed git lfs on my machine before the push, and just did a push. Is there anything else needed to complete my part of the work?

@creste

This comment has been minimized.

Copy link
Contributor Author

commented Jan 4, 2017

I modified my local copy of the code to use the tests/largeFileSupport branch in this repository and the tests passed. I also manually verified that the branch looks good. Thank you :)

The only thing left to do is to install git-lfs on the machine that Jenkins uses to run unit tests. The GitClientTest class automatically skips the git-lfs test if git-lfs is not installed. The log output during the last build shows that 14 tests in the GitClientTest class were skipped:

Tests run: 132, Failures: 0, Errors: 0, Skipped: 14, Time elapsed: 87.02 sec - in org.jenkinsci.plugins.gitclient.GitClientTest

That number of skipped tests should be 13 if it is running the git-lfs unit test that I added.

@matthauck

This comment has been minimized.

Copy link
Contributor

commented Jan 6, 2017

Thanks for jumping in with this btw., @creste! Haven't been available to get to this recently -- really appreciate you marching this change forward!

@MarkEWaite
Copy link
Contributor

left a comment

Thanks for submitting the pull request.

I will need to update my test infrastructure (docker images, physical machines, etc.) with Git LFS, and will need to perform some interactive tests.

I was very glad to see that there is already one user that has taken the proposed changes and is using them.

@@ -49,4 +49,12 @@
* @return a {@link org.jenkinsci.plugins.gitclient.CheckoutCommand} object.
*/
CheckoutCommand timeout(Integer timeout);

This comment has been minimized.

Copy link
@MarkEWaite

MarkEWaite Jan 6, 2017

Contributor

Could you remove the trailing white space on this line?

The formatting in the project is already somewhat messy, and I'd much rather not add another line of surprising formatting


/**
* Call "git lfs pull" for the given remote after checkout.
*

This comment has been minimized.

Copy link
@MarkEWaite

MarkEWaite Jan 6, 2017

Contributor

Could you remove the trailing white space from this line as well?

@@ -1987,6 +1988,11 @@ public CheckoutCommand timeout(Integer timeout) {
this.timeout = timeout;
return this;
}

This comment has been minimized.

Copy link
@MarkEWaite

MarkEWaite Jan 6, 2017

Contributor

Please remove trailing white space from this line as well

@@ -2040,7 +2054,22 @@ public void execute() throws GitException, InterruptedException {
args.add("-f");
}
args.add(ref);
launchCommandIn(args, workspace, environment, timeout);
launchCommandIn(args, workspace, checkoutEnv, timeout);

This comment has been minimized.

Copy link
@MarkEWaite

MarkEWaite Jan 6, 2017

Contributor

Please remove the trailing white space from this line as well

@@ -107,6 +108,16 @@ public GitClientTest(final String gitImplName) throws IOException, InterruptedEx
CLI_GIT_SUPPORTS_SUBMODULES = cliGitClient.isAtLeastVersion(1, 8, 0, 0);
CLI_GIT_SUPPORTS_SUBMODULE_DEINIT = cliGitClient.isAtLeastVersion(1, 9, 0, 0);
CLI_GIT_SUPPORTS_SUBMODULE_RENAME = cliGitClient.isAtLeastVersion(1, 9, 0, 0);

This comment has been minimized.

Copy link
@MarkEWaite

MarkEWaite Jan 6, 2017

Contributor

Remove trailing white space from this line as well

@@ -553,6 +564,18 @@ public void testCheckoutBranch() throws Exception {
gitClient.checkoutBranch(branch, remote + "/" + branch);
assertTrue(src.isDirectory());
}

This comment has been minimized.

Copy link
@MarkEWaite

MarkEWaite Jan 6, 2017

Contributor

Remove trailing white space from this line as well

@MarkEWaite

This comment has been minimized.

Copy link
Contributor

commented Jan 6, 2017

I've updated my lts and lts-with-plugins docker images to include git lfs, and have added a job which checks for existence of lfs on machines running at least git 1.8.2. I've also deployed git lfs into most of my cluster of machines at home.

Once I've done the same on another set of computers, I'll be ready to test this. Likely won't be until next week, but I'm making progress...

@creste

This comment has been minimized.

Copy link
Contributor Author

commented Jan 6, 2017

Oops, sorry about the whitespace problems. I removed all the instances you pointed out and will do better in the future.

I'm glad to hear that adding git-lfs to the test containers is going well. There is at least one other git-lfs improvement I have in mind (for a different PR that doesn't exist yet) that would need a unit test.

@baffles

This comment has been minimized.

Copy link
Member

commented Jan 11, 2017

FWIW, I've built the modified plugin and it seems to be working fine for me (we needed LFS support within Jenkins).

@MarkEWaite

This comment has been minimized.

Copy link
Contributor

commented Jan 27, 2017

As implemented, if a user has git lfs installed and the repository includes a large file reference and lfsRemote is not defined, then the checkout will fail. Prior to this implementation, the checkout would have succeeded and would have ignored the large file content. I've made a change in 37a00fb which allows the checkout to succeed even if lfsRemote is not set for the job.

@creste does 37a00fb have problems that I should not include it?

@creste

This comment has been minimized.

Copy link
Contributor Author

commented Jan 27, 2017

I am not sure if there is a problem or not.

I think some of the current users of the git plugin (without LFS support) are manually setting up git lfs to work correctly with their LFS repos. For example, I think one way to do that is:

  1. Install git lfs
  2. If credentials are required, then set up an SSH agent or configure git to use a private SSH key stored locally
  3. Rely on the smudge filter run during git checkout to pull files one at a time instead of relying on git lfs pull.

I'm basing the above steps on comments made by other users in the following posts:

This is from comments in commit 37a00fb9275a4aeb49bde4450fb0a6e54a470851:

If smudge filter is not disabled with git LFS 1.5.4 and there is an LFS reference in the repository, then the git checkout will fail.

I think that is expected for current users of the git plugin (without LFS support). If they haven't manually configured Git LFS to work on their Jenkins nodes then the checkout will fail. But if they have manually configured Git LFS with the correct credentials then checkout will succeed because the smudge filter will work correctly. When those same users upgrade to the new version of the git plugin (with LFS support) they might forget to explicitly enable LFS support in Jenkins and not realize that the LFS smudge filter is now being skipped for all checkouts. That could cause their existing jobs to fail because git LFS is no longer being called in the smudge filter like it was before.

I'm not familiar enough with Jenkins and it's development process to know how to handle the potential for breaking existing plugin users in the manner described above. Perhaps it is sufficient to tell them to explicitly enable Git LFS in Jenkins and make sure the correct credentials are configured in Jenkins.

Other than that, I don't see any potential problems with 37a00fb.

@MarkEWaite

This comment has been minimized.

Copy link
Contributor

commented Jan 27, 2017

Thanks @creste for the research and the description. Those are exactly the types of compatibility problems that worried me. My changes are exactly the wrong thing to do for compatibility.

I'll construct some additional integration test case jobs which show those cases, and will likely include one or more of them in the unit tests for the plugin.

Further evaluation yet to come...

@MarkEWaite MarkEWaite merged commit 470f125 into jenkinsci:master Feb 27, 2017

1 of 2 checks passed

continuous-integration/jenkins/pr-merge This commit cannot be built
Details
Jenkins This pull request looks good
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.