-
Notifications
You must be signed in to change notification settings - Fork 75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Git package install optimizations #38
Conversation
There's an explicit git checkout command issued moments later, so there's no need to create a working copy during the clone.
If a SubDir is configured for the package, everything but that directory will be thrown away after the package is installed.
If the server supports it, fetch a specific revision with --depth 1. Otherwise, fall back to the normal fetch. This replaces the previous "clone" operation. The bandwidth and time savings can be significant depending on the history of the repository (number of commits).
@davidovich please have a look when you have a chance :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks very good to me so far! Some minor things to adress.
👍
cmd.Stdout = os.Stdout | ||
cmd.Stderr = os.Stderr | ||
cmd.Dir = dir | ||
err = cmd.Run() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This err
should be checked that it is actually a failed fetch and not something else.
archiveFilepath := fmt.Sprintf("%s.tar.gz", dir) | ||
|
||
defer os.Remove(archiveFilepath) | ||
commitSha, err := downloadGitHubArchive(archiveFilepath, archiveUrl) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For future providers (gitlab which has a similar archive functionality) I would fold the archive URL computation inside a downloadArchive
function. This one could return a struct containing sha or version, depending on the capabilities of the service. (I believe gitlab does not support sha1 Etag).
return commitSha, nil | ||
} | ||
|
||
cmd := exec.CommandContext(ctx, "git", "init") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use the git clone --depth 1 --branch [version]
here instead of the git init
dance ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could not find a way to combine shallow fetch and sparse checkout into a single git clone
command, hence this init
+ fetch
+ checkout
combo. Do you have an idea?
These are awesome improvements!!! 🎉 |
Sorry for missing this. I'm happy to review this once the above comments are addressed and rebased. Very awesome work, thank you! 🙂 |
This looks a great improvement! 🎉 @bgagnon Did you had time to address the comments and rebase? |
# Conflicts: # pkg/git.go
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
merged with master and addressed some review comments
this will allow the optimization to work for both ssh and https
Hmm something went wrong in the merge; the new |
Alright, better now. I notice the unit tests do not do any assertions on the contents of the installed package inside the |
Agreed. As for now, this is in good shape I think. Let's move ahead with it! Thanks a lot for taking the time to do these very awesome improvements! 🎉 |
jb is sooooo fast now. Thanks so much! 🚀 |
This introduces three distinct optimizations for install operations on Git packages (the only type supported in jb at this point):
GitHub archive download
The most efficient case. If the repo is from
https://github.com
, download a tarball at the requested revision. The ETag header contains the resolved commit SHA1. When unpacking the tarball, we can skip anything that isn't under the SubDir path since it won't be needed.This does not require authentication with GitHub. At least, not for public repositories. Perhaps this needs some adjustment to avoid breaking private repo compatibility.
Shallow clones
If the server supports it, perform a shallow clone (actually a
fetch
) of--depth 1
so only the objects for the requested revision are downloaded. Not all servers support this, and some do it only if the request is reasonable (ie. it won't cause too much load on its side). For this reason, recover from the error and do a full fetch (all revisions). This has been tested to work with GitLab and GitHub (in some cases, but the archive download fast track covers those).This change replaces the previous
git clone
step with distinctgit init
,git remote add
andgit fetch
steps. This has the side effect of not creating an extra working copy before thegit checkout
step (ie. equivalent togit clone -n
).Sparse checkouts
For cases where a sub directory is specified, there is no need to create a full working copy; the subdir path can act as a filter. Remember that there is no need to end up with a functional git repository since it will all be deleted by the time the package installer is done.
Benchmarks
On kube-prometheus:
before:
after:
YMMV.
Will happily accept any criticism on my Go style as I am still learning this language and its standard library. For a future improvement, I would love to make the package installs concurrent (or at least, the IO-bound operations like downloads).