New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix shallow git clone in docker-build #33704
Conversation
Some more context (pasted from #33701)
See also: #12502 (comment), which is the reason that we need to check for smart-vs-dumb http to begin with. |
47969fa
to
3519cc5
Compare
@@ -70,7 +70,7 @@ func fetchArgs(remoteURL *url.URL, ref string) []string { | |||
shallow := true | |||
|
|||
if strings.HasPrefix(remoteURL.Scheme, "http") { | |||
res, err := http.Head(fmt.Sprintf("%s/info/refs?service=git-upload-pack", remoteURL)) | |||
res, err := http.Get(fmt.Sprintf("%s/info/refs?service=git-upload-pack", remoteURL)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should call res.Body.Close()
on successful request.
The GET
request to get all remote refs is usually quite slow but it shouldn't matter here much as we are not consuming the body. I wouldn't mind actually having a whitelist with github here to avoid this request for that host.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a common approach is to perform a HEAD
request, then fallback to GET
if not supported by the server, not sure if that works for this, but would avoid special-casing GitHub
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The HEAD
and fallback sounds reasonable, but it doesn't solve the special case with GitHub, as GitHub doesn't support HEAD
. In the interest of performance, we may just want to do no requests at all with GitHub.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should avoid special casing any particular provider whenever we can so that a) all providers can benefit from improvements (be they performance, robustness, protocol compatibility) and b) we avoid building in reliance on the behaviour's of particular services as they happen to be configured today and miss out on future improvements etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, added the initial HEAD
request.
a64f684
to
9ec9797
Compare
Sorry for the back and forth; discussing with @tonistiigi - we were wondering if there's still servers around that don't support this option, and what kind of error would be returned by such a server. Perhaps we can do a "happy path" and just assume it's not supported, and if it fails, fall back to non-shallow clone? Would you be interested in investigating that, and see if that would be a viable option? |
@ijc perhaps you know something about that? |
"that" is the prevalence of smart http servers which can support shallow cloning? I'm afraid that other than knowing that the smart server has been around for ages I don't really know how widespread it is. |
Basically considering to "just do a shallow clone" and assuming it is supported; if (e.g.) 99% of the servers out there support this, then the extra handling is resulting in extra overhead for 99% of the cases, and only used for 1%. I spent some time to look into what makes a "dumb" server, and how git/docker works with that. Apparently, git should fallback to the "dumb" protocol, but I guess this doesn't work if https://git-scm.com/book/en/v2/Git-on-the-Server-The-Protocols
This is how to test a "dumb" server; Build a dumb server image; $ docker build -t dumb-git -<<EOF
FROM nginx:alpine
WORKDIR /usr/share/nginx/html/
RUN apk add --no-cache git
RUN git clone --bare https://github.com/thaJeztah/pgadmin4-docker.git \
&& cd pgadmin4-docker.git \
&& git update-server-info
EOF Start the server $ docker run -d --name gitty -p 80:80 dumb-git Try to clone from this server; With $ git clone http://localhost/pgadmin4-docker.git --depth 1
Cloning into 'pgadmin4-docker'...
fatal: dumb http transport does not support shallow capabilities Without (dumb) $ git clone http://localhost/pgadmin4-docker.git
Cloning into 'pgadmin4-docker'... Next, I built a Docker CLI, but with the feature detection disabled (i.e., always $ docker build -t foo http://192.168.65.2/pgadmin4-docker.git
unable to prepare context: unable to 'git clone' to temporary context directory: error fetching: fatal: dumb http transport does not support shallow capabilities
: exit status 128 So, possibly we can detect the |
Unfortunately setting up a dumb server is super trivial (expose a directory via http(s) and try to remember to run Perhaps the smart one supports some sort of |
Well, I was thinking if the request itself could be used; it looks to fail directly if it's not supported, so if we catch the error, and fallback to cloning without |
My concern was that if the clone failed for some other reason we don't just assume it was due to the shallow clone, perhaps matching on |
@ijc I think the |
@ecnerwala I agree. I think the code is correct but could do with some additional comments explaining why it is looking at the things it is looking at, in particular the aspect that For
So the GH implementation (which rejects I think the patch could be improved by believing the checking content type on a |
c99053b
to
ac87593
Compare
Alright, I made the change to check for a |
ac87593
to
e79db1f
Compare
3654877
to
7853907
Compare
// Only smart-HTTP git servers support shallow clone | ||
|
||
// URL to test for the service | ||
serviceURL := fmt.Sprintf("%s/info/refs?service=git-upload-pack", remoteURL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: this can just be remoteURL + "/info/refs?service=git-upload-pack"
rather than needing fmt.Sprintf
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 if not needed, it's best to avoid Sprintf
You should probably use I don't know enough about these things to say if you should also accept other (or all) |
I originally wrote it to accept all |
7853907
to
598370d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left some suggestions, but let me know what you think
args = append(args, "--depth", "1") | ||
} | ||
|
||
return append(args, "origin", ref) | ||
} | ||
|
||
func supportsShallowCheckout(remoteURL string) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of putting inline comments for each step, a better approach would be to describe the global intent of this function (oh, should this be called supportsShallowClone
(clone instead of checkout)? Something like;
// supportsShallowClone checks if a remote git repository supports
// shallow cloning. Shallow clone is supported over non-HTTP connections
// and HTTP connections to smart servers.
args = append(args, "--depth", "1") | ||
} | ||
|
||
return append(args, "origin", ref) | ||
} | ||
|
||
func supportsShallowCheckout(remoteURL string) bool { | ||
// Check if the URL is an HTTP url | ||
if urlutil.IsURL(remoteURL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using an early return here prevents having to explain the HTTP <--> non HTTP twice (once here, and once before the return true
, e.g.;
if !urlutil.IsURL(remoteURL) {
// non-HTTP protocols always support shallow clones
return true
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left it in this form because I'm afraid we might eventually have special logic for other transport types. I think this really should look like a switch protocol
statement with a default, but that doesn't quite work here.
contentTypeResponse := "application/x-git-upload-pack-advertisement" | ||
|
||
// First try a HEAD request | ||
res, err := http.Head(serviceURL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you tried to reduce complexity here by keeping the HEAD
and GET
parts separate, however doing so probably made it harder to grasp;
- the
contentTypeResponse
variable is defined a lot further from where it's used - explaining things like "if x succeeds", "check content type" should not be needed; the code can be self-explanatory
If you combine the paths, a single explanation of what we're doing can be sufficient;
// try HEAD, fallback to GET if not supported
res, err := http.Head(serviceURL)
if err != nil {
res, err = http.Get(serviceURL)
if err != nil {
return false
}
res.Body.Close()
}
After that, you'd just have to check the response code (should be self-explanatory), and the content-type (may need a comment), something like;
if res.StatusCode != http.StatusOK {
return false
}
if res.Header.Get("Content-Type") != "application/x-git-upload-pack-advertisement" {
// not a smart server
return false
}
return true
I dropped the contentTypeResponse
variable, in favour of a comment explaining we detected it's not a "smart" server. The function description itself already explained that that means "shallow clone" is not supported
Although this would've worked for the return;
return res.Header.Get("Content-Type") == "application/x-git-upload-pack-advertisement"
(mainly "taste"); I think it's less easy to grasp here, and having all returns false
(early fail) makes it slightly more consistent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, definitely looks a bit cleaner. Thanks!
If the HEAD request fails, use a GET request to properly test if git server is smart-http. Signed-off-by: Andrew He <he.andrew.mail@gmail.com>
598370d
to
85afbbc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
ping @tonistiigi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks all for the review. The PR could be merged now as all Jenkins tests have passed as well. |
- What I did
Properly test if git servers are smart HTTP for cloning.
Fixes #33701, introduced by #12502
- How I did it
Change the Head request to a Get.
Requires update on https://github.com/docker/cli.