Fix shallow git clone in docker-build #33704

Merged: 1 commit merged into moby:master on Aug 1, 2017

7 participants
Contributor

ecnerwala commented Jun 15, 2017

- What I did
Properly test whether git servers support smart HTTP before shallow cloning.
Fixes #33701, which was introduced by #12502.

- How I did it
Change the HEAD request to a GET.

Requires a corresponding update in https://github.com/docker/cli.
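For illustration, a minimal Go sketch of the probe request after this change, assuming the remote is a plain http(s) repository URL. The helper name newSmartHTTPProbe is hypothetical; it is not the code in builder/remotecontext/git/gitutils.go.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// newSmartHTTPProbe is a hypothetical helper that builds the request used to
// ask a git HTTP server whether it speaks the "smart" protocol. It uses GET
// instead of HEAD, because some servers (e.g. GitHub) reject HEAD here.
func newSmartHTTPProbe(remote string) (*http.Request, error) {
	u, err := url.Parse(remote)
	if err != nil {
		return nil, err
	}
	// Append /info/refs to the repository path, avoiding a double slash.
	u.Path = strings.TrimSuffix(u.Path, "/") + "/info/refs"
	u.RawQuery = url.Values{"service": {"git-upload-pack"}}.Encode()
	return http.NewRequest(http.MethodGet, u.String(), nil)
}

func main() {
	req, err := newSmartHTTPProbe("https://github.com/moby/moby.git")
	if err != nil {
		panic(err)
	}
	// GET https://github.com/moby/moby.git/info/refs?service=git-upload-pack
	fmt.Println(req.Method, req.URL.String())
}
```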

Contributor

ecnerwala commented Jun 16, 2017

Some more context (pasted from #33701):

docker build github.com/moby/moby.git should only clone with depth 1. However, it's fetching the entire repository. This is because we only run git clone --depth 1 if we think the server is "smart", which we check with a HEAD request for /info/refs?service=git-upload-pack (code here). However, GitHub doesn't support HEAD requests on that endpoint (the Git documentation never mentions HEAD requests).

See also: #12502 (comment), which is the reason we need to distinguish smart from dumb HTTP servers in the first place.
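The gating described above can be sketched like this; cloneArgs is a hypothetical helper, and the real builder may pass other flags to git clone. With the HEAD-based probe, a server that rejects HEAD makes smart come out false, which is how the full clone happens.

```go
package main

import (
	"fmt"
	"strings"
)

// cloneArgs is a hypothetical helper returning the arguments for
// `git clone`. A shallow clone (--depth 1) is only requested when the
// probe says the server speaks smart HTTP, since the dumb HTTP transport
// cannot serve shallow clones.
func cloneArgs(remote, dest string, smart bool) []string {
	args := []string{"clone"}
	if smart {
		args = append(args, "--depth", "1")
	}
	return append(args, remote, dest)
}

func main() {
	fmt.Println("git " + strings.Join(cloneArgs("https://github.com/moby/moby.git", "/tmp/ctx", true), " "))
	fmt.Println("git " + strings.Join(cloneArgs("http://localhost/pgadmin4-docker.git", "/tmp/ctx", false), " "))
}
```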

Member

thaJeztah commented Jul 20, 2017

Sorry for the back and forth; while discussing this with @tonistiigi, we were wondering whether there are still servers around that don't support this option, and what kind of error such a server would return.

Perhaps we can take a "happy path": just assume it's supported, and if it fails, fall back to a non-shallow clone?

Would you be interested in investigating that, and seeing whether it would be a viable option?
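A minimal sketch of the proposed "happy path", assuming a hypothetical cloneFunc that runs git clone with or without --depth 1. Note that it falls back on any failure, not only on a missing shallow capability.

```go
package main

import (
	"errors"
	"fmt"
)

// errShallowUnsupported mimics the error git reports for a shallow clone
// from a dumb HTTP server (placeholder value for this sketch).
var errShallowUnsupported = errors.New("fatal: dumb http transport does not support shallow capabilities")

// cloneFunc is a hypothetical stand-in for running `git clone`,
// with --depth 1 when shallow is true.
type cloneFunc func(shallow bool) error

// cloneHappyPath assumes shallow clones are supported and tries one first.
// If that fails, for any reason, it retries with a full clone. It reports
// whether the resulting clone is shallow.
func cloneHappyPath(clone cloneFunc) (bool, error) {
	if err := clone(true); err == nil {
		return true, nil
	}
	if err := clone(false); err != nil {
		return false, err
	}
	return false, nil
}

func main() {
	// Simulated dumb server: shallow clones fail, full clones succeed.
	dumb := func(shallow bool) error {
		if shallow {
			return errShallowUnsupported
		}
		return nil
	}
	shallow, err := cloneHappyPath(dumb)
	fmt.Println(shallow, err) // false <nil>
}
```

For a smart server the first attempt succeeds, so the extra cost only applies when the shallow clone fails.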

Member

thaJeztah commented Jul 20, 2017

@ijc perhaps you know something about that?

thaJeztah removed this from backlog in maintainers-session Jul 20, 2017

Contributor

ijc commented Jul 21, 2017

By "that", do you mean the prevalence of smart HTTP servers, which can support shallow cloning?

I'm afraid that, other than knowing that the smart server has been around for ages, I don't really know how widespread it is.

Member

thaJeztah commented Jul 21, 2017

Basically, we're considering the option to "just do a shallow clone", assuming it is supported; if (e.g.) 99% of the servers out there support this, then the extra probe adds overhead in 99% of cases and is only useful for the remaining 1%.

I spent some time looking into what makes a "dumb" server, and how git/docker work with one. Apparently, git should fall back to the "dumb" protocol, but I guess this doesn't work if --depth 1 is set:

https://git-scm.com/book/en/v2/Git-on-the-Server-The-Protocols

If the server does not respond with a Git HTTP smart service, the Git client will try to fall back to the simpler “dumb” HTTP protocol. The Dumb protocol expects the bare Git repository to be served like normal files from the web server.

This is how to test against a "dumb" server:

Build a dumb server image:

$ docker build -t dumb-git -<<EOF
FROM nginx:alpine

WORKDIR /usr/share/nginx/html/
RUN apk add --no-cache git
RUN git clone --bare https://github.com/thaJeztah/pgadmin4-docker.git \
 && cd pgadmin4-docker.git \
 && git update-server-info
EOF

Start the server:

$ docker run -d --name gitty -p 80:80 dumb-git

Try to clone from this server.

With --depth 1:

$ git clone http://localhost/pgadmin4-docker.git --depth 1
Cloning into 'pgadmin4-docker'...
fatal: dumb http transport does not support shallow capabilities

Without --depth 1 (dumb protocol):

$ git clone http://localhost/pgadmin4-docker.git
Cloning into 'pgadmin4-docker'...

Next, I built a Docker CLI with the feature detection disabled (i.e., always using --depth 1):

$ docker build -t foo http://192.168.65.2/pgadmin4-docker.git
unable to prepare context: unable to 'git clone' to temporary context directory: error fetching: fatal: dumb http transport does not support shallow capabilities
: exit status 128

So, possibly we can detect the "fatal: dumb http transport does not support shallow capabilities" error and, in that case, fall back?
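Detecting that error could be as simple as a substring match on git's output, as sketched below. The helper isShallowUnsupported is hypothetical, and the sketch assumes git prints the message in English exactly as shown above.

```go
package main

import (
	"fmt"
	"strings"
)

// shallowUnsupportedMsg is the git error observed above when cloning
// with --depth 1 from a dumb HTTP server.
const shallowUnsupportedMsg = "dumb http transport does not support shallow capabilities"

// isShallowUnsupported reports whether git's output indicates that the
// failure was caused by requesting a shallow clone from a dumb HTTP server.
// The exact text may differ between git versions or locales.
func isShallowUnsupported(output string) bool {
	return strings.Contains(output, shallowUnsupportedMsg)
}

func main() {
	out := "error fetching: fatal: dumb http transport does not support shallow capabilities\n: exit status 128"
	fmt.Println(isShallowUnsupported(out))                           // true
	fmt.Println(isShallowUnsupported("fatal: repository not found")) // false
}
```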

Contributor

ijc commented Jul 21, 2017

Unfortunately, setting up a dumb server is super trivial (expose a directory via HTTP(S) and try to remember to run git update-server-info in it; even if you forget, many things still just work) compared with setting up a smart one (which involves CGI). So even though the smart server has been around for ages, I would expect there to still be a relatively significant number of dumb ones, especially internal/private (e.g. company) ones that we have no visibility into. IOW, I'm afraid I doubt it is as many as 99% of servers.

Perhaps the smart one supports some sort of ?query=foo type syntax, which the dumb one certainly doesn't, and which could be used as a probe?

Member

thaJeztah commented Jul 21, 2017

Well, I was wondering whether the clone request itself could be used: it seems to fail immediately if shallow cloning isn't supported, so we could catch the error and fall back to cloning without --depth=1. Or would that be really bad?

Contributor

ijc commented Jul 21, 2017

My concern was that, if the clone failed for some other reason, we shouldn't just assume it was due to the shallow clone. We could perhaps match on "fatal: dumb http transport does not support shallow capabilities", but that might be fragile due to differences between git versions, or due to i18n?
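One way to address this concern, sketched under assumptions: fall back only when the output matches the dumb-transport message, and return every other failure unchanged. runGit and cloneShallowOrFull are hypothetical names, and the string match keeps the version/i18n fragility mentioned above.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errExit128 stands in for the "exit status 128" error git returns.
var errExit128 = errors.New("exit status 128")

// dumbShallowMsg is the message git printed in the experiment above.
const dumbShallowMsg = "dumb http transport does not support shallow capabilities"

// runGit is a hypothetical stand-in for executing git with the given
// arguments and returning its combined output.
type runGit func(args ...string) (string, error)

// cloneShallowOrFull tries a shallow clone and falls back to a full clone
// only when git's output says the dumb HTTP transport rejected the shallow
// request. Any other failure (auth, network, missing repo) is returned.
func cloneShallowOrFull(run runGit, remote, dest string) error {
	out, err := run("clone", "--depth", "1", remote, dest)
	if err == nil {
		return nil
	}
	if !strings.Contains(out, dumbShallowMsg) {
		return fmt.Errorf("shallow clone failed: %v: %s", err, out)
	}
	_, err = run("clone", remote, dest)
	return err
}

func main() {
	// Simulated dumb server: only the shallow attempt fails.
	dumb := func(args ...string) (string, error) {
		if len(args) > 1 && args[1] == "--depth" {
			return "fatal: " + dumbShallowMsg, errExit128
		}
		return "", nil
	}
	fmt.Println(cloneShallowOrFull(dumb, "http://localhost/pgadmin4-docker.git", "/tmp/ctx")) // <nil>
}
```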

Contributor

ecnerwala commented Jul 21, 2017

@ijc I think the GET $GIT_URL/info/refs?service=git-upload-pack HTTP/1.0 is the standard probe to differentiate between smart and dumb servers (https://www.kernel.org/pub/software/scm/git/docs/technical/http-protocol.html).
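A minimal sketch of that GET probe, assuming a smart server answers with Content-Type application/x-git-upload-pack-advertisement while a dumb server serves info/refs as a static file. isSmartHTTP and newFakeGitServer are hypothetical, and the fake server's body is a placeholder rather than a real ref advertisement.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// smartContentType is the Content-Type a smart HTTP server returns for
// GET $GIT_URL/info/refs?service=git-upload-pack.
const smartContentType = "application/x-git-upload-pack-advertisement"

// isSmartHTTP issues the GET probe and reports whether the server answered
// like a smart HTTP server. A dumb server serves info/refs as a plain file,
// ignoring the query string, so its Content-Type differs.
func isSmartHTTP(client *http.Client, remote string) bool {
	resp, err := client.Get(remote + "/info/refs?service=git-upload-pack")
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK &&
		resp.Header.Get("Content-Type") == smartContentType
}

// newFakeGitServer is a hypothetical test server: with smart set it mimics
// a smart server's Content-Type, otherwise a static file server's.
func newFakeGitServer(smart bool) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if smart {
			w.Header().Set("Content-Type", smartContentType)
		} else {
			w.Header().Set("Content-Type", "text/plain; charset=utf-8")
		}
		fmt.Fprintln(w, "placeholder refs")
	}))
}

func main() {
	for _, smart := range []bool{true, false} {
		srv := newFakeGitServer(smart)
		fmt.Println(smart, isSmartHTTP(srv.Client(), srv.URL+"/repo.git"))
		srv.Close()
	}
}
```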

Contributor

ijc commented Jul 24, 2017

@ecnerwala I agree. I think the code is correct, but it could do with some additional comments explaining why it checks what it checks; in particular, that $GIT_URL/info/refs?service=git-upload-pack is specified for both dumb and smart servers, but with distinct Content-Types.

For GET vs HEAD, I note that the spec (RFC 7231, Section 4.3.2) says WRT HEAD:

   The HEAD method is identical to GET except that the server MUST NOT
   send a message body in the response (i.e., the response terminates at
   the end of the header section).  The server SHOULD send the same
   header fields in response to a HEAD request as it would have sent if
   the request had been a GET, except that the payload header fields
   (Section 3.3) MAY be omitted. 

So the GH implementation (which rejects HEAD with 405 Method Not Allowed) is, I suppose, compliant since it only violates a SHOULD, although it does go rather against the spirit of things.

I think the patch could be improved by trusting the Content-Type check on a 2xx response to the HEAD request, and only falling back to a GET check if the HEAD gets a non-2xx response. That is, if the HEAD results in a 200 OK with Content-Type != application/x-git-upload-pack-advertisement, there is no need to fall back to a GET: we know it is a dumb server. I don't think that's an absolute requirement, though; it just avoids a redundant check for a dumb server that does respond to the HEAD-based probe.
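The suggested HEAD-then-GET flow might look like the sketch below: a 2xx HEAD response is trusted as-is, and GET is only tried when HEAD returns something else, such as the 405 GitHub sends. The helper names are hypothetical, and the sketch follows the 2xx wording of this suggestion rather than the 200-only check adopted later in the thread.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

const smartContentType = "application/x-git-upload-pack-advertisement"

// probe sends the info/refs request with the given method and returns the
// status code and Content-Type of the response.
func probe(client *http.Client, method, remote string) (int, string, error) {
	req, err := http.NewRequest(method, remote+"/info/refs?service=git-upload-pack", nil)
	if err != nil {
		return 0, "", err
	}
	resp, err := client.Do(req)
	if err != nil {
		return 0, "", err
	}
	defer resp.Body.Close()
	return resp.StatusCode, resp.Header.Get("Content-Type"), nil
}

// isSmartHTTPHeadFirst trusts a 2xx HEAD response and only retries with
// GET when HEAD is not answered with 2xx (e.g. 405 Method Not Allowed).
func isSmartHTTPHeadFirst(client *http.Client, remote string) bool {
	code, ctype, err := probe(client, http.MethodHead, remote)
	if err == nil && code >= 200 && code < 300 {
		return ctype == smartContentType
	}
	code, ctype, err = probe(client, http.MethodGet, remote)
	return err == nil && code == http.StatusOK && ctype == smartContentType
}

// newHeadRejectingServer is a hypothetical smart server that answers HEAD
// with 405, like the GitHub behavior described above.
func newHeadRejectingServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method == http.MethodHead {
			w.WriteHeader(http.StatusMethodNotAllowed)
			return
		}
		w.Header().Set("Content-Type", smartContentType)
		fmt.Fprintln(w, "placeholder refs")
	}))
}

func main() {
	srv := newHeadRejectingServer()
	defer srv.Close()
	fmt.Println(isSmartHTTPHeadFirst(srv.Client(), srv.URL+"/repo.git")) // true
}
```

A dumb server that answers HEAD with 200 and a non-smart Content-Type is classified after a single request, which is the redundant check this suggestion avoids.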

Contributor

ecnerwala commented Jul 24, 2017

Alright, I made the change to check for a 200 OK response.

Contributor

ijc commented Jul 25, 2017

You should probably use http.StatusOK rather than the hard-coded 200, but otherwise LGTM (I left one minor nit as a comment).

I don't know enough about these things to say whether you should also accept other (or all) 2xx results; I think we can leave it as you have it unless some expert says otherwise.

Contributor

ecnerwala commented Jul 25, 2017

I originally wrote it to accept all 2xx results, but the git spec actually specifies that the client should validate that the response is 200 OK or 304 Not Modified (for cached entries), so I went with this instead. If anyone knows better, please let me know!
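A sketch of that status check using the net/http constants instead of literal codes; acceptableProbeStatus is a hypothetical name, and whether the merged patch also accepts 304 is not stated in this thread.

```go
package main

import (
	"fmt"
	"net/http"
)

// acceptableProbeStatus reports whether the status code of the info/refs
// response is one the client accepts: 200 OK, or 304 Not Modified for a
// cached response.
func acceptableProbeStatus(code int) bool {
	return code == http.StatusOK || code == http.StatusNotModified
}

func main() {
	for _, code := range []int{http.StatusOK, http.StatusNotModified, http.StatusNoContent, http.StatusMethodNotAllowed} {
		fmt.Println(code, acceptableProbeStatus(code))
	}
}
```

A 304 is normally only returned to conditional requests (for example, through a cache), so a plain probe will usually see 200.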

thaJeztah: left some suggestions, but let me know what you think

Three review threads (now outdated) on builder/remotecontext/git/gitutils.go

Fix shallow git clone in docker-build
If the HEAD request fails, use a GET request to properly test if git
server is smart-http.

Signed-off-by: Andrew He <he.andrew.mail@gmail.com>
thaJeztah: LGTM, thanks!

moby deleted a comment from GordonTheTurtle Aug 1, 2017

tonistiigi: LGTM

Member

yongtang commented Aug 1, 2017

Thanks all for the review. The PR can be merged now, as all Jenkins tests have passed as well.

yongtang merged commit f7d09a0 into moby:master Aug 1, 2017

6 checks passed

dco-signed: All commits are signed
experimental: Jenkins build Docker-PRs-experimental 35848 has succeeded
janky: Jenkins build Docker-PRs 44463 has succeeded
powerpc: Jenkins build Docker-PRs-powerpc 4844 has succeeded
windowsRS1: Jenkins build Docker-PRs-WoW-RS1 15844 has succeeded
z: Jenkins build Docker-PRs-s390x 4542 has succeeded

ecnerwala deleted the ecnerwala:33701-shallow-clone branch Aug 1, 2017
