cmd/go: allow extraction of urls used to download dependencies #35922

Open
williamh opened this issue Dec 1, 2019 · 15 comments

@williamh williamh commented Dec 1, 2019

Hello,

I am the go package maintainer on Gentoo Linux, and I maintain several packages written in Go as well.

Our package manager does not allow network access during the build process after downloading the source for a package, so it needs to be able to download the .zip files for the modules a package needs in advance.

I believe I can download the .zip files to a path, which I will call DISTDIR, then during the build, set GOPROXY="file://${DISTDIR}" and avoid network access.

To do that, I need a way to extract all of the URLs for the .zip files for the dependencies of a package so I can put them in a list for the package manager to download.

Is there a way to do this?

Thanks much,

William

I am going to tag @robbat2 on this report also to include him since he was part of my discussion on our IRC channel.

Member

@mvdan mvdan commented Dec 1, 2019

A starting point could be go mod download -json, though note that it first downloads the modules, and also that it shows the location of the zip on the local cache once downloaded.

A better approach might be go list -m -json all to get information about all the modules involved in the current module, and constructing the URLs to download the go.mod files or zip source archives from https://proxy.golang.org/. You can use go help goproxy to see what the REST interface looks like.

I'm sure there could be better ways to handle this, though. For example, if you just want to build a subset of the module, you probably don't need to download all of the modules required directly or indirectly by the main module.

Author

@williamh williamh commented Dec 2, 2019

@mvdan My thought is to create a cache, e.g. file://${DISTDIR}/go-cache which could be pointed to by GOPROXY so that when the package manager attempts to build the main module it will not need to download from the network. Is this the best way to handle this? Also, how are the paths in the cache created?

@robbat2 robbat2 commented Dec 2, 2019

@mvdan neither of those commands (go mod download -json, go list -m -json all) print the locations of the upstream URLs for zipfiles.

Given a go.mod and go.sum, produce a listing of the URLs and the stable filenames to map them to.
Using _ as a sample replacement for / here. Not set on that character yet.

$PROXY/k8s.io/minikube/@v/v1.5.2.info => goproxy-k8s.io_minikube_@v_v1.5.2.info
$PROXY/k8s.io/minikube/@v/v1.5.2.mod => goproxy-k8s.io_minikube_@v_v1.5.2.mod
$PROXY/k8s.io/minikube/@v/v1.5.2.zip  => goproxy-k8s.io_minikube_@v_v1.5.2.zip

The package manager tracks those RHS filenames, and repopulates the expected directory structure for the GOPROXY=file:///... to use.
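A minimal sketch of that flattening, assuming the goproxy- prefix and the _ separator from the sample lines above (neither is settled). The function name distfileName is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// distfileName flattens a proxy-relative path such as
// "k8s.io/minikube/@v/v1.5.2.zip" into a single file name.
func distfileName(proxyPath string) string {
	return "goproxy-" + strings.ReplaceAll(proxyPath, "/", "_")
}

func main() {
	for _, ext := range []string{"info", "mod", "zip"} {
		p := "k8s.io/minikube/@v/v1.5.2." + ext
		fmt.Printf("$PROXY/%s => %s\n", p, distfileName(p))
	}
}
```

Because module paths may themselves contain underscores, this mapping is not reversible by parsing the file name alone. The package manager would have to keep the recorded pairs in order to rebuild the directory structure.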

FYI The package manager also captures & verifies checksums on the URLs.

I can see how to produce the trivial, well-versioned case from the goproxy REST API, but it's the corner cases that I don't follow.

E.g, this line from minikube-1.5.2 go.mod:

github.com/olekukonko/tablewriter v0.0.0-20160923125401-bdcc175572fd

That version doesn't appear in the list endpoint.

@dmitshur dmitshur changed the title allow extraction of urls used to download dependencies cmd/go: allow extraction of urls used to download dependencies Dec 2, 2019
@dmitshur dmitshur added this to the Backlog milestone Dec 2, 2019

@robbat2 robbat2 commented Dec 2, 2019

Are there any specific ASCII characters that are NOT permitted to occur in module strings or version strings?

Member

@mvdan mvdan commented Dec 2, 2019

neither of those commands (go mod download -json, go list -m -json all) print the locations of the upstream URLs for zipfiles.

Yes, I realise that. Please read the rest of my comment above. I meant these as examples to point you in the right direction, not as your perfect solution.

Contributor

@hyangah hyangah commented Dec 2, 2019

@robbat2 Is it not possible to use the module cache in $GOPATH/pkg/mod/cache/download after running go list -m -json all? The directory structure mirrors the requests sent to the proxy, except for the .zip files. The zip files needed for the actual build have the same base name and path, but with the .zip extension.

(I wonder if there is any magic flag in list or build that downloads required .zip files as well but skips actual builds)

The details of the proxy protocol, including the encoding, are at https://golang.org/cmd/go/#hdr-Module_proxy_protocol. The currently accepted characters and the encoding rules are described at https://godoc.org/golang.org/x/mod/module#hdr-Unicode_Restrictions

Contributor

@jayconrod jayconrod commented Dec 2, 2019

Just to confirm what @mvdan and @hyangah have said:

Running go mod download without arguments within a module will download all the files a module needs to build. After that, it should be possible to build only from the module cache by setting GOPROXY=off.

You can control the location of the module cache by setting GOPATH: it will be in $GOPATH/pkg/mod. Downloaded files are in $GOPATH/pkg/mod/cache/download. It's possible to use the module cache as a proxy by setting GOPROXY=file://$GOPATH/pkg/mod/cache/download.
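For illustration, here is one way to derive that GOPROXY value from a GOPATH. This is only a sketch: it assumes a Unix-style absolute path, and fileProxy is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"net/url"
	"path/filepath"
)

// fileProxy returns a file:// URL for the download cache under gopath.
func fileProxy(gopath string) string {
	dir := filepath.Join(gopath, "pkg", "mod", "cache", "download")
	u := url.URL{Scheme: "file", Path: filepath.ToSlash(dir)}
	return u.String()
}

func main() {
	// The path below is only an example location.
	fmt.Println("GOPROXY=" + fileProxy("/var/tmp/gopath"))
}
```

Going through net/url rather than plain string concatenation means characters such as spaces in the directory name are percent-encoded.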


@williamh @robbat2 One thing I was a little unclear on: is there a restriction against using go mod download to populate the module cache? It sounds like you want to create the cache only using package manager infrastructure without running the go command.

To make a list of URLs for that, you could run go mod download manually once in an empty cache, then convert the file names to URLs. You only need .info, .mod, and .zip files. Something like this might work?

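cd go/src/golang.org/x/tools/gopls   # or any other module
export GOPATH="$(mktemp -d)"         # start from an empty module cache
go mod download
# Keep only the .info, .mod, and .zip files, then rewrite the cache prefix into a proxy URL.
find "$GOPATH/pkg/mod/cache/download" -type f | \
    grep '\.\(mod\|info\|zip\)$' | \
    sed -e "s,$GOPATH/pkg/mod/cache/download,https://proxy.golang.org,"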

(https://proxy.golang.org/ can also be replaced with any other server that implements the proxy protocol).

Contributor

@jayconrod jayconrod commented Dec 2, 2019

I can see how to produce the trivial, well-versioned case from the goproxy REST API, but it's the corner cases that I don't follow.

E.g, this line from minikube-1.5.2 go.mod:

github.com/olekukonko/tablewriter v0.0.0-20160923125401-bdcc175572fd
That version doesn't appear in the list endpoint.

You shouldn't need to cache the list or latest endpoints. Those are needed to find new versions of modules, but if go.mod is not missing any requirements (i.e., go mod tidy does not change it), then building packages within the module will not cause the go command to hit those endpoints.

Contributor

@jayconrod jayconrod commented Dec 2, 2019

Are there any specific ASCII characters that are NOT permitted to occur in module strings or version strings?

golang.org/x/mod/module.CheckPath documents the restrictions on module paths.

Additionally, in the proxy protocol and within the module cache, module paths are case encoded so that the cache can be stored on a case-insensitive file system without conflict. go help goproxy explains that.
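To make the case encoding concrete, here is a small stand-alone sketch of the rule as go help goproxy describes it: each uppercase letter is replaced by an exclamation mark followed by the lowercase letter. The function names are illustrative. Real tooling would more likely call the escaping helpers in golang.org/x/mod/module than reimplement them.

```go
package main

import (
	"fmt"
	"strings"
)

// escapePath replaces every uppercase ASCII letter with '!' plus its
// lowercase form, so "Azure" becomes "!azure".
func escapePath(p string) string {
	var b strings.Builder
	for _, r := range p {
		if 'A' <= r && r <= 'Z' {
			b.WriteByte('!')
			r += 'a' - 'A'
		}
		b.WriteRune(r)
	}
	return b.String()
}

// unescapePath reverses escapePath. It reports false for input that
// escapePath could not have produced.
func unescapePath(s string) (string, bool) {
	var b strings.Builder
	bang := false
	for _, r := range s {
		switch {
		case bang:
			if r < 'a' || 'z' < r {
				return "", false // '!' must be followed by a lowercase letter
			}
			b.WriteRune(r - ('a' - 'A'))
			bang = false
		case r == '!':
			bang = true
		case 'A' <= r && r <= 'Z':
			return "", false // escaped paths never contain uppercase letters
		default:
			b.WriteRune(r)
		}
	}
	if bang {
		return "", false // trailing '!'
	}
	return b.String(), true
}

func main() {
	esc := escapePath("github.com/Azure/azure-sdk-for-go")
	orig, ok := unescapePath(esc)
	fmt.Println(esc, orig, ok)
}
```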

Sorry the documentation is not in great shape right now. I'm working on a module reference specification that will include all this for Go 1.14 (#33637).

Contributor

@jayconrod jayconrod commented Dec 2, 2019

(I wonder if there is any magic flag in list or build that downloads required .zip files as well but skips actual builds)

go list all and go build -n all will do something very similar to go mod download. But note that they will apply build constraints (not following imports in excluded source files), and they won't catch test imports unless you ask for them specifically.

@robbat2 robbat2 commented Dec 3, 2019

Just to confirm what @mvdan and @hyangah have said:

Running go mod download without arguments within a module will download all the files a module needs to build. After that, it should be possible to build only from the module cache by setting GOPROXY=off.

Yes, I understood that much already, please see further below.

The package manager tooling will re-create the layout of the cache for all modules explicitly declared to the package manager (the declarations are generated by the package maintainer based on go.mod).

@williamh @robbat2 One thing I was a little unclear on: is there a restriction against using go mod download to populate the module cache? It sounds like you want to create the cache only using package manager infrastructure without running the go command.

Correct, the cache would be pre-populated by the package manager, in the correct layout.

To make a list of URLs for that, you could run go mod download manually once in an empty cache, then convert the file names to URLs. You only need .info, .mod, and .zip files. Something like this might work?

(omit example)

Yes, that example works, but it still requires network connectivity. My request was for a trivial modification of go mod download that emits the (absolute or relative) URLs without actually doing the download at that phase.

Package maintainer steps:

  1. (human) Gets the new Go package from upstream that they want to package, and verifies the initial download if possible & meaningful (HTTPS, GPG etc)
  2. (tooling) Run maintainer-specific tooling get-ego-vender (or successor) that converts go.mod to package manager directives (URLs etc)
  3. (human) Maintainer makes further edits to the directives for gentoo-specific things (init scripts, documentation, config files).
  4. (tooling) maintainer-specific tooling captures & stores traditional checksums for all files (Gentoo Manifest files)
  5. (human/tooling) maintainer commits ebuild & Manifest.

User steps

# emerge somegopackage
.. input files are somegopackage-version.ebuild, Manifest
.. package manager fetches the declared URLs to local package manager cache
.. package manager starts network sandbox/container
.. package manager arranges the files from its cache into the expected goproxy cache layout (symlinks or hardlinks to the real locations)
.. package manager calls build process

(https://proxy.golang.org/ can also be replaced with any other server that implements the proxy protocol).

Related question here. I was reviewing the h1: hash mechanism, and it seems that it would be stable for the content & relative paths of files, but it would not capture any file metadata (mtime, permissions, ownership). As such the h1: hash should be stable between any server that implements the proxy protocol, but it's not clear if conventional checksums will be identical (this matters to the package manager).

@robbat2 robbat2 commented Dec 3, 2019

Are there any specific ASCII characters that are NOT permitted to occur in module strings or version strings?

Thanks.

golang.org/x/mod/module.CheckPath documents the restrictions on module paths.

Thanks, as a tidbit there: it tries to describe part of the rules:
the leading path element (up to the first slash, if any), by convention a domain name,
But then it has an incomplete test described for domain names: specifically . and - should not appear adjacent in any domain name, or at the start & end. One . also cannot be adjacent to another ..

Additionally, in the proxy protocol and within the module cache, module paths are case encoded so that the cache can be stored on a case-insensitive file system without conflict. go help goproxy explains that.

Yes, I caught that part already.

Contributor

@jayconrod jayconrod commented Dec 3, 2019

Yes, that example works, but it still requires network connectivity. My request was for a trivial modification of go mod download that emits the (absolute or relative) URLs without actually doing the download at that phase.

We can't provide a general solution for this. If there are multiple sources in the GOPROXY list, the go command will attempt to download from each one, falling back to later sources if an earlier source returns a "not found" error (either 404 or 410). If one of the sources is direct, there's another process for locating the origin repository, cloning all or part of it, and extracting a zip file from the repository. That can't really be represented with a URL field in the JSON output.

Also, go mod download won't go out to the network at all for modules that are already in the cache. So we couldn't report anything for cached modules unless we also saved where they came from.

Related question here. I was reviewing the h1: hash mechanism, and it seems that it would be stable for the content & relative paths of files, but it would not capture any file metadata (mtime, permissions, ownership). As such the h1: hash should be stable between any server that implements the proxy protocol, but it's not clear if conventional checksums will be identical (this matters to the package manager).

That's true: we only hash module contents, not the archives themselves. There's no promise that module zip files have stable hashes over time; for example file order or compression could change. We ignore metadata when creating and extracting zip files.

(IMO, it would have been better to hash the zip files themselves, but that ship has sailed).

Thanks, as a tidbit there: it tries to describe part of the rules:
the leading path element (up to the first slash, if any), by convention a domain name,
But then it has an incomplete test described for domain names: specifically . and - should not appear adjacent in any domain name, or at the start & end. One . also cannot be adjacent to another ..

Maybe we can tighten that up without breaking anyone. It's technically possible to have a module path that isn't a domain name if it's only served from a proxy server (i.e., there's no need to look up the origin repository). There is code that checks that dots are not allowed at the beginning or end of a path element or together. I don't think a.- is rejected though.

Member

@bcmills bcmills commented Dec 4, 2019

I think it would be helpful to step up a level so that we can understand the higher-level problem that you want to solve.

Specifically, I would like to understand the need to download .zip files using the Gentoo package manager tooling, rather than downloading the .zip files using go mod download or source files using go mod vendor on the maintainer side of the workflow.

Downloading on the maintainer side of the workflow also seems like it would provide the required checksum stability: if the maintainer, rather than the user, downloads the files, then the maintainer can compute the package manager's checksum based on that specific instance of those files rather than relying on a specific Go proxy to serve a zipfile with exactly the same bytes.
