cmd/go: [modules + integration] use several goproxy sources simultaneously #31304

Open
nim-nim opened this Issue Apr 6, 2019 · 9 comments

4 participants

nim-nim commented Apr 6, 2019

This report is part of a series, filed at the request of @mdempsky, focused on making Go modules integrator-friendly.

Please do not close or mark it as duplicate before making sure you’ve read and understood the general context. A lot of work went into identifying problem points precisely.

Needed feature

Go needs to allow using multiple goproxy sources simultaneously (at least 2).

Constraints

  • the sources may be specified in an environment variable, or a config file
  • the config file is probably the best solution as URLs can get long and messy
  • if using a config file, it should reside in /etc/go/something completed or masked by ~/.config/go/something on Linux systems, as per the Filesystem Hierarchy Standard and the XDG Base Directory Specification
  • it is desirable for the list of goproxy sources to be hierarchical:
    • if a suitable module is found in several sources, take it from the higher-priority source
    • or even stop searching lower-priority sources for better (closer to HDR) module versions once something suitable has been found in a higher-priority source
  • one source may be “Internet” (direct download)
  • the “Internet” source must be optional

Motivation

In integrator workflows, a single goproxy cannot be used, due to the distinction between:

  • modules that passed QA in another CI/CD job and
  • modules being created in this CI/CD job

For build reliability and security, the first class of modules must be deployed on a read-only goproxy. The second class, however, has to be deployed on a read-write goproxy (because the aim of the CI/CD job is to create and write those modules).

beoran commented Apr 8, 2019

For this use case, a single multiplexing goproxy that forwards to other goproxies could be implemented. Or, simply, a file-based proxy URL in conjunction with go mod pack.

nim-nim (Author) commented Apr 8, 2019

@beoran

It is intended to be used mostly with file goproxies¹, and those proxies are intended to be populated via go mod pack (#31302).

However, a single file goproxy cannot be used, due to the distinction between:

  • modules that passed QA in another CI/CD job and
  • modules being created in this CI/CD job

For build reliability and security, the first class of modules must be deployed on a read-only goproxy. The second class, however, has to be deployed on a read-write goproxy (because the aim of the CI/CD job is to create and write those modules).

Having to instantiate a go-specific proxy server in each go CI/CD job, just because go tools cannot read more than one directory of modules, would be complex, inefficient and fragile. As long as it's just copying files, making a directory available and running a directory indexing command, it's well within the capabilities of any CI/CD system. Adding a go-specific network server to the mix is not.

Remember that a lot of CI/CD systems are not go-specific. They cannot be, since a lot of software is not written in a single language. Anything that requires go-specific processing by the CI/CD system (and is not covered by the CI/CD system's default extension points) causes problems.

¹ The usual case will be local file goproxy sources (because it's simpler to populate CI/CD job-specific directories than to handle the security aspects of allowing access to some URLs but not others). Of course, some of the file goproxies may be deployed on network filesystems (but the go command does not need to be aware of that).

beoran commented Apr 8, 2019

I see, this is for use with file based goproxies. I agree that having a go mod pack command to easily populate a file based goproxy would be very useful.

I would think that the problem you are describing could be solved by having two different users: one that runs the current job, which creates the packed modules and has read-only permissions on the goproxy file location, and another with write permissions on the goproxy directories that installs the modules there if the build is successful. Would that work?

Also, go modules are in essence source only modules without binaries. I wonder, how do distribution maintainers solve this problem for a library in a different programming language, such as Ruby, which also has gem files with source only modules?

nim-nim (Author) commented Apr 8, 2019

> I see, this is for use with file based goproxies. I agree that having a go mod pack command to easily populate a file based goproxy would be very useful.

Thanks

> I would think that the problem you are describing could be solved by having two different users: one that runs the current job, which creates the packed modules and has read-only permissions on the goproxy file location, and another with write permissions on the goproxy directories that installs the modules there if the build is successful. Would that work?

On an FHS Linux system, the read-only goproxy directory would not just be protected against writing; it would be owned by root and deployed at a fixed filesystem location. CI/CD read-write directories, however, can exist wherever the CI/CD job wants to create them in its own filesystem space.

So it's not just a read-only/read-write separation, you also have a strong filesystem location separation.

> Also, go modules are in essence source only modules without binaries. I wonder, how do distribution maintainers solve this problem for a library in a different programming language, such as Ruby, which also has gem files with source only modules?

As far as I know, the CI/CD environment works the same way for other languages. It creates a contained environment where things that are not owned by the CI/CD job are deployed in standard locations and locked against modification. The CI/CD job can create files and directories in its own separate read-write filesystem hierarchy.

If the CI/CD job is successful, a subset of the created files and directories is collected, with a mapping to canonical filesystem locations¹. Another CI/CD run can then request to use all or part of the result, which will then be exposed in the read-only filesystem space.

¹ The rpm approach to defining the mapping is very basic: here is an empty directory; pretend it's /, and deploy everything you want to be reused inside this directory at the correct location relative to the fake root.

nim-nim (Author) commented Apr 10, 2019

> I wonder, how do distribution maintainers solve this problem for a library in a different programming language, such as Ruby, which also has gem files with source only modules?

BTW, part of the motivation for the report series (especially #31300) is to help Go software benefit from, and catch up to, the state of the art in CI/CD systems on the Linux side, which is evolving right now due to requests from the Rust and Golang Fedora SIGs (rpm-software-management/rpm#104 rpm-software-management/rpm#593 rpm-software-management/mock#245)

Other language SIGs like Python and Java were also involved in the design, in a less direct way. A lot of our system tools use Python, so anything that does not work for Python would have been DOA. Java would really need this too, but is hampered by the multiplicity of its component systems and years of code rot (due to the "peg specific commit" / "rename and fork on change" / "never merge back" Java dev mindset). I’m pretty pessimistic about Java being able to leverage any CI/CD improvement in the short term.

mdempsky (Member) commented Apr 15, 2019

This task seems best handled by a multiplexing proxy server like @beoran suggested. Or if a file based approach is preferred, then building a tool that creates a symlink tree on disk. I don't see a need to complicate cmd/go.

Moreover, it seems like a separate tool/server would make it easier to adapt to the CI/CD system's evolving needs, rather than being blocked waiting on the Go project to review changes.

> Having to instantiate a go-specific proxy server in each go CI/CD job, just because go tools cannot read more than one directory of modules, would be complex, inefficient and fragile.

Making tools responsible for more tasks when they can be split out separately seems contrary to UNIX design. E.g., we have tee(1) instead of adding the ability to every command to write to both stdout and one or more files.

> As long as it's just copying files, making a directory available and running a directory indexing command, it's well within the capabilities of any CI/CD system. Adding a go-specific network server to the mix is not.

The CI/CD system is running Go-specific commands though when building a Go package, right? What's the difference if one of those commands starts an extra background process?

A proxy server doesn't have to be long-lived. It can be ephemerally launched during a build, run while Go is building, and then get torn down with the rest of the build container.

nim-nim (Author) commented Apr 15, 2019

@mdempsky

Pointing a command to a directory with module files is simple, fast and without side effects. Simple is good. Simple is reliable.

Even the init process is able to read several directories of unit files, without needing help, let alone the network. Directories exist to help organize and manage files (here organize external-to-job and internal-to-job modules). What's so strange or difficult about using multiple directories? Even Go modules use multiple package directories inside their zip file. No other computing language has a problem reading more than one directory of components.

Both symlink trees and server processes are a management headache.

Housekeeping a symlink tree is always surprisingly tricky, way more complex than reading several directories.

Launching a server process, no matter how ephemeral, is a can of worms in terms of ip/port collisions, network filtering and access rules. Remember that the CI/CD is a secure environment, it's not open bar, the network layer is contained and the kind of containment and filter varies from environment to environment.

In unix everything is a file and reading files is encouraged. Dumping everything on the network to avoid file reads is definitely not unix philosophy.

mdempsky (Member) commented Apr 15, 2019

> What's so strange or difficult about using multiple directories?

It's complexity that most users don't need. E.g., GOPATH supports multiple directories, and based on the number of blog posts and shell snippets I see (even from expert Go programmers) that use $GOPATH as though it'll always expand to a single directory, I suspect usage and even awareness of that feature is very low.

Moreover, it's unnecessary complexity. The extension point to build an external tool that makes multiple directories look like a single one already exists.

> Housekeeping a symlink tree is always surprisingly tricky, way more complex than reading several directories.

So use an overlay or union filesystem then.
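For reference, an overlay mount stacks directories as layers. A small Go sketch that builds the overlayfs option string for hypothetical paths: the leftmost `lowerdir` entry is the highest-priority read-only layer, writes go to `upperdir`, and `workdir` must be an empty directory on the same filesystem as `upperdir`. Actually mounting usually requires root or a user namespace, so the sketch only prints the equivalent mount(8) command.

```go
package main

import (
	"fmt"
	"strings"
)

// overlayOptions builds the -o argument for an overlayfs mount.
// lowers are listed highest priority first; upper and work may be
// empty for a read-only overlay. Paths containing ':' or ',' would
// need escaping, which this sketch does not do.
func overlayOptions(lowers []string, upper, work string) string {
	opts := "lowerdir=" + strings.Join(lowers, ":")
	if upper != "" {
		opts += ",upperdir=" + upper + ",workdir=" + work
	}
	return opts
}

func main() {
	// Hypothetical layout: QA'd modules read-only, job output writable.
	opts := overlayOptions(
		[]string{"/usr/share/go/goproxy"},
		"/build/goproxy-rw",
		"/build/goproxy-work",
	)
	fmt.Printf("mount -t overlay overlay -o %s /build/goproxy\n", opts)
}
```

Because overlayfs copies a lower-layer file up into `upperdir` before modifying it, updating a module's `@v/list` through the merged view rewrites the whole file in the writable layer.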

nim-nim (Author) commented Apr 16, 2019

@mdempsky

>> What's so strange or difficult about using multiple directories?

> It's complexity that most users don't need.

It's complexity so basic and common it's even included in the init process.

> E.g., GOPATH supports multiple directories [...] I suspect usage and even awareness of that feature is very low.

And we use it heavily. Modules are breaking our CI/CD setup. The CI/CD setup is used for more than Go software and is not going to change drastically just for Go modules. What will probably happen is either some form of disabling of Go modules in Fedora and RHEL, or years of kludges giving Fedora and Go a bad rep because the end result won't work well.

>> Housekeeping a symlink tree is always surprisingly tricky, way more complex than reading several directories.

> So use an overlay or union filesystem then.

That's drastic CI/CD rework just for Go. Again, one of the core objectives of the CI/CD system is to be simple, understandable and without side effects. Heavy hammers like custom overlays and network access aren't in this category.

Besides, because the Go module proxy layout intermingles list indexes with module payload files, it is not possible to separate writes into different overlay layers (even if that were not a huge can of worms to start with) without deep overlay awareness to write the corresponding indexes at the correct layer.
