Per-interface sysctls #47686
79f4458
to
745a4c4
Compare
// Only try to migrate settings for "eth0", anything else would always
// have behaved unpredictably.
if spl[3] != "eth0" {
	return "", fmt.Errorf(`unable to determine network endpoint for sysctl %s, use '--network=name=%s,sysctl=%s' or compose 'driver_opts: "%s":"%s"`,
I think we shouldn't put CLI-specific or Compose-specific remediation steps here -- the API could be called by other tools where those steps won't make any sense.
OTOH CLI error messages sometimes look cryptic to users not familiar with our API. I think we don't have a plan for 'augmenting' CLI error messages with remediation steps. Maybe that's something we need to discuss.
Yes, it's not great, but I'm not sure how best to improve it.
I think it's quite important that we give good clues about how to specify per-interface sysctls - here, for the migration case below, and for when we refuse to migrate from the top level --sysctl in a future release.
A "for example" might help a little, since CLI and compose are probably the common cases. But, not really.
Maybe the best we can do is just delete the hints, and hope the user's able to find the right section of the docs.
I'm not sure how augmented CLI messages would work, perhaps the API would need to return some token that'd tell the client to explain how to set per-interface sysctls in its world (extended --network syntax for the CLI, or driver-opts in compose)? I'm probably missing the point (?!), but any sort of mechanism like that sounds like a big change that'd have to be out-of-scope here.
I changed the message (and updated the examples in PR description to show it) ... now it only mentions the driver-opt label - and the user will have to figure out how to use it.
But I've also updated the CLI PR docker/cli#4994 to get rid of the --network sysctl= option and document the use of [create|run] --network driver-opt=com.docker.network.endpoint.sysctls=[value] and network connect --driver-opt=com.docker.network.endpoint.sysctls= to set multiple sysctls for an endpoint.
	return "", nil
}

// TODO(robmry) - refuse to do the migration, generate an error if API > some-future-version.
Next API version should be fine.
This change should land in release 27.0 (along with re-removing the SetKey hook that requires it). So, we'd want to deprecate per-interface sysctls in --sysctl in 27.0, and remove the auto-migration in 28.0.
The current API version is 1.45, and it will be in the upcoming release 26.1. But, it might change in 27.0? In that case, if we make this code check for API version >1.45, we'll have accidentally removed the auto-migration in release 27.0.
So, it's probably best to raise a new issue to say a version check needs to be added, and mark it for milestone 28.0?
// TODO(robmry) - refuse to do the migration, generate an error if API > some-future-version.

newDriverOpt := strings.Join(netIfSysctls, ",")
warning := fmt.Sprintf(`Migrated %s to DriverOpts{"%s":"%s"}. (Use "--network=name=%s,sysctl=%s", or compose "driver_opts".)`,
Same as for the error above.
745a4c4
to
5d0ab3f
Compare
5d0ab3f
to
5d70d23
Compare
5d70d23
to
2edd300
Compare
Signed-off-by: Rob Murray <rob.murray@docker.com>
Signed-off-by: Rob Murray <rob.murray@docker.com>
2edd300
to
2681c58
Compare
// TODO(robmry) - refuse to do the migration, generate an error if API > some-future-version.

newDriverOpt := strings.Join(netIfSysctls, ",")
warning := fmt.Sprintf(`Migrated %s to DriverOpts{"%s":"%s"}.`,
Suggested change:
-warning := fmt.Sprintf(`Migrated %s to DriverOpts{"%s":"%s"}.`,
+warning := fmt.Sprintf(`Migrated sysctl %q to DriverOpts{%q:%q}.`,
The second example in the description:
# docker run --rm -ti --name c1 --network mynet --sysctl=net.ipv6.conf.eth0.accept_ra=2 --sysctl=net.ipv6.conf.eth0.forwarding=1 alpine
WARNING: Migrated net.ipv6.conf.eth0.accept_ra,net.ipv6.conf.eth0.forwarding to DriverOpts{"com.docker.network.endpoint.sysctls":"ipv6.conf.accept_ra=2,ipv6.conf.forwarding=1"}.
Would become:
# docker run --rm -ti --name c1 --network mynet --sysctl=net.ipv6.conf.eth0.accept_ra=2 --sysctl=net.ipv6.conf.eth0.forwarding=1 alpine
WARNING: Migrated sysctl "net.ipv6.conf.eth0.accept_ra,net.ipv6.conf.eth0.forwarding" to DriverOpts{"com.docker.network.endpoint.sysctls":"ipv6.conf.accept_ra=2,ipv6.conf.forwarding=1"}.
Maybe that's ok, or would be without the quotes around the list, and ignoring sysctl vs. sysctls.
Another option would be:
# docker run --rm -ti --name c1 --network mynet --sysctl=net.ipv6.conf.eth0.accept_ra=2 --sysctl=net.ipv6.conf.eth0.forwarding=1 alpine
WARNING: Migrated sysctl "net.ipv6.conf.eth0.accept_ra" to DriverOpts{"com.docker.network.endpoint.sysctls":"ipv6.conf.accept_ra=2"}.
WARNING: Migrated sysctl "net.ipv6.conf.eth0.forwarding" to DriverOpts{"com.docker.network.endpoint.sysctls":"ipv6.conf.forwarding=1"}.
But, that doesn't show the value that really needs to be passed to DriverOpts.
I think it's unambiguous as-is, the named sysctls won't be interpreted as anything but sysctls from the request.
But, could do Migrated sysctl %s to DriverOpts{%q:%q}. if you think it's necessary?
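For comparison, here is a small sketch that renders the three candidate format strings from this thread side by side, using the second example from the description. The helper name formatWarnings and its parameters are illustrative only and are not code from the PR.

```go
package main

import (
	"fmt"
	"strings"
)

// formatWarnings renders the same migration warning with each candidate
// format string discussed in this thread. Illustrative helper, not PR code.
func formatWarnings(names []string, label, value string) (asIs, allQuoted, mixed string) {
	list := strings.Join(names, ",")
	// Current form: unquoted list, hand-written quotes around key and value.
	asIs = fmt.Sprintf(`Migrated %s to DriverOpts{"%s":"%s"}.`, list, label, value)
	// Suggested form: %q quotes (and escapes) every argument.
	allQuoted = fmt.Sprintf(`Migrated sysctl %q to DriverOpts{%q:%q}.`, list, label, value)
	// Compromise: unquoted list, %q for the DriverOpts key and value.
	mixed = fmt.Sprintf(`Migrated sysctl %s to DriverOpts{%q:%q}.`, list, label, value)
	return asIs, allQuoted, mixed
}

func main() {
	a, b, c := formatWarnings(
		[]string{"net.ipv6.conf.eth0.accept_ra", "net.ipv6.conf.eth0.forwarding"},
		"com.docker.network.endpoint.sysctls",
		"ipv6.conf.accept_ra=2,ipv6.conf.forwarding=1",
	)
	fmt.Println(a)
	fmt.Println(b)
	fmt.Println(c)
}
```

For plain ASCII values like these, %q and a hand-quoted "%s" print the same text; they differ only if a value contains quotes or control characters, which %q escapes.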
@@ -615,6 +615,18 @@ func validateEndpointSettings(nw *libnetwork.Network, nwName string, epConfig *n
}
}

if sysctls, ok := epConfig.DriverOpts[netlabel.EndpointSysctls]; ok {
	for _, sysctl := range strings.Split(sysctls, ",") {
Naively using comma as a delimiter without any escaping could come to bite us. While there are no documented per-iface sysctls I could find which have a comma-separated list as the value, there is existing precedent in other sysctls. While we could come up with a microformat which allows commas in values to be quoted or escaped, that seems like a lot of fuss. Could there be a better way? (Apologies if I'm rehashing some stuff we discussed earlier.)
The problem is with packing multiple sysctls into a single DriverOpts value. What if we instead encoded each sysctl as a distinct DriverOpt so that the keys and values are already structured?
{
"com.docker.endpoint.sysctl.net.ipv4.conf.arp_announce": "1",
"com.docker.endpoint.sysctl.net.ipv6.conf.dad_transmits": "4"
}
The only downside to this scheme is that it leaves the order that sysctls are applied undefined unless we take steps to define an ordering. As applying per-iface sysctls out-of-order is currently only a theoretical issue, we can probably get away with defining an arbitrary consistent order that is independent of Go's map iteration randomization. (I have a hunch that sorting lexicographically by path, that is, splitting the sysctl key on periods and sorting by the resulting tuples, would yield a different and more intuitive order than a naive sort.Strings, but I have not yet found a counterexample.)
YAGNI, but if user-controlled sysctl order is actually required, we could afford users a way to provide an explicit sort order discriminant, kinda like the classic "prefix your init scripts with a number to control the order they are invoked." For example:
{
"com.docker.endpoint.sysctl.net.ipv6.conf.forwarding": "1",
"com.docker.endpoint.sysctl[42].net.ipv6.conf.accept_redirects": "1",
"com.docker.endpoint.sysctl[99].net.ipv6.conf.accept_ra": "2"
}
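A possible sketch of the two orderings mentioned above, for anyone who wants to experiment. The function names are made up for illustration, and the keys in main are hypothetical rather than real sysctls. As far as I can tell, the two orders can only diverge when a key contains a byte that sorts before '.' (0x2E), such as '-' (0x2D), at a position where the other key has a period.

```go
package main

import (
	"fmt"
	"slices"
	"strings"
)

// sortByPath orders keys by their dot-separated components, comparing
// component by component rather than byte by byte.
func sortByPath(keys []string) []string {
	out := slices.Clone(keys)
	slices.SortFunc(out, func(a, b string) int {
		return slices.Compare(strings.Split(a, "."), strings.Split(b, "."))
	})
	return out
}

// sortNaive orders keys as plain strings (bytewise).
func sortNaive(keys []string) []string {
	out := slices.Clone(keys)
	slices.Sort(out)
	return out
}

func main() {
	// Hypothetical keys, chosen only because '-' sorts before '.' as a byte.
	keys := []string{"ipv4.conf.x", "ipv4-alt.conf.x"}
	fmt.Println("naive:  ", sortNaive(keys))
	fmt.Println("by path:", sortByPath(keys))
}
```

With these keys, the bytewise sort puts ipv4-alt.conf.x first, while the per-component sort puts ipv4.conf.x first because "ipv4" is a prefix of "ipv4-alt". Keys made up only of letters, digits, underscores and periods should sort identically either way.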
Right, yes - there's some chance that maybe one day there might be a sysctl value format that causes problems.
In the meantime, it's a trade-off. Per-sysctl options would unnecessarily make commands more verbose right now, and sacrifice the ordering. The ordering isn't important at the moment - but I don't think we can predict with any certainty whether future sysctls will introduce comma-formatting or a requirement for ordering.
docker run --network='name=bad="driver-opt=com.docker.network.endpoint.sysctls=ipv4.conf.log_martians=1,ipv4.conf.forwarding=0"' ...
docker run --network=name=worse=driver-opt=com.docker.network.endpoint.sysctl.ipv4.conf.log_martians=1,driver-opt=com.docker.network.endpoint.sysctl.ipv4.conf.forwarding=0 ...
So, for the foreseeable future, I think it's better as-is. Most likely YAGNI but, if the situation does arise, per-sysctl options can be implemented as an alternative at that point with no issues for backwards compatibility.
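To illustrate the trade-off in this thread, a minimal sketch of splitting the comma-separated value with no quoting or escaping. The function parseEndpointSysctls is hypothetical, not the PR's validation code, and the key "example.list" in main is invented to show what would happen if a value ever contained a comma.

```go
package main

import (
	"fmt"
	"strings"
)

// parseEndpointSysctls splits a comma-separated list of short-form
// "key=value" settings. No quoting or escaping is handled, so a comma
// inside a value is treated as a separator.
func parseEndpointSysctls(opt string) (keys, values []string, err error) {
	for _, item := range strings.Split(opt, ",") {
		k, v, ok := strings.Cut(item, "=")
		if !ok || k == "" {
			return nil, nil, fmt.Errorf("invalid sysctl %q: expected key=value", item)
		}
		keys = append(keys, k)
		values = append(values, v)
	}
	return keys, values, nil
}

func main() {
	keys, values, err := parseEndpointSysctls("ipv4.conf.log_martians=1,ipv4.conf.forwarding=0")
	fmt.Println(keys, values, err)

	// Hypothetical setting whose value is itself a comma-separated list.
	_, _, err = parseEndpointSysctls("example.list=1,2")
	fmt.Println(err)
}
```

The first call yields two settings. The second fails because "2" is split off as its own item, which is the failure mode the comma-delimiter concern describes.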
libnetwork/osl/interface_linux.go
Outdated
scPath = append(scPath, sk[2:]...)

sysPath := filepath.Join(scPath...)
errC := make(chan error, 1)
n.InvokeFunc is a synchronous blocking call so there is no need to communicate through a channel.
var errF error
f := func() {
	if err := ...; err != nil {
		errF = err
		return
	}
}
if err := n.InvokeFunc(f); err != nil {
	return err
}
if errF != nil {
	return errF
}
Ok, done.
Until now it's been possible to set per-interface sysctls using, for example, '--sysctl net.ipv6.conf.eth0.accept_ra=2'. But, the index in the interface name is allocated serially, and the numbering in a container with more than one interface may change when a container is restarted. The change to make it possible to connect a container to more than one network when it's created increased the ambiguity.

This change adds label "com.docker.network.endpoint.sysctls" to the DriverOpts in EndpointSettings. This option is explicitly associated with the interface.

Settings in "--sysctl" for "eth0" are migrated to DriverOpts.

Because using "--sysctl" with any interface apart from "eth0" would have unpredictable results, it is now an error to use any other interface name in the top level "--sysctl" option. The error message includes a hint at how to use the new per-interface setting.

The per-endpoint sysctl name is a shortened form of the sysctl name, intended to limit settings to 'net.*', and to eliminate the need to identify the interface by name. For example:
    net.ipv6.conf.eth0.accept_ra=2
becomes:
    ipv6.conf.accept_ra=2

The value of DriverOpts["com.docker.network.endpoint.sysctls"] is a comma separated list of these short-form sysctls.

Settings from '--sysctl' are applied by the runtime lib during task creation. So, task creation fails if the endpoint does not exist. Applying per-endpoint settings during interface configuration means the endpoint can be created later, which paves the way for removal of the SetKey OCI prestart hook.

Unlike other DriverOpts, the sysctl label itself is not driver-specific, but each driver has a chance to check settings/values and raise an error if a setting would cause it a problem - no such checks have been added in this initial version. As a future extension, if required, it would be possible for the driver to echo back valid/extended/modified settings to libnetwork for it to apply to the interface. (At that point, the syntax for the options could become driver specific to allow, for example, a driver to create more than one interface).

Signed-off-by: Rob Murray <rob.murray@docker.com>
2681c58
to
93e43d5
Compare
- What I did
Until now it's been possible to set per-interface sysctls using, for example, --sysctl net.ipv6.conf.eth0.accept_ra=2. But, the index in the interface name is allocated serially, and the numbering in a container with more than one interface may change when a container is restarted. The change to make it possible to connect a container to more than one network when it's created increased the ambiguity.

This change adds label com.docker.network.endpoint.sysctls to the DriverOpts in EndpointSettings. This option is explicitly associated with the interface.

Settings in --sysctl for eth0 are migrated to EndpointSettings.DriverOpts.

Because using --sysctl with any interface apart from eth0 would have unpredictable results, it is now an error to use any other interface name in the top level --sysctl option. The error message includes a hint at how to use the new per-interface setting.

The per-endpoint sysctl name is a shortened form of the sysctl name, intended to limit settings to 'net.*', and to eliminate the need to identify the interface by name. For example:
net.ipv6.conf.eth0.accept_ra=2
becomes:
ipv6.conf.accept_ra=2
The value of DriverOpts["com.docker.network.endpoint.sysctls"] is a comma separated list of these short-form sysctls.

Settings from --sysctl are applied by the runtime lib during task creation. So, task creation fails if the endpoint does not exist. Applying per-endpoint settings during interface configuration means the endpoint can be created later, which paves the way for removal of the SetKey OCI prestart hook.

Unlike other DriverOpts, the sysctl label itself is not driver-specific, but each driver has a chance to check settings/values and raise an error if a setting would cause it a problem - no such checks have been added in this initial version. As a future extension, if required, it would be possible for the driver to echo back valid/extended/modified settings to libnetwork for it to apply to the interface. (At that point, the syntax for the options could become driver specific to allow, for example, a driver to create more than one interface.)
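The name shortening and eth0-only migration described above could be sketched roughly as follows. This is not the PR's implementation: shortFormSysctl and migrate are hypothetical names, the five-component check is a simplification, and treating "all" and "default" as non-interface names is an assumption made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// shortFormSysctl converts a top-level per-interface sysctl name such as
// "net.ipv6.conf.eth0.accept_ra" to the short form "ipv6.conf.accept_ra".
// ok is false for names that do not look per-interface. An interface
// other than eth0 is an error, since it cannot be mapped to an endpoint.
func shortFormSysctl(name string) (short string, ok bool, err error) {
	spl := strings.SplitN(name, ".", 5)
	if len(spl) != 5 || spl[0] != "net" {
		return "", false, nil
	}
	// Assumption: "all" and "default" are not interface names.
	if spl[3] == "all" || spl[3] == "default" {
		return "", false, nil
	}
	if spl[3] != "eth0" {
		return "", false, fmt.Errorf("unable to determine network endpoint for sysctl %s", name)
	}
	return strings.Join([]string{spl[1], spl[2], spl[4]}, "."), true, nil
}

// migrate builds a comma-separated DriverOpts value from top-level
// "name=value" settings, skipping any that are not per-interface.
func migrate(settings []string) (string, error) {
	var out []string
	for _, s := range settings {
		name, value, _ := strings.Cut(s, "=")
		short, ok, err := shortFormSysctl(name)
		if err != nil {
			return "", err
		}
		if ok {
			out = append(out, short+"="+value)
		}
	}
	return strings.Join(out, ","), nil
}

func main() {
	opt, err := migrate([]string{
		"net.ipv6.conf.eth0.accept_ra=2",
		"net.ipv6.conf.eth0.forwarding=1",
	})
	fmt.Println(opt, err)

	_, err = migrate([]string{"net.ipv6.conf.eth1.accept_ra=2"})
	fmt.Println(err)
}
```

The first call produces the same value as the warning quoted in the review thread (ipv6.conf.accept_ra=2,ipv6.conf.forwarding=1); the second returns an error because eth1 cannot be tied to a specific endpoint.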
Related changes are needed in the CLI, to make it possible to set the new DriverOpts value ... it's not possible to set them using the existing advanced --network syntax, because the list of sysctl values includes '=' and ',' characters (so, can't be distinguished from separators in the --network syntax) ... docker/cli#4994
- How I did it
- How to verify it
New unit and integration tests.
And ...
Migration of one or two top level --sysctl settings ...
Inspect output ...
No migration for eth1 ...
Attempt to set a per-interface sysctl for mpls ...
Nonexistent sysctl (very verbose error message at the moment) ...
But, a lot of that waffle will go away once the prestart hook is removed and settings are applied after task creation. It'll be more like ...
- Description for the changelog