New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Validate IPAM config before handing it over to libnetwork #45759
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed in the maintainers call / @akerouanton is making the below changes:
- Semantics for existing networks look good
- We want to hard fail in new major versions of the engine, and soft fail in older versions:
- This is in place of the current behavioral difference based on API version.
- This should be accomplished by a two-commit dance: one commit introduces the warning, another changes it to an error.
- Cherry-picks to older branches should exclude the commit that makes this an error.
257543e
to
1a2a9ec
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks; I left some suggestions; let me know what you think.
I committed some of my suggestions in a branch, and just opened a PR against your fork in case that helps you seeing them as a whole (except for moving the validation to libnetwork if you agree on that);
type ipFamily int | ||
|
||
const ( | ||
ip4 ipFamily = 4 | ||
ip6 ipFamily = 6 | ||
) | ||
|
||
func validateIPAM(ipam *network.IPAM, enableIPv6 bool) *multierror.Error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit on the fence on having all this validation code in the API router. Given that we're already importing libnetwork, perhaps it'd make sense to move the validation code there (not sure where the right place would be inside libnetwork, perhaps in the ipam
or ipamapi
package.
We can either have the function exported (and call it here; ipam.Validate(...)
), or perhaps as part of the final commit (which turns it into an error), move it into backend.CreateNetwork()
. That way calling that function would also validate before creating (regardless of where it's called from).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit on the fence on having all this validation code in the API router.
I think the best would be to add IPAM validation call to (*Daemon).CreateNetwork()
. However, the following method from swarmkit would prevent converting errors into warnings when the network is created from a non-leader manager node, as it returns either the network created or an error, but not both:
Also, there's already some sort of network validation in the daemon pkg (ie. d.containerStart() -> d.initializeNetworking() -> d.allocateNetwork() -> d.connectToNetwork() -> d.updateNetworkConfig() -> validateNetworkingConfig()
).
Given that we're already importing libnetwork, perhaps it'd make sense to move the validation code there
Currently the API only uses constants and errors from libnetwork packages. If we call IPAM validation helper from the API due to my previous point, these helpers would need to either:
- Rely on the API type
IPAMConfig
, and thus import it into libnetwork -- that'd be a bad separation of concern IMO ; - Convert API type
IPAMConfig
into libnetworkIpamConf
. And then, errors would have to either reference fields from the latter struct and re-translated to match the former -- that'd make the validation layer more complex with no real gains. Or modeled afterIPAMConfig
, but then we're back at bad separation of concerns ;
Also, I think we should make sure as much as possible that libnetwork data structures never end up in an invalid state -- that's the crux of the original issue I'm solving here. Maintaining libnetwork is way harder if we can't trust this rule is enforced. This also discards option 2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently the API only uses constants and errors from libnetwork packages. If we call IPAM validation helper from the API due to my previous point, these helpers would need to either:
- Rely on the API type IPAMConfig, and thus import it into libnetwork -- that'd be a bad separation of concern IMO ;
- Convert API type IPAMConfig into libnetwork IpamConf. And then, errors would have to either reference fields from the latter struct and re-translated to match the former -- that'd make the validation layer more complex with no real gains. Or modeled after IPAMConfig, but then we're back at bad separation of concerns ;
- Also, I think we should make sure as much as possible that libnetwork data structures never end up in an invalid state -- that's the crux of the original issue I'm solving here. Maintaining libnetwork is way harder if we can't trust this rule is enforced. This also discards option 2.
- Looking at the validation we have in this PR, we only need
create.IPAM
, andcreate.EnableIPv6
. backend.CreateNetwork()
is implemented bydaemon.CreateNetwork()
/daemon.createNetwork()
- .. which already imports
"github.com/docker/docker/libnetwork/ipamapi"
- .. and has a
getIpamConfig()
function (which can produce an error for invalid configuration)
And on the calling site (network_routes.go
), we already have some checking for error-types returned by the backend (e.g. libnetwork.NetworkNameError
) to turn errors into warnings.
So my thinking is to
- have the validation either in the backend (
daemon.createNetwork()
->getIpamConfig()
) itself (or in theipamapi
package, and called from there) - make it return a sentinel error (for example,
libnetwork.InvalidIPAMConfig
) - ^^ Ideally that error would implement
errdefs.InvalidParam
For the current release, we catch that (libnetwork.InvalidIPAMConfig
), and turn them into a warning, and for the next release, we remove that check, and return the error as-is (producing a 4XX status if it implements errdefs.InvalidParam
)
Would that work?
145e598
to
49f45ad
Compare
b5d9ef0
to
6390bb1
Compare
I did a few things since the last round of reviews:
|
6390bb1
to
4e9df48
Compare
PR moby#45759 is going to use the new `errors.Join` function to return a list of validation errors. Signed-off-by: Albin Kerouanton <albinker@gmail.com>
PR moby#45759 is going to use the new `errors.Join` function to return a list of validation errors. Signed-off-by: Albin Kerouanton <albinker@gmail.com>
PR moby#45759 is going to use the new `errors.Join` function to return a list of validation errors. Signed-off-by: Albin Kerouanton <albinker@gmail.com>
4e9df48
to
8c5db02
Compare
Some network integration tests are creating networks with subnet `2001:389::1/64`. This is an invalid subnet as the host fragment is non-zero (ie. `::1`). PR moby/moby#45759 is adding strict validation of network configuration. Docker API will now return a warning (on API < 1.44) or an error (on API >= 1.44) whenever a bad subnet is passed. Signed-off-by: Albin Kerouanton <albinker@gmail.com>
Some network integration tests are creating networks with subnet `2001:389::1/64`. This is an invalid subnet as the host fragment is non-zero (ie. it should be `2001:389::/64`). PR moby/moby#45759 is adding strict validation of network configuration. Docker API will now return an error whenever a bad subnet is passed. Signed-off-by: Albin Kerouanton <albinker@gmail.com>
Currently, IPAM config is never validated by the API. Some checks are done by the CLI, but they're not exhaustive. And some of these misconfigurations might be caught early by libnetwork (ie. when the network is created), and others only surface when connecting a container to a misconfigured network. In both cases, the API would return a 500. Although the `NetworkCreate` endpoint might already return warnings, these are never displayed by the CLI. As such, it was decided during a maintainer's call to return validation errors _for all API versions_. Signed-off-by: Albin Kerouanton <albinker@gmail.com> Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Prior to moby#44968, libnetwork would happily accept a ChildSubnet with a bigger mask than its parent subnet. In such case, it was producing IP addresses based on the parent subnet, and the child subnet was not allocated from the address pool. This commit automatically fixes invalid ChildSubnet for networks stored in libnetwork's datastore. Signed-off-by: Albin Kerouanton <albinker@gmail.com>
fbb597e
to
3e8af08
Compare
Pushed a small fix to temporarily skip the docker-py tests that we know are currently failing until we have a version with docker/docker-py#3169 included; https://github.com/moby/moby/compare/fbb597e1cf2f207fb84ae7649222f31fceaaa055..3e8af0817a968809d477cd8bdf1832dacc791d69 - : "${PY_TEST_OPTIONS:=--junitxml=${DEST}/junit-report.xml --deselect=tests/integration/api_container_test.py::AttachContainerTest::test_attach_no_stream --deselect=tests/integration/api_container_test.py::CreateContainerTest::test_create_with_device_cgroup_rules --deselect=tests/integration/api_volume_test.py::TestVolumes::test_prune_volumes}"
+ # TODO re-enable test_connect_with_ipv6_address after we updated to a version of docker-py with https://github.com/docker/docker-py/pull/3169
+ # TODO re-enable test_create_network_ipv6_enabled after we updated to a version of docker-py with https://github.com/docker/docker-py/pull/3169
+ # TODO re-enable test_create_with_ipv6_address after we updated to a version of docker-py with https://github.com/docker/docker-py/pull/3169
+ : "${PY_TEST_OPTIONS:=--junitxml=${DEST}/junit-report.xml --deselect=tests/integration/api_container_test.py::AttachContainerTest::test_attach_no_stream --deselect=tests/integration/api_container_test.py::CreateContainerTest::test_create_with_device_cgroup_rules --deselect=tests/integration/api_volume_test.py::TestVolumes::test_prune_volumes --deselect=tests/integration/api_network_test.py::TestNetworks::test_connect_with_ipv6_address --deselect=tests/integration/api_network_test.py::TestNetworks::test_create_network_ipv6_enabled --deselect=tests/integration/api_network_test.py::TestNetworks::test_create_with_ipv6_address}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM if CI goes green 👍
Fix moby#46368. PR moby#45759 added a validation step to `NetworkCreate` to ensure no IPv6 subnet could be set on a network if its `EnableIPv6` parameter is false. Before that, the daemon was accepting such request but was doing nothing with the IPv6 subnet. This validation step is now deleted, and we automatically set `EnableIPv6` if an IPv6 subnet was specified. Signed-off-by: Albin Kerouanton <albinker@gmail.com>
Fix moby#46368. PR moby#45759 added a validation step to `NetworkCreate` to ensure no IPv6 subnet could be set on a network if its `EnableIPv6` parameter is false. Before that, the daemon was accepting such request but was doing nothing with the IPv6 subnet. This validation step is now deleted, and we automatically set `EnableIPv6` if an IPv6 subnet was specified. Signed-off-by: Albin Kerouanton <albinker@gmail.com>
- What I did
Prior to this change, IPAM config was never validated by the API. Some checks are done by the CLI, but it's not exhaustive (and doesn't cover API users anyway). Some of these errors might be caught early by libnetwork (eg. when the network is created), and other errors would surface only when connecting a container to a network with faulty IPAM. In both cases, the API would return a 500 instead of returning a proper 400 at creation time.
Although the
NetworkCreate
endpoint might already return warnings, these are never displayed by the CLI. As such, it was decided during a maintainer's call to return validation errors for all API versions.Prior to #44968, the daemon would happily accept an
--ip-range
with a larger mask than its parent subnet. In such case, the IPAM allocator would implicitly ignore the ip-range (not because it detected it was a wrong value, but due to the way computation was done). As such, the 2nd commit explicitly discard the ip-range and replace it with the subnet value when it detects it's a bad value. This error would be caught by validation added in the first commit, so it's only useful for network restored from state stored on-disk.One such example can be seen in docker/for-mac#6870. The user creates a network with an ip-range bigger than its subnet, and with some host bits set. When they try to create a container connected to that network, the daemon returns this error:
docker: Error response from daemon: invalid bit range [0, 16777215).
. This defensive check is right, but the user should be warned of bad IPAM config earlier and in a more graceful way.- How to verify it
- Description for the changelog
- A picture of a cute animal (not mandatory but encouraged)