
DinDinD - Need way to configure NAT64 V4 Subnet without multicluster kicking in #220

Closed
leblancd opened this issue Sep 20, 2018 · 8 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@leblancd
Contributor

For running K-D-C within a container (DinDinD configuration, as used in CI), we would like a way of specifying the NAT64 V4 subnet without having the scripts set up a cluster in multi-cluster mode. It would be better if setting the NAT64 V4 subnet and triggering multi-cluster mode could be done independently.

Background:
When running K-D-C within a container (DinDinD configuration), we need a way of specifying the NAT64 V4 subnet to be something different from the host Docker network (typically 172.17.0.0/16) and from the container's base docker network (typically 172.18.0.0/16). For example, we'd like to set the K-D-C NAT64 V4 subnet to something like 172.20.0.0/16.

With the current K-D-C scripts, we would configure a NAT64 V4 subnet of 172.20.0.0/16 with the following environment variables:
export NAT64_V4_SUBNET_PREFIX=172
export CLUSTER_ID=20

The problem with this is that specifying CLUSTER_ID makes the K-D-C scripts automatically assume multi-cluster operation. For DinDinD, we don't need multi-cluster operation; the simpler non-multi-cluster mode would work fine in each individual DinDinD container.

For example, with the environment variables configured above, because multi-cluster mode is initiated, kubectl commands executed in the host DinDinD container must include a context:

root@d1c26b6ea88d:~/.kube# kubectl get pods
The connection to the server localhost:8080 was refused - did you specify the right host or port?
root@d1c26b6ea88d:~/.kube# kubectl --context dind-cluster-20 get pods
NAME                                             READY     STATUS    RESTARTS   AGE
kube-apiserver-kube-master-cluster-20            1/1       Running   1          2m
kube-controller-manager-kube-master-cluster-20   1/1       Running   0          2m
kube-scheduler-kube-master-cluster-20            1/1       Running   0          2m
root@d1c26b6ea88d:~/.kube# 

There may be other aspects of multi-cluster operation that are not desirable for DinDinD operation.

@pmichali
Contributor

pmichali commented Sep 20, 2018

For DinDinD you can use a cluster ID of zero, which means either the first cluster of a multi-cluster configuration or single-cluster mode. You can omit CLUSTER_ID and it will default to zero.

So, for a single cluster or the first cluster of a multi-cluster setup, you can omit CLUSTER_ID and end up with the node names kube-master, kube-node-1, kube-node-2, ...

For multi-cluster, each additional cluster can use CLUSTER_ID to uniquely identify it.
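
For illustration, the two invocations might look like the following (a sketch assuming the repository's dind-cluster.sh wrapper script; node names for a non-zero cluster ID follow the -cluster-<ID> suffix pattern seen in the kubectl output above):

# Single cluster, or the first cluster of a multi-cluster setup: omit CLUSTER_ID
# (it defaults to zero); nodes are named kube-master, kube-node-1, kube-node-2, ...
$ ./dind-cluster.sh up

# An additional cluster in a multi-cluster setup: give it a unique CLUSTER_ID;
# nodes are named kube-master-cluster-2, kube-node-1-cluster-2, ...
$ CLUSTER_ID=2 ./dind-cluster.sh up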

@pmichali
Contributor

pmichali commented Sep 20, 2018

From discussion with @leblancd, the NAT64 mapping network should be in the RFC 1918 private address range. Currently, one can use a prefix of 10 and comply with that requirement, but we may want to consider using the 172.16.0.0/12 range and ensure addresses stay within its limits.
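
As a short illustration (the values here are only examples), sticking with the current single-octet prefix behavior, a prefix of 10 keeps the mapping subnet inside the 10.0.0.0/8 private range:

# Current behavior: the prefix is a single octet and the cluster ID selects the
# second octet, so the NAT64 V4 subnet is <PREFIX>.<CLUSTER_ID>.0.0/16.
export NAT64_V4_SUBNET_PREFIX=10
export CLUSTER_ID=50        # NAT64 V4 subnet: 10.50.0.0/16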

@pmichali
Contributor

Some options are:

Option A

User specifies a two-octet NAT64 prefix, only if they want to override. Default: 172.17. Subnet will always be <PREFIX>.0.0/16.

Single cluster, no settings needed, get 172.17.0.0/16.
Single cluster that needs special subnet, set NAT64_V4_SUBNET_PREFIX=172.20, for example.

Multicluster: for all but the first cluster, the user sets NAT64_V4_SUBNET_PREFIX=172.X, where X is unique, between 17..31, and doesn't conflict with docker.

Advantages: Can use any prefix (e.g. 10.50).
Disadvantages: For multi-cluster, must specify this, along with CLUSTER_ID, for all but first cluster.

Option B

User specifies a two-octet NAT64 prefix, with default 172.17. Subnet will be <PREFIX + CLUSTER_ID>.0.0/16, i.e. CLUSTER_ID is added to the second octet of the prefix.

Single cluster, no settings needed, get 172.17.0.0/16.
Single cluster that needs special subnet, set NAT64_V4_SUBNET_PREFIX=172.20, for example.

Multicluster: if using 172.X as the prefix, X must be between 17..31 and must not conflict with docker. In addition, CLUSTER_ID must be 0..(31-X), so that the result stays within the limits of the private network. If using 10.X, the cluster ID just needs to be 0..(254-X).

Advantages: No need to specify anything, unless there is a docker conflict.
Disadvantages: Applies some limits on CLUSTER_ID, and the limits differ depending on whether 172.x or 10.y is used for the prefix.

Option C

User specifies a one-octet NAT64 base, between 17..31, with default 17. Subnet will be 172.<BASE+CLUSTER_ID>.0.0/16.

Single cluster, no settings needed, get 172.17.0.0/16.
Single cluster that needs special subnet, set NAT64_V4_SUBNET_BASE=20, for example.

Multicluster: the base can be overridden, if needed. CLUSTER_ID must be 0..(31-base), so that the result stays within the limits of the private network.

Advantages: No need to specify anything, unless there is a docker conflict. Arguably an easier specification.
Disadvantages: CLUSTER_ID must be in a specific range. Doesn't allow use of 10.x.

Variation: Allow a prefix that specifies the first octet, which makes this effectively the same as Option B, only more complicated, since two items must be specified. Subnet would be <PREFIX>.<BASE+CLUSTER_ID>.0.0/16.
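
To make the arithmetic concrete, here is a sketch of what each option would produce for a cluster with CLUSTER_ID=3 (variable names are the proposed ones above; none of this is implemented yet):

# Option A: the user picks the full two-octet prefix per cluster;
# CLUSTER_ID does not affect the subnet.
export NAT64_V4_SUBNET_PREFIX=172.20
export CLUSTER_ID=3        # NAT64 V4 subnet: 172.20.0.0/16

# Option B: CLUSTER_ID is added to the second octet of the two-octet prefix.
export NAT64_V4_SUBNET_PREFIX=172.18
export CLUSTER_ID=3        # NAT64 V4 subnet: 172.21.0.0/16

# Option C: a one-octet base in 17..31, always under 172; CLUSTER_ID is added.
export NAT64_V4_SUBNET_BASE=20
export CLUSTER_ID=3        # NAT64 V4 subnet: 172.23.0.0/16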

@pmichali
Contributor

Working up a solution for this issue; it should be posted today.

pmichali pushed a commit to pmichali/kubeadm-dind-cluster that referenced this issue Sep 24, 2018
This commit does several things related to the NAT64 prefix, as specified
by the NAT64_V4_SUBNET_PREFIX environment variable. This prefix is for a
/16 subnet.

First, we want the prefix to be within one of the two private network
ranges (172.16.0.0/12 or 10.0.0.0/8).

Second, to accommodate that, the NAT64_V4_SUBNET_PREFIX will be two octets,
instead of one. The default, if not specified, will be 172.18, to avoid
docker usage of that private network.

Third, the code will range check the prefix, to ensure that it is within
range, based on the private network selected. 172.16 to 172.31 or 10.0 to
10.253 values are allowed.

Fourth, the cluster ID is added to the prefix, so that a unique subnet is
used for each cluster. This affects the allowable values for the prefix.

For 172.16.0.0/12, the prefix plus cluster ID must be from 172.16 to
172.31. For 10.0.0.0/8, the prefix plus cluster ID must be from 10.0 to
10.253. So, for example, if the default 172.18 is used, then cluster IDs
can be from 0 to 13.

Another side effect of this change is w.r.t. legacy mode, where the user
specifies (only) the DIND_LABEL. In that case, a cluster ID is generated,
and we now will use numbers from 1..13 to help keep the values within the
range for the V4 mapping prefix (using 13 instead of 15 as the default
prefix is 172.18).

If the user wants to use the legacy DIND_LABEL, but have a larger range
for cluster IDs, they can set the NAT64_V4_SUBNET_PREFIX to the 10.0.0.0/8
subnet and/or explicitly set the CLUSTER_ID.

Fixes Issue: kubernetes-retired#220
pmichali pushed a commit to pmichali/kubeadm-dind-cluster that referenced this issue Sep 24, 2018
This commit does several things related to the NAT64 prefix, as specified
by the NAT64_V4_SUBNET_PREFIX environment variable. This prefix is for a
/16 subnet.

First, we want the prefix to be within one of the two private network
ranges (172.16.0.0/12 or 10.0.0.0/8).

Second, to accommodate that, the NAT64_V4_SUBNET_PREFIX will be two octets,
instead of one. The default, if not specified, will be 172.18, to avoid
docker usage of that private network.

Third, the code will range check the prefix, to ensure that it is within
range, based on the private network selected. 172.16 to 172.31 or 10.0 to
10.253 values are allowed.

Fourth, the cluster ID is added to the prefix, so that a unique subnet is
used for each cluster. This affects the allowable values for the prefix.

For 172.16.0.0/12, the prefix plus cluster ID must be from 172.16 to
172.31. For 10.0.0.0/8, the prefix plus cluster ID must be from 10.0 to
10.253. So, for example, if the default 172.18 is used, then cluster IDs
can be from 0 to 13.

Another side effect of this change is w.r.t. legacy mode, where the user
specifies (only) the DIND_LABEL. In that case, a cluster ID is generated,
and we now will use numbers from 1..13 to help keep the values within the
range for the V4 mapping prefix (using 13 instead of 15 as the default
prefix is 172.18).

If the user wants to use the legacy DIND_LABEL, but have a larger range
for cluster IDs, they can set the NAT64_V4_SUBNET_PREFIX to the 10.0.0.0/8
subnet and/or explicitly set the CLUSTER_ID.

The multicluster IPv6 CI test creates one cluster using the default cluster
ID (0), one with a cluster ID specified (20), and one in legacy mode with a
cluster ID generated between 1..13. Since the default prefix is 172.18, the
second cluster will create a prefix (172.18 + 20 = 172.38) that is outside
the 172.16.0.0/12 private network and will be rejected. To avoid this, we'll
use a base prefix of 10.100. That will use 10.100 for the first cluster,
10.120 for the second cluster, and a random value of 10.101 to 10.113 for
the third cluster. This avoids any conflict, and ensures that the prefix is
within the 10.0.0.0/8 private network.

Fixes Issue: kubernetes-retired#220
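
As a rough bash sketch of the derivation and range check described in the commit message above (illustrative only; the variable handling in the actual script may differ):

# Two-octet prefix (default 172.18); CLUSTER_ID is added to the second octet.
# The result must stay within 172.16..172.31 or 10.0..10.253.
NAT64_V4_SUBNET_PREFIX="${NAT64_V4_SUBNET_PREFIX:-172.18}"
CLUSTER_ID="${CLUSTER_ID:-0}"

first_octet="${NAT64_V4_SUBNET_PREFIX%%.*}"
second_octet="${NAT64_V4_SUBNET_PREFIX#*.}"
second_octet=$(( second_octet + CLUSTER_ID ))

case "${first_octet}" in
  172) min=16; max=31 ;;
  10)  min=0;  max=253 ;;
  *)   echo "NAT64 V4 prefix must be in 172.16.0.0/12 or 10.0.0.0/8" >&2; exit 1 ;;
esac

if (( second_octet < min || second_octet > max )); then
  echo "NAT64 V4 prefix ${first_octet}.${second_octet} is out of range" >&2
  exit 1
fi

NAT64_V4_SUBNET="${first_octet}.${second_octet}.0.0/16"
echo "Using NAT64 V4 subnet ${NAT64_V4_SUBNET}"
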
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 25, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 25, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
