Fail to start sdn pod when clusterCIDR is equal to hostCIDR #250

Closed
guillaumerose opened this issue Jan 22, 2021 · 4 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@guillaumerose

When running a single node, the user might configure the network like this:

 networking:
   clusterNetwork:
   - cidr: 10.217.0.0/23
     hostPrefix: 23
   machineNetwork:
   - cidr: 192.168.126.0/24
   networkType: OpenShiftSDN
   serviceNetwork:
   - 10.217.2.0/23
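The key property of this configuration is that `hostPrefix` equals the `clusterNetwork` prefix length, so the cluster CIDR cannot be subdivided: the single node's host subnet is the entire cluster CIDR. A quick sketch with Python's `ipaddress` module (not part of the original report) illustrates this:

```python
import ipaddress

# With hostPrefix equal to the clusterNetwork prefix length, carving host
# subnets out of the cluster CIDR yields exactly one subnet: the CIDR itself.
cluster_cidr = ipaddress.ip_network("10.217.0.0/23")
host_prefix = 23

host_subnets = list(cluster_cidr.subnets(new_prefix=host_prefix))
print(host_subnets)                      # [IPv4Network('10.217.0.0/23')]
print(len(host_subnets))                 # 1
print(host_subnets[0] == cluster_cidr)   # True
```

So in this layout, clusterCIDR == hostCIDR by construction, which is exactly the corner case the issue title describes.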

We tried it for CRC and sadly it fails (see crc-org/snc#311): the installer fails because the sdn pod exits with the following error.

[root@crc-d4mxd-master-0 core]# crictl ps -a
CONTAINER           IMAGE                                                                                                                    CREATED             STATE               NAME                      ATTEMPT             POD ID
424b2df2c1e48       eab80d387b5835140e41965e775371ab9f75cc64422605bd56f7b8b89bd52381                                                         7 seconds ago       Running             kube-multus               3                   1e8937168a96d
0265f5cfb4d29       2c5d2c2b51082e6ce5deca684aaa7a8f3c970616f7d192accfa34bc75011fb6c                                                         4 minutes ago       Exited              sdn                       10                  77f4ca6be7c2a
34f470bfffa5b       5283a59259736046ba55075e4f4ff03675d8d41553fbbdc3d1e6d267c5360c4d                                                         4 minutes ago       Exited              kube-rbac-proxy           7                   77f4ca6be7c2a
1dd95ba351421       eab80d387b5835140e41965e775371ab9f75cc64422605bd56f7b8b89bd52381                                                         10 minutes ago      Exited              kube-multus               2                   1e8937168a96d
72eb46c63539e       2c5d2c2b51082e6ce5deca684aaa7a8f3c970616f7d192accfa34bc75011fb6c                                                         11 minutes ago      Running             sdn-controller            1                   d4e6726774492
e17fa833d1c39       9e292852b769f6133e6e25f7a6b6b4f457d5c00ddd7735bffa39724868056a01                                                         30 minutes ago      Exited              whereabouts-cni           0                   1e8937168a96d
0700071978d56       quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a0de69542fc5c98f06e794bc6d522b76ca626d9089a49215510dcba158f1250b   30 minutes ago      Exited              whereabouts-cni-bincopy   0                   1e8937168a96d
58142749dd746       quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:831ac3823614ef1230fbc786d990ee186cfe3a54540ed266decabdf64475032c   31 minutes ago      Exited              routeoverride-cni         0                   1e8937168a96d
0af004bcc0682       quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9222d21a1664062c0f0be3e0269392ea951fc346b1d51f831b4b080aca752b61   31 minutes ago      Exited              cni-plugins               0                   1e8937168a96d
63b8097640355       quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:42de8035ebe256cc1efe062cf8eef5a42d06fd4657469a5b5cd16c18520e08f8   31 minutes ago      Exited              sdn-controller            0                   d4e6726774492
034e70174b347       quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:42de8035ebe256cc1efe062cf8eef5a42d06fd4657469a5b5cd16c18520e08f8   31 minutes ago      Running             openvswitch               0                   9393c86f957d1
dec2262d6670e       quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6ad0033cdf25dca68753355915935bf2471d4d11ba568c3eb331cae403d4fa2c   31 minutes ago      Exited              multus-binary-copy        0                   1e8937168a96d
596a0b780af04       quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3829b56f156b88642013133072be9de9ec600c570db5153f9b45ae5868aa5257   31 minutes ago      Running             network-operator          0                   a579fffb6333f
[root@crc-d4mxd-master-0 core]# crictl logs 0265f5cfb4d29
I0122 09:25:03.704947   21855 cmd.go:121] Reading proxy configuration from /config/kube-proxy-config.yaml
I0122 09:25:03.705897   21855 feature_gate.go:243] feature gates: &{map[]}
I0122 09:25:03.705920   21855 cmd.go:216] Watching config file /config/kube-proxy-config.yaml for changes
I0122 09:25:03.705935   21855 cmd.go:216] Watching config file /config/..2021_01_22_08_57_59.443374971/kube-proxy-config.yaml for changes
I0122 09:25:03.725141   21855 node.go:152] Initializing SDN node "crc-d4mxd-master-0" (192.168.126.11) of type "redhat/openshift-ovs-networkpolicy"
I0122 09:25:03.725278   21855 cmd.go:159] Starting node networking (v0.0.0-alpha.0-233-g7106dab9)
I0122 09:25:03.725283   21855 node.go:340] Starting openshift-sdn network plugin
I0122 09:25:03.804877   21855 sdn_controller.go:139] [SDN setup] full SDN setup required (cluster CIDR not found)
F0122 09:25:04.035397   21855 cmd.go:111] Failed to start sdn: node SDN setup failed: file exists

Looking at the code, the failure seems related to the fact that `route -n` doesn't show any route to the cluster CIDR, which is expected in the single-node case.

[root@crc-d4mxd-master-0 core]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.126.1   0.0.0.0         UG    100    0        0 ens3
192.168.126.0   0.0.0.0         255.255.255.0   U     100    0        0 ens3

Would it be enough to skip this check when clusterCIDR == hostCIDR?
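The guard being suggested could be sketched as follows. This is a hedged illustration in Python, not the actual openshift-sdn code (which is Go); the function name `needs_cluster_cidr_route` is hypothetical:

```python
import ipaddress

def needs_cluster_cidr_route(cluster_cidr: str, host_cidr: str) -> bool:
    """Hypothetical guard for the check questioned above: when the node's
    host subnet already covers the entire cluster CIDR (the single-node
    case), there is no separate cluster route to look for."""
    cluster = ipaddress.ip_network(cluster_cidr)
    host = ipaddress.ip_network(host_cidr)
    return cluster != host

# Single-node CRC layout: host subnet == cluster CIDR, so skip the check.
print(needs_cluster_cidr_route("10.217.0.0/23", "10.217.0.0/23"))  # False
# Typical multi-node layout: a /23 host subnet carved out of a /14 cluster CIDR.
print(needs_cluster_cidr_route("10.128.0.0/14", "10.128.0.0/23"))  # True
```

In the multi-node case a route to the wider cluster CIDR must exist on each node; in the single-node case the host route shown by `route -n` above already covers it.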

cfergeau added a commit to cfergeau/snc that referenced this issue Jan 22, 2021
This was initially changed because it was causing issues when running
crc in an OpenShift cluster.
However the new one causes collisions in a different environment.

Let's try a 3rd different value and see if we are lucky this time :)

I picked:
- an odd number
- bigger than 200

in the hope that sysadmins are less likely to see this as a good value
to use for internal networks.

This commit also picks smaller network ranges for the serviceNetwork and
clusterNetwork, as snc is single node, a /22 for clusterNetwork should
be plenty enough. A /23 can't be used because of openshift/sdn#250
This also changes the service network to use a /23
right after the network used for clusterNetwork. This means crc uses
addresses from 10.217.0.0 to 10.217.5.255.
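The address arithmetic in the commit message checks out: a /22 clusterNetwork at 10.217.0.0 followed immediately by a /23 serviceNetwork spans 10.217.0.0 through 10.217.5.255. A small verification with Python's `ipaddress` module (not part of the commit):

```python
import ipaddress

# Layout from the commit message: a /22 clusterNetwork, then a /23
# serviceNetwork placed immediately after it.
cluster_net = ipaddress.ip_network("10.217.0.0/22")   # 10.217.0.0 - 10.217.3.255
service_net = ipaddress.ip_network("10.217.4.0/23")   # 10.217.4.0 - 10.217.5.255

# The service network starts right where the cluster network ends.
print(cluster_net.broadcast_address + 1 == service_net.network_address)  # True
print(cluster_net[0], "->", service_net[-1])  # 10.217.0.0 -> 10.217.5.255
```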
guillaumerose pushed a commit to crc-org/snc that referenced this issue Jan 22, 2021
(same commit message as above)
guillaumerose pushed a commit to guillaumerose/snc that referenced this issue Jan 29, 2021
(same commit message as above)
(cherry picked from commit e3ae681)
@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 22, 2021
@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 22, 2021
@openshift-bot

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci openshift-ci bot closed this as completed Jun 21, 2021
@openshift-ci

openshift-ci bot commented Jun 21, 2021

@openshift-bot: Closing this issue.

