
installing calico requires a change to net.ipv4.conf.all.rp_filter #891

Closed · mauilion opened this issue Sep 30, 2019 · 9 comments · Fixed by #897

@mauilion (Contributor) commented Sep 30, 2019

What happened:
When deploying Calico against kind, I used the following kind config:

kind: Cluster
apiVersion: kind.sigs.k8s.io/v1alpha3
networking:
  disableDefaultCNI: True
nodes:
- role: control-plane
- role: worker
- role: worker
kubeadmConfigPatches:
- |
  apiVersion: kubeadm.k8s.io/v1beta2
  kind: ClusterConfiguration
  metadata:
    name: config
  networking:
    serviceSubnet: "10.96.0.1/12"
    podSubnet: "192.168.0.0/16"

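For reference, a config like the one above is consumed at cluster creation time; assuming it is saved as kind-config.yaml (a hypothetical file name), that would look something like:

kind create cluster --config kind-config.yaml
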
I then applied the latest Calico manifest from: https://docs.projectcalico.org/latest/getting-started/kubernetes/installation/calico

This results in a crash-looping calico-node pod on each node, with the following in the log:

2019-09-30 18:38:28.452 [FATAL][42] int_dataplane.go 1037: Kernel's RPF check is set to 'loose'.  This would allow endpoints to spoof their IP address.  Calico requires net.ipv4.conf.all.rp_filter to be set to 0 or 1. If you require loose RPF and you are not concerned about spoofing, this check can be disabled by setting the IgnoreLooseRPF configuration parameter to 'true'.

This can be worked around by running the following:

kind get nodes --name=kind   | xargs -n1 -I {} docker exec {} sysctl -w net.ipv4.conf.all.rp_filter=0

Adjust the --name argument to match your cluster's name, or leave it off for the default kind cluster.
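
To confirm the workaround took effect, the same pattern can be reused as a read-only check (sysctl without -w just prints the current value):

kind get nodes --name=kind | xargs -n1 -I {} docker exec {} sysctl net.ipv4.conf.all.rp_filter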

I then looked into when this value was being set.

In the standard bring-up, this is the configured value:

docker exec -ti kind-control-plane  sysctl -a | grep all.rp_filter
net.ipv4.conf.all.rp_filter = 2

which appears to be set by:

/etc/sysctl.d/10-network-security.conf:net.ipv4.conf.default.rp_filter=2
/etc/sysctl.d/10-network-security.conf:net.ipv4.conf.all.rp_filter=2

This was in turn changed to 2 as a result of this issue:
https://bugs.launchpad.net/ubuntu/+source/procps/+bug/1814262
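
For reference, the file responsible can be located from inside a running node with something along these lines (assuming grep is present in the node image, which it is for the Ubuntu-based images):

docker exec kind-control-plane grep -r rp_filter /etc/sysctl.d/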

Among my other findings: the ubuntu:19.04 base image that we use has it set to 1:

11:27 $ docker run -it ubuntu:19.04 sysctl -a | grep all.rp_filter
net.ipv4.conf.all.rp_filter = 1

and a freshly built base image does too:

11:32 $ docker run -it  --tmpfs /tmp --tmpfs /run  --privileged --entrypoint /bin/bash mauilion/base 
root@914dc973cf59:/# sysctl -a | grep rp_filter
net.ipv4.conf.all.rp_filter = 1
root@914dc973cf59:/# exit

although the security file is already present at this point:

root@2f587629b72c:/# cat /etc/sysctl.d/10-network-security.conf 

# Turn on Source Address Verification in all interfaces to
# prevent some spoofing attacks.
net.ipv4.conf.default.rp_filter=2
net.ipv4.conf.all.rp_filter=2

I think what's happening is that the sysctl is honored when we start up the "real" node image, and that is what causes the problem for Calico.

What you expected to happen:
That rp_filter would be set to 0 or 1, as it is by default in 19.03.

How to reproduce it (as minimally and precisely as possible):
This is true of most of the recent base images.

Anything else we need to know?:
In my opinion, it's safe to set all.rp_filter to 1 explicitly.
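
A minimal sketch of what that could look like inside the node image, assuming a later-sorting drop-in under /etc/sysctl.d/ overrides 10-network-security.conf (the file name 99-kind-rp-filter.conf is hypothetical, not necessarily what kind ended up doing in #897):

# sysctl.d files are applied in lexical order, so this drop-in wins over 10-network-security.conf
echo "net.ipv4.conf.all.rp_filter=1" > /etc/sysctl.d/99-kind-rp-filter.conf
sysctl --system   # re-apply all sysctl.d settings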

Environment:

  • kind version: (use kind version): 0.5.1
  • Kubernetes version: (use kubectl version):
  • Docker version: (use docker info):
  • OS (e.g. from /etc/os-release):
@mauilion mauilion added the kind/bug label Sep 30, 2019
@BenTheElder BenTheElder self-assigned this Sep 30, 2019
@aojea (Contributor) commented Sep 30, 2019

nice catch

@mauilion (Contributor, Author) commented Sep 30, 2019

Alex has a write-up on this as well; I like his solution too!

https://twitter.com/alexbrand/status/1178768251024760833?s=20

kubectl -n kube-system set env daemonset/calico-node FELIX_IGNORELOOSERPF=true

@BenTheElder (Member) commented Oct 1, 2019

/assign

mauilion added a commit to mauilion/kind that referenced this issue Oct 1, 2019
Signed-off-by: Duffie Cooley <cooleyd@vmware.com>
@BenTheElder (Member) commented Oct 1, 2019

thanks @mauilion!

@bjethwan commented Oct 11, 2019

@mauilion

kubectl -n kube-system set env daemonset/calico-node FELIX_IGNORELOOSERPF=true

It seems to be messing up DNS (I was following your TGIK 075).
While everything works fine with the kind default CNI, Calico gives issues when I use "kubectl -n kube-system set env daemonset/calico-node FELIX_IGNORELOOSERPF=true":

$ k exec -it nginxd-667bdf4c99-qsbrv -- bash
root@nginxd-667bdf4c99-qsbrv:/# curl google.com
curl: (6) Could not resolve host: google.com
root@nginxd-667bdf4c99-qsbrv:/# nslookup google.com
Server: 10.96.0.10
Address: 10.96.0.10#53

** server can't find google.com: SERVFAIL

root@nginxd-667bdf4c99-qsbrv:/# exit

@BenTheElder (Member) commented Oct 11, 2019

New images set rp_filter, so you won't have to do this anymore.

@dlipovetsky commented Oct 22, 2019

@BenTheElder New images created since this bug was fixed include v1.16.1 and v1.16.2. Is it worth patching the v1.15.3 (or older) images, or is that out of scope for kind?

@BenTheElder BenTheElder added this to the v0.6.0 milestone Oct 22, 2019
@BenTheElder (Member) commented Oct 22, 2019

I'll push new images with https://github.com/kubernetes-sigs/kind/milestone/8, which is primarily blocked on rounding out some stability fixes. I'm back on that now.

@dlipovetsky commented Oct 23, 2019

@BenTheElder Thank you! I wasn't sure if older images would get fixes like this.
