Calico network support in Rancher cattle #8603

Closed
niusmallnan opened this issue on Apr 27, 2017 · 12 comments · 9 participants
niusmallnan (Member) commented Apr 27, 2017

I have updated network-manager so that it can support other CNI drivers, which makes Calico network support possible. #8535

For the catalog item:
A Calico catalog item is currently being tested. Thanks to @leodotcloud; I have built the follow-up work on top of yours.
Repo: https://github.com/niusmallnan/community-catalog
Branch: dev
Just add this repo to your Rancher catalog settings. Please use a 1.0.2-x version when you deploy the Calico item.

For network-manager:
Calico and Rancher each have their own IPAM. To keep them consistent, I prefer to use Rancher IPAM. Calico IPAM accepts an "IP" parameter, so I pass the IP allocated by Rancher IPAM to Calico IPAM through that parameter.
Test image: niusmallnan/network-manager:dev
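The hand-off from Rancher IPAM to Calico IPAM can be sketched in Go. This is a minimal illustration, not network-manager's code: it assumes the Rancher-allocated address arrives as a CIDR string (the hypothetical value `10.42.0.5/16` below), strips the prefix length, and renders runtime arguments the way the CNI spec defines `CNI_ARGS`, as `KEY=VALUE` pairs separated by semicolons. The helper names are hypothetical.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// ipFromCIDR strips the prefix length from a CIDR string such as
// "10.42.0.5/16" and returns "10.42.0.5". It returns "" on invalid input.
func ipFromCIDR(cidr string) string {
	ip, _, err := net.ParseCIDR(cidr)
	if err != nil {
		return ""
	}
	return ip.String()
}

// buildCNIArgs renders runtime args in CNI_ARGS form: KEY=VALUE;KEY=VALUE.
func buildCNIArgs(args [][2]string) string {
	pairs := make([]string, 0, len(args))
	for _, kv := range args {
		pairs = append(pairs, kv[0]+"="+kv[1])
	}
	return strings.Join(pairs, ";")
}

func main() {
	// IgnoreUnknown=1 lets a plugin skip keys it does not define.
	args := [][2]string{{"IgnoreUnknown", "1"}}
	// Hypothetical address handed out by Rancher IPAM.
	if ip := ipFromCIDR("10.42.0.5/16"); ip != "" {
		args = append(args, [2]string{"IP", ip})
	}
	fmt.Println("CNI_ARGS=" + buildCNIArgs(args)) // CNI_ARGS=IgnoreUnknown=1;IP=10.42.0.5
}
```

If the label is missing or malformed, no "IP" pair is added and Calico IPAM falls back to allocating an address itself, which is exactly the mismatch this approach tries to avoid.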

For rancher-metadata:
In a Calico network, each host acts as a gateway router for the workloads it hosts. In container deployments, Calico uses 169.254.1.1 as the address of this router.
As a result, rancher-metadata cannot see a container's real IP by default, so you cannot get all of the information when you access "http://169.254.169.250/latest/self/".
However, rancher-metadata can get the real IP from the "X-Forwarded-For" request header. To do this, rancher-metadata needs to be started with the "-xff" parameter.

For rancher-lb:
rancher-lb needs to access "http://169.254.169.250/latest/self/services". As mentioned above, rancher-lb therefore has to add the "X-Forwarded-For" header when it queries rancher-metadata.
Test image: niusmallnan/lb-service-haproxy:dev

Other tips:

  1. Enable NAT Outgoing on the IP pool.
  2. Update the profile resource so that Calico allows all traffic.

These steps will be added to the automatic setup scripts.

Make sure the Rancher server version is v1.6.0 or later.
If you want to deploy this on AWS, please see http://docs.projectcalico.org/master/reference/public-cloud/aws

I have built a setup with Calico, and it works fine.
After full testing, I will submit the relevant PRs.

leodotcloud (Member) commented Apr 27, 2017

@niusmallnan

You wrote: "In order to ensure consistency, I prefer to use Rancher IPAM."
I haven't tested this, so I'm not sure what the implications of replacing Calico IPAM with Rancher IPAM would be. I also wouldn't recommend it: Calico uses BGP and assigns a complete subnet to each host, which makes the routing rules much simpler. With Rancher IPAM we would have to add /32 routes, which is not efficient at all.

Is it possible to change the rancher-metadata IP address to something else? It might be hard-coded in multiple places.

Looking forward to the test results.

niusmallnan (Member) commented Apr 27, 2017

@leodotcloud
I understand your concern. Calico IPAM's runtime arguments are defined here: https://github.com/projectcalico/cni-plugin/blob/v1.5.6/ipam/calico-ipam.go#L58

type ipamArgs struct {
	types.CommonArgs
	IP net.IP `json:"ip,omitempty"`
}

Calico IPAM accepts "IP" as a runtime argument.

I have just updated cniglue to pass this "IP" argument:
https://github.com/niusmallnan/cniglue/blob/dev/cni.go#L78

	if containerIPCIDR, ok := state.Config.Labels["io.rancher.container.ip"]; ok {
		ip, _, err := net.ParseCIDR(containerIPCIDR)
		if err == nil {
			c.runtimeConf.Args = append(c.runtimeConf.Args, [2]string{"IP", ip.String()})
		}
	}
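On the receiving side, the `IP=...` pair reaches the plugin in its `CNI_ARGS` environment variable. The sketch below is a simplified, hypothetical stand-in for what the CNI library's `types.LoadArgs` does when it fills a struct like `ipamArgs`; unlike the real loader, it does not reject unknown keys when `IgnoreUnknown` is unset.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// parseCNIArgs splits a CNI_ARGS value ("K1=V1;K2=V2") into a map.
func parseCNIArgs(s string) (map[string]string, error) {
	out := make(map[string]string)
	if s == "" {
		return out, nil
	}
	for _, pair := range strings.Split(s, ";") {
		kv := strings.SplitN(pair, "=", 2)
		if len(kv) != 2 {
			return nil, fmt.Errorf("invalid CNI_ARGS pair %q", pair)
		}
		out[kv[0]] = kv[1]
	}
	return out, nil
}

// requestedIP returns the address passed under the "IP" key, or nil when the
// key is absent (in which case an IPAM plugin would allocate on its own).
func requestedIP(cniArgs string) (net.IP, error) {
	args, err := parseCNIArgs(cniArgs)
	if err != nil {
		return nil, err
	}
	v, ok := args["IP"]
	if !ok {
		return nil, nil
	}
	ip := net.ParseIP(v)
	if ip == nil {
		return nil, fmt.Errorf("invalid IP %q in CNI_ARGS", v)
	}
	return ip, nil
}

func main() {
	ip, err := requestedIP("IgnoreUnknown=1;IP=10.42.0.5")
	fmt.Println(ip, err) // 10.42.0.5 <nil>
}
```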

I will build a setup on AWS so that you can check the results easily.

cjellick (Member) commented May 5, 2017

@niusmallnan So will metadata be broken by default for every container that doesn't send its own IP, as you did in the lb-controller?

niusmallnan (Member) commented May 6, 2017

@cjellick Yes, that is what my test results show.

[image: screenshot of the test results]

cjellick (Member) commented May 8, 2017

@niusmallnan I think this might be a deal breaker. Nearly all of our infrastructure services, and many user-written services, rely on being able to reach rancher-metadata without any additional configuration or knowledge.

niusmallnan (Member) commented May 9, 2017

@cjellick I agree. In my tests, only rancher-lb has this problem, because it needs to read the "/self" key.

Considering all of the infrastructure services, I don't have a good general solution; for now I would just document this for users who want to use Calico.

Of course, making go-rancher-metadata handle this compatibility centrally is another option, but it would require many infrastructure services to update that dependency.

ibuildthecloud (Member) commented May 9, 2017

@niusmallnan Is it possible to inject custom routes into Calico? Currently, that is what our CNI driver does: it adds a route for 169.254.169.250/32 to eth0.

niusmallnan (Member) commented May 10, 2017

@ibuildthecloud I'm afraid not. Calico IPAM does not support custom routes. Here is the doc.

Looking at Calico's NetConf type, there is no routes field:

	IPAM struct {
		Name       string
		Type       string  `json:"type"`
		Subnet     string  `json:"subnet"`
		AssignIpv4 *string `json:"assign_ipv4"`
		AssignIpv6 *string `json:"assign_ipv6"`
	} `json:"ipam,omitempty"`
	MTU            int        `json:"mtu"`

For further verification, I manually added the route inside the container's network namespace to check whether a custom route would help.
Map the container's network namespace (so that ip netns can use it) and check the default routes:

root@calicoh1:~# ip netns exec cb6a7e966704  ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether de:24:34:27:a6:c2 brd ff:ff:ff:ff:ff:ff
    inet 192.168.219.191/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::dc24:34ff:fe27:a6c2/64 scope link
       valid_lft forever preferred_lft forever
root@calicoh1:~# ip netns exec cb6a7e966704  ip route
default via 169.254.1.1 dev eth0
169.254.1.1 dev eth0  scope link

Add a route for 169.254.169.250/32 to eth0:

root@calicoh1:~# ip netns exec cb6a7e966704 ip route add 169.254.169.250/32 dev eth0
root@calicoh1:~# ip netns exec cb6a7e966704 ip route
default via 169.254.1.1 dev eth0
169.254.1.1 dev eth0  scope link
169.254.169.250 dev eth0  scope link

Check the metadata problem:

root@calicoh1:~# ip netns exec cb6a7e966704 curl http://169.254.169.250/latest/self/
host/
=====
root@calicoh1:~# ip netns exec cb6a7e966704 curl -H "X-Forwarded-For:192.168.219.191" http://169.254.169.250/latest/self/
container/
host/
service/
stack/

Even with the custom route in place, the problem is not solved: without the X-Forwarded-For header, /latest/self/ still returns only host/.

amielheyde commented Jul 5, 2017

I gave the above instructions a go and ran into two problems:

  1. The container got a different IP (allocated by Calico) from the one Rancher thought it had.
    So far I have manually hacked the container/host to change the IP to the one Rancher is using, to prove that things work when the IPs align.
  2. The metadata issue mentioned above.
    I appear to have worked around this by disabling outgoing NAT in my Calico IP pool (before that, traffic arriving at the metadata container appeared to come from 172.17.0.1 because of the NAT). As my whole setup is internal, I want my pod IPs to be routable outside the cluster anyway.

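A quick way to recognize the symptom in point 2 is to compare the source address the metadata service sees against the Calico pool. A sketch, assuming a pool of 192.168.0.0/16 (Calico's default pool, and consistent with the 192.168.219.191 address earlier in the thread); adjust it to your own pool. `outsidePool` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net"
)

// outsidePool reports whether an observed source address lies outside the
// workload IP pool. With NAT Outgoing enabled, traffic leaving the pool is
// masqueraded, so a server can see a host-side address (such as 172.17.0.1)
// instead of the workload's own pool address.
func outsidePool(observed, poolCIDR string) (bool, error) {
	_, pool, err := net.ParseCIDR(poolCIDR)
	if err != nil {
		return false, err
	}
	ip := net.ParseIP(observed)
	if ip == nil {
		return false, fmt.Errorf("invalid IP %q", observed)
	}
	return !pool.Contains(ip), nil
}

func main() {
	const pool = "192.168.0.0/16" // assumed pool CIDR
	for _, src := range []string{"172.17.0.1", "192.168.219.191"} {
		nat, err := outsidePool(src, pool)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%s outside pool: %v\n", src, nat)
	}
}
```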
Unstoppable94 commented Sep 28, 2017

Environment:
Docker: v1.12.6
Rancher: v1.6.9
Problem:
After the infrastructure containers start, healthcheck and scheduler keep restarting.

The healthcheck log shows the following errors.
http://192.168.101.92:8081/v1 in the log is the IP of the VM where my Rancher server is deployed.
2017/9/27 5:12:11 PM time="2017-09-27T09:12:11Z" level=fatal msg="Get http://192.168.101.92:8081/v1: dial tcp 192.168.101.92:8081: i/o timeout"
2017/9/27 5:12:29 PM time="2017-09-27T09:12:29Z" level=fatal msg="Get http://192.168.101.92:8081/v1: dial tcp 192.168.101.92:8081: getsockopt: no route to host"
2017/9/27 5:12:34 PM time="2017-09-27T09:12:34Z" level=fatal msg="Get http://192.168.101.92:8081/v1: dial tcp 192.168.101.92:8081: getsockopt: no route to host"
2017/9/27 5:12:38 PM time="2017-09-27T09:12:38Z" level=fatal msg="Get http://192.168.101.92:8081/v1: dial tcp 192.168.101.92:8081: getsockopt: no route to host"
2017/9/27 5:12:44 PM time="2017-09-27T09:12:44Z" level=fatal msg="Get http://192.168.101.92:8081/v1: dial tcp 192.168.101.92:8081: getsockopt: no route to host"
2017/9/27 5:12:49 PM time="2017-09-27T09:12:49Z" level=fatal msg="Get http://192.168.101.92:8081/v1: dial tcp 192.168.101.92:8081: getsockopt: no route to host"
2017/9/27 5:12:54 PM time="2017-09-27T09:12:54Z" level=fatal msg="Get http://192.168.101.92:8081/v1: dial tcp 192.168.101.92:8081: getsockopt: no route to host"
2017/9/27 5:12:59 PM time="2017-09-27T09:12:59Z" level=fatal msg="Get http://192.168.101.92:8081/v1: dial tcp 192.168.101.92:8081: getsockopt: no route to host"
2017/9/27 5:13:05 PM time="2017-09-27T09:13:05Z" level=fatal msg="Get http://192.168.101.92:8081/v1: dial tcp 192.168.101.92:8081: getsockopt: no route to host"
2017/9/27 5:13:10 PM time="2017-09-27T09:13:10Z" level=fatal msg="Get http://192.168.101.92:8081/v1: dial tcp 192.168.101.92:8081: getsockopt: no route to host"
2017/9/27 5:13:15 PM time="2017-09-27T09:13:15Z" level=fatal msg="Get http://192.168.101.92:8081/v1: dial tcp 192.168.101.92:8081: getsockopt: no route to host"
2017/9/27 5:13:19 PM time="2017-09-27T09:13:19Z" level=fatal msg="Get http://192.168.101.92:8081/v1: dial tcp 192.168.101.92:8081: getsockopt: no route to host"
2017/9/27 5:13:23 PM time="2017-09-27T09:13:23Z" level=fatal msg="Get http://192.168.101.92:8081/v1: dial tcp 192.168.101.92:8081: getsockopt: no route to host"
2017/9/27 5:13:27 PM time="2017-09-27T09:13:27Z" level=fatal msg="Get http://192.168.101.92:8081/v1: dial tcp 192.168.101.92:8081: getsockopt: no route to host"
2017/9/27 5:13:32 PM time="2017-09-27T09:13:32Z" level=fatal msg="Get http://192.168.101.92:8081/v1: dial tcp 192.168.101.92:8081: getsockopt: no route to host"
2017/9/27 5:13:36 PM time="2017-09-27T09:13:36Z" level=fatal msg="Get http://192.168.101.92:8081/v1: dial tcp 192.168.101.92:8081: getsockopt: no route to host"
2017/9/27 5:13:40 PM time="2017-09-27T09:13:40Z" level=fatal msg="Get http://192.168.101.92:8081/v1: dial tcp 192.168.101.92:8081: getsockopt: no route to host"
2017/9/27 5:13:44 PM time="2017-09-27T09:13:44Z" level=fatal msg="Get http://192.168.101.92:8081/v1: dial tcp 192.168.101.92:8081: getsockopt: no route to host"
2017/9/27 5:13:49 PM time="2017-09-27T09:13:49Z" level=fatal msg="Get http://192.168.101.92:8081/v1: dial tcp 192.168.101.92:8081: getsockopt: no route to host"
2017/9/27 5:14:46 PM time="2017-09-27T09:14:46Z" level=fatal msg="Get http://192.168.101.92:8081/v1: dial tcp 192.168.101.92:8081: i/o timeout"
What could be causing this?

Unstoppable94 commented Sep 28, 2017

@niusmallnan Can you help me? Thanks.

loganhz (Member) commented Oct 12, 2018

Thanks for your report!

With the release of Rancher 2.0, development on v1.6 is limited to critical bug fixes and security patches.

If you think we should keep this issue open, please let me know.

loganhz closed this Oct 12, 2018
