When using a node port, the VM is accessible only via the node it is actually running on #848

Closed
yuvalif opened this Issue Mar 28, 2018 · 7 comments

Contributor

yuvalif commented Mar 28, 2018

According to the documentation, and matching the behavior of containers, the VM should be accessible via the node port on any node of the cluster.
To reproduce, I use the KubeVirt demo (https://github.com/kubevirt/demo) and the following YAML to expose SSH from the cirros VM as a service:

apiVersion: v1
kind: Service
metadata:
  labels:
    kubevirt.io: virt-launcher
    kubevirt.io/domain: testvm
  name: testnp
spec:
  ports:
  - name: testnp
    port: 22
  type: NodePort
  selector:
    kubevirt.io: virt-launcher
    kubevirt.io/domain: testvm
status:
  loadBalancer: {}

Then, on a multi-node cluster I try to access the port on all nodes:
ssh cirros@<node-public-ip> -p <node-port>
It works only when I use the IP of the node on which the VM actually runs. On the other nodes I get a connection timeout.
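The reproduction step above can also be scripted as a plain TCP connect against each node. This is a sketch, not part of the original report; the node IPs and node port in the usage comment are placeholders you would read from `kubectl get nodes -o wide` and `kubectl get svc testnp`:

```python
import socket

def node_port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout,
    mirroring what `ssh cirros@<node-ip> -p <node-port>` tests."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Usage (hypothetical addresses -- substitute your cluster's node IPs
# and the node port allocated to the testnp service):
#   for node in ["192.0.2.10", "192.0.2.11"]:
#       print(node, node_port_reachable(node, 30022))
```

In the buggy state described here, this returns True only for the node hosting the VM and False (timeout) for all others.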


Contributor

mlsorensen commented Mar 28, 2018

This may be related to #824, which also deals with cross-node pod traffic. Is this reproducible on master?

I never had the problem mentioned in #824, and I've also been using nodePort to expose 22 since v0.3.0 with no issues accessing it from any node, but it's probably dependent on the network plugin and whether it adds routes to pods.


Member

vladikr commented Mar 28, 2018

I've been looking into this issue today.
The problem is that on OpenShift, replies from the VM have a different MAC address than the source and therefore don't reach the node.

897	27.341916	10.128.0.1	10.129.0.24	ICMP	148	Echo (ping) request  id=0x1c6c, seq=12/3072, ttl=64 (reply in 898)
Ethernet II, Src: c6:69:ee:58:3c:f6 (c6:69:ee:58:3c:f6), Dst: 0a:58:0a:81:00:18 (0a:58:0a:81:00:18)
Internet Protocol Version 4, Src: 10.128.0.1, Dst: 10.129.0.24

898	27.342853	10.129.0.24	10.128.0.1	ICMP	148	Echo (ping) reply    id=0x1c6c, seq=12/3072, ttl=64 (request in 897)
Ethernet II, Src: 0a:58:0a:81:00:18 (0a:58:0a:81:00:18), Dst: 76:43:ef:54:85:ee (76:43:ef:54:85:ee)
Internet Protocol Version 4, Src: 10.129.0.24, Dst: 10.128.0.1

The reason for this is that the route to the remote node that we are setting via DHCP is incorrect:
instead of `10.128.0.0/14 dev eth0` we are setting `10.128.0.0/14 via 10.129.0.1 dev eth0` (where 10.129.0.1 is the default gateway).
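To see why the `via` matters: the /14 covers the whole cluster SDN, including the other nodes' pod subnets, so return traffic must go out on-link (`dev eth0`) rather than be bounced through the local gateway. A quick sanity check with Python's `ipaddress` module, using the addresses from the capture above:

```python
import ipaddress

pod_network = ipaddress.ip_network("10.128.0.0/14")   # cluster-wide SDN range
local_gateway = ipaddress.ip_address("10.129.0.1")    # default gateway on this node
remote_peer = ipaddress.ip_address("10.128.0.1")      # source of the ping in the capture

# Both the remote peer and the local gateway fall inside the /14, so the
# reply to the remote node should stay on-link rather than be forwarded
# "via 10.129.0.1" (which rewrites the destination MAC, as seen above).
print(remote_peer in pod_network)    # True
print(local_gateway in pod_network)  # True
```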

Once the correct route was set, I could ping the VM from different nodes in the cluster:

ping 10.129.0.24                                                  
PING 10.129.0.24 (10.129.0.24) 56(84) bytes of data.                                                     
64 bytes from 10.129.0.24: icmp_seq=1 ttl=64 time=1.83 ms                                                
64 bytes from 10.129.0.24: icmp_seq=2 ttl=64 time=0.937 ms        

and access it via the clusterIP and externalIP:

$ oc get svc                                                                 
NAME      TYPE           CLUSTER-IP      EXTERNAL-IP                     PORT(S)        AGE                               
mytest    LoadBalancer   172.30.15.157   172.29.170.163,172.29.170.163   22:31932/TCP   19s

$ ssh 172.30.15.157 
The authenticity of host '172.30.15.157 (172.30.15.157)' can't be established.                           
ECDSA key fingerprint is SHA256:L01/tXVjBDmeIwozIFz791reoVG9yQS4XZeKjMUBcmA.                             
ECDSA key fingerprint is MD5:07:ac:0d:09:59:4c:95:55:66:d0:37:ca:25:79:a2:a6.                            
Are you sure you want to continue connecting (yes/no)? ^C                                                

$ ssh 172.29.170.163 
The authenticity of host '172.29.170.163 (172.29.170.163)' can't be established.                         
ECDSA key fingerprint is SHA256:L01/tXVjBDmeIwozIFz791reoVG9yQS4XZeKjMUBcmA.                             
ECDSA key fingerprint is MD5:07:ac:0d:09:59:4c:95:55:66:d0:37:ca:25:79:a2:a6.                            
Are you sure you want to continue connecting (yes/no)? ^C

I'm looking into the right way to provide the routes via DHCP option 121 so they will be set correctly in the VM...


Member

vladikr commented Mar 29, 2018

> I'm looking into the right way to provide the routes via DHCP option 121 so they will be set correctly in the VM...

Actually adding 0.0.0.0 as a next-hop does the trick.
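For reference, DHCP option 121 (RFC 3442) encodes each classless static route as a prefix-length byte, the significant octets of the destination, and a 4-octet router address; a router of 0.0.0.0 is how the option marks a route as on-link. A minimal sketch of that encoding (not KubeVirt's actual DHCP code):

```python
import ipaddress

def encode_option_121_route(destination: str, router: str) -> bytes:
    """Encode one classless static route per RFC 3442: one byte of prefix
    length, ceil(prefix/8) significant octets of the destination network,
    then the 4-octet router. A router of 0.0.0.0 makes the route on-link."""
    net = ipaddress.ip_network(destination)
    significant = (net.prefixlen + 7) // 8
    return (bytes([net.prefixlen])
            + net.network_address.packed[:significant]
            + ipaddress.ip_address(router).packed)

# The route from this issue: 10.128.0.0/14 with 0.0.0.0 as the next hop.
payload = encode_option_121_route("10.128.0.0/14", "0.0.0.0")
print(payload.hex())  # 0e0a8000000000
```

The client then installs this as `10.128.0.0/14 dev eth0`, which is exactly the route shape identified as correct above.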


Member

fabiand commented Apr 4, 2018

@vladikr is this issue now fixed with #851 or is something else needed?


Member

fabiand commented Apr 4, 2018

@duyanyan can you confirm independently that this bug is fixed?

or @karmab ?


Contributor

karmab commented Apr 12, 2018

This is fixed.


Member

fabiand commented Apr 12, 2018

Thanks!

@fabiand fabiand closed this Apr 12, 2018
