Error when doing a ping from a ipsec container #9377

Closed
leodotcloud opened this issue Jul 18, 2017 · 14 comments

Comments

@leodotcloud
Contributor

While pinging a remote container from an ipsec container, I got this error:
ping: sendmsg: Value too large for defined data type

@leodotcloud
Contributor Author

leodotcloud commented Aug 7, 2017

Steps to reproduce:

  • Launch a 3-host ipsec environment on AWS (t2.large, Ubuntu 16.04).
  • Launch a global service with ubuntu:14.04 (say the containers created are C1, C2, C3).
  • Create a mesh of iperf3 server/client connections between these containers.

Open multiple exec sessions to run the following:
on C1:

  1. iperf3 -s -p 10002
  2. iperf3 -s -p 10003
  3. iperf3 -c C2_IP -p 20001 -t 0
  4. iperf3 -c C3_IP -p 30001 -t 0

on C2:

  1. iperf3 -s -p 20001
  2. iperf3 -s -p 20003
  3. iperf3 -c C1_IP -p 10002 -t 0
  4. iperf3 -c C3_IP -p 30002 -t 0

on C3:

  1. iperf3 -s -p 30001
  2. iperf3 -s -p 30002
  3. iperf3 -c C1_IP -p 10003 -t 0
  4. iperf3 -c C2_IP -p 20003 -t 0

  • Now, inside the ipsec container, if you check cat /proc/net/xfrm_stat, you will see XfrmInStateSeqError going up.
  • After running this setup for a few days, no communication is possible between any containers.
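
To watch the counter from the steps above, a minimal POSIX shell sketch could look like this. The function names and the optional file argument are illustrative assumptions, not part of any Rancher tooling. By default the functions read /proc/net/xfrm_stat, so they would need to run inside the ipsec container.

```shell
# Print the XfrmInStateSeqError counter from an xfrm_stat-style file.
# The path argument is a hypothetical convenience for testing; it
# defaults to the kernel's /proc/net/xfrm_stat.
xfrm_seq_errors() {
    awk '$1 == "XfrmInStateSeqError" { print $2 }' "${1:-/proc/net/xfrm_stat}"
}

# Print how much the counter grew over an interval (default: 10 seconds).
xfrm_seq_error_delta() {
    interval="${1:-10}"
    file="${2:-/proc/net/xfrm_stat}"
    before=$(xfrm_seq_errors "$file")
    sleep "$interval"
    after=$(xfrm_seq_errors "$file")
    echo $(( after - before ))
}
```

For example, running xfrm_seq_error_delta 60 while the iperf3 mesh is active would print the number of new sequence errors seen in one minute.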

Steps to install iperf3:

apt-get remove -y iperf3 libiperf0 && \
cd /tmp && \
wget https://iperf.fr/download/ubuntu/libiperf0_3.1.3-1_amd64.deb && \
wget https://iperf.fr/download/ubuntu/iperf3_3.1.3-1_amd64.deb && \
dpkg -i libiperf0_3.1.3-1_amd64.deb iperf3_3.1.3-1_amd64.deb && \
rm libiperf0_3.1.3-1_amd64.deb iperf3_3.1.3-1_amd64.deb

@yasker
Member

yasker commented Aug 7, 2017

Adding @niusmallnan's health checking tool here: https://github.com/niusmallnan/network-healthcheck

@yasker
Member

yasker commented Aug 7, 2017

@niusmallnan Can you try to reproduce the issue and see if your health checking tool works as expected?

@leodotcloud
Contributor Author

Test image/catalog:

URL: https://github.com/leodotcloud/rancher-catalog.git
Branch: ipsec_config_options_jd

@leodotcloud
Contributor Author

leodotcloud commented Aug 8, 2017

When there is high-speed traffic on AWS, packets from one host to another seem to arrive out of order. This seems to upset the ipsec containers' tracking of the order/sequence numbers, which causes the XfrmInStateSeqError count to go up. Once this number overflows, we get the EOVERFLOW error (the "Value too large for defined data type" message from ping above) and no traffic seems to pass through the tunnels.
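
To illustrate why out-of-order arrival can produce sequence errors, here is a simplified model of a sliding anti-replay window in POSIX shell. It is a sketch only, not the kernel's xfrm code: the function name replay_rejects is hypothetical, sequence numbers are assumed to start at 1, and the window is limited to 62 entries so that the bitmap fits in a shell integer.

```shell
# replay_rejects WINDOW SEQ...
# Simulates a sliding anti-replay window of WINDOW packets (1..62) and
# prints how many of the given sequence numbers would be rejected,
# either because they are older than the window or already seen.
replay_rejects() {
    window=$1; shift
    mask=$(( (1 << window) - 1 ))
    top=0       # highest sequence number accepted so far
    bitmap=0    # bit i set => sequence number (top - i) was already seen
    rejects=0
    for seq in "$@"; do
        if [ "$seq" -gt "$top" ]; then
            # New highest sequence number: slide the window forward.
            diff=$(( seq - top ))
            if [ "$diff" -ge "$window" ]; then
                bitmap=1
            else
                bitmap=$(( ((bitmap << diff) | 1) & mask ))
            fi
            top=$seq
        else
            offset=$(( top - seq ))
            if [ "$offset" -ge "$window" ] || [ $(( (bitmap >> offset) & 1 )) -eq 1 ]; then
                # Too old for the window, or a duplicate.
                rejects=$(( rejects + 1 ))
            else
                bitmap=$(( bitmap | (1 << offset) ))
            fi
        fi
    done
    echo "$rejects"
}
```

With the arrival order 1 2 3 10 5 9 9, a window of 4 rejects two packets (5 is too old, the second 9 is a duplicate), while a window of 32 rejects only the duplicate. In this model, a larger window tolerates more reordering.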

@fokoenecke

We were running into similar problems earlier this week. After fixing a bug that created high network traffic, everything went back to normal. I'd like to provide information on this, but unfortunately my ipsec-router containers don't have a /proc/net/xfrm_stat I could share.

@ppiccolo

ppiccolo commented Aug 15, 2017

@leodotcloud

On the problematic machine, the following command
docker exec -it $(docker ps | grep ipsec-router | awk '{print $1}') cat /proc/net/xfrm_stat
returns this:

XfrmInError             	0
XfrmInBufferError       	0
XfrmInHdrError          	0
XfrmInNoStates          	1
XfrmInStateProtoError   	0
XfrmInStateModeError    	0
XfrmInStateSeqError     	8269913
XfrmInStateExpired      	0
XfrmInStateMismatch     	0
XfrmInStateInvalid      	0
XfrmInTmplMismatch      	0
XfrmInNoPols            	2
XfrmInPolBlock          	0
XfrmInPolError          	0
XfrmOutError            	0
XfrmOutBundleGenError   	0
XfrmOutBundleCheckError 	0
XfrmOutNoStates         	0
XfrmOutStateProtoError  	0
XfrmOutStateModeError   	0
XfrmOutStateSeqError    	0
XfrmOutStateExpired     	0
XfrmOutPolBlock         	0
XfrmOutPolDead          	0
XfrmOutPolError         	0
XfrmFwdHdrError         	0
XfrmOutStateInvalid     	0
XfrmAcquireError        	0

Any suggestions?

thanks.
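
When reading output like the above, it can help to show only the counters that are not zero. A minimal sketch, assuming the two-column name/value layout shown here (the function name and the optional file argument are illustrative):

```shell
# Print only the non-zero counters from an xfrm_stat-style file.
# Defaults to /proc/net/xfrm_stat when no path is given.
xfrm_nonzero() {
    awk 'NF >= 2 && $2 != 0 { printf "%s %s\n", $1, $2 }' "${1:-/proc/net/xfrm_stat}"
}
```

Applied to the listing above, this would leave only XfrmInNoStates, XfrmInStateSeqError, and XfrmInNoPols.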

@leodotcloud
Contributor Author

@niusmallnan Did you get a chance to try your tool after reproducing the above error?

@leodotcloud
Contributor Author

Fixed in ipsec: 0.1.4 with image: rancher/net:v0.11.8.

@leodotcloud
Contributor Author

oops, accidental close

@niusmallnan
Contributor

@leodotcloud @yasker
My health checking tool worked as expected. Here is some information from the log:

time="2017-08-17T15:19:39Z" level=warning msg="Lose ping data from 10.42.144.192"
time="2017-08-17T15:19:40Z" level=error msg="From 10.42.235.225 to 10.42.144.192 is unreachable, isRouter: false, UUID: 742e527e-5ce3-4d08-9746-05e6cc0d1e87"
time="2017-08-17T15:19:40Z" level=error msg="From 10.42.235.225 to 10.42.144.192 is unreachable, isRouter: false, UUID: 742e527e-5ce3-4d08-9746-05e6cc0d1e87"
time="2017-08-17T15:19:42Z" level=error msg="From 10.42.235.225 to 10.42.144.192 is unreachable, isRouter: false, UUID: 742e527e-5ce3-4d08-9746-05e6cc0d1e87"
time="2017-08-17T15:19:42Z" level=error msg="From 10.42.235.225 to 10.42.144.192 is unreachable, isRouter: false, UUID: 742e527e-5ce3-4d08-9746-05e6cc0d1e87"

@sangeethah
Contributor

Tested with rancher-server version v1.6.8-rc4 with rancher/net:v0.11.9.
This version of ipsec has the replay window size set to 1024, and the value is configurable.

@sangeethah
Contributor

Tested with rancher-server version v1.6.8-rc4 with rancher/net:v0.11.9.
Environment with 3 hosts.
Deployed a global service.
Created traffic between all 3 containers using iperf3, as mentioned in the steps to reproduce.
After ~12 hours, inter-host connectivity continues to be good between all 3 hosts in the setup.

XfrmInStateSeqError in ipsec containers as reported by /proc/net/xfrm_stat is 0 in all ipsec containers.

When a similar run was done on rancher-server version v1.6.7, within a few minutes of starting the run XfrmInStateSeqError in the ipsec containers, as reported by /proc/net/xfrm_stat, rose to high values such as 49093 and kept increasing constantly.

@sangeethah
Contributor

Kept the above setup running for a couple of days. Inter-host connectivity continues to work as expected, and XfrmInStateSeqError in the ipsec containers, as reported by /proc/net/xfrm_stat, is still 0.
