net/http: Segmentation violation when using an IPSec VPN Tunnel (Go 1.19.8/Go 1.20.4/Go 1.21.4) #64020
Comments
This is strange.
At that line, the pointer dereference can't possibly be nil. This looks like memory corruption. Have you tried running with the race detector? |
Adding the last GC and SCHED trace lines printed before the panic:
SCHED
GC
|
Sorry for the lack of follow-up here; it got into our triage queue last week, but we didn't get to it. We looked at it this week, but probably no one will follow up until next week because of the US holiday. I tried to reproduce at tip-of-tree and with go1.21.0 (just what I had lying around) with some slight modifications (3 servers all running on localhost on 3 different ports), but I haven't been able to yet. I'll leave it running for a while. How quickly does this reproduce for you? This is mysterious enough that maybe we want to take some shots in the dark:
I also noticed that quite a few people have given this issue a thumbs-up; could anyone who did so briefly follow-up? Are you also affected? Is this happening often? Is there any data you can share about the execution environment? Thanks. |
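For anyone answering the question about the execution environment, a small Go program can print the toolchain and platform details from inside the affected binary's build. This is only a sketch: `envReport` is a hypothetical helper name, and it does not report the kernel version, which still has to be collected separately on the host.

```go
package main

import (
	"fmt"
	"runtime"
)

// envReport returns a one-line summary of the Go toolchain version,
// target OS, target architecture, and logical CPU count.
// The function name is hypothetical and used for illustration only.
func envReport() string {
	return fmt.Sprintf("go=%s os=%s arch=%s cpus=%d",
		runtime.Version(), runtime.GOOS, runtime.GOARCH, runtime.NumCPU())
}

func main() {
	fmt.Println(envReport())
}
```

Building this with the same `GOOS`/`GOARCH` settings as the failing client and running it on the affected machine gives a line that can be pasted into a follow-up comment.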
Hey @mknyszek, I've gathered some of the information you were looking for. I'll get back to you about the kernel version.
If I run the program about 10 times and stop it every time after 1 minute, I'm guaranteed to see it at least once.
This is happening on CentOS 7.9.
I rebuilt on the same machine using the latest Go version and still reproduced the issue.
Yes it still reproduces the issue. |
Thanks. Does this fail on other Linux machines? Perhaps with different Linux distros and/or different Linux kernel versions? I haven't been able to reproduce so far. |
@mknyszek I've been able to reproduce it on other Linux machines as long as they were running the same OS. I haven't been able to reproduce it on any other distro or kernel version. Other than that, I've only tested it on Amazon AL2. |
Hi @mknyszek, Happy New Year 🎉. I apologize for the delay in responding, but I haven't forgotten about this issue, nor have I stopped working on it. Today I was able to put together much more detailed instructions and information that point more closely to why this issue occurs.
In summary, the problem seems to arise specifically when there is an IPSec tunnel between two nodes. I have consistently and quickly reproduced this bug when the connection between the two nodes is secured using an IPSec tunnel.
Below are detailed instructions on how to reproduce this issue. Please note that my testing has been in an AWS environment, so I've tailored the instructions to align closely with AWS. Feel free to make adjustments to suit your specific environment.
Create two EC2 instances
Set up the VPN tunnel
sudo yum install -y libreswan # Using Libreswan version 3.25
sudo systemctl enable ipsec --now
openssl rand -base64 128 # Put the output on a single line when using it in the subsequent files
ipsec auto --add instance-1
/sbin/service ipsec restart
/usr/sbin/ipsec auto --up instance-1
ipsec auto --add instance-2
/sbin/service ipsec restart
/usr/sbin/ipsec auto --up instance-2
You should see a log such as the following, indicating that the VPN tunnel has been created:
One quick note: for the IPSec tunnel, your security group should be configured as follows to allow the tunnel to be created.
Run the executables
You should be able to see the issue reproduce within a minute. Hope this helps. Looking forward to hearing back from you. |
@mknyszek I wanted to add a new piece of information I've discovered. It might be useful in your investigation, and it's also a note for anyone else who might run into this in the future. Using strongSwan instead of Libreswan resolves the issue. In addition, the issue does not reproduce with the AL2 AMI. |
This is a continuation of #61552, since that one is closed.
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
Yes. All versions from Go 1.19 onward reproduce the issue.
What operating system and processor architecture are you using (go env)?
go env Output
What did you do?
I compiled the client and server code as follows on my M1 Mac and uploaded the binaries to a Linux server.
For client I compiled using
GOOS=linux GOARCH=amd64 go build -o client
For server I compiled using
GOOS=linux GOARCH=amd64 go build -o server
I ran the server in Docker on 3 different EC2 instances.
I used the following commands to run it.
docker build . -t test:latest
docker run --rm --network host test:latest
When I ran the server on the three different machines using host networking and then executed the client, the client panicked with the following stack traces
What did you expect to see?
I expected not to see a panic.
What did you see instead?
I saw a panic from within the standard library. This issue occurred when one of the servers was down.