net: UDPConn.ReadFrom and UDPConn.WriteTo failing on Linux ARM device #7299

Closed
gopherbot opened this issue Feb 10, 2014 · 8 comments

@gopherbot
commented Feb 10, 2014

by armon.dadgar:

This issue has cropped up from a bug report against Serf here:
hashicorp/serf#123

Serf seems to have no issues on other platforms, however on this Linux ARM
environment, it seems that UDP packets cannot be sent or received.

We ran it under strace and collected the relevant sample, available here:
https://www.dropbox.com/s/bvfv4zjtmnqr12o/serf.stderr.output

It seems that `recvfrom` is always failing with EFAULT, preventing packets from
being received. Similarly, `sendto` is failing with EINVAL, preventing packets
from being sent.

The user reports that the master build of Serf on ARM reproduces the problem.

What steps will reproduce the problem?
1. Run a Serf agent on ARM "serf agent"
2. Run another Serf agent
3. Join the agents "serf join <agent1>"
4. Error messages about failed packet send/receive will appear

What is the expected output?
No error messages

What do you see instead?
2014/02/08 13:24:36 [ERR] memberlist: Failed to send gossip to 192.168.2.1:7946: write
udp: invalid argument
2014/02/08 13:24:36 [ERR] memberlist: Error reading UDP packet: read udp
192.168.2.2:7946: bad address

Which operating system are you using?
Linux version 2.6.12.6-arm1 (root@NasARM4) (gcc version 3.4.4 (release) (CodeSourcery
ARM 2005q3-2)) #2 Sun Sep 18 02:09:29 CST 2011 

Which version are you using?  (run 'go version')
go version go1.2 darwin/amd64
@davecheney

Contributor

commented Feb 10, 2014

Comment 1:

> Which operating system are you using?
Linux version 2.6.12.6-arm1 (root@NasARM4) (gcc version 3.4.4 (release) (CodeSourcery
ARM 2005q3-2)) #2 Sun Sep 18 02:09:29 CST 2011 
This system is below the minimum kernel required for linux/arm. You need 2.6.27, and
2.6.29 is strongly recommended. Before then, the atomic operations were broken in the
kernel (armv5 needs kernel assistance for atomics).
That said, there is probably a bug with the syscall definition for sendto(2) and friends
on linux/arm.
Can you please create a short code sample that just sets up the socket, does a sendto or
recvfrom, and prints out the error?

Labels changed: added release-none, repo-main, arch-arm.

Status changed to WaitingForReply.

@gopherbot

Author

commented Feb 10, 2014

Comment 2 by armon.dadgar:

Here is a short example to repro:
https://gist.github.com/armon/8926419/raw/a1ddfcaed4a4889387732e55acd02fbbb9d217eb/test.go
Output on darwin:
[INFO] Packet from 127.0.0.1:10000 4
[INFO] Packet from 127.0.0.1:10000 4
Output on arm:
[ERR] Error sending UDP packet: write udp: invalid argument
[ERR] Error sending UDP packet: write udp: invalid argument
@davecheney

Contributor

commented Feb 11, 2014

Comment 3:

I'm afraid this may be a kernel age issue
panda(~/src) % uname -a
Linux panda 3.7.10-x13 #1 SMP Wed Jun 26 07:33:15 UTC 2013 armv7l GNU/Linux
panda(~/src) % ./test
[INFO] Packet from 127.0.0.1:10000 4
[INFO] Packet from 127.0.0.1:10000 4
[INFO] Packet from 127.0.0.1:10000 4
[INFO] Packet from 127.0.0.1:10000 4
Can you attach strace to that process and log some details about the syscall it is
attempting?
@gopherbot

Author

commented Feb 11, 2014

Comment 4 by armon.dadgar:

Unfortunately, I don't have access to the device myself. It belongs to a user of Serf
that I was working with over IRC. The original bug report includes a link to the strace
output, which shows recvfrom and sendto returning the error codes.
Feel free to close this due to kernel age. Thanks!
@davecheney

Contributor

commented Feb 11, 2014

Comment 5:

So it is
[pid  3093] recvfrom(5,  <unfinished ...>
[pid  3089] <... clock_gettime resumed> {1392077603, 403573000}) = 0
[pid  3093] <... recvfrom resumed> 0x10a78000, 65536, 0, 0x10a77d90, 0x10a00d28) =
-1 EFAULT (Bad address)
I'll have a look at the machines I have available. I'm not sure if there is much that
can be done. 2.6.12 will generate many confusing error reports: none of sync/atomic
will work properly, and the gc will probably segfault or corrupt memory because
atomics don't work.

Status changed to Thinking.

@gopherbot

Author

commented Apr 24, 2014

Comment 6 by jtolds:

I haven't been able to reproduce this issue reliably, but on a bunch of ARM devices I
have with kernel 3.8.6, UDP packet sending will every so often fail, such that all
future writes on the UDPConn fail with EINVAL.
One difference between my code and the repro test code above is that I was using DialUDP
instead of ListenUDP and Write instead of WriteTo. I don't think that should make any
difference, but I'll give connectionless mode a shot.
In my case, writes don't fail right away.
This could be a completely separate issue.
@jart


commented Dec 26, 2014

I encountered this error, but it turned out I was trying to send to an internet IP when I bound the socket on localhost.

@rsc rsc removed the arch-arm label Apr 10, 2015

@rsc rsc added this to the Unplanned milestone Apr 10, 2015

@rsc rsc removed release-none labels Apr 10, 2015

@mikioh mikioh removed the Thinking label May 14, 2015

@mikioh

Contributor

commented May 14, 2015

We don't see this failure on the build dashboard or the linux/arm builders. Also, Go 1.5 supports Linux 2.6.23 and above. Closing. FWIW, EFAULT is usually a sign of operations on corrupted packets including broken wire format, invalid checksum, blah blah.

@mikioh mikioh closed this May 14, 2015

@golang golang locked and limited conversation to collaborators Jun 25, 2016
