Skip to content

net: connect after polling initialization #8426

@gopherbot

Description

@gopherbot

by jason@eggnet.com:

I have checked the latest source, this bug applies as of the current master branch.

go is calling epoll_ctl ADD before connect which I believe is incorrect. If another
thread is in epoll_wait, that thread can (will?) get an event with EPOLLOUT|EPOLLHUP
flags before connect is called, which is what has been happening.

In general this appears to be a general flaw with all forms of polling for all operating
systems, but I have only performed extensive testing with linux amd64 / epoll. If I knew
of another popular program that managed the same epoll file descriptor in multiple
threads I would include information about that here, but I do not. nginx does call
epoll_ctl_add prior to connect but it is single threaded. When using multiple processes
each nginx process has one thread with its own instance of epoll. Since there is no
other thread in epoll_wait, the order of operations I suspect is irrelevant for nginx.

The call chain for linux for adding a new outbound socket to epoll is shown. The fd.dial
function is deciding to call fd.init before connecting. I'm not necessarily saying the
fix is as simple as altering the order of operations in fd.dial. If connect always
responded with EINPROGRESS it might be that simple. But apparently certain operating
systems will return from connect immediately with no error when connecting to localhost.
If that happens, then there are no pending operations on the new socket. It doesn't need
to be monitored by epoll. Therefore some care has to be taken, I think, in terms of
dealing with the pd and fd state.

d.Dial : dial.go
dialSingle : dial.go
dialTCP : tcpsock_posix.go
socket : sock_posix.go
fd.dial : sock_posix.go
fd.init : fd_unix.go
fd.pd.Init : fd_poll_runtime.go
runtime_pollOpen : znetpoll_linux_amd64.c

I wrote a quick test in C, relevant syscalls highlighted below. The test was to make
sure that epoll_wait was able to see the connect event even after a substantial delay
between connect and epoll_ctl.

1406334260.600586 socket(PF_INET, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) =
4 <0.000013>
1406334260.600629 setsockopt(4, SOL_SOCKET, SO_BROADCAST, [1], 4) = 0 <0.000006>
1406334260.600671 connect(4, {sa_family=AF_INET, sin_port=htons(3456),
sin_addr=inet_addr("69.28.182.189")}, 16) = -1 EINPROGRESS (Operation now in
progress) <0.000054>
1406334260.600808 rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0 <0.000005>
1406334260.600859 rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0 <0.000005>
1406334260.600913 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 <0.000004>
1406334260.600946 nanosleep({5, 0}, 0x7fff83894450) = 0 <5.000141>
1406334265.601188 epoll_ctl(3, EPOLL_CTL_ADD, 4, {EPOLLIN|EPOLLOUT|EPOLLET|0x2000,
{u32=4, u64=18023070048452612}}) = 0 <0.000010>
1406334265.601237 epoll_wait(3, {{EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=4,
u64=18023070048452612}}}, 1, 10000) = 1 <0.000005>
1406334265.601310 getsockopt(4, SOL_SOCKET, SO_ERROR, [111], [4]) = 0 <0.000006>
1406334265.601350 close(4)              = 0 <0.000018>


Here's a clearer picture of what is happening in go:

http://play.golang.org/p/wXceILeetG

This is with go's internal resolver.

Running the trace like this:

strace -ttt -T -ff -o a ./test hostname:3456

Find the write that returns ECONNREFUSED.

Determine which file descriptor it is, then determine which u64 ID is used with it in
epoll_ctl. This should show all relevant syscalls that relate to the file descriptor or
u64 ID.

The spurious epoll_wait event fires between epoll_ctl and connect.

egrep '[(, ]215($|[), ])|139969772241984' a.* | perl -pe 's/^a.(\d+):(\S+)/$2 $1/' | sort

1406320197.172769 26587 getsockopt(439, SOL_SOCKET, SO_ERROR, [111], [4]) = 0
<0.000005>
1406320197.172794 26587 epoll_ctl(4, EPOLL_CTL_DEL, 439, {0, {u32=4213032,
u64=4213032}}) = 0 <0.000006>
1406320197.172817 26587 close(439)            = 0 <0.000010>
1406320197.507617 26587 socket(PF_INET, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK,
IPPROTO_IP) = 439 <0.000010>
1406320197.507645 26587 setsockopt(439, SOL_SOCKET, SO_BROADCAST, [1], 4) = 0
<0.000006>
1406320197.507672 26587 epoll_ctl(4, EPOLL_CTL_ADD, 439,
{EPOLLIN|EPOLLOUT|EPOLLET|0x2000, {u32=2368844064, u64=139971058053408}}) = 0
<0.000006>
1406320197.507823 26588 epoll_wait(4, {{EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368724616, u64=139971057933960}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368839488, u64=139971058048832}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368774368, u64=139971057983712}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368821712, u64=139971058031056}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368839136, u64=139971058048480}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368757648, u64=139971057966992}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368839312, u64=139971058048656}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368757296, u64=139971057966640}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368757120, u64=139971057966464}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368757472, u64=139971057966816}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368756944, u64=139971057966288}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368722504, u64=139971057931848}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368722328, u64=139971057931672}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368791088, u64=139971058000432}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368790912, u64=139971058000256}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368780528, u64=139971057989872}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368790560, u64=139971057999904}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368790384, u64=139971057999728}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368790736, u64=139971058000080}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368790032, u64=139971057999376}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368790208, u64=139971057999552}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368789680, u64=139971057999024}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368789328, u64=139971057998672}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368789856, u64=139971057999200}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368789152, u64=139971057998496}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368788976, u64=139971057998320}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368789504, u64=139971057998848}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368761696, u64=139971057971040}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368788800, u64=139971057998144}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368788624, u64=139971057997968}}, {EPOLLOUT|EPOLLHUP, {u32=2368844064,
u64=139971058053408}}}, 128, 0) = 31 <0.000010>
1406320197.507942 26587 connect(439, {sa_family=AF_INET, sin_port=htons(3456),
sin_addr=inet_addr("x")}, 16) = -1 EINPROGRESS (Operation now in progress)
<0.000031>
1406320197.508003 26587 getsockopt(439, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
<0.000005>
1406320197.508081 26587 getsockname(439, {sa_family=AF_INET, sin_port=htons(36051),
sin_addr=inet_addr("x")}, [16]) = 0 <0.000005>
1406320197.508109 26587 getpeername(439, 0x7f4d8cdb4560, [112]) = -1 ENOTCONN (Transport
endpoint is not connected) <0.000005>
1406320197.508141 26587 setsockopt(439, SOL_TCP, TCP_NODELAY, [1], 4) = 0
<0.000006>
1406320197.508262 26587 write(439, "hi", 2)   = -1 EAGAIN (Resource
temporarily unavailable) <0.000006>
1406320197.518050 26588 epoll_wait(4, {{EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368843888, u64=139971058053232}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368843712, u64=139971058053056}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368843536, u64=139971058052880}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368844768, u64=139971058054112}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368754128, u64=139971057963472}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368806224, u64=139971058015568}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368782992, u64=139971057992336}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368761168, u64=139971057970512}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368806048, u64=139971058015392}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368833504, u64=139971058042848}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368761344, u64=139971057970688}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368834208, u64=139971058043552}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368805344, u64=139971058014688}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368723736, u64=139971057933080}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368803760, u64=139971058013104}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368834032, u64=139971058043376}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368765568, u64=139971057974912}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368770848, u64=139971057980192}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368831568, u64=139971058040912}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368831392, u64=139971058040736}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368831216, u64=139971058040560}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368815904, u64=139971058025248}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368831040, u64=139971058040384}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368830864, u64=139971058040208}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368830688, u64=139971058040032}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368830512, u64=139971058039856}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368844592, u64=139971058053936}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368844416, u64=139971058053760}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368844240, u64=139971058053584}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368723384, u64=139971057932728}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368820656, u64=139971058030000}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368767856, u64=139971057977200}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000,
{u32=2368844064, u64=139971058053408}}, {EPOLLOUT|EPOLLHUP, {u32=2368771200,
u64=139971057980544}}}, 128, 0) = 34 <0.000010>
1406320197.607291 26587 write(439, "hi", 2)   = -1 ECONNREFUSED (Connection
refused) <0.000006>
1406320197.607607 26587 epoll_ctl(4, EPOLL_CTL_DEL, 439, {0, {u32=4213032,
u64=4213032}}) = 0 <0.000006>
1406320197.607631 26587 close(439)            = 0 <0.000010>

Attachments:

  1. test.c (1874 bytes)

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions