-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Closed
Labels
Milestone
Description
by jason@eggnet.com:
I have checked the latest source, this bug applies as of the current master branch. go is calling epoll_ctl ADD before connect which I believe is incorrect. If another thread is in epoll_wait, that thread can (will?) get an event with EPOLLOUT|EPOLLHUP flags before connect is called, which is what has been happening. In general this appears to be a general flaw with all forms of polling for all operating systems, but I have only performed extensive testing with linux amd64 / epoll. If I knew of another popular program that managed the same epoll file descriptor in multiple threads I would include information about that here, but I do not. nginx does call epoll_ctl_add prior to connect but it is single threaded. When using multiple processes each nginx process has one thread with its own instance of epoll. Since there is no other thread in epoll_wait, the order of operations I suspect is irrelevant for nginx. The call chain for linux for adding a new outbound socket to epoll is shown. The fd.dial function is deciding to call fd.init before connecting. I'm not necessarily saying the fix is as simple as altering the order of operations in fd.dial. If connect always responded with EINPROGRESS it might be that simple. But apparently certain operating systems will return from connect immediately with no error when connecting to localhost. If that happens, then there are no pending operations on the new socket. It doesn't need to be monitored by epoll. Therefore some care has to be taken, I think, in terms of dealing with the pd and fd state. d.Dial : dial.go dialSingle : dial.go dialTCP : tcpsock_posix.go socket : sock_posix.go fd.dial : sock_posix.go fd.init : fd_unix.go fd.pd.Init : fd_poll_runtime.go runtime_pollOpen : znetpoll_linux_amd64.c I wrote a quick test in C, relevant syscalls highlighted below. The test was to make sure that epoll_wait was able to see the connect event even after a substantial delay between connect and epoll_ctl. 1406334260.600586 socket(PF_INET, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 4 <0.000013> 1406334260.600629 setsockopt(4, SOL_SOCKET, SO_BROADCAST, [1], 4) = 0 <0.000006> 1406334260.600671 connect(4, {sa_family=AF_INET, sin_port=htons(3456), sin_addr=inet_addr("69.28.182.189")}, 16) = -1 EINPROGRESS (Operation now in progress) <0.000054> 1406334260.600808 rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0 <0.000005> 1406334260.600859 rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0 <0.000005> 1406334260.600913 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 <0.000004> 1406334260.600946 nanosleep({5, 0}, 0x7fff83894450) = 0 <5.000141> 1406334265.601188 epoll_ctl(3, EPOLL_CTL_ADD, 4, {EPOLLIN|EPOLLOUT|EPOLLET|0x2000, {u32=4, u64=18023070048452612}}) = 0 <0.000010> 1406334265.601237 epoll_wait(3, {{EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=4, u64=18023070048452612}}}, 1, 10000) = 1 <0.000005> 1406334265.601310 getsockopt(4, SOL_SOCKET, SO_ERROR, [111], [4]) = 0 <0.000006> 1406334265.601350 close(4) = 0 <0.000018> Here's a clearer picture of what is happening in go: http://play.golang.org/p/wXceILeetG This is with go's internal resolver. Running the trace like this: strace -ttt -T -ff -o a ./test hostname:3456 Find the write that returns ECONNREFUSED. Determine which file descriptor it is, then determine which u64 ID is used with it in epoll_ctl. This should show all relevant syscalls that relate to the file descriptor or u64 ID. The spurious epoll_wait event fires between epoll_ctl and connect. egrep '[(, ]215($|[), ])|139969772241984' a.* | perl -pe 's/^a.(\d+):(\S+)/$2 $1/' | sort 1406320197.172769 26587 getsockopt(439, SOL_SOCKET, SO_ERROR, [111], [4]) = 0 <0.000005> 1406320197.172794 26587 epoll_ctl(4, EPOLL_CTL_DEL, 439, {0, {u32=4213032, u64=4213032}}) = 0 <0.000006> 1406320197.172817 26587 close(439) = 0 <0.000010> 1406320197.507617 26587 socket(PF_INET, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 439 <0.000010> 1406320197.507645 26587 setsockopt(439, SOL_SOCKET, SO_BROADCAST, [1], 4) = 0 <0.000006> 1406320197.507672 26587 epoll_ctl(4, EPOLL_CTL_ADD, 439, {EPOLLIN|EPOLLOUT|EPOLLET|0x2000, {u32=2368844064, u64=139971058053408}}) = 0 <0.000006> 1406320197.507823 26588 epoll_wait(4, {{EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368724616, u64=139971057933960}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368839488, u64=139971058048832}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368774368, u64=139971057983712}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368821712, u64=139971058031056}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368839136, u64=139971058048480}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368757648, u64=139971057966992}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368839312, u64=139971058048656}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368757296, u64=139971057966640}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368757120, u64=139971057966464}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368757472, u64=139971057966816}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368756944, u64=139971057966288}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368722504, u64=139971057931848}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368722328, u64=139971057931672}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368791088, u64=139971058000432}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368790912, u64=139971058000256}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368780528, u64=139971057989872}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368790560, u64=139971057999904}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368790384, u64=139971057999728}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368790736, u64=139971058000080}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368790032, u64=139971057999376}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368790208, u64=139971057999552}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368789680, u64=139971057999024}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368789328, u64=139971057998672}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368789856, u64=139971057999200}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368789152, u64=139971057998496}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368788976, u64=139971057998320}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368789504, u64=139971057998848}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368761696, u64=139971057971040}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368788800, u64=139971057998144}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368788624, u64=139971057997968}}, {EPOLLOUT|EPOLLHUP, {u32=2368844064, u64=139971058053408}}}, 128, 0) = 31 <0.000010> 1406320197.507942 26587 connect(439, {sa_family=AF_INET, sin_port=htons(3456), sin_addr=inet_addr("x")}, 16) = -1 EINPROGRESS (Operation now in progress) <0.000031> 1406320197.508003 26587 getsockopt(439, SOL_SOCKET, SO_ERROR, [0], [4]) = 0 <0.000005> 1406320197.508081 26587 getsockname(439, {sa_family=AF_INET, sin_port=htons(36051), sin_addr=inet_addr("x")}, [16]) = 0 <0.000005> 1406320197.508109 26587 getpeername(439, 0x7f4d8cdb4560, [112]) = -1 ENOTCONN (Transport endpoint is not connected) <0.000005> 1406320197.508141 26587 setsockopt(439, SOL_TCP, TCP_NODELAY, [1], 4) = 0 <0.000006> 1406320197.508262 26587 write(439, "hi", 2) = -1 EAGAIN (Resource temporarily unavailable) <0.000006> 1406320197.518050 26588 epoll_wait(4, {{EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368843888, u64=139971058053232}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368843712, u64=139971058053056}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368843536, u64=139971058052880}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368844768, u64=139971058054112}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368754128, u64=139971057963472}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368806224, u64=139971058015568}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368782992, u64=139971057992336}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368761168, u64=139971057970512}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368806048, u64=139971058015392}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368833504, u64=139971058042848}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368761344, u64=139971057970688}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368834208, u64=139971058043552}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368805344, u64=139971058014688}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368723736, u64=139971057933080}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368803760, u64=139971058013104}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368834032, u64=139971058043376}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368765568, u64=139971057974912}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368770848, u64=139971057980192}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368831568, u64=139971058040912}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368831392, u64=139971058040736}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368831216, u64=139971058040560}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368815904, u64=139971058025248}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368831040, u64=139971058040384}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368830864, u64=139971058040208}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368830688, u64=139971058040032}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368830512, u64=139971058039856}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368844592, u64=139971058053936}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368844416, u64=139971058053760}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368844240, u64=139971058053584}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368723384, u64=139971057932728}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368820656, u64=139971058030000}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368767856, u64=139971057977200}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368844064, u64=139971058053408}}, {EPOLLOUT|EPOLLHUP, {u32=2368771200, u64=139971057980544}}}, 128, 0) = 34 <0.000010> 1406320197.607291 26587 write(439, "hi", 2) = -1 ECONNREFUSED (Connection refused) <0.000006> 1406320197.607607 26587 epoll_ctl(4, EPOLL_CTL_DEL, 439, {0, {u32=4213032, u64=4213032}}) = 0 <0.000006> 1406320197.607631 26587 close(439) = 0 <0.000010>
Attachments:
- test.c (1874 bytes)