New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
non-blocking connects with multiple DNS entries return EALREADY #99
Comments
|
I did a little digging into this when daurnimator first mentioned it in the prosody muc and it looked to me like the address loop was just failing to account for timeout as a valid response for an address and continuing to loop incorrectly instead. |
|
Yep, the bug is here: https://github.com/diegonehab/luasocket/blob/master/src/inet.c#L418 Though a proper fix might require a bit of a refactor. Calling |
|
How do you propose us do a non-blocking connect when there are multiple addresses? I mean, what would the semantics be? |
If It's the only thing that make sense in the current api. |
|
Without testing the other ones? What if one of them would return connected immediately? |
|
is calling If that's the case, the proper procedure should probably be that they |
|
I think that users should write their own loops when connecting in non-blocking mode. On the other hand, the current behavior is not very informative, so perhaps we should change the implementation to wait after the call to connect? Whatever gets done, we should make sure that it also works on Windows. |
|
I found that the POSIX documentation defines that the return value of connect() when used on a currently connecting non-blocking) socket shall be EALREADY. |
|
Update for WIndows implementation: Microsoft specifies (http://msdn.microsoft.com/en-US/library/windows/desktop/ms737625%28v=vs.85%29.aspx): "Until the connection attempt completes on a nonblocking socket, all subsequent calls to connect on the same socket will fail with the error code WSAEALREADY, and WSAEISCONN when the connection completes successfully. Due to ambiguities in version 1.1 of the Windows Sockets specification, error codes returned from connect while a connection is already pending may vary among implementations. As a result, it is not recommended that applications use multiple calls to connect to detect connection completion. If they do, they must be prepared to handle WSAEINVAL and WSAEWOULDBLOCK error values the same way that they handle WSAEALREADY, to assure robust operation." |
|
POSIX says this about connect() with non-blocking sockets: |
|
The Windows case might be a bit more complicated. When calling But, if you call I don't know how unix handles this scenario. |
|
It would be great if somebody could figure out the portable way for doing non-blocking connects in LuaSocket so that users can forget if they are on Windows or Unix. Is this possible? |
I think the pattern (for a single socket) should be call This flow will be quite uncomfortable with luasocket; as the polling primitive is left to the user (to roll their own multi-tasking). |
|
I have been working on Copas, and fixed the connect like so; function copas.connect(skt, host, port)
skt:settimeout(0)
local ret, err, tried_more_than_once
repeat
ret, err = skt:connect (host, port)
-- non-blocking connect on Windows results in error "Operation already
-- in progress" to indicate that it is completing the request async. So essentially
-- it is the same as "timeout"
if ret or (err ~= "timeout" and err ~= "Operation already in progress") then
-- Once the async connect completes, Windows returns the error "already connected"
-- to indicate it is done, so that error should be ignored. Except when it is the
-- first call to connect, then it was already connected to something else and the
-- error should be returned
if (not ret) and (err == "already connected" and tried_more_than_once) then
ret = 1
err = nil
end
return ret, err
end
tried_more_than_once = tried_more_than_once or true
coroutine.yield(skt, _writing)
until false
endThe |
|
I found a note on a Microsoft page that said that thee are several With Linux, the situation is 100% POSIX, so that any further connect() Actually, no repeated connect() is necessary to complete the connection. Side Note: Oliver Am 03.03.2015 um 23:52 schrieb Thijs Schreijer:
|
|
From that same page (http://msdn.microsoft.com/en-US/library/windows/desktop/ms737625%28v=vs.85%29.aspx): It reads (before the excerpt you quoted above);
Note the first point. The way I read that is that the problem you mention, regarding the different return values, occurs only if one just repeatedly calls If you want this transparently handled within LuaSocket, then you must retain connection status in the socket object. Tracking for whether the call to connect is the first or a repeat, checking whether the connection target on the subsequent calls hasn't changed, etc. |
What happens on that OS if you call |
|
Thijs, it does the same: returns "invalid handle" error. It's a weird OS. Oliver Am 04.03.2015 um 10:12 schrieb Thijs Schreijer:
|
|
Am 04.03.2015 um 10:07 schrieb Thijs Schreijer: "If you want this transparently handled within LuaSocket, then you must It may be worth the effort to handle this within the socket library Oliver |
|
Hi Thijs, this looks very promising, especially the fact that TCP sockets have a I will give it a try and let you know the outcome. Oliver Am 03.03.2015 um 23:52 schrieb Thijs Schreijer:
|
|
to use it with LuaSocket I think something like the following pseudo code should be used -- assumes fields
-- skt.host
-- skt.port
-- skt.connectinprogress
connect_non_block(skt, host, port)
if not skt.connectinprogress then
-- new connect
err = skt:connect(host, port)
if err == isconnected then return nil, isconnected end
if err == inprogress then
-- mark connection as started
skt.connectinprogress = true
skt.host = host
skt.port = port
return nil, timeout
end
if err == succes then return 1 end
return nil, errormsg
else
-- already busy
if host~=skt.host or port~=skt.port then
return nil, allready_in_progress -- don't allow change target half way a connect
end
-- if we're not writeable, then its a timeout
iswriteable = select(nil, {skt}, 0) -- 0 timeout, check if writeable
if not iswriteable then return nil, timeout end
-- so try again to connect
err = skt:connect(host, port)
if err == inprogress then return nil, timeout end
skt.host = nil
skt.port = nil
skt.connectinprogress = nil
if err == isconnected then -- succesfully connected
return 1
else
return nil, errormsg
end
end
endIt has the overhead of doing a |
|
Is there a minimum number of changes that would unify Windows and Unix behavior without completely changing the way we do things in LuaSocket? This has always been a source of frustration. |
|
Well... I'm not sure there really is a cross-platform problem. This code (#99 (comment)) works for me so far. Apparently there are some exceptional OS'ses where it doesn't, but Windows, POSIX and generally unix ought to work. But then you rely on the client code to do the right thing. The one thing I don't understand is the initial message regarding "multiple DNS entries". First hunch is that it is the same as without the "multiple", but it was handled incorrectly. But maybe someone can explain the problem better, or try my code to see whether that works in the "multiple DNS entries" scenario. |
|
The issue with non-blocking connects with names that resolve to multiple addresses is: what does that mean? When the connect is blocking, we go over each of the addresses in order until one of them succeeds. When connect is non-blocking, we can't know immediately if the first connect will fail or not, so we have to return immediately. The current code fails because of EINPROGRESS. This should be fixed. |
|
Sorry for my sily questions below, I'm not an expert on this matter, just learning on-the-go...
For my understanding, when connecting, the DNS lookup returns multiple addresses. Now who is responsible for iterating over those? the OS or LuaSocket? How does trying multiple addresses work with only a single socket? You can only try one at a time, no? How about an extra timeout? If the first address doesn't connect within that timeframe, then go to the second. Passing EINPROGRESS to the async client, telling him to try again.
I tend to agree, this being a client issue, as basically the whole DNS lookup stuff is completely blocking anyway. The only way to circumvent that is by doing DNS lookups async yourself, and not through the OS. |
|
The "standard way of doing things" states that these addresses should be tried in sequence. The OS doesn't do this because each call to the OS connect receives a single address, and not domain names. So either LuaSocket or the client has to do it. The typical LuaSocket use is blocking. So LuaSocket does it for the user. For non-blocking, things fall outside of LuaSocket's responsibility, in my opinion. It is possible to replicate the internal C behavior of connect using Lua code by simply resolving the addresses and calling LuaSocket's connect directly with IP addresses instead of a domain name. And this would work already in the current version of the code. What we need to do is to fail in a consistent way when there is a non-blocking connect attempt to a domain name that resolves to multiple addresses. That's all. |
|
So, if non-blocking connect receives multiple addresses, then let it fail with a newly defined error message eg.
to prevent having to resolve the address again. |
|
Thijs, possibly not. The result is a list of IP addresses which are passed to the To handle this in a common manner, I guess, it's best to manually call Oliver Am 04.03.2015 um 15:29 schrieb Thijs Schreijer:
|
|
Obviously the DNS is blocking, so connect will also block temporarily while it looks up a domain name. But if you feed it merely an IP address, then it wouldn't go through DNS would it? no? So using connect non-blocking with a single address returned, might still block on the DNS part. But that is a given, as Diego mentioned, async can only be resolved outside LuaSocket. So if a client needs truly async, then he should provide his own async dns resolver. Sounds like it sucks, but it would already greatly improve current state of affairs. By returning the multiple addresses and let the client decide. |
|
To be absolutely sure, we would need to distinguish between numeric and named addresses ourselves, and then pass the appropriate flag to getaddrinfo to make sure DNS is not being engaged. "If the AI_NUMERICHOST flag is specified, then a non-null nodename string supplied shall be a numeric host address string. Otherwise, an [EAI_NONAME] error is returned. This flag shall prevent any type of name resolution service (for example, the DNS) from being invoked. If the AI_NUMERICSERV flag is specified, then a non-null servname string supplied shall be a numeric port string. Otherwise, an [EAI_NONAME] error shall be returned. This flag shall prevent any type of name resolution service (for example, NIS+) from being invoked." Since distinguishing between numeric addresses and named addresses would better be performed by inet_pton() anyway, we might as well avoid the call to getaddrinfo. This is how LuaSocket did it before there was IPv6. I suppose we should move it back to how it was and add a pton test before trying getaddrinfo. What do you think? |
Sounds like a good idea. see the cqueues implementation: https://github.com/wahern/cqueues/blob/master/src/lib/socket.c#L498 |
|
Is that really necessary? If I do But if you want to be certain... add it. |
|
You'd be surprised. I don't think we need to add the test. What we definitely need to add is that error checking that you suggested. I.e., if the socket is in non-blocking mode and if the resolver returns multiple addresses, we need to report an error. |
|
Possibly, it may be useful to report an error only if the application Oliver Am 05.03.2015 um 21:24 schrieb Diego Nehab:
|
|
Assume a non-blocking socket doing a connect;
So basically, the socket being non-blocking, is the parameter that @okroth mentiones, so I don't think there is any gain in @okroth suggestion. And we also don't need the extra return parameters listing the possible addresses. Or am I missing something? It probably warrants a utility function like The alternative being on a non-blocking socket;
Question: what constitutes a socket being in 'non-blocking' mode in LuaSocket land? couldn't find it in the docs. Is it a timeout of 0? |
|
Hi Thijs, the reason why I mentioned the parameter is based on the idea that The DNS lookup takes only a few milliseconds, if the DNS is set up If there is more than one address, and the OS did not already test them, We may implement this code into the socket_connect() function, but then On the other hand, there may be applications that really like to use I would go the second way. BTW: The DNS lookup is always (shortly) blocking, so there is no way Oliver Am 06.03.2015 um 09:15 schrieb Thijs Schreijer:
|
|
Though I think my 2nd proposal (error out when resolving a name on a non-blocking socket) is the cleanest from an API perspective. It is also the most 'breaking' option, so probably overall not the 'best' option. Consolidating; when connecting on a non-blocking socket;
Does that make sense? PS. anyone an answer to my "what constitutes a non-blocking socket" question? |
|
I think in 3 we should simply try the first address. Simpler to code. The only people that would care about this would know what to do anyway. Nobody's code is broken by this change, because at this moment this simply doesn't work. The behavior becomes consistent and reproducible. Whoever wants to write their own non-blocking connect for hosts that resolve to multiple addresses can do it with the API. A non-blocking socket is a socket that has had settimeout() called on it. If the user does that before a call to connect, we can test by looking at the tm structure. |
|
To me this looks sensible. Oliver Am 07.03.2015 um 20:41 schrieb Thijs Schreijer:
|
|
What was the outcome on this? |
|
I implemented the idea trying just the first returned address if the socket is set to non-blocking. |
Non-blocking connects when theres multiple DNS entries keep iterating through entries after getting
EINPROGRESSand end up returningEALREADYinstead.Expected
nil, timeoutinstead.The text was updated successfully, but these errors were encountered: