Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added a fix for properly reporting CONNRESET. #81

Merged
merged 1 commit into from
Nov 9, 2023

Conversation

pkulchenko
Copy link
Contributor

This patch fixes an issue with the current version of the library that doesn't properly report CONNRESET to the application.

This issue is 100% reproducible in my application, even though I failed to create a simple script to demonstrate it on. The issue happens when the process on the other side of the connection is killed. I bisected this issue to the commit 734cc23 with the following comment:

I am a bit unsure about the UDP fix, because it may affect
TCP as well. On UDP sockets, when a sendto fails, the next
receive/receivefrom fails with CONNRESET. I changed
sock_recv/sock_recvfrom in wsocket.c to skip the CONNRESET
from the recv/recvfrom, hoping that if the socket is TCP,
sock_waitfd will get the CONNRESET again. The tests pass,
but this should be tested more thoroughly.

Also a very relevant discussion on the related pull request here.

By reviewing the symptoms I see, this is exactly what happens: my application that uses copas doesn't get "nil, closed" event, only sees "nil, timeout", and continues 'selecting' the socket that has already been closed, thus never returning the control.

This issue is fixed in the commit by changing

if (err != WSAEWOULDBLOCK && err != WSAECONNRESET) return err;

to

if (err != WSAEWOULDBLOCK) return err;

or by adding continue statement in the current HEAD as I did in the patch.

I get all the tests passing with these changes and my application is working correctly.

I can provide instructions on how to test this using my app; it may only take 3-5m or so to set it up.

@diegonehab
Copy link
Contributor

What platform is this?

On Tue, Sep 17, 2013 at 2:53 AM, Paul Kulchenko notifications@github.comwrote:

This patch fixes an issue with the current version of the library that
doesn't properly report CONNRESET to the application.

This issue is 100% reproducible in my application, even though I failed to
create a simple script to demonstrate it on. I bisected this issue to the
commit 734cc23 734cc23e1fwith the following comment:

I am a bit unsure about the UDP fix, because it may affect
TCP as well. On UDP sockets, when a sendto fails, the next
receive/receivefrom fails with CONNRESET. I changed
sock_recv/sock_recvfrom in wsocket.c to skip the CONNRESET
from the recv/recvfrom, hoping that if the socket is TCP,
sock_waitfd will get the CONNRESET again. The tests pass,
but this should be tested more thoroughly.

Also a very relevant discussion on the related pull request herehttps://github.com//pull/39#issuecomment-18508959
.

By reviewing the symptoms I see, this is exactly what happens: my
application that uses copas doesn't get "nil, closed" event, only sees
"nil, timeout", and continues 'selecting' the socket that has already been
closed, thus never returning the control.

This issue is fixed in the commit by changing

if (err != WSAEWOULDBLOCK && err != WSAECONNRESET) return err;

to

if (err != WSAEWOULDBLOCK) return err;

or by adding continue statement in the current HEAD as I did in the patch.

I get all the tests passing with these changes and my application is
working correctly.

I can provide instructions on how to test this using my app; it may only

take 3-5m or so to set it up.

You can merge this Pull Request by running

git pull https://github.com/pkulchenko/luasocket fix-close-win

Or view, comment on, or merge it at:

#81
Commit Summary

  • Added a fix for properly reporting CONNRESET.

File Changes

Patch Links:

@pkulchenko
Copy link
Contributor Author

Windows Vista 32bit, but I got reports of the same issue on Windows 7 64bit, so I presume all Windows versions may be affected.

@pkulchenko
Copy link
Contributor Author

Diego, please let me know if you need any more information on this as I'd like to get some confirmation on the fix as it's blocking next release for my app; thank you!

@diegonehab
Copy link
Contributor

Hey Paul,

As the comment states, the reason for that change that caused the bug was a
problem with UDP. So I can't simply revert the change because it would
reintroduce the bug in UDP. What I think I will have to do is split the
send and recv functions used by TCP and UDP so that each one can use its
own function. This requires somewhat larger patch and I will look into it.

Regards,
Diego

On Thu, Sep 19, 2013 at 2:57 PM, Paul Kulchenko notifications@github.comwrote:

Diego, please let me know if you need any more information on this as I'd
like to get some confirmation on the fix as it's blocking next release for
my app; thank you!


Reply to this email directly or view it on GitHubhttps://github.com//pull/81#issuecomment-24759912
.

@pkulchenko
Copy link
Contributor Author

Hi Diego,

This sounds good; thank you. So, just to confirm, (1) the TCP behavior is correct with the patch. (2) the UDP problem is this: "call udp:sendto("::1", carzyport), then try a udp:receive(). It will return nil, "closed" immediately. This is because of the WSAECONNRESET." If this is the only but I'm going to see with the current fix, I can probably live with it. I'm trying to decide what the risk is with patching it myself versus waiting for your proper fix (I'll definitely upgrade in either case when the fix is out).

May be easier to simply pass it along to the function call? After all, the callee knows the type.

Can you try this as you suggested, instead of splitting into two functions?

@pkulchenko
Copy link
Contributor Author

Hi Diego, can you confirm that the TCP behavior is correct with the patch? I'd like to compile the code with my patch, but want to make sure that nothing else breaks. Thank you!

pkulchenko added a commit to pkulchenko/ZeroBraneStudio that referenced this pull request Sep 27, 2013
@diegonehab
Copy link
Contributor

I need to get a Windows machine up and running to test this. But the only
change in that send_ function was because of UDP. The old code worked fine
in TCP. I'll get a patch that fixes both of them as soon as I can.

On Tue, Sep 24, 2013 at 8:59 PM, Paul Kulchenko notifications@github.comwrote:

Hi Diego, can you confirm that the TCP behavior is correct with the patch?
I'd like to compile the code with my patch, but want to make sure that
nothing else breaks. Thank you!


Reply to this email directly or view it on GitHubhttps://github.com//pull/81#issuecomment-25052869
.

Copy link
Member

@alerque alerque left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR was opened before GitHub had their little "allow maintainers to update" feature. Can you merge from master to get Windows CI jobs running in this PR so we can confirm this is still correct?

@pkulchenko
Copy link
Contributor Author

@alerque, I rebased the changes against the current master and all the checks should be passing.

@alerque alerque merged commit 22b8202 into lunarmodules:master Nov 9, 2023
19 checks passed
@alerque
Copy link
Member

alerque commented Nov 9, 2023

Only one decade after submission. I'm sure there must me some slower moving projects out there.

@pkulchenko
Copy link
Contributor Author

Better later than never ;); thank you for merging!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants