Skip to content
This repository

Switch to nonblocking sockets on Windows #79

Closed
wants to merge 8 commits into from

3 participants

joeyadams Holger Reinhardt Johan Tibell
joeyadams

This pull request addresses issue #75: Make network I/O interruptible on Windows. It works by switching to nonblocking sockets on Windows, and calling select() from a forked thread with a timeout.

Please review these commits. I'm not ready for them to be merged yet, as I still want to do more testing.

My goal is to get the http-conduit testsuite to pass on Windows. Currently, it fails (with or without my changes), producing output like this:

redirect
user error (accept: can't perform accept on socket ((AF_INET,Stream,0)) in status Closed)
  - ignores large response bodies FAILED [2]
user error (accept: can't perform accept on socket ((AF_INET,Stream,0)) in status Closed)

user error (accept: can't perform accept on socket ((AF_INET,Stream,0)) in status Closed)
1) no status message works FAILED (uncaught exception)
user error (accept: can't perform accept on socket ((AF_INET,Stream,0)) in status Closed)
GHC.IO.Exception.IOException (Network.Socket.ByteString.recv: failed (Unknown error))
user error (accept: can't perform accept on socket ((AF_INET,Stream,0)) in status Closed)

I suspect warp is closing sockets too soon, but I haven't found the root cause yet.

added some commits December 14, 2012
joeyadams Add SOCKET type alias, and use it in FFI signatures
Winsock doesn't use int for file descriptors, it uses SOCKET,
defined as:

    typedef UINT_PTR SOCKET;

This commit does not fix the incorrect use of CInt throughout the
package (for compatibility reasons), but it should make it easier to
do so in the future.
6e2428c
joeyadams Add Network.Socket.WinSelect (internal module) 0504fe2
joeyadams Add sockWaitRead/sockWaitWrite, and use it in wait guards d349cc7
joeyadams Implement retry logic in throwSocketErrorIfMinus1RetryMayBlock for Wi…
…ndows

Also, add throwSocketErrorIfMinus1RetryNoBlock.  This will be used to implement
RawIO.readNonBlocking and rawIO.writeNonBlocking
(RawIO is defined in GHC.IO.Device).
e294521
joeyadams Use non-blocking sockets on Windows 9561a22
Holger Reinhardt

I think I've found the bug that leads to all those "can't perform accept" errors: yesodweb/wai#129

added some commits December 21, 2012
joeyadams Make sure all entry points check for WSANOTINITIALISED on Windows
 * Update documentation of withSocketsDo:
   "the network package calls it automatically when needed".
   This documentation assumes that these changes will bump the
   network package to version 2.5

 * Protect initWinSock and shutdownWinSock with a global MVar lock,
   to make automatic withSocketsDo safe in a concurrent setting.

 * Add throwSocketErrorIfRetry

 * cbits/winSockErr.c: add WSA_NOT_ENOUGH_MEMORY,
   for getaddrinfo/getnameinfo

 * On Windows, use throwSocketErrorIfRetry for getaddrinfo/getnameinfo

   This is recommended by the MSDN documentation, since gai_strerror is
   not thread safe.  This also calls withSocketsDo if necessary.

 * Make the sockWait funcs call withSocketsDo if needed, in case a
   user uses them for sockets not created by the network package.

 * Check for WSANOTINITIALISED in SockFD's IODevice.ready

 * Use throwSelectError_ in connect's select1 call
8245d8a
joeyadams Fix hClose for SockFD
If we use Network.Socket.close, we get this error:

    <socket: 704>: hClose: user error (close: converted to a Handle, use hClose instead)

Whoops.  SockFD shouldn't call any socket functions that check the MVar.
That, or the user should just use hClose instead of hClose :-)

Also, don't use closeFdWith on Windows.  closeFdWith is currently a no-op
on Windows, but that could change in the future.
e1e3d59
Holger Reinhardt

I've tested your code and found a problem with Handles. Testcase here: https://gist.github.com/4359118

The error message I get is:

hPutBuf: invalid argument (non-positive length)

The error message comes from the RawIO instance for SockFD:

instance GHC.IO.Device.RawIO SockFD where
    read sfd ptr len
      | len <= 0  = ioError (mkInvalidRecvArgError loc)

    [...]

    write sfd ptr0 len0
      | len0 <= 0 = ioError (mkInvalidSendArgError loc)

It seems that some GHC library calls the write function with len0=0. If I fix the RawIO instance to allow lengths of 0 I get another error: "Todo: hPutBuf", which seems to come from GHC.IO.Handle.Text.writeChunk.
writeChunk tries to cast the SockFD to an FD which obviously fails.

joeyadams

Ugh, I was afraid this would happen. It looks like we'll have to fix this in base, and rope off SockFD with #ifdefs if the version of base is too old. For older base, we'll set the socket to blocking mode and use fdToHandle'.

Thanks for the testcase, @hreinhardt. I'll submit a GHC ticket demonstrating the problem with a dummy IODevice.

joeyadams

This was supposed to already be fixed: http://hackage.haskell.org/trac/ghc/ticket/4144 But, as your testcase demonstrates, it wasn't fixed all the way.

Johan Tibell
Owner

What's the status of this? Is it ready to be reviewed?

joeyadams

@tibbe: Everything but Handle support is ready to review. I'm still waiting for http://hackage.haskell.org/trac/ghc/ticket/4144 to be resolved. Afterward, I'll update this patch to rope off SockFD if base is too old.

Johan Tibell
Owner

Is the idea that you would conditionally define type SOCKET to be something different on Windows in a future change?

Right. For now, using CInt for compatibility.

Johan Tibell

Please document what this function does.

Johan Tibell

mask_ is scary. Please document exactly what's going on here and why it's safe.

Johan Tibell

Since this is inside mask_, how can we get an exception? Please document.

Johan Tibell

Does that mean that any called must wrap the call in mask_?

Johan Tibell

Please document this data type an its fields. What's going on with () -> TimeoutStep?

Johan Tibell

Why 60000? Why 60?

Johan Tibell

Document. Ditto for any other top-level entity.

Johan Tibell

Don't export things defined in standard modules. If you really need a <>, define it in a .Compat module.

Johan Tibell

Why can't this be done in Haskell using pokes? It looks like most things, if not all, things in this file could be done in Haskell.

Johan Tibell
Owner

This is really subtle stuff. We need a test that shows that interrupting stuff works on Windows with these changes.

Johan Tibell

Users might close sockets from other threads so I don't know if this will work. The I/O manager explicitly solves this problem on *nix, using a lock.

Johan Tibell
Owner

There's a lot going on in this change, which scares me a bit. We should try to make sure we affect the non-Windows code as little as possible.

Johan Tibell
Owner

Perhaps a good start would be to do a write-up in an email on how this is supposed to work from a bit higher-level.

joeyadams

Thanks for the review. I may have to put this project on hold for a while, to focus on school and finish a well-overdue contract.

In response to the review comments:

  • mask does not prevent interruptible operations (e.g. takeMVar, threadDelay) from being interrupted. I added a commit clarifying that with a couple comments.

  • TimeoutStep is indeed a bit of a mess. I'd like to use a lazy list here, but I'm afraid it'll end up being saved and causing a memory leak.

  • closeFdWith is already broken. If an operation starts to use an Fd and another thread closes it, one possible interleaving is that the Fd is closed, another file is allocated with the same descriptor number, then the first thread accesses the Fd, meaning it hits the wrong file.

    Accessors should instead take a lock that is not identified by an Fd, but is guaranteed to be unique by the runtime system (e.g. an MVar). We sort of do this already (close takes the lock before proceeding), but we don't do it for recv and send. We would need two locks, to allow simultaneous recv and send (this is one reason I wanted to make Socket a record, so we can make these kinds of changes while making it possible for existing code to adapt).

  • I initialize the Select1Data in C for a few reasons:

    • It was easier to code, for me at least.

    • It's safer. hsc2hs #poke emits a Storable b => Ptr a -> b -> IO (), meaning there's no check to ensure that we are initializing with the right type. By doing it in C, we can check that we're initializing tv_sec and tv_usec with the right types.

  • I've been using the testsuites of http-conduit and warp to hunt down issues. I'd rather wait until I can get those to pass before proceeding to write our own.

This bug is a bear to fix. To do it properly will break some programs on all platforms.

I guess we can close this pull request for now. I may write a blog post documenting the many challenges of making I/O interruptible on Windows.

Johan Tibell
Owner

I'll close this for now. Feel free to submit a new pull request if you want to pick this up again. Thanks.

Johan Tibell tibbe closed this October 08, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Showing 8 unique commits by 1 author.

Dec 20, 2012
joeyadams Add SOCKET type alias, and use it in FFI signatures
Winsock doesn't use int for file descriptors, it uses SOCKET,
defined as:

    typedef UINT_PTR SOCKET;

This commit does not fix the incorrect use of CInt throughout the
package (for compatibility reasons), but it should make it easier to
do so in the future.
6e2428c
joeyadams Add Network.Socket.WinSelect (internal module) 0504fe2
joeyadams Add sockWaitRead/sockWaitWrite, and use it in wait guards d349cc7
joeyadams Implement retry logic in throwSocketErrorIfMinus1RetryMayBlock for Wi…
…ndows

Also, add throwSocketErrorIfMinus1RetryNoBlock.  This will be used to implement
RawIO.readNonBlocking and rawIO.writeNonBlocking
(RawIO is defined in GHC.IO.Device).
e294521
joeyadams Use non-blocking sockets on Windows 9561a22
Dec 21, 2012
joeyadams Make sure all entry points check for WSANOTINITIALISED on Windows
 * Update documentation of withSocketsDo:
   "the network package calls it automatically when needed".
   This documentation assumes that these changes will bump the
   network package to version 2.5

 * Protect initWinSock and shutdownWinSock with a global MVar lock,
   to make automatic withSocketsDo safe in a concurrent setting.

 * Add throwSocketErrorIfRetry

 * cbits/winSockErr.c: add WSA_NOT_ENOUGH_MEMORY,
   for getaddrinfo/getnameinfo

 * On Windows, use throwSocketErrorIfRetry for getaddrinfo/getnameinfo

   This is recommended by the MSDN documentation, since gai_strerror is
   not thread safe.  This also calls withSocketsDo if necessary.

 * Make the sockWait funcs call withSocketsDo if needed, in case a
   user uses them for sockets not created by the network package.

 * Check for WSANOTINITIALISED in SockFD's IODevice.ready

 * Use throwSelectError_ in connect's select1 call
8245d8a
joeyadams Fix hClose for SockFD
If we use Network.Socket.close, we get this error:

    <socket: 704>: hClose: user error (close: converted to a Handle, use hClose instead)

Whoops.  SockFD shouldn't call any socket functions that check the MVar.
That, or the user should just use hClose instead of hClose :-)

Also, don't use closeFdWith on Windows.  closeFdWith is currently a no-op
on Windows, but that could change in the future.
e1e3d59
Jan 10, 2013
joeyadams Network.Socket.WinSelect: clarify a few things re async exceptions 9b860a5
Something went wrong with that request. Please try again.