Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Windows: Exception raised when reading from a socket #5578

Closed
vicuna opened this Issue Apr 6, 2012 · 23 comments

Comments

Projects
None yet
1 participant
@vicuna
Copy link
Collaborator

vicuna commented Apr 6, 2012

Original bug ID: 5578
Reporter: @alainfrisch
Status: closed (set by @xavierleroy on 2015-07-24T08:38:58Z)
Resolution: fixed
Priority: normal
Severity: major
Fixed in version: 4.00.0
Category: platform support (windows, cross-compilation, etc)
Related to: #4466 #5325
Monitored by: @dra27

Bug description

In recent versions of the trunk, the following piece of code raises an exception on input_line:

let () =
let sock = Unix.socket Unix.PF_INET Unix.SOCK_STREAM 0 in
Unix.setsockopt sock Unix.SO_REUSEADDR true;
Unix.bind sock (Unix.ADDR_INET (Unix.inet_addr_any, 8090));
Unix.listen sock 10;
let (client, _) = Unix.accept sock in
Unix.set_close_on_exec client;
let ic = Unix.in_channel_of_descr client in
print_endline (input_line ic)

(compiled with: ocamlc -custom -o test_http.exe unix.cma test_http.ml
to trigger the error, just connect to 127.0.0.1:8090 (with telnet, a browser, ...)

The exception is:

Fatal error: exception Sys_error("Invalid argument")

I've tested this with the trunk (msvc and mingw ports). I'm pretty sure this used to work with previous versions.

Replacing input_line with Unix.recv seems to work.

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Apr 6, 2012

Comment author: gerd

As far as I remember, the issue always existed - the Ocaml runtime does not make an attempt to recv instead of read from a socket. So it depends only on Windows whether this works. What I can imagine is that some (newer) Windows versions support it, and older don't.

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Apr 7, 2012

Comment author: @xavierleroy

Could be related to the changes for #5325.

For the record: under Win32, Unix.file_descr values carry a "socket" / "not a socket" flag, which enables functions such as Unix.read to adapt their behavior, calling ReadFile for non-sockets and recv for sockets.

The buffered I/O system (types in_channel and out_channel), however, calls into the read() and write() functions of the C library, which Microsoft's C runtime library implement using ReadFile and WriteFile, without making a special case for sockets.

So, the issue boils down to whether ReadFile works correctly when given a socket handle. The answer seems to be "yes" on modern Windows versions (it was "no" under Windows 95/98), provided the socket is in synchronous mode. And that's where the changes described in #5325 could be problematic.

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Apr 7, 2012

Comment author: @dra27

Are the changes in #5325 definitely the problem? That was about ensuring that the socket was definitely in synchronous mode? I too have seen this with my Win32 patched version of 3.12.1 and it definitely did work in previous versions of OCaml (I only hadn't raised a bug report because I hadn't had a chance yet to try it on a "clean" 3.12.1 installation).

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Apr 7, 2012

Comment author: @xavierleroy

My understanding is:

  • for Alain Frisch's example to work, the socket must be in synchronous mode (cf. MSDN doc for ReadFile())
  • we used to have SO_OPENTYPE hackery in win32unix/, probably to ensure that all sockets are created in synchronous mode;
  • the hackery was removed as part of fixing #5325, and now all sockets are created in the default asynchronous mode (cf. MSDN page for socket()).

All this is guesswork to be confirmed by experiments. Any experimental data is most welcome.

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Apr 10, 2012

Comment author: @alainfrisch

I confirm that reverting socket.c and accept.c to r11965 fixes the current issue.

Does the current Unix interface make it possible to put a socket in synchronous mode?

Would it make sense to keep the socket flag in the buffered I/O system channel structure?

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Apr 10, 2012

Comment author: @xavierleroy

Does the current Unix interface make it possible to put a socket in synchronous mode?

AFAIK, Win32 provides no way to change the synchronous/asynchronous mode of a socket or a file handle after it's been created. This mode can only be specified when the socket is created or the file is opened.

Would it make sense to keep the socket flag in the buffered I/O system channel structure?

By itself it wouldn't help because the buffered I/O code is shared between Unix and Win32 and uses read() and write() for I/O, which Microsoft's C runtime implements using ReadFile() and WriteFile(). At a minimum you'd have to #ifdef in byterun/io.c and improve on MSVCRT's read()/write() emulation.

At this point, I'm tempted to revert #5325 to fix this #5578 bug, on the ground that #5325 is less critical than #5578.

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Apr 10, 2012

Comment author: @alainfrisch

This mode can only be specified when the socket is created or the file is opened.

Just out of curiosity, is there even a clean way to specify this mode, e.g. with "accept"? The current approach looks like a (thread-unsafe, if not protected) hack.

At this point, I'm tempted to revert 0005325 to fix this 0005578 bug, on the ground that 0005325 is less critical than 0005578.

I agree. (Disclaimer: I'm strongly biased, we don't do multi-threading.)

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Apr 10, 2012

Comment author: gerd

When I understand it correctly, ReadFile behaves differently when it gets an async file handle (like a socket - unless we set the socket to synchronous). I guess the "Invalid argument" is due to the missing OVERLAPPED structure ReadFile needs to process async handles.

Wouldn't be the clean solution to just handle async reads? This KB article explains how to wait for the completion of an async request: http://support.microsoft.com/kb/156932/en-us . We only need to call GetOverlappedResult. This would be simple enough (if it really worked).

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Apr 10, 2012

Comment author: @alainfrisch

the missing OVERLAPPED structure ReadFile needs to process async handles.

The I/O code in byterun/ relies on libc read/write functions, which probably calls ReadFile/WriteFile without an OVERLAPPED structure. As Xavier wrote, a proper solution might indeed be to call directly the Win32 functions, which would allow us to pass do whatever is the right solution under Windows. This is a non-trivial change. Now we just need a volunteer to propose and test a patch! Xavier, can you confirm that such an approach would have chances to be accepted?

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Apr 11, 2012

Comment author: @dra27

Alain - yes, there is a clean way. SO_OPENTYPE is deprecated - WSASocket should be used instead with a value of 0 for dwFlags (i.e. WSA_FLAG_OVERLAPPED not included). I haven't tried it, but looking at the documentation you should then use AcceptEx instead of accept/WSAAccept as that allows you to create the socket yourself (so you can create a non-overlapped socket). It doesn't look like WSAAccept can do that directly.

Might an alternative fix be to include either extra functions or add an optional parameter to the Win32 Unix modules accept and socket functions so either accept_async and socket_async - which create synchronous but threadable sockets (or Unix.socket: ?async:bool -> Unix.socket_domain -> Unix.socket_type -> int etc., with the default being false).

But, I'd be happy to have a stab at "fixing" io.c - it looks to me as though caml_do_read and do_write are the only two functions requiring the extra work? Wrapped in #ifdef WINDOWS, if errno is EINVAL then we want to attempt either recv or send instead (_get_osfhandle maps the CRT FD back to the Win32 socket handle) - both will return WSAENOTSOCK for an actual EINVAL (i.e. for an invalid FD) so we'd just need to call caml_sys_io_error in that instance.

That does mean that reading/writing through the CRT becomes even more inefficient than it was before - but accessing sockets in that way under Windows is more about compatibility because ReadFile is already a considerably slower way of reading a socket (according to MSDN). And, frankly, if one were in the business of trying to write a high performance sockets application for Windows in OCaml, you'd write your own bindings for Winsock, rather than using the Unix module...

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Apr 11, 2012

Comment author: @alainfrisch

dra: thanks for all the info. I suggest you wait for some feedback from Xavier before writing the patch.

I agree that degrading performance when working with sockets under Windows through OCaml channels is not a big deal. But (out of curiosity, mainly), if we keep in the channel structure the fact that a filedescr is a socket, we should be able to call directly recv/send directly, right? Why would this be slower than the current situation?

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Apr 11, 2012

Comment author: @dra27

I guess it could be done that way (it certainly wouldn't be slower, I agree) - the "problem" is that caml_do_read is used in lots of places (although a quick grep reveals that it's only in fact in io.c).

Although caml_do_read and do_write take an fd, they're in fact only ever used in the context of a channel so the signature could be safely altered to pass the channel structure instead for the Windows version. Of course, a certain amount of care is required as caml_do_read is an exported primitive.

Let's wait and see what Xavier says... it's obviously beneficial not to hurt performance where possible, but that will involve peppering io.c with more #ifdefs!

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Apr 11, 2012

Comment author: gerd

dra: not sure whether this is the right way. We have currently two options:

(a) Stick to non-overlapped sockets, and hope that read() works
(b) Use overlapped sockets, and replace read() with ReadFile + postprocessing for the overlapped case

I see several problems with (a). Some socket functionality seems to be unavailable. MSDN mentions the socket timeouts (SO_RCVTIMEO, SO_SNDTIMEO) on the page for WSASocket. It is unclear what this means exactly. At least Unix exports these options. The other problem is the strange AcceptEx function, which is not directly available, and seems to have restrictions. "The AcceptEx function uses overlapped I/O, unlike the accept function." This sounds as if you cannot use it with non-overlapped sockets at all (but who knows). There is also a strange sentence that the accept socket can only be passed to certain functions, but the whole description is contradictory, because the page mentions other functions like getpeeraddr and CreateIoCompletionPort. All in all, the description for AcceptEx sounds as if this function is not for general use, but only for specific cases even Microsoft does not know exactly.

I think (b) would have fewer uncertainties. You would only have to get the file handle from the fd (get_osfhandle), and then call ReadFile/WriteFile directly - i.e. we develop our own versions of read/write that can deal with overlapped I/O.

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Apr 11, 2012

Comment author: @dra27

I agree - option b) is the best. It's the only way to get threading working properly (which is what was fixed in PR 5325 but has then caused this PR).

The WSASocket/AcceptEx stuff was just what "should" be being done if you wanted non-overlapped sockets using "modern" Winsock calls as MSDN lists SO_OPENTYPE as deprecated - it wasn't clear which version of accept/WSAAccept/AcceptEx is required to create non-overlapping sockets. It may just be that there's a gap in the API (Microsoft release an incomplete API: never!!)

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Apr 18, 2012

Comment author: @alainfrisch

I mark this issue as "block", because I really think we need to make a decision before the next release.

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented May 24, 2012

Comment author: @xavierleroy

Reverted patch for #5325 in version/4.00, so that the present issue is not present in the upcoming 4.00.0 release. I leave this PR open because we still need to do something about it in a later release. Downgrading the "block" status to "major" as a consequence.

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented May 24, 2012

Comment author: @xavierleroy

My thoughts on how we could address this issue. It means reimplementing bits of Microsoft's C runtime library and make them better Here is a plan of action:

byterun/io.c: in do_read and do_write, #ifdef the calls to read() and
write(), replacing them by calls to caml_win32_read and
caml_win32_write if _WIN32 is defined.

byterun/win32.c: implement the caml_win32_{read,write} functions as
follows (taking read as an example):

Minimal:
call read(fd, buf, len) // from the C runtime library
if error, call recv(_get_osfhandle(fd), buf, len, 0)

Problem: if the fd is in O_TEXT mode, read() converts the text,
but recv() does not. Do we live with this or go out of our way to fix
it? Is there any use for O_TEXT mode over a socket? Can we just report an error if [set_binary_mode_in ic false] is ever called on a channel obtained from a socket? Opinions from RFC experts are welcome.

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Jun 6, 2013

Comment author: @alainfrisch

Xavier: do you remember what's the current status of this on the trunk (according to the note, you only reverted in version/4.00)? It seems the proper fix won't be done before the next release, and we should at least ensure that we don't introduce a regression here.

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Jun 14, 2013

Comment author: @alainfrisch

There seem to be some confusion here. The original patch (commit 11966) was undone by Xavier (commit 12480) on branch 4.00 only, with the hope we would reach a better solution on the trunk. This commit was imported by Damien (commit 12720) while merging changes from 4.00 onto the trunk, and he later undid it (commit 13743), because that change was not intended to be on the trunk. I've undid that last commit on 4.01 (commit 13779) so that we don't get a regression in the next release. So, to summarize: 4.01 has today the same old version (4.00 and before); trunk has still the one which fixes #5325 but breaks #5578.

I propose to undo the original patch from the trunk as well, so that we don't replay the same vaudeville again. Keeping #5325 open should be enough to remember about it, and this would reduce the risk of introducing a regression in a future release. I'll do it in a few days unless someone objects to it.

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Jun 14, 2013

Comment author: @alainfrisch

Pushing to 4.02 now that 4.01 is out of danger.

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Jul 1, 2013

Comment author: @alainfrisch

As said, commit 13860 re-undoes commit 11966 (on trunk). The trunk is now synchronized with 4.01 and free of this bug #5578.

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Aug 1, 2013

Comment author: @xavierleroy

If I correctly followed the vaudeville, this PR was fixed both in 4.00 and in 4.01 even though it reappeared somewhere in between. I'm marking it resolved. The one that remains open in 5325.

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Sep 17, 2013

Comment author: waern

In case this information is helpful to anyone (perhaps in making a decision on how to proceed): I did a naive test implementation of xleroy's plan of action and it seems to work fine together with the fix of #5325.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.