Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Unix.descr_of_fd and Unix.descr_of_os #1990

Open
wants to merge 1 commit into
base: trunk
from

Conversation

@dra27
Copy link
Contributor

dra27 commented Aug 14, 2018

There are situations on Unix where a program knows that it will be invoked with certain known-fds connected (for example, status reporting in daemons). At present, the only way to obtain a Unix.file_descr for these is (Obj.magic (fd : int) : Unix.file_descr) which is not lovely.

This GPR adds Unix.descr_of_fd. This function is not terribly useful for Windows, since fds are an emulation provided by the C runtime, not a core concept. I've therefore also added Unix.descr_of_os which takes nativeint instead. On Unix, these functions are essentially the same, but on Windows this exposes the old win_alloc_handle_or_socket function.

Various notes at this stage:

  • @stedolan makes the valid point that mucking around with FDs in both C and OCaml can lead to weirdness - in other words, it should really be the case that either C or OCaml is viewed as "owning" an fd. This is part of the reason for my choice of _of_fd rather than _of_int (it makes the name a little confusing and increases the chance that the user might have read the comment in the docs...)
  • There's no reason to call Unix.descr_of_os on Unix which is why I've made no effort to validate the nativeint as being within the range of int. I'm tempted, given that there is no Unix system out there supporting either negative fds or fds larger than a 2^30 - 1 to state that the behaviour is undefined for nativeints outside the range of an int.
  • If this is accepted, I will add a test before it's merged.
should not be used from C. If access to the fd is also required in C, it is
better to use [dup] and maintain separate fds for C code and OCaml code
(this does not apply to C stub functions which should observe all the normal
caution of interacting with OCaml's Unix library. *)

This comment has been minimized.

Copy link
@Gbury

Gbury Aug 14, 2018

Contributor

I think there's a missing closing parenthesis here.

This comment has been minimized.

Copy link
@dra27

dra27 Aug 14, 2018

Author Contributor

So there is, thanks!

@nojb

This comment has been minimized.

Copy link
Contributor

nojb commented Aug 14, 2018

  • Personally, I would label these functions "unsafe", similar to unsafe_get and the like.
  • Sorry for the basic question, but can Windows HANDLEs have more than 32-bits ?
@dra27

This comment has been minimized.

Copy link
Contributor Author

dra27 commented Aug 14, 2018

@nojb - there's "nothing" unsafe about them, though? Elsewhere, I think there's always a "safe" version (or practice) for an "unsafe" function? Surely these are the safe version of Obj.magic 😉

HANDLE is always the same size as a pointer, hence using nativeint

@pmetzger

This comment has been minimized.

Copy link
Member

pmetzger commented Aug 15, 2018

They're memory safe. If you are on Unix at least and you try to read or write from a non-existent fd all that happens is you error. No memory is leaked or accidentally stepped on. What is true is that the functions are potentially dangerous in other ways (you could accidentally step on the wrong descriptor), but I'm not sure we have a good way to express that.

@xavierleroy

This comment has been minimized.

Copy link
Contributor

xavierleroy commented Aug 16, 2018

I designed the first version of the Unix library around 1992, for Caml Light. Since then, every other year someone suggests adding conversions between integers and the type of file descriptors, and every other year I explain that, no, file descriptors are not integers, because it doesn't make sense to add or multiply two FDs; file descriptors are best kept entirely abstract and obtainable only by opening files or creating pipes or sockets, or as stdin/stdout/stderr.

I haven't changed my mind. Put those functions in a 3rd-party library if you want, or continue using Obj.magic if you want, but I think those functions do not belong to the OCaml Unix library because they are not in the spirit of this library.

@dra27

This comment has been minimized.

Copy link
Contributor Author

dra27 commented Aug 16, 2018

@xavierleroy - OK, I find MPR#6948 from 2015. I'm not personally convinced of the pragmatism of the idea - however much fds may not behave like integers, the fundamental handle is still represented using one on both Unix and Windows, so it rather closes any communication with the outside world. I do wholeheartedly agree that the library should not encourage creation of these kinds of protocol, which the other functions suggested in that MPR lead to.

Especially given the portability disaster it creates, it seems totally crazy highly unfortunate to have anything which essentially recommends using Obj.magic. What would your opinion be to moving these officially in to the C API (cf. MPR#4589. At least then an external library would be adding external descr_of_os : int -> file_descr = "unix_descr_of_fd" which would be type-safe on both Unix and Windows? (incidentally, I dispute your wording in MPR#6948 - converting fd numbers to file_descr's on Windows does make sense, it's just unlikely to be useful without some weird contortions probably involving OCaml being used in a program written in C)

@pmetzger

This comment has been minimized.

Copy link
Member

pmetzger commented Aug 16, 2018

file descriptors are best kept entirely abstract and obtainable only by opening files or creating pipes or sockets, or as stdin/stdout/stderr.

The problem is that there are daemon spawning tools that expect to pass particular open descriptors in to the daemons they have opened, which the daemon is expected to know are open when it starts, so stdin/stdout/stderr aren't always sufficient.

@dra27

This comment has been minimized.

Copy link
Contributor Author

dra27 commented Aug 16, 2018

Sorry, I've re-read what I put above, and while it was most certainly wasn't intended "ad hominem", I dislike the way "totally crazy" reads above.

@dra27

This comment has been minimized.

Copy link
Contributor Author

dra27 commented Aug 16, 2018

@pmetzger - while I don't agree with it overall, I take Xavier's point and the daemon example is the only one which springs to mind where there's definitely no workaround (for example, in many inter-process contexts, you'd much more readily expect all processes to be OCaml, so they'll either "know" the fds by virtue of fork or Marshal can be used if still necessary to recover the actual descriptor value).

I'm much more interested to know whether a C API, or primitives not actually declared in unix.mli would be OK for satisfying the niche cases.

@pmetzger

This comment has been minimized.

Copy link
Member

pmetzger commented Aug 16, 2018

I think it is indeed the case that this is only an issue for situations where your process is started up with some subset of file descriptors beyond the normal stdin/stdout/stderr (0, 1, and 2) already open. I've been trying to think of other situations, but in all of them that I've reasoned through, you can arrange the ocaml primitives wrapping the system calls to do the right thing.

@damiendoligez

This comment has been minimized.

Copy link
Member

damiendoligez commented Sep 25, 2018

I remember some Unix systems where it was customary to pass the user's tty (if any) as FD number 3.

I also remember writing some scripts where I redirected FD number 4 to stderr so that an inner process could redirect its stderr to it while some intermediate processes had their stderr redirected elsewhere, but these scripts should probably have been written in some other language anyway.

@pmetzger

This comment has been minimized.

Copy link
Member

pmetzger commented Sep 25, 2018

It occurs to me that if you pass an FD via a socket you also need a mechanism like this (or another one that did the same thing somehow.) I'm curious if @xavierleroy has ideas on how to have a cleaner way to do such things.

@dra27

This comment has been minimized.

Copy link
Contributor Author

dra27 commented Sep 25, 2018

@pmetzger - I don't think you need to break the abstraction on file descriptors for that - you just need the sendmsg function to be added. IIRC, like ioctl, that was difficult to imagine when the Unix library was originally written, but could now be solved quite neatly with a GADT.

@xavierleroy

This comment has been minimized.

Copy link
Contributor

xavierleroy commented Oct 16, 2019

One year and a few months later, I still "think those functions do not belong to the OCaml Unix library because they are not in the spirit of this library". Can I close this PR without ruffling too many feathers, or should we fight about it?

@dra27

This comment has been minimized.

Copy link
Contributor Author

dra27 commented Oct 16, 2019

@xavierleroy - we should not fight! As per my #1990 (comment), can these functions instead be officially part of the C API (and be documented as such), either here as actual C primitives (which could be used in an external in a third party library) or as official C functions which an external library could expect to be present?

I do think there is a case for having an official API which exposes the difference at a C level between the file_descr type on Windows vs "Unix"?

@xavierleroy

This comment has been minimized.

Copy link
Contributor

xavierleroy commented Oct 17, 2019

I'm not sure I interpret the two questions correctly, but let me try to answer.

  • If we are talking about C functions to manipulate OCaml's Unix.filedescr type from C code: there is already a good API in the Win32 version of unixsupport.h, correctly distinguishing between HANDLE, SOCKET, and CRT file descriptors. Maybe, by symmetry, you'd like some functions or macros in the Unix version of unixsupport.h that wrap Val_int and Int_val to make it clear how to build and use Unix.filedescr values in C stub code. Maybe some documentation comments too. This I can agree with.

  • If we are talking about C functions intended to be called from OCaml as external functions, I still disagree. The only sensible use case that was given in the discussion is the server infrastructure or script infrastructure that gives special meaning to file descriptors 3 or 4. This can easily be handled in an OCaml program by opening /dev/fd/3 or /dev/fd/4. (According to Stevens, "The /dev/fd feature was developed by Tom Duff and appeared in the 8th Edition of the Research Unix System. It is supported by SVR4 and 4.3+BSD.") Make it clear that your code is accessing a special file! There is no need to violate the abstraction of Unix.filedescr. Existing code that uses Obj.magic to break this abstraction is wrong but would remain wrong if we gave it a filedescr_of_int primitive.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants
You can’t perform that action at this time.