Dealing with raw file descriptors #2729

Closed
piscisaureus opened this Issue Feb 9, 2012 · 32 comments

Member

piscisaureus commented Feb 9, 2012

People have been complaining about the fact that functionality for working with raw file descriptors got removed / deprecated in node 0.6. Examples are the net.listenFD method and the customFDs option for child_process.spawn.

The purpose of this issue is to track (valid) use cases that cannot be solved easily.

So far, I've seen these problems:

Owner

bnoordhuis commented Feb 9, 2012

Need to listen() on FD 4 to interact with systemd

Reasonable.

Need to have duplex stdout (#1940) to work with qmail_queue

I'm somewhat reluctant to add a hack for a single program. This particular issue could be solved with a 30 line native add-on.

baudehlo commented Feb 9, 2012

Re: qmail-queue - It still required internals access to get at pipe(), which it would be nice to expose.

To be honest, the best way of "fixing" my problem with qmail-queue without going down the wrapper route would be to get the pid and write to /proc/<pid>/fd/0 and /proc/<pid>/fd/1... Assuming that would work. I haven't tried it yet. And yes, I know this restricts me to Linux, but none of this stuff is portable anyway.

On the other hand, what you call a hack is just how Unix works - it's not up to you to say that a particular fd number behaves in a particular (readable or writable) way. Is there a reason making the current pipe()s into socketpair()s wouldn't fix the problem?

orlandov commented Feb 9, 2012

I object to calling this functionality a hack. Using file descriptors is
a common practice in the UNIX world, and forcing anyone who wants to use them
in a sane way from JavaScript to write their own C++ addon places an onerous
burden on the author for something that could be implemented with quality,
once, from Node core. Developing and testing C++ for correctness is not
trivial, particularly when the functionality is required to be asynchronous as
is quite often the case in Node. At Joyent we've used listenFD and customFds
with great success and it really spoke to the modularity and composability of
Node.

Sure, I could write an addon to do that, but many others simply can't or won't
and they may just move onto a runtime where this isn't an issue. Writing a Node
addon in C++ is not the walk in the park you make it out to be. It defeats the
purpose of using Node, which is to rapidly develop a network application that
performs well. Well, file descriptors are at the heart of IPC. If I now need to
write C++ to do the most trivial of things, why wouldn't I just write the whole
thing in a low-level language and save myself the mental context switching? One
of the reasons Node became popular is it allowed us to leverage low-level
facilities in an easy and straightforward way.

Many language runtime environments strike a good balance between cross-platform
support and at the same time allowing the salient features of the operating system to
shine through. Striving for the lowest common denominator is a sure-fire path
to mediocrity.

Owner

bnoordhuis commented Feb 9, 2012

it's not up to you to say that a particular fd number behaves in a particular (readable or writable) way

It may not be up to me but there's almost 40 years of prior art. qmail-queue is the only program I know that behaves this way. I wonder how you drive it from a shell script?

Is there a reason making the current pipe()s into socketpair()s wouldn't fix the problem?

No, but it'd require quite a few changes to libuv plus extensive regression testing. That's a lot of effort to accommodate a single contrarian program. In other words, don't hold your breath. :-)

Things I'm missing from 0.4 include:

  • server.listenFD
  • customFds
  • dup2

I have used customFds to open an interactive subprocess (a login shell) that I want to spawn from node. To do this in 0.4 I used:

child = spawn(cmd, args, {customFds: [process.stdin, process.stdout, process.stderr]});

which worked great on 0.4 but failed silently in 0.6. In this case I was able to work around the issue by changing the code to:

child = spawn(cmd, args, {customFds: [0, 1, 2]});

but as I understand it other descriptors are not supported. Qmail is a good example of a tool that needs multiple descriptors (qmail-send for example needs 6 connected when it starts) but another example is gpg which can take a file descriptor for reading the passphrase as an argument. See this:

for an example of someone doing this same thing with python. There are many more programs that might need additional arbitrary descriptors open. My understanding is that with node 0.6 there's no way to do this short of writing an add-on.

In general I think there's really no reason that node shouldn't be able to let me create a child process and connect arbitrary file descriptors which I may have opened from a file or another subprocess or whatever to my new child. Python is a good example since it is also cross-platform and works on Windows.

Python uses file descriptors in many places so that developers who need to use the system on which they are running are able to. I think not exposing the system (file descriptors is just one example) to developers unless they write add-ons is a mistake. As someone who writes primarily system level software in node, I really want the system exposed to me. I've already found myself not using node when node doesn't expose some system facility I need (I've tried the C++/add-on route, and it's not worth it to me).

Owner

bnoordhuis commented Feb 9, 2012

@orlandov: Are we talking about the same thing? I don't object to reintroducing listenFD() (the general concept anyway, maybe not that particular API) but I question the usefulness of full-duplex stdio for the reasons stipulated above.

orlandov commented Feb 9, 2012

I'm talking about the broad trend I have noticed where features are being removed from Node because feature parity cannot be achieved with Windows. I understand what you are saying, but it's not for us to say what is or isn't useful for software to be able to do, particularly when users are trying to interface with third party systems. As noted above, there are many valid use cases.

I wonder how you drive it from a shell script?

In most shells you can redirect arbitrary file descriptors. E.g., with bash this is very simple:


turbo:~ joshw$ cat test.c 
#include <stdio.h>

int main()
{
    char string0[100], string1[100];
    FILE *fd0, *fd1;

    fd0 = fdopen(0, "r");
    fd1 = fdopen(1, "r");

    fgets(string0, 100, fd0);
    fgets(string1, 100, fd1);

    fprintf(stderr, "fd0: %s", string0);
    fprintf(stderr, "fd1: %s", string1);

    fclose(fd0);
    fclose(fd1);
}
turbo:~ joshw$ gcc test.c
turbo:~ joshw$ cat file0
hello0
turbo:~ joshw$ cat file1
hello1
turbo:~ joshw$ cat test.sh
#!/bin/bash

./a.out 1<file1 0<file0
turbo:~ joshw$ bash test.sh
fd0: hello0
fd1: hello1
turbo:~ joshw$ 

ghost commented Feb 10, 2012

How about encapsulating the concept of fds/handles as js objects in node. Streams are abstractions for interacting with external IO sources. Handles can be abstractions for transportable references to those resources, however that works on a given OS. Node would know how to instantiate a Handle from a system input, know how to consume it for a given operation or report an error if it's not appropriate, and how to serialize it back into something the system and external programs can use.

Member

piscisaureus commented Feb 10, 2012

@joshwilsdon

About customFDs: we recognize the need to make child processes inherit their parent's stdio. That's why we allow customFDs 0, 1 and 2. About dup2: You are not missing it from node4 since node4 didn't have dup2. Also you are not providing a use case.

Member

piscisaureus commented Feb 10, 2012

People, please: I was trying to track valid situations where access to raw file descriptors is needed in node. Please do not tell me what's removed since 0.4 (we know what we removed) or how much you're in love with unix.

Quoting myself:

The purpose of this issue is to track (valid) use cases that cannot be solved easily.

Member

piscisaureus commented Feb 10, 2012

http://github.com/mcavage/node-zsock

This should probably be done without relying on listenFD, e.g. the compiled addon should just create a StreamWrap.

Another use case is connecting processes and files via unnamed pipes with customFds. It avoids unnecessary memcpy'ing via Stream#pipe, and it's also unbuffered.
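A minimal sketch of the file case, assuming a Node.js release that has the `stdio` option of `child_process` (which postdates this comment): an integer file descriptor placed in the `stdio` array is handed to the child directly, so the child's output lands in the file without passing through a stream in the parent. The temporary file path and the child command are illustrative only.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { spawnSync } = require('child_process');

// Illustrative target file in a fresh temporary directory.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'fd-demo-'));
const outPath = path.join(dir, 'child.out');
const fd = fs.openSync(outPath, 'w');

// The child's stdout (fd 1) is the file itself; stdin is closed off and
// stderr is shared with the parent.
const result = spawnSync(
  process.execPath,
  ['-e', "console.log('written by child')"],
  { stdio: ['ignore', fd, 'inherit'] }
);

// The parent still owns its copy of the descriptor and must close it.
fs.closeSync(fd);

const written = fs.readFileSync(outPath, 'utf8');
console.log(JSON.stringify(written));
```

The parent never reads or copies the child's output here; it only opens the file and passes the descriptor along.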

polotek commented Feb 10, 2012

@piscisaureus I think you are getting valid input. But you are also seeing people's frustrations with what is happening with node lately. I think you and other core guys are missing the forest for the trees. It's great that you guys are asking for feedback. But part of the feedback you are getting is that your community is not happy with the way these requests are being treated.

This isn't about supporting every random edge case. It's about people trying to utilize the expected behavior of the OS like they always do, and node is hiding that functionality or breaking those expectations. I know this has gotten more complicated since Windows came into the picture. But you guys have to know that unix/linux deployment is still going to be the vast majority for node. Please don't act like that means nothing.

You're obviously right on dup2. I must have thought it was there because it's something I'd expect to be there. I must have seen it in an add-on.

I gave the use-case of gpg to go along with the qmail-queue example already described. What sort of example are you looking for? Another program that needs to be able to use a file descriptor other than just reading on 0 and writing to 1 and 2? Or some node code that forks a process and connects that process's stdout to the new child's stdin?

I'm surprised adding support for using file descriptors in the natural way is so controversial. As you've pointed out there are "40 years of prior art" of people using file descriptors.

How about encapsulating the concept of fds/handles as js objects in node.

+1 for taking a similar approach as Perl and its very successful portability

Owner

bnoordhuis commented Feb 10, 2012

Please take it to the mailing list, guys. The issue tracker is not the place for this.

Umm, we were kind of asked to comment here... But I don't mind doing it in the other place too.

Honestly though it feels like we've given N valid reasons for wanting this feature, and the core devs still want to say no, which is a bit frustrating to say the least.

@piscisaureus Node does I/O, event-driven I/O. Great. Unix abstracts I/O through file descriptors. If you don't provide access to file descriptors, then what will you provide? Abstractions around every distinct use of file descriptors? Madness lies that way. And it's not just Unix. Windows too has file descriptors (well, file handles).

But let's ask instead what the problem is with providing a reasonable abstraction for file descriptors/handles, and let's also look at what a reasonable abstraction would be. I think dealing with file descriptor numbers is not important most of the time (except when doing I/O redirection). That means that having an object interface to descriptors would be just fine.

polotek commented Feb 10, 2012

@bnoordhuis This discussion started on the mailing list. I thought this issue was an indication that you wanted to address these concerns. But so far you've used this issue as a way to collect and summarily dismiss almost all the use cases. Now you want to restrict people's comments based on some vague notion of what's "appropriate" for github? Some comments here are a little confrontational, but still constructive and firmly on topic. It feels pretty clear that you don't want to be having this discussion at all. I'm just waiting for your infamous, patented "No, this is closed".

Owner

bnoordhuis commented Feb 10, 2012

@nicowilliams: Mailing list.

Owner

bnoordhuis commented Feb 10, 2012

@polotek: This is an issue to track progress, not a bike shed. I confess I'm getting slightly piqued.

isaacs commented Feb 10, 2012

The intent of this issue was originally to try to find unaddressed use cases that we could track. Some of those we may decide are just outside the scope of node, but I think the goal here should be to figure out what they are first.

I agree with @bnoordhuis that the more abstract discussions, and especially the explorations of solutions, should probably be taken to the mailing list or addressed elsewhere. However, the original request for information was not articulated as clearly as it could have been (especially by me, in a tweet), and so in retrospect, the resulting exploration should probably have been expected.

I would also like to address the "least-common-denominator/windows resentment" issue elsewhere. It's important, but this isn't the place. Look for a mailing list post coming soon.

So, let's move forward here. Talking about a solution is pointless until we have a good handle (ha) on the problem.

To recap what we have so far:

  1. Sharing stdio between parent and child (addressed with current customFds: [0,1,2] and future stdio: 'inherit')
  2. Providing stream objects for child proc stdio (addressed with current customFds:[-1,-1,-1] and future stdio: 'stream')
  3. Piping child proc stdio to /dev/null (not addressed currently, addressed with future stdio: 'null')
  4. Opening FD 4 for systemd. (not addressed)
  5. Providing an arbitrary FD for gpg to read a passphrase on (not addressed)
  6. qmail-queue, which needs 6 file descriptors opened for it and attached at startup time. (not addressed, perhaps uncommon enough to justify requiring a binary addon or some other "extra" hoop-jumping rather than make the API handle it)
  7. Node's internal use of fd 3 for the message passing to forked children. (Doesn't actually require arbitrary fd handling, but is relevant to stdio stuff.)

Is there anything else that belongs on this list?
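Items 1-3 can be sketched as follows, assuming the `stdio` option as documented in current Node.js, where the values are spelled `'inherit'`, `'pipe'` and `'ignore'` (the `'stream'` and `'null'` names in the recap were proposals at the time). The child command is a stand-in.

```javascript
const { spawnSync } = require('child_process');

// Stand-in child: prints one line on stdout.
const childArgs = ['-e', "console.log('hello from child')"];

// 1. Share the parent's stdio with the child (like customFds: [0, 1, 2]).
const shared = spawnSync(process.execPath, childArgs, { stdio: 'inherit' });

// 2. Give the parent pipes for the child's stdout and stderr
//    (like customFds: [-1, -1, -1]); spawnSync collects what was written.
const piped = spawnSync(process.execPath, childArgs, {
  stdio: ['ignore', 'pipe', 'pipe'],
});
console.log('captured:', JSON.stringify(piped.stdout.toString()));

// 3. Discard everything the child writes, as if redirected to /dev/null.
const discarded = spawnSync(process.execPath, childArgs, { stdio: 'ignore' });
```

The asynchronous `spawn()` takes the same `stdio` values; `spawnSync()` is used here only to keep the sketch short.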

Not to contradict what I just said and start bikeshedding solutions, but any solution to this issue should have the following characteristics (along with the usual rubrics of parsimony, readability, etc.):

  1. Extremely simple for the most common use cases, to the point of guessing the usually-right behavior as a default most of the time. (Especially, 1-3 above.)
  2. Support the things that unix users are used to being able to do, and as much as possible, work properly on Windows.
  3. Fail immediately with ENOTSUP whenever portability is truly impossible - but this is a last resort.
  4. Performance regressions will not be tolerated.

To the degree that 1 and 2 are maximized, and 3 is minimized, a solution will be considered good.

There will be very little need for bikeshedding if we can first establish what we're trying to actually do. If there are use cases that aren't mentioned already, please mention them.

Member

mikeal commented Feb 10, 2012

I'm going to chime in here.

GitHub is an amazing communication system because all of the communication it facilitates is in a common language of contribution. It lends itself incredibly well to that. However, it breaks down tremendously when the communication strays away from what we can describe in terms of contribution.

Asking for use cases is something accomplished much better on the mailing list. The kind of feedback you want, while not about code in core directly, would be better suited for the node-dev list. While the node user mailing list is a breeding ground for bike shedding, the dev list has stayed quite focused and this sort of discussion is well suited for it.

Even if the conversation strays from use cases on the mailing list it's much easier to ignore that feedback when everyone who follows node isn't getting notifications about it.

How do other languages handle the ability to do such things? Specifically: lua, python, ruby?

In Ruby you can $stdout.reopen to dup2, and you can fork() and exec(), so you can easily control what filehandles are doing in a child process. Similarly for Perl.

fork() is evil; use spawn() (posix_spawn())

I recently added some functionality to phantomjs in order to capture screenshots of a website through nodejs. Previously the only way to do it was to have phantomjs save a file to disk, then read that file and later delete it. I updated it so that it could use a custom file descriptor to directly pass the image buffer to nodejs. Unfortunately the latest version of node doesn't seem to support the ability to communicate with a process through a custom file descriptor. I can't just use stdout either, because other stuff such as debug messages gets output on that file.

isaacs commented Feb 15, 2012

@wallslide Where does the custom FD show up as in the child? 3, 4? Or as stdout or stderr?

It's a user-definable parameter; however, it is best to use an FD that doesn't correspond to the standard 0-2, so 3 or higher is best. This is because the goal is to pass a buffer holding an image, and if we use stdout or stderr to do so, there's a high potential for messages from upstream in the project (debug messages, error messages, standard console.log messages) to be interleaved with the binary image buffer I'm trying to capture, making it useless. The command-line usage is as below:

phantomjs renderfd.js fileDescriptorNumber urlToRender fileType

Therefore to render the google.com webpage as a PNG, output that image to an FD of 3, and save it as google.png the command would be:

phantomjs renderfd.js 3 http://www.google.com PNG 3> google.png
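On the Node side, the same separation can be sketched with an extra `'pipe'` entry in the `stdio` array, assuming a Node.js release with the `stdio` option of `child_process` (later than the version discussed here). The child below is a stand-in for `phantomjs renderfd.js 3 ...`, not phantomjs itself, and the payload string is a placeholder for image bytes.

```javascript
const { spawnSync } = require('child_process');

// Stand-in child: chatters on stdout, writes the real payload to fd 3.
const childScript = `
  console.log('debug: rendering...');
  require('fs').writeSync(3, 'PNG-BYTES');
`;

const result = spawnSync(process.execPath, ['-e', childScript], {
  // Index in this array = fd number in the child.
  // fd 0 closed off, fd 1 captured, fd 2 shared, fd 3 captured separately.
  stdio: ['ignore', 'pipe', 'inherit', 'pipe'],
});

// output[n] holds whatever the child wrote to fd n (when fd n is a pipe).
const debugOutput = result.output[1].toString();
const payload = result.output[3];

console.log('debug channel:', JSON.stringify(debugOutput));
console.log('payload bytes:', payload.length);
```

With the asynchronous `spawn()`, the extra pipe is available as a stream at `child.stdio[3]` instead of a collected buffer.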

Owner

bnoordhuis commented Jul 28, 2012

Closing, this has been implemented in 0.8.
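For the listen-on-an-inherited-socket use case, a hedged sketch follows. It assumes the systemd socket-activation convention (`LISTEN_PID` and `LISTEN_FDS` environment variables, inherited sockets numbered from fd 3) and the `server.listen({ fd })` handle form documented in current Node.js. The thread does not say which API 0.8 exposed for this, and the helper name `inheritedListenFds` is hypothetical.

```javascript
const net = require('net');

// systemd passes inherited sockets starting at this descriptor number.
const SD_LISTEN_FDS_START = 3;

// Hypothetical helper: list the fds handed to this process, or [] if the
// environment does not describe sockets meant for this pid.
function inheritedListenFds(env, pid) {
  const count = Number.parseInt(env.LISTEN_FDS || '', 10);
  if (env.LISTEN_PID !== String(pid) || !Number.isInteger(count) || count < 1) {
    return [];
  }
  return Array.from({ length: count }, (_, i) => SD_LISTEN_FDS_START + i);
}

const fds = inheritedListenFds(process.env, process.pid);
if (fds.length > 0) {
  // Adopt the already-bound, already-listening socket instead of binding a port.
  const server = net.createServer((socket) => socket.end('hello\n'));
  server.listen({ fd: fds[0] });
} else {
  console.log('no inherited sockets; not started by a socket activator');
}
```

Listening on a file descriptor is documented as unsupported on Windows, so this path is POSIX-only.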

bnoordhuis closed this Jul 28, 2012
