find a way to make `psub --fifo` safe from deadlock #1040

Open
geoff-codes opened this Issue Oct 14, 2013 · 20 comments

Contributor

geoff-codes commented Oct 14, 2013

There are actually a couple of bugs here.

The easy one is here: use_fifo is missing a sigil ($use_fifo), so the test is a string comparison against the literal name, causing psub to always act as psub -f.
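To make the sigil problem concrete, here is a minimal sketch (not the actual psub source; the echo messages are placeholders) of how test behaves with and without the $:

```fish
# Hypothetical stand-in for the flag psub sets when the fifo route is wanted.
set -l use_fifo 1

# Without the sigil, test compares the literal string "use_fifo" with "1".
# The two strings never match, so the fifo branch can never run.
if test use_fifo = 1
    echo "literal: fifo path"
else
    echo "literal: file path"
end

# With the sigil, the variable's value ("1") is compared instead.
if test $use_fifo = 1
    echo "expanded: fifo path"
else
    echo "expanded: file path"
end
```

Run as a script, this prints "literal: file path" followed by "expanded: fifo path", which matches psub always taking the file route.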

Unfortunately, it's not as simple as fixing that typo, as doing so will cause a non-interruptible hang under certain circumstances. I believe it occurs when the pipe buffer is exceeded, but I'm not sure how to actually determine the pipe buffer size in fish. Maybe forking another process is needed? Or... something.

Anyway, here's a (hopefully) cross-platform test case (about 1.5MiB and requires Java, uses openssl to decode base64). It's a standalone wrapper for rhino. There's a big hunk of base64 in the middle of it, but the script is just:
java -jar (echo 'BIGHUNKOFBASE64' | openssl base64 -d | psub)

ridiculousfish Oct 14, 2013

Member

Nice diagnosis and test case!

zanchey added a commit that referenced this issue Apr 28, 2014

use mktemp(1) to generate temporary file names
Fix for CVE-2014-2906.

Closes a race condition in funced which would allow execution of
arbitrary code; closes a race condition in psub which would allow
alteration of the data stream.

Note that `psub -f` does not work (#1040); a fix should be committed
separately for ease of maintenance.

Closes #1437

zanchey added a commit that referenced this issue Sep 30, 2014

zanchey Sep 30, 2014

Member

I've pushed a fix to a topic branch (19217f3 on psub_fix), but I don't want to merge to master until the buffering issue is worked out.

zanchey modified the milestones: next-2.x, fish 2.2.0 Apr 14, 2015

faho added a commit that referenced this issue Aug 31, 2015

Revert "Fix missing variable expansion $ in psub"
That change was a bit too eager as the mkfifo route doesn't currently work.

See #1040 and #2052.

This reverts commit a17b9fd.

faho Aug 31, 2015

Member

Any progress on this?


geoff-codes Sep 24, 2015

Contributor

@faho Yes. And no. But, yes.

But before we get to the good stuff, let's review:

  1. Disregard everything in the psub man page.
    Virtually unchanged from when it was written 10 years ago for fish 1.0.0, literally (and by literally, I mean literally) every sentence in man psub is wrong (except for the example). With that out of the way...
  2. What is 'process substitution'?
    Process substitution (which itself is something of a misnomer) was a concept introduced in the Korn shell (ksh88) as a shorthand for the often messy process of using file descriptors and named pipes with a command or program when the program expects a file as an argument. In other words, it is a way to emulate including these programs in a UNIX pipeline, where one otherwise could not, either simply by design (man dd), or for some other clever reason (man tee). The syntax used by shells that support this is:
command <(process) ...

The standard output of process is fed into a file descriptor or named pipe, which is passed as an argument to command.

and

process >(command) ...

The standard input of command is whatever process writes to that argument (ideally, its output).

This syntax is by no means standard, and it certainly is not POSIX. It is supported by ksh88 and ksh93, but not pdksh or mksh; bash supports it, but not in sh mode (bash --posix). It is supported in zsh when not in an emulation mode that proscribes it; and zsh also has another syntax, =(process), which we'll get to in a bit.

The fish psub function:

command (process | psub)

(purportedly) capitalizes on the fact that no special syntax is needed to perform "process substitution"; the ordinary syntax of command substitution (`command` or $(command) in Bourne shells, (command) in fish) can be used to accomplish the same thing, using a pipeline within the command substitution whose standard output culminates in a file descriptor or named pipe.

  3. So, why has this bug stayed in place for the seven years since it was introduced (c6ebb23), and the two years since I opened this issue, which was followed by about a dozen others?
    This is where it gets tricky, and there are two factors at play.

    a. Many utilities which take files as arguments (i.e., for input or output), do so for a reason. A buffered UNIX pipeline often simply is not acceptable ... blah blah blah I've been up all night hacking on this so I'll finish my blustering treatise later. 💀

Basically, I propose we use a ramdisk. It combines the "durability", the ability to handle large files, and the support for non-streamable data of "file substitution" (?) (zsh =(command), what we're doing with psub at present) with the speed and ephemeral nature of using a fd or fifo.

@ridiculousfish @zanchey @anyone @anyone @bueller... Initial thoughts?

function psub --description "Process substitution, revisited."

    set -l filename
    set -l funcname

    set -l halfmem
    set -l sectors
    set -l ramdisk
    set -l mountpoint
    set -l psubdir

    set -l use_file 0

    while set -q argv[1]
        switch $argv[1]
            case -h --help
                __fish_print_help psub
                return 0

            case -f --file
                set use_file 1
        end
        set -e argv[1]
    end

    if not status --is-command-substitution
        echo psub: Not inside of command substitution >&2
        return 1
    end

    if test -z "$TMPDIR"
        set TMPDIR /tmp
    end

    # Implementation for other systems is left as an exercise for the reader.
    if test (uname) != Darwin
        set use_file 1                            # ... mount -t tmpfs ...
    end

    if test $use_file -eq 1
        while not set psubdir (mktemp -d $TMPDIR/.psub.XXXXXXXXXX); end
        chmod 0300 $psubdir

        while not set filename (mktemp $psubdir/temp.XXXXXXXXXX); end
        chmod 0100 $psubdir
        chmod 0200 $filename
    else
        # Basically: detect memory and use 1/2 of it (the default with tmpfs
        # on other platforms) as a ramdisk. The memory is allocated as needed.
        # `hdid -nomount ram://SECTORS` (of 512 bytes) on Darwin.

        set halfmem (math (sysctl -n hw.memsize) / 2)
        set sectors (math $halfmem / 512)
        set ramdisk (hdid -nomount ram://$sectors | tr -d [:space:])
        chmod 0600 $ramdisk

        # $ramdisk is now something like '/dev/disk2'; it would be nice if we
        # could just use the raw device file as $filename, but if we do that
        # there's no EOF. So we format, mount, and use a tempfile on our
        # ramdisk. UDF is probably as good as anything. We probably just
        # don't want any filesystem that's journaled, to reduce overhead.


        while not set mountpoint (mktemp -d $TMPDIR/.psub.XXXXXXXXXX); end
        chmod 0300 $mountpoint
        newfs_udf $ramdisk >/dev/null 2>&1
        mount_udf -o nobrowse $ramdisk $mountpoint

        while not set psubdir (mktemp -d $mountpoint/.psub.XXXXXXXXXX); end
        chmod 0300 $psubdir        

        while not set filename (mktemp $psubdir/temp.XXXXXXXXXX); end
        chmod 0100 $mountpoint
        chmod 0100 $psubdir
        chmod 0200 $filename
    end

    # Write stdin to tempfile
    cat > $filename
    chmod 0400 $filename


    # Write filename to stdout
    echo $filename

    # Find unique function name
    while true
        set funcname __fish_psub_(random)
        if not functions $funcname >/dev/null 2>&1
            break
        end
    end

    # Make sure we unmount and detach when caller exits.
    function $funcname --on-job-exit caller --inherit-variable filename --inherit-variable funcname --inherit-variable use_file --inherit-variable ramdisk --inherit-variable mountpoint --inherit-variable psubdir
        chmod 0700 $psubdir $filename
        if test $use_file -eq 0
            umount -f $mountpoint >/dev/null 2>&1
            which hdiutil >/dev/null 2>&1
            and hdiutil detach $ramdisk >/dev/null 2>&1
            chmod 0700 $mountpoint
        end
        command rm -rf $mountpoint $psubdir $filename
        functions -e $funcname
    end
end

faho Sep 24, 2015

Member

Virtually unchanged from when it was written 10 years ago for fish 1.0.0, literally (and by literally, I mean literally) every sentence in man psub is wrong (except for the example).

I don't think so - it could be improved but I don't see much that is wrong. A bit imprecise and awkward maybe but assuming psub were bug-free it would mostly be sort-of correct. Anyway, this isn't really important to the matter at hand - we should improve psub documentation, but mostly we should just improve psub.

Many utilities which take files as arguments (i.e., for input or output), do so for a reason. A buffered UNIX pipeline often simply is not acceptable

Or try to comm or diff the output of two commands - how would you specify that? Piping syntax doesn't scale beyond a one-to-one relationship (at least I haven't ever seen how that'd work).
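For what it's worth, this is the kind of thing psub is meant to allow. A small sketch, assuming a working psub and made-up file names (one, two) in a scratch directory:

```fish
# Scratch directory and sample data; the file names are illustrative only.
set -l dir (mktemp -d)
printf 'b\na\n' > $dir/one
printf 'a\nc\n' > $dir/two

# Each (cmd | psub) expands to a file name holding that pipeline's output,
# so comm receives two ordinary file arguments.
comm -12 (sort $dir/one | psub) (sort $dir/two | psub)

rm -r $dir
```

With the sample data above, the only line common to both sorted inputs is "a", so that is what comm -12 prints.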

... mount -t tmpfs ...

Nope - "mount: only root can use "--types" option". (There is probably a way, but that one's not it)

which hdiutil >/dev/null 2>&1
and hdiutil detach $ramdisk >/dev/null 2>&1

This doesn't seem great - you're creating a ramdisk with one tool and then only detach it if another tool exists?

Anyway, I don't see the merits of this approach for my system - /tmp (where current psub stores its fifo or file) is already a tmpfs. As my cursory googling shows, it's the default on archlinux (my distro), Fedora, Debian (nope, they reverted) and maybe Ubuntu, so other linuxen also won't benefit from this - at all. Even for those that don't have /tmp on tmpfs, we could use /run instead, which is explicitly defined to be one.

Plus, fifos have an advantage that you don't have here - they can be filled in the background, which means the reading side can start earlier (which is presumably why zsh offers both fifos and files). Check the current psub source (try not to step on the bugs) - the fifo path does so, while the file path does not. Also, this increases the code complexity, especially the amount of OS-specific code.

Can't say I'm a fan.

geoff-codes Sep 24, 2015

Contributor

I don't think so - it could be improved but I don't see much that is wrong. A bit imprecise and awkward maybe but assuming psub were bug-free it would mostly be sort-of correct. Anyway, this isn't really important to the matter at hand - we should improve psub documentation, but mostly we should just improve psub.

It is important if one wants to know what we're trying to accomplish with this. Line by line:

Posix shells feature a syntax that is a mix between command substitution and piping, called process substitution.

  • Posix shells do not feature process substitution.

It is used to send the output of a command into the calling command, much like command substitution, but with the difference that the output is not sent through commandline arguments but through a named pipe, with the filename of the named pipe sent as an argument to the calling program.

  • Aside from the fact that bash is the only shell which can fall back to using a FIFO (on systems which lack numbered file descriptors), all other shells which implement process substitution use numbered file descriptors, not named pipes. The sentence is also self-contradictory, as it says output is not sent via command line arguments, then goes on to say the filename (the output of the final command in the pipeline) is sent as an argument.

psub combined with a regular command substitution provides the same functionality.

  • It doesn't, since it exclusively uses named pipes (in theory). But it's also wrong because:

If the -f or --file switch is given to psub, psub will use a regular file instead of a named pipe to communicate with the calling process.

  • Nope, because of the typo, this is what it always does, ever since that switch was added.

This will cause psub to be significantly slower when large amounts of data are involved, but has the advantage that the reading process can seek in the stream.

  • Nope, for the reason above, psub and psub -f have identical behavior. It's also not necessarily true in any case, depending on how the stream is buffered, disk throughput, etc.

Every sentence.
On to more relevant matters:

Many utilities which take files as arguments (i.e., for input or output), do so for a reason. A buffered UNIX pipeline often simply is not acceptable

Or try to comm or diff the output of two commands - how would you specify that? Piping syntax doesn't scale beyond a one-to-one relationship (at least I haven't ever seen how that'd work).

You're quoting right where I dropped off there, so I'm not sure we actually disagree here.

But my point here is threefold:

  1. Often, a pipeline lacks the necessary complexity to handle all input and output (which we seem to agree on),
  2. However, pipes of any type, and therefore process substitution by any means (FIFOs, /dev/fd/X, etc.), may not (and frequently do not) behave the same way as a regular file, due to buffering, etc.
  3. But notwithstanding the point above, there are a sufficient number of situations where one does not want large, intermediate temporary files on magnetic disks.

which hdiutil >/dev/null 2>&1
and hdiutil detach $ramdisk >/dev/null 2>&1

This doesn't seem great - you're creating a ramdisk with one tool and then only detach it if another tool exists?

No, while I probably don't need this guard line any more as I've since wrapped it in an if block, what I'm doing is handling an annoying fish "feature" which does not allow you to squash the output of attempting to use a command that does not exist.

~> asdfasdf >/dev/null 2>&1
fish: Unknown command 'asdfasdf'
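One way to sidestep that message, sketched here with the same made-up command name, is to check that the command exists before calling it. This assumes a fish version where type accepts -q (quiet):

```fish
# asdfasdf is the nonexistent command from the transcript above.
if type -q asdfasdf
    asdfasdf >/dev/null 2>&1
else
    # fish itself prints nothing here, since the command is never invoked.
    echo "asdfasdf not available, skipping" >&2
end
```

The trade-off is the same one the guard line makes: a missing tool is skipped silently unless the else branch reports it.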

... mount -t tmpfs ...
Nope - "mount: only root can use "--types" option". (There is probably a way, but that one's not it)
Anyway, I don't see the merits of this approach for my system - /tmp (where current psub stores its fifo or file) is already a tmpfs. As my cursory googling shows, it's the default on archlinux (my distro), Fedora, Debian (nope, they reverted) and maybe Ubuntu, so other linuxen also won't benefit from this - at all. Even for those that don't have /tmp on tmpfs...

Perhaps you missed my joke:

# Implementation for other systems is left as an exercise for the reader.

That is to say, it's likely already implemented. As in, when your init scripts ran mount -t tmpfs as root.

Plus, fifos have an advantage that you don't have here - they can be filled in the background, which means the reading side can start earlier (which is presumably why zsh offers both fifos and files).

There's nothing preventing one from reading a regular file while it's still being written. tail -f?
The difference is a fifo is buffered, which is also why so many programs fail with fifos.

Also, this increases the code complexity, especially the amount of OS-specific code.

There is ample precedent for this. There is an immense amount of OS-specific code in fish. ls.fish. open.fish. How many __fish_systemctl_SOMETHING.fish functions are there?

Can't say I'm a fan.

Well... sorry, I guess? We can't all run Arch Linux.

And I must say, pretty rude IMO, considering I only did any of this in light of the fact that you specifically asked for "progress" on this issue. I guess I interpreted that to mean more meaningful/fundamental improvements, since the forking problem is a much larger issue, well beyond the scope of this here.

If all you're looking for is a once-off workaround for 'THIS FISH DON'T FORK!', and you just want psub to work just like the <(kshisms), all you basically need is to fork the background process yourself.
This patch should do it.
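The patch itself is not reproduced here. As a rough sketch of the general idea only (an assumption about the approach, not the patch's contents): let a separate external process open the fifo, so the shell never blocks on opening it. The names dir, fifo, and input are made up for the example, and tee stands in for whatever writer one prefers:

```fish
# Scratch directory holding a sample input file and the fifo (hypothetical names).
set -l dir (mktemp -d)
set -l fifo $dir/fifo
printf 'hello\n' > $dir/input
mkfifo $fifo

# tee is an external command started in the background; it opens the fifo
# itself (as a file argument), so the blocking open happens in the child
# process rather than in the shell.
command tee $fifo < $dir/input >/dev/null &

# The reader opens the other end, which unblocks tee and lets the data through.
cat $fifo

rm -r $dir
```

Writing `cat > $fifo &` instead would leave the opening of the fifo to the shell's redirection handling, which is where a hang can come from if the shell opens it before forking.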

faho Sep 24, 2015

Member

Posix shells do not feature process substitution.

Granted, but minor.

Aside from the fact that bash is the only shell which can fall back to using a FIFO (on systems which lack numbered file descriptors), all other shells which implement process substitution use numbered file descriptors, not named pipes.

Sure that zsh does it that way?

Nope, because of the typo, this is what it always does, ever since that switch was added.

"assuming psub were bug-free".

You're quoting right where I dropped off there, so I'm not sure we actually disagree here.

We don't - I was expanding on your point.

handling an annoying fish "feature" which does not allow you to squash the output of attempting to use a command that does not exist

Ah okay. In that case, shouldn't you do that with anything? Or isn't the error output here kinda important? You're leaking tmpfss (if I understand correctly).

Perhaps you missed my joke:

I was trying to say that it might be more complicated for other systems, though now I see that we probably could use /run on linux and just do the ramdisk setup on OSX/BSD.

There's nothing preventing one from reading a regular file while it's still being written. tail -f?

Maybe we should consider running the cat > file in the background then, too?

There is ample precedent for this.

This seems a bit more complicated than most OS-specific paths.

How many __fish_systemctl_SOMETHING.fish functions are there?

For the record, I'm a bit annoyed by those, mostly since most of them are only used by the systemctl completion, AFAIK (I've thought about moving them into that, but I wanted to look into why they were moved out). Also, this is in completions, which are much less critical than psub.

And I must say, pretty rude IMO, considering I only did any of this in light of the fact that you specifically asked for "progress" on this issue. I guess I interpreted that to mean more meaningful/fundamental improvements, since the forking problem is a much larger issue, well beyond the scope of this here..

If I came off as rude, I'm sorry about that. It was never my intention. I was merely trying to express my technical opinion of your code. Maybe I was too blunt - might be my inherent German-ness (germanity?) or my mastery of the English language. Anyway, I appreciate your willingness to help here, I just don't agree with your proposal.

I only did any of this in light of the fact that you specifically asked for "progress" on this issue. I guess I interpreted that to mean more meaningful/fundamental improvements, since the forking problem is a much larger issue, well beyond the scope of this here..

I was more asking about @zanchey's topic branch and the work on the buffering issue. The buffering also bites us in other respects - look for bugs about functions running in the background, so it should be fixed anyway, which would also fix psub (well, that and the missing "$").

In that light, your ramdisk idea comes across as optimization work, and for that I didn't like the added complexity - the added forks (via e.g. math) might also cost more performance than they save, especially in short-lived psubs.

@geoff-codes

geoff-codes Sep 25, 2015

Contributor

Apology accepted.

Sure that zsh does it that way?

mpb% echo <(echo) <(echo)
/dev/fd/11 /dev/fd/12

Or isn't the error output here kinda important? You're leaking tmpfss (if I understand correctly).

No, the error isn't important. And no, we're not leaking tmpfs's. Mac OS X (Darwin, technically) has a rather bizarre mechanism for userland disks. hdid technically creates an in-memory disk image; and this disk needs to be unmounted, then "ejected", then "detached" to actually remove the entry in /dev and the inode. The "error" I'm suppressing is "disk2" unmounted. "disk2" ejected.

Maybe we should consider running the cat > file in the background then, too?

Well, here you run essentially the opposite risk of using a buffer; if the reading process consumes at a faster rate than the outputting process it will "starve" and terminate.
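Both halves of this exchange can be seen in a small sketch. It is written in Python rather than fish so that the file semantics are visible without fish's own job handling getting in the way; the function name `read_while_writing` and the file prefix are made up for illustration. A reader can consume a regular file that is still being written, but once it catches up it simply sees end-of-file, and only a reader that retries (as tail -f does) picks up the later data.

```python
import os
import tempfile


def read_while_writing():
    """Interleave a writer and a reader on the same regular file."""
    fd, path = tempfile.mkstemp(prefix="psub-demo-")  # hypothetical prefix
    os.close(fd)
    seen = []
    try:
        with open(path, "wb") as writer, open(path, "rb") as reader:
            writer.write(b"first\n")
            writer.flush()
            seen.append(reader.read())  # everything written so far
            seen.append(reader.read())  # reader has caught up: empty result, i.e. EOF
            writer.write(b"second\n")
            writer.flush()
            seen.append(reader.read())  # only a reader that retries sees this
    finally:
        os.remove(path)
    return seen


print(read_while_writing())  # [b'first\n', b'', b'second\n']
```

A program that stops at the first empty read would have ended after `first`, which is the "starve and terminate" case described above.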

I was more asking about @zanchey's topic branch and the work on the buffering issue. The buffering also bites us in other respects - look for bugs about functions running in the background, so it should be fixed anyway, which would also fix psub (well, that and the missing "$").

So, I think you might be conflating issues with pipe buffer(s) with the issue of "to fork or not to fork" within a pipeline (ephemeral file descriptors).

Pipe buffers are handled in-kernel, and are particularly relevant to FIFOs and real file descriptors.
The pipe buffer typically has a hard limit set by the operating system. All shells suffer equally from the limits of the pipe buffer. In the image below, note the similar error messages, which are due to the pipe buffer being exceeded:
[image: pipebuffers]
[ Note that this is with #2423; otherwise, the fish version would hang here. ]
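For anyone wondering how to measure that limit (the question raised at the top of this issue), here is a rough sketch in Python rather than fish. It assumes a Unix-like system; `pipe_capacity` and the 1024-byte chunk size are illustrative choices, not anything fish does. It fills a pipe that nobody reads, using a non-blocking write end, and counts how many bytes the kernel accepts before a blocking writer would have stalled.

```python
import os


def pipe_capacity(chunk_size=1024):
    """Fill an unread pipe until the kernel refuses more; return bytes accepted."""
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)  # raise instead of blocking when full
    chunk = b"x" * chunk_size
    total = 0
    try:
        while True:
            try:
                total += os.write(write_fd, chunk)
            except BlockingIOError:
                break  # buffer full: a blocking writer would hang right here
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return total


print(pipe_capacity(), "bytes fit in the pipe before a write would block")
```

The number printed depends on the operating system and its configuration, so it should be treated as a measurement, not a constant.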

There is a separate issue, of where, when, how, and with what, to fork and/or create a new thread, within a pipelined chain of commands.

See the lengthy discussion in #1228; in that vein, I still think fish needs to abstract and internalize the concept of file descriptors better; they needn't necessarily be tied to the actual file descriptors that exist outside the shell. In the following image, you can see bash actually creates and populates entries in /dev/fd that do not exist anywhere outside that instance of the shell; as far as I'm aware, fish does no such thing.

[image: bash-fish-fds]

@faho

faho Sep 25, 2015

Member

The "error" I'm suppressing is "disk2" unmounted. "disk2" ejected.

Ummh...the error you'd be suppressing is "Unknown command 'hdiutil'" - this is about the which hdiutil; and part, not the hdiutil detach call. If detaching is important, you should show an error if it can't be done.

this disk needs to be unmounted, then "ejected", then "detached" to actually remove entry in /dev and the inode.

In that case are you leaking entries in /dev or are they reused?

Well, here you run essentially the opposite risk of using a buffer; if the reading process consumes at a faster rate than the outputting process it will "starve" and terminate.

So there is something stopping us from backgrounding writing to a regular file.

All shells suffer equally from the limits of the pipe buffer

Currently, fish suffers worse, because it actually hangs. (Which IIUC is because of #238 - we never get to the reading before finishing the writing so if we can't finish the writing because the buffer is full...)
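A minimal sketch of why the ordering matters, in Python rather than fish and under the assumption that 1 MiB is larger than the pipe buffer on the system at hand. The helper name `pump_through_pipe` is hypothetical. The producer runs in its own thread, so the write that blocks on a full buffer is released as soon as the reader drains some data; running the write to completion first and only then starting to read would never return for a payload of this size.

```python
import os
import threading


def pump_through_pipe(payload):
    """Write payload into a pipe from a thread while the caller drains it."""
    read_fd, write_fd = os.pipe()

    def producer():
        # Closing the write end on exit is what lets the reader see EOF.
        with os.fdopen(write_fd, "wb") as sink:
            sink.write(payload)  # blocks while the buffer is full

    writer = threading.Thread(target=producer)
    writer.start()
    with os.fdopen(read_fd, "rb") as source:
        received = source.read()  # returns once the producer has closed its end
    writer.join()
    return received


data = b"x" * (1 << 20)  # 1 MiB, assumed to exceed the pipe buffer
print(len(pump_through_pipe(data)))
```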


Okay, let's look at the ramdisk stuff again: The setup on OSX is really rather complicated, while on linux we could use /run (with a fallback to /tmp). The advantage of this approach is that the data never hits the disk (unless of course it swaps) but is still seekable. The disadvantage is that even readers who can deal with waiting for data (like presumably tail -f) will be started only after the data is fully written - this also means the behavior for very large data with readers who can deal with a fifo is somewhat worse than the behavior with an actual fifo.

So it is just straight up better than using on-disk files (discounting the code complexity), but not strictly better than fifos - which is how we'd again end up with offering two solutions (and letting the user decide between them since we can't, like zsh does).
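The "/run with a fallback to /tmp" selection could look roughly like the following, sketched in Python rather than fish. The name `pick_scratch_dir` is invented, and the sketch only checks that a directory exists and is writable; it does not verify that the directory is actually memory-backed. On many Linux systems /run itself is not writable by unprivileged users, so the fallback would often be the path taken.

```python
import os
import tempfile


def pick_scratch_dir(candidates=("/run", "/tmp")):
    """Return the first candidate that is an existing, writable directory."""
    for directory in candidates:
        if os.path.isdir(directory) and os.access(directory, os.W_OK | os.X_OK):
            return directory
    return tempfile.gettempdir()  # last resort: Python's notion of the temp dir


# Create the scratch file inside whichever directory was chosen.
fd, path = tempfile.mkstemp(prefix="psub-", dir=pick_scratch_dir())
os.close(fd)
os.remove(path)
```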

I still think fish needs to abstract and internalize the concept of file descriptors better; they needn't necessarily be tied to the actual file descriptors that exist outside the shell.

I'm afraid I don't completely understand - how would that help with psub here? Wouldn't that still be tied to the buffering limitations?

In the following image, you can see bash actually creates and populates entries in /dev/fd that do not exist anywhere outside that instance of the shell

Be careful what you wish for when it comes to bash and what things it does to /dev - or you might end up implementing /dev/tcp.

@zanchey

zanchey Oct 9, 2015

Member

As a systems administrator, I am terrified by the use of memory-backed filesystems as written.

@zanchey

zanchey Feb 24, 2016

Member

The topic branch has bitrotted, and wasn't a particularly novel fix so I've removed it while the rest of the issue is worked out.

@jkabrg

jkabrg Mar 11, 2016

Could we not have two commands for now? A psub which does process substitution using fifos; and an osub which runs the command, waits for it to finish, and puts its output in a file.

So adding the missing sigil gives us psub and without it we get osub.

@faho

faho Mar 12, 2016

Member

@jkabrg: See #2052 - the path that would be reached when adding that "$" is basically broken, so adding the sigil again would make it worse, not better.

@krader1961 krader1961 modified the milestones: next-2.x, 2.3.0 Mar 22, 2016

@krader1961

krader1961 Mar 22, 2016

Contributor

It's pretty clear that this is not going to be fixed as part of the 2.3.0 release milestone so I'm punting this back to next-2.x.

@floam

floam Mar 30, 2016

Member

 while not set mountpoint (mktemp -d /$TMPDIR/.psub.XXXXXXXXXX); end
        chmod 0300 $mountpoint
        newfs_udf $ramdisk >/dev/null 2>&1
        mount_udf -o nobrowse $ramdisk $mountpoint

I recently had a shell script set up a RAM disk for me on OS X. FWIW it's easier and I think better to use diskutil for a case like this. It'll format it and put it in /Volumes, you don't need to clean up after it or deal with so much administrative debris.

I'd do something closer to this:

diskutil erasevolume HFS+ "fishdisk" (hdiutil attach -nomount ram://$sectors | string trim)
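The comment does not show how $sectors is computed. Assuming the ram:// URL counts 512-byte sectors (an assumption stated here, not something taken from this thread), the arithmetic for a disk of a given size would be as follows; `ram_sectors` is a made-up helper name and the snippet is Python only to keep the arithmetic checkable.

```python
SECTOR_BYTES = 512  # assumption: ram:// sizes are given in 512-byte sectors


def ram_sectors(mebibytes):
    """Number of sectors needed for a RAM disk of the given size in MiB."""
    return mebibytes * 1024 * 1024 // SECTOR_BYTES


print(ram_sectors(64))  # 131072
```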

@floam

floam Mar 30, 2016

Member

I should mention both methods take a few moments to create/format/mount. Can take several seconds on my system in a bad case. Not an awesome optimization for psub.

@krader1961

krader1961 Sep 7, 2016

Contributor

I'm removing the "next-2.x" label because this has been open for three years. There is no reason to think this will become a priority to fix anytime soon.

@urxvtcd

urxvtcd Apr 29, 2017

Hello there, I have encountered a situation where fish hangs upon process substitution, like in paste (tail -f foo | psub) (tail -f bar | psub). Bash analogue works fine. Is this related to this issue? And does some workaround exist?

@faho

faho Apr 29, 2017

Member

@urxvtcd: Yes, that's this issue. psub currently only returns when the file has been fully written, and with tail -f that never happens since it keeps writing.

The workaround is to use something that can follow multiple files at the same time (like multitail). Or use multiple terminal windows with one tail each, or use e.g. tmux.

Alternatively, and I recommend against this, you can make your own fifos (with mkfifo) and then redirect the tail outputs into those.

Note that, if you background the tails, this will pretty much by necessity create dangling jobs - tail -f won't ever quit, so whatever is reading it won't either. I expect this to be an issue with bash as well.
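For reference, the mechanics behind the mkfifo alternative look roughly like this. The sketch is in Python rather than fish and assumes a Unix-like system; `through_fifo` and the file names are illustrative. Opening a FIFO for writing blocks until a reader opens the other end, which is why the writer has to run concurrently (here in a thread, in a shell as a background job), and why a writer that never finishes leaves a job behind.

```python
import os
import tempfile
import threading


def through_fifo(payload):
    """Send payload through a named pipe in a private temporary directory."""
    workdir = tempfile.mkdtemp(prefix="fifo-demo-")  # hypothetical prefix
    path = os.path.join(workdir, "stream")
    os.mkfifo(path, 0o600)

    def producer():
        # open() blocks here until something opens the FIFO for reading.
        with open(path, "wb") as sink:
            sink.write(payload)

    writer = threading.Thread(target=producer)
    writer.start()
    try:
        with open(path, "rb") as source:
            received = source.read()  # EOF arrives when the writer closes
    finally:
        writer.join()
        os.remove(path)
        os.rmdir(workdir)
    return received


print(through_fifo(b"hello from the fifo\n"))
```

With a producer like tail -f the read would never reach EOF, so the cleanup in the `finally` block would never run; that is the dangling-job problem mentioned above.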

@krader1961

krader1961 Jul 17, 2017

Contributor

While fixing #4222 I updated the psub documentation to clarify that --file is the default behavior. I also added a new --fifo flag to request the use of a named pipe and documented when and why you shouldn't use that flag. We still need to find a way to make using --fifo safe from deadlock but that's going to require fundamental changes to fish internals.

@krader1961 krader1961 changed the title from psub never subs any p's to find a way to make `psub --fifo` safe from deadlock Jul 17, 2017

@krader1961 krader1961 added enhancement and removed bug labels Jul 23, 2017

jdxcode pushed a commit to jdxcode/fish-shell that referenced this issue Aug 28, 2017

use mktemp(1) to generate temporary file names
Fix for CVE-2014-2906.

Closes a race condition in funced which would allow execution of
arbitrary code; closes a race condition in psub which would allow
alternation of the data stream.

Note that `psub -f` does not work (fish-shell#1040); a fix should be committed
separately for ease of maintenance.