Skip to content

Loading…

Piping for hooks #9

Closed
stuartpb opened this Issue · 5 comments

1 participant

@stuartpb
plushu member

(Historical note: "runhook" was the early name for what would become "plushook".)

runhook's stdin support right now is not great. It reads all of stdin into a variable, which:

  • is probably not binary-safe (embedded zeroes?)
  • is ram-inefficient

options to fix this without changing the current hook paradigm:

  • check if only one hook and if so just pipe to it normally
    • not mutually exclusive with other solutions
  • construct pipeline
    • cons:
      • requires eval (and all the problems that brings)
  • (re)adopt or fork pluginhook.
    • cons:
      • requires adding a whole new compiled dependency, and/or go
      • doesn't support plushu-style numbered/pre hooks (would be fixed by forking)
      • sort of requires reverting to hook-filter i/o
  • use pee
    • cons:
      • not part of a default install
      • only runs in parallel (dealbreaker)
  • use tee + process redirection file descriptors
    • cons:
      • requires eval (and all the problems that brings)
      • process redirection is finite (I think the max can be something like 4)
  • save stdin to tmp file
    • this was how runhook originally worked
    • I switched to the current solution because in tracing I found that mktemp was a huge performance bottleneck, but that might have just been a load issue at the time
    • cons:
      • (sort of) adds dependency on mktemp to core (not guaranteed to be installed)
      • tmp files add an externality to the run environment. tmp files can be manipulated, etc.
      • adds fs manipulation step which presents performance problems (second-long delays in otherwise instant commands)
      • hooks that need files can make the files and use command line options
  • mkfifo+tee juggling
    • cons:
      • mkfifo has the same fs-encapsulation issues as mktemp
      • hacky as hell
@stuartpb
plushu member

If at all possible, tee-juggling is the way I would most want to do this:

  • Create two anonymous pipes, one containing the contents of stdin and the other empty (the "second anonymous pipe").
  • Replace the file descriptor for stdin with the first anonymous pipe. (This is really just a safety precaution in case the original stdin wouldn't work as a proper pipe.)
  • Run hook in process-substitution environment, attaching the parent stdout to the child's stdout, and the second anonymous pipe to tee.
  • Swap the anonymous pipes (the one for stdin and the "second" pipe).

Repeat as necessary.

@stuartpb
plushu member

On further inquiry it looks like pipes aren't really great buffering mechanisms (Linux limits them to 64K, up from 4k in 2.6.11).

Really, having hook scripts work as filters is merited from a technical standpoint, as well as being (arguably) merited from a design standpoint.

In short, this is an aspect of pluginhook that ultimately needs to be preserved. Here's the complete rationale for this behavior.

Parallel Multiplexed I/O

Pros

  • Output from hooks doesn't have to work as input. Scripts can call hooks with input and get a different type of output (think git hooks that provide refs on stdin and take a commit message from stdout).
  • Hooks are independent of one another: one hook can't affect the input to another hook

Cons

  • Without some kind of buffering, must run in parallel and can't run sequentially (on top of breaking anything that must be ordered, output from multiple hooks'll probably be all interleaved and messed up).

Serial Filters

Cons

  • Hooks have to output the same sort of thing as stdin to stdout
  • Hooks are dependent on the output of other hooks: hooks can transform the behavior of later hooks

Pros

  • Hooks can transform the behavior of later hooks
  • Hooks have other ways of taking repeated input, like args (could even take a tempfile name): any hook type that can have an output format that makes sense when blindly concatenated (as sequentialized parallel would have to be) can do it as a filter that just cats something onto stdin
  • Doesn't require buffering hacks to work sequentially
  • stdout can be made available on file descriptor 3 I think? So all is not lost.
@stuartpb
plushu member

Actually, considering that piped commands run concurrently as well (as they'd have to), and I don't want to force input reading... maybe I just need to suck it up and use tmpfiles (in the multiple-hooks case). Besides, constructing pipes is not something Bash likes to do.

@stuartpb
plushu member

Solution implemented in 9138d27: stdin is passed directly to a hook when there is a single hook script. In the (less common) event there is more than 1 hook, stdin is put into a temp file, which is then directed into each hook in a serial loop.

Being able to have a separate stdin and stdout is massively important, especially for Git hook compatibility. Each hook giving its output as it runs makes total sense, and each hook should run one after another.

@stuartpb stuartpb closed this
@stuartpb
plushu member

Also, since 90% of the time making the temp file and multiplexing is overkill, stdin is now disabled altogether unless the -i flag is passed to runhook.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.