Conversation

@tdammers

This allows convenient binary communication with subprocesses.

Some concerns that may need addressing:

  • I introduced these functions in a rather dirty copy-and-paste style; they should work fine, but there is some code duplication now. I don't feel confident refactoring it myself, because I'm not familiar with the decisions that went into these particular functions.
  • I introduced a dependency on the bytestring package; this isn't usually something I worry about, but considering how process has hardly any dependencies, adding one feels like a somewhat bigger deal than usual.
  • The bytestring variants might deserve a module of their own.
  • I haven't added any tests; the ones in /tests seem to be dead, or at least cabal test doesn't run them, but copying the process006 test over and adapting it to the bytestring flavors should work.

@snoyberg
Collaborator

@hvr I haven't reviewed the code yet, but I wanted to touch base regarding the bytestring dependency. Do you know of any reason we should avoid it?

@tdammers
Author

I can't think of any serious reason, really. Maybe there are people with use cases where avoiding unnecessary dependencies is absolutely vital (embedded development, maybe? I haven't done any of that, so I have no idea what the situation is like there), but my expectation is that bytestring is such a basic and ubiquitous library that depending on it should be acceptable even then.

TL;DR: If I were in charge, I wouldn't worry, but maybe others have pressing concerns here.

@tdammers
Author

Also, the build fails for GHC 7.4 and earlier (https://travis-ci.org/haskell/process/jobs/72742820); as far as I can tell, this is because there is no DeepSeq.NFData instance for strict ByteStrings yet. I'm not sure how to tackle this. Since we're dealing with strict ByteStrings, deepseq-ing things isn't actually required, so we could just strip that part out?

@snoyberg
Collaborator

Sounds correct on the deepseq parts. The only reason I'm concerned at all about bytestring is that I'm not familiar with the GHC build process, and don't want to cause problems there.

Strict ByteStrings have no suitable NFData instances for deepseq'ing on older
GHCs (<= 7.4), but we don't actually need deepseq here, because
BS.hGetContents is already strict. Instead, we do our reading and
writing directly in concurrent threads and send the results back to the
main thread through MVars.
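
The pattern described in the commit message above can be sketched roughly like this (readHandleAsync is an illustrative name of mine, not a function from the patch):

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import qualified Data.ByteString as BS
import System.IO (Handle, IOMode (ReadMode), openFile)

-- Drain a handle in a worker thread. BS.hGetContents is strict, so the
-- value handed back through the MVar is already fully evaluated; no
-- deepseq is needed. The caller later collects the result with takeMVar.
readHandleAsync :: Handle -> IO (MVar BS.ByteString)
readHandleAsync h = do
    var <- newEmptyMVar
    _ <- forkIO (BS.hGetContents h >>= putMVar var)
    return var
```

The main thread would call this once for the child's stdout handle and once for stderr, then takeMVar both results after waitForProcess.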
@hvr
Member

hvr commented Jul 27, 2015

I can't think of any reason not to depend on bytestring in the context of GHC.
(btw, with process being so low in the dependency tree, it's even more important to specify exact version bounds, so that the cabal solver can avoid faulty install-plans -- process is an upgradable package even if one version is bundled w/ GHC by default)

I'd avoid however depending on deepseq if it's merely for strict ByteStrings.

However, I would have expected lazy ByteStrings being used, especially as the stdout/stderr output of processes is of unknown size and potentially unbounded... but this is just a quick observation...

@tdammers
Author

This last commit removes the deepseq-ing for ByteStrings, instead putting the hGetContents call (which is strict for strict ByteStrings) into the worker threads; it should build fine on older GHCs and also play nicer with threading.

@tdammers
Author

@hvr: true, lazy bytestrings would be expected, but I'm not sure how this would play along with the "read outputs strictly" part.

Re the deepseq dependency: that one was already in place in order to strictly read Strings from the subprocess; the ByteString-specific code in my last commit no longer depends on deepseq, though.

@snoyberg
Collaborator

The data is going to be read out strictly regardless. The advantage of a lazy ByteString in this case would be theoretically better memory layout, since it can be chunked instead of a single large allocation. However:

  1. We shouldn't be using these functions for large outputs anyway, as that can lead to memory exhaustion
  2. From my experience with http-conduit, people get very confused about lazy-but-not-really-lazy ByteStrings

OK, I'll move ahead with reviewing this code, sounds like we don't have any blockers on merge besides the code itself ;)

@hvr
Member

hvr commented Jul 27, 2015

fair enough, in the interest of some more bikeshedding, I'm slightly +1 for

  • The bytestring variants might deserve a module of their own.

=)

@tdammers
Author

The data is going to be read out strictly regardless.

Indeed - the intended behavior is "block until the child process terminates, then return stdout and stderr", which is kind of impossible with lazy reads (can't read lazily when the child process has already terminated).

@hvr: Should I move the bytestring variants out then?

@snoyberg
Collaborator

I'm +1 on the separate module approach too, it fits nicely with other core libraries (like network). It also makes it easy to make a shim compatibility package.

@tdammers
Author

Alright, separated out the ByteString functions into System.Process.ByteString. A bunch of support functions had to move to System.Process.Internal in the process; most of them weren't exported, but those that were are now re-exported from System.Process.

@tdammers
Author

OK, turns out I'm rather dumb:

http://hackage.haskell.org/package/process-extras-0.3.3.5/docs/System-Process-ByteString.html

I guess that means my pull request is moot then...

@bgamari
Contributor

bgamari commented Jul 28, 2015

@tdammers I believe you really want lazy bytestrings here. Currently your approach uses the strict hGetContents, which reads chunks and then concatenates them into a single buffer, resulting in a needless copy.

For this reason, you really want to use Data.ByteString.Lazy.hGetContents and force the result. The easiest way to accomplish this is with deepseq. That said, you should be able to get the same effect with foldrChunks.
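
The foldrChunks alternative could look something like this sketch (forceChunks is my name for the helper, not anything proposed in the thread):

```haskell
import qualified Data.ByteString.Lazy as BL

-- Walk the lazy ByteString's chunk spine. Each chunk is itself a strict
-- ByteString, so forcing the spine pulls all the data into memory
-- without concatenating it into one buffer and without needing an
-- NFData instance (i.e. without a deepseq dependency).
forceChunks :: BL.ByteString -> BL.ByteString
forceChunks lbs = BL.foldrChunks (\_chunk rest -> rest) () lbs `seq` lbs
```

The `seq` forces the fold's result, which in turn forces every Chunk constructor in the spine; the chunks stay separate, avoiding the copy that strict hGetContents makes.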

@hvr
Member

hvr commented Jul 28, 2015

@tdammers it may still be desirable to move System.Process.ByteString from process-extras into process though...

@hvr
Member

hvr commented Jul 28, 2015

@snoyberg re

We shouldn't be using these functions for large outputs anyway, as that can lead to memory exhaustion

maybe those strictly-consuming-output functions should take a max-size parameter as an upper bound; this would encourage a more defensive programming style...
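
One hypothetical shape for such an upper bound (hGetAtMost is an illustrative name of mine, not a proposed API): read one byte past the limit so truncation is detectable, and fail loudly instead of exhausting memory.

```haskell
import qualified Data.ByteString as BS
import System.IO (Handle, IOMode (ReadMode), openFile)

-- Read at most 'limit' bytes from a handle. Asking for limit + 1 bytes
-- lets us distinguish "the output fit within the limit" from "the
-- output was truncated", so callers can fail defensively.
hGetAtMost :: Int -> Handle -> IO (Either String BS.ByteString)
hGetAtMost limit h = do
    bs <- BS.hGet h (limit + 1)
    return $ if BS.length bs > limit
                 then Left "output exceeded limit"
                 else Right bs
```

Note that BS.hGet blocks until it has the requested number of bytes or hits EOF, which matches the "consume all output" use case being discussed.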

@snoyberg
Collaborator

Due to prior art, I'm very much tempted to take the two ByteString modules from process-extras exactly as they are to make for as easy a transition as possible. When it comes to consuming possibly-large data, I think dropping down to a Handle is what we need to encourage, and do so via the documentation, not necessarily any added functionality. If we're going to consider improving the API in that direction, I'd think of using streaming-commons's approach, but that should be a separate issue.

@hvr
Member

hvr commented Jul 28, 2015

For the sake of compat/transition, I would have used process-extras's type signatures, added a big warning, and added some ...WithLimit versions.

For me personally, those ByteString variants become much less attractive (other than for quick-and-dirty runghc scripts or GHCi experiments where I don't care much if they terminate with OOMs) if they have neither laziness nor an explicit length limit. Since I don't want to have to mess with Handles, I guess I'll have to resort to streaming APIs based on conduit/machines/pipes/io-streams instead... :-)

@snoyberg
Collaborator

To be clear: a strict ByteString API (whether based on actual strict ByteStrings or lazy ByteStrings that are fully evaluated) is semantically identical to the current functions in process, except for the obvious differences with a ByteString (space usage and character vs octet). readProcess is already strict, and will already run out of memory.

If we wanted to double down on lazy I/O (which I don't want), a function signature like the following could be added:

readProcessLazily
  :: CreateProcess
  -> L.ByteString -- ^ stdin
  -> (L.ByteString -> L.ByteString -> IO a) -- ^ inner function, takes lazy stdout and stderr
  -> IO (a, ExitCode)

And then the inner function will run concurrently with the process, instead of the current API which requires that the process exit before getting access to the data at all.
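
A sketch of how that signature might be implemented, purely for illustration (no such function exists in process; note also that this naive version can deadlock if one pipe's buffer fills while the inner function only demands the other stream):

```haskell
import qualified Data.ByteString.Lazy as L
import System.Exit (ExitCode (..))
import System.IO (hClose)
import System.Process

readProcessLazily
    :: CreateProcess
    -> L.ByteString                            -- stdin
    -> (L.ByteString -> L.ByteString -> IO a)  -- lazy stdout and stderr
    -> IO (a, ExitCode)
readProcessLazily cp input inner = do
    (Just hin, Just hout, Just herr, ph) <-
        createProcess cp { std_in  = CreatePipe
                         , std_out = CreatePipe
                         , std_err = CreatePipe }
    L.hPut hin input
    hClose hin
    out <- L.hGetContents hout  -- lazy reads: data arrives as the
    err <- L.hGetContents herr  -- inner function demands it
    a   <- inner out err        -- runs concurrently with the process
    ec  <- waitForProcess ph
    return (a, ec)
```

The inner function sees the streams while the process is still running, in contrast to the current API, which only hands over data after exit.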

@snoyberg
Collaborator

Is everyone OK with closing this based on the presence of process-extras?
