New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't buffer stderr of --filter #2729

Closed
Warbo opened this Issue Feb 19, 2016 · 3 comments

Comments

Projects
None yet
2 participants
@Warbo

Warbo commented Feb 19, 2016

When calling external filter programs, e.g. pandoc --filter foo < input > output the program foo will be executed, and all of its standard error stream is captured to a Haskell variable errbs (see

(exitcode, outbs, errbs) <- E.handle filterException $
).

Once the program has finished executing, errbs is checked to see if it's empty; if not, its content is written to the the stderr of pandoc.

This buffering (waiting until the filter has finished before showing anything on stderr) can make debugging filters tedious. For example, if a filter takes a long time to run (e.g. it's doing a lot of IO), then we would like to be notified of any problems without having to wait for the whole thing to finish. Likewise, if there is a bug in a filter which is transforming a large number of elements, we have to wait until all of them have failed, and look through the whole output of all failures, rather than being able to spot the error immediately after the first failure.

In particular, I use a Pandoc filter called "panpipe" which allows arbitrary shell commands to be executed ( http://chriswarbo.net/git/panpipe/branches/master/ , described in more detail at http://chriswarbo.net/essays/activecode ), and use it to run many resource-intensive scripts. To avoid having to wait for stderr, I must use pandoc -t json | panpipe | pandoc -f json rather than pandoc --filter panpipe.

The errbs variable doesn't seem to be used for anything else, so it could simply be removed, along with a corresponding change to the Text.Pandoc.Process.pipeProcess function. The required change would be around

(Just inh, Just outh, Just errh, pid) <- createProcess (proc cmd args)
where std_err = Inherit should be used rather than std_err = CreatePipe (as per https://hackage.haskell.org/package/process-1.4.2.0/docs/System-Process.html#t:StdStream ).

It looks like the only other user of the pipeProcess function is the Text.Pandoc.PDF module, which hides all output of conversion processes (including stderr) unless the verbose variable is True.

Perhaps a simpler and neater solution would be passing the verbose flag into pipeProcess, and doing something like std_err = if verbose then Inherit else NoStream. That way, the System.Process module will be doing all of the work: when output it wanted, it will be given immediately without buffering; when it's not wanted, it will never be stored at all; and in either case, all of the custom IO code for handling stderr can be deleted.

If this sounds reasonable, I may take a stab at implementing it.

@jgm

This comment has been minimized.

Show comment
Hide comment
@jgm

jgm Feb 19, 2016

Owner

I think that makes a lot of sense. Of course, we may see
a difficulty after you've implemented it, but I think it
makes sense to implement it as you suggest.

Owner

jgm commented Feb 19, 2016

I think that makes a lot of sense. Of course, we may see
a difficulty after you've implemented it, but I think it
makes sense to implement it as you suggest.

@jgm

This comment has been minimized.

Show comment
Hide comment
@jgm

jgm Dec 8, 2016

Owner

It's not as simple as I thought. The PDF reader doesn't simply print stderr output (or discard it). It analyzes the log messages in order to present to the pandoc user just the relevant part, without the full output, which can be confusing.

So probably the best approach is to offer an alternative version of pipeProcess (say, pipeProcessStreamingStderr) that streams stderr instead of returning it.

Owner

jgm commented Dec 8, 2016

It's not as simple as I thought. The PDF reader doesn't simply print stderr output (or discard it). It analyzes the log messages in order to present to the pandoc user just the relevant part, without the full output, which can be confusing.

So probably the best approach is to offer an alternative version of pipeProcess (say, pipeProcessStreamingStderr) that streams stderr instead of returning it.

@jgm

This comment has been minimized.

Show comment
Hide comment
@jgm

jgm Dec 9, 2016

Owner

It turns out latex log output goes to stdout.
So I'm going to change the pipeProcess signature.
Since this is an API change, I'm doing it in the typeclass branch, so it will be in pandoc 2.0.

Owner

jgm commented Dec 9, 2016

It turns out latex log output goes to stdout.
So I'm going to change the pipeProcess signature.
Since this is an API change, I'm doing it in the typeclass branch, so it will be in pandoc 2.0.

jgm added a commit that referenced this issue Dec 9, 2016

Process.pipeProcess: stream stderr rather than capturing.
Signature of pipeProcess has changed: the return value is
now IO (ExitCode, ByteString) -- with only stdout. Stderr
is just inherited from the parent.

This means that stderr from filters will now be streamed
as the filters are run.

Closes #2729.

jgm added a commit that referenced this issue Dec 24, 2016

Process.pipeProcess: stream stderr rather than capturing.
Signature of pipeProcess has changed: the return value is
now IO (ExitCode, ByteString) -- with only stdout. Stderr
is just inherited from the parent.

This means that stderr from filters will now be streamed
as the filters are run.

Closes #2729.

jgm added a commit that referenced this issue Jan 16, 2017

Process.pipeProcess: stream stderr rather than capturing.
Signature of pipeProcess has changed: the return value is
now IO (ExitCode, ByteString) -- with only stdout. Stderr
is just inherited from the parent.

This means that stderr from filters will now be streamed
as the filters are run.

Closes #2729.

jgm added a commit that referenced this issue Jan 24, 2017

Process.pipeProcess: stream stderr rather than capturing.
Signature of pipeProcess has changed: the return value is
now IO (ExitCode, ByteString) -- with only stdout. Stderr
is just inherited from the parent.

This means that stderr from filters will now be streamed
as the filters are run.

Closes #2729.

@jgm jgm modified the milestone: pandoc 2.0 Jan 25, 2017

@jgm jgm closed this in 9570f59 Jan 29, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment