New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Pandoc support, refactor external helpers #4060

Merged
merged 1 commit into from Nov 30, 2017

Conversation

Projects
None yet
10 participants
@betaveros
Copy link
Contributor

betaveros commented Nov 7, 2017

Recognize the Pandoc format under the file extension .pandoc or .pdc,
and shell out to pandoc as an external helper to format Pandoc content.

Add a configuration option, externalHelperArguments, that can be used to
override the additional arguments passed to external helpers.

Refactor out repeated code with external helpers. Change the error
output formatting. I did not see any of the external helpers print the
string "<input>" to represent stdin as a file; just prepending the file
name to error output is more general and doesn't sacrifice that much in
terms of readability.

@betaveros betaveros force-pushed the betaveros:pandoc-support branch from ba056de to 182f139 Nov 7, 2017

@bep

This comment has been minimized.

Copy link
Member

bep commented Nov 7, 2017

A quick question: Is it a remote possibility to do "shell injection" via os/exec so I could configure it to do

pandoc --strip-comments && rm -rf /

In the above I'm mostly worried about the people on the forum etc. that asks me questions of the type "can you clone my site and see what the problem is?" ... If I would start worrying about some malware type of situations, I would stop doing that.

Note that I would expect the golang team to protect against this. But I would like to be really sure.

@betaveros

This comment has been minimized.

Copy link
Contributor

betaveros commented Nov 7, 2017

I don't think shell injection can be achieved in exec.Command, but the broader point is well-taken; if you're worried about malicious config files we shouldn't allow the user to pass arbitrary arguments to external helpers, since these helpers might not be designed to be safe under those circumstances. For example, it looks like pandoc takes an option --filter=PROGRAM that will call an arbitrary executable (perhaps, a malicious shell script elsewhere in the repository), so an attacker with full control over pandoc's arguments could do a lot of damage even without shell injection.

It's unfortunate though, most of the allure of using pandoc is getting access to all of its options and I don't think there is a universal set that will make everybody who wants to use pandoc happy (I for one haven't even decided which of the math output formats I want for my static site yet). I could whitelist some pandoc options, but it would either be woefully incomplete or extremely bloated for an external helper that I understand probably isn't a central use case Hugo wants to support. Still, I guess an incomplete whitelist is better than nothing.

(If you're worried about malicious config files in general, I'll also note in passing that it looks like attackers can also specify arbitrary executables in the configuration option newContentEditor, so they could also cause a malicious script to be invoked on hugo new.)

@bep

This comment has been minimized.

Copy link
Member

bep commented Nov 7, 2017

If you're worried about malicious config files in general

I will make a note about the newContentEditor, but that command is at least not part of the main Hugo build.

@bep

This comment has been minimized.

Copy link
Member

bep commented Nov 7, 2017

I will think about this a little, about the cost vs value vs added support etc. This isn't "core Hugo", so any new additions in this department should come cheaply, and this currently does not look like it.

@betaveros betaveros force-pushed the betaveros:pandoc-support branch from 182f139 to 4274cdc Nov 7, 2017

@betaveros

This comment has been minimized.

Copy link
Contributor

betaveros commented Nov 7, 2017

I dropped the options for now to keep things simple. If there's demand for external options we (or somebody else) can revisit this later.

Add Pandoc support, refactor external helpers
Recognize the Pandoc format under the file extension .pandoc or .pdc,
and shell out to pandoc as an external helper to format Pandoc content.

Refactor out repeated code with external helpers. Change the error
output formatting. I did not see any of the external helpers print the
string "<input>" to represent stdin as a file; just prepending the file
name to error output is more general and doesn't sacrifice that much in
terms of readability.

@betaveros betaveros force-pushed the betaveros:pandoc-support branch from 4274cdc to a3f0dc6 Nov 8, 2017

@betaveros

This comment has been minimized.

Copy link
Contributor

betaveros commented Nov 8, 2017

Actually I'm going to go out on a limb and make --mathjax the single option to pandoc, since the default math option is rather lame, MathJax is somewhat officially recommended in the Hugo documentation, and the output markup looks fine for rendering with other LaTeX libraries if people want to customize that.

@betaveros

This comment has been minimized.

Copy link
Contributor

betaveros commented Nov 13, 2017

Any updates? The PR as it currently stands should be fairly cheap; it doesn't use pandoc any differently from the way it previously used asciidoc/asciidoctor/rst2html.

@betaveros

This comment has been minimized.

Copy link
Contributor

betaveros commented Nov 22, 2017

@bep any updates on the status of this PR, or anything else you'd like to be changed?

@bep

This comment has been minimized.

Copy link
Member

bep commented Nov 22, 2017

It looks very good, I will merge it in "a couple of days"; I will probably have to get a 0.31.1 patch release out the window, and try to keep the master branch as quiet as possible right after a release.

@shawnohare

This comment has been minimized.

Copy link

shawnohare commented Nov 30, 2017

I stumbled upon this PR because I was interested in adding the feature as well. Thanks for putting in the effort to do this!

I wonder if the current references to "pandoc" (especially in the documentation) should be changed to more explicitly reference "pandoc markdown", since pandoc itself is not a format, per se.

@bep bep merged commit e69da7a into gohugoio:master Nov 30, 2017

3 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
license/cla Contributor License Agreement is signed.
Details
@adityam

This comment has been minimized.

Copy link

adityam commented Jan 4, 2018

Is it possible to reconsider allowing options to pandoc (either via config.toml or via the options in each .pdc file)? In particular, I am thinking of the possibility of adding bibliography files and citation style. This will make it much easier to create course notes using hugo.

If you are reluctant to add arbitrary flags, one option is to simply add --filter pandoc-citeproc flag to pandoc.

@rdwatters

This comment has been minimized.

Copy link
Contributor

rdwatters commented Jan 4, 2018

@betaveros

I’m curious if you see utility in pandoc’s MD-to-Word functionality (what I admittedly have to use Pandoc for most at work) for Hugo sites. I could see some very cool COPE workflows for users who need to create PDF versions of content as well. I’ve been woefully absent in the Hugo world for the past few months. Thanks for your work on this. I might be an edge case, but I’m really excited about this feature!

@dreamcat4

This comment has been minimized.

Copy link

dreamcat4 commented Jan 23, 2018

Documentation here:

https://gohugo.io/content-management/formats/#additional-formats-through-external-helpers

This is a refactor to inlude panadoc support, still with default arguments - Hugo version 0.32 and higher.

@dreamcat4

This comment has been minimized.

Copy link

dreamcat4 commented Jan 23, 2018

Ah, it looks like you dropped this feature from the PR:

Add a configuration option, externalHelperArguments, that can be used to
override the additional arguments passed to external helpers.

Well I was actually very much looking forward to seeing that, for passing custom asciidoc default arguments. This is a valuable improvement (for asciidoc users).

I dropped the options for now to keep things simple. If there's demand for external options we (or somebody else) can revisit this later.

Yes I need this / would like to see this.

Is it possible to reconsider allowing options to pandoc (either via config.toml or via the options in each .pdc file)? In particular, I am thinking of the possibility of adding bibliography files and citation style. This will make it much easier to create course notes using hugo.

This is a similar / same reasons why asciidoctor users also need this externalHelperArguments override string in config.toml global configuration.

@betaveros do you still want to make a new PR for the externalHelperArguments feature? As a seperate PR ? @bep Can we have further discussions about this feature? Do you see the value in it?

@valouille

This comment has been minimized.

Copy link

valouille commented Feb 16, 2018

Hello,

For my usecase, I certainly see value in this feature, because I would like to be able to pass options to asciidoctor like -r asciidoctor-diagram for the generation of plantUML diagrams. Currently, I'm getting this error :

ERROR 2018/02/16 10:46:15 post/2018-01-012-doc.adoc: asciidoctor: WARNING: <stdin>: line 20: invalid style for literal block: plantuml

@bilke

This comment has been minimized.

Copy link

bilke commented Feb 22, 2018

@bep Would you allow the argument --filter pandoc-citeproc?

One had to check if the executable pandoc-citeproc is found in the path and if yes this argument can be added. I would make a PR if you agree.

cc: @adityam

@raghur

This comment has been minimized.

Copy link

raghur commented Aug 6, 2018

I've run into this as well - need to customize args passed to asciidoctor - attributes, plugins etc and right now there's no way to do it short of putting everything in each asciidoc file

Also, is something like PR 4310 planned? It doesn't say why it was closed (just points to this issue)

@kirubakaran

This comment has been minimized.

Copy link

kirubakaran commented Nov 11, 2018

@shawnohare Re your #4060 (comment)

I use org-mode. So I pass --from=org flag myself, as a temporary solution, as I've described here: https://kirubakaran.com/blog/org-pandoc-hugo/#update

I think that inferring from format from the filename would be a good solution. Example: foo.org.pandoc to send --from=org when calling Pandoc. Thoughts?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment