Specify command-line options using YAML metadata #4627

Closed
mb21 opened this issue May 4, 2018 · 42 comments

Comments

@mb21
Collaborator

mb21 commented May 4, 2018

I'm creating this issue to close the more specific ones that fall into the category of "can I specify command-line option X using YAML metadata?"

You could write a bash script, or use one of the following third-party tools (which build on top of pandoc and already implement the approach described in the comments below):

Update

Pandoc now supports a --defaults option, which can be used as follows to specify command-line options from a file. If e.g. input.md contains:

---
standalone: true
...

# my title

rest of my document

You can call pandoc as follows:

pandoc --defaults input.md input.md

Yes, currently you have to specify the file twice: once for the --defaults option to read out the YAML, and once as the markdown input file as usual. There's a follow-up issue for this.

@mb21
Collaborator Author

mb21 commented May 4, 2018

We could discuss whether it's worth implementing this in pandoc itself, possibly with a syntax like the following:

---
options_:
  - reference-doc: mydoc.docx
  - template: |
      `mytemplate.tex`{=latex}
      `mytemplate.html`{=html}
---

The syntax has to be valid YAML (therefore we need the | or potentially quotes around some values), and pandoc interprets the values as markdown (therefore we sometimes might have to wrap them in backticked code-spans to prevent nasty surprises).

The question is whether this approach is not more trouble than it's worth.

@jtkiley

jtkiley commented May 4, 2018

+1

I use pandoc for a lot of things, but one set is producing brief write ups, letters, and envelopes. These are quick one-off documents, but it's nice to have them formatted consistently and nicely. Currently, I specify what I can in YAML and use a custom template and engine on the command line. That's inconvenient for one-off documents, compared to an academic paper or something with an ongoing set of revisions (and, presumably, a Makefile).

For my use, the ideal scenario would be this:

  1. Set everything (including output type) in YAML. This would basically be part of a template.
  2. Run a command as simple as pandoc document.markdown on the command line (or, better, using script in Atom).
  3. Done.

A bonus would be a way of specifying that the output filename should be the same with a different extension (e.g., document.markdown makes document.pdf without needing to specify the literal name document in YAML in each file).
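
The "same name, different extension" idea can be sketched in a few lines of Python. This is only an illustration of the naming rule, not something pandoc does; the helper name `output_name` is made up for the example.

```python
from pathlib import Path


def output_name(source: str, extension: str) -> str:
    """Return the source path with its final extension swapped for `extension`."""
    # Accept both "pdf" and ".pdf" so callers do not have to care.
    return str(Path(source).with_suffix("." + extension.lstrip(".")))


print(output_name("document.markdown", "pdf"))  # document.pdf
```

A wrapper script could pass the result to pandoc's `-o` option, so the literal name never has to appear in the YAML of each file.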

@iandol
Contributor

iandol commented May 4, 2018

pandocomatic and panzer already handle this with, I think, much more flexibility (within-document settings that combine with / override a cross-document YAML set of defaults), but I imagine that for simple uses this would certainly be used by some users who do not want to install any additional tools...

@jgm
Owner

jgm commented May 5, 2018 via email

@jtkiley

jtkiley commented May 5, 2018

@iandol: Thanks! I'll give those a look.

@jgm: I suppose I was thinking of templates in two different senses. One is a markdown file that I would copy and use to create a new document. That's what I meant above. I use that pattern for things like envelopes, where my envelope.tex template (the second kind) is expecting certain variable names to come in from the markdown file (which, incidentally, only has YAML content). For my use, it would be practically hard to eliminate the markdown template, as I'd have to memorize all of my variable names. Similarly, my letter and paper markdown templates include a number of YAML variables (controlling things like signature images). With that in mind, I was thinking of specifying command line options in that markdown template.

Your response also brings up an interesting difference between the use you design for and how I actually use it. You're suggesting a one-to-many relationship where a given document will routinely be converted to different output formats. My use is almost entirely one-to-one in that a given document is destined to end up in only one output format (though what format that is differs by document). That's why the command line options in the markdown file make more sense to me (but perhaps not to you): I almost never do anything other than a single output format for any given document.

So, perhaps the root of my suggestion is making the one-to-one workflow a first class use case.

I should note that the options idea would seemingly work pretty well on one computer, but I think it would be more complex for multiple computers (e.g., dotfiles repository, symlinks, Dropbox, or some combination), and it would make sharing the input document harder than just sending over a markdown file and a .tex template for someone else to edit/run. But, then again, I'm assuming my one-to-one workflow.

Sorry for the wall of text, and thanks again for thinking about this. It would be a big improvement for what I do.

@jgm
Owner

jgm commented May 6, 2018 via email

@mb21
Collaborator Author

mb21 commented May 6, 2018

If I understand the use-case of @jtkiley correctly, it's exactly about bundling everything (including the options) in one single, portable, file. Which is exactly what panzer does:

panzer adds styles to pandoc. Styles provide a
way to set all options for a pandoc document with one line (‘I want this
document be an article/CV/notes/letter’).

You can think of styles as a level up in abstraction from a pandoc
template. Styles are combinations of templates, metadata settings,
pandoc command line options, and instructions to run filters, scripts
and postprocessors. These settings can be customised on a per writer and
per document basis. Styles can be combined and can bear inheritance
relations to each other. panzer exposes a large amount of structured
information to the external processes called by styles, allowing those
processes to be both more powerful and themselves controllable via
metadata (and hence also by styles). Styles simplify makefiles, bundling
everything related to the look of the document in one place.

[...]

Styles are defined in a yaml file (example).
The style definition file, plus associated executables, are placed in
the .panzer directory in the user’s home folder (example).

A style can also be defined inside the document’s metadata block:

I'm guessing some people use make-files for this. But if you're coming from the world of GUIs and word processors, it would sound simpler to bundle up everything in one file and then run the export-to-PDF and export-to-HTML commands in your editor (say, Atom), and it would read all the options from the file metadata.

@jtkiley

jtkiley commented May 6, 2018

@mb21: Yeah, the ease and portability are a big part of it. That said, I'm going to rework some of my stuff using panzer to try it out. It looks like it would cover a lot of my individual friction points.

I do use Makefiles for my heavily-edited, version-controlled documents (usually academic papers), but I have plenty of things that are either one-off or at least more casual. It would be nice for those things (all of the templating included) to sync around to different computers easily and be easy to distribute to others. I can personally manage the complexity, but it does make collaboration harder, especially with people who typically use GUIs/Word (to be fair, nearly everyone else in my field). There's a payoff in automating low value-added work like citations or document-level presentation, but there's a complexity cost in installing, setting up, and using a workflow like this, and it would be nice (from my perspective) to put a dent in those costs.

It'll probably be a few days, but I'll circle back here once I try panzer. I know it's an n of 1, but do let me know if some specific examples would help. I can share some when I have a chance to dig in with panzer.

@jgm
Owner

jgm commented May 6, 2018 via email

@jtkiley

jtkiley commented May 22, 2018

I tried panzer, and it's not really helpful for my case. First, of the three options I'd most like to specify inside the file (i.e. pdf-engine, template, and output), it only supports pdf-engine. So, I wouldn't really be saving much on the command line, and it wouldn't help with the friction with one-off documents, as I can't see a good way of automating running that command.

The options route would be a start, but it doesn't seem to help with automation. The really awesome outcome for me would be setting up a grammar for Atom using script. Then it's just a keyboard shortcut to produce a PDF, regardless of type.

I know that you have reasons for not wanting it in pandoc (though I do still hope to persuade you otherwise), but I really wish there were a way to streamline these kinds of uses. For things like envelopes, 90 percent of the work is making the PDF, not entering the address. It seems like that shouldn't be the case, whether it's supported within pandoc itself or something external.

@jgm
Owner

jgm commented May 22, 2018 via email

@mb21
Collaborator Author

mb21 commented Sep 17, 2018

I’ve found myself coming back to this issue.

There's a basic conceptual problem with putting all of this option
stuff in the md file itself: we need at least some options to be
settled before we even know how to read the file. We have to know
that the input format is markdown and that yaml_metadata is an enabled
extension.

I can see how it would be weird for pandoc to first naively parse the YAML metadata of the input markdown file without parsing the values as markdown, read out the options, and then re-parse the whole file using the specified options. It could be done, but architecturally it would be a weird thing to do for pandoc. But it would be useful.

So I wrote a simple script (~100 lines) that does exactly that: panrun.

The motivation is really that for one-off documents, I want to save the necessary pandoc options right in the file. (Just like rmarkdown users can simply open the file and hit that 'convert' button.) I don’t want to remember which document-class/style/theme I had decided to convert this document with. I don’t want to litter my filesystem with runpandoc.sh or template.html files for each one-off document. Finally, I didn’t want to “parse” YAML with sed, or use a complex tool that only works for certain options.

Anyway, I’ll see whether panrun serves me well. Let me know how it works for you: panrun/issues ;-)
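
The "parse the YAML naively, then call pandoc with those options" idea can be sketched as follows. This is a minimal illustration and not panrun's actual implementation: it only understands flat `key: value` lines, the function names are invented for the example, and it builds the argument list without running pandoc. A real tool would also need to keep ordinary metadata such as `title` apart from options, for instance under a dedicated sub-key.

```python
import re


def read_front_matter(text: str) -> dict:
    """Naively read flat `key: value` pairs from a leading YAML block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    options = {}
    for line in lines[1:]:
        if line.strip() in ("---", "..."):
            break  # end of the metadata block
        match = re.match(r"^([A-Za-z][\w-]*):\s*(.*)$", line)
        if match:  # indented (nested) lines are ignored in this sketch
            options[match.group(1)] = match.group(2).strip()
    return options


def to_pandoc_args(options: dict, source: str) -> list:
    """Translate options into a pandoc argument list (not executed here)."""
    args = ["pandoc", source]
    for key, value in options.items():
        if value == "true":
            args.append("--" + key)  # boolean switch
        elif value and value != "false":
            args.append("--" + key + "=" + value)
    return args


doc = "---\nstandalone: true\npdf-engine: xelatex\n---\n\n# my title\n"
print(to_pandoc_args(read_front_matter(doc), "input.md"))
# ['pandoc', 'input.md', '--standalone', '--pdf-engine=xelatex']
```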

@SylvainGuieu

SylvainGuieu commented Jan 21, 2019

My solution, for the template only, was to use a pre-extension in the file name, so that filename.letter.md tells my Makefile to look for a letter.tex or letter.html template file when running pandoc.
This works well for me because it lets me see the main kinds of md file I have in a directory: *.tech note.md for technical notes, *.meeting.md for meeting minutes, *.letter.md for letters, etc.
Each produces standardised documents by type. For HTML, a CSS file can also be included with the template in the same way.

The target assignment on my make file looks like:

$(OUTPUT)/%.letter.pdf : $(SOURCE)/%.letter.md
    $(PANDOC) $(PANDOC_OPTIONS) --template /path/to/templates/letter.tex $(PANDOC_PDF_OPTIONS) -o $@ $(PANDOC_HEADERS) $< $(PANDOC_FOOTERS)

This is also easy to script in a bash file.
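
The same naming convention can be sketched outside of make. The Python below is an illustration only; the helper name `template_for` and the default template directory (taken from the Makefile rule above) are assumptions for the example.

```python
from pathlib import Path


def template_for(source: str, output_format: str,
                 template_dir: str = "/path/to/templates") -> str:
    """Pick a template from the pre-extension, e.g. memo.letter.md -> letter.tex."""
    suffixes = Path(source).suffixes  # e.g. ['.letter', '.md']
    if len(suffixes) < 2:
        raise ValueError(f"{source!r} has no pre-extension such as '.letter'")
    kind = suffixes[-2].lstrip(".")
    return f"{template_dir}/{kind}.{output_format}"


print(template_for("memo.letter.md", "tex"))  # /path/to/templates/letter.tex
```

The returned path could then be passed to pandoc's `--template` option, as in the Makefile rule.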

@jtkiley

jtkiley commented Jun 9, 2019

Thanks all for the ideas. I adapted some of the ideas here into a form that accomplishes most of what I want, and I've been successfully using it for a couple of months.

I created a directory hierarchy where each template type has a directory with a Makefile that uses wildcards to process a markdown file with the appropriate LaTeX template to produce the requested target. So, for a new letter, I copy a markdown template, edit, and then make 20190609_example.pdf to get the typeset version. Then, once I'm done (e.g., printing, uploading, emailing), I move the markdown and pdf to a _completed subfolder.

It works well for one-off letters (usually recommendations) and envelopes. My main projects already have Makefiles, so this wasn't an issue for those. It's a little less convenient for things in the middle of one-off and projects, like a document that should be grouped with other files but isn't something that I would version control. Those are rarer for me, so they have less friction than the one-off documents, though. I do not yet have a good way of automating pandoc in a text editor, but perhaps that is a future project.

I do still hope this is eventually implemented, but I appreciate the help here in helping me think through a good way to address most of the friction.

@mb21
Collaborator Author

mb21 commented Jun 9, 2019

I do not yet have a good way of automating pandoc in a text editor

My PanWriter supports pandoc export, options are read from the document's YAML.

@bpj

bpj commented Sep 30, 2019

My rather primitive take on default options is a Perl script which looks for a file ~/.runpandocrc, ./.runpandocrc or ./runpandocrc, slurps it and splits it into a list of "words" with Text::ParseWords (using the regex (?:\s+|\#.*) as delimiter so as to allow line comments) and then invokes pandoc with this list prepended to the commandline. It has some options of its own to read in options from additional/alternative files and intercepts the --from --read -f -r and --to --write -t -w options and the -M and -V options in order to allow setting and unsetting extensions separately from formats on the command line or in the file, and to allow unsetting Metadata and variables from the file via a home-cooked syntax with --rx +=EXTENSION and the like, but mostly it just passes the command line on to pandoc.
This at least has the advantage that it doesn't really require a new syntax.
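
For comparison, the core of this approach (split an rc file into words, allow `#` line comments, prepend the result to the command line) can be approximated with Python's standard `shlex` module. This is a rough sketch, not a port of the Perl script: it handles none of the format or extension interception described above, and the function names are invented for the example.

```python
import shlex


def read_rc_options(rc_text: str) -> list:
    """Split rc-file text into words, dropping '#' comments, honouring quotes."""
    return shlex.split(rc_text, comments=True)


def build_command(rc_text: str, cli_args: list) -> list:
    """Prepend the rc-file options to the user's own pandoc arguments."""
    return ["pandoc"] + read_rc_options(rc_text) + list(cli_args)


rc = "--standalone  # always produce a full document\n--pdf-engine=xelatex\n"
print(build_command(rc, ["letter.md", "-o", "letter.pdf"]))
# ['pandoc', '--standalone', '--pdf-engine=xelatex', 'letter.md', '-o', 'letter.pdf']
```

One difference to be aware of: `shlex` treats every unquoted `#` as the start of a comment, so a value containing `#` has to be quoted in the rc file.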

@mb21
Collaborator Author

mb21 commented Oct 12, 2019

Interestingly, with the new --defaults option (currently in the nightly builds, to be released with pandoc 2.8), we almost sort of got this. I was expecting that with this in foo.md:

---
standalone: true
---

# test file

you could run:

pandoc --defaults foo.md foo.md

But currently this fails with unexpected multiple YAML documents, probably because of the Y.decode1 in the source code. Maybe this could be changed to Y.decode and simply take the first one?

@jgm
Owner

jgm commented Oct 12, 2019

@mb21 that's a nice trick, but I think it's going to cause too many problems if we allow that.

  1. It would only work if the first YAML block in the markdown file contains only fields --defaults knows. Otherwise an error would be raised.
  2. All of these fields would go into the document's metadata, and might come out e.g. in meta tags, but they're not metadata.
  3. The fields would be parsed as markdown (perhaps harmless).

One idea I've toyed with is allowing something like:

---
defaults_:
  standalone: true
  columns: 78
# now comes the real metadata
title: Foo
...

if we taught --defaults to check the YAML for a defaults_ field and use it if present, this might work. Note that, as documented, YAML metadata fields ending in _ aren't included in metadata or parsed as markdown.

@jgm
Owner

jgm commented Oct 12, 2019

Or maybe we could tell pandoc not to parse a YAML metadata section with anchor defaults:

---
&defaults
standalone: true
columns: 78
...

That looks clean.

@jgm
Owner

jgm commented Oct 12, 2019

See haskell-hvr/HsYAML#39
for a blocking issue (though we could manually crop the input if necessary).

@mb21
Collaborator Author

mb21 commented Oct 12, 2019

Yes, my attempt was definitely a hack.

I don't think a lot of people know about YAML anchors, and it will unnecessarily confuse them. But I like having the options as subfields (e.g. under defaults_:). Maybe defaults_ is not the most descriptive name though; what about something like options_ or output_?

@mitinarseny

I don't know if this is relevant to this discussion or has been discussed before (there is a lot of text here), but it would be really convenient and meaningful if the YAML block in a .md document could contain a variable that specifies the extensions and other options to be used by default when processing the current document. For example, here are the contents of example.md:

---
title: Document with latex macros
_defaults:
  extensions:
    - +latex_macros
  output:
    html:
      katex: true
---

\providecommand{\mathFunc}[4]{#1\left#2\, #3 \,\right#4}
\providecommand{\mathbbFunc}[4]{\mathFunc{\mathbb{#1}}{#2}{#3}{#4}}
\providecommand{\mathrmFunc}[4]{\mathFunc{\mathrm{#1}}{#2}{#3}{#4}}
\providecommand{\Prob}[1]{\mathbbFunc{P}{(}{#1}{)}}
\providecommand{\Expect}[1]{\mathbbFunc{E}{[}{#1}{]}}
\providecommand{\Var}[1]{\mathrmFunc{Var}{[}{#1}{]}}

# Normal Distribution
Here is the definition of Normal Distribution
$$\begin{gathered}
    \left\{ \eta \sim N(\mu, \sigma^2) \right\}\\
    \Updownarrow\\
    \left\{\begin{gathered}
        F_\eta(x) = \Prob{\eta < x} = \int_{-\infty}^{x} f_\eta(x)dx,\\
        \text{where} f_\eta(x) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}
    \end{gathered}\right\}
\end{gathered}$$

## Expected Value

$$\boxed{
    \Expect{\eta} = \mu
}$$

## Variance

$$\boxed{
    \Var{\eta} = \sigma^2
}$$

Here I define some commands with \providecommand. It makes sense to list extensions within the document, as it USES them, and it would translate to inappropriate output if latex_macros is not enabled. katex: true means that the --katex option should be enabled by default when exporting to the html document type. When I write this example.md, I test it with KaTeX, and in most cases I will use it for future exporting.
So, instead of writing

pandoc example.md --from=markdown+latex_macros --katex -o example.html

I would simply write:

pandoc example.md -o example.html

And get the following output in the browser:
example.html
While I was writing this comment, I found out that latex_macros is enabled by default (see pandoc --list-extensions). But the same thing could be applied to hard_line_breaks.

Another solution would be to move some of the extensions (latex_macros, hard_line_breaks) that are associated with the way of writing (not translating) a .md document to variables, so that they can be set from the YAML block within the .md file. I find this more logical, but I am not sure I fully understand the reasons why they are extensions and not variables.

P.S. I'd like to thank everybody who contributes to Pandoc so much! I recently discovered it, and now I am happily using it for my academic papers at uni and trying to launch a blog based on Pandoc and GitHub Pages.

@mb21
Collaborator Author

mb21 commented Oct 19, 2019

@mitinarseny yes, this is exactly what this issue is about :) (see the first post)

@narg95

narg95 commented Oct 20, 2019

+1

@mitinarseny

It would be very useful if the YAML metadata block could also contain filters: [filter1, filter2], listing the filters to be applied by default to this document, in the corresponding order. A --filter filter3 CLI option should append its filter to the filters declared in YAML. A --no-yaml-filters option that cancels the use of these filters would be useful, too.
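
The proposed precedence can be written down in a few lines of Python. This only sketches the semantics suggested in this comment; `--no-yaml-filters` is the hypothetical option proposed here, not an existing pandoc flag, and the function name is invented for the example.

```python
def effective_filters(yaml_filters, cli_filters, no_yaml_filters=False):
    """Filters from YAML run first, then CLI filters, unless YAML ones are cancelled."""
    base = [] if no_yaml_filters else list(yaml_filters)
    return base + list(cli_filters)


print(effective_filters(["filter1", "filter2"], ["filter3"]))
# ['filter1', 'filter2', 'filter3']
print(effective_filters(["filter1", "filter2"], ["filter3"], no_yaml_filters=True))
# ['filter3']
```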

@mb21
Collaborator Author

mb21 commented Nov 6, 2019

@mitinarseny see also #5870

@kysko

kysko commented Nov 9, 2019

I hope this is the right place for these two comments:

Order of options on command line

How does the -M or --metadata option play into this?
It seems to depend on the order on the command line.

Say I have the following markdown (a.md), default yaml (d0.yaml) and command line:

# Foo

standalone: true
setext-headers: false
metadata:
  author: me

pandoc a.md -M test1 -d d0 -M test2 -o a_result.md

Then we have the result a_result.md:

---
author: me
test2: true
---

# Foo

So if an -M is placed before -d d0, it is ignored if there's a metadata option in the default, even if the latter doesn't have that particular metadata key.
When the metadata lines are removed in d0.yaml, both tests come out.
If this is the expected result, perhaps a few words in the manual would be good.

However, when putting test: true in a standalone metadata file, the result is as expected (not ignored) whether it is put before or after -d d0.

atx/setext options

Since --defaults was described as a way to "specify a package of options", I began by inserting atx-headers: true in the above default yaml, but got an error. Checking the example, I saw it should be setext-headers: false instead. Yet, I see no --setext-headers option for command line in the Manual.
Not a problem, I just wonder why it doesn't reflect the existing --atx-headers option, for consistency.

@jgm
Owner

jgm commented Nov 11, 2019

@kysko these useful comments should go in a separate issue, as they don't concern the feature under discussion here, but rather the behavior of the --defaults option.

@kysko

kysko commented Nov 11, 2019

Done.
Sorry, I thought this was the issue of origin leading to --defaults

@ghost

ghost commented Feb 20, 2020

+1. I would really like this to be implemented. Panzer is no longer being developed because most of its functionality is now integrated into Pandoc itself. Even though Panrun and Pandocomatic are still active, removing the dependency on external tools would be nice.

@bpj

bpj commented Feb 27, 2020

https://pandoc.org/MANUAL.html#default-files

OK not in the document metadata but good enough for me!

@lyndondrake

Now that we have --defaults, is there any chance that there might be a default --defaults file? Where this would help is other tools which invoke Pandoc (e.g. Hugo) but where it's impossible to change the command line passed to Pandoc.

@jgm
Owner

jgm commented Apr 13, 2020

I'd worry about the security implications of a default defaults file. But this shouldn't be discussed here -- use pandoc-discuss.

@jgm
Owner

jgm commented Apr 13, 2020

@lyndondrake for your use case why not create a shell script that passes on arguments to pandoc and includes some new ones? Name it pandoc and put it in your path before real pandoc, so Hugo will use it.

@alerque
Contributor

alerque commented Apr 13, 2020

@jgm I can think of several reasons that is a bad solution. It's a hack that could work, but not a solution. First, it would not be project specific and would break other projects unless you did some very creative hacking with env and path variables. And even if you did, catching things in the PATH before system paths is a bad idea for many reasons and strongly discouraged by most sysadmins. Whatever you did to hack that in inevitably wouldn't be portable and wouldn't map well to use in CI runners, etc.

@lyndondrake

I'd worry about the security implications of a default defaults file. But this shouldn't be discussed here -- use pandoc-discuss.

Apols - I'll take it across there.

@brainchild0

Is this issue fully superseded by #5790 and #5870?

@tarleb
Collaborator

tarleb commented May 14, 2021

I think @brainchild0 is right, and remaining issues should be discussed in #5870.

@tarleb tarleb closed this as completed May 14, 2021
@hoclun-rigsep

The pandoc -d doc.md doc.md approach described in the top post here fails for me on a recent version with "Multiple YAML documents encountered."

@jgm
Owner

jgm commented Jan 13, 2022

@hoclun-rigsep I believe this is due to our switch from HsYaml to yaml for YAML parsing.
I may be able to add some code to restore the former behavior.

jgm added a commit that referenced this issue Jan 13, 2022
This line signals the end of a YAML document.
This restores the behavior we got with HsYaml.
yaml complains about content past this line.
See #4627 (comment)
@jgm
Owner

jgm commented Jan 13, 2022

OK, this should work again after 0d1ba3d

@mboyea

mboyea commented Jun 25, 2024

The pandoc -d doc.md doc.md approach described in the top post here fails for me on a recent version with "Multiple YAML documents encountered."

This is failing for me today in Pandoc 3.2 as well.
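
One possible workaround, sketched here under assumptions and not verified against any particular pandoc version, is to copy the leading YAML block into a separate file and pass that file to --defaults instead of the markdown file itself. The function names are hypothetical, and the block must contain only fields that --defaults accepts.

```python
import tempfile


def split_front_matter(text: str):
    """Return (yaml_block, body); yaml_block is '' when there is no leading block."""
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return "", text
    for index in range(1, len(lines)):
        if lines[index].strip() in ("---", "..."):
            return "".join(lines[1:index]), "".join(lines[index + 1:])
    return "", text  # no closing delimiter: treat the whole text as body


def write_defaults_file(markdown_path: str) -> str:
    """Write the leading YAML block to a temporary .yaml file and return its path."""
    with open(markdown_path, encoding="utf-8") as handle:
        yaml_block, _ = split_front_matter(handle.read())
    with tempfile.NamedTemporaryFile("w", suffix=".yaml", delete=False,
                                     encoding="utf-8") as out:
        out.write(yaml_block)
        return out.name


print(split_front_matter("---\nstandalone: true\n...\n\n# my title\n"))
# ('standalone: true\n', '\n# my title\n')
```

The path returned by `write_defaults_file("doc.md")` would then be used as `pandoc --defaults <that path> doc.md`.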
