[Feature request] submit pandoc commandline options by metadata and/or separate yaml file #2069

Closed
bwl21 opened this issue Apr 11, 2015 · 17 comments

Comments

@bwl21

bwl21 commented Apr 11, 2015

I am opening this ticket following this post in the pandoc-discuss Google group: https://groups.google.com/forum/#!searchin/pandoc-discuss/definition$20list/pandoc-discuss/ZKytjti_h2Q/pcagNlG9SdsJ

Summing up:

  1. pandoc should allow choosing at least the reader extensions in the document metadata
  2. pandoc should allow providing options in an extra metadata file, given by e.g.

    --metadatafile <file>
  3. pandoc should allow customizing options in a default metadata file as well

In particular, the change to definition lists is a drastic one. The only way to deal with legacy documents is to enable compact_definition_lists every time. In a given environment there is not even a compatibility mode that accepts both syntaxes on input, which would let pandoc act as an implicit converter. One must choose the definition list format explicitly. The requested feature would make it easier to maintain legacy documents. For example, I have documents which are adapted every other year, and I got the surprise ...

The metadata could be for example

---
mdreader:
 -   +fenced_code_blocks
 -   +compact_definition_lists
mdwriter:
 -   -backtick_code_blocks
 -   +fenced_code_blocks
 -   +compact_definition_lists
pandocoptions:
 -   --atx-headers
...
@jgm
Owner

jgm commented Apr 11, 2015

+++ Bernhard Weichel [Apr 11 15 08:54 ]:

  1. pandoc should allow choosing at least the reader extensions in the
    document metadata

That's a bit difficult, since we need to know the reader extensions in order to know whether (and also how) to parse the YAML metadata in the first place.

@bwl21
Author

bwl21 commented Apr 12, 2015

Isn't yaml_metadata_block enabled by default, so that pandoc already parses the YAML metadata block? Then it should be able to select the extensions accordingly.

So, we could have a hierarchy (in decreasing significance) whose settings are merged. In that scheme it would not be possible to turn off yaml_metadata_block in the document's metadata block.

  • commandline
  • metadata block in document
  • pandoc_config.yaml in the folder of document
  • $HOME/pandoc_config.yaml
  • pandoc's builtin defaults
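
The merge described above can be sketched in a few lines of Python. This is only an illustration of the precedence rule, not anything pandoc implements; the layer contents and the helper name `merge_settings` are made up for the example.

```python
from collections import ChainMap

# Hypothetical layers (assumed values), named after the hierarchy above.
command_line = {"compact_definition_lists": False}
document_metadata = {"backtick_code_blocks": False}
folder_config = {"compact_definition_lists": True}   # pandoc_config.yaml next to the document
home_config = {"fenced_code_blocks": True}           # $HOME/pandoc_config.yaml
builtin_defaults = {"yaml_metadata_block": True, "compact_definition_lists": False}


def merge_settings(*layers):
    """Merge setting layers; a layer listed earlier overrides any later one."""
    # ChainMap looks keys up in the first mapping that contains them.
    return dict(ChainMap(*layers))


settings = merge_settings(
    command_line, document_metadata, folder_config, home_config, builtin_defaults
)
```

Here the command line wins over the folder-level file for `compact_definition_lists`, while keys set in only one layer pass through unchanged.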

@jgm
Owner

jgm commented Apr 12, 2015

+++ Bernhard Weichel [Apr 11 15 23:02 ]:

Isn't yaml_metadata_block enabled by default, so that pandoc already parses the YAML metadata block? Then it should be able to select the extensions accordingly.

I suppose they could be changed mid-parse. But then we'd face something like the Liar Paradox if you specified in the YAML metadata that yaml_metadata_block was disabled! (Not a serious objection -- the way it could work is that, going forward in the document, YAML metadata would no longer be recognized.)

@bwl21
Author

bwl21 commented Apr 13, 2015

Indeed, this could be a problem. Maybe we could resolve the paradox by an extra rule as you propose. We have options like:

  • ignore -yaml_metadata_block in document's Metadata
  • ignore subsequent YAML metadata (this sounds more pragmatic to me). It could even be a feature when combining multiple files, each of which has a YAML metadata block.
  • reject the case with an error message
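
To make the first option concrete, here is a rough Python sketch of applying `+name` / `-name` switches to a set of enabled extensions. The function `apply_switches`, the `PROTECTED` set and the `from_document` flag are hypothetical names for this illustration; nothing here reflects how pandoc handles extensions internally.

```python
# Assumption: extensions that a document's own metadata may not switch off.
PROTECTED = {"yaml_metadata_block"}


def apply_switches(enabled, switches, from_document=False):
    """Return a new set of extensions after applying '+name' / '-name' switches."""
    result = set(enabled)
    for switch in switches:
        if len(switch) < 2 or switch[0] not in "+-":
            raise ValueError(f"not a +name/-name switch: {switch!r}")
        name = switch[1:]
        if switch[0] == "+":
            result.add(name)
        elif from_document and name in PROTECTED:
            # Rule 1 above: silently ignore -yaml_metadata_block in the document.
            continue
        else:
            result.discard(name)
    return result
```

The same switch given on the command line (`from_document=False`) would still take effect, so only the self-referential case is excluded.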

As I think of it, the approach raises even more questions:

  • What to do if there is a "% " metadata block (title block) at the beginning of the document?
    Does it turn off yaml_metadata_block?
  • As there could be multiple metadata blocks, the extensions could be turned on/off within the document. Could that be supported? This would help if one combines contributions from different authors, each with different extension settings.

@lierdakil
Contributor

Frankly, I don't think that mixing document parsing strategy and metadata is a good idea. That said, it may be a nice feature to be able to define the exact syntax used in the document itself. I suppose we could devise a simple syntax extension that specifies the exact flavor used for a given Markdown document, something akin to a vim modeline.

E.g., have the first or last non-empty line of a document consist of a comment like this:

<!-- pandoc-markdown: +compact_definition_lists -->
or
<!-- pandoc-markdown: markdown_gfm -->

This would be fully compatible and would answer concerns about the "Liar Paradox". The exact syntax is obviously debatable; that's just something I thought of off the top of my head.
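
A minimal Python sketch of how a wrapper could detect such a line, assuming exactly the comment syntax shown above. The names `MODELINE`, `find_modeline` and `reader_format` are invented for the example, and the flavor names are taken from the comment as written rather than checked against pandoc's list of formats.

```python
import re

# Matches e.g. "<!-- pandoc-markdown: +compact_definition_lists -->"
MODELINE = re.compile(r"^<!--\s*pandoc-markdown:\s*(\S.*?)\s*-->$")


def find_modeline(text):
    """Return the flavor spec from the first or last non-empty line, or None."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return None
    for candidate in (lines[0], lines[-1]):
        match = MODELINE.match(candidate)
        if match:
            return match.group(1)
    return None


def reader_format(spec, base="markdown"):
    """Assumption: '+ext'/'-ext' specs modify the base format, anything else replaces it."""
    return base + spec if spec[0] in "+-" else spec
```

A comment in the middle of the document is deliberately ignored, which keeps the lookup cheap and avoids changing the flavor mid-parse.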

I don't think specifying writer options/flavor in a document is very useful, unless one often performs Markdown-to-Markdown transformations. That is not a very common use case, I think. The one case I can think of where an md2md transformation would be immediately helpful is combining contributions from different authors, and even then the output format is not per-document, but rather should be common to all documents.

Being able to specify the default Markdown flavor for both reader and writer (possibly separately) from a settings file could be a useful feature though. It would be a good idea to be able both to specify the settings file location on the command line and to have a default location for it (very much like templates are handled now). The exact format requires some discussion, as it would be nice to be able to set default reader and writer options in it, not just Markdown flavors, I think.

Thoughts?

@bwl21
Author

bwl21 commented Apr 13, 2015

Well, we have the choice of mixing parsing strategy into metadata or having an entirely different syntax.
I am now following the approach of having one style of syntax for all that stuff, which finally ends up treating parsing strategy as metadata.

For me markdown to markdown conversion is a regular usecase:

  • you get an immediate feedback if your markdown file contains syntactical problems
  • you get normalized documents which are easier to diff
  • you harmonize flavors of different contributors

Therefore I do this all the time!

@lierdakil
Contributor

@bwl21, in any case, I feel like specifying document flavor in a flavor-dependent block is just plain bad design. F.ex., gfm does not support yaml metadata blocks, so you can't specify that it's gfm and keep compatibility with gfm. Same goes for other flavors. It does not make sense to me.

I can't think of a flavor that does not understand HTML comments though, so that's a pro for my idea IMO. Con is it being yet another syntax extension, when we have more than enough already. But at least this one's not disruptive (i.e. will be silently ignored by parsers not supporting it)

@lierdakil
Contributor

Oh, and I assume that in your use-case of md2md transformations, output format is not per-document, but rather per-project at least, so using an external config should be more convenient anyway, unless I utterly misunderstand something.

@jgm
Owner

jgm commented Apr 13, 2015

@lierdakil raises some good points here. Maybe it would be worth implementing the modeline-like syntax, but I'm not sure. It adds further complexity. If the line is at the beginning of the document, then it's incompatible with pandoc title blocks. So, it would probably have to be at the end. And then it might interact badly with things like references (at least with the current setup, where pandoc-citeproc looks for an empty references header at the end of the document -- this will probably be changed soon).

I have a simple solution to all these problems: Makefiles! Whenever I'm doing anything moderately complex, I just create a simple Makefile, like:

mydoc.pdf: mydoc.txt
<TAB> pandoc $< -o $@ --toc --smart -s -Vversion="1.1" -f markdown-pipe_tables

Then, typing make regenerates the output. This is, essentially, runnable documentation.

@lierdakil
Contributor

Makefiles do nicely as a substitution for local config file, that much is true. However, ability to specify flavor in-document does add an extra feature: portability. For the sake of argument, I could send Markdown text to a colleague and not worry about him having trouble converting that. Right now, I'd have to also include part of my Makefile to be sure that everything goes smoothly.

Another point is global config. F.ex., I try to avoid setext headings and simple tables at all times -- it would be handy to be able to disable those globally for the writer. At the moment I use a shell alias, but that's cognitive overhead I certainly could live without.

So, this proposal is very rational at its core. Details need some working out though.

@jgm
Owner

jgm commented Apr 13, 2015

+++ Nikolay Yakimov [Apr 13 15 09:21 ]:

Makefiles do nicely as a substitution for local config file, that much is true. However, ability to specify flavor in-document does add an extra feature: portability. For the sake of argument, I could send Markdown text to a colleague and not worry about him having trouble converting that. Right now, I'd have to also include part of my Makefile to be sure that everything goes smoothly.

Yes, I see the point. Though, there are still many things that can go wrong: e.g. your colleague might use the wrong writer options. Attaching a Makefile is more failsafe.

Another point is global config. F.ex., I try to avoid setext headings and simple tables at all times -- it would be handy to be able to disable those globally for the writer. At the moment I use a shell alias, but that's cognitive overhead I certainly could live without.

This part could be cured with a global Makefile that you just include in all the others.

So, this proposal is very rational at its core. Details need some working out though.

I agree, it's still worth thinking about.

@lierdakil
Contributor

2015-04-13 19:36 GMT+03:00 John MacFarlane notifications@github.com:

+++ Nikolay Yakimov [Apr 13 15 09:21 ]:

Another point is global config. F.ex., I try to avoid setext headings and
simple tables at all times -- it would be handy to be able to disable those
globally for the writer. At the moment I use a shell alias, but that's
cognitive overhead I certainly could live without.

This part could be cured with a global Makefile that you just include in
all the others.

Not for one-shot conversions though, and bigger projects usually convert
from Markdown, at least in my case. You get the point.

@bwl21
Author

bwl21 commented Apr 13, 2015

I plan to solve the issue in my environment (https://github.com/bwl21/wortsammler) such that Wortsammler first reads the metadata of the input and adjusts the pandoc command line accordingly. This corresponds to the Makefile proposal. In addition, I plan to implement the config hierarchy mentioned before.
Due to the non-backwards-compatible change in definition lists, it is even necessary to choose a proper pandoc version :-), which I plan to handle in Wortsammler.

My proposal was indeed meant to handle the cases where plain pandoc is used: "one-shot conversions" and out-of-the-box editor integrations (e.g. Sublime) which act on one single file.

We all agree that the most important part is the configuration of the reader. In my opinion, metadata is information about data, and the markdown reader configuration is meta-information about the current markdown file. So I still think that adding it to the metadata block is a valid approach.
As the metadata block can contain an arbitrary structure, there could be one more reserved entry named "pandoc". I tried it: pandoc preserves such entries, so it could interpret them as well.

This is, how I plan to represent this in Wortsammler's config file:

:pandoc:
  :system_command: ! 'pandoc_1.13.1 '
  :markdown_input_switches:
  - -backtick_code_blocks
  - +fenced_code_blocks
  - +compact_definition_lists
  :markdown_output_switches:
  - -backtick_code_blocks
  - +fenced_code_blocks
  - -compact_definition_lists

If for whatever reason another syntax is to be used, I feel that an XML processing instruction would be the appropriate solution, not XML comments.

<?pandoc markdown-reader="-backtick_code_blocks+fenced_code_blocks+compact_definition_lists" ?>

But YAML metadata appears much more appropriate to me.

@jgm
Owner

jgm commented Apr 15, 2015

Oh, by the way, pandoc ignores fields in YAML that end with an
underscore. This is designed to allow you to include raw data that you
can process with external tools. So you could do:


---
pandoc_opts_:
  system_command: ! 'pandoc_1.13.1'
  markdown_input_switches:
  - -backtick_code_blocks
  - +fenced_code_blocks
  - +compact_definition_lists
  markdown_output_switches:
  - -backtick_code_blocks
  - +fenced_code_blocks
  - -compact_definition_lists
title: My Title, etc.
...

and write a small wrapper script that reads the YAML at the head of the
input file, extracts the system options, and runs pandoc using these
options on the file.
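
Such a wrapper might look roughly like the following Python sketch. It assumes the metadata block sits at the very top of the file, delimited by `---` and `...` (or `---`), and that the switches are listed under a `markdown_input_switches` key. The line scanning is deliberately naive and is not a real YAML parser; all function names are invented for the example.

```python
import subprocess


def read_front_matter(text):
    """Return the lines between a leading '---' and the closing '...' or '---'."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return []
    block = []
    for line in lines[1:]:
        if line.strip() in ("...", "---"):
            return block
        block.append(line)
    return []  # no closing delimiter: treat the file as having no metadata


def list_under(block, key):
    """Collect the '- item' entries that directly follow 'key:' (naive scan)."""
    items, collecting = [], False
    for line in block:
        stripped = line.strip()
        if stripped == key + ":":
            collecting = True
        elif collecting and stripped.startswith("- "):
            items.append(stripped[2:].strip())
        elif collecting:
            break
    return items


def pandoc_command(path, text):
    """Build the pandoc argument list from the switches found in the metadata."""
    switches = list_under(read_front_matter(text), "markdown_input_switches")
    return ["pandoc", "-f", "markdown" + "".join(switches), path]


def run(path):
    """Read the file, build the command and run pandoc (must be on PATH)."""
    with open(path, encoding="utf-8") as handle:
        subprocess.run(pandoc_command(path, handle.read()), check=True)
```

A fuller wrapper would use a proper YAML library and also honour `system_command` and the output switches; this sketch covers the reader switches only.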

@lierdakil
Contributor

@bwl21, all your proposals are only valid if everyone always uses a parser that supports your proposed extensions. That is not always the case, esp. in the context of multiple authors. XML processing instructions will be passed through verbatim by most parsers, which is certainly not something I would want.

HTML comments are mostly ignored, hence I suggest using them. I feel like that would be the least-disruptive option.

@bwl21
Author

bwl21 commented Apr 20, 2015

I am closing this issue, as I will fix it in my surrounding environment as described above.

@mb21
Collaborator

mb21 commented Sep 18, 2018

Btw. the --metadata-file option is now implemented. For the rest, see #4627
