Feature request: use variables in document body text #1950

Open
DaveJarvis opened this issue Feb 17, 2015 · 22 comments

Comments

@DaveJarvis

commented Feb 17, 2015

variables.yaml:


---
protagonist:
- first: Ishmael
antagonist:
- first: Moby-Dick
- classification: whale
- colour: white
- possessive: #{antagonist.first}'s

---

(Only back references allowed for one-pass parsing.)

chapter/1.md:

"Call me #{protagonist.first}. I won't rest until I've mounted #{antagonist.possessive} fluke on my roof. That giant #{antagonist.color} fish of the sea is my nemesis."

Then pandoc variables.yaml chapter/1.md would write the following HTML to stdout, with the variables from the markdown file substituted using the values from the YAML file:

<html>
<body>
<p>
"Call me Ishmael. I won't rest until I've mounted Moby-Dick's fluke on my roof. That giant #{antagonist.color} fish of the sea is my nemesis."
</p>
</body>
</html>

Since color couldn't be found (the variable is named colour), no substitution is made for it. A listing of all missing variables would be written to stderr:

#{antagonist.color}

If this is already possible with pandoc, please link to the documentation showing a clear example for how to accomplish this task (without using templates, as they are inappropriate for this situation).

Ideas on how to write a preprocessor for markdown documents (that could then be piped to pandoc) are also quite welcome.
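As one possible starting point for such a preprocessor, here is a minimal Python sketch. It assumes the variables are already available as nested dictionaries (the `VARIABLES` table is a hand-written stand-in for variables.yaml, simplified to plain mappings), and the names `lookup` and `substitute` are illustrative only. Known `#{...}` placeholders are replaced, unknown ones are left untouched and reported on stderr.

```python
import re
import sys

# Hand-written stand-in for variables.yaml (an assumption: a real
# preprocessor would load the file with a YAML parser instead).
VARIABLES = {
    "protagonist": {"first": "Ishmael"},
    "antagonist": {"first": "Moby-Dick", "classification": "whale", "colour": "white"},
}

# Matches #{dotted.path} placeholders.
PLACEHOLDER = re.compile(r"#\{([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*)\}")


def lookup(path, variables):
    """Follow a dotted path through nested dicts; None means "not found"."""
    node = variables
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return None if isinstance(node, dict) else node


def substitute(text, variables):
    """Return (text with known placeholders replaced, list of unknown placeholders)."""
    missing = []

    def replace(match):
        value = lookup(match.group(1), variables)
        if value is None:
            missing.append(match.group(0))
            return match.group(0)  # leave the placeholder as written
        return str(value)

    return PLACEHOLDER.sub(replace, text), missing


if __name__ == "__main__":
    chapter = "Call me #{protagonist.first}. That giant #{antagonist.color} fish is my nemesis."
    expanded, missing = substitute(chapter, VARIABLES)
    print(expanded)                   # body text goes to stdout
    for name in missing:
        print(name, file=sys.stderr)  # unresolved placeholders go to stderr
```

Because the expanded text goes to stdout and the missing names to stderr, the output of a script like this could be piped straight into pandoc.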

@jgm

Owner

commented Feb 17, 2015

+++ Dave Jarvis [Feb 16 15 19:20 ]:

If this is already possible with pandoc, please link to the documentation showing a clear example for how to accomplish this task (without using templates, as they are inappropriate for this situation).

Template expansion occurs only in the template, not in body text.

However, nothing stops you from using a Markdown file as a template
for itself. Take this my.md:

---
hello:
  english: world
  german: Welt
...

Hello $hello.english$.

Now do

% pandoc my.md --template my.md | pandoc -t html
<p>Hello world.</p>

This is a bit roundabout, admittedly. But it works.

@DaveJarvis

Author

commented Feb 17, 2015

Thank you jgm: that's an interesting workaround and a good idea, given the constraints. Iterating over multiple chapters makes the problem a bit more difficult. A small shell script that first combines the variables with each chapter is useful:

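#!/bin/bash
# Prepend the shared variables to each chapter, then expand and convert it.
OUTDIR=output
mkdir -p "$OUTDIR"
rm -f "$OUTDIR"/*

for i in chapter/*.md; do
  out="$OUTDIR/$(basename "$i")"
  # Combine the YAML variables with the chapter text.
  cat variables.yaml "$i" > "$out"
  # The first pass expands variables (the file is its own template);
  # the second pass converts the result to ConTeXt.
  pandoc "$out" --template "$out" |
    pandoc -t context > "$OUTDIR/$(basename "$i" .md).tex"
done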

This way the variables can be saved in a single file, without having to reference the file in every chapter. That said, the following would be a simpler, cleaner, and much more robust solution:

pandoc --variables variables.yaml chapter/1.md -t context -o chapter/1.tex

Piping the combined variables and chapters directly to pandoc won't work because the --template option cannot read from standard input.

@nkalvi

commented Feb 17, 2015

I like the power of pandoc!
/cc @jgm Would it be helpful to print an error/warning if a variable's value cannot be found?

/cc @DaveJarvis
I took your example as an exercise - let me know whether it'll work :)

Expand the variables file first (if it references its own variables):

pandoc variables.yaml --template variables.yaml > var-exp.yaml

Can we use xargs instead of a script?

ls chap* | xargs -I file pandoc --template file var-exp.yaml file | pandoc -t context
@jgm

Owner

commented Feb 17, 2015

+++ nkalvi [Feb 17 15 06:53 ]:

I like the power of pandoc!
/cc @jgm Would it be helpful to print an error/warning if a variable's value cannot be found?

No, because in lots of templates we test for a variable being set with an "if". Printing warnings would generate lots of spurious warnings.

@nkalvi

commented Feb 17, 2015

/cc @jgm That's why I thought it wasn't done. Thanks.

@DaveJarvis

Author

commented Feb 18, 2015

No, because in lots of templates we test for a variable being set with an "if". Printing warnings would generate lots of spurious warnings.

Warnings could be made filterable. For example:

pandoc --stderr=variables,conversion,formatting ...

If only variable-related errors are desired, then:

pandoc --stderr=variables ...

That said, why is testing for a variable being set repeated throughout the code? Shouldn't all the code rely on a single function so that variable tests are performed in one spot?

What would it take to keep track of referenced variables that could not be found, then list those (and their context)? For example:

warning variables.yaml: $antagonist.color$ not found

For variables from standard input:

warning stdin: $antagonist.color$ not found
@jgm

Owner

commented Feb 18, 2015

+++ Dave Jarvis [Feb 17 15 17:52 ]:

No, because in lots of templates we test for a variable being set with an "if". Printing warnings would generate lots of spurious warnings.

That said, why is testing for a variable being set repeated throughout the code?

Not throughout the code. This is all handled in the Templates module.
My point was that many templates have variables that may or may not be
set, and this is a useful feature. So the suggested warning would
trigger many spurious, non-useful warnings.

@bpj

commented Feb 20, 2015

On 2015-02-17 04:20, Dave Jarvis wrote:

Ideas on how to write a preprocessor for markdown documents (that could then be piped to pandoc) are also quite welcome.

I use Template::Toolkit (TT) to do this, among other things,
including reading variables from a YAML file. I have written my
own command-line wrapper script, which I'll share if you are
interested. It can either read in one set of variables and
apply them to several templates/documents, or read in several sets
of variables and apply them in turn to the same document template.
Unfortunately the command-line wrapper which comes with TT can't
read variables from files, and the only other publicly available
wrapper which can has some issues with the current version of TT.

You can use any tag delimiters you want with TT on a per-document
basis, even regular expressions. But if the tag delimiters are
e.g. {% and %}, TT sees all instances of those characters, or
all matches against the regular expression, as tag delimiters, so
you can't use something which clashes with regular Pandoc syntax
like {# or }. (It would be an extremely bad idea to use a
single curly bracket as a tag delimiter!) Something like
#{protagonist.first}#, which is close to your preferred syntax,
would work.

I usually use double backticks around curly brackets as tag delimiters

 ``{protagonist.first}``

because the template tags will then stand out as 'code' if I
render the doc with pandoc without running it through TT (for
proofing), and if I actually need a multi-backtick code span which
begins/ends with braces I just put a space between the backticks
and the bracket:

 `` { } ``

Pandoc will see a code span beginning and ending with curly
brackets in both cases, but TT won't see the latter as tag delimiters.

@DaveJarvis

Author

commented Feb 21, 2015

nkalvi:

ls chapter/* | xargs -I file pandoc --template file variables.yaml file | pandoc -t context

Good idea, but it doesn't quite reproduce the same output as the script. Also, running the variables through itself is a nice way to help resolve references.

bpj:

which I'll share if you are interested

I appreciate the offer and will let you know if the scripts start to become a time-waster. The only part that remains unsolved is the ability to know when a missing/non-existent tag is used. If there was a feature that prevented pandoc from substituting empty strings for undefined variables, then it'd be easy to grep the output for variables that were not dereferenced.

@DaveJarvis

Author

commented Sep 23, 2016

I've written a Java application that resolves these issues and more.

https://bitbucket.org/djarvis/yamlp

@michaelstepner

commented May 13, 2017

@jgm Apologies for digging up your 2-year-old comment, but I liked this solution you suggested:

However, nothing stops you from using a Markdown file as a template for itself.

Yet I'm finding that having inline math prevents me from using a Markdown file as a template for itself. Adapting your example, take this my.md:

---
hello:
  english: world
  german: Welt
...

Hello $hello.english$. Did you know $1+1=2$?

Now do:

$ pandoc --template my.md my.md | pandoc -t markdown
pandoc: "template" (line 7, column 38):
unexpected "1"
expecting letter
CallStack (from HasCallStack):
  error, called at src/Text/Pandoc/Templates.hs:73:35 in pandoc-1.19.2.1-J1nmFBg9ln971v0RrPbKLJ:Text.Pandoc.Templates

I suspect I should handle this by using a template processor like Mustache or Liquid to preprocess the markdown, instead of the workaround that uses the markdown file as a template. But I thought I'd see if you had an alternative suggestion/workaround first 😄
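A preprocessor along those lines could keep the dollar syntax and still leave math alone. The following Python sketch is one way to do it, under the assumption that a variable reference is a dotted path whose segments each start with a letter; the `expand` helper and the `metadata` dictionary are illustrative, not part of pandoc. A math span such as `$x$` still looks like a variable name, and is preserved here only because no such variable is defined.

```python
import re

# Assumed pattern: a variable reference is $name$ or $name.sub$, where every
# path segment starts with a letter. Math such as $1+1=2$ never matches.
VARIABLE = re.compile(r"\$([A-Za-z]\w*(?:\.[A-Za-z]\w*)*)\$")


def expand(text, variables):
    """Substitute $dotted.path$ references found in `variables`; keep the rest."""

    def replace(match):
        node = variables
        for key in match.group(1).split("."):
            if not isinstance(node, dict) or key not in node:
                return match.group(0)  # unknown name (or math like $x$): keep
            node = node[key]
        return match.group(0) if isinstance(node, dict) else str(node)

    return VARIABLE.sub(replace, text)


metadata = {"hello": {"english": "world", "german": "Welt"}}
line = "Hello $hello.english$. Did you know $1+1=2$?"
print(expand(line, metadata))  # prints: Hello world. Did you know $1+1=2$?
```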

@DaveJarvis

Author

commented May 13, 2017

Define the calculation in YAML. For example:

  game:
    played:
      first: $date.protagonist.born$ - 672

Then reference the YAML variable within the document.

@michaelstepner

commented May 14, 2017

Define the calculation in YAML.

@DaveJarvis, my goal is to typeset an equation in LaTeX/MathJax, not perform a calculation. But your suggestion was a good idea.

@mb21 changed the title from "Feature request: reference variables in content text" to "Feature request: use variables in document body text" on Oct 7, 2018

@mb21 added the enhancement label on Oct 7, 2018

@mb21

Collaborator

commented Oct 7, 2018

I'm reopening this as a feature request. Note that MultiMarkdown supports this under the name Metadata “Variables”. For example:

---
my name: John Doe
---

Best regards, [%my name]

Yes, weirdly you can put a space in there (and no, there is no way to access nested values).

Something like this could be easily implemented in the markdown reader, or just as a pandoc filter. Thoughts @jgm?

@mb21 reopened this on Oct 7, 2018

@michaelstepner

commented Oct 7, 2018

@mb21: The pandoc-mustache filter that I've written satisfied my desire for this feature. (Although it may not satisfy everyone's needs!) Here's an example, pasted from the README for pandoc-mustache:

Example

This document, in document.md:

---
mustache: ./le_gaps.yaml
---
The richest American men live {{diff_le_richpoor_men}} years longer than the poorest men,
while the richest American women live {{diff_le_richpoor_women}} years longer than the poorest women.

Combined with these variable definitions, in le_gaps.yaml:

diff_le_richpoor_men: "14.6"
diff_le_richpoor_women: "10.1"

Will be converted by pandoc document.md --filter pandoc-mustache to:

The richest American men live 14.6 years longer than the poorest men, while the richest American women live 10.1 years longer than the poorest women.

@DaveJarvis

Author

commented Oct 7, 2018

(Although it may not satisfy everyone's needs!)

There are a few key aspects that would make this feature more versatile:

  • Filename. Provide the name of the file containing variables on the command line. Such as:
    • pandoc document.md --filter pandoc-mustache variables.yaml
  • Delimiters. Ability to define the start and end token delimiters, as hard-coding is an unnecessary restriction. See:
  • String interpolation. This YAML preprocessor first performs recursive string interpolation before attempting to substitute back into the document. The algorithm is a trivial 8 lines of code, once the data structures are defined.

See: michaelstepner/pandoc-mustache#5

@michaelstepner

commented Oct 7, 2018

@DaveJarvis The pandoc-mustache filter is certainly quite barebones (but also quite useful to me). Anyone interested in improving it should check out the Contributing section of the README.

Further discussion of pandoc-mustache feature requests should probably be posted to the pandoc-mustache repo rather than this issue.

@jgm

Owner

commented Oct 7, 2018

@DaveJarvis

Author

commented Aug 1, 2019

There's actually a sample lua filter in the docs for doing just this:

It's pretty close and an excellent example, but has practical shortcomings, some easier to resolve than others:

  • Escaped dollar symbols. Having to escape the $ signs is not directly compatible with pandoc's existing ability to parse YAML variables by piping pandoc through pandoc.
  • Interpolation. It seems this is an arduous feature to implement and there are a number of edge cases.
  • Namespaces. No dot-notation for organizing variables is supported.

Using lua makes calling pandoc simpler. For example, compare the following invocations:

cat *.md > body.md
pandoc body.md --lua-filter=variables.lua \
  --metadata-file=interpolated.yaml -t context > body.tex

# ...versus the equivalent....
cat interpolated.yaml > body.md
cat *.md >> body.md

pandoc body.md --template body.md --metadata pagetitle="unused" | \
    pandoc -t context > body.tex

Such simplifications using lua would make complex format conversions faster and easier to maintain (fewer lines of code).

Namespaces are quite helpful for organizing data in a meaningful way. Consider the following flat keys, followed by similar data organized into namespaces:

ice_make: "Lexus"
ice_model: "LS 430"
ice_year: "1991"

ice:
  make: "Lexus"
  model: "LS 430"
  year: 1991
ev:
  make: "Ford"
  model: "Focus Electric"
  year: 2019

The lua filter assumes a flat hierarchy of variable names (e.g., ice_make), which is understandable; however, the ice_ prefix is duplication that is best avoided to ease maintainability.
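A filter that only understands flat names could still accept namespaced data if the hierarchy is flattened into dotted keys first. The Python sketch below illustrates that idea; the `flatten` helper is hypothetical, and the `cars` dictionary stands in for the parsed YAML above.

```python
def flatten(tree, prefix=""):
    """Map nested dicts to a flat {"a.b": value} dict using dot notation."""
    flat = {}
    for key, value in tree.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))  # descend into the namespace
        else:
            flat[path] = value
    return flat


# Stand-in for the namespaced YAML shown above.
cars = {
    "ice": {"make": "Lexus", "model": "LS 430", "year": 1991},
    "ev": {"make": "Ford", "model": "Focus Electric", "year": 2019},
}

flat = flatten(cars)
print(flat["ice.make"], flat["ev.year"])  # prints: Lexus 2019
```

With this approach the document can refer to `ice.make` while the data file keeps the shorter nested layout.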

@mb21

Collaborator

commented Aug 23, 2019

Maybe we could adjust the example lua filter jgm mentioned above, and make it a somewhat more official solution? Or do you think it's worth doing this in the markdown reader?

I agree with @DaveJarvis:

  • change the syntax to something else than dollars (as they're taken by math already). I'm fine with multimarkdown's [%author] (not sure that spaces should be allowed though).
  • allow dot notation like [%author.last_name]

P.S. Not sure what @DaveJarvis meant with "Interpolation".
P.P.S. I don't think we'd need the control structures (if, for, etc.) of pandoc-templates.

@DaveJarvis

Author

commented Aug 24, 2019

P.S. Not sure what @DaveJarvis meant with "Interpolation".

See: https://en.wikipedia.org/wiki/String_interpolation

manufacturer:
  ford:
    name: Ford
ev:
  full: $ev.year$ $ev.make$ $ev.model$ 
  model: Focus Electric
  make: $manufacturer.ford.name$
  year: 2019

The value $ev.full$ resolves to 2019 Ford Focus Electric.

change the syntax to something else than dollars (as they're taken by math already). I'm fine with multimarkdown's [%author] (not sure that spaces should be allowed though).

Preferably it would work with any sigil or start/end token delimiters, provided by the user. My yamlp provides this facility using a regular expression; Red Hat Fuse also allows customizing start and end tokens; Apache Camel might also have similar functionality --- point being there's really little reason to hard-code the sigils when more flexible approaches exist.

The overall algorithm becomes:

  1. Load and parse a Markdown document with YAML header.
  2. Pass the YAML header through the string interpolation preprocessor (lua or otherwise).
  3. Replace the original YAML header with the preprocessed header.
  4. Apply the resulting YAML hierarchy to the Markdown document.
  5. Transform the AST as per usual.

Having an option to preprocess and export YAML files alone would also be useful. For example, consider a Markdown document that has a YAML header but no body, like the following example.md file:

---
manufacturer:
  ford:
    name: Ford
ev:
  full: $ev.year$ $ev.make$ $ev.model$ 
  model: Focus Electric
  make: $manufacturer.ford.name$
  year: 2019
---

Then something like:

pandoc --lua-filter=preprocess.lua --lua-args "start-token='$' stop-token='$'"  example.md

would produce (note the lack of quotation marks for numeric values):

---
manufacturer:
  ford:
    name: "Ford"
ev:
  full: "2019 Ford Focus Electric"
  model: "Focus Electric"
  make: "Ford"
  year: 2019
---

Interpolation would use a default maximum of 20 substitutions per key. Any key whose variable references are nested deeper than the maximum would end with the last (e.g., 20th) key name left in place without a corresponding value. This prevents infinite loops in interpolated references. The number 20 is arbitrary and could be configurable. Similarly, any reference to a key that does not exist remains as its placeholder name, such as:

key1: value1
key2: $missing.key$

The value of key2 resolves to $missing.key$.
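The interpolation behaviour described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the yamlp implementation: the header is assumed to be parsed into nested dictionaries already, references use the `$dotted.path$` form, and the limit counts substitution passes per value, which is a simplification of "substitutions per key". The names `interpolate`, `lookup`, and `MAX_PASSES` are made up for the example.

```python
import re

REFERENCE = re.compile(r"\$([A-Za-z]\w*(?:\.[A-Za-z]\w*)*)\$")
MAX_PASSES = 20  # the arbitrary limit mentioned above


def lookup(tree, path):
    """Follow a dotted path through nested dicts; None means "not found"."""
    node = tree
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return None if isinstance(node, dict) else node


def interpolate(tree, limit=MAX_PASSES):
    """Return a copy of `tree` with $a.b$ references expanded in string values."""

    def expand(match):
        value = lookup(tree, match.group(1))
        return match.group(0) if value is None else str(value)

    def resolve(node):
        if isinstance(node, dict):
            return {key: resolve(value) for key, value in node.items()}
        if not isinstance(node, str):
            return node  # numbers and other scalars pass through unchanged
        for _ in range(limit):  # bounded, so circular references terminate
            expanded = REFERENCE.sub(expand, node)
            if expanded == node:
                break
            node = expanded
        return node

    return resolve(tree)


header = {
    "manufacturer": {"ford": {"name": "Ford"}},
    "ev": {
        "full": "$ev.year$ $ev.make$ $ev.model$",
        "model": "Focus Electric",
        "make": "$manufacturer.ford.name$",
        "year": 2019,
    },
}
print(interpolate(header)["ev"]["full"])  # prints: 2019 Ford Focus Electric
```

A circular pair such as `a: $b$` and `b: $a$` stops after the pass limit and leaves a placeholder in place, and a reference to an undefined key is returned unchanged.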

Processing the YAML header before pandoc parses the entire document avoids having to escape the dollar symbols (i.e., \$) or use a specific symbol set (e.g., [% and ]). My understanding is that the tex_math_dollars extension of the Markdown reader (for example, pandoc -f markdown-tex_math_dollars to disable it) lets the user control whether $ symbols are interpreted as inline math expressions.

See also: https://dave.autonoma.ca/blog/2019/07/06/typesetting-markdown-part-5/

Being able to configure the variable path separator token (.) with a user-specified value would offer greater flexibility. This would allow users to supply XPath-like references and other unconstrained possibilities, such as:

  • [%author/name/last]
  • ${author.name.last}
  • $author>name>last$
  • {{author🠖name🠖last}}
  • `r v$author$name$last`
    • Uses `r v$ to start, ` at end, and $ to separate (a contrived example based on R Markdown variables).
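User-supplied delimiters and separators like those listed above could be handled by building the matching pattern at run time. The Python sketch below shows one way, assuming path segments are plain identifiers; `make_resolver` and the sample `variables` dictionary are hypothetical names used only for illustration.

```python
import re


def make_resolver(start, stop, separator):
    """Build a substitution function for the given start/stop tokens and separator."""
    segment = r"[A-Za-z_]\w*"
    pattern = re.compile(
        re.escape(start)
        + "(" + segment + "(?:" + re.escape(separator) + segment + ")*)"
        + re.escape(stop)
    )

    def resolve(text, variables):
        def replace(match):
            node = variables
            for key in match.group(1).split(separator):
                if not isinstance(node, dict) or key not in node:
                    return match.group(0)  # unknown reference: keep as written
                node = node[key]
            return match.group(0) if isinstance(node, dict) else str(node)

        return pattern.sub(replace, text)

    return resolve


# Sample data, made up for the example.
variables = {"author": {"name": {"last": "Doe"}}}

print(make_resolver("[%", "]", "/")("By [%author/name/last]", variables))
print(make_resolver("${", "}", ".")("By ${author.name.last}", variables))
print(make_resolver("$", "$", ">")("By $author>name>last$", variables))
```

Each call prints "By Doe". Escaping the tokens with `re.escape` is what allows characters such as `$`, `[`, or `{` to be used as delimiters without being read as regular expression syntax.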
@alerque

commented Sep 12, 2019

Thanks for the great write-up @DaveJarvis! That technique served me pretty well for several projects.

I've since landed on one that didn't go very well, but I realized what I was trying to do was fundamentally different. I wasn't iterating over data so much as localizing content based on context. Hence I ended up with a frustrating mess of YAML 'data' that didn't quite make sense and it was unclear how to generate Markdown that had what I wanted.

In the end I realized that i18n tools were closer to what I needed, and I started pre-processing my content files with Handlebars. By default that was not much different than the YAML data substitution approach using Pandoc templates discussed above, but it allowed me to write a helper application to do something different than just substituting data from a table. What I ended up with was handlebars-helper-fluent, which wraps the Project Fluent i18n tools (specifically the JS toolkit) into a Handlebars helper. Now I can use both YAML data and FTL message files to provide content to inform my template. Once hbs-cli fills in all the blanks for me, using either its own substitutions for the string data or Fluent for localization (or data transformations that are functionally similar to translation), the content gets passed to Pandoc.

Hopefully somebody else finds that helpful.
