Raw vs parsed metadata fields #2139

Closed
dato opened this Issue May 7, 2015 · 29 comments

7 participants
@dato

dato commented May 7, 2015

Hello!

I have the following use case for header-includes:


---
title: Pandoc bug?
header-includes: \pgfpagesuselayout{2 on 1}[a4paper]

---

Test.

When generating a PDF, the string [a4paper] gets converted to {[}a4paper{]}, which LaTeX no longer reads as the optional argument of \pgfpagesuselayout:

% pandoc -s -w latex <foo.md | grep uselayout
\pgfpagesuselayout{2 on 1}{[}a4paper{]}

Am I doing something wrong?

I also tried the following variations, to no avail:


---
title: Pandoc bug?
header-includes: |
  \usepackage{pgfpages}
  \pgfpagesuselayout{2 on 1}[a4paper]

---

Test.

And:


---
title: Pandoc bug?
header-includes:
  - \usepackage{pgfpages}
  - \pgfpagesuselayout{2 on 1}[a4paper]

---

Test.

Many thanks in advance! My pandoc version is 1.13.2.

@dato

dato commented May 7, 2015

(Note that this doesn't happen when using --include-in-header from the command line.)

@jgm

Owner

jgm commented May 7, 2015

Here's a clue:

% pandoc -t native   # [edit: fixed command line]
 \pgfpagesuselayout{2 on 1}[a4paper]
[Para [RawInline (Format "latex") "\\pgfpagesuselayout{2 on 1}",Str "[a4paper]"]]

The basic problem is that pandoc has no way of knowing that the command \pgfpagesuselayout consumes an optional argument after its first argument. It guesses, incorrectly, that [a4paper] is regular text, not part of a LaTeX command. And then, when the regular text brackets are written back out to LaTeX, they're escaped.
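To make the failure mode concrete, here is a minimal Python sketch of the guess described above. It assumes a deliberately naive recognizer in which a LaTeX command is `\name` followed only by `{...}` groups, plus an escaping table limited to the characters seen in this thread. `split_raw_command`, `escape_str` and `round_trip` are hypothetical names; this is not pandoc's actual reader or writer.

```python
import re

# Naive recognizer (hypothetical): a control word followed by zero or more
# {...} groups without nesting. An optional [...] argument is NOT consumed,
# mirroring the incorrect guess described above.
_COMMAND = re.compile(r"\\[A-Za-z]+(?:\{[^{}]*\})*")

# Escapes applied to ordinary text (Str) on LaTeX output; only the characters
# that appear in this thread are covered.
_ESCAPES = {"[": "{[}", "]": "{]}", "{": "\\{", "}": "\\}"}


def split_raw_command(text):
    """Return (raw_latex, leftover_text) for text that starts with a command."""
    m = _COMMAND.match(text)
    if m is None:
        return "", text
    return m.group(0), text[m.end():]


def escape_str(text):
    """Escape ordinary text the way the LaTeX output above shows."""
    return "".join(_ESCAPES.get(ch, ch) for ch in text)


def round_trip(line):
    """Parse with the naive guess, then write back: raw part verbatim, rest escaped."""
    raw, rest = split_raw_command(line)
    return raw + escape_str(rest)


print(round_trip(r"\pgfpagesuselayout{2 on 1}[a4paper]"))
# prints \pgfpagesuselayout{2 on 1}{[}a4paper{]}
```

The leftover `[a4paper]` plays the role of the `Str "[a4paper]"` in the native output, and escaping it reproduces the `{[}a4paper{]}` from the original report.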

@jgm

Owner

jgm commented May 7, 2015

When you use --include-in-header from the command line, the line is included verbatim, not parsed as Markdown. Metadata fields are parsed as Markdown.

@lierdakil

Contributor

lierdakil commented May 7, 2015

Makes sense. As I see it, there are two ways to handle this: either modify the Markdown parser to understand this type of LaTeX command, or invent some syntax to allow inclusion of verbatim strings in metadata. The former would be better in general, but may be a bit fragile, I think. The latter sounds like a hack, but may be useful in some cases. Are there any reservations about either of these options?


@jgm

Owner

jgm commented May 7, 2015

I think header-includes really only makes sense as verbatim, so one option would be to special-case that metadata field, though that seems a bit unprincipled.

@dato

dato commented May 9, 2015

Wow, I hadn’t realized that the metadata block, including header-includes, was parsed as Markdown.

From where I stand, I agree that header-includes only makes sense as verbatim; but I don't know if there'd be other bizarre use cases out there.

All in all, I'd appreciate it if I could use this, whichever way, without having to resort to an external header file.

Many thanks in advance!

[edited: s/include-in-header/header-includes/]

@lierdakil

Contributor

lierdakil commented May 15, 2015

I hacked together a proof-of-concept code for inclusion of "raw strings" in metadata. Current work can be found on lierdakil/pandoc:rawstring-metadata branch (diff)

This hacks on YAML object syntax.

Example:

pandoc -t native -s 
---
header-includes:
  rawstring: \pgfpagesuselayout{2 on 1}[a4paper]
...

Test
^D
Pandoc (Meta {unMeta = fromList [("header-includes",MetaString "\\pgfpagesuselayout{2 on 1}[a4paper]")]})
[Para [Str "Test"]]

I'd like some comments on the viability of this idea.
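A minimal sketch of the `rawstring` idea, assuming the YAML has already been parsed into Python dicts and lists and using pandoc's JSON encoding of metadata values (`{"t": ..., "c": ...}`). `to_meta_value` and `parse_markdown_stub` are hypothetical; the stub stands in for the real Markdown parser, which would produce a full inline list rather than a single `Str`.

```python
def parse_markdown_stub(text):
    """Placeholder for Markdown parsing of an ordinary metadata string."""
    return {"t": "MetaInlines", "c": [{"t": "Str", "c": text}]}


def to_meta_value(value):
    """Convert parsed YAML, treating a lone {"rawstring": ...} mapping as verbatim."""
    if isinstance(value, dict):
        if set(value) == {"rawstring"} and isinstance(value["rawstring"], str):
            return {"t": "MetaString", "c": value["rawstring"]}
        return {"t": "MetaMap", "c": {k: to_meta_value(v) for k, v in value.items()}}
    if isinstance(value, list):
        return {"t": "MetaList", "c": [to_meta_value(v) for v in value]}
    if isinstance(value, bool):
        return {"t": "MetaBool", "c": value}
    return parse_markdown_stub(str(value))


meta = {"header-includes": {"rawstring": r"\pgfpagesuselayout{2 on 1}[a4paper]"}}
converted = {name: to_meta_value(v) for name, v in meta.items()}
```

Here `converted["header-includes"]` is a `MetaString` holding the command untouched, matching the native output of the proof of concept.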

Another option is to add a new flag to ReaderOptions (e.g. optRawMetaStrings :: Bool), and implicitly set it for a select few metadata fields. Maybe we could also add an explicit CLI argument (e.g. --raw-metadata-strings) to set this option for all metadata, but I'm not convinced that would be useful.
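The flag-based alternative could look roughly like this. `ReaderOptions`, `raw_meta_strings` and `raw_fields` are hypothetical Python stand-ins for the proposed `optRawMetaStrings` option, with `header-includes` assumed to be the one implicitly raw field; other fields go through a placeholder instead of a real Markdown parser.

```python
from dataclasses import dataclass, field


@dataclass
class ReaderOptions:
    """Hypothetical stand-in for the proposed optRawMetaStrings option."""
    raw_meta_strings: bool = False  # would be set by the proposed --raw-metadata-strings
    raw_fields: frozenset = field(
        default_factory=lambda: frozenset({"header-includes"}))  # implicitly raw


def read_field(name, text, opts):
    """Keep the value verbatim if the global flag or the field name says so."""
    if opts.raw_meta_strings or name in opts.raw_fields:
        return {"t": "MetaString", "c": text}
    # Placeholder for Markdown parsing of every other field.
    return {"t": "MetaInlines", "c": [{"t": "Str", "c": text}]}
```

The design trade-off is visible in the two branches: the per-field set covers the `header-includes` case without any new syntax, while the global switch turns off Markdown for every field at once.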

@jgm

Owner

jgm commented May 15, 2015

+++ Nikolay Yakimov [May 14 15 20:44 ]:

> I'd like some comments on the viability of this idea.

I think that ultimately we'll want this to be customizable
per output format. Your proposal gives you a document that
can only be rendered in one format, which is sort of against
the spirit of pandoc.

Your proposal does have the advantage of not needing
changes to pandoc-types (e.g. addition of a new metadata type that
defines raw string output for each format). But I'm not sure
we should go for a half-solution.

I suppose we could implement something now that could be naturally
extended in the direction I suggested. For example:

header-includes:
  (*):  raw stuff

((*) means: raw content in any output format. Yes, this actually
works as a field name!) Then in the future we could extend:

header-includes:
  (latex):  '\raw{latex}'
  (html):   '<raw>html</raw>'
  (*):      'raw fallback'

> Another option is to add a new flag to ReaderOptions (e.g. optRawMetaStrings :: Bool), and implicitly set it for a select few metadata fields. Maybe we could also add an explicit CLI argument (e.g. --raw-metadata-strings) to set this option for all metadata, but I'm not convinced that would be useful.

This seems less useful to me.

@lierdakil

Contributor

lierdakil commented May 15, 2015

The more I think about your idea, the more I like it. And the parenthesis syntax looks reasonable enough; I definitely prefer it to the initially proposed leading-underscore syntax.

Now as for implementation, I think it's entirely possible to do without changes to pandoc-types, actually, since we already have MetaMap.

It should be possible (and not that horrible) to hack the templating engine so that, if a variable resolves to an object, it tries to select from it based on the output format name, then falls back to the wildcard, and if that also fails, defaults to "true" (as it does now, if I'm not mistaken). Adding a new constructor to MetaValue wouldn't add anything to this general algorithm, since it's converted to JSON anyway.
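A sketch of the lookup order just described (format name, then wildcard, then the current object behaviour), assuming jgm's parenthesized keys such as `(latex)` and `(*)`. `resolve_variable` is a hypothetical helper, not part of pandoc's template engine, and the `"true"` fallback mirrors the hedged description of how objects render today.

```python
def resolve_variable(value, writer_format):
    """Hypothetical template lookup for format-dependent raw strings."""
    if isinstance(value, dict):
        for key in ("(" + writer_format + ")", "(*)"):
            if key in value:
                return value[key]
        return "true"  # an object with no matching key renders as before
    return value


header = {"(latex)": r"\raw{latex}", "(html)": "<raw>html</raw>", "(*)": "raw fallback"}
```

With this table, `resolve_variable(header, "latex")` yields the LaTeX string, while a format with no entry of its own, such as `docx`, gets `"raw fallback"`.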

It's not entirely clear if we should do something similar when passing metadata to filters, or just pass these "format-dependent raw string" objects in their entirety.

Thoughts?

lierdakil added a commit to lierdakil/pandoc that referenced this issue May 15, 2015

@lierdakil

Contributor

lierdakil commented May 15, 2015

So, I've played around with this idea a little. See ec17fff for a proof-of-concept implementation of the above proposal. The code needs some polish, never mind docs and tests, but the general concept should be possible to grasp.

I struggle a bit with HTML-based output, though: only the HTML writer actually uses the template engine, so we'll have to pass the writer type to writeHtml if we want to parametrize for other HTML-based formats, like EPUB. (It's possible to keep the API relatively intact by introducing a new function, of course, but I'd like some feedback before getting on with it.)

Also, I'm not sure about my implementation of WriterType. On one hand, it should be possible to delegate these declarations to the writers (using classes and probably Data.Typeable); on the other hand, that makes this either considerably more verbose (and less type-safe) or considerably less flexible. At the moment, for example, it's relatively simple to use multiple prioritized keys for a single writer (one case where this could be useful is, again, HTML-based output), and I'm not sure how to implement a similar concept with classes.
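One possible shape for "multiple prioritized keys for a single writer", sketched in Python rather than as Haskell classes. The `WRITER_KEYS` table and its entries (for example EPUB falling back to `html`) are illustrative assumptions, not pandoc's actual writer names or behaviour; `select_raw` is hypothetical.

```python
# Hypothetical table: the keys each writer answers to, most specific first.
WRITER_KEYS = {
    "latex": ["latex"],
    "html5": ["html5", "html"],
    "epub": ["epub", "html"],
}


def select_raw(value, writer):
    """Return the entry for the writer's first matching (key), else (*), else None."""
    for key in WRITER_KEYS.get(writer, [writer]) + ["*"]:
        paren = "(" + key + ")"
        if paren in value:
            return value[paren]
    return None
```

A plain data table keeps the priority order explicit and easy to extend, which is the flexibility the comment worries about losing with a class-based design.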

@jgm jgm changed the title from Unexpected braces inserted in LaTeX output header to Raw vs parsed metadata fields Oct 14, 2015

@jgm jgm added the enhancement label Oct 21, 2015

@bpj

bpj commented Feb 4, 2016

This can be worked around with a simple filter which overloads flagged code/codeblock elements.
https://gist.github.com/bpj/e6e53cbe679d3ec77e25#file-pandoc-code2raw-py
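The gist is not reproduced here; the following is an independent sketch of the same idea over pandoc's JSON AST. The flagging convention (a class such as `raw-latex` on inline or block code) and the names `code2raw` and `raw_format` are hypothetical. Flagged `Code` becomes `RawInline` and flagged `CodeBlock` becomes `RawBlock`, with the text kept verbatim.

```python
PREFIX = "raw-"  # hypothetical flag: a class such as "raw-latex" marks raw content


def raw_format(attr):
    """Return the target format if the element's classes carry the flag."""
    _ident, classes, _keyvals = attr
    for cls in classes:
        if cls.startswith(PREFIX) and len(cls) > len(PREFIX):
            return cls[len(PREFIX):]
    return None


def code2raw(node):
    """Recursively rewrite flagged Code/CodeBlock elements in pandoc's JSON AST."""
    if isinstance(node, list):
        return [code2raw(n) for n in node]
    if not isinstance(node, dict):
        return node
    if node.get("t") in ("Code", "CodeBlock"):
        attr, text = node["c"]
        fmt = raw_format(attr)
        if fmt is not None:
            kind = "RawInline" if node["t"] == "Code" else "RawBlock"
            return {"t": kind, "c": [fmt, text]}
    return {k: code2raw(v) for k, v in node.items()}
```

The walk is generic, so it also reaches code inside metadata values and does not depend on the exact top-level layout of the JSON document. As a filter, the script would read the AST with `json.load(sys.stdin)` and write `code2raw(...)` of it with `json.dump(..., sys.stdout)`.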

@yihui

Contributor

yihui commented Sep 22, 2016

I have been bitten by this issue a couple of times. Today I tried to set mainfontoptions in the YAML metadata like this:

---
mainfont: Alegreya
mainfontoptions: "UprightFeatures={SmallCapsFont=AlegreyaSC-Regular}"
---

It didn't work because Pandoc escaped the curly braces:

\setmainfont[UprightFeatures=\{SmallCapsFont=AlegreyaSC-Regular\}]{Alegreya}

I know this has been reported in #2565 -- just one more vote, hoping raw YAML fields could be possible some day.
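A minimal sketch reproducing the reported line, assuming a simplified template line `\setmainfont[...]{...}` (pandoc's real LaTeX template is more involved) and escaping only `{` and `}`. `render_setmainfont` and its `raw` switch are hypothetical; `raw=True` shows what a verbatim metadata field would produce instead.

```python
def escape_latex_text(text):
    """Escape only the braces, the characters that matter in this report."""
    return text.replace("{", "\\{").replace("}", "\\}")


def render_setmainfont(mainfont, options, raw=False):
    """Simplified stand-in for the template's setmainfont line."""
    font = mainfont if raw else escape_latex_text(mainfont)
    opts = options if raw else escape_latex_text(options)
    return "\\setmainfont[" + opts + "]{" + font + "}"


broken = render_setmainfont("Alegreya", "UprightFeatures={SmallCapsFont=AlegreyaSC-Regular}")
wanted = render_setmainfont("Alegreya", "UprightFeatures={SmallCapsFont=AlegreyaSC-Regular}", raw=True)
```

`broken` matches the output quoted above, while `wanted` keeps the braces that fontspec needs.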

@nichtich

Contributor

nichtich commented Sep 23, 2016

Why not make use of YAML tags? At present, these tags are ignored, so

---
foo: !!whatever "*bar*"
...

is parsed equivalent to

---
foo: "*bar*"
...

as

Pandoc (Meta {unMeta = fromList [("foo",MetaInlines [Emph [Str "bar"]])]})

Metadata values passed on the command line are not parsed as Markdown but end up as MetaString (or MetaBool). The former should also be settable in YAML like this:

---
foo: !!MetaString "*bar*"
...

as

Pandoc (Meta {unMeta = fromList [("foo",MetaString "*bar*")]})

This does not solve the full issue, but at least it gives a workaround. Last but not least, disabling Markdown parsing for selected YAML fields makes sense in other use cases too.

P.S.: It would be consistent to also support MetaBool, though it is not really needed.

---
foo: !!MetaBool "true"
...

should be parsed like

---
foo: true
...
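Assuming a YAML library that reports each scalar's tag, the dispatch itself is small. `tagged_scalar_to_meta` is a hypothetical Python sketch; untagged and unknown-tag values fall through to a placeholder for Markdown parsing, which in real pandoc would turn `*bar*` into `Emph`.

```python
def tagged_scalar_to_meta(tag, text):
    """Map a YAML tag on a scalar to a pandoc MetaValue (hypothetical dispatch)."""
    if tag == "!!MetaString":
        return {"t": "MetaString", "c": text}  # kept verbatim, no Markdown
    if tag == "!!MetaBool":
        if text not in ("true", "false"):
            raise ValueError("not a boolean: " + repr(text))
        return {"t": "MetaBool", "c": text == "true"}
    # Absent or unknown tags (e.g. !!whatever) are ignored, as today, and the
    # value is parsed as Markdown; a single Str stands in for that here.
    return {"t": "MetaInlines", "c": [{"t": "Str", "c": text}]}
```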
@jgm

Owner

jgm commented Sep 23, 2016

That's a really nice idea.

@bpj

bpj commented Sep 23, 2016

Just two observations:

  1. It may be better to use local tags !MetaMap etc. so that the YAML has
    a chance of parsing 'out of context' without having a formal tag definition
    included.
  2. There should also be a tag !Data or the like to indicate that the
    whole tree below it should be made available to filters as a plain JSON
    structure without any interpretation as Markdown. That would make filter
    configuration much easier.
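For the `!Data` suggestion, a sketch of converting an already-parsed subtree with no Markdown interpretation at all: mappings, lists and booleans keep their shape, and every other scalar becomes a `MetaString`. `data_to_meta` is hypothetical, and how the tag itself would be detected is left out.

```python
def data_to_meta(value):
    """Convert a YAML subtree tagged !Data without any Markdown parsing (sketch)."""
    if isinstance(value, dict):
        return {"t": "MetaMap", "c": {str(k): data_to_meta(v) for k, v in value.items()}}
    if isinstance(value, list):
        return {"t": "MetaList", "c": [data_to_meta(v) for v in value]}
    if isinstance(value, bool):
        return {"t": "MetaBool", "c": value}
    return {"t": "MetaString", "c": str(value)}
```

A filter reading this subtree would see `*a*` as the literal string `*a*`, not as emphasis, which is what makes it usable as plain configuration.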

@lierdakil

Contributor

lierdakil commented Sep 24, 2016

A couple of observations of my own:

  1. Data.Yaml doesn't support YAML tags unless you use the experimental and
    unstable (as of yaml-0.8.18.7) Data.Yaml.Parser, and even ignoring that,
    it's pretty awkward to use.
  2. Text.Libyaml does support YAML tags, but it's a low-level streaming
    parser based on conduit, so it's kinda complicated.

So, at the moment, there seems to be no concise way of getting at YAML tags
from Haskell. While the idea of using YAML tags is good on paper, the
implementation... uh... won't be pretty, that's for sure.

@jgm

This comment has been minimized.

Show comment
Hide comment
@jgm

jgm Sep 28, 2016

Owner

OK, I've lost enthusiasm for the Yaml tags idea.
I still like the idea I expressed above, of using objects with parenthesized fields for raw content.
Thus, for example,

    header-includes:
      - (latex):  '\raw{latex}'
        (html):   '<raw>html</raw>'
        (*):      'raw fallback'

or

mainfontoptions:
  (latex): "UprightFeatures={SmallCapsFont=AlegreyaSC-Regular}"

This would require the following changes, I think:

  • The code in the Markdown reader for parsing YAML metadata would parse as a raw MetaString any field name surrounded by parentheses.
  • The template expansion code would, when rendering an object, check to see if it has a field named (FORMAT), where FORMAT is the current output format. If so, render that content. If not, check for (*), and render it if found. Otherwise treat like other objects.

This could be done without touching pandoc-types, I believe. More ambitiously we could change MetaString to a map from formats to raw strings.
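
The lookup order in the second bullet can be sketched in Python. This is a minimal illustration under stated assumptions, not pandoc's template engine: a metadata object is modelled as a plain dict, `(*)` is taken as the fallback key, and `render_field` is a hypothetical helper name.

```python
def render_field(value, output_format):
    """Choose raw content for output_format from an object whose keys may be
    parenthesized format names, e.g. {"(latex)": ..., "(html)": ..., "(*)": ...}.

    Returns None when neither "(FORMAT)" nor "(*)" is present, meaning the
    caller should render the value like any other object.
    """
    if not isinstance(value, dict):
        return None
    exact = f"({output_format})"
    if exact in value:       # 1. a field named (FORMAT) for the current format
        return value[exact]
    if "(*)" in value:       # 2. the (*) fallback
        return value["(*)"]
    return None              # 3. no match: treat like other objects


header_include = {
    "(latex)": r"\raw{latex}",
    "(html)": "<raw>html</raw>",
    "(*)": "raw fallback",
}
print(render_field(header_include, "latex"))  # \raw{latex}
print(render_field(header_include, "docx"))   # raw fallback
```

Returning `None` for an object such as `{"(latex)": ...}` under HTML output leaves the caller to render it like any other object, as the proposal describes.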

@jgm

jgm Sep 28, 2016

Owner

(_) or () would be alternatives to (*) for the fallback.

@jgm

jgm Oct 3, 2016

Owner

Another option would be to modify the Markdown reader so it parses

    mainfontoptions:
      (latex): "UprightFeatures={SmallCapsFont=AlegreyaSC-Regular}"

as mainfontoptions with a RawBlock (Format "latex") "the string".
Then no modifications to the template engine would be needed.

@jgm

jgm Oct 3, 2016

Owner

To put yet another option on the table (again involving the Markdown reader):

    mainfontoptions[latex]: "UprightFeatures={SmallCapsFont=AlegreyaSC-Regular}"

This would just require checking the key; if it ends with [format], then we parse it as RawInline (Format format) (or perhaps RawBlock? It might not matter in this context.)

This would do the right thing in most cases, e.g. the raw latex would be omitted automatically in HTML output. And it would be easy to implement and easy to remember.
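
A minimal Python sketch of the key check, assuming format names consist of letters, digits, `_` or `-`. The tuple `("RawInline", fmt, value)` merely stands in for pandoc's `RawInline (Format fmt) value`, and `split_key`/`parse_field` are hypothetical names, not pandoc functions.

```python
import re

# Assumed shape of a format-suffixed key: "name[format]".
FORMAT_SUFFIX = re.compile(r"^(?P<key>.+)\[(?P<fmt>[A-Za-z0-9_-]+)\]$")


def split_key(key):
    """Return (base_key, format) for "name[format]", or (key, None) otherwise."""
    m = FORMAT_SUFFIX.match(key)
    if m is None:
        return key, None
    return m.group("key"), m.group("fmt")


def parse_field(key, value):
    """Turn one YAML key/value pair into (base_key, node).

    A format-suffixed key yields a raw node tagged with its format; any other
    key yields text that the real reader would go on to parse as Markdown.
    """
    base, fmt = split_key(key)
    if fmt is None:
        return base, ("Markdown", value)
    return base, ("RawInline", fmt, value)


print(parse_field("mainfontoptions[latex]",
                  "UprightFeatures={SmallCapsFont=AlegreyaSC-Regular}"))
# ('mainfontoptions', ('RawInline', 'latex', 'UprightFeatures={SmallCapsFont=AlegreyaSC-Regular}'))
```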

@nichtich

nichtich Oct 3, 2016

Contributor

I prefer the latter format with no additional nesting, but I can't say why ;-) mainfontoptions@latex may be another alternative. What would the metadata be for the following document, for different output formats?

    foo[latex]: a
    foo[html]: b
    foo[json]: c
    foo[native]: d
    foo[notexist]: e
    foo[]: f
    foo: g

I'd guess that it's

  • fromList [("foo", RawInline (Format "latex") "a")] for latex
  • fromList [("foo", RawInline (Format "html") "b")] for html
  • fromList [("foo",MetaInlines [Str "g"])] for all other formats, including json and native.

In any case, the AST only contains a single metadata field foo, right?

@jgm

jgm Oct 3, 2016

Owner

> I'd guess that it's
> * fromList [("foo", RawInline (Format "latex") "a")] for latex
> * fromList [("foo", RawInline (Format "html") "b")] for html
> * fromList [("foo",MetaInlines [Str "g"])] for all other formats,
>   including json and native.

No, the parser doesn't know what the output format is going
to be, so it would have to be something like

    fromList [("foo", [MetaInlines [RawInline (Format "latex") "a",
                                    RawInline (Format "html") "b"],
                        MetaString "f"])]

There are obviously some complexities here to consider...
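
The point that the reader cannot know the output format can be illustrated with a small Python model: every tagged variant of `foo` is kept under one field, and the choice happens later, per writer. This is an assumption-laden sketch, not pandoc's data model; `group` and `select` are hypothetical, and `foo[]` and plain `foo` are left out because the thread does not settle how they combine with the tagged variants.

```python
from collections import defaultdict

# The tagged keys from the example above, already split into
# (base key, format, value). "foo[]" and plain "foo" are left out on purpose.
ENTRIES = [
    ("foo", "latex", "a"),
    ("foo", "html", "b"),
    ("foo", "json", "c"),
    ("foo", "native", "d"),
    ("foo", "notexist", "e"),
]


def group(entries):
    """Reader side: the output format is not known yet, so every variant is
    kept under one metadata field, in document order."""
    meta = defaultdict(list)
    for base, fmt, value in entries:
        meta[base].append((fmt, value))
    return dict(meta)


def select(variants, output_format):
    """Text-writer side: keep only the raw pieces tagged with this format."""
    return [value for fmt, value in variants if fmt == output_format]


meta = group(ENTRIES)
print(list(meta))                    # ['foo']
print(select(meta["foo"], "latex"))  # ['a']
print(select(meta["foo"], "html"))   # ['b']
print(select(meta["foo"], "docx"))   # []
```

The selection step models text writers such as LaTeX and HTML; pandoc's JSON writer, by contrast, serializes the whole AST, so it would keep every variant.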

@bpj

bpj Oct 3, 2016

@jgm wrote:

> (_) or () would be alternatives to (*) for the fallback.

Are you aware that the asterisk is a metacharacter in YAML? All keys containing it would have to be quoted. The underscore doesn't have that issue.

@jgm

jgm Jun 22, 2017

Owner

I think that with 2b34337 we now have a decent solution to this problem.

    ---
    title: Pandoc bug?
    header-includes: `\pgfpagesuselayout{2 on 1}[a4paper]`{=latex}
    ---

Or, with different raw content for different output formats:

    header-includes:
      - `\raw{latex}`{=latex}
        `<raw>html</raw>`{=html}

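
Why the raw attribute sidesteps the original problem can be shown with a simplified Python model of the LaTeX writer, not pandoc's actual code: ordinary `Str` text is escaped (the report at the top of this issue shows `[a4paper]` coming out as `{[}a4paper{]}`), a raw inline tagged `latex` is copied through unchanged, and raw content for any other format is dropped. The escape table is an assumption limited to the characters discussed in this thread; a real LaTeX writer escapes more.

```python
# Only the characters discussed in this thread; a real LaTeX writer
# escapes many more (e.g. #, $, %, &, {, }).
LATEX_ESCAPES = {"[": "{[}", "]": "{]}", "_": r"\_"}


def escape_latex(text):
    return "".join(LATEX_ESCAPES.get(ch, ch) for ch in text)


def write_latex(inlines):
    """inlines: ("Str", text) or ("RawInline", format, text) tuples."""
    out = []
    for node in inlines:
        if node[0] == "Str":
            out.append(escape_latex(node[1]))  # ordinary text: escaped
        elif node[0] == "RawInline" and node[1] == "latex":
            out.append(node[2])                # raw LaTeX: passed through
        # raw content for any other format is dropped
    return "".join(out)


# How pandoc read the original header-includes line (see the native output above):
guessed = [("RawInline", "latex", r"\pgfpagesuselayout{2 on 1}"),
           ("Str", "[a4paper]")]
# With `...`{=latex} the whole command is one raw inline; an HTML-only
# piece, as in the second example, is simply omitted from LaTeX output.
explicit = [("RawInline", "latex", r"\pgfpagesuselayout{2 on 1}[a4paper]"),
            ("RawInline", "html", "<raw>html</raw>")]

print(write_latex(guessed))   # \pgfpagesuselayout{2 on 1}{[}a4paper{]}
print(write_latex(explicit))  # \pgfpagesuselayout{2 on 1}[a4paper]
```
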
@jgm

jgm Jun 23, 2017

Owner

Please comment here if you think that we should still support a syntax like

    header-includes:
      - (latex):  '\raw{latex}'
        (html):   '<raw>html</raw>'
        (*):      'raw fallback'

now that similar things can be achieved with the raw_attribute.

@yihui

yihui Jun 23, 2017

Contributor

I think at least my problem will be solved by the new raw attribute. Thanks!

@jgm

jgm Jun 23, 2017

Owner

I'm going to close this unless anyone really wants to bring it back.

@jgm jgm closed this Jun 23, 2017

@pauljohn32

pauljohn32 Sep 14, 2017

How do I use this? My short-run goal is to make this work in the YAML header:

    logoleft: "/home/pauljohn/R/x86_64-pc-linux-gnu-library/3.4/crmda/theme/jayhawk.pdf"

With pandoc 1.19, I get "\_" in the tex file.

My long-run goal is to replace the explicit path with the result of an R chunk that retrieves a file path. That does retrieve the path, and I can use it when exporting to HTML, but not to PDF, because "\_" is inserted:

    logoleft: "`r system.file('theme/jayhawk.pdf', package = 'crmda')`"

That works if the package is installed in /usr/local/share, for example. However, if the R package is in a user directory, whose path contains "_", it's a total fail.

@jgm

jgm Sep 15, 2017

Owner

    logoleft: "`r system.file('theme/jayhawk.pdf', package = 'crmda')`{=latex}"

But this only works with the dev version of pandoc, not with 1.19.
