Enabling mmd_title_block causes YAML title blocks to be mis-parsed #2026

Closed
zackw opened this issue Mar 25, 2015 · 13 comments

@zackw

zackw commented Mar 25, 2015

Consider


---
booktitle_url: http://www.sigsac.org/ccs/CCS2013/
...

This is unambiguously a YAML header, and the manual says YAML headers take precedence over MMD headers. But watch what happens:

$ pandoc -f markdown -t json < test.md
[{"unMeta":{"booktitle_url":{"t":"MetaInlines","c":[{"t":"Str","c":"http://www.sigsac.org/ccs/CCS2013/"}]}}},[]]

$ pandoc -f markdown+mmd_title_block -t json < test.md
[{"unMeta":{"booktitle_url":{"t":"MetaBlocks","c":[]}}},[]]

Not all values are affected. URLs consistently seem to get eaten, and strings containing no punctuation consistently seem to survive, but don't quote me on that.

I have pandoc 1.12.4.2, in case that matters.

zackw added a commit to zackw/pandoc_reader that referenced this issue Mar 26, 2015
This involves a fairly complicated dance with a Pandoc "filter"
module in order to get all of the metadata to be visible in the
output, but means that all metadata formats supported by Pandoc
are available without the need for any additional Python modules.
It also means strings in metadata will be processed as Markdown.

NOTE: Thanks to jgm/pandoc#2026 and
backward compatibility constraints, this change defaults to
enabling 'mmd_title_block' and *disabling* 'pandoc_title_block' and
'yaml_metadata_block'.  Moreover, putting either +pandoc_title_block or
+yaml_metadata_block in PANDOC_EXTENSIONS will cause mmd_title_block to
be disabled.
@jgm
Owner

jgm commented Mar 27, 2015

I can reproduce this with latest pandoc:

% pandoc -f markdown+mmd_title_block -s -t native
---
booktitle_url: http://www.sigsac.org/ccs/CCS2013/
...
^D
Pandoc (Meta {unMeta = fromList [("booktitle_url",MetaBlocks [])]})
[]

No idea why offhand, but I will have to look into it.

@lierdakil
Contributor

Well, I think I know why that happens.

echo -e "http://google.com" | pandoc -f markdown+mmd_title_block -t json

produces

[{"unMeta":{"http":{"t":"MetaBlocks","c":[{"t":"Plain","c":[{"t":"Str","c":"//google.com"}]}]}}},[]]

i.e. "http" is interpreted as a metadata key, and the rest as metadata value.

Since metadata values are parsed with the same parser... yeah. No idea on how to fix this properly though. How does multimarkdown handle this, I wonder?
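The behaviour described above can be sketched in a few lines. This is only an illustration in Python, assuming the title-block parser treats everything before the first colon as the key; `split_lenient` is a hypothetical name, not a function from pandoc.

```python
def split_lenient(line):
    """Split a line at its first colon; no whitespace is required after it.

    Hypothetical helper: a simplified model of a lenient
    "key: value" rule, not pandoc's actual parser.
    """
    key, sep, value = line.partition(":")
    if not sep or not key:
        return None  # no colon, or nothing before it: not a metadata line
    return key.strip(), value.strip()


# A bare URL satisfies the lenient rule, so its scheme becomes the key.
print(split_lenient("http://google.com"))
print(split_lenient("title: My Document"))
```

Under this rule any line that starts with a URL scheme looks like a metadata field, which is consistent with the `"http"` key in the JSON output above.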

@zackw
Author

zackw commented Mar 27, 2015

Possibly related, then:

$ printf '%s\n%s\n%s\n' '---' 'title: This Title Has: A Colon In It' '...' | pandoc
pandoc: Could not parse YAML header: mapping values are not allowed in this context "source" (line 3, column 22)

If I'm reading the YAML spec correctly, This Title Has: A Colon In It should be interpreted as a string literal. But I'm not 100% sure about that.

@lierdakil
Contributor

@zackw, not sure about the YAML spec, but it's parsed by the yaml library, so that's not directly related to pandoc itself.

Multimarkdown title block is parsed by pandoc, however.

Fix options would include:

  1. Require space after colon in mmd title block
  2. Disable all metadata block extensions when parsing metadata fields

In all honesty, I think both should be implemented...
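A toy model of the two proposed fixes, written in Python purely for illustration. It assumes that a metadata value is re-parsed as a small document with the same extensions as the enclosing document, which is what the thread suggests but does not show. All names here (`parse`, `LENIENT_FIELD`, `STRICT_FIELD`, `METADATA_EXTENSIONS`) are hypothetical and do not come from pandoc's source.

```python
import re

# Extensions that can introduce a metadata block (names as used in this thread).
METADATA_EXTENSIONS = frozenset(
    {"mmd_title_block", "pandoc_title_block", "yaml_metadata_block"}
)

# Lenient rule: anything before the first colon is a key.
LENIENT_FIELD = re.compile(r"^([^:\s][^:]*):(.*)$")
# Fix 1: at least one space or tab must follow the colon.
STRICT_FIELD = re.compile(r"^([^:\s][^:]*):[ \t]+(.*)$")


def parse(text, extensions, field=LENIENT_FIELD):
    """Toy parser: return (metadata, remaining body lines)."""
    lines = text.splitlines()
    meta = {}
    if "mmd_title_block" in extensions:
        # Consume leading "key: value" lines as a title block.
        while lines:
            m = field.match(lines[0])
            if m is None:
                break
            meta[m.group(1).strip()] = m.group(2).strip()
            lines.pop(0)
    return meta, lines


exts = {"mmd_title_block"}
url = "http://www.sigsac.org/ccs/CCS2013/"

# Value re-parsed with the title block still enabled: the URL is consumed
# as a nested metadata field and no body content is left.
print(parse(url, exts))
# Fix 1: a colon with no following space no longer starts a field.
print(parse(url, exts, STRICT_FIELD))
# Fix 2: metadata extensions are switched off while parsing a value.
print(parse(url, exts - METADATA_EXTENSIONS))
```

Either fix alone keeps the URL in the body in this model. Applying both also covers values such as `note: see below`, which pass the stricter colon rule but should still not be read as a nested title block.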

@jgm
Owner

jgm commented Mar 27, 2015

+++ Nikolay Yakimov [Mar 27 15 16:18 ]:

Well, I think I know why that happens.

echo -e "http://google.com" | pandoc -f markdown+mmd_title_block -t json

produces

[{"unMeta":{"http":{"t":"MetaBlocks","c":[{"t":"Plain","c":[{"t":"Str","c":"//google.com"}]}]}}},[]]

i.e. "http" is interpreted as a metadata key, and the rest as metadata value.

Since metadata values are parsed with the same parser... yeah. No idea on how to fix this properly though. How does multimarkdown handle this, I wonder?

Multimarkdown does not parse metadata values at all - so you can't e.g. have italics in a metadata value. (Not the greatest feature.)

@jgm
Owner

jgm commented Mar 27, 2015

Yes, good point: try putting the whole URL in single quotes. Colons in YAML values normally need to be escaped.

+++ Zack Weinberg [Mar 27 15 16:24 ]:

Possibly related, then:

$ printf '%s\n%s\n%s\n' '---' 'title: This Title Has: A Colon In It' '...' | pandoc
pandoc: Could not parse YAML header: mapping values are not allowed in this context "source" (line 3, column 22)

If I'm reading the YAML spec correctly, This Title Has: A Colon In It should be interpreted as a string literal. But I'm not 100% sure about that.



lierdakil added a commit to lierdakil/pandoc that referenced this issue Mar 27, 2015
Disable all metadata block extensions when parsing metadata field
values. Issue jgm#2026
lierdakil added a commit to lierdakil/pandoc that referenced this issue Mar 27, 2015
Require space after key-value delimiter colon in mmd title block.
Issue jgm#2026
@lierdakil
Contributor

@jgm, I mean, how does mmd handle markdown that begins with URI? Does it interpret it as title block?

Anyway, I've pushed commits for my fix proposals. You can cherry-pick one or both, or I can create a PR.

@lierdakil
Contributor

Oh, and by the way, single/double quotes won't work, since those are stripped away by yaml parser.

@jgm
Owner

jgm commented Mar 28, 2015

+++ Nikolay Yakimov [Mar 27 15 17:01 ]:

@jgm, I mean, how does mmd handle markdown that begins with URI? Does it interpret it as title block?

It definitely requires a space after the colon.

@jgm
Owner

jgm commented Mar 28, 2015

+++ Nikolay Yakimov [Mar 27 15 17:01 ]:

@jgm, I mean, how does mmd handle markdown that begins with URI? Does it interpret it as title block?

Anyway, I've pushed commits for my fix proposals. You can cherry-pick one or both, or I can create a PR.

I think both fixes are needed. If you can create a PR, that would be helpful.

(Btw, I've confirmed experimentally that multimarkdown does not allow a
metadata field with no value, or a blank value. So technically we
should both (a) skip one or more spaces, and (b) use many1 instead of
many for the value.)

lierdakil mentioned this issue Mar 28, 2015
@lierdakil
Contributor

Ok, I've created #2030. I'll see about requiring a value in a minute.

lierdakil added a commit to lierdakil/pandoc that referenced this issue Mar 28, 2015
Require space after key-value delimiter colon in mmd title block.
Issue jgm#2026
Amend: parsec's `spaces` include newlines, but we don't want that. Had
to make custom `spaceNoNewline` parser here
@lierdakil
Contributor

Skipping spaces is more convoluted than I initially thought once an empty value is considered, since Parsec's space matches newlines as well, but I think I managed to make it work in #2030. Please review.
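The newline problem can be reproduced with a small Python sketch. It is not the code from #2030; `skip_spaces` and `parse_field` are hypothetical names, and Python's `str.isspace` merely stands in for a whitespace test that also accepts newlines. The sketch models the rules discussed at this point in the thread: at least one space after the colon, and a non-empty value.

```python
def skip_spaces(text, pos, allow_newline):
    """Advance past whitespace; optionally stop at line breaks."""
    while (
        pos < len(text)
        and text[pos].isspace()
        and (allow_newline or text[pos] not in "\r\n")
    ):
        pos += 1
    return pos


def parse_field(text, allow_newline=False):
    """Parse 'key:<spaces>value' at the start of text.

    Returns (key, value), or None if the line is not a valid field.
    """
    colon = text.find(":")
    if colon <= 0 or "\n" in text[:colon]:
        return None
    start = skip_spaces(text, colon + 1, allow_newline)
    if start == colon + 1:
        return None  # at least one space is required after the colon
    end = text.find("\n", start)
    value = text[start:] if end == -1 else text[start:end]
    if not value:
        return None  # the value must be non-empty
    return text[:colon], value


doc = "title:\nbody text"
# A whitespace skip that accepts newlines swallows the next line as the value.
print(parse_field(doc, allow_newline=True))
# Stopping at the newline rejects the empty field instead.
print(parse_field(doc))
# Other Unicode spaces, such as a non-breaking space, are still accepted.
print(parse_field("title:\u00a0My Doc"))
```

Keeping the full whitespace test and excluding only line breaks is what lets a non-breaking space through, which a plain "space or tab" check would reject.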

jgm added a commit that referenced this issue Mar 28, 2015
@jgm
Owner

jgm commented Mar 28, 2015

@lierdakil, I probably would have just used Text.Pandoc.Parsing.spaceChar, which matches only a space or a tab. But I did check, and multimarkdown allows a Unicode non-breaking space, so your solution seems more correct.

Closed by 86a4442

@jgm jgm closed this as completed Mar 28, 2015
zackw added a commit to zackw/pandoc_reader that referenced this issue Mar 30, 2015
jgm added a commit that referenced this issue Oct 27, 2015
We now allow blank metadata fields.  These were explicitly
disallowed before.

For background see #2026.  The issue in #2026 has since
been fixed in another way, so there is no need to forbid
blank metadata fields.