Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple new lines between entries in the frontmatter "breaks" the parser when rendering #6042

Closed
joelvonrotz opened this issue Jun 27, 2023 · 9 comments · Fixed by #6053
Closed
Labels
bug Something isn't working
Milestone

Comments

@joelvonrotz
Copy link

Bug description

I've noticed, the Frontmatter/YAML parser seems to act funny when multiple new lines are inserted between entries, or well anywhere in the frontmatter.

When only one new line between entries is used, the parser detects the pdf format automatically.

Steps to reproduce

  • Working Version: Compile the following document
---
title: "Untitled"

format:
  pdf:
    pdf-engine: pdflatex
---

hello world
  • Breaking Behaviour: Add a new line directly after --- in the frontmatter
---

title: "Untitled"

format:
  pdf:
    pdf-engine: pdflatex
---

hello world
  • Breaking Behaviour: Add double new lines anywhere in the frontmatter
---
title: "Untitled"


format:


  pdf:
    pdf-engine: pdflatex


---

hello world

Expected behavior

Compile the document correctly using the given format (in this case being pdf).

image

Actual behavior

image

Following also breaks the expected behaviour

---

title: "Untitled"

format: 
  pdf:
    pdf-engine: pdflatex
---
---
title: "Untitled"


format: 
  pdf:
    pdf-engine: pdflatex

---
---
title: "Untitled"
format:


  pdf:
    pdf-engine: pdflatex
---

Your environment

  • IDE: VSCode
  • OS: Windows 10

Quarto check output

$ quarto check

[>] Checking versions of quarto binary dependencies...
      Pandoc version 3.1.1: OK
      Dart Sass version 1.55.0: OK
[>] Checking versions of quarto dependencies......OK
[>] Checking Quarto installation......OK
      Version: 1.3.433
      Path: C:\Users\Joel\AppData\Local\Programs\Quarto\bin
      CodePage: 1252

[>] Checking basic markdown render....OK

[>] Checking Python 3 installation....OK
      Version: 3.10.5
      Path: C:/Users/Joel/AppData/Local/Programs/Python/Python310/python.exe     
      Jupyter: 5.1.1
      Kernels: python3

[>] Checking Jupyter engine render....OK

[>] Checking R installation...........OK
      Version: 4.2.2
      Path: C:/PROGRA~1/R/R-42~1.2
      LibPaths:
        - C:/Users/Joel/AppData/Local/R/win-library/4.2
        - C:/Program Files/R/R-4.2.2/library
      knitr: 1.41
      rmarkdown: 2.20

[>] Checking Knitr engine render......OK
@joelvonrotz joelvonrotz added the bug Something isn't working label Jun 27, 2023
@joelvonrotz joelvonrotz changed the title multiple new lines in a row in the frontmatter breaks it's parser when rendering Multiple new lines between entries in the frontmatter "breaks" the parser when rendering Jun 27, 2023
@cscheid
Copy link
Collaborator

cscheid commented Jun 27, 2023

Unfortunately, that's not behavior we can control, it comes directly from Pandoc. You can verify this by running your documents against Pandoc directly with quarto pandoc -f markdown -t native test.md and inspecting the output.

@cscheid cscheid added wontfix This will not be worked on upstream Bug is in upstream library labels Jun 27, 2023
@cscheid
Copy link
Collaborator

cscheid commented Jun 27, 2023

We should document this on quarto.org, though.

@cscheid cscheid added the documentation Doc improvements & quarto-web label Jun 27, 2023
@joelvonrotz
Copy link
Author

Ah okay, thank you for your explanation. I've checked out following command to see if pandoc actually parses the metadata of the file.

$ quarto pandoc --template=./metadata.pandoc-tpl -f markdown test.md

{"format":{"pdf":{"pdf-engine":"pdflatex"}},"title":"Untitled"}

with metadata.pandoc-tpl containing $meta-json$

It seems to detect the metadata without issue, regardless of the amount of newlines used, except for the newline before the title field.

I don't know how pandoc handles all this stuff, since I've never used it before. Do you just provide a list of defaults, when pandoc doesn't receive them?

@cderv
Copy link
Collaborator

cderv commented Jun 28, 2023

Unfortunately, that's not behavior we can control, it comes directly from Pandoc.

@cscheid I believe the only format that Pandoc does not support is when --- is followed by an empty line.
Otherwise, Pandoc will parse the YAML while ignoring empty lines. I tested all the cases above using the command you shared.

I do not know if we plan to support YAML with empty lines, but currently we don't because of the first regex here IMO

// exclude yaml blocks that start with a blank line, start with
// a yaml delimiter (can occur if two "---" stack together) or
// are entirely empty
// (that's not valid for pandoc yaml blocks)
if (
!yamlBlock.match(/^\n\s*\n/m) &&
!yamlBlock.match(/^\n\s*\n---/m) &&
(yamlBlock.trim().length > 0)
) {

The m modifier for the regex, means to apply ^ as the start of each line. From w3school

The "m" modifier specifies a multiline match.
(...)
With the "m" set, ^ and $ also match at the beginning and end of each line.

This means /^\n\s*\n/m will match any set of 2 consecutive empty lines in the YAML, not just at the start of the YAML but anywhere in the YAML block. This means Quarto won't parse the YAML block to get the format in the example.

I see the comment says

// exclude yaml blocks that start with a blank line, start with

So maybe the intended regex was /^\n\s*\n/ without the modifier

Now, writing empty lines in YAML is really not something that should be considered good practice. So we could disallow that, but maybe this should be a parsing error if 2 consecutive empty lines are found. Instead of silently rendering while ignoring the option like it does now.

So I would change the regex to match Pandoc behavior maybe. Is there anywhere else where we are parsing the YAML ?

FWIW changing the regex

diff --git a/src/core/yaml.ts b/src/core/yaml.ts
index bcf8883f6..2a1a4935c 100644
--- a/src/core/yaml.ts
+++ b/src/core/yaml.ts
@@ -95,7 +95,7 @@ export function readYamlFromMarkdown(
       // are entirely empty
       // (that's not valid for pandoc yaml blocks)
       if (
-        !yamlBlock.match(/^\n\s*\n/m) &&
+        !yamlBlock.match(/^\n\s*\n/) &&
         !yamlBlock.match(/^\n\s*\n---/m) &&
         (yamlBlock.trim().length > 0)
       ) {

will correctly make the YAML block be parsed with parse() correctly. Empy lines are ignored. I believe they are allowed in YAML spec BTW.

Hope it helps

@joelvonrotz
Copy link
Author

Oh neat, thank you for looking further into this! This seems to make a lot more sense now.

@cscheid
Copy link
Collaborator

cscheid commented Jun 28, 2023

Thanks for the proper context, @cderv!

Empy lines are ignored. I believe they are allowed in YAML spec BTW.

(Yes, they are.)

This looks like a great catch and fix, assuming it passes our test suite!

@cderv
Copy link
Collaborator

cderv commented Jun 28, 2023

This looks like a great catch and fix, assuming it passes our test suite!

I'll do a PR and we'll see then.

@cscheid cscheid removed documentation Doc improvements & quarto-web wontfix This will not be worked on upstream Bug is in upstream library labels Jun 28, 2023
@cscheid
Copy link
Collaborator

cscheid commented Jun 28, 2023

Thanks, @cderv!!

@cderv
Copy link
Collaborator

cderv commented Jun 28, 2023

Thanks for the report @joelvonrotz !

@cderv cderv added this to the v1.4 milestone Jun 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants