
raw HTML with tables incurs quarto processing #8582

Closed
daslu opened this issue Feb 3, 2024 · 10 comments · Fixed by #8678
Labels: bug

daslu commented Feb 3, 2024

Bug description

I ran across a situation where text indentation seems to be broken by the content of a raw HTML block.

After reducing the example as much as I could, I arrived at the following qmd file.

Steps to reproduce

````markdown
### A
```{=html}
<div>
  <table></table>
  <h3></h3>
  <table>
    <tbody>
      <tr></tr>
    </tbody>
  </table>
</div>
```
1
````

Expected behavior

Usually, the "A" and the "1" would have the same indentation.

Actual behavior

On rendering, the resulting HTML looks as follows:

[Screenshot of the rendered page]

As you can see, the "1" text is no longer indented.

Your environment

Ubuntu 22.04.
Quarto 1.4.549.
Edited with Emacs; ran `quarto preview` on the qmd file from the Linux shell.

Quarto check output

```
quarto check
Quarto 1.4.549
[✓] Checking versions of quarto binary dependencies...
      Pandoc version 3.1.11: OK
      Dart Sass version 1.69.5: OK
      Deno version 1.37.2: OK
[✓] Checking versions of quarto dependencies......OK
[✓] Checking Quarto installation......OK
      Version: 1.4.549
      Path: /opt/quarto/bin

[✓] Checking tools....................OK
      TinyTeX: v2024.01
      Chromium: (not installed)

[✓] Checking LaTeX....................OK
      Using: TinyTex
      Path: /home/daslu/.TinyTeX/bin/x86_64-linux
      Version: 2023

[✓] Checking basic markdown render....OK

[✓] Checking Python 3 installation....OK
      Version: 3.10.9 (Conda)
      Path: /workspace/anaconda3/bin/python
      Jupyter: 5.3.0
      Kernels: python3

[✓] Checking Jupyter engine render....OK

R scripting front-end version 4.1.2 (2021-11-01)
[✓] Checking R installation...........(None)

      Unable to locate an installed version of R.
      Install R from https://cloud.r-project.org/
```

daslu added the bug label Feb 3, 2024
mcanouil (Collaborator) commented Feb 3, 2024

Side note

There is no (real) bug here.
You are writing a "single paragraph", since you did not add empty lines (and the code cell confuses the parser even more).
Headers and code blocks/cells should be surrounded by empty lines.
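The blank-line rule can be checked mechanically. The following is a simplified Python sketch in the spirit of markdownlint's MD022 (headers) and MD031 (fenced code blocks) rules. It is not markdownlint's implementation, the function name is hypothetical, and it recognizes only ATX headings and backtick fences:

```python
import re

# ATX heading: up to 3 spaces of indentation, 1-6 '#', then whitespace or end of line.
ATX_HEADING = re.compile(r"^ {0,3}#{1,6}(\s|$)")
# Backtick fence opener or closer, e.g. ``` or ```{=html}.
FENCE = re.compile(r"^ {0,3}```")


def blank_line_violations(text: str) -> list[tuple[int, str]]:
    """Return (1-based line number, message) for headings and fences
    that are not surrounded by blank lines."""
    lines = text.splitlines()
    problems: list[tuple[int, str]] = []
    in_fence = False

    def is_blank(i: int) -> bool:
        # Positions before the first line or after the last one count as blank.
        return i < 0 or i >= len(lines) or not lines[i].strip()

    for i, line in enumerate(lines):
        if FENCE.match(line):
            # An opening fence needs a blank line before it...
            if not in_fence and not is_blank(i - 1):
                problems.append((i + 1, "fence not preceded by a blank line"))
            # ...and a closing fence needs a blank line after it.
            if in_fence and not is_blank(i + 1):
                problems.append((i + 1, "fence not followed by a blank line"))
            in_fence = not in_fence
        elif not in_fence and ATX_HEADING.match(line):
            if not (is_blank(i - 1) and is_blank(i + 1)):
                problems.append((i + 1, "heading not surrounded by blank lines"))
    return problems
```

On the reproducer from the original report, this flags the heading on line 1 and both fences (lines 2 and 12).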

FYI, these Markdown linter rules help ensure that Markdown parsers get it right: https://github.com/markdownlint/markdownlint/blob/main/docs/RULES.md

Rules for your example:

All this being said, Pandoc renders this properly, so the problem is something in Quarto's HTML envelope:

```sh
quarto pandoc index.qmd --from markdown --to html -o index.html
```

```html
<h3 id="a">A</h3>
<div>
  <table></table>
  <h3></h3>
  <table>
    <tbody>
      <tr></tr>
    </tbody>
  </table>
</div>
<p>1</p>
```

mcanouil added the markdown label Feb 3, 2024
cscheid (Collaborator) commented Feb 3, 2024

I think this is happening because of Quarto's HTML table parsing. Try adding `html-table-processing: none` to your YAML front matter.

mcanouil (Collaborator) commented Feb 3, 2024

Here is the HTML Quarto produces for the following example (with content added to make the problem obvious):

````markdown
### A

```{=html}
<div>
  <table></table>
  <h3>Raw header</h3>
  <table>
    <tbody>
      <tr>
        <td>1</td>
        <td>2</td>
      </tr>
      <tr>
        <td>3</td>
        <td>4</td>
      </tr>
    </tbody>
  </table>
</div>
```

1
````

```html
<main class="content" id="quarto-document-content">
  <section id="a" class="level3">
    <h3 class="anchored" data-anchor-id="a">A</h3>
    <div>
      <div>
        <table data-quarto-postprocess="true">
        </table>
        <section id="raw-header" class="level3">
          <h3 class="anchored" data-anchor-id="raw-header">Raw header</h3>
          <table data-quarto-postprocess="true" class="table">
            <tbody>
              <tr class="odd">
                <td>1</td>
                <td>2</td>
              </tr>
              <tr class="even">
                <td>3</td>
                <td>4</td>
              </tr>
            </tbody>
          </table>
        </section>
      </div>
    </div>
  </section>
</main>
</div>
<p>1</p>
```

The issue is the table: when it is not empty, it triggers Quarto's table processing.
Disabling that processing on the tables themselves solves the issue:

````markdown
### A

```{=html}
<div>
  <table data-quarto-disable-processing="true"></table>
  <h3>Raw header</h3>
  <table data-quarto-disable-processing="true">
    <tbody>
      <tr>
        <td>1</td>
        <td>2</td>
      </tr>
      <tr>
        <td>3</td>
        <td>4</td>
      </tr>
    </tbody>
  </table>
</div>
```

1
````

[Screenshot of the rendered page]
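If the raw HTML is generated by code (for example, a Python cell that emits tables), the same attribute can be added programmatically before the HTML is output. The following is a minimal regex-based sketch. The helper name `disable_quarto_table_processing` is hypothetical, and the sketch does not handle `>` inside attribute values or tables inside HTML comments. The document-wide alternative is the `html-table-processing: none` front-matter option suggested earlier in this thread.

```python
import re

# Matches an opening <table ...> tag (never </table>), capturing its attributes.
_TABLE_OPEN = re.compile(r"<table\b([^>]*)>", re.IGNORECASE)


def disable_quarto_table_processing(html: str) -> str:
    """Add data-quarto-disable-processing="true" to every opening <table> tag
    that does not already carry the attribute."""
    def add_attr(match: re.Match) -> str:
        attrs = match.group(1)
        if "data-quarto-disable-processing" in attrs:
            return match.group(0)  # already marked: leave it unchanged
        return f'<table{attrs} data-quarto-disable-processing="true">'

    return _TABLE_OPEN.sub(add_attr, html)
```

Because already-marked tables are skipped, applying the helper twice gives the same result as applying it once.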

mcanouil added the tables label Feb 3, 2024
cscheid added the enhancement label and removed the bug, tables, and markdown labels Feb 3, 2024
cscheid changed the title from "raw HTML may break text layout" to "raw HTML with tables incurs quarto processing" Feb 3, 2024
cscheid (Collaborator) commented Feb 3, 2024

So this isn't a bug; it's intended (although perhaps obscure) behavior. I'm not quite sure how we can make it clearer what's going on. I'm going to leave this open as an enhancement ticket, but I don't have anything actionable at the moment.

daslu (Author) commented Feb 3, 2024

> html-table-processing: none

This works beautifully.

Many thanks for your kind & quick help.

cscheid added this to the Future milestone Feb 3, 2024

RegalPlatypus commented:

> I think this is happening because of Quarto's HTML table parsing. Try adding `html-table-processing: none` to your YAML front matter.

Passing you my thanks as well! After updating to the latest Quarto version, all my {gt} tables started aligning right. This addition to the YAML fixed the issue.

cscheid added the bug label and removed the enhancement label Feb 11, 2024
cscheid self-assigned this Feb 11, 2024
cscheid (Collaborator) commented Feb 11, 2024

This is actually a bug.

Specifically, the problem is that the `<h3>` element is getting parsed by Pandoc as a section heading.

That heading is then converted into a Pandoc section, which pulls in the closing `</div>` part of the raw block. This makes our HTML invalid and causes the document to be misrendered.
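To see why turning the raw `<h3>` into a real header breaks the document, here is a toy Python model of section wrapping. It is a deliberate simplification: the `Block` type and both functions are hypothetical, not Pandoc's or Quarto's code, and the model does not reproduce Quarto's exact output. It does show the mechanism, though. Once the header exists, the section that starts at it swallows the raw `</div>`, so a `<div>` opened in one section is closed inside another.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    kind: str        # "raw", "header", or "section" in this toy model
    text: str = ""   # raw HTML text, or the header's title
    level: int = 0   # header level (used only when kind == "header")
    children: list["Block"] = field(default_factory=list)


def make_sections(blocks: list[Block]) -> list[Block]:
    """Toy section wrapping: a header swallows every following block up to the
    next header of the same or higher rank (or the end of the list)."""
    out: list[Block] = []
    i = 0
    while i < len(blocks):
        head = blocks[i]
        if head.kind != "header":
            out.append(head)
            i += 1
            continue
        j = i + 1
        while j < len(blocks) and not (
            blocks[j].kind == "header" and blocks[j].level <= head.level
        ):
            j += 1
        # Lower-rank headers inside the span become nested sections.
        out.append(Block("section", children=[head] + make_sections(blocks[i + 1 : j])))
        i = j
    return out


def render(blocks: list[Block]) -> str:
    """Concatenate blocks back into HTML (no whitespace, no attributes)."""
    parts = []
    for b in blocks:
        if b.kind == "raw":
            parts.append(b.text)
        elif b.kind == "header":
            parts.append(f"<h{b.level}>{b.text}</h{b.level}>")
        else:  # "section"
            parts.append("<section>" + render(b.children) + "</section>")
    return "".join(parts)


# The reproducer after the raw <h3> has been turned into a real header.
blocks = [
    Block("header", "A", 3),
    Block("raw", "<div>"),
    Block("raw", "<table></table>"),
    Block("header", "", 3),  # was a raw <h3></h3> inside the <div>
    Block("raw", "<table></table>"),
    Block("raw", "</div>"),
]
print(render(make_sections(blocks)))
```

With these blocks the output is `<section><h3>A</h3><div><table></table></section><section><h3></h3><table></table></div></section>`. The `<div>` opens in the first section and closes in the second. If the `<h3>` is instead kept as a raw block, the whole `<div>` stays inside the first section and the tags nest correctly.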

cscheid (Collaborator) commented Feb 11, 2024

The bad HTML we generate here is this:

```html
<main class="content" id="quarto-document-content">

<header id="title-block-header" class="quarto-title-block default">
<div class="quarto-title">
<h1 class="title">hello</h1>
</div>

<div class="quarto-title-meta">

  </div>

</header>

<section id="a" class="level3">
<h3 class="anchored" data-anchor-id="a">A</h3>
<div>
<div>

<table data-quarto-postprocess="true">
</table>
<section id="section" class="level3">
<h3 class="anchored" data-anchor-id="section"></h3>
<table data-quarto-postprocess="true">
<tbody>
<tr class="header">
</tr>

</tbody>
</table>

</section></div>
</div></section>
</main></div>
<p>1</p>
```

Note the mess we made out of the matching tags.
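The mismatched tags can be located with a stack-based checker. The following is a Python sketch built on the standard library's `html.parser`. The class and function names are hypothetical. The checker also ignores HTML's implicit closing rules (such as optional `</p>` or `</li>`), so treat it as a rough diagnostic rather than a validator.

```python
from html.parser import HTMLParser

# Void elements never have closing tags, so they are never pushed on the stack.
VOID = frozenset({"area", "base", "br", "col", "embed", "hr", "img", "input",
                  "link", "meta", "source", "track", "wbr"})


class TagBalanceChecker(HTMLParser):
    """Record closing tags that do not match the most recently opened element."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: list[str] = []
        self.errors: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing tags neither open nor close an element here

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        elif not self.stack:
            self.errors.append(f"unexpected </{tag}>: no element is open")
        else:
            self.errors.append(f"</{tag}> does not match open <{self.stack[-1]}>")
            # Recover by popping back to the matching open tag, if there is one.
            if tag in self.stack:
                while self.stack.pop() != tag:
                    pass


def check_balance(html: str) -> list[str]:
    checker = TagBalanceChecker()
    checker.feed(html)
    checker.close()
    # Elements still open at the end are reported too.
    return checker.errors + [f"<{t}> never closed" for t in reversed(checker.stack)]
```

Suppose it is fed the body of the output above starting at `<main>`, with whitespace and attributes stripped. It reports one stray `</div>` after `</main>`. In the full page, that tag would instead close an enclosing element early, and `<p>1</p>` already ends up outside `<main>`.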

cscheid (Collaborator) commented Feb 11, 2024

Looking at our trace output, the bug is actually that we are parsing the interstitial header element between the two tables:

```yaml
  - op: "add"
    path: "/blocks/1/content"
    value:
      - t: "RawBlock"
        format: "html"
        text: "<div>\n  "
      - t: "RawBlock"
        format: "html"
        text: "<table data-quarto-postprocess=\"true\">"
      - t: "RawBlock"
        format: "html"
        text: "</table>"
      - t: "Header"
        level: 3
        attr: "('section', [], [])"
        content: []
      - t: "Table"
        attr: "('', [], ['quarto-postprocess,true'])"
        caption:
          - null
          - []
        colspecs: []
        head:
          t: "TableHead"
          attr: "('', [], [])"
          rows: []
        body:
          - t: "TableBody"
            row_head_columns: 0
            attr: "('', [], [])"
            intermediate_head:
              - t: "TableRow"
                attr: "('', [], [])"
                cells: []
            body: []
        foot:
          t: "TableFoot"
          attr: "('', [], [])"
          rows: []
      - t: "RawBlock"
        format: "html"
        text: "\n</div>"
```

Notice how Pandoc interprets the heading as a section marker, and then the closing </div> tag is misplaced.
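The block sequence in the trace suggests a simple check: a Header that sits between an opening raw-HTML `<div>` and its closing `</div>` in the same flat block list is unsafe to wrap into a section. The following is a hypothetical Python sketch of such a check. The function name and the dict layout (mirroring the trace's `t`, `format`, and `text` fields) are assumptions, and the sketch says nothing about how Quarto itself handles this case. It counts only `<div>` tags in raw HTML and ignores every other element.

```python
import re

OPEN_DIV = re.compile(r"<div\b[^>]*>", re.IGNORECASE)  # never matches </div>
CLOSE_DIV = re.compile(r"</div\s*>", re.IGNORECASE)


def headers_inside_raw_divs(blocks: list[dict]) -> list[int]:
    """Return indices of Header blocks that appear while a raw-HTML <div>
    opened earlier in the same block list is still unclosed."""
    depth = 0
    flagged = []
    for i, block in enumerate(blocks):
        if block["t"] == "RawBlock" and block.get("format") == "html":
            depth += len(OPEN_DIV.findall(block["text"]))
            depth -= len(CLOSE_DIV.findall(block["text"]))
        elif block["t"] == "Header" and depth > 0:
            flagged.append(i)
    return flagged


# The block list from the trace, trimmed to the fields this check reads.
trace_blocks = [
    {"t": "RawBlock", "format": "html", "text": "<div>\n  "},
    {"t": "RawBlock", "format": "html", "text": '<table data-quarto-postprocess="true">'},
    {"t": "RawBlock", "format": "html", "text": "</table>"},
    {"t": "Header", "level": 3},
    {"t": "Table"},
    {"t": "RawBlock", "format": "html", "text": "\n</div>"},
]
print(headers_inside_raw_divs(trace_blocks))
```

For the trace above it flags index 3, the Header created from the interstitial `<h3>`. A header placed after the closing `</div>` is not flagged.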
