Text in <pre> lost when converting <table> in markdown to other formats #9305

Closed
mohd-akram opened this issue Jan 5, 2024 · 4 comments

@mohd-akram

Explain the problem.
Text inside <pre> tags in a markdown <table> seems to be lost when converting the file to other formats.

To reproduce, create table.md:

<table>
  <tbody>
    <tr>
      <td>first column</td>
      <td><pre>second column</pre></td>
    </tr>
  </tbody>
</table>

Run pandoc -t plain table.md.

Expected output:

first column
second column

Actual output:

first column

Removing the <pre> tags or using other tags results in the correct output.

Pandoc version?
3.1.11

mohd-akram added the bug label Jan 5, 2024
@jgm
Owner

jgm commented Jan 5, 2024

This is all expected behavior.
You can see how pandoc parses this by doing pandoc -f markdown -t native:

% pandoc -f markdown -t native
<table>
  <tbody>
    <tr>
      <td>first column</td>
      <td><pre>second column</pre></td>
    </tr>
  </tbody>
</table>
^D
[ RawBlock (Format "html") "<table>"
, RawBlock (Format "html") "<tbody>"
, RawBlock (Format "html") "<tr>"
, RawBlock (Format "html") "<td>"
, Plain [ Str "first" , Space , Str "column" ]
, RawBlock (Format "html") "</td>"
, RawBlock (Format "html") "<td>"
, RawBlock (Format "html") "<pre>second column</pre>"
, RawBlock (Format "html") "</td>"
, RawBlock (Format "html") "</tr>"
, RawBlock (Format "html") "</tbody>"
, RawBlock (Format "html") "</table>"
]

When you are producing formats other than HTML, all the raw HTML blocks will be ignored. So the only thing that will come out is the contents of HTML elements, other than verbatim ones like <pre>, whose contents stay inside the raw block and are dropped with it.
See the docs on the markdown_in_html_blocks extension.
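
To make the effect concrete, here is a minimal Python sketch that models the native output above and a writer that drops raw blocks in a foreign format. It is a toy model, not pandoc's implementation: the RawBlock and Plain classes and the render function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class RawBlock:
    fmt: str   # format the raw text belongs to, e.g. "html"
    text: str  # the raw text itself


@dataclass
class Plain:
    text: str  # ordinary parsed content


# Hand-written copy of the block list shown by `pandoc -f markdown -t native`.
AST = [
    RawBlock("html", "<table>"),
    RawBlock("html", "<tbody>"),
    RawBlock("html", "<tr>"),
    RawBlock("html", "<td>"),
    Plain("first column"),
    RawBlock("html", "</td>"),
    RawBlock("html", "<td>"),
    RawBlock("html", "<pre>second column</pre>"),
    RawBlock("html", "</td>"),
    RawBlock("html", "</tr>"),
    RawBlock("html", "</tbody>"),
    RawBlock("html", "</table>"),
]


def render(blocks, target):
    """Keep a raw block only when its format matches the target format."""
    lines = []
    for block in blocks:
        if isinstance(block, RawBlock):
            if block.fmt == target:
                lines.append(block.text)
        else:
            lines.append(block.text)
    return "\n".join(lines)


print(render(AST, "plain"))  # prints: first column
```

Calling render(AST, "html") in the same sketch keeps every line, including the <pre> block, which mirrors why the text only goes missing in non-HTML output.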

I think those docs should be amended slightly, adding pre to style, script, and textarea.

jgm added docs and removed bug labels Jan 5, 2024
jgm closed this as completed in b3a471b Jan 6, 2024
@mohd-akram
Author

Thanks for updating the docs (and for the -t native tip). Can I ask why <pre> is included in this list? Is it because it has no direct mapping in pandoc? I see there's a CodeBlock but that seems to imply <pre><code>.

@jgm
Owner

jgm commented Jan 6, 2024

Because the contents of pre are meant to be verbatim text and hence should not be interpreted as Markdown.
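
As a rough illustration of that rule, the following Python sketch models how a reader might treat an element depending on whether its tag is verbatim. It is an assumption-laden toy, not pandoc's parser: VERBATIM_TAGS and read_element are hypothetical names, and the tag set is simply the four elements named earlier in this thread (pre, style, script, textarea).

```python
# Tags whose contents are kept verbatim (the four named in this thread).
VERBATIM_TAGS = {"pre", "script", "style", "textarea"}


def read_element(tag, contents):
    """Return a simplified list of (kind, text) blocks for one HTML element."""
    if tag.lower() in VERBATIM_TAGS:
        # Verbatim element: tags and contents stay together in one raw block.
        return [("RawBlock", f"<{tag}>{contents}</{tag}>")]
    # Other elements: only the tags are raw; the contents are parsed content.
    return [
        ("RawBlock", f"<{tag}>"),
        ("Plain", contents),
        ("RawBlock", f"</{tag}>"),
    ]


print(read_element("td", "first column"))
print(read_element("pre", "second column"))
```

The first call yields a Plain block between two raw tags, while the second yields a single raw block, matching the shape of the native output quoted earlier.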

@jgm
Owner

jgm commented Jan 6, 2024

See also #2716.
