Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

< and > in code blocks are parsed as tags/components #7

Closed
squidfunk opened this issue Feb 4, 2020 · 2 comments
Closed

< and > in code blocks are parsed as tags/components #7

squidfunk opened this issue Feb 4, 2020 · 2 comments
Assignees

Comments

@squidfunk
Copy link
Contributor

@squidfunk squidfunk commented Feb 4, 2020

I just realized that when a code block contains < or > characters, it is parsed as a tag, probably by htm.

A reproducible case:

import { h } from "preact"
import { htmdx } from "htmdx"

const markdown = `
\`\`\` json
{
  ...,
  "authors": [
    "John Doe <john.doe@example.com>"
  ],
  ...
}
\`\`\``

console.dir(htmdx(data, h), { depth: null })

Yields:

[ 'pre',
  { type: 'code',
    props:
     { className: 'language-json',
       children:
        [ '{\n  ...,\n  "authors": [\n    "John Doe ',
          { type: 'john.doe@example.com',
            props: { children: '"\n  ],\n  ...\n}' },
            key: null,
            ref: null,
            __k: null,
            __: null,
            __b: 0,
            __e: null,
            __d: null,
            __c: null,
            constructor: undefined } ] },
    key: undefined,
    ref: undefined,
    __k: null,
    __: null,
    __b: 0,
    __e: null,
    __d: null,
    __c: null,
    constructor: undefined } ]

The type of the node is john.doe@example.com. Furthermore, the pre is messed up.

@squidfunk

This comment has been minimized.

Copy link
Contributor Author

@squidfunk squidfunk commented Feb 4, 2020

Interestingly, marked seems to do the right thing.

When I step through the code:

function decodeHTML(m: string): string {

m contains:

<pre><code class="language-json">{
  ...,
  &quot;authors&quot;: [
    &quot;John Doe &lt;john.doe@example.com&gt;&quot;
  ],
  ...
}</code></pre>

At the end of decodeHTML, m contains:

<pre><code class="language-json">{
  ...,
  "authors": [
    "John Doe <john.doe@example.com>"
  ],
  ...
}</code></pre>

The encoding has been undone. I believe that the decoding must happen after parsing with htm, probably on the tree of virtual nodes. Otherwise bracketed content may be mistaken for tags.

@michael-klein michael-klein self-assigned this Feb 6, 2020
@michael-klein

This comment has been minimized.

Copy link
Owner

@michael-klein michael-klein commented Feb 6, 2020

Thanks for your report! The issue was that the decoding itself was in fact broken (the regex). I pushed a fix and published it to npm :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
2 participants
You can’t perform that action at this time.