Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalid HTML syntax together with rehype-raw causes crash #833

Closed
4 tasks done
EvHaus opened this issue Jun 16, 2024 · 6 comments
Closed
4 tasks done

Invalid HTML syntax together with rehype-raw causes crash #833

EvHaus opened this issue Jun 16, 2024 · 6 comments
Labels
💪 phase/solved Post is done

Comments

@EvHaus
Copy link

EvHaus commented Jun 16, 2024

Initial checklist

Affected packages and versions

react-markdown@9.0.1 + rehype-raw@7.0.0

Link to runnable example

https://stackblitz.com/edit/github-nbnqac?file=src%2Fapp.tsx

Steps to reproduce

When react-markdown is used together with rehype-raw and invalid HTML is provided inside the markdown content, it causes a crash with:

TypeError: Cannot read properties of null (reading 'tagName')
    at Tokenizer._stateTagName (index.js:1095:1)
    at Tokenizer._callState (index.js:586:1)
    at Tokenizer._runParsingLoop (index.js:223:1)
    at Tokenizer.write (index.js:248:1)
    at handleRaw (index.js:349:1)
    at one (index.js:108:1)
    at Object.handle (index.js:100:1)
    at all (index.js:156:1)
    at root (index.js:172:1)
    at one (index.js:108:1)

See Stackblitz link for a repro, or use this code:

import Markdown from 'react-markdown';
import rehypeRaw from 'rehype-raw';

const markdownSource = `
Note how this markdown has invalid HTML below. The closing "summary" tag is missing a closing angle bracket.

<details>
<summary>Complete logs</summary
</details>
`;

function App() {
  return <Markdown rehypePlugins={[[rehypeRaw]]}>{markdownSource}</Markdown>;
}

export default App;

Expected behavior

No crash. Ideally rehype-raw could be smart enough to auto-close the tag (like GitHub's markdown editor does), but if that's too difficult, simply not rendering that element is fine too.

Actual behavior

Code crashes with the error above.

Runtime

Node v17

Package manager

npm 8

OS

Linux

Build and bundle tools

Webpack

@github-actions github-actions bot added 👋 phase/new Post is being triaged automatically 🤞 phase/open Post is being triaged manually and removed 👋 phase/new Post is being triaged automatically labels Jun 16, 2024
@wooorm
Copy link
Member

wooorm commented Jun 17, 2024

Can also be reproduced by pasting that code at https://remarkjs.github.io/react-markdown/

@wooorm
Copy link
Member

wooorm commented Jun 17, 2024

Can also be reproduced with just the utilities:

import assert from 'node:assert/strict'
import {fromMarkdown} from 'mdast-util-from-markdown'
import {toHast} from 'mdast-util-to-hast'
import {raw} from 'hast-util-raw'
import {toHtml} from 'hast-util-to-html'

const document = `
Note how this markdown has invalid HTML below. The closing "summary" tag is missing a closing angle bracket.

<details>
<summary>Complete logs</summary

</details>
`

const mdast = fromMarkdown(document)
const hast = toHast(mdast, {allowDangerousHtml: true})
assert(hast.type === 'root')

const reformatted = raw(hast)

console.log(reformatted)
console.log(toHtml(reformatted))
file:///Users/tilde/Projects/oss/hast-util-raw/node_modules/parse5/dist/tokenizer/index.js:1095
                token.tagName += String.fromCodePoint(isAsciiUpper(cp) ? toAsciiLower(cp) : cp);
                ^

TypeError: Cannot read properties of null (reading 'tagName')
    at Tokenizer._stateTagName (file:///Users/tilde/Projects/oss/hast-util-raw/node_modules/parse5/dist/tokenizer/index.js:1095:17)
    at Tokenizer._callState (file:///Users/tilde/Projects/oss/hast-util-raw/node_modules/parse5/dist/tokenizer/index.js:586:22)
    at Tokenizer._runParsingLoop (file:///Users/tilde/Projects/oss/hast-util-raw/node_modules/parse5/dist/tokenizer/index.js:223:22)
    at Tokenizer.write (file:///Users/tilde/Projects/oss/hast-util-raw/node_modules/parse5/dist/tokenizer/index.js:248:14)
    at handleRaw (file:///Users/tilde/Projects/oss/hast-util-raw/lib/index.js:345:26)
    at one (file:///Users/tilde/Projects/oss/hast-util-raw/node_modules/zwitch/index.js:108:17)
    at Object.handle (file:///Users/tilde/Projects/oss/hast-util-raw/lib/index.js:96:7)
    at all (file:///Users/tilde/Projects/oss/hast-util-raw/lib/index.js:152:13)
    at root (file:///Users/tilde/Projects/oss/hast-util-raw/lib/index.js:168:3)
    at one (file:///Users/tilde/Projects/oss/hast-util-raw/node_modules/zwitch/index.js:108:17)

@wooorm
Copy link
Member

wooorm commented Jun 17, 2024

Even smaller, and making the exception disappear:

/**
 * @typedef {import('hast').Root} Root
 */

import {raw} from 'hast-util-raw'
import {toHtml} from 'hast-util-to-html'

/** @type {Root} */
const hast = {
  type: 'root',
  children: [
    {type: 'raw', value: '<details>\n<summary>Complete logs</summary'},
    {type: 'text', value: '\n'},
    {type: 'raw', value: '</details>'}
  ]
}

const reformatted = raw(hast)

console.log(reformatted)
console.log(toHtml(reformatted))

Removing the {type: 'text', value: '\n'}, makes the exception disappear.

We intentionally do want to support a {type: 'raw', value: '<sum'}, {type: 'raw', value: 'mary>'} to work.
So we have to allow “partial”/“broken” HTML.
But, when a text node appears, we can perhaps reset that.

This comment has been minimized.

@wooorm wooorm added the 💪 phase/solved Post is done label Jun 17, 2024
@github-actions github-actions bot removed the 🤞 phase/open Post is being triaged manually label Jun 17, 2024
@wooorm
Copy link
Member

wooorm commented Jun 17, 2024

released!

@EvHaus
Copy link
Author

EvHaus commented Jun 17, 2024

Awesome thanks. Confirmed that hast-util-raw@9.0.4 fixes the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
💪 phase/solved Post is done
Development

No branches or pull requests

2 participants