Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[gatsby-transformer-remark] Section titles with code in ``s creates escaped HTML <code> tags #13608

Open
Ameobea opened this issue Apr 24, 2019 · 11 comments

Comments

Projects
None yet
5 participants
@Ameobea
Copy link

commented Apr 24, 2019

Description

When using the automatic TOC generation from gatsby-transformer-remark with markdown section titles that contain code in backticks, the generated TOC heading contains escaped HTML <code> tags that show up in the document. Instead of actually emitting the <code> tag in the HTML string, it's escaped and the HTML is rendered as text.

Steps to reproduce

Create a markdown remark with a heading like this:

## Generating TOCs with `gatsby-transformer-remark`

This will generate the following HTML string for the TOC:

<ul>\n<li><a href=\"/demo/#generating-tocs-with-code-classlanguage-textgatsby-transformer-remarkcode\">Generating TOCs with &#x3C;code class=\"language-text\">gatsby-transformer-remark&#x3C;/code></a></li>\n</ul>

Expected result

The <code> tag shouldn't be escaped and should be emitted directly

Actual result

The <code> tag is escaped, causing it to be rendered as text rather than as HTML.

Environment


  System:
    OS: macOS 10.14.3
    CPU: (12) x64 Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
    Shell: 5.3 - /bin/zsh
  Binaries:
    Node: 10.15.0 - ~/.nvm/versions/node/v10.15.0/bin/node
    Yarn: 1.15.2 - /usr/local/bin/yarn
    npm: 6.4.1 - ~/.nvm/versions/node/v10.15.0/bin/npm
  Languages:
    Python: 2.7.10 - /usr/bin/python
  Browsers:
    Chrome: 73.0.3683.103
    Safari: 12.0.3
  npmPackages:
    gatsby: ^2.1.17 => 2.1.17 
    gatsby-cli: ^2.4.8 => 2.4.11 
    gatsby-image: ^2.0.29 => 2.0.29 
    gatsby-plugin-feed: ^2.0.14 => 2.0.14 
    gatsby-plugin-google-analytics: ^2.0.8 => 2.0.14 
    gatsby-plugin-manifest: ^2.0.5 => 2.0.19 
    gatsby-plugin-offline: ^2.0.11 => 2.0.24 
    gatsby-plugin-react-helmet: ^3.0.0 => 3.0.6 
    gatsby-plugin-sass: ^2.0.11 => 2.0.11 
    gatsby-plugin-sharp: ^2.0.15 => 2.0.22 
    gatsby-plugin-typescript: ^2.0.1 => 2.0.9 
    gatsby-plugin-typography: ^2.2.1 => 2.2.7 
    gatsby-remark-images: ^3.0.1 => 3.0.5 
    gatsby-remark-prismjs: ^3.1.4 => 3.2.4 
    gatsby-source-filesystem: ^2.0.4 => 2.0.22 
    gatsby-source-spotify: ^1.0.1 => 1.0.1 
    gatsby-transformer-csv: ^2.0.4 => 2.0.7 
    gatsby-transformer-json: ^2.1.5 => 2.1.8 
    gatsby-transformer-remark: ^2.1.15 => 2.2.6 
    gatsby-transformer-sharp: ^2.1.4 => 2.1.14 
@Ameobea

This comment has been minimized.

Copy link
Author

commented Apr 24, 2019

This looks like the same thing as #5764, but I'm not using the gatsby-remark-autolink-headers plugin. I am using gatsby-remark-prismjs though. I tried putting it at the end of the plugins array, but the same thing happened.

Here's my gatsby-config.js file: https://github.com/Ameobea/homepage/blob/master/gatsby-config.js

@d4rekanguok

This comment has been minimized.

Copy link
Contributor

commented Apr 24, 2019

gatsby-remark-prismjs is indeed the culprit here... it looks like table of content is always generated after all plugins have applied to the markdown ast though -- perhaps there is a way to tell gatsby-remark-prismjs to ignore inline code in headings?

@d4rekanguok

This comment has been minimized.

Copy link
Contributor

commented Apr 24, 2019

I made a dirty edit of gatsby-remark-prismjs that ignore inline code in headings, you can give it a test:

yarn add d4rekanguok/gatsby-remark-prismjs-no-heading

and replace gatsby-remark-prismjs in your config with gatsby-remark-prismjs-no-heading.

All it does is replace unist-util-visit with unist-util-visit-parents, so we can get some information about the direct parent & do nothing if it's inside a heading block.

pinging @pieh, what do you think about this approach? It could be a bit more generalized i.e user can pass in an option to ignore inline code in certain elements etc.

@Ameobea

This comment has been minimized.

Copy link
Author

commented Apr 24, 2019

@d4rekanguok yep that fixed it for me! Thanks so much. Looking forward to seeing the fix upstream 👍

@pieh

This comment has been minimized.

Copy link
Contributor

commented Apr 24, 2019

@d4rekanguok I'm not sure I like the idea - as it would remove <code> completely from heading text (not only from ToC links / heading ids)? (i.e. in https://www.gatsbyjs.org/docs/migrating-from-v1-to-v2/#what-well-cover)

I'm not sure if there's better alternative tho

@d4rekanguok

This comment has been minimized.

Copy link
Contributor

commented Apr 24, 2019

@pieh We'd still keep the <code> in headings, since toHAST turn inline markdown code to <code> anyway (remark-prismjs turns them into html node - which cause the issue in the first place); but it's a bit of annoying because the styling that comes with remark-prismjs is gone:

image

But it looks like all it does is attaching a class name to inline code -- could we just add a node.data.hProperties to the markdown node in remark-prismjs, so toHAST can attach the class name? This way, toc won't get raw html <code>

@d4rekanguok

This comment has been minimized.

Copy link
Contributor

commented Apr 24, 2019

That works, I managed to retain the class name generated by remark-prismjs for inline code inside headings while toc is untouched

image

If we could rewrite highlight-code.js so that it retains the markdown format, I think we could actually keep code highlighting inside header as well! No actually that won't work, I'm stumped at making inline highlight work for now

I'm specifically targeting heading though, so it feels a bit hacky...

if (parent.type === `heading`) {
  if (!node.data) node.data = {};
  node.data.hProperties = { className: [className] }
}
else {
  node.type = `html`;
  node.value = `<code class="${className}">${highlightCode(languageName, node.value)}</code>`;
}
@pieh

This comment has been minimized.

Copy link
Contributor

commented Apr 25, 2019

I think we potentially try to approach this from wrong direction - right now we are trying to workaround that < become &#x3C; and therefore code and class and classname end up in toc - should we rather try to fix that (if not possible in our code, send PR to upstream packages)?

@d4rekanguok

This comment has been minimized.

Copy link
Contributor

commented Apr 25, 2019

oh, do you mean to strip out html code during mdast to toc transformation? That would certainly be a lot cleaner

@BarryThePenguin

This comment has been minimized.

Copy link

commented Apr 25, 2019

I don't have much knowledge of gatsby, or the gatsby ecosystem, so I'm looking at this issue with a 10,000 foot view

In regards to syntax highlighting, remark-highlight.js makes the following suggestion

This package integrates with remark-html. It may be better to work with rehype, which is specifically made for HTML, and to use rehype-highlight instead of this package.

I'm wondering if this issue stems from a similar circumstance?

Looking at gatsby-remark-prismjs, it looks like it operates on the markdownAst. If it used the htmlAst instead, you could do use rehype-prism or rehype-highlight to add syntax highlighting

@d4rekanguok

This comment has been minimized.

Copy link
Contributor

commented Apr 25, 2019

@BarryThePenguin Thank you for the quick response, and great suggestions!

rehype-prism looks really interesting. However in the current Gatsby remark plugin system, it's not possible for user to apply plugins to the transformed htmlAst.

I was exploring the option of turning gatsby-remark-prismjs to retain markdownAst structure, so that no html node is passed into mdast-util-toc, but even if this works for remark-prismjs, other plugins from the community may still add html node to the markdownAst and face the same issue.

Because of this, I was hoping to be able to resolve html nodes with mdast-util-toc

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.