Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(gatsby-remark-images): Use babel, and fallback to cheerio #21782

Conversation

KoltesDigital
Copy link
Contributor

Description

New take for supporting React within MDX.

This first tries to find and replace img tags using Babel with React support. If it's not valid JSX, it fails and uses the current algorithm using Cheerio.

Pro: This fully supports JSX syntax, so attributes like <Component attr={func()}> will be kept, whereas before this patch this code would result in <component attr="{func()"> and therefore would have broken React bindings.

Con: This might slow down processing of current working pages, as each HTML blocks will first go through Babel, fail early, and go through Cheerio. But I haven't made any benchmark.

Documentation

I guess a mention on https://www.gatsbyjs.org/docs/working-with-images-in-markdown/#using-the-mdx-plugin should be enough.

Related Issues

Fixes #19785

@KoltesDigital KoltesDigital requested a review from a team as a code owner February 26, 2020 23:23

return resolve(node)
return useBabel().catch(() => useCheerio())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given the potential perf hit (we should benchmark) - would it be feasible to just check node.type and if it's html use cheerio and if it's jsx use babel?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use something like https://www.npmjs.com/package/benchmark to check this vs current implementation using input that works now with cheerio (so using some remark AST fixture as input - likely couple of those). I'm very not comfortable with this change as-is

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, I haven't even thought about that 👍

Copy link
Contributor Author

@KoltesDigital KoltesDigital Feb 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. But I let cheerio for malformed JSX. Is it worth it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if MDX always sets HTML blocks as JSX, or if it somehow distinguishes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I don't know either. @johno halp ;)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as for "malformatted" jsx part - wouldn't gatsby-plugin-mdx throw exception even before calling gatsby-remark-images if there was malfortmatted input that mdx can't handle?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess MDX would first try to parse any block as JSX, and if it fails it considers the block as regular text. My supposition is based on this (supposedly working) snippet:

---
title: Hello World
---

import FooBar from 'foobar';

Hello world!

<FooBar />

Then it would make sense to consider that blocks of type jsx would hold valid JSX, and therefore to only parse them using Babel.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bump @johno

@KoltesDigital KoltesDigital requested a review from a team as a code owner March 15, 2020 20:30
@KoltesDigital
Copy link
Contributor Author

I didn't take into account that img elements could be inside a elements, and while it's handled for HTML, JSX requires another way to do.

I've also added the ability to force a link or no link, via the img's class. Because it is supported by HTML and JSX, I added it in this PR.

Regarding performance hit, all the JSX tests and my actual use cases are handled by useBabel, i.e. none of them fail at parsing and fallback to useCheerio. I still let the fallback because I can't guarantee the inputs, but I assume it shouldn't be used.

@KoltesDigital
Copy link
Contributor Author

Up

@pieh
Copy link
Contributor

pieh commented Mar 20, 2020

I've also added the ability to force a link or no link, via the img's class. Because it is supported by HTML and JSX, I added it in this PR.

Can you expand a bit on this - why this is needed? this seems to add a whole lot complexity?

@KoltesDigital
Copy link
Contributor Author

I'm designing a site with blog posts, some of them contains images whose layout has been purposefully set, that's why I'm using MDX. Some of the images link to external websites. With the current behavior, all the images would have been linked, some of them to external websites, the others to the original themselves. I think this would be confusing for visitors and I have preferred to disable links on the latters.

Do you see any way to achieve that otherwise? I think I can't put the plugin twice in the config list, because it's really a layout for particular posts (I'm already using multiple filesystem sources to map to different content types with different templates).

The last commit has the effects:

  1. to change generateImagesAndUpdateNode to take an order (linkToOriginal) instead of a state description (inLink). This allows each image to have a link or not, in relation to being able to force the link or not.
  2. to not add nodes with no value in rawHtmlNodes / rawJsxNodes from the start. Beforehand, they would have been added, then mapped to Promise.resolve() and discarded by .filter(node => !!node), so it's useless.
  3. to retrieve image's alt and title attributes in HTML nodes only when it's necessary.

@KoltesDigital
Copy link
Contributor Author

Ok I achieved the same without forcing links. I retrieve images' srcSet manually, actually it somewhat duplicates what the plugin does.

I'm closing this PR and will create two, one for the Babel fix itself, and one for the new feature (with force stuff).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

'gatsby-remark-images inside gatsby-plugin-mdx` breaking styled-components...
2 participants