Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use babel-parse for import/export/jsx blocks #345

Closed
johno opened this issue Dec 6, 2018 · 13 comments
Closed

Use babel-parse for import/export/jsx blocks #345

johno opened this issue Dec 6, 2018 · 13 comments
Assignees
Labels
💬 type/discussion This is a request for comments 💎 v1 Issues related to v1

Comments

@johno
Copy link
Member

johno commented Dec 6, 2018

This issue is intended to discuss bringing in @babel/parse to handle ES/JSX parsing.

I've discussed this a few times with @timneutkens and @ChristopherBiscardi in passing but it's something we haven't been prioritizing. Over the last week I've been thinking about it more heavily after @jescalan reported a bunch of new parse issues.

We've reached a point where most open issues with the bug label are related to parsing exports and JSX. So we ultimately need to be able to parse import/export/jsx nodes to handle the following scenarios:

  • Parse out the default export since that maps to the layout
  • Walk a JSX node to its completion (right now newlines aren't properly handled)
  • Handle all use cases where import/export/JSX blocks aren't separated by empty lines
  • Allow (eventually) for the interleaving of MDX in JSX blocks

There might be a small performance hit, but since it's build time it's not something I'm personally worried about. The big bummer will be the heavier bundle, but that really only comes into play for browser runtimes (which isn't really our targeted env anyway).

  • Anyone have any thoughts, insights, or FUD about this?
  • Is there anyone that wants to take this on?
@johno johno added the 💬 type/discussion This is a request for comments label Dec 6, 2018
@silvenon
Copy link
Contributor

silvenon commented Dec 6, 2018

I really like this idea because that way we can finally gain the much needed stability. I brought it up before, but a concern for performance was raised, so I'm glad to see interest in it again.

Other than the scenarios in your list, I wonder if we could also solve parsing inline JSX with this approach. (#222)

My FUD is that we have to parse the content with two different parsers that are mutually incompatible, I'm not sure know how we can combine them in a stable way. E.g. if we're trying to solve the issue with newlines, how do we know how many lines to parse with Babel until we start parsing with remark again?

@johno
Copy link
Member Author

johno commented Dec 6, 2018

My FUD is that we have to parse the content with two different parsers that are mutually incompatible

Similarly to how we're extending the block parser currently, we'll be able to continue doing the same thing.

if we're trying to solve the issue with newlines, how do we know how many lines to parse with Babel until we start parsing with remark again

My first naive approach might be to continue walking the document until the babel parser doesn't error. When that occurs we know we either have a completed JSX block or we've reached the end of the document and there's a syntax error.


This will become less of an issue after micromark, but that's in its very early stages so we can't really wait on it when we can get this stable now. Swapping out remark for micromark down the road won't require an API change for the end user.

@silvenon
Copy link
Contributor

silvenon commented Dec 6, 2018

My first naive approach might be to continue walking the document until the babel parser doesn't error. When that occurs we know we either have a completed JSX block or we've reached the end of the document and there's a syntax error.

That's a great idea, we can split the file content based on two or more consecutive newlines and keep adding those chunks until we don't get an error. Works for me!

@jonsherrard
Copy link

jonsherrard commented Dec 10, 2018

Hey @johno

I wrote an mdx-to-mdx-ast package here: https://github.com/devular/mdx-to-mdx-ast

It uses the Acorn JS parser that powers babel to verify nodes are import, export, or JSX elements.

I'm not sure if it's of any use to you, I mostly wrote it for fun and to learn a bit about the MDX spec and ASTs.

Thanks for all your work on MDX, there's so many opportunities around content authoring, an amazing library. 🙏

@jonsherrard
Copy link

(Oh yes - I'd be happy to take this on, if people want to give me examples of:

Handle all use cases where import/export/JSX blocks aren't separated by empty line

or any other case the mdx-ast generator should handle.

@wooorm
Copy link
Member

wooorm commented Dec 10, 2018

@silvenon @johno I remember @ChristopherBiscardi tweeting somewhere that neither babel nor remarkrehype complains about invalid input, in his work on working to bridge Gatsby and MDX. Chris, any thoughts on this?

@johno
Copy link
Member Author

johno commented Dec 10, 2018

This is amazing @jonsherrard!

We'd love for you to take this on if you're interested. Perhaps you could coordinate a bit with @wooorm on how to merge work into remark-mdx? Ideally it'd be great to get all the special MDX parsing to live together in there and consumed by MDX core itself.

neither babel nor rehype complains about invalid input

😞. I imagine that means we need to walk the AST to ensure that the JSX block is closed then? Even if it doesn't complain we can ensure that the block is closed/self-closing.

@jonsherrard
Copy link

Hi @johno, yeah I'd love to. @wooorm where's the best place to start (if you're happy to!)

Also @johno - what's the issue with invalid input? I'm not sure I've grokked the invalid cases. Do you have an example?

@johno
Copy link
Member Author

johno commented Dec 10, 2018

This evening I can start a branch with failing tests that we can use for you to dev against. But for now, here are a few off the top of my head:

1. parsing out exports

The following breaks because we don't find the export since we expect an empty newline before the export.

import Foo from './bar'
export default Foo

2. allowing other types of exports

We currently error when encountering exports like this, but we should really handle them properly:

export { default } from './foo'

3. empty lines in JSX blocks

The following won't properly parse because of the empty line before the return:

<Component>
  {renderProps => {
    const data = doStuffWith(renderProps)

    return (
      <Graph data={data} />
    )
  }
</Component>

4. JSX blocks that contain numbers are ignored

<Component2 />

Related issues

@jonsherrard
Copy link

jonsherrard commented Dec 10, 2018

Cheers!

  1. Acorn will parse it into two ES expressions, one import and one export, so I just need to find a unist utility that allows me to replace a single AST node with two nodes at the same depth ☑️

  2. Works as expected if I add it to my fixture with acorn ✅

  3. I'll take a look at prettier's recent additions to see how they group funky whitespace like that. 🤔

  4. Works as expected if add it to my fixture with acorn ✅

Bonus. Supports variable declaration named exports as well: #341

export const meta = {
  title: "Testing 123",
  description: "This is a default description"
};

On bundlesize: Bundlephobia reports 95kb for Acorn and Acorn-JSX 🤔

@ChristopherBiscardi
Copy link
Member

@wooorm @johno Probably referring to these tests: ChristopherBiscardi/gatsby-mdx@97c4e36 where I was having trouble figuring out which combination of rehype, babel, etc would fail for different combinations of content with the intent of figuring out a way to differentiate jsx from html and convert html to jsx.

If you look at the PR I switched to an approach that just touches attributes and handles capitalization/to-object on styles though babel after the entire mdx process runs rather than try to run on segments of code inside the AST. Not sure any of that will help here

@jonsherrard
Copy link

jonsherrard commented Dec 11, 2018

Update so far, I'm still in the playground phase over at: https://github.com/devular/mdx-to-mdx-ast

Handle all use cases where import/export/JSX blocks aren't separated by empty lines

Remark

remark-parse will return

import Foo from './foo'
export default foo

as a single MDAST node.

Acorn

  1. Acorn will parse the same example into two javascript expressions represented by the Acorn AST.
  2. We now have to robust AST nodes (albeit Acorn AST nodes), that represent the two declarations.
  3. Using Unist Flatmap, we can return multiple nodes to our new MDXAST tree from a single node of our MDAST tree.
  4. I did think about constructing the new `import Foo from './foo', by hand using the object, but thought better of it and installed escodegen, to robustly create ES compatible code.

So now we have two solid solutions:

-> Acorn parsing of strings to a JS AST
<- ESCodegen to create strings of ES2015 JavaScript for our MDXAST

You can see this handled here: https://github.com/devular/mdx-to-mdx-ast/blob/master/handle-multi-program-nodes.js

Reservations

Using https://github.com/zeit/ncc, the bundle is a whopping 287kb, now that we've added acorn, acorn-jsx, and escodegen. There may be optimisations here through tree-shaking, or potentially this a point where I look at babel again. It might have codegen and JS parsing built in, and most projects will have it installed already.

I'd like to use this in the browser at some point... that's pretty huge - but I was thinking for a CMS - slightly more palatable.

There's now a lot going in here JS parsing, and JS codegen, Markdown parsing, and remark,rehype plugin execution.... it feels like a lot, but that's maybe how robust it needs to be.

johno added a commit that referenced this issue Feb 12, 2019
This introduces babel for better handling import and
export parsing. Anything that is an import/export block
will be parsed by babel and split up ensuring that
edge cases will no longer occur based on how the
JS is grouped. It also makes parsing the export default
nodes more robust.

- #345
- #340
johno added a commit that referenced this issue Feb 14, 2019
* Improve import and export parsing with babel

This introduces babel for better handling import and
export parsing. Anything that is an import/export block
will be parsed by babel and split up ensuring that
edge cases will no longer occur based on how the
JS is grouped. It also makes parsing the export default
nodes more robust.

- #345
- #340

* Add multiline fixtures

* Preserve positional information for AST nodes

In order to maintain positional metadata for AST nodes
that are parsed, we need to ensure that each import/export
is eaten in order as a correct subvalue.

Since the remark eat function requires the string to be
consumed in order, with no missing characters, the babel
plugin now returns the start value which is used to sort
the nodes and partition the input string. They're then
combined again and returned as nodes that can be eaten
in order by the tokenizer.

This also comes with the benefit of no longer changing the
input code which occurs with the prior babel generator.

* Remove unneeded babel generator dep

* Simplify partition function

* Make naming consistent

* Add snapshot test to verify positional info
@johno johno pinned this issue Feb 15, 2019
johno added a commit that referenced this issue Feb 18, 2019
* Improve import and export parsing with babel

This introduces babel for better handling import and
export parsing. Anything that is an import/export block
will be parsed by babel and split up ensuring that
edge cases will no longer occur based on how the
JS is grouped. It also makes parsing the export default
nodes more robust.

- #345
- #340

* Add multiline fixtures

* Preserve positional information for AST nodes

In order to maintain positional metadata for AST nodes
that are parsed, we need to ensure that each import/export
is eaten in order as a correct subvalue.

Since the remark eat function requires the string to be
consumed in order, with no missing characters, the babel
plugin now returns the start value which is used to sort
the nodes and partition the input string. They're then
combined again and returned as nodes that can be eaten
in order by the tokenizer.

This also comes with the benefit of no longer changing the
input code which occurs with the prior babel generator.

* Remove unneeded babel generator dep

* Simplify partition function

* Make naming consistent

* Add snapshot test to verify positional info
@johno johno added the 💎 v1 Issues related to v1 label Feb 26, 2019
@johno johno self-assigned this Feb 26, 2019
@johno
Copy link
Member Author

johno commented Mar 6, 2019

import/export parsing have landed in the alpha release so we can track block parsing in #195

@johno johno closed this as completed Mar 6, 2019
@johno johno unpinned this issue Mar 6, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
💬 type/discussion This is a request for comments 💎 v1 Issues related to v1
Development

No branches or pull requests

5 participants