Provide a Babel AST mapper #99

Closed
syavorsky opened this issue Dec 15, 2020 · 17 comments

Comments

@syavorsky
Owner

Provide a way to map the parser output into Babel AST as suggested in #93.

@syavorsky
Owner Author

@jaydenseric can you point me to the exact spec so I know for sure what you are asking for? I couldn't spot it from a quick look at the Babel spec doc.

@jaydenseric
Contributor

The loc field on a Babel AST Node, which has the SourceLocation type:

https://github.com/babel/babel/blob/main/packages/babel-parser/ast/spec.md#node-objects
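For reference, the spec linked above describes loc with a SourceLocation holding start and end positions, where line is 1-based and column is 0-based. The interfaces below mirror that shape (renamed with a Babel prefix only to avoid clashing with other global type names). The offsetToPosition and rangeToLoc helpers are hypothetical illustrations, not part of Babel or comment-parser, of the arithmetic needed to turn character offsets in a comment into such positions.

```typescript
// Mirrors Babel's Position: line is 1-based, column is 0-based.
interface BabelPosition {
  line: number;
  column: number;
}

// Mirrors Babel's SourceLocation.
interface BabelSourceLocation {
  source: string | null;
  start: BabelPosition;
  end: BabelPosition;
}

// Hypothetical helper: convert a 0-based character offset into `text`
// into a Babel-style position by counting the newlines before it.
function offsetToPosition(text: string, offset: number): BabelPosition {
  let line = 1;
  let lineStart = 0;
  for (let i = 0; i < offset && i < text.length; i++) {
    if (text[i] === "\n") {
      line++;
      lineStart = i + 1;
    }
  }
  return { line, column: offset - lineStart };
}

// Hypothetical helper: location of the half-open range [start, end).
function rangeToLoc(text: string, start: number, end: number): BabelSourceLocation {
  return {
    source: null,
    start: offsetToPosition(text, start),
    end: offsetToPosition(text, end),
  };
}

const comment = "/**\n * @param {string} foo\n */";
const at = comment.indexOf("foo");
console.log(rangeToLoc(comment, at, at + 3));
```

Running this on the sample comment locates the name foo on line 2, from column 19 to column 22.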

@syavorsky
Owner Author

@jaydenseric does this example do what you are looking for? I am still not sure whether this should go into comment-parser itself, but hopefully the parsed data provides everything you need, just in a slightly different format.

@jaydenseric
Contributor

Thanks for looking into it.

I find that example pretty confusing…

  • What is the left pane for? It seems to be used for both the JSDoc being parsed and the parsing logic itself.

  • Why are there two output areas on the right? After looking closely, they seem to relate to the showParsed and showStringified calls at the end of the left pane.

  • What version of comment-parser is this demo for, and is it published? The source fields in tags seem to be objects, which is at odds with the readme documentation that says they are strings:

    [Screenshot of the readme documentation]
  • Sometimes source means the source code string, other times it means a detailed object. This makes it hard to grok what is going on.

  • The example seems to show line and column information for whole tags, but we need such information for the content of the type, name, etc. Producing this manually would be pretty complicated: looping over all the source parts, looking for newlines, etc. Also, it would only be accurate if absolutely every character of the source code is present in the parts being stitched back together. Without documentation, is that a safe assumption?
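A minimal sketch of how that assumption could be checked, written against the per-line tokens that comment-parser 1.0 exposes (discussed further down this thread). The Tokens field list and the order the fields appear in a raw line are assumptions modeled on 1.0's primitives.ts and stringifier, so verify them against the version in use; makeTokens, joinTokens and isLossless are hypothetical helpers. If re-joining a line's tokens reproduces the raw line exactly, no characters were dropped and column arithmetic on the tokens is safe.

```typescript
// Assumed shape of a comment-parser 1.0 line's tokens (modeled on primitives.ts;
// later releases may add fields).
interface Tokens {
  start: string;
  delimiter: string;
  postDelimiter: string;
  tag: string;
  postTag: string;
  name: string;
  postName: string;
  type: string;
  postType: string;
  description: string;
  end: string;
}

interface Line {
  number: number;
  source: string;
  tokens: Tokens;
}

// Assumed order in which the tokens appear in the raw source line.
const TOKEN_ORDER: (keyof Tokens)[] = [
  "start", "delimiter", "postDelimiter", "tag", "postTag",
  "type", "postType", "name", "postName", "description", "end",
];

// Hypothetical helper: fill in empty strings for tokens not given.
function makeTokens(partial: Partial<Tokens>): Tokens {
  return {
    start: "", delimiter: "", postDelimiter: "", tag: "", postTag: "",
    name: "", postName: "", type: "", postType: "", description: "", end: "",
    ...partial,
  };
}

// Re-join the tokens in source order.
function joinTokens(tokens: Tokens): string {
  return TOKEN_ORDER.map((key) => tokens[key]).join("");
}

// True if no character of the raw line was lost in tokenization.
function isLossless(line: Line): boolean {
  return joinTokens(line.tokens) === line.source;
}

// Example line, tokenized by hand the way 1.0 is assumed to do it.
const exampleLine: Line = {
  number: 1,
  source: " * @param {string} foo Lorum ipsum.",
  tokens: makeTokens({
    start: " ", delimiter: "*", postDelimiter: " ",
    tag: "@param", postTag: " ", type: "{string}", postType: " ",
    name: "foo", postName: " ", description: "Lorum ipsum.",
  }),
};

console.log(isLossless(exampleLine)); // true
```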

As a side note, it seems comment-parser can't handle multiline type content?

```js
/**
 * @param {{
 *   foo: true
 * }} foo Lorum ipsum.
 */
```

I'm not 100% sure, but I think it's valid, and VS Code can syntax highlight it OK:

[Screenshot of VS Code syntax highlighting the example above]

@syavorsky
Owner Author

@jaydenseric let me rephrase my last comment. Is .source[].tokens sufficient for constructing a Babel AST?

@syavorsky
Owner Author

syavorsky commented Dec 27, 2020

@jaydenseric this is all about the 1.0 branch; check out its README, which should answer some of your questions above.

With .source[].tokens you should be able to build an AST for the tag, name, type, etc.

The structure is:

```typescript
// parse(source, opts) => Block[]
interface Block {
  // ...
  tags: Spec[];
  source: Line[];
}

interface Spec {
  // ...
  source: Line[]; // array of refs to Block.source[] items
}

interface Line {
  // ...
  source: string;
  tokens: Tokens; // { ... }
}
```

All result types live in primitives.ts.
The playground uses the most recent 1.0 source (eventually master). I will try to improve the UI to make it less confusing.
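A minimal sketch of that idea for a single line, assuming the 1.0 Tokens fields below and the order they appear in a raw line (both modeled on primitives.ts and the stringifier, so check them against the version in use). tokenColumn and tokenLoc are hypothetical helpers: a token's column is the summed length of the tokens before it, and the + 1 assumes Line.number counts from 0 (adjust if your parse options start elsewhere).

```typescript
// Assumed 1.0 token fields (see primitives.ts for the real definitions).
interface Tokens {
  start: string;
  delimiter: string;
  postDelimiter: string;
  tag: string;
  postTag: string;
  name: string;
  postName: string;
  type: string;
  postType: string;
  description: string;
  end: string;
}

interface Line {
  number: number;
  source: string;
  tokens: Tokens;
}

// Babel-style position: 1-based line, 0-based column.
interface BabelPosition {
  line: number;
  column: number;
}

// Assumed order of the tokens within a raw source line.
const TOKEN_ORDER: (keyof Tokens)[] = [
  "start", "delimiter", "postDelimiter", "tag", "postTag",
  "type", "postType", "name", "postName", "description", "end",
];

function makeTokens(partial: Partial<Tokens>): Tokens {
  return {
    start: "", delimiter: "", postDelimiter: "", tag: "", postTag: "",
    name: "", postName: "", type: "", postType: "", description: "", end: "",
    ...partial,
  };
}

// Column where `key` starts: the summed length of every token before it.
function tokenColumn(tokens: Tokens, key: keyof Tokens): number {
  let column = 0;
  for (const k of TOKEN_ORDER) {
    if (k === key) return column;
    column += tokens[k].length;
  }
  throw new Error(`Unknown token: ${key}`);
}

// Start and end of one token on one line, in Babel's coordinates.
function tokenLoc(line: Line, key: keyof Tokens): { start: BabelPosition; end: BabelPosition } {
  const column = tokenColumn(line.tokens, key);
  const lineNumber = line.number + 1; // assumed 0-based Line.number
  return {
    start: { line: lineNumber, column },
    end: { line: lineNumber, column: column + line.tokens[key].length },
  };
}

const exampleLine: Line = {
  number: 1,
  source: " * @param {string} foo Lorum ipsum.",
  tokens: makeTokens({
    start: " ", delimiter: "*", postDelimiter: " ",
    tag: "@param", postTag: " ", type: "{string}", postType: " ",
    name: "foo", postName: " ", description: "Lorum ipsum.",
  }),
};

const nameLoc = tokenLoc(exampleLine, "name");
console.log(nameLoc.start, nameLoc.end); // { line: 2, column: 19 } { line: 2, column: 22 }
```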

@syavorsky
Owner Author

1.0 has been merged into master with all the updates above. Let's return to this conversation if you find AST conversion isn't feasible.

@jaydenseric
Contributor

jaydenseric commented Jan 23, 2021

As promised, here is the getJsdocBlockTagSpanCodeLocation utility function I came up with in jsdoc-md v9.0.0:

https://github.com/jaydenseric/jsdoc-md/blob/v9.0.0/private/getJsdocBlockTagSpanCodeLocation.js

A JSDoc block tag "span" is a chunk of syntax that holds actual (sometimes multiline) content, as opposed to whitespace separators. The tag name, type, name group, and description are spans.
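The span idea can be sketched on top of the per-line tokens: a span starts on the first of a tag's lines where its token is non-empty and ends on the last such line. This is a hypothetical illustration (spanLoc and the hand-built lines are assumptions), not the jsdoc-md implementation linked above; the Tokens shape, their order within a line, and a 0-based Line.number are also assumed.

```typescript
interface Tokens {
  start: string; delimiter: string; postDelimiter: string;
  tag: string; postTag: string; name: string; postName: string;
  type: string; postType: string; description: string; end: string;
}

interface Line {
  number: number; // assumed 0-based
  source: string;
  tokens: Tokens;
}

interface BabelPosition {
  line: number; // 1-based
  column: number; // 0-based
}

// Assumed order of the tokens within a raw source line.
const TOKEN_ORDER: (keyof Tokens)[] = [
  "start", "delimiter", "postDelimiter", "tag", "postTag",
  "type", "postType", "name", "postName", "description", "end",
];

function makeTokens(partial: Partial<Tokens>): Tokens {
  return {
    start: "", delimiter: "", postDelimiter: "", tag: "", postTag: "",
    name: "", postName: "", type: "", postType: "", description: "", end: "",
    ...partial,
  };
}

function tokenColumn(tokens: Tokens, key: keyof Tokens): number {
  let column = 0;
  for (const k of TOKEN_ORDER) {
    if (k === key) return column;
    column += tokens[k].length;
  }
  throw new Error(`Unknown token: ${key}`);
}

// Hypothetical: start/end of a possibly multiline span across a tag's lines,
// or null when the token is empty on every line.
function spanLoc(lines: Line[], key: keyof Tokens): { start: BabelPosition; end: BabelPosition } | null {
  const used = lines.filter((line) => line.tokens[key] !== "");
  if (used.length === 0) return null;
  const first = used[0];
  const last = used[used.length - 1];
  return {
    start: { line: first.number + 1, column: tokenColumn(first.tokens, key) },
    end: { line: last.number + 1, column: tokenColumn(last.tokens, key) + last.tokens[key].length },
  };
}

// A tag whose description continues onto a second line (tokenized by hand).
const tagLines: Line[] = [
  {
    number: 1,
    source: " * @param foo Lorum",
    tokens: makeTokens({
      start: " ", delimiter: "*", postDelimiter: " ", tag: "@param", postTag: " ",
      name: "foo", postName: " ", description: "Lorum",
    }),
  },
  {
    number: 2,
    source: " * ipsum.",
    tokens: makeTokens({ start: " ", delimiter: "*", postDelimiter: " ", description: "ipsum." }),
  },
];

console.log(spanLoc(tagLines, "description"));
```

With these lines the description span runs from line 2, column 14 to line 3, column 9, and spanLoc(tagLines, "type") is null because the tag has no type.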

@jaydenseric
Contributor

Here is the end result:

[Screenshots of the end result]

@jaydenseric
Contributor

@syavorsky how can I figure out a start and end code location for the main JSDoc comment description? Is it possible just by looking at the comment source array, given that the description tokens for all the following block tags are also in there?

This is needed to solve jaydenseric/jsdoc-md#19.

@jaydenseric
Contributor

Can we please add a descriptionSource array to the parse result?

@syavorsky
Owner Author

Take a look at the Block.source items up to the first tag line. You can find where the tags start by matching Block.tags[0].source[0].number.
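That suggestion could look something like the sketch below. The interfaces list only the fields used here, and mainDescriptionLines is a hypothetical helper; when a block has no tags, every line is treated as part of the main description. The selected lines still include the opening /** line, whose description token is empty.

```typescript
interface Line {
  number: number;
  source: string;
  tokens: { description: string };
}

interface Spec {
  source: Line[]; // refs to items of Block.source
}

interface Block {
  tags: Spec[];
  source: Line[];
}

// Hypothetical: lines of the block before the first tag's first line.
function mainDescriptionLines(block: Block): Line[] {
  if (block.tags.length === 0) return block.source;
  const firstTagLine = block.tags[0].source[0].number;
  return block.source.filter((line) => line.number < firstTagLine);
}

// Hand-built example block.
const makeLine = (n: number, source: string, description: string): Line => ({
  number: n,
  source,
  tokens: { description },
});

const lines: Line[] = [
  makeLine(0, "/**", ""),
  makeLine(1, " * Main description.", "Main description."),
  makeLine(2, " *", ""),
  makeLine(3, " * @param foo Bar.", "Bar."),
  makeLine(4, " */", ""),
];

const example: Block = { source: lines, tags: [{ source: [lines[3]] }] };
console.log(mainDescriptionLines(example).map((line) => line.number));
```

For this block the main description covers lines 0, 1 and 2.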

@jaydenseric
Contributor

jaydenseric commented Feb 1, 2021

Ok, so this took about a week of work to get right in jsdoc-md v9.1.1:

[Screenshot of the result in jsdoc-md v9.1.1]

One of several gotchas is that comment-parser includes leading newlines in a .description value, but my getJsdocSourceTokenCodeLocation skips source lines whose description token value is '' and therefore gets the start code location too late. Instead of trying to solve this problem (how could you?), I trimmed newlines from the .description value so that the code location matches:

https://github.com/jaydenseric/jsdoc-md/blob/02208056f158a2d28be4769d4070a0255b18ff7c/private/jsdocCommentToMember.js#L566-L582

This is not too bad for my use case because markdown content looks better with pointless newlines trimmed anyway. But if you needed a description to be exact for some reason (including start and end newlines), you're in trouble.
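One way to keep the location and the text in agreement, in the spirit of the workaround above: drop leading and trailing source lines whose description token is empty before computing the location, and trim newlines from the description string the same way. Both helpers are hypothetical sketches rather than jsdoc-md's code, and as noted, they discard any intentional leading or trailing newlines.

```typescript
interface Line {
  number: number;
  tokens: { description: string };
}

// Hypothetical: drop leading and trailing lines with an empty description token,
// so the first and last remaining lines bound the trimmed description text.
function trimEmptyDescriptionLines(lines: Line[]): Line[] {
  let first = 0;
  let last = lines.length - 1;
  while (first <= last && lines[first].tokens.description === "") first++;
  while (last >= first && lines[last].tokens.description === "") last--;
  return lines.slice(first, last + 1);
}

// Hypothetical: the matching trim on the parsed .description string.
function trimNewlines(description: string): string {
  return description.replace(/^\n+|\n+$/g, "");
}

const descriptionLines: Line[] = ["", "Hello", "", "world", ""].map((description, n) => ({
  number: n,
  tokens: { description },
}));

console.log(trimEmptyDescriptionLines(descriptionLines).map((line) => line.number));
console.log(JSON.stringify(trimNewlines("\n\nHello\n\nworld\n")));
```

Interior empty lines are kept, so paragraph breaks inside the description survive; only the outer ones are removed.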

Overall, honestly speaking, it's been a nightmare trying to figure out line and column source code locations for the things comment-parser parses. I really, really wish comment-parser:

  1. Had a source array for main block descriptions, like tags each have (see https://github.com/jaydenseric/jsdoc-md/blob/v9.1.1/private/jsdocCommentToMember.js#L575).
  2. Had start and end line and column code location data for every detail it parses.
  3. Made available, for things such as .description, the raw JSDoc content spanning its code location, including the * fence. This is important, for example, when regex searching for inline @link tags and working out their line and column code locations from the regex match index and the start line and column of the description they are in (see https://github.com/jaydenseric/jsdoc-md/blob/v9.1.1/private/jsdocDataMdToMdAst.js#L54-L58).
  4. Could parse @example <caption> and content (see https://github.com/jaydenseric/jsdoc-md/blob/v9.1.1/private/jsdocCommentToMember.js#L387-L491).
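Item 3 can be illustrated with a small sketch: given the raw description text (fence included) and the Babel-style position where it starts, a regex match index can be turned into a line and column. findInlineLinks and its simplified {@link ...} pattern are assumptions for illustration, not jsdoc-md's or comment-parser's code. Keeping the * fence in the raw text matters because columns on continuation lines are counted from the real start of each source line.

```typescript
interface BabelPosition {
  line: number; // 1-based
  column: number; // 0-based
}

interface InlineLink {
  target: string;
  start: BabelPosition;
}

// Hypothetical: locate inline {@link ...} tags in raw JSDoc text that begins at `start`.
function findInlineLinks(raw: string, start: BabelPosition): InlineLink[] {
  // Simplified pattern: captures the target in {@link target}, {@link target|text}
  // or {@link target text}.
  const pattern = /\{@link\s+([^\s|}]+)[^}]*\}/g;
  const links: InlineLink[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(raw)) !== null) {
    const before = raw.slice(0, match.index);
    const newlines = before.split("\n").length - 1;
    const position: BabelPosition =
      newlines === 0
        ? // Still on the first line: offset from the description's start column.
          { line: start.line, column: start.column + match.index }
        : // On a later line: count from just after the last newline.
          { line: start.line + newlines, column: match.index - before.lastIndexOf("\n") - 1 };
    links.push({ target: match[1], start: position });
  }
  return links;
}

// Raw description text starting at line 3, column 3, including the * fence.
const raw = "See {@link foo}.\n * Also {@link Bar.baz|Baz}.";
console.log(findInlineLinks(raw, { line: 3, column: 3 }));
```

Here the first link is found at line 3, column 7 and the second at line 4, column 8; without the " * " fence in the raw text, the second column would be off by three.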

@jaydenseric
Contributor

Added a few more details to the last comment in edits.

@brettz9
Contributor

brettz9 commented Apr 30, 2021

Although it looks like an ESLint-friendly AST would be the same as Babel's in the case of adding custom types, it would also be nice to support exporting VisitorKeys so that esquery (as used in ESLint rules, e.g., to require or prohibit certain comment structures) or estraverse could be used with comment-parser.

(Ideally this would also allow optional specification of a jsdoc type parser, like catharsis, jsdoctypeparser, or jsdoc-type-pratt-parser, so that in addition to the raw types, one could also get parsed types, with the type parser's VisitorKeys being re-exported.)

@brettz9
Contributor

brettz9 commented May 5, 2021

As discussed in #117, I've released https://github.com/es-joy/jsdoccomment, which converts comments alone (with their types) to Babel AST (I neglected to mention this here), and https://github.com/es-joy/jsdoc-eslint-parser, which (however inefficiently) iterates over all nodes to add a jsdoc property to each, containing the relevant detected comment AST.

Note that the detection of the comment for a given structure is not a trivial matter. For example, with:

```js
/* A */
const /* B */ aFunc = /* C */ function () {};
```

... for the function expression, we might look for the JSDoc block at point C first and, if it is not present, look for it at point A. My parser uses such an algorithm, which may currently result in the same jsdoc being repeated on two different nodes: e.g., when looking at the node for the aFunc Identifier, it might also attach the JSDoc block at point A if one is found there.
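A rough sketch of that lookup order, assuming comments are available as ESTree-style objects with a value and start/end offsets. The SourceComment interface, jsdocBefore and findJsdoc are hypothetical, and real parsers differ in details (Babel, for instance, names the comment type CommentBlock). The example also reproduces the duplication described above: when there is no JSDoc at C, both the function expression and the aFunc identifier fall back to the same block at A.

```typescript
// Assumed ESTree-style comment; `value` excludes the leading /* and trailing */.
interface SourceComment {
  type: "Block" | "Line";
  value: string;
  start: number; // offset of "/*"
  end: number; // offset just past "*/"
}

// Hypothetical: the JSDoc block (a block comment whose value starts with "*")
// ending before `offset` with only whitespace in between.
function jsdocBefore(code: string, comments: SourceComment[], offset: number): SourceComment | undefined {
  return comments.find(
    (comment) =>
      comment.type === "Block" &&
      comment.value.startsWith("*") &&
      comment.end <= offset &&
      code.slice(comment.end, offset).trim() === "",
  );
}

// Hypothetical: try candidate node offsets in priority order, e.g. the
// function expression first (point C), then its declaration (point A).
function findJsdoc(code: string, comments: SourceComment[], candidates: number[]): SourceComment | undefined {
  for (const offset of candidates) {
    const found = jsdocBefore(code, comments, offset);
    if (found !== undefined) return found;
  }
  return undefined;
}

const code = "/** A */\nconst aFunc = function () {};";
const blockA: SourceComment = { type: "Block", value: "* A ", start: 0, end: 8 };
const fnAt = code.indexOf("function");
const idAt = code.indexOf("aFunc");
const constAt = code.indexOf("const");

// No JSDoc at C, so the function expression falls back to A...
console.log(findJsdoc(code, [blockA], [fnAt, constAt]) === blockA); // true
// ...and so does the aFunc identifier: the same block lands on two nodes.
console.log(findJsdoc(code, [blockA], [idAt, constAt]) === blockA); // true
```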

(I've added this explanation to the parser README and also updated @es-joy/jsdoccomment to indicate the AST comment (and type) structure.)

FWIW, here is the basic structure:

  1. {type: 'anyESNodeType...', jsdoc: {type: 'JSDocBlock', ...}}
  2. {type: 'JSDocBlock', tags: [{type: 'JSDocTag', ...}], descriptionLines: [{type: 'JSDocDescriptionLine', ...}], lastDescriptionLine: aNumber, /* Then these unmodified comment-parser ones */ description, delimiter, postDelimiter, end}
  3. {type: 'JSDocTag', parsedType: {type: 'oneOfTheJSDocTypeParserTypes--seebelow'}, descriptionLines: [{type: 'JSDocDescriptionLine', ...}], typeLines: [{type: 'JSDocTypeLine', ...}], /* 'type' from 'comment-parser' renamed to avoid conflict: */ rawType, along with other comment-parser types besides end}
  4. {type: 'JSDocDescriptionLine', delimiter, postDelimiter, start, description}
  5. {type: 'JSDocTypeLine', /* Renamed to avoid conflict*/ rawType, delimiter, postDelimiter, start}
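To connect this structure with the VisitorKeys idea above, here is a hypothetical sketch of child keys for the listed node types plus a minimal depth-first traversal that follows only those keys, similar in spirit to the key maps that estraverse-style traversers accept for custom nodes. The key lists are inferred from the structure above, not taken from @es-joy/jsdoccomment's exports; node types without an entry (such as parsed type nodes here) are visited but not descended into.

```typescript
interface JsdocNode {
  type: string;
  [key: string]: unknown;
}

// Hypothetical visitor keys: which properties of each node type hold child nodes.
const jsdocVisitorKeys: Record<string, string[]> = {
  JSDocBlock: ["descriptionLines", "tags"],
  JSDocTag: ["parsedType", "typeLines", "descriptionLines"],
  JSDocDescriptionLine: [],
  JSDocTypeLine: [],
};

function isNode(value: unknown): value is JsdocNode {
  return typeof value === "object" && value !== null && typeof (value as JsdocNode).type === "string";
}

// Depth-first traversal that follows only the declared child keys.
function traverse(
  node: JsdocNode,
  visit: (node: JsdocNode, parent: JsdocNode | null) => void,
  parent: JsdocNode | null = null,
): void {
  visit(node, parent);
  for (const key of jsdocVisitorKeys[node.type] ?? []) {
    const value = node[key];
    const children = Array.isArray(value) ? value : [value];
    for (const child of children) {
      if (isNode(child)) traverse(child, visit, node);
    }
  }
}

const exampleBlock: JsdocNode = {
  type: "JSDocBlock",
  descriptionLines: [{ type: "JSDocDescriptionLine", description: "Main." }],
  tags: [
    {
      type: "JSDocTag",
      tag: "param",
      parsedType: { type: "JSDocTypeName", name: "string" },
      typeLines: [{ type: "JSDocTypeLine", rawType: "string" }],
      descriptionLines: [],
    },
  ],
};

const visited: string[] = [];
traverse(exampleBlock, (node) => {
  visited.push(node.type);
});
console.log(visited.join(" > "));
```

This visits JSDocBlock, JSDocDescriptionLine, JSDocTag, JSDocTypeName and JSDocTypeLine in that order.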

The jsdoctypeparser types behave as in jsdoctypeparser, but I have renamed their node types so that all are now prefixed with JSDocType and camel-cased; e.g., INSTANCE_MEMBER becomes JSDocTypeInstanceMember.
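The renaming rule amounts to a small string transformation; toJsdocTypeNodeName is a hypothetical helper that applies the JSDocType prefix and camel-casing described above, not code taken from the parser.

```typescript
// Hypothetical helper: rename a jsdoctypeparser node type such as
// "INSTANCE_MEMBER" to the JSDocType-prefixed, camel-cased form.
function toJsdocTypeNodeName(parserType: string): string {
  const camel = parserType
    .toLowerCase()
    .split("_")
    .filter((part) => part.length > 0)
    .map((part) => part[0].toUpperCase() + part.slice(1))
    .join("");
  return `JSDocType${camel}`;
}

console.log(toJsdocTypeNodeName("INSTANCE_MEMBER")); // JSDocTypeInstanceMember
```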

Note that this is all fairly experimental and may change. We may also need some pointer to where the jsdoc block actually is, and the user's indent is not currently preserved.

But I thought the AST at least gets us started on allowing any precise targeting that might be desired, from individual tag lines to multi-line types and descriptions, preserving all the detail comment-parser thankfully exposes. Feedback is welcome (probably best on the relevant project rather than cluttering discussions here).

@brettz9
Contributor

brettz9 commented May 15, 2021

Btw, I'm also thinking of removing the @ at the beginning of tag in the transformed AST. I think tag may be referenced too frequently in selectors to have to strip the @ off each time.

3 participants