Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Source map support #163

Merged
merged 21 commits into from
Oct 18, 2021
Merged

Source map support #163

merged 21 commits into from
Oct 18, 2021

Conversation

Mingun
Copy link
Member

@Mingun Mingun commented Jun 1, 2021

Draft PR for getting feedback about general conception.

Probably will fix #105, but I need to learn how I can test that. I've never working with source maps before so I need learn how to use it for debugging. If anyone have a link to a good course, please let me known.

Notes about implementation: I'm not sure that introducing a new output type is a good idea. Maybe a new option sourceMap: true will be better?

@humanchimp
Copy link

thanks for working on this. as far as grokking sourcemaps goes, I've seen a bunch of different visualization tools, but this is pretty groovy, IMO: https://evanw.github.io/source-map-visualization/

@Mingun
Copy link
Member Author

Mingun commented Jun 26, 2021

Thanks for the link! Visualization looks great, but unusable, unfortunately. The author, it seems, forgot to provide a way to upload source files. The generated file and it map, obviously, is not enough. But I found another same-named project -- https://sokra.github.io/source-map-visualization.

@o-alexandre-felipe
Copy link

o-alexandre-felipe commented Jul 8, 2021

@Mingun,

Thank for the for working in the much better approach, looking forward to having this feature. When do you think this will be ready?

Looking into your implementation I have a few doubts.

  1. The SourceNode will be able to map the statements when you give a block of code? (I imagined you would have to tokenize it before)
    push(new SourceNode(
      ast.topLevelInitializer.codeLocation.start.line,
      node.codeLocation.start.column - 1,
      options.grammarSource,
      ast.topLevelInitializer.code,
      node.type
    ));
  1. It seems that your function definition
function buildFunc(a, i) {
      return new SourceNode(
        null, null, options.grammarSource,
        [
          "\n  var " + f(i) + " = function(" + a.params.join(", ") + ") {",
          a.source,
          "};"
        ],
        f(i)
      );
    }

Will align incorrectly the action code with the beginning of the line (before var), the intention wasn't something like this?

new SourceNode(null, null, null, [
    "\n var " + f(i) + " = function(" a.params.join(", ") + ") {",
    new SourceNode(null, null, options.grammarSource, a.source),
    "};"
  ],
  a.source
); 
  1. I see that you are assigning source maps to the javascript (precisely the part I ignored in my solution), but where are you mapping the rests? I can't see where you set the source maps for Rules/Literals and classes (regular expressions).

@Mingun
Copy link
Member Author

Mingun commented Jul 8, 2021

The SourceNode will be able to map the statements when you give a block of code? (I imagined you would have to tokenize it before)

Yes, that's what it's designed for.

Will align incorrectly the action code with the beginning of the line (before var), the intention wasn't something like this?

As you can see in the regenerated parser.js, indentation is correct.

I can't see where you set the source maps for Rules/Literals and classes (regular expressions).

Only code blocks need source map because all other pieces of grammar you cannot debug.


I plan to finish work on this after #167 will merge in order to avoid merge conflicts in CLI, but @hildjj taken a little vacation, I think. Actually, locally I've already done some job and fixed some moments after testing with visualizer. Actually, this work can be finished in couple days. I think I'll do it this weekend.

@hildjj
Copy link
Contributor

hildjj commented Jul 8, 2021

Yes, sorry, I've had other things I was working on. I'll circle back around to 167 in the next few days.

@o-alexandre-felipe
Copy link

o-alexandre-felipe commented Jul 8, 2021

The SourceNode will be able to map the statements when you give a block of code? (I imagined you would have to tokenize it before)
Yes, that's what it's designed for.

Well, it was designed to receive one Node (token), and the corresponding position in the source file, if you send a big chunk only the first token will be mapped, and in this case the first token is not an identifier, so it is like if it was not mapped.

Will align incorrectly the action code with the beginning of the line (before var), the intention wasn't something like this?

As you can see in the regenerated parser.js, indentation is correct.

Sorry if that was ambiguous, when I said alignment I meant the association between the position in the generated code and the position in the source code.

It seems you are aligning var with the start of the action.

I can't see where you set the source maps for Rules/Literals and classes (regular expressions).

Only code blocks need source map because all other pieces of grammar you cannot debug.

What do you mean with I cannot debug? By debugging I mean to be able to run step by step over the pieces that may consume some input.
image
All you have to do is to choose one token of the generated code as the anchor for the rule/literal/class.

I plan to finish work on this after #167 will merge in order to avoid merge conflicts in CLI, but @hildjj taken a little vacation, I think. Actually, locally I've already done some job and fixed some moments after testing with visualizer. Actually, this work can be finished in couple days. I think I'll do it this weekend.

Thank you!

@o-alexandre-felipe
Copy link

o-alexandre-felipe commented Jul 19, 2021

Hi @Mingun and @hildjj I am looking forward to use this feature. I closed my pull request after you mentioned this work, since I think is not the best use of my time to work on code that someone else is already doing. Are you still working on this or should I resume what I started, or would you prefer to setup a branch that I can work from this work as a starting point.

Thanks

@Mingun
Copy link
Member Author

Mingun commented Jul 20, 2021

Sorry for the delay, the new old interesting project captured my attention :). I will try to mend.

Well, it was designed to receive one Node (token), and the corresponding position in the source file, if you send a big chunk only the first token will be mapped, and in this case the first token is not an identifier, so it is like if it was not mapped.

Large parts of the generated parser code have not any representation in the source grammar. I've tested my mapping on https://sokra.github.io/source-map-visualization and it seems it works quite well. There are some errors in the mapping but it seems unavoidable because source-map by design maps only start coordinates of chunks, but not their length, so you can observe artifacts as on that image:
071

If you has thoughts how this can be fixed, you are welcome!

Sorry if that was ambiguous, when I said alignment I meant the association between the position in the generated code and the position in the source code.

It seems you are aligning var with the start of the action.

In my tests on https://sokra.github.io/source-map-visualization it seems to working correctly.

What do you mean with I cannot debug? By debugging I mean to be able to run step by step over the pieces that may consume some input.

I mean that this is not an executable code. You cannot put breakpoint in it. You shows me an interesting way to use the map, I not thinking about it. But I do not really understand how this can be done now because lengths of original and generated strings is different (rule vs peg$rule) and SourceMap allow only mapping of the same-length strings (at least I do not see any clear way how to map strings with non-identical lengths). So this will be outside of scope of my PR, at least for now, but you can start prepare your own PR based on mine :).

@o-alexandre-felipe
Copy link

Sorry for the delay, the new old interesting project captured my attention :). I will try to mend.

That's fine, many projects out there

If you has thoughts how this can be fixed, you are welcome!

Yes I have an idea about how to do this, the javascript should be the easy part. Also it would be fine if we extended the javascript code rule in the peggy parser, so that we could build the JS AST and also report errors properly. But that is a big change.

I mean that this is not an executable code. You cannot put breakpoint in it. You shows me an interesting way to use the map, I not thinking about it. But I do not really understand how this can be done now because lengths of original and generated strings is different (rule vs peg$rule) and SourceMap allow only mapping of the same-length strings (at least I do not see any clear way how to map strings with non-identical lengths). So this will be outside of scope of my PR, at least for now, but you can start prepare your own PR based on mine :).

Thanks, when I have the opportunity I will prepare a PR with the source maps only for the embedded source maps.

@Mingun
Copy link
Member Author

Mingun commented Aug 2, 2021

I've finished this and it is ready for final review.
I've implemented two variants of API:

  • the first one in this PR: new boolean option sourceMap. When it is true and output is "source", compiler returns a SourceNode object. Usage:
    let node = peggy.generate(grammar, {
      output: "source",
      // Required, see below
      grammarSource: "my-file",
      sourceMap: true,
    });
    let { code, map } = node.toStringWithSourceMap();
    map.toJSON();// Get source map as JSON object
    map.toString();// Get source map as a string (serialized JSON object)
  • the second one in the branch https://github.com/Mingun/peggy/tree/alternative-source-map-api: new output variant "source-and-map". Usage quite similar:
     let node = peggy.generate(grammar, {
    -  output: "source",
    +  output: "source-and-map",
       // Required, see below
       grammarSource: "my-file",
    -  sourceMap: true,
     });
     let { code, map } = node.toStringWithSourceMap();
     map.toJSON();// Get source map as JSON object
     map.toString();// Get source map as a string (serialized JSON object)

So you can compare and choose what your like better. If an alternative API is more preferable, I add this commit to that PR.

Because of the mozilla/source-map#444 you can't use null, undefined and "" as a source file names. Because when we create a source map sources taken from the grammarSource option, it should be defined and not equal to the null, undefined, or "". source-map library automatically converts all other objects into strings, so be sure that your grammarSource.toString() returns a string representation that will be used in the source map.

@o-alexandre-felipe
Copy link

Nice work. I like the first option you propose.
I'm glad that you ended before I started (well I was off last week)

@Mingun
Copy link
Member Author

Mingun commented Aug 17, 2021

@hildjj , I forgot to mention: the first commit should also include changes in package-lock.json, but I have npm 7.20.6 which produces only version 2 file, but repository contains version 1 file. Is there any reason why version 1 is used? Version 2 is backward-compatible and can be read by older npm.

@hildjj
Copy link
Contributor

hildjj commented Aug 17, 2021

We had some issue testing on node10 with the newer package-lock format at some point. I've been generating the package-lock file with docker run --rm -it -v $PWD:/app -v ~/.npm/:/root/.npm --workdir /app node:12 npm install --package-lock-only since.

I continue to prefer solving this problem by not checking in the package-lock file, since we have no runtime dependencies.

@hildjj
Copy link
Contributor

hildjj commented Aug 19, 2021

If you rebase, I'll prioritize reviewing this over the next few days. I'd like to get this in soon, have a few people try it, then do a release next week if possible.

@Mingun
Copy link
Member Author

Mingun commented Aug 20, 2021

I plan to finish this this weekend.

@Mingun
Copy link
Member Author

Mingun commented Aug 20, 2021

Copy link
Contributor

@hildjj hildjj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly minor stuff, except that we need to get rollup to work cleanly, and have the output work in the browser. I like the follow-on patch with the API change, so let's go ahead and work from that point.

return toSourceNode(node.code, node.codeLocation, "$" + node.type);
}

return node.code;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be able to reach this with a plugin

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you expand your thought? If you think about something like ts-pegjs which originally will have a TypeScript code here, it should already translate it to the JS, but I suspect it have their own generate-ts pass. Or you talk about extracting this function into utils?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry i was terse. :) This line is missed in code coverage, and i think it should not be too difficult to write a small plugin in the test that would reach this line of code.

After we merge this, I'm going to do a PR that adds a bunch of code coverage, so I can fix it up then if you like.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spent some time to understand that nyc command in the package.json used only for debug CLI and not required for ordinally run, because jest has built-in support for coverage, but the results are not showing in the console. I think we need to fix that. I can prepare a PR for that if you not already handle that in your upcoming PR.

Cover this line, will update PR soon

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be a separate issue. Assign it to me, and I'll fix it when I touch the CLI.

bin/peggy.mjs Outdated Show resolved Hide resolved
@@ -65,6 +65,7 @@
"rimraf": "^3.0.2",
"rollup": "^2.56.2",
"sinon": "^11.1.2",
"source-map": "^0.7.3",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be "^0.8.0-beta.0", which includes a fix so that rollup doesn't pull in fs and path. Then there's a change needed to rollup.config.js, add json(), to line 31 so that the mapping table included by that version of source-map will get pulled in correctly..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...and then there are a couple of other rollup issues behind that which I'm not sure why we haven't seen before. I'm going to put those aside for the moment and review the rest of this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Actually, I doubted should I depend on the beta version of the project that not seems to be actively developed for 2 years for now. Even now, npm offers to install version 0.6.1 instead of the last available version (although now I realized that this is due to the fact that such version is specified in package-lock.js at the top level)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should rethink our dependency on source-map. All we need is the generator, and I think it's the consumer that depends on tr46, etc, which adds the huge tables. I wonder if we could make https://github.com/a-la/source-map-generator work. It would need at least one patch to not use node's url, and rely on the built-in URL instead, but we can send them a PR or fork if needed. I think this would make rollup a lot easier and reduce the size of the generated code by a lot.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm working up that PR for source-map-generator, just in case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't recall the specific issue I had with rollup, but I'll check out your current head to see if it shows up again.

The way I solved the test issue in source-map-generator was to use the original source-map as a dev dependency. I think that's what you said you tried. Anyway, I'll take a look here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All our dependencies are dev dependencies, so yes, I tried have both and right now it seems that having https://github.com/hildjj/source-map-generator is just redundant

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I almost have it working here, I think. Give me another hour or so, and I'll send a PR to your repo.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See Mingun#1

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes merged

@@ -31,6 +31,7 @@ Follow these steps to upgrade:
— more powerful than traditional LL(_k_) and LR(_k_) parsers
- Usable [from your browser](https://peggyjs.org/online), from the command line,
or via JavaScript API
- [Source map](https://developer.mozilla.org/en-US/docs/Tools/Debugger/How_to/Use_a_source_map) support
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This worked when I added a //# sourceMappingURL= comment to the end of the generated .js file, which I think should be done automatically.

This pointed out that the filenames in the .map file might need to be tweaked; from the project root, try ./bin/peggy.js examples/fizzbuzz.peggy --source-map. The map file includes "sources":["examples/fizzbuzz.peggy"] which should probably instead be `"sources":["fizzbuzz.peggy"]". The filename should be relative to the .map file.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that automatic appending of this line will be desired behavior in each case, mostly because I don't known how people will use source-map feature. Source maps could be hosted not in the same directory where it was generated and only the user know the right place. Adding this line is a trivial task, so I leaved it out of scope of my PR. We can firstly gather the feedback and add this line later if users will complain about it. Or, if you have a clear vision, I can add it now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the line should be auto-appended in the CLI only. We can add an option for inlining it later.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The filename should be relative to the .map file

Done.

I think the line should be auto-appended in the CLI only. We can add an option for inlining it later.

I'll leave that for the next PRs. There a many options to forming such URL. For example, someone can want to form a data URL embedding generated source map in it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is going to take a few more PRs before we're ready to release, and I don't want to docs to be wrong for that long a time, so I'm going to create a "1.3" branch that we can merge this into, and start to merge other work into as well.

After this lands in the 1.3 branch, and we add in #199, we should have a better way to talk about how to test whether the maps are working as expected. Make sense @Mingun?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think this is a good plan

bin/peggy.mjs Show resolved Hide resolved
@@ -93,10 +94,10 @@ const compiler = {

switch (options.output) {
case "parser":
return eval(ast.code);
return eval(ast.code.toString());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this is going to break ts-pegjs? I guess .toString is safe to add to a string, but I wonder if there's anything else that will break.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you worried that ts-pegjs might have replaced the code with its object? Yes, it should be upgraded to correctly handle the new sourceMap option (or new "output-with-map" output variant)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's get @pjmolina 's opinion early. We can move several of the functions in generate-js into utils to make them easier to reuse, if that would help.

lib/peg.d.ts Show resolved Hide resolved
Copy link
Member Author

@Mingun Mingun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try to address review comments this week.

@@ -31,6 +31,7 @@ Follow these steps to upgrade:
— more powerful than traditional LL(_k_) and LR(_k_) parsers
- Usable [from your browser](https://peggyjs.org/online), from the command line,
or via JavaScript API
- [Source map](https://developer.mozilla.org/en-US/docs/Tools/Debugger/How_to/Use_a_source_map) support
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that automatic appending of this line will be desired behavior in each case, mostly because I don't known how people will use source-map feature. Source maps could be hosted not in the same directory where it was generated and only the user know the right place. Adding this line is a trivial task, so I leaved it out of scope of my PR. We can firstly gather the feedback and add this line later if users will complain about it. Or, if you have a clear vision, I can add it now.

bin/peggy.mjs Outdated Show resolved Hide resolved
bin/peggy.mjs Show resolved Hide resolved
@@ -93,10 +94,10 @@ const compiler = {

switch (options.output) {
case "parser":
return eval(ast.code);
return eval(ast.code.toString());
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you worried that ts-pegjs might have replaced the code with its object? Yes, it should be upgraded to correctly handle the new sourceMap option (or new "output-with-map" output variant)

return toSourceNode(node.code, node.codeLocation, "$" + node.type);
}

return node.code;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you expand your thought? If you think about something like ts-pegjs which originally will have a TypeScript code here, it should already translate it to the JS, but I suspect it have their own generate-ts pass. Or you talk about extracting this function into utils?

lib/peg.d.ts Show resolved Hide resolved
@@ -65,6 +65,7 @@
"rimraf": "^3.0.2",
"rollup": "^2.56.2",
"sinon": "^11.1.2",
"source-map": "^0.7.3",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Actually, I doubted should I depend on the beta version of the project that not seems to be actively developed for 2 years for now. Even now, npm offers to install version 0.6.1 instead of the last available version (although now I realized that this is due to the fact that such version is specified in package-lock.js at the top level)

@hildjj
Copy link
Contributor

hildjj commented Oct 4, 2021

Where do we stand on this? I've talked to the folks a Moz, and we're figuring out a better way to move forward in the long run. In the short run, it's ok to depend on https://github.com/hildjj/source-map-generator

@Mingun
Copy link
Member Author

Mingun commented Oct 5, 2021

I'll return to this this weekend. Need to run coverage and see where I can improve it, as you noticed.

Ok, I'll switch to https://github.com/hildjj/source-map-generator.

@Mingun
Copy link
Member Author

Mingun commented Oct 17, 2021

Latest changes:

  • included package-lock.js update in the first commit
  • added ef9f014:
    • check exit code of the CLI in the tests
    • exit with code 2 if error from the input test
    • do not interrupt writing of the parser code to file if the tested input raises grammar error
  • e47b5f4
    • increase coverage
    • relativise paths in the source map
    • handle --test/--test-file interaction
    • add examples to documentation
  • fe39cbf -- cover ast2SourceNode as requested

Do not use https://github.com/hildjj/source-map-generator, because I need SourceMapConsumer in tests

@hildjj
Copy link
Contributor

hildjj commented Oct 18, 2021

Please move base branch to 1.3 and I'll merge.

@Mingun Mingun changed the base branch from main to 1.3 October 18, 2021 20:20
@Mingun
Copy link
Member Author

Mingun commented Oct 18, 2021

Done

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Grammar action errors need to be more informative (inherited from PEGjs)
4 participants