Attach comments #81

mamaniv · 2020-05-26T14:32:23Z

I'm trying to use attachComments functionality of escodegen, which gets tree,comments,tokens as arguments.

Unfortunately, it doesn't preserve the comments using Meriayh and it looks like the tokens types and range are different than what escodegen expect to have.

I tried the following scenario:

`

const sourceCode = `
  //Comment1
  const test1 = 1;
  //Comment2
  const test2 = 2;
`;
let tokens = [];
let comments = [];
let tree = meriyah.parse(sourceCode, {
  ranges: true,
  globalReturn: true,
  onToken: function (type, start, end) {
    let value = this.tokenValue ? this.tokenValue.toString() : '';
    tokens.push({type: type, value: value, range: [start, end]});
  },
  onComment: function (type, value, start, end) {
    type = type === 'SingleLine' ? 'Line' : type;
    type = type === 'MultiLine' ? 'Block' : type;
    comments.push({type: type, value: value, range: [start, end]});
  }
});
replaceStartEndToRange(tree);

tree = escodegen.attachComments(tree, comments, tokens);
const res = escodegen.generate(tree, {
  sourceCode: sourceCode,
  format: {
    compact: false,
    preserveBlankLines: true
  },
  comment: true
});
expect(res).to.be.equal(sourceCode);

`

sourceCode:
//Comment1
const test1 = 1;
//Comment2
const test2 = 2;

res:
const test1 = 1;

 const test2 = 2;

As you can see, the comments are missing.

I use
"meriyah": "1.9.12"
"escodegen": "1.14.1"

BTW, it works with Esprima.

Thanks,
Niv

The text was updated successfully, but these errors were encountered:

KFlash · 2020-05-26T14:44:05Z

Could you look at the code and try to figure out where it goes wrong? What's the differences vs Esprima as you mentioned?
This project was done by some students of mine, and I'm knee deep involved in a private project so little time atm to fix this.

@3cp do you have any idea what's going wrong here?

mamaniv · 2020-05-26T14:50:31Z

I'm not sure I fully understand the logic, but I can attach the tokens I get from Esprima and what I get in Meriyah.

BTW, it also get the wrong Punctuator values (I used a workaround to fixed that, but it actually broken too).


Meriyah:
[
  {
    "type": "Keyword",
    "value": "const",
    "range": [
      0,
      16
    ]
  },
  {
    "type": "Identifier",
    "value": "test1",
    "range": [
      16,
      22
    ]
  },
  {
    "type": "Punctuator",
    "value": "=",
    "range": [
      22,
      24
    ]
  },
  {
    "type": "Numeric",
    "value": "1",
    "range": [
      24,
      26
    ]
  },
  {
    "type": "Punctuator",
    "value": ";",
    "range": [
      26,
      27
    ]
  },
  {
    "type": "Keyword",
    "value": "const",
    "range": [
      27,
      44
    ]
  },
  {
    "type": "Identifier",
    "value": "test2",
    "range": [
      44,
      50
    ]
  },
  {
    "type": "Punctuator",
    "value": "=",
    "range": [
      50,
      52
    ]
  },
  {
    "type": "Numeric",
    "value": "2",
    "range": [
      52,
      54
    ]
  },
  {
    "type": "Punctuator",
    "value": ";",
    "range": [
      54,
      55
    ]
  }
]

Esprima:
[
  {
    "type": "Keyword",
    "value": "const",
    "range": [
      11,
      16
    ]
  },
  {
    "type": "Identifier",
    "value": "test1",
    "range": [
      17,
      22
    ]
  },
  {
    "type": "Punctuator",
    "value": "=",
    "range": [
      23,
      24
    ]
  },
  {
    "type": "Numeric",
    "value": "1",
    "range": [
      25,
      26
    ]
  },
  {
    "type": "Punctuator",
    "value": ";",
    "range": [
      26,
      27
    ]
  },
  {
    "type": "Keyword",
    "value": "const",
    "range": [
      39,
      44
    ]
  },
  {
    "type": "Identifier",
    "value": "test2",
    "range": [
      45,
      50
    ]
  },
  {
    "type": "Punctuator",
    "value": "=",
    "range": [
      51,
      52
    ]
  },
  {
    "type": "Numeric",
    "value": "2",
    "range": [
      53,
      54
    ]
  },
  {
    "type": "Punctuator",
    "value": ";",
    "range": [
      54,
      55
    ]
  }
]

KFlash · 2020-05-26T15:12:28Z

I see the range for ident is out of range?

mamaniv · 2020-05-26T16:47:49Z

It is just different than it should be and that's why escodegen can't generate comments.
If I change, for example, the first block from:
{
"type": "Keyword",
"value": "const",
"range": [
0,
16
]
},

to

{
"type": "Keyword",
"value": "const",
"range": [
11,
16
]
},

we'll successfully get the first comment.

mamaniv · 2020-05-26T16:49:46Z

I also had to convert "start" and "end" properties into range:[start,end], for each body objects, to be able to generate the code (even without comments)

3cp · 2020-05-26T17:05:19Z

Both start, end, and range[] are not estree standard. There was some argument in estree spec repo but no action was taken so far.
But range[] is definitely the more popular one.
Some parsers supply both properties, and I think meriyah should do the same to improve compatibility with other tools. I can make a PR for that. (I personally need that too because eslint-scope uses range[])

As for skipped comments, I think I saw the behavior too but it doesn't affect my use cases. I will see if I can find anything.

mamaniv · 2020-05-26T17:14:34Z

Thank you! Looking forward for that.
If you need more information, let me know.

KFlash · 2020-05-27T00:01:23Z

Use of range[] kills performance, but the loc tracking itself does it too. It's better to use an optional linear incrementing id to store 4 * 4 bytes (32bit for start, stop, col, line) per
node as a blob of binary numbers. But that's out of scope of ESTree :)

3cp · 2020-05-27T00:05:28Z

Use of range[] kills performance

I guess that's the effect of an array.

@KFlash I am not sure do you want to add range[] or not? Maybe another option to add this compatible range[] thing?

KFlash · 2020-05-27T00:09:42Z

A 'range[]' property is fine, and makes Meriyah more compatible with other parsers.
In ESTree world there are no other options, so it's fine :)

3cp · 2020-05-27T04:47:49Z

The comments are not missing, but the token after comment has wrong start.

const meriyah = require('meriyah');
const sourceCode = `
  // Comment1
  const a = 1;
  // Comment2
  const b = 2;
`;
let tokens = [];
let comments = [];
let tree = meriyah.parse(sourceCode, {
  ranges: true,
  globalReturn: true,
  onToken: function (type, start, end) {
    console.log('onToken', arguments);
    let value = this.tokenValue ? this.tokenValue.toString() : '';
    tokens.push({type: type, value: value, range: [start, end]});
  },
  onComment: function (type, value, start, end) {
    console.log('onComment', arguments);
    type = type === 'SingleLine' ? 'Line' : type;
    type = type === 'MultiLine' ? 'Block' : type;
    comments.push({type: type, value: value, range: [start, end]});
  }
});

console.log(tree);

Output:

onComment [Arguments] { '0': 'SingleLine', '1': ' Comment1\n', '2': 5, '3': 15 }
onToken [Arguments] { '0': 'Keyword', '1': 0, '2': 22 } // should be from 17 to 22
onToken [Arguments] { '0': 'Identifier', '1': 22, '2': 24 }
onToken [Arguments] { '0': 'Punctuator', '1': 24, '2': 26 }
onToken [Arguments] { '0': 'NumericLiteral', '1': 26, '2': 28 }
onToken [Arguments] { '0': 'Punctuator', '1': 28, '2': 29 }
onComment [Arguments] { '0': 'SingleLine', '1': ' Comment2\n', '2': 34, '3': 44 }
onToken [Arguments] { '0': 'Keyword', '1': 29, '2': 51 } // should be from 46 to 51
onToken [Arguments] { '0': 'Identifier', '1': 51, '2': 53 }
onToken [Arguments] { '0': 'Punctuator', '1': 53, '2': 55 }
onToken [Arguments] { '0': 'NumericLiteral', '1': 55, '2': 57 }
onToken [Arguments] { '0': 'Punctuator', '1': 57, '2': 58 }

3cp · 2020-05-27T05:24:50Z

The two PRs hopefully can fix your use case.

3cp · 2020-05-27T06:47:44Z

The #83 removed leading white space before token, so there are gaps between tokens. If this affects your reconstruction of the code, you might need to study the behaviour of other parsers.

I never tried escodegen. For now, you might need to fill up the gap with synthesised token.

KFlash · 2020-05-27T06:49:58Z

Released v. 1.9.14.

mamaniv · 2020-05-27T09:12:02Z

Works now.
Thank you!

3cp mentioned this issue May 27, 2020

feat(parser): add .range: [start, end] to improve compatibility with other ESTree tools #82

Merged

3cp mentioned this issue May 27, 2020

fix(scan): token start should skip leading white spaces and comments #83

Merged

KFlash closed this as completed in #83 May 27, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Attach comments #81

Attach comments #81

mamaniv commented May 26, 2020

KFlash commented May 26, 2020 •

edited

mamaniv commented May 26, 2020

KFlash commented May 26, 2020

mamaniv commented May 26, 2020

mamaniv commented May 26, 2020

3cp commented May 26, 2020 •

edited

mamaniv commented May 26, 2020

KFlash commented May 27, 2020 •

edited

3cp commented May 27, 2020 •

edited

KFlash commented May 27, 2020

3cp commented May 27, 2020

3cp commented May 27, 2020

3cp commented May 27, 2020

KFlash commented May 27, 2020

mamaniv commented May 27, 2020

Attach comments #81

Attach comments #81

Comments

mamaniv commented May 26, 2020

KFlash commented May 26, 2020 • edited

mamaniv commented May 26, 2020

KFlash commented May 26, 2020

mamaniv commented May 26, 2020

mamaniv commented May 26, 2020

3cp commented May 26, 2020 • edited

mamaniv commented May 26, 2020

KFlash commented May 27, 2020 • edited

3cp commented May 27, 2020 • edited

KFlash commented May 27, 2020

3cp commented May 27, 2020

3cp commented May 27, 2020

3cp commented May 27, 2020

KFlash commented May 27, 2020

mamaniv commented May 27, 2020

KFlash commented May 26, 2020 •

edited

3cp commented May 26, 2020 •

edited

KFlash commented May 27, 2020 •

edited

3cp commented May 27, 2020 •

edited