Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attach comments #81

Closed
mamaniv opened this issue May 26, 2020 · 15 comments · Fixed by #83
Closed

Attach comments #81

mamaniv opened this issue May 26, 2020 · 15 comments · Fixed by #83

Comments

@mamaniv
Copy link

mamaniv commented May 26, 2020

I'm trying to use attachComments functionality of escodegen, which gets tree,comments,tokens as arguments.

Unfortunately, it doesn't preserve the comments using Meriayh and it looks like the tokens types and range are different than what escodegen expect to have.

I tried the following scenario:

`

const sourceCode = `
  //Comment1
  const test1 = 1;
  //Comment2
  const test2 = 2;
`;
let tokens = [];
let comments = [];
let tree = meriyah.parse(sourceCode, {
  ranges: true,
  globalReturn: true,
  onToken: function (type, start, end) {
    let value = this.tokenValue ? this.tokenValue.toString() : '';
    tokens.push({type: type, value: value, range: [start, end]});
  },
  onComment: function (type, value, start, end) {
    type = type === 'SingleLine' ? 'Line' : type;
    type = type === 'MultiLine' ? 'Block' : type;
    comments.push({type: type, value: value, range: [start, end]});
  }
});
replaceStartEndToRange(tree);

tree = escodegen.attachComments(tree, comments, tokens);
const res = escodegen.generate(tree, {
  sourceCode: sourceCode,
  format: {
    compact: false,
    preserveBlankLines: true
  },
  comment: true
});
expect(res).to.be.equal(sourceCode);

`

sourceCode:
//Comment1
const test1 = 1;
//Comment2
const test2 = 2;

res:
const test1 = 1;

 const test2 = 2;

As you can see, the comments are missing.

I use
"meriyah": "1.9.12"
"escodegen": "1.14.1"

BTW, it works with Esprima.

Thanks,
Niv

@KFlash
Copy link
Contributor

KFlash commented May 26, 2020

Could you look at the code and try to figure out where it goes wrong? What's the differences vs Esprima as you mentioned?
This project was done by some students of mine, and I'm knee deep involved in a private project so little time atm to fix this.

@3cp do you have any idea what's going wrong here?

@mamaniv
Copy link
Author

mamaniv commented May 26, 2020

I'm not sure I fully understand the logic, but I can attach the tokens I get from Esprima and what I get in Meriyah.

BTW, it also get the wrong Punctuator values (I used a workaround to fixed that, but it actually broken too).


Meriyah:
[
  {
    "type": "Keyword",
    "value": "const",
    "range": [
      0,
      16
    ]
  },
  {
    "type": "Identifier",
    "value": "test1",
    "range": [
      16,
      22
    ]
  },
  {
    "type": "Punctuator",
    "value": "=",
    "range": [
      22,
      24
    ]
  },
  {
    "type": "Numeric",
    "value": "1",
    "range": [
      24,
      26
    ]
  },
  {
    "type": "Punctuator",
    "value": ";",
    "range": [
      26,
      27
    ]
  },
  {
    "type": "Keyword",
    "value": "const",
    "range": [
      27,
      44
    ]
  },
  {
    "type": "Identifier",
    "value": "test2",
    "range": [
      44,
      50
    ]
  },
  {
    "type": "Punctuator",
    "value": "=",
    "range": [
      50,
      52
    ]
  },
  {
    "type": "Numeric",
    "value": "2",
    "range": [
      52,
      54
    ]
  },
  {
    "type": "Punctuator",
    "value": ";",
    "range": [
      54,
      55
    ]
  }
]

Esprima:
[
  {
    "type": "Keyword",
    "value": "const",
    "range": [
      11,
      16
    ]
  },
  {
    "type": "Identifier",
    "value": "test1",
    "range": [
      17,
      22
    ]
  },
  {
    "type": "Punctuator",
    "value": "=",
    "range": [
      23,
      24
    ]
  },
  {
    "type": "Numeric",
    "value": "1",
    "range": [
      25,
      26
    ]
  },
  {
    "type": "Punctuator",
    "value": ";",
    "range": [
      26,
      27
    ]
  },
  {
    "type": "Keyword",
    "value": "const",
    "range": [
      39,
      44
    ]
  },
  {
    "type": "Identifier",
    "value": "test2",
    "range": [
      45,
      50
    ]
  },
  {
    "type": "Punctuator",
    "value": "=",
    "range": [
      51,
      52
    ]
  },
  {
    "type": "Numeric",
    "value": "2",
    "range": [
      53,
      54
    ]
  },
  {
    "type": "Punctuator",
    "value": ";",
    "range": [
      54,
      55
    ]
  }
]

@KFlash
Copy link
Contributor

KFlash commented May 26, 2020

I see the range for ident is out of range?

@mamaniv
Copy link
Author

mamaniv commented May 26, 2020

It is just different than it should be and that's why escodegen can't generate comments.
If I change, for example, the first block from:
{
"type": "Keyword",
"value": "const",
"range": [
0,
16
]
},

to

{
"type": "Keyword",
"value": "const",
"range": [
11,
16
]
},

we'll successfully get the first comment.

@mamaniv
Copy link
Author

mamaniv commented May 26, 2020

I also had to convert "start" and "end" properties into range:[start,end], for each body objects, to be able to generate the code (even without comments)

@3cp
Copy link
Member

3cp commented May 26, 2020

Both start, end, and range[] are not estree standard. There was some argument in estree spec repo but no action was taken so far.
But range[] is definitely the more popular one.
Some parsers supply both properties, and I think meriyah should do the same to improve compatibility with other tools. I can make a PR for that. (I personally need that too because eslint-scope uses range[])

As for skipped comments, I think I saw the behavior too but it doesn't affect my use cases. I will see if I can find anything.

@mamaniv
Copy link
Author

mamaniv commented May 26, 2020

Thank you! Looking forward for that.
If you need more information, let me know.

@KFlash
Copy link
Contributor

KFlash commented May 27, 2020

Use of range[] kills performance, but the loc tracking itself does it too. It's better to use an optional linear incrementing id to store 4 * 4 bytes (32bit for start, stop, col, line) per
node as a blob of binary numbers. But that's out of scope of ESTree :)

@3cp
Copy link
Member

3cp commented May 27, 2020

Use of range[] kills performance

I guess that's the effect of an array.

@KFlash I am not sure do you want to add range[] or not? Maybe another option to add this compatible range[] thing?

@KFlash
Copy link
Contributor

KFlash commented May 27, 2020

A 'range[]' property is fine, and makes Meriyah more compatible with other parsers.
In ESTree world there are no other options, so it's fine :)

@3cp
Copy link
Member

3cp commented May 27, 2020

The comments are not missing, but the token after comment has wrong start.

const meriyah = require('meriyah');
const sourceCode = `
  // Comment1
  const a = 1;
  // Comment2
  const b = 2;
`;
let tokens = [];
let comments = [];
let tree = meriyah.parse(sourceCode, {
  ranges: true,
  globalReturn: true,
  onToken: function (type, start, end) {
    console.log('onToken', arguments);
    let value = this.tokenValue ? this.tokenValue.toString() : '';
    tokens.push({type: type, value: value, range: [start, end]});
  },
  onComment: function (type, value, start, end) {
    console.log('onComment', arguments);
    type = type === 'SingleLine' ? 'Line' : type;
    type = type === 'MultiLine' ? 'Block' : type;
    comments.push({type: type, value: value, range: [start, end]});
  }
});

console.log(tree);

Output:

onComment [Arguments] { '0': 'SingleLine', '1': ' Comment1\n', '2': 5, '3': 15 }
onToken [Arguments] { '0': 'Keyword', '1': 0, '2': 22 } // should be from 17 to 22
onToken [Arguments] { '0': 'Identifier', '1': 22, '2': 24 }
onToken [Arguments] { '0': 'Punctuator', '1': 24, '2': 26 }
onToken [Arguments] { '0': 'NumericLiteral', '1': 26, '2': 28 }
onToken [Arguments] { '0': 'Punctuator', '1': 28, '2': 29 }
onComment [Arguments] { '0': 'SingleLine', '1': ' Comment2\n', '2': 34, '3': 44 }
onToken [Arguments] { '0': 'Keyword', '1': 29, '2': 51 } // should be from 46 to 51
onToken [Arguments] { '0': 'Identifier', '1': 51, '2': 53 }
onToken [Arguments] { '0': 'Punctuator', '1': 53, '2': 55 }
onToken [Arguments] { '0': 'NumericLiteral', '1': 55, '2': 57 }
onToken [Arguments] { '0': 'Punctuator', '1': 57, '2': 58 }

@3cp
Copy link
Member

3cp commented May 27, 2020

The two PRs hopefully can fix your use case.

@3cp
Copy link
Member

3cp commented May 27, 2020

The #83 removed leading white space before token, so there are gaps between tokens. If this affects your reconstruction of the code, you might need to study the behaviour of other parsers.

I never tried escodegen. For now, you might need to fill up the gap with synthesised token.

@KFlash
Copy link
Contributor

KFlash commented May 27, 2020

Released v. 1.9.14.

@mamaniv
Copy link
Author

mamaniv commented May 27, 2020

Works now.
Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants