Skip to content
This repository has been archived by the owner on Aug 31, 2023. It is now read-only.

Implement reference to non-existent group regex rule #197

Merged
merged 8 commits into from
Mar 27, 2020

Conversation

diokey
Copy link
Contributor

@diokey diokey commented Mar 25, 2020

This PR adds a new lint rule that catches references to a non-existent capture group.

To achieve this, I had to add a new AST node RegExpNumericBackReference and take into account octal escapes while parsing.

The backreference will match the group capture if it exists, otherwise, it will be treated as an octal escape char.

For example

    String.raw`let foo = /([abc]+)=\1/;foo;`, // matches first capture group
    String.raw`let foo = /([abc]+)=\2/;foo;`, // matches \2 escaped
    String.raw`let foo = /([abc]+)=\8/;foo;`, // matches '8'
    String.raw`let foo = /([abc]+)=\9/;foo;`, // matches '9'
    String.raw`let foo = /([abc]+)=\119/;foo;`, // matches '\t' followd by '9'
    String.raw`let foo = /([abc]+)=\338/;foo;`, // matches \33 (char code 255) followed by '8'
    String.raw`let foo = /([abc]+)=\377/;foo;`, // matches \377 (char code 255)
    String.raw`let foo = /([abc]+)=\777/;foo;`, // matches \77  (char code 63) followed by '7'

The PR catches cases where the backreference is not an octal escape and does not refer to any existing group capture.

    String.raw`let foo = /([abc]+)=\18/;foo;`, 
    String.raw`let foo = /([abc]+)=\49/;foo;`,
    String.raw`let foo = /([abc]+)=\78/;foo;`,
    String.raw`let foo = /([abc]+)=\99/;foo;`,
    String.raw`let foo = /(([abc])\19)+=\28/;foo;`,

image

I am no expert in either compilers or regex, so let me know if this is not the right approach. I used https://www.regular-expressions.info/backref.html and tested those example on https://regexr.com/

Thank you

@@ -214,7 +226,7 @@ export const createRegExpParser = createParser((ParserCore) =>

case 'x':
{
const possibleHex = input.slice(get0(index) + 1, 2);
const possibleHex = input.slice(get0(index) + 1, get0(index) + 2);
Copy link
Contributor Author

@diokey diokey Mar 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sebmck this is not related, but it looks like a wrong index. The condition below is always false. Maybe it should be get0(index) + 2

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove that change, but wanted to bring it to your attention

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha yeah it should be. Feel free to leave that in change in! Thanks for flagging.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, will do

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 536 to 556
if (token.type === 'NumericBackReferenceCharacter') {
this.nextToken();
const value = token.value;
// \8 \9 are treated as escape char
if (value === 8 || value === 9) {
return {
type: 'RegExpCharacter',
value: String(value),
loc: this.finishLocFromToken(token),
};
}

// octal escapes
if (isOct(String(value))) {
const octal = parseInt(String(value), 8);
return {
type: 'RegExpCharacter',
value: String.fromCharCode(octal),
loc: this.finishLocFromToken(token),
};
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way we could put this into tokenize instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will do

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@diokey
Copy link
Contributor Author

diokey commented Mar 26, 2020

@sebmck the CI is failing, but it doesn't look like it's related to this PR. Any ideas?

@Kelbie Kelbie mentioned this pull request Mar 26, 2020
23 tasks
@sebmck
Copy link
Contributor

sebmck commented Mar 27, 2020

This is awesome! Thank you so much!

@sebmck sebmck merged commit 2ae53bd into rome:master Mar 27, 2020
@diokey diokey mentioned this pull request Mar 29, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants