Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

prefer-regex-literals suggestions #15029

Closed
Yash-Singh1 opened this issue Sep 4, 2021 · 28 comments · Fixed by #15077
Closed

prefer-regex-literals suggestions #15029

Yash-Singh1 opened this issue Sep 4, 2021 · 28 comments · Fixed by #15077
Assignees
Labels
accepted There is consensus among the team that this change meets the criteria for inclusion archived due to age This issue has been archived; please open a new issue for any further discussion enhancement This change enhances an existing feature of ESLint rule Relates to ESLint's core rules
Projects

Comments

@Yash-Singh1
Copy link
Contributor

Yash-Singh1 commented Sep 4, 2021

What rule do you want to change?

prefer-regex-literals

Does this change cause the rule to produce more or fewer warnings?

No

How will the change be implemented? (New option, new default behavior, etc.)?

Adds fixer

Please provide some example code that this change will affect:

new RegExp('\\.')

What does the rule currently do for this code?

Reports error

What will the rule do after it's changed?

Fix it to:

/\./
@Yash-Singh1 Yash-Singh1 added enhancement This change enhances an existing feature of ESLint rule Relates to ESLint's core rules triage An ESLint team member will look at this issue soon labels Sep 4, 2021
@eslint-github-bot eslint-github-bot bot added this to Needs Triage in Triage Sep 4, 2021
@nzakas
Copy link
Member

nzakas commented Sep 8, 2021

Seems possible to do.

Are you willing to submit a pull request?

@nzakas nzakas removed the triage An ESLint team member will look at this issue soon label Sep 8, 2021
@nzakas nzakas moved this from Needs Triage to Feedback Needed in Triage Sep 8, 2021
@Yash-Singh1
Copy link
Contributor Author

Sure, I will give this a shot.

@mdjermanovic
Copy link
Member

It would be a useful enhancement.

@Yash-Singh1 any ideas on how this could be implemented?

We generally suggest that contributors wait for the issue to be accepted before working on it, but in this case I think it would be good to evaluate this enhancement on a prototype of the solution.

This is a nontrivial code transformation that should be aware of all differences between the syntax of regular expression literals and the syntax of RegExp constructor input patterns.

For example:

new RegExp("\n");

It's a valid RegExp input pattern consisting of a single code point for \n, but in regular expression literals it would be invalid:

/
/

One of the possible solutions is to use the built-in RegExp.prototype.toString. By the specification, it should be safe:

The returned String has the form of a RegularExpressionLiteral that evaluates to another RegExp object with the same behaviour as this object.

[...new RegExp("\n").toString()]; // ['/', '\', 'n', '/']

Some downsides of this solution would be:

  • It depends on the execution environment - if the environment doesn't support syntax used in the given regular expression, we can't transform.
  • It doesn't preserve syntactic choices from the original string literal:
[...new RegExp("\t").toString()]; // ['/', '\t', '/'] - this would be /    / in the source code, valid but unusual

[...new RegExp("\u1234").toString()]; // ['/', 'ሴ', '/']

@nzakas
Copy link
Member

nzakas commented Sep 8, 2021

I think we can simplify this proposal such that we eliminate any regular expressions that would be problematic. Maybe to start we just say anything with \t, \u, \n, etc. won't be autofixed?

@lydell
Copy link

lydell commented Sep 8, 2021

/ also sometimes needs to be escaped.

Another thing: For example RegExp("+") would throw a runtime error. Converting that to /+/ would now be a syntax error. Is it ok for an autofix to introduce syntax errors?

Similarly, RegExp("*") -> /*/ would start a block comment, which – in the worst case – could be closed by some other */ sequence and still produce “valid” code. Confusing!

So the autofix might need to know regex syntax. Just thinking out loud.

We also need to be careful so that 5/RegExp("a") isn’t converted to 5//a/. That’s a single-line comment now.

@nzakas
Copy link
Member

nzakas commented Sep 9, 2021

@lydell good points!

Given all the edge cases we’ve identified, so we feel like it’s worth autofixing at this point?

@Yash-Singh1
Copy link
Contributor Author

Yash-Singh1 commented Sep 9, 2021

Sorry for the late reply, I was working on the changes and doing a bit of research on how to implement this. So far I got the basic functionality written, I just need to work on the edge cases.

So the autofix might need to know regex syntax.

Yes, there should be some form of validation for the regular expression. Since ESLint already depends on regexpp and it seems to have a validator function. There is an option on the ecmaVersion:

https://github.com/mysticatea/regexpp/blob/master/src/validator.ts#L137

And I think we can pass parserOptions.ecmaVersion to the function and validate the regular expression string.

We also need to be careful so that 5/RegExp("a") isn’t converted to 5//a/. That’s a single-line comment now.

Yes, that will ensure checking for division and not autofixing when there is division. But should your example: 5/RegExp("abc") autofix to 5/RegExp(/abc/) or 5/(/abc/)?

@Yash-Singh1 Yash-Singh1 changed the title prefer-regex-literal autofix prefer-regex-literals autofix Sep 9, 2021
@lydell
Copy link

lydell commented Sep 9, 2021

But should your example: 5/RegExp("abc") autofix to 5/RegExp(/a/) or (/a/)?

Or 5/ /a/ (space)

@ljharb
Copy link
Sponsor Contributor

ljharb commented Sep 9, 2021

Hopefully it'd autofix to 5/(/abc/), since /a/ doesn't match the original pattern.

@lydell
Copy link

lydell commented Sep 9, 2021

That was an obvious typo to me. Of course the discussion is about which solution to use for the “division plus regex should not become a singleline comment” edge case.

@ljharb
Copy link
Sponsor Contributor

ljharb commented Sep 9, 2021

Right - but what would be the point of converting RegExp('abc') to RegExp(/abc/)? The RegExp constructor is redundant there. I think it'd convert to /abc/, and only when necessary, would be parenthesized (like when it's the divisor, or when the next line has an identifier that might be interpreted as flags, or something)

@mdjermanovic
Copy link
Member

We can make a list of safe preceding tokens, e.g., = and ( will cover most common cases. We could add to the list other tokens for which we are sure that scanning cannot interpret our regex literal as something else. In all other cases we can wrap the literal in (), it doesn't have to be perfect.

As for the right side, I think the only problem is adjacent words like in. This can be handled by adding a space.

@mdjermanovic
Copy link
Member

I think this feature should be suggestion, rather than autofix. Often there are multiple valid patterns with the same behavior, so if user doesn't like the suggested one, they can opt to not apply it. The suggested fix should still produce valid and equivalent code, but it doesn't necessarily have to be the preferred variant, since we might not always be able to figure out which one is it.

Here's a babel repl with an example of this transformation (babel-plugin-transform-regexp-constructors). /ሴ/ is a valid replacement for RegExp("\u1234"), but probably not the preferred one.

Also, writing regex patterns in string literals is generally error-prone. For example, RegExp("a\.b") is a common mistake reported by no-useless-escape. If we autofix this to /a.b/, the error will be hidden.

@Yash-Singh1
Copy link
Contributor Author

Yash-Singh1 commented Sep 9, 2021

A few more edge cases: /abc/*something would start a block comment. /abc//something would start a single-line comment.

@mdjermanovic
Copy link
Member

A few more edge cases: /abc/*something would start a block comment. /abc//something would start a single-line comment.

I think this can happen only if the leftmost / was already interpreted as division, which depends only on the code before it, so the adjacent *something or /something on the right side are not causing the problem.

@mdjermanovic
Copy link
Member

I'm not too concerned about the part where we are inserting /foo/ in the place of RegExp("foo") in the source code. We can make a list of tokens that cannot be at the end of expressions (that's where / would be seen as division) and check if the token before RegExp("foo") is in the list. If it is, then it's safe to insert /foo/ without parentheses, and astUtils.canTokensBeAdjacent can tell us if it needs spaces on both ends. If it isn't in the list, we'll insert (/foo/). This check doesn't have to be exhaustive (I believe only =, ( and , would cover the majority of cases where regular expressions appear in a source code), the worst case is that we'll sometimes insert a pair of parentheses that aren't really necessary.

I'm more concerned about how to get /foo/ from RegExp("foo"). We have a string value of the pattern and a raw string literal, but neither of them can be used directly as the content between /.../

@mdjermanovic
Copy link
Member

Another thing: For example RegExp("+") would throw a runtime error. Converting that to /+/ would now be a syntax error. Is it ok for an autofix to introduce syntax errors?

That wouldn't be ok, RegExp("+") can be in a dead code or wrapped in try-catch, while /+/ breaks parsing and thus the whole script (or at least it should by the spec, in V8 it's a runtime error). We should only fix valid regular expressions.

@Yash-Singh1
Copy link
Contributor Author

Should RegExp(1) autofix to /1/? Currently, the prefer-regex-literals rule only detects strings.

@ljharb
Copy link
Sponsor Contributor

ljharb commented Sep 12, 2021

Yes, I’d think so.

@Yash-Singh1
Copy link
Contributor Author

I'm not too concerned about the part where we are inserting /foo/ in the place of RegExp("foo") in the source code. We can make a list of tokens that cannot be at the end of expressions (that's where / would be seen as division) and check if the token before RegExp("foo") is in the list. If it is, then it's safe to insert /foo/ without parentheses
...
If it isn't in the list, we'll insert (/foo/). This check doesn't have to be exhaustive (I believe only =, ( and , would cover the majority of cases where regular expressions appear in a source code), the worst case is that we'll sometimes insert a pair of parentheses that aren't really necessary.

I am just curious, instead of whitelisting tokens, why can't we disallow the / token instead, because that would cover more edge case tokens.

, and astUtils.canTokensBeAdjacent can tell us if it needs spaces on both ends.

Can you help me with an example where we would require space on both ends?

@mdjermanovic
Copy link
Member

Should RegExp(1) autofix to /1/? Currently, the prefer-regex-literals rule only detects strings.

No, if it isn't reported then it shouldn't be fixed. We could discuss in a separate issue whether or not this should be reported.

@mdjermanovic
Copy link
Member

, and astUtils.canTokensBeAdjacent can tell us if it needs spaces on both ends.

Can you help me with an example where we would require space on both ends?

a/RegExp("foo")in b would need spaces on both ends: a/ /foo/ in b

Without spaces, it would be a//foo/in b - which is a followed by a line comment. Because of the adjacent / on the left side, we have to add a space on the left, and then it becomes a/ /foo/in b, but this would be interpreted as a regular expression /foo/in ("SyntaxError: Invalid regular expression flags"), so we have to add a space on the right side too.

@mdjermanovic
Copy link
Member

I am just curious, instead of whitelisting tokens, why can't we disallow the / token instead, because that would cover more edge case tokens.

/ can be in the list of allowed tokens, it just needs a space.

Disallowed preceding tokens should be ), literals, identifiers etc., for example:

x = y
RegExp("foo").test(x) ? bar() : baz()

When autofixed to /foo/, this becomes:

x = y
/foo/.test(x) ? bar() : baz()

This is interpreted as x = y / foo...

Now I realize that parentheses wouldn't solve the problem as they would be interpreted as function call, so it's probably best to not autofix the RegExp if it isn't preceded by one of the allowed tokens (=, (, ;, ,, ...).

Yash-Singh1 added a commit to Yash-Singh1/eslint that referenced this issue Sep 17, 2021
Yash-Singh1 added a commit to Yash-Singh1/eslint that referenced this issue Sep 17, 2021
Yash-Singh1 added a commit to Yash-Singh1/eslint that referenced this issue Sep 17, 2021
Yash-Singh1 added a commit to Yash-Singh1/eslint that referenced this issue Sep 17, 2021
@Yash-Singh1 Yash-Singh1 changed the title prefer-regex-literals autofix prefer-regex-literals suggestions Oct 1, 2021
@github-actions
Copy link

Oops! It looks like we lost track of this issue. What do we want to do here? This issue will auto-close in 7 days without an update.

@github-actions github-actions bot added the Stale label Nov 30, 2021
@ljharb
Copy link
Sponsor Contributor

ljharb commented Nov 30, 2021

There's an open PR: #15077

@nzakas
Copy link
Member

nzakas commented Dec 1, 2021

@mdjermanovic seems like you were driving this. How should we proceed?

@mdjermanovic mdjermanovic removed the Stale label Dec 1, 2021
@mdjermanovic
Copy link
Member

By the proof of concept #15077, this looks feasible and safe to me. I support this change, and I'm willing to champion it.

@mdjermanovic mdjermanovic self-assigned this Dec 1, 2021
@nzakas
Copy link
Member

nzakas commented Dec 2, 2021

I’ll 👍

@nzakas nzakas added the accepted There is consensus among the team that this change meets the criteria for inclusion label Dec 2, 2021
@nzakas nzakas moved this from Feedback Needed to Ready to Implement in Triage Dec 2, 2021
@mdjermanovic mdjermanovic moved this from Ready to Implement to Pull Request Opened in Triage Dec 2, 2021
Triage automation moved this from Pull Request Opened to Complete Dec 17, 2021
mdjermanovic added a commit that referenced this issue Dec 17, 2021
* New: Autofix support to prefer-regex-literals

Fixes #15029

* Use canTokensBeAdjacent

* Fix for NULL

* Switch to validatePattern and validateFlags

* Fix for unicode

* Apply a few suggestions from code review

* Fix: Double Escaping?

* Tests and fixes for no-unicode regexp

* New: Drop usage of getStaticValue

* Fix: Remove whitespace changes, fix jsdoc type, and convert to suggestion for unexpectedRegExp

* New: More test cases for .

* Remove meta.docs.suggestion

Co-authored-by: Milos Djermanovic <milos.djermanovic@gmail.com>

* Fix linting

* Don't fix NULL

* Remove redundant wrapping suggestions for now

* String.raw can have problematic chars

* Remove fixable

* Fix messed up char increase

* Apply suggestion from code review

Co-authored-by: Milos Djermanovic <milos.djermanovic@gmail.com>

* chore: use characterNode.raw instead of characterNode.value

* chore: do a bit of simplification of onCharacterEnter

* Apply suggestions from code review

Co-authored-by: Milos Djermanovic <milos.djermanovic@gmail.com>

* chore: more changes following code review

* chore: Use reliable way of testing if spacing needed

Co-authored-by: Milos Djermanovic <milos.djermanovic@gmail.com>

* diff msg for suggestion than main warning

* chore: stricter testing

* Apply suggestions from code review

Co-authored-by: Milos Djermanovic <milos.djermanovic@gmail.com>

Co-authored-by: Milos Djermanovic <milos.djermanovic@gmail.com>
@eslint-github-bot eslint-github-bot bot locked and limited conversation to collaborators Jun 16, 2022
@eslint-github-bot eslint-github-bot bot added the archived due to age This issue has been archived; please open a new issue for any further discussion label Jun 16, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
accepted There is consensus among the team that this change meets the criteria for inclusion archived due to age This issue has been archived; please open a new issue for any further discussion enhancement This change enhances an existing feature of ESLint rule Relates to ESLint's core rules
Projects
Archived in project
Triage
Complete
Development

Successfully merging a pull request may close this issue.

5 participants