Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update: Fix no-useless-escape false negative in regexes (fixes #7424) #7425

Merged
merged 10 commits into from Oct 27, 2016

Conversation

not-an-aardvark
Copy link
Member

What is the purpose of this pull request? (put an "X" next to item)

[ ] Documentation update
[x] Bug fix (template)
[ ] New rule (template)
[ ] Changes an existing rule (template)
[ ] Add autofixing to a rule
[ ] Add a CLI option
[ ] Add something to the core
[ ] Other, please explain:

See #7424

What changes did you make? (Give an overview)

This updates no-useless-escape to verify whether a character needs to be escaped based on its position in a regular expression. Previously, no-useless-escape had a list of escapable characters for regexes, and it would never report cases where any of them were escaped. However, some characters only need to be escaped if they appear in a character class, and other characters only need to be escaped if they appear outside of a character class. This PR updates the rule to parse the regular expression to determine which characters are in character classes, and report the characters if they are escaped inside a character class and only need to be escaped outside a class (or vice versa).

Is there anything you'd like reviewers to focus on?

We should make sure that the REGEX_GENERAL_ESCAPES, REGEX_CHARCLASS_ESCAPES, and REGEX_NON_CHARCLASS_ESCAPES lists are correct.

@not-an-aardvark not-an-aardvark added bug ESLint is working incorrectly rule Relates to ESLint's core rules accepted There is consensus among the team that this change meets the criteria for inclusion labels Oct 22, 2016
@mention-bot
Copy link

@not-an-aardvark, thanks for your PR! By analyzing the history of the files in this pull request, we identified @onurtemizkan, @vitorbal and @kaicataldo to be potential reviewers.

@eslintbot
Copy link

LGTM

Copy link
Member

@platinumazure platinumazure left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one comment about a complex arrow function, otherwise LGTM.

* characters to be valid in general, and filter out '-' characters that appear in the middle of a
* character class.
*/
.filter((charInfo, index, array) => charInfo.text !== "-" || !charInfo.inCharClass || index === 0 || index === array.length - 1 || !array[index - 1].inCharClass || !array[index + 1].inCharClass)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to see this extracted to a function.

* @param {Set} setB The second set
* @returns {Set} The union of the two sets
*/
function union(setA, setB) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is so cool.. I never knew abt yield*.
Had to read up on couple of things to understand this but so awesome. 💯

Copy link
Sponsor Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a super clever way to merge Sets, i like it 💯

} else {
return;
}
parseRegExp(node.raw.slice(1, -1))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, I found a problem: .slice(1, -1) will be incorrect if the regex has flags, since the trailing slash won't be at index -1.

@not-an-aardvark not-an-aardvark force-pushed the no-useless-escape-character-classes branch from f5b4021 to 3f4a7ca Compare October 22, 2016 03:09
@eslintbot
Copy link

LGTM

@not-an-aardvark
Copy link
Member Author

@platinumazure Separated the complex behavior into its own function.

The number of linting errors this causes on the existing ESLint codebase is a bit concerning 😕

* @returns {Set} The union of the two sets
*/
function union(setA, setB) {
return new Set(function *() {
Copy link
Sponsor Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not function* (?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have the generator-star-spacing rule configured to enforce this spacing.

Copy link
Sponsor Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, interesting choice

* @param {Set} setB The second set
* @returns {Set} The union of the two sets
*/
function union(setA, setB) {
Copy link
Sponsor Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a super clever way to merge Sets, i like it 💯

@not-an-aardvark
Copy link
Member Author

not-an-aardvark commented Oct 22, 2016

CI is failing due to this line and this line. Both of these seem to be valid linting errors, but why are they only getting reported on AppVeyor? Travis didn't report any issues, and I can't reproduce the linting error when running npm test locally.

edit: Actually, it doesn't seem to report any linting errors in ast-utils for me, even after clearing the cache. Maybe it's a Windows filepath thing?

@eslintbot
Copy link

LGTM

@@ -105,7 +105,7 @@ module.exports = {

const tokenAfterArrowBody = sourceCode.getTokenAfter(arrowBody);

if (tokenAfterArrowBody && tokenAfterArrowBody.type === "Punctuator" && /^[(\[\/`+-]/.test(tokenAfterArrowBody.value)) {
if (tokenAfterArrowBody && tokenAfterArrowBody.type === "Punctuator" && /^[([\/`+-]/.test(tokenAfterArrowBody.value)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the \/ in this example also be flagged by the rule? It's in a character class but I don't know if the slash token is consumed greedily by the tokenizer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, looks like you're right; I had thought / always needed to be escaped, but apparently it only needs to be escaped in character classes.

Sidenote: This implies that /[/]/ is a valid regex, but oddly, /[/]/.toString() returns /[\/]/ (i.e. it auto-escapes the slash, at least in Chrome). It doesn't seem to do this for other escaped characters.

platinumazure
platinumazure previously approved these changes Oct 22, 2016
Copy link
Member

@platinumazure platinumazure left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for extracting that function.

I left one question, but I'm half certain of the answer (and that it would require no changes).

@eslintbot
Copy link

LGTM

1 similar comment
@eslintbot
Copy link

LGTM

@not-an-aardvark not-an-aardvark force-pushed the no-useless-escape-character-classes branch from 1e32241 to 6d9bea0 Compare October 23, 2016 00:06
@eslintbot
Copy link

LGTM

@platinumazure platinumazure dismissed their stale review October 23, 2016 00:06

New open questions on the best approach for this rule (should we put this behind an option, should we enhance with exception lists, etc.). Possible large ecosystem impact.

@platinumazure
Copy link
Member

As we find new problems, there are concerns about ecosystem impact. The rule is correct in reporting all the new useless escapes, but there may be some cases where users might want to intentionally allow (or prefer) some useless escapes for readability/maintainability. So we may need to consider carefully how best to release this.

@not-an-aardvark
Copy link
Member Author

I think we should do the following:

  • Fix the default behavior by merging this PR. The current behavior is unambiguously a bug.
  • Introduce an option such as ignoreCharacterClasses: true to revert to the previous behavior, for anyone that wants to allow some useless escapes. This option should be false by default.
  • Ideally, it would be best if this bugfix and new option were both released in the same version, but I don't think this is a requirement; we can consider the bugfix and the new option independently.

}
}
return {
charList: state.charList.concat({text: char, escaped: state.escapeNextChar, inCharClass: state.inCharClass, index}),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it need to create an array instance for each character? It looks to generate many useless copies.

@eslintbot
Copy link

LGTM

Copy link
Member

@mysticatea mysticatea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me.
But there is a few matter about unnecessary escapes.

"\\",
".",
"-",
"^",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The escape of ^ in character class is unnecessary except the 1st character.

/[\^a]/; // this escape is necessary; it becomes `not a` if the `\` is removed.
/[a\^]/; // this escape is unnecessary; the `^` is a character.

"]",
"|",
"(",
")",
"b",
"B",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The escape of B in character class is unnecessary.

/[\B]/; // this escape is unnecessary; the `B` is a character.

On the other hand, the escape of \b is necessary. \b in character class is meaning a backspace character. (\b outside of character class is a word boundary)

@eslintbot
Copy link

LGTM

@not-an-aardvark not-an-aardvark force-pushed the no-useless-escape-character-classes branch from 35ec446 to 4611d81 Compare October 24, 2016 06:39
@eslintbot
Copy link

LGTM

Copy link
Member

@mysticatea mysticatea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, awesome!

@not-an-aardvark not-an-aardvark added the tsc agenda This issue will be discussed by ESLint's TSC at the next meeting label Oct 26, 2016
@not-an-aardvark
Copy link
Member Author

TSC Summary: no-useless-escape currently has a false negative in regex character classes. This is clearly a bug in the rule, but there is reason to believe that the fix might have an unusually large impact on the ecosystem (there are 11 existing violations in the ESLint codebase that were not caught due to the bug). In addition, it's plausible that a user might be okay with useless escapes in character classes for readability, even though these escapes are useless according to the rule's definition. The current proposal is to accept this PR as a bugfix (since the rule is currently not working as intended), and add an opt-out option to ignore character classes (something like #7455).

TSC question: Should we merge this PR for the upcoming release and consider the opt-out option separately? If not, how should we handle this fix?

@alberto alberto removed the tsc agenda This issue will be discussed by ESLint's TSC at the next meeting label Oct 27, 2016
@alberto
Copy link
Member

alberto commented Oct 27, 2016

TSC Resolution: Merge as is. Opt-out option could be considered in the future.

@not-an-aardvark not-an-aardvark merged commit c675d7d into master Oct 27, 2016
@not-an-aardvark not-an-aardvark deleted the no-useless-escape-character-classes branch October 27, 2016 21:50
@platinumazure
Copy link
Member

@eslint/eslint-tsc @not-an-aardvark Thanks very much for taking the time to carefully deliberate this.

@eslint-deprecated eslint-deprecated bot locked and limited conversation to collaborators Feb 6, 2018
@eslint-deprecated eslint-deprecated bot added the archived due to age This issue has been archived; please open a new issue for any further discussion label Feb 6, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
accepted There is consensus among the team that this change meets the criteria for inclusion archived due to age This issue has been archived; please open a new issue for any further discussion bug ESLint is working incorrectly rule Relates to ESLint's core rules
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants