Update: improve error message in `no-control-regex` #6839

ljharb · 2016-08-04T00:14:41Z

What issue does this pull request address?
Fixes #6293 - this makes the no-control-regex error message more useful by including the control characters that were found.

What changes did you make? (Give an overview)
I changed the "has control characters" boolean to a "get control characters" list, and converted that list of characters into a display string (i.e., "\x00" to "\\x00").

Is there anything you'd like reviewers to focus on?
Nothing I can think of.

mention-bot · 2016-08-04T00:14:42Z

@ljharb, thanks for your PR! By analyzing the annotation information on this pull request, we identified @efegurkan, @mysticatea and @ilyavolodin to be potential reviewers

eslintbot · 2016-08-04T00:14:43Z

Thanks for the pull request, @ljharb! I took a look to make sure it's ready for merging and found some changes are needed:

Pull requests with code require an issue to be mentioned at the end of the commit summary, such as (fixes #1234). Please update the commit summary with an issue (file a new issue if one doesn't already exist).

Can you please update the pull request to address these?

(More information can be found in our pull request guide.)

eslintbot · 2016-08-04T00:15:39Z

LGTM

eslintbot · 2016-08-04T00:16:10Z

LGTM

ilyavolodin · 2016-08-04T00:16:49Z

lib/rules/no-control-regex.js

+                    const controlCharacters = getControlCharacters(computedValue);
+
+                    if (controlCharacters.length > 0) {
+                        context.report(node, "Unexpected control character(s) in regular expression: " + controlCharacters.join(", ") + ".");


Since you know exact position of the character that caused an issue, could you set position correct in here?

I suppose I could - you want the index? Wouldn't users be grepping for the control character, not counting characters in the string to determine where they were located?

Any chance this could use new-style reporting?

context.report({ node, message: "Unexpected control character(s) in regular expression: {{controlChars}}.", data: { controlChars: controlCharacters.join(", ") } });

In the editor integrations the character in question will just be highlighted, no need to grep anything. Also, if the same character shows up more than once in a string, you will have trouble finding it with just one error message.

@ilyavolodin If we do index or column, we need to report per character, not per literal. One of the test cases would need to have two errors. Not saying we can't do that, but it'd be great to make sure we agree on what @ljharb should implement here.

Personally I think displaying the literal text is already a huge win.

Let me see if I understand you correctly. You are saying that if we report position, and the same control character appears twice in the same literal, we should report two errors? Hmm... that's a good point. Maybe we can merge this in as it is, and address positioning later?

@ilyavolodin Yes. I left a note in the tests, there are two test cases which report one lint message but have two characters in the message. We can only report one line/column (or a line/column start and line/column end, but still one range). Which column would be correct then, bearing in mind that the characters could be non-consecutive?

I would say the only way to safely report positions of violating characters is to report once per bad character. And as I said earlier, I don't think it's necessary just yet; we can wait for end-user demand.

Updated with the new reporting style.

Personally I don't find location that interesting, unless you mean, providing that info to eslint itself?

@ljharb All lint errors have a location (even if you happen not to provide one explicitly), which is the start of the node's first token to the end of its last token. But we can override the location on a per-message basis if we want to flag a particular part of the node. Then formatters and editor integrations can point to the location we provide, which can make the user's life easier. I don't consider it necessary here, but it could be useful in some cases.

Example: brace-style could report on an IfStatement node, but use a location pointing to the opening curly brace or closing curly brace as needed.
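To make the per-message location idea concrete, here is a minimal sketch of reporting once per control character with an explicit `loc`. It is not code from this PR: `createFakeContext`, `reportEachControlCharacter`, and `fakeNode` are hypothetical names, the fake context only records what it receives, and the column arithmetic assumes a single-line literal whose control characters appear raw in the source (so that value offsets line up with source columns).

```javascript
// Hypothetical stand-in for the `context` object a rule receives;
// it only records the report descriptors passed to it.
function createFakeContext() {
    const reports = [];

    return {
        reports,
        report(descriptor) {
            reports.push(descriptor);
        }
    };
}

// Hypothetical helper: one report per control character, each pointing
// at that character's column rather than at the whole literal.
function reportEachControlCharacter(context, node, value) {
    const controlChar = /[\x00-\x1f]/g;
    let match;

    while ((match = controlChar.exec(value)) !== null) {
        context.report({
            node,
            loc: {
                line: node.loc.start.line,
                // +1 skips the opening delimiter of the literal (assumption).
                column: node.loc.start.column + 1 + match.index
            },
            message: "Unexpected control character in regular expression."
        });
    }
}

// Made-up node: a literal that starts at line 1, column 12.
const fakeNode = { type: "Literal", loc: { start: { line: 1, column: 12 } } };
const fakeContext = createFakeContext();

reportEachControlCharacter(fakeContext, fakeNode, "\x1fFOO\x00");
```

With two non-consecutive control characters in the value, the fake context ends up holding two reports with different columns, which is the "one error per bad character" behaviour discussed above.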

eslintbot · 2016-08-04T00:19:26Z

LGTM

platinumazure · 2016-08-04T00:28:17Z

tests/lib/rules/no-control-regex.js

-        { code: "var regex = RegExp('\\x1f')", errors: [{ message: "Unexpected control character in regular expression.", type: "Literal"}] }
+        { code: "var regex = " + /\x1f/, errors: [{ message: "Unexpected control character(s) in regular expression: \\x1f.", type: "Literal"}] }, // eslint-disable-line no-control-regex
+        { code: "var regex = " + /\\\x1f\\x1e/, errors: [{ message: "Unexpected control character(s) in regular expression: \\x1f, \\x1e.", type: "Literal"}] }, // eslint-disable-line no-control-regex
+        { code: "var regex = new RegExp('\\x1f\\x1e')", errors: [{ message: "Unexpected control character(s) in regular expression: \\x1f, \\x1e.", type: "Literal"}] },


For reference: The tests on lines 33-34 would need to show two errors if we want to add line/col information to the report messages. Characters are not guaranteed to be consecutive.

eslintbot · 2016-08-04T00:42:20Z

LGTM

platinumazure · 2016-08-04T00:43:57Z

lib/rules/no-control-regex.js

+                const hasControlChars = possibleEscapeCharacters === null || !(possibleEscapeCharacters[0].length % 2);
+
+                if (hasControlChars) {
+                    stringControlChars = regexStr.slice(subStrIndex, -1).replace(doubleSlashes, ",\\").split(",");


Could you please explain the replace choice (",\\")? I don't quite follow.

I'm replacing double backslashes with a single backslash - and I'm adding in the comma to use as a delimiter immediately following, in the split. Alternatively, I could split by double backslashes, but then I'd have to .map and prefix with a single backslash. Either way is fine, though.
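For readers following along, the two alternatives described here can be compared side by side. This is only a sketch: the definition of `doubleSlashes` is assumed (the diff does not show it), `input` is a made-up example, and the `.filter(Boolean)` step is added here to drop the empty leading segment that both approaches produce.

```javascript
// Assumed definition: exactly two consecutive backslashes.
const doubleSlashes = /\\\\/g;

// Made-up input; as characters it reads: \\x1f\\x1e
const input = "\\\\x1f\\\\x1e";

// Option 1: turn each double backslash into ",\" and split on the comma.
const viaReplace = input
    .replace(doubleSlashes, ",\\")
    .split(",")
    .filter(Boolean);

// Option 2: split on the double backslash, then restore a single backslash.
const viaSplit = input
    .split(doubleSlashes)
    .filter(Boolean)
    .map(part => "\\" + part);

console.log(viaReplace, viaSplit);
```

Both options yield the same two entries for this input. One difference worth noting: option 1 relies on the comma as a delimiter, so it would mis-split an input that already contains a comma, whereas option 2 does not have that constraint.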

Either way is fine, but a comment couldn't hurt. 😄

If a comment is needed, then the code isn't good enough :-) I'll change it to make the code clearer, rather than adding comments, i.e. cruft.

Updated in a separate commit - I can update and/or squash upon request.

eslintbot · 2016-08-04T00:56:01Z

LGTM

platinumazure · 2016-08-04T00:57:09Z

lib/rules/no-control-regex.js

+
+        const controlChar = /[\x00-\x1f]/g; // eslint-disable-line no-control-regex
+        const multipleSlashes = /\\+/g;
+        const multipleSlashesAtEnd = /\\+$/g;


Technically these both match one or more backslashes, right? Or is the goal to match 2 or more?

The goal is specifically to match 2 or more, since we're trying to weed out patterns that aren't control characters but merely look like them.

Sorry for being dense, but I guess I still don't understand. From what I can tell, the regexes are matching one backslash or more (\\ is one backslash since \ escapes the next character). Are we asserting that there is another backslash before/after the pattern match location to get our "2 or more" result? Or am I missing something?

EDIT: I think I see on line 78, we are slicing from the string start to the pattern match location (which presumably has matched \X for some X) and trying to determine if there is one or more backslashes before it. But I'm still not sure the variable name above is as clear as it could be.

Sorry, you're right - I wasn't being clear. https://github.com/eslint/eslint/pull/6839/files#r73994452 is where the check is done (in master already) - I'm just moving the regex higher. In this line, \\+ matches one or more backslashes, so that the groups can later be checked.
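The interplay between `\\+` and the later parity check may be easier to see in isolation. The sketch below is modeled on the lines quoted in this thread but is not the rule's actual code: `isControlEscape` is a hypothetical name and the `escapedControlChar` pattern is an assumption about what the rule searches for.

```javascript
// Assumed pattern for an escape such as \x1f in the regex source text.
const escapedControlChar = /\\x[01][0-9a-f]/i;

// One or more backslashes at the very end of the input.
const multipleSlashesAtEnd = /\\+$/;

// Hypothetical helper: does the first \xNN-looking sequence in the source
// really denote a control character, or is its backslash itself escaped?
function isControlEscape(regexSource) {
    const index = regexSource.search(escapedControlChar);

    if (index === -1) {
        return false;
    }

    // Backslashes immediately before the match. An even count (or none)
    // means they pair up among themselves, leaving the escape intact.
    const precedingSlashes = regexSource.slice(0, index).match(multipleSlashesAtEnd);

    return precedingSlashes === null || precedingSlashes[0].length % 2 === 0;
}

console.log(isControlEscape("\\x1f"));       // source text: \x1f
console.log(isControlEscape("\\\\x1f"));     // source text: \\x1f
console.log(isControlEscape("\\\\\\x1f"));   // source text: \\\x1f
```

So the regex itself matches one or more backslashes; the "2 or more" intent only emerges from checking whether the length of that match is even.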

Per our discussion, I'll look forward to a new push later tonight which will hopefully cover everything. Thanks for your patience @ljharb.

platinumazure · 2016-08-04T00:59:23Z

@ljharb Thanks, this looks loads better. Now we can wait for the issue to get accepted 😄

michaelficarra · 2016-08-05T14:43:38Z

lib/rules/no-control-regex.js

            }

-            return hasControlChars;
+            return controlChars.map(function(x) {
+                return "\\x" + ("00" + x.charCodeAt(0).toString(16)).slice(-2);


Just "0" is enough. Number.prototype.toString never returns the empty string.

Sadly I can't just use .padStart here yet :-p
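The claim that a single "0" suffices can be checked directly for the range the rule cares about. This is a small sketch with hypothetical helper names (`padWithTwoZeros`, `padWithOneZero`); it compares both padding variants for every code from 0x00 to 0x1f.

```javascript
// Variant from the diff: prepend "00", keep the last two characters.
const padWithTwoZeros = code => ("00" + code.toString(16)).slice(-2);

// Suggested variant: prepend a single "0".
const padWithOneZero = code => ("0" + code.toString(16)).slice(-2);

// Collect any code in the control range where the two variants disagree.
const mismatches = [];

for (let code = 0x00; code <= 0x1f; code++) {
    if (padWithTwoZeros(code) !== padWithOneZero(code)) {
        mismatches.push(code);
    }
}

console.log(mismatches.length === 0 ? "variants agree" : mismatches);
```

Because `toString(16)` always returns at least one digit, the single-zero prefix already guarantees two characters, so the list of mismatches stays empty.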

eslintbot · 2016-08-05T17:20:43Z

LGTM

platinumazure · 2016-08-09T01:29:57Z

Issue is accepted, removing "do not merge" label.

platinumazure · 2016-08-09T02:02:27Z

tests/lib/rules/no-control-regex.js

+        { code: "var regex = " + /\x1f/, errors: [{ message: "Unexpected control character(s) in regular expression: \\x1f.", type: "Literal"}] }, // eslint-disable-line no-control-regex
+        { code: "var regex = " + /\\\x1f\\x1e/, errors: [{ message: "Unexpected control character(s) in regular expression: \\x1f, \\x1e.", type: "Literal"}] }, // eslint-disable-line no-control-regex
+        { code: "var regex = new RegExp('\\x1f\\x1e')", errors: [{ message: "Unexpected control character(s) in regular expression: \\x1f, \\x1e.", type: "Literal"}] },
+        { code: "var regex = RegExp('\\x1f')", errors: [{ message: "Unexpected control character(s) in regular expression: \\x1f.", type: "Literal"}] }


Just to make sure everything is working as it's supposed to, could we get some nonconsecutive character tests (e.g., using regex /\x1fFOO\x00/ where FOO is some non-control character pattern)? Thanks!

Sure, no problem!

Thanks, these new test cases caught a bug :-)
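As an illustration of what a non-consecutive case exercises, here is a rough sketch that collects raw control characters from a string and builds a single message in the format used by the tests above. It is not the rule's implementation: `describeControlCharacters` and `buildMessage` are hypothetical names, and the sketch only looks at raw characters, not at escape sequences in source text.

```javascript
const controlChar = /[\x00-\x1f]/g;

// Hypothetical helper: list every control character as a \xNN display string.
function describeControlCharacters(value) {
    const found = value.match(controlChar) || [];

    return found.map(character =>
        "\\x" + ("0" + character.charCodeAt(0).toString(16)).slice(-2)
    );
}

// Hypothetical helper: one message per value, or null when nothing was found.
function buildMessage(value) {
    const found = describeControlCharacters(value);

    return found.length > 0
        ? "Unexpected control character(s) in regular expression: " + found.join(", ") + "."
        : null;
}

// Two control characters separated by ordinary text.
console.log(buildMessage("\x1fFOO\x00"));
console.log(buildMessage("FOO"));
```

The point of such a case is that both characters end up in the one message even though ordinary text sits between them.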

platinumazure · 2016-08-09T02:03:58Z

@ljharb I've gone over this a couple of times and most of my concerns have been addressed. Here are my last concerns (also noted in inline comments):

Still some ambiguity over variable names for the regexes for detecting multiple backslashes
It would be good to add some test cases for non-consecutive control characters, to make sure we get one error with the correct characters reported

Thanks for your patience!

ljharb · 2016-08-09T03:50:15Z

lib/rules/no-control-regex.js


-                hasControlChars = possibleEscapeCharacters === null || !(possibleEscapeCharacters[0].length % 2);
+                const hasControlChars = possibleEscapeCharacters === null || !(possibleEscapeCharacters[0].length % 2);


This is where we check for an even number of slashes - note that this check happens in master, I'm just moving the regex higher up.

eslintbot · 2016-08-09T05:49:48Z

LGTM

eslintbot · 2016-08-09T06:07:27Z

LGTM

eslintbot · 2016-08-09T06:10:18Z

LGTM

ljharb · 2016-08-09T20:08:05Z

I'm not sure why the appveyor build failed - it looks like it timed out. I'll rebase once more to see if that will fix it.

eslintbot · 2016-08-09T20:08:13Z

LGTM

gyandeeps · 2016-08-09T20:09:48Z

@ljharb Because of this: #6870. If you rebase then you should be good.

platinumazure · 2016-08-10T15:26:40Z

LGTM, but would like another review from someone else.

ilyavolodin · 2016-08-10T15:56:51Z

LGTM. Thanks for contributing!

platinumazure · 2016-08-10T15:59:23Z

Thanks very much for sticking with this, @ljharb!

Update: improve error message in no-control-regex (fixes eslint#6293)

1c4910a

jquerybot added the CLA: Valid label Aug 4, 2016

ljharb force-pushed the ljharb/regex_control_message branch from f78532f to 781d93d Compare August 4, 2016 00:15

ljharb force-pushed the ljharb/regex_control_message branch from 781d93d to 90b19dd Compare August 4, 2016 00:16

ljharb mentioned this pull request Aug 4, 2016

no-control-regex has unnecessary code #6438

Closed

ilyavolodin reviewed Aug 4, 2016

ljharb force-pushed the ljharb/regex_control_message branch from 90b19dd to 25df5cc Compare August 4, 2016 00:19

platinumazure reviewed Aug 4, 2016

ljharb force-pushed the ljharb/regex_control_message branch from 25df5cc to e8b7bd2 Compare August 4, 2016 00:42

platinumazure reviewed Aug 4, 2016

fixup: this will be squashed before the PR is merged

aa37112

platinumazure reviewed Aug 4, 2016

platinumazure added the do not merge This pull request should not be merged yet label Aug 4, 2016

michaelficarra reviewed Aug 5, 2016

fixup: this will be squashed before the PR is merged

107b4df

ljharb force-pushed the ljharb/regex_control_message branch from aa83bfb to cfb9aea Compare August 5, 2016 17:20

platinumazure removed the do not merge This pull request should not be merged yet label Aug 9, 2016

platinumazure reviewed Aug 9, 2016

ljharb reviewed Aug 9, 2016

ljharb added 2 commits August 8, 2016 22:48

fixup: Add more test cases; fix a bug.

4e22f86

fixup: try to improve variable names.

b92dfd9

ljharb force-pushed the ljharb/regex_control_message branch from cfb9aea to 5443554 Compare August 9, 2016 05:49

ljharb force-pushed the ljharb/regex_control_message branch from 5443554 to 889c3f9 Compare August 9, 2016 06:07

ljharb force-pushed the ljharb/regex_control_message branch from 889c3f9 to c4e191c Compare August 9, 2016 06:10

ljharb force-pushed the ljharb/regex_control_message branch from c4e191c to b92dfd9 Compare August 9, 2016 20:08

ilyavolodin merged commit 1ecd2a3 into eslint:master Aug 10, 2016

ljharb deleted the ljharb/regex_control_message branch August 10, 2016 16:01

eslint-deprecated bot locked and limited conversation to collaborators Feb 6, 2018

eslint-deprecated bot added the archived due to age This issue has been archived; please open a new issue for any further discussion label Feb 6, 2018

