New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update: improve error message in no-control-regex
#6839
Update: improve error message in no-control-regex
#6839
Conversation
@ljharb, thanks for your PR! By analyzing the annotation information on this pull request, we identified @efegurkan, @mysticatea and @ilyavolodin to be potential reviewers |
Thanks for the pull request, @ljharb! I took a look to make sure it's ready for merging and found some changes are needed:
Can you please update the pull request to address these? (More information can be found in our pull request guide.) |
f78532f
to
781d93d
Compare
LGTM |
781d93d
to
90b19dd
Compare
LGTM |
const controlCharacters = getControlCharacters(computedValue); | ||
|
||
if (controlCharacters.length > 0) { | ||
context.report(node, "Unexpected control character(s) in regular expression: " + controlCharacters.join(", ") + "."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you know exact position of the character that caused an issue, could you set position correct in here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose I could - you want the index? Wouldn't users be grepping for the control character, not counting characters in the string to determine where they were located?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any chance this could use new-style reporting?
context.report({
node,
message: "Unexpected control character(s) in regular expression: {{controlChars}}.",
data: {
controlChars: controlCharacters.join(", ")
}
});
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the editor integrations the character in question will just be highlighted, no need to grep anything. Also, if the same character shows up more then one time in a string, you will have trouble finding it with just one error message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ilyavolodin If we do index or column, we need to report per character, not per literal. One of the test cases would need to have two errors. Not saying we can't do that, but it'd be great to make sure we agree on what @ljharb should implement here.
Personally I think displaying the literal text is already a huge win.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me see if I understand you correctly. You are saying that if we report position, and the same control character appears twice in the same literal, we should report two errors? Hmm... that's a good point. Maybe we can merge this in as it is, and address positioning later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ilyavolodin Yes. I left a note in the tests, there are two test cases which report one lint message but have two characters in the message. We can only report one line/column (or a line/column start and line/column end, but still one range). Which column would be correct then, bearing in mind that the characters could be non-consecutive?
I would say the only way to safely report positions of violating characters is to report once per bad character. And as I said earlier, I don't think it's necessary just yet-- we can wait for end-user demand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated with the new reporting style.
Personally I don't find location that interesting, unless you mean, providing that info to eslint itself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ljharb All lint errors have a location (even if you happen not to provide one explicitly), which is the start of the node's first token to the end of its last token. But we can override the location on a per-message basis if we want to flag a particular part of the node. Then formatters and editor integrations can point to the location we provide, which can make the user's life easier. I don't consider it necessary here, but it could be useful in some cases.
Example: brace-style could report on an IfStatement node, but use a location pointing to the opening curly brace or closing curly brace as needed.
90b19dd
to
25df5cc
Compare
LGTM |
{ code: "var regex = RegExp('\\x1f')", errors: [{ message: "Unexpected control character in regular expression.", type: "Literal"}] } | ||
{ code: "var regex = " + /\x1f/, errors: [{ message: "Unexpected control character(s) in regular expression: \\x1f.", type: "Literal"}] }, // eslint-disable-line no-control-regex | ||
{ code: "var regex = " + /\\\x1f\\x1e/, errors: [{ message: "Unexpected control character(s) in regular expression: \\x1f, \\x1e.", type: "Literal"}] }, // eslint-disable-line no-control-regex | ||
{ code: "var regex = new RegExp('\\x1f\\x1e')", errors: [{ message: "Unexpected control character(s) in regular expression: \\x1f, \\x1e.", type: "Literal"}] }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For reference: The tests on lines 33-34 would need to show two errors if we want to add line/col information to the report messages. Characters are not guaranteed to be consecutive.
25df5cc
to
e8b7bd2
Compare
LGTM |
const hasControlChars = possibleEscapeCharacters === null || !(possibleEscapeCharacters[0].length % 2); | ||
|
||
if (hasControlChars) { | ||
stringControlChars = regexStr.slice(subStrIndex, -1).replace(doubleSlashes, ",\\").split(","); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please explain the replace choice (",\\"
)? I don't quite follow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm replacing double backslashes with a single backslash - and I'm adding in the comma to use as a delimiter immediately following, in the split
. Alternatively, I could split by double backslashes, but then i'd have to .map
and prefix with a single backslash. Either way is fine, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either way is fine, but a comment couldn't hurt. 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a comment is needed, then the code isn't good enough :-) I'll change it to make the code clearer, rather than adding comments ie cruft.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated in a separate commit - I can update and/or squash upon request.
LGTM |
|
||
const controlChar = /[\x00-\x1f]/g; // eslint-disable-line no-control-regex | ||
const multipleSlashes = /\\+/g; | ||
const multipleSlashesAtEnd = /\\+$/g; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically these both match one or more backslashes, right? Or is the goal to match 2 or more?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The goal is specifically to match 2 or more, since we're trying to weed out patterns that aren't control characters but merely look like them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for being dense, but I guess I still don't understand. From what I can tell, the regexes are matching one regex or more (\\
is one backslash since \
escapes the next character). Are we asserting that there is another backslash before/after the pattern match location to get our "2 or more" result? Or am I missing something?
EDIT: I think I see on line 78, we are slicing from the string start to the pattern match location (which presumably has matched \X
for some X) and trying to determine if there is one or more backslashes before it. But I'm still not sure the variable name above is as clear as it could be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, you're right - I wasn't being clear. https://github.com/eslint/eslint/pull/6839/files#r73994452 is where the check is done (in master already) - I'm just moving the regex higher. In this line, \\+
matches one or more backslashes, so that the groups can later be checked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per our discussion, I'll look forward to a new push later tonight which will hopefully cover everything. Thanks for your patience @ljharb.
@ljharb Thanks, this looks loads better. Now we can wait for the issue to get accepted 😄 |
} | ||
|
||
return hasControlChars; | ||
return controlChars.map(function(x) { | ||
return "\\x" + ("00" + x.charCodeAt(0).toString(16)).slice(-2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just "0"
is enough. Number.prototype.toString
never returns the empty string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sadly i can't just use .padStart
here yet :-p
aa83bfb
to
cfb9aea
Compare
LGTM |
Issue is accepted, removing "do not merge" label. |
{ code: "var regex = " + /\x1f/, errors: [{ message: "Unexpected control character(s) in regular expression: \\x1f.", type: "Literal"}] }, // eslint-disable-line no-control-regex | ||
{ code: "var regex = " + /\\\x1f\\x1e/, errors: [{ message: "Unexpected control character(s) in regular expression: \\x1f, \\x1e.", type: "Literal"}] }, // eslint-disable-line no-control-regex | ||
{ code: "var regex = new RegExp('\\x1f\\x1e')", errors: [{ message: "Unexpected control character(s) in regular expression: \\x1f, \\x1e.", type: "Literal"}] }, | ||
{ code: "var regex = RegExp('\\x1f')", errors: [{ message: "Unexpected control character(s) in regular expression: \\x1f.", type: "Literal"}] } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to make sure everything is working as it's supposed to, could we get some nonconsecutive character tests (e.g., using regex /\x1fFOO\x00/
where FOO is some non-control character pattern)? Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, no problem!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, these new test cases caught a bug :-)
@ljharb I've gone over this a couple of times and most of my concerns have been addressed. Here are my last concerns (also noted in inline comments):
Thanks for your patience! |
|
||
hasControlChars = possibleEscapeCharacters === null || !(possibleEscapeCharacters[0].length % 2); | ||
const hasControlChars = possibleEscapeCharacters === null || !(possibleEscapeCharacters[0].length % 2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is where we check for an even number of slashes - note that this check happens in master, I'm just moving the regex higher up.
cfb9aea
to
5443554
Compare
LGTM |
5443554
to
889c3f9
Compare
LGTM |
889c3f9
to
c4e191c
Compare
LGTM |
I'm not sure why the appveyor build failed - it looks like it timed out. I'll rebase once more to see if that will fix it. |
c4e191c
to
b92dfd9
Compare
LGTM |
LGTM, but would like another review from someone else. |
LGTM. Thanks for contributing! |
Thanks very much for sticking with this, @ljharb! |
What issue does this pull request address?
Fixes #6293 - this makes the
no-control-regex
error message more useful by including the control characters that were found.What changes did you make? (Give an overview)
I changed the "has control characters" boolean to a "get control characters" list, and converted that list of characters into a display string (ie,
"\x00"
to"\\x00"
.Is there anything you'd like reviewers to focus on?
Nothing I can think of.