Update: improve error message in no-control-regex #6839

Merged
merged 5 commits into from Aug 10, 2016

@ljharb (Sponsor Contributor) commented Aug 4, 2016

What issue does this pull request address?
Fixes #6293 - this makes the no-control-regex error message more useful by including the control characters that were found.

What changes did you make? (Give an overview)
I changed the "has control characters" boolean to a "get control characters" list, and converted that list of characters into a display string (i.e., "\x00" becomes "\\x00").
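Roughly, the boolean-to-list change looks like the sketch below. It assumes the pattern has already been evaluated into a string that contains the actual control characters. `getControlCharacters` borrows its name from this PR's diff, but its body here, along with `formatControlChar` and `buildMessage`, is an illustrative assumption rather than the rule's exact code.

```javascript
// Matches any C0 control character (U+0000 through U+001F).
const controlChar = /[\x00-\x1f]/g; // eslint-disable-line no-control-regex

// Returns every control character in the string, in order (empty list if none).
function getControlCharacters(value) {
    return value.match(controlChar) || [];
}

// Renders a raw character as printable text, e.g. "\x00" -> "\\x00".
function formatControlChar(ch) {
    return "\\x" + ("0" + ch.charCodeAt(0).toString(16)).slice(-2);
}

// Builds the report message, or null when there is nothing to report.
function buildMessage(value) {
    const chars = getControlCharacters(value).map(formatControlChar);

    return chars.length > 0
        ? "Unexpected control character(s) in regular expression: " + chars.join(", ") + "."
        : null;
}
```

Returning a list instead of a boolean costs nothing for the "is there a problem?" check (`length > 0`) while keeping the data needed for the message.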

Is there anything you'd like reviewers to focus on?
Nothing I can think of.

@mention-bot

@ljharb, thanks for your PR! By analyzing the annotation information on this pull request, we identified @efegurkan, @mysticatea and @ilyavolodin to be potential reviewers

@eslintbot

Thanks for the pull request, @ljharb! I took a look to make sure it's ready for merging and found some changes are needed:

  • Pull requests with code require an issue to be mentioned at the end of the commit summary, such as (fixes #1234). Please update the commit summary with an issue (file a new issue if one doesn't already exist).

Can you please update the pull request to address these?

(More information can be found in our pull request guide.)
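For illustration only, the requirement the bot describes can be expressed as a check on the first line of a commit message. This sketch assumes the `(fixes #1234)` form from the bot's example is the one being enforced; `hasIssueReference` is a hypothetical name, and the real bot may accept other forms.

```javascript
// Hypothetical check: does the commit summary end with "(fixes #<number>)"?
function hasIssueReference(commitMessage) {
    // Only the first line of a commit message is the summary.
    const summary = commitMessage.split("\n")[0].trim();

    return /\(fixes #\d+\)$/.test(summary);
}
```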

@eslintbot

LGTM

@ljharb force-pushed the ljharb/regex_control_message branch from 781d93d to 90b19dd August 4, 2016 00:16
@eslintbot

LGTM

const controlCharacters = getControlCharacters(computedValue);

if (controlCharacters.length > 0) {
    context.report(node, "Unexpected control character(s) in regular expression: " + controlCharacters.join(", ") + ".");

Member

Since you know the exact position of the character that caused the issue, could you set the position correctly here?

Sponsor Contributor Author

I suppose I could - you want the index? Wouldn't users be grepping for the control character, not counting characters in the string to determine where they were located?

Member

Any chance this could use new-style reporting?

context.report({
    node,
    message: "Unexpected control character(s) in regular expression: {{controlChars}}.",
    data: {
        controlChars: controlCharacters.join(", ")
    }
});
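For readers unfamiliar with the `data` option: each `{{name}}` placeholder in `message` is filled in from the matching property of `data`. The `interpolate` function below is a simplified, hypothetical stand-in for that substitution (not ESLint's internal code), included only to show what the snippet above would produce.

```javascript
// Replace each {{ key }} placeholder with data[key]; unknown keys are left as-is.
function interpolate(message, data) {
    return message.replace(/\{\{\s*([^{}]+?)\s*\}\}/g, (whole, key) =>
        Object.prototype.hasOwnProperty.call(data, key) ? String(data[key]) : whole
    );
}

const message = interpolate(
    "Unexpected control character(s) in regular expression: {{controlChars}}.",
    { controlChars: ["\\x1f", "\\x1e"].join(", ") }
);
```

With this style the message template stays a constant string and only the `data` values vary between reports.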

Member

In the editor integrations the character in question will just be highlighted, so there's no need to grep for anything. Also, if the same character shows up more than once in a string, you will have trouble finding it with just one error message.

Member

@ilyavolodin If we do index or column, we need to report per character, not per literal. One of the test cases would need to have two errors. Not saying we can't do that, but it'd be great to make sure we agree on what @ljharb should implement here.

Personally I think displaying the literal text is already a huge win.

Member

Let me see if I understand you correctly. You are saying that if we report position, and the same control character appears twice in the same literal, we should report two errors? Hmm... that's a good point. Maybe we can merge this in as it is, and address positioning later?

Member

@ilyavolodin Yes. I left a note in the tests, there are two test cases which report one lint message but have two characters in the message. We can only report one line/column (or a line/column start and line/column end, but still one range). Which column would be correct then, bearing in mind that the characters could be non-consecutive?

I would say the only way to safely report positions of violating characters is to report once per bad character. And as I said earlier, I don't think it's necessary just yet; we can wait for end-user demand.
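To make "report once per bad character" concrete, here is a rough sketch that builds one report descriptor per control character, each with its own location. It assumes the characters appear literally in a single-line `text` so that a string index maps directly to a column offset (escape sequences like `\x1f` in source, or a newline character, would break that mapping). `reportsPerCharacter`, `startLine` and `startColumn` are hypothetical.

```javascript
// One descriptor per control character, in the spirit of new-style reporting.
// startLine/startColumn describe where `text` begins in the file (assumed inputs).
function reportsPerCharacter(text, startLine, startColumn) {
    const controlChar = /[\x00-\x1f]/g; // eslint-disable-line no-control-regex
    const reports = [];
    let match;

    while ((match = controlChar.exec(text)) !== null) {
        const hex = ("0" + match[0].charCodeAt(0).toString(16)).slice(-2);

        reports.push({
            message: "Unexpected control character in regular expression: \\x" + hex + ".",
            loc: { line: startLine, column: startColumn + match.index }
        });
    }
    return reports;
}
```

Two non-adjacent characters produce two descriptors with different columns, which is why a test with two characters in one literal would need to expect two errors.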

Sponsor Contributor Author

Updated with the new reporting style.

Personally I don't find the location that interesting, unless you mean providing that info to ESLint itself?

Member

@ljharb All lint errors have a location (even if you happen not to provide one explicitly), which is the start of the node's first token to the end of its last token. But we can override the location on a per-message basis if we want to flag a particular part of the node. Then formatters and editor integrations can point to the location we provide, which can make the user's life easier. I don't consider it necessary here, but it could be useful in some cases.

Example: brace-style could report on an IfStatement node, but use a location pointing to the opening curly brace or closing curly brace as needed.
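Overriding the location mostly comes down to turning an offset inside the source text into a line/column pair. Here is a minimal sketch of that conversion, assuming 1-based lines, 0-based columns and `\n` line endings only. `locFromIndex` is a hypothetical helper, not an ESLint API.

```javascript
// Convert a 0-based character offset in `text` into { line, column }.
function locFromIndex(text, index) {
    const lines = text.slice(0, index).split("\n");

    return {
        line: lines.length,                      // 1-based line number
        column: lines[lines.length - 1].length   // 0-based column within that line
    };
}

// Locating the braces of an IfStatement, as in the brace-style example.
const source = "if (ok)\n{\n    run();\n}";
const openBrace = locFromIndex(source, source.indexOf("{"));
const closeBrace = locFromIndex(source, source.lastIndexOf("}"));
```

Either result could then serve as the per-message location the reviewer describes, instead of the whole node's range.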

@ljharb force-pushed the ljharb/regex_control_message branch from 90b19dd to 25df5cc August 4, 2016 00:19
@eslintbot

LGTM

-{ code: "var regex = RegExp('\\x1f')", errors: [{ message: "Unexpected control character in regular expression.", type: "Literal"}] }
+{ code: "var regex = " + /\x1f/, errors: [{ message: "Unexpected control character(s) in regular expression: \\x1f.", type: "Literal"}] }, // eslint-disable-line no-control-regex
+{ code: "var regex = " + /\\\x1f\\x1e/, errors: [{ message: "Unexpected control character(s) in regular expression: \\x1f, \\x1e.", type: "Literal"}] }, // eslint-disable-line no-control-regex
+{ code: "var regex = new RegExp('\\x1f\\x1e')", errors: [{ message: "Unexpected control character(s) in regular expression: \\x1f, \\x1e.", type: "Literal"}] },

Member

For reference: The tests on lines 33-34 would need to show two errors if we want to add line/col information to the report messages. Characters are not guaranteed to be consecutive.

@ljharb force-pushed the ljharb/regex_control_message branch from 25df5cc to e8b7bd2 August 4, 2016 00:42
@eslintbot

LGTM

const hasControlChars = possibleEscapeCharacters === null || !(possibleEscapeCharacters[0].length % 2);

if (hasControlChars) {
    stringControlChars = regexStr.slice(subStrIndex, -1).replace(doubleSlashes, ",\\").split(",");

Member

Could you please explain the replace choice (",\\")? I don't quite follow.

Sponsor Contributor Author

I'm replacing double backslashes with a single backslash, and putting a comma in front of it to serve as the delimiter for the split that immediately follows. Alternatively, I could split by double backslashes, but then I'd have to .map and prefix each part with a single backslash. Either way is fine, though.
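The two options can be compared side by side. This sketch assumes `doubleSlashes` matches exactly two literal backslashes and uses a made-up sample input; the rule's real definition of `doubleSlashes` and of the sliced string may differ. `.filter(Boolean)` drops the empty segment produced when the text starts with the delimiter.

```javascript
// Assumed definition: matches two literal backslashes.
const doubleSlashes = /\\\\/g;

// Hypothetical input: the characters \\x1f\\x1e
const text = "\\\\x1f\\\\x1e";

// Option used in the PR: turn each "\\" into ",\" and split on the comma.
function viaReplace(s) {
    return s.replace(doubleSlashes, ",\\").split(",").filter(Boolean);
}

// Alternative from the comment: split on "\\", then re-add a single backslash.
function viaSplitMap(s) {
    return s.split(doubleSlashes).filter(Boolean).map(part => "\\" + part);
}
```

Both yield the list `\x1f`, `\x1e` for this input; the replace version avoids the extra `.map`, while the split version avoids the synthetic comma delimiter.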

Member

Either way is fine, but a comment couldn't hurt. 😄

Sponsor Contributor Author

If a comment is needed, then the code isn't good enough :-) I'll change it to make the code clearer, rather than adding comments, i.e. cruft.

Sponsor Contributor Author

Updated in a separate commit - I can update and/or squash upon request.

@eslintbot

LGTM


const controlChar = /[\x00-\x1f]/g; // eslint-disable-line no-control-regex
const multipleSlashes = /\\+/g;
const multipleSlashesAtEnd = /\\+$/g;

Member

Technically these both match one or more backslashes, right? Or is the goal to match 2 or more?

Sponsor Contributor Author

The goal is specifically to match 2 or more, since we're trying to weed out patterns that aren't control characters but merely look like them.

Member

Sorry for being dense, but I guess I still don't understand. From what I can tell, the regexes match one backslash or more (\\ is one backslash, since \ escapes the next character). Are we asserting that there is another backslash before/after the pattern match location to get our "2 or more" result? Or am I missing something?

EDIT: I think I see on line 78, we are slicing from the string start to the pattern match location (which presumably has matched \X for some X) and trying to determine if there is one or more backslashes before it. But I'm still not sure the variable name above is as clear as it could be.

Sponsor Contributor Author

Sorry, you're right - I wasn't being clear. https://github.com/eslint/eslint/pull/6839/files#r73994452 is where the check is done (in master already) - I'm just moving the regex higher. In this line, \\+ matches one or more backslashes, so that the groups can later be checked.

Member

Per our discussion, I'll look forward to a new push later tonight which will hopefully cover everything. Thanks for your patience @ljharb.

@platinumazure
Member

@ljharb Thanks, this looks loads better. Now we can wait for the issue to get accepted 😄

@platinumazure added the "do not merge" label Aug 4, 2016
}

-return hasControlChars;
+return controlChars.map(function(x) {
+    return "\\x" + ("00" + x.charCodeAt(0).toString(16)).slice(-2);

Member

Just "0" is enough. Number.prototype.toString never returns the empty string.

Sponsor Contributor Author

Sadly I can't just use .padStart here yet :-p
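The reviewer's point is that `Number.prototype.toString(16)` always returns at least one digit, so a single `"0"` prefix is enough. Here is a small comparison of the three variants. `padStart` was standardized in ES2017, later than the Node versions this code presumably had to support in 2016. The function names are illustrative.

```javascript
// Original: prefix "00", keep the last two characters.
function padWithDoubleZero(ch) {
    return ("00" + ch.charCodeAt(0).toString(16)).slice(-2);
}

// Reviewer's suggestion: toString(16) is never empty, so one "0" suffices.
function padWithSingleZero(ch) {
    return ("0" + ch.charCodeAt(0).toString(16)).slice(-2);
}

// ES2017 alternative: pad on the left to a width of two.
function padWithPadStart(ch) {
    return ch.charCodeAt(0).toString(16).padStart(2, "0");
}
```

All three agree for every code from 0x00 to 0x1f, the only range this rule formats. For codes above 0xff, the `slice(-2)` versions would drop leading digits while `padStart` keeps them.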

@ljharb force-pushed the ljharb/regex_control_message branch from aa83bfb to cfb9aea August 5, 2016 17:20
@eslintbot

LGTM

@platinumazure removed the "do not merge" label Aug 9, 2016
@platinumazure
Member

Issue is accepted, removing "do not merge" label.

{ code: "var regex = " + /\x1f/, errors: [{ message: "Unexpected control character(s) in regular expression: \\x1f.", type: "Literal"}] }, // eslint-disable-line no-control-regex
{ code: "var regex = " + /\\\x1f\\x1e/, errors: [{ message: "Unexpected control character(s) in regular expression: \\x1f, \\x1e.", type: "Literal"}] }, // eslint-disable-line no-control-regex
{ code: "var regex = new RegExp('\\x1f\\x1e')", errors: [{ message: "Unexpected control character(s) in regular expression: \\x1f, \\x1e.", type: "Literal"}] },
{ code: "var regex = RegExp('\\x1f')", errors: [{ message: "Unexpected control character(s) in regular expression: \\x1f.", type: "Literal"}] }

Member

Just to make sure everything is working as it's supposed to, could we get some nonconsecutive character tests (e.g., using regex /\x1fFOO\x00/ where FOO is some non-control character pattern)? Thanks!

Sponsor Contributor Author

Sure, no problem!

Sponsor Contributor Author

Thanks, these new test cases caught a bug :-)
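A non-consecutive case such as `\x1fFOO\x00` is easy to get wrong with string splitting. Here is a sketch of a different technique: a single left-to-right scan over the raw pattern text (so `\x1f` appears as four characters) that skips whatever follows a backslash, so an escaped backslash can never start an `\x` escape. This is an illustrative alternative, not the rule's implementation. `findControlEscapes` is hypothetical.

```javascript
// Scan raw pattern text for \xHH escapes that encode U+0000..U+001F.
function findControlEscapes(raw) {
    const found = [];

    for (let i = 0; i < raw.length; i++) {
        if (raw[i] !== "\\") {
            continue;
        }
        const hex = raw.slice(i + 2, i + 4);

        if (raw[i + 1] === "x" && /^[0-9a-fA-F]{2}$/.test(hex) && parseInt(hex, 16) <= 0x1f) {
            found.push("\\x" + hex.toLowerCase());
        }
        // Skip the escaped character: in the text \\x1f the second backslash is
        // consumed here, so the following "x1f" is treated as plain text.
        i++;
    }
    return found;
}
```

For the text `\x1fFOO\x00` this returns both escapes in order, while `\\x1f` (an escaped backslash followed by `x1f`) returns nothing.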

@platinumazure
Member

@ljharb I've gone over this a couple of times and most of my concerns have been addressed. Here are my last concerns (also noted in inline comments):

  • Still some ambiguity over variable names for the regexes for detecting multiple backslashes
  • It would be good to add some test cases for non-consecutive control characters, to make sure we get one error with the correct characters reported

Thanks for your patience!


-hasControlChars = possibleEscapeCharacters === null || !(possibleEscapeCharacters[0].length % 2);
+const hasControlChars = possibleEscapeCharacters === null || !(possibleEscapeCharacters[0].length % 2);

Sponsor Contributor Author

This is where we check for an even number of slashes - note that this check happens in master, I'm just moving the regex higher up.
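The even/odd rule can be isolated as a small predicate. This minimal sketch assumes the caller passes the text that precedes a candidate `\x..` match (as in the slice discussed in review). `isRealEscape` is a hypothetical name, and the regex mirrors the PR's `multipleSlashesAtEnd` but without the `g` flag, so repeated `exec` calls carry no `lastIndex` state between inputs.

```javascript
// A run of one or more backslashes at the very end of the string.
const trailingBackslashes = /\\+$/;

// An even run (including none) of backslashes before the match means the match's
// own leading backslash is unescaped, so "\x.." is a real escape; odd means literal.
function isRealEscape(textBeforeMatch) {
    const run = trailingBackslashes.exec(textBeforeMatch);

    return run === null || run[0].length % 2 === 0;
}
```

For example, in the text `\\x1f` the candidate `\x1f` starts at index 1, the preceding text is a single backslash, and the predicate reports it as literal text rather than a control character.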

@ljharb force-pushed the ljharb/regex_control_message branch from cfb9aea to 5443554 August 9, 2016 05:49
@eslintbot

LGTM

@ljharb force-pushed the ljharb/regex_control_message branch from 5443554 to 889c3f9 August 9, 2016 06:07
@eslintbot

LGTM

@ljharb force-pushed the ljharb/regex_control_message branch from 889c3f9 to c4e191c August 9, 2016 06:10
@eslintbot

LGTM

@ljharb (Sponsor Contributor Author) commented Aug 9, 2016

I'm not sure why the AppVeyor build failed; it looks like it timed out. I'll rebase once more to see if that fixes it.

@ljharb force-pushed the ljharb/regex_control_message branch from c4e191c to b92dfd9 August 9, 2016 20:08
@eslintbot

LGTM

@gyandeeps
Member

@ljharb It's because of #6870. If you rebase, you should be good.

@platinumazure
Member

LGTM, but would like another review from someone else.

@ilyavolodin
Member

LGTM. Thanks for contributing!

@ilyavolodin merged commit 1ecd2a3 into eslint:master Aug 10, 2016
@platinumazure
Member

Thanks very much for sticking with this, @ljharb!

@ljharb deleted the ljharb/regex_control_message branch August 10, 2016 16:01
@eslint-deprecated bot locked and limited conversation to collaborators Feb 6, 2018
@eslint-deprecated bot added the "archived due to age" label Feb 6, 2018
8 participants