New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Lint regular expressions #373
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add ESLint [1] with the eslint-plugin-regexp plugin [2] to start linting and enforcing styles for regular expressions. Given the heavy reliance of this package on regular expression, this should improve the package as well as remove any ambiguity on how regular expression _should_ be written. -- 1. https://eslint.org/ 2. https://github.com/ota-meshi/eslint-plugin-regexp
ESLint is configured explicitly to only lint with the "regexp" plugin (i.e. eslint-plugin-regexp [1]), which is entirely manually configured to ensure 1) that everything is enabled (^a) and 2) it's clear and explicit what is being linted for. ^a: The "regexp/prefer-named-capture-group" is disabled because it's (currently) not desired, unnamed groups seem okay for now. -- 1. https://github.com/ota-meshi/eslint-plugin-regexp
Fix all regular expression linting errors that are "trivial" - in essence those that can be fixed automatically. One rule was excluded from this, namely 'regexp/require-unicode-regexp', because of the significant amount of changes (essentially all regular expression in the project) combined with historical problems with the `u` flag. Therefore, this rule will be evaluated separately.
- Ungroup unused capturing groups (`no-unused-capturing-group`). Also explicitly disable automated fixes for this rule as it can be error prone, so manual review is preferred. - Use `u` flag on all regular expressions (`require-unicode-regexp`). According to [1, 2] this > Make[s] the regular expression [handle] UTF-16 surrogate pairs > correctly. [And] make[s] the regular expression [throw] syntax > errors early [by] disabling Annex B extensions. This change was also tested with Stryker following the removal of this flag in d4f4c3a due to a bug in Stryker 6.0.1 when used in Node.js 18. This change is still to be subject to the CI-based compatibility test in case support for this flag isn't as expected in all supported versions. -- 1. https://ota-meshi.github.io/eslint-plugin-regexp/rules/require-unicode-regexp.html 2. https://eslint.org/docs/latest/rules/require-unicode-regexp
Fix the warning by 'eslint-plugin-regexp' [1] about a potentially exploitable polynomial backtracking regular expression in the source code: > The quantifier '[^]*?' can exchange characters with '[^]*?'. Using any > string accepted by /[,.]+/, this can be exploited to cause at least > polynomial backtracking. The only affected regular expression in the package code is for escaping for bash with the `interpolation` option set to `true`. -- 1. https://github.com/ota-meshi/eslint-plugin-regexp
Fix the warning about potential polynomial backtracking in regular expressions used for fuzzing: > The quantifier '\0*' can exchange characters with '\0*'. Using any > string accepted by /\0+/, this can be exploited to cause at least > polynomial backtracking In particular, the second `\0*` group is unnecessary because the first will always capture it if present in "valid" strings.
Fix the warning by 'eslint-plugin-regexp' [1] about a potentially
exploitable quadratic runtime regular expression in the source code
(line 55):
> Any attack string /\{+/ plus some rejecting suffix will cause
> quadratic runtime because of this quantifier
and (line 56):
> Any attack string /:+/ plus some rejecting suffix will cause
> quadratic runtime because of this quantifier
by simplifying the regular expressions. As a result, the respective
characters will be escaped in cases where it's not necessary. This is
not a problem because the string received by the program will still be
correct. This is beneficial as it avoids the quadratic runtime.
The only affected regular expression in the package code is for escaping
for bash with the `interpolation` option set to `true`.
The test fixtures for Bash were updated accordingly and improved upon
with new test cases for previously untested strings that should be
escaped that were identified while trying to adjust the corresponding
regular expressions to avoid quadratic runtime.
Similarly, a new entry was added to the fuzz corpus that was found after
the regular expression on line 55 was changed.
--
1. https://github.com/ota-meshi/eslint-plugin-regexp
Update the ESLint configuration to eslint-plugin-regexp to report not only certain violations but also potential violations. This was chosen because currently there's no potential violations, indicating a stability level of these rules that's compatible with this project. If this turns out to be incorrect in the future, these changes may be reverted.
Fix the quadratic runtime regular expressions in the common fuzzing logic by adding negative look-behinds for the target being matched. This ensures that the match fails immediately at character `n+1` if it failed at character `n` (because character `n` is in the character range). Also simplify the regular expression for PowerShell by consolidating all the individual whitespace characters into the `\s` matcher. Note that `\s` excludes `\u0085` so it MUST be included separately.
Revert the simplification to the regular expression for escaping curly braces for Bash and use a negative lookbehind on a curly brace instead to avoid quadratic runtime.
| .replace(/([()<>])/gu, "\\$1") | ||
| .replace(/(["'`])/gu, "\\$1") | ||
| .replace(/(?<!\{)\{+(?=(?:[^{][^,.]*)?[,.][^}]*\})/gu, (curlyBraces) => | ||
| curlyBraces.replace(/\{/gu, "\\{") |
Check failure
Code scanning / CodeQL
Incomplete string escaping or encoding High
This does not escape backslash characters in the input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Backslash characters are handled separately (in the first replace of this call chain).
2 tasks
This was referenced May 8, 2023
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
dependencies
Changes to the project's dependencies
performance
Changes to improve performance
refactor
Changes existing code without changing functionality
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
No description provided.