Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lint regular expressions #373

Merged
merged 13 commits into from Aug 21, 2022
Merged

Lint regular expressions #373

merged 13 commits into from Aug 21, 2022

Conversation

ericcornelissen
Copy link
Owner

No description provided.

Add ESLint [1] with the eslint-plugin-regexp plugin [2] to start linting
and enforcing styles for regular expressions. Given the heavy reliance
of this package on regular expression, this should improve the package
as well as remove any ambiguity on how regular expression _should_ be
written.

--
1. https://eslint.org/
2. https://github.com/ota-meshi/eslint-plugin-regexp
ESLint is configured explicitly to only lint with the "regexp" plugin
(i.e. eslint-plugin-regexp [1]), which is entirely manually configured
to ensure 1) that everything is enabled (^a) and 2) it's clear and
explicit what is being linted for.

^a: The "regexp/prefer-named-capture-group" is disabled because it's
(currently) not desired, unnamed groups seem okay for now.

--
1. https://github.com/ota-meshi/eslint-plugin-regexp
Fix all regular expression linting errors that are "trivial" - in
essence those that can be fixed automatically.

One rule was excluded from this, namely 'regexp/require-unicode-regexp',
because of the significant amount of changes (essentially all regular
expression in the project) combined with historical problems with the
`u` flag. Therefore, this rule will be evaluated separately.
@ericcornelissen ericcornelissen added dependencies Changes to the project's dependencies refactor Changes existing code without changing functionality labels Aug 20, 2022
- Ungroup unused capturing groups (`no-unused-capturing-group`). Also
  explicitly disable automated fixes for this rule as it can be error
  prone, so manual review is preferred.
- Use `u` flag on all regular expressions (`require-unicode-regexp`).
  According to [1, 2] this

  > Make[s] the regular expression [handle] UTF-16 surrogate pairs
  > correctly. [And] make[s] the regular expression [throw] syntax
  > errors early [by] disabling Annex B extensions.

  This change was also tested with Stryker following the removal of this
  flag in d4f4c3a due to a bug in
  Stryker 6.0.1 when used in Node.js 18.

  This change is still to be subject to the CI-based compatibility test
  in case support for this flag isn't as expected in all supported
  versions.

--
1. https://ota-meshi.github.io/eslint-plugin-regexp/rules/require-unicode-regexp.html
2. https://eslint.org/docs/latest/rules/require-unicode-regexp
Fix the warning by 'eslint-plugin-regexp' [1] about a potentially
exploitable polynomial backtracking regular expression in the source
code:

> The quantifier '[^]*?' can exchange characters with '[^]*?'. Using any
> string accepted by /[,.]+/, this can be exploited to cause at least
> polynomial backtracking.

The only affected regular expression in the package code is for escaping
for bash with the `interpolation` option set to `true`.

--
1. https://github.com/ota-meshi/eslint-plugin-regexp
Fix the warning about potential polynomial backtracking in regular
expressions used for fuzzing:

> The quantifier '\0*' can exchange characters with '\0*'. Using any
> string accepted by /\0+/, this can be exploited to cause at least
> polynomial backtracking

In particular, the second `\0*` group is unnecessary because the first
will always capture it if present in "valid" strings.
Fix the warning by 'eslint-plugin-regexp' [1] about a potentially
exploitable quadratic runtime regular expression in the source code
(line 55):

> Any attack string /\{+/ plus some rejecting suffix will cause
> quadratic runtime because of this quantifier

and (line 56):

>  Any attack string /:+/ plus some rejecting suffix will cause
>  quadratic runtime because of this quantifier

by simplifying the regular expressions. As a result, the respective
characters will be escaped in cases where it's not necessary. This is
not a problem because the string received by the program will still be
correct. This is beneficial as it avoids the quadratic runtime.

The only affected regular expression in the package code is for escaping
for bash with the `interpolation` option set to `true`.

The test fixtures for Bash were updated accordingly and improved upon
with new test cases for previously untested strings that should be
escaped that were identified while trying to adjust the corresponding
regular expressions to avoid quadratic runtime.

Similarly, a new entry was added to the fuzz corpus that was found after
the regular expression on line 55 was changed.

--
1. https://github.com/ota-meshi/eslint-plugin-regexp
Update the ESLint configuration to eslint-plugin-regexp to report not
only certain violations but also potential violations. This was chosen
because currently there's no potential violations, indicating a
stability level of these rules that's compatible with this project. If
this turns out to be incorrect in the future, these changes may be
reverted.
Fix the quadratic runtime regular expressions in the common fuzzing
logic by adding negative look-behinds for the target being matched.
This ensures that the match fails immediately at character `n+1` if
it failed at character `n` (because character `n` is in the character
range).

Also simplify the regular expression for PowerShell by consolidating all
the individual whitespace characters into the `\s` matcher. Note that
`\s` excludes `\u0085` so it MUST be included separately.
Revert the simplification to the regular expression for escaping curly
braces for Bash and use a negative lookbehind on a curly brace instead
to avoid quadratic runtime.
src/unix.js Dismissed Show resolved Hide resolved
.replace(/([()<>])/gu, "\\$1")
.replace(/(["'`])/gu, "\\$1")
.replace(/(?<!\{)\{+(?=(?:[^{][^,.]*)?[,.][^}]*\})/gu, (curlyBraces) =>
curlyBraces.replace(/\{/gu, "\\{")

Check failure

Code scanning / CodeQL

Incomplete string escaping or encoding High

This does not escape backslash characters in the input.
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Backslash characters are handled separately (in the first replace of this call chain).

@ericcornelissen ericcornelissen marked this pull request as ready for review August 21, 2022 18:48
@ericcornelissen ericcornelissen added the performance Changes to improve performance label Aug 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dependencies Changes to the project's dependencies performance Changes to improve performance refactor Changes existing code without changing functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant