chore: add regexp related eslint plugins #202

kurtextrem · 2023-10-07T09:37:41Z

As discussed, this adds a few eslint plugins.

The IP/IPv6 regexes get a bit bigger, because they have a lot of capture groups that were unused, so every capture group now is no longer capturing (via ?:).

By the way, while going over the ruleset of the unicorn plugin, I've found two rules:

no-array-reduce - in my experience reduce is indeed slower than doing manual loops. Have you tested replacing the reduce calls already?
no-useless-spread - I see some spread calls, e.g. in src/methods/strict. Is immutability needed here? (e.g. what if instead of spreading it, we'd use something like { _schema: schema, _parse: () => (...) }. Other functions then could check e.g. !(key in schema._schema.object))

And additionally, currently it looks to me like regexes are created every time a schema with e.g. ipv4 is validated. Creating regexps is expensive and caching them would be faster. This is also true for serverless, where you're billed for run costs (and not startup costs). So moving the regex creation outside, moves the regex compilation time to the startup time (so regex is compiled only once instead of every invocation).

…in-security, eslint-plugin-unicorn

netlify · 2023-10-07T09:37:44Z

✅ Deploy Preview for valibot ready!

Name	Link
🔨 Latest commit	`e1969e3`
🔍 Latest deploy log	https://app.netlify.com/sites/valibot/deploys/6532bb8df6d2af0007602a26
😎 Deploy Preview	https://deploy-preview-202--valibot.netlify.app
📱 Preview on mobile	Toggle QR Code... Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

fabian-hiller · 2023-10-07T15:53:19Z

Thank you for your contribution!

The IP/IPv6 regexes get a bit bigger, because they have a lot of capture groups that were unused, so every capture group now is no longer capturing (via ?:).

That's fine!

By the way, while going over the ruleset of the unicorn plugin, I've found two rules:

no-array-reduce - in my experience reduce is indeed slower than doing manual loops. Have you tested replacing the reduce calls already?

I have not tested it yet, but I am open to this change. It should have little effect on the bundle size due to compression. Do you want to take a look at it or should I do it?

no-useless-spread - I see some spread calls, e.g. in src/methods/strict. Is immutability needed here? (e.g. what if instead of spreading it, we'd use something like { _schema: schema, _parse: () => (...) }. Other functions then could check e.g. !(key in schema._schema.object))

The problem is that the methods that currently use spread copy the schema and only modify the ._parse method. Since schemas can have different keys, this must be done dynamically. Modifying the existing schema directly is unfortunately not an option, as this would lead to bugs. Also, _schema: schema is not an alternative as it would break compatibility.

And additionally, currently it looks to me like regexes are created every time a schema with e.g. ipv4 is validated. Creating regexps is expensive and caching them would be faster. This is also true for serverless, where you're billed for run costs (and not startup costs). So moving the regex creation outside, moves the regex compilation time to the startup time (so regex is compiled only once instead of every invocation).

Great catch! I tend to check with a benchmark if your assumption is correct and if so, I would initialize the regex lazy, because then only those are initialized that are actually used.

fabian-hiller

Thanks again for you contribution!

library/package.json

library/src/schemas/special/special.test.ts

library/src/validations/email/email.ts

fabian-hiller · 2023-10-07T16:05:15Z

library/src/validations/imei/imei.ts

@@ -12,8 +12,8 @@ import { getOutput, getPipeIssues, isLuhnAlgo } from '../../utils/index.ts';
 */
 export function imei<TInput extends string>(error?: ErrorMessage) {
  return (input: TInput): PipeResult<TInput> =>
-    !/^\d{2}[ |/|-]?\d{6}[ |/|-]?\d{6}[ |/|-]?\d$/.test(input) ||
-    !isLuhnAlgo(input)
+    // eslint-disable-next-line security/detect-unsafe-regex -- false positive according to https://devina.io/redos-checker


If this rule is so buggy, we should consider removing it in general or create an issue on GitHub.

Yeah.. this is one of the longest standing issues of the security plugin: eslint-community/eslint-plugin-security#28

Two alternatives exist:

https://makenowjust-labs.github.io/recheck/docs/usage/as-eslint-plugin/ (discussed in that thread as well and wasn't picked because of some slow runs; this might have been fixed with the async run introduced recently?)

https://github.com/tjenkinson/eslint-plugin-redos-detector (not mentioned in the thread)

For my personal config I've opted with sticking with eslint-plugin-security until they've sorted out what to use, but I wasn't aware of eslint-plugin-redos-detector until now. Maybe worth trying out. I'll also mention that one in the thread above

As long as there is no fix, I prefer to disable or replace this linting rule globally.

I'll play with the two plugins and update the PR accordingly

colinhacks/zod#2849 the zod PR is also interesting for this PR I'd say

Yes we should keep an eye on this

I decided to go with redos-detector, since it seems to find more (non-false positives) of redos regexps.

kurtextrem · 2023-10-07T17:52:23Z

Do you want to take a look at it or should I do it?

If you got time, feel free to, as you can probably judge better how much impact it has. In any case, I can provide this benchmark: https://www.measurethat.net/Benchmarks/Show/2472/0/arrayreduce-vs-for-loop-vs-arrayforeach, where reduce is much slower. By removing reduce, we also get rid of on closure per execution (e.g. in comparable), so double win.

Also, _schema: schema is not an alternative as it would break compatibility.

Backwards compat you mean?

Great catch! I tend to check with a benchmark if your assumption is correct and if so, I would initialize the regex lazy, because then only those are initialized that are actually used.

Through tree-shaking regexps that aren't used shouldn't execute anyway, or am I missing something?
so e.g. the following should work:

const regex = /.../;

export const isABC = () => () => regex.test(...)

if isABC is used somewhere in the application, regex will compile at startup time.
As alternative, if you're concerned a lot of regexps are in the bundle but aren't directly on the execution path, we could use https://github.com/sindresorhus/lazy-value, but that then adds a small overhead (as instead of reading a variable it becomes a function call). However, based on https://v8.dev/blog/regexp-tier-up, I'd say we should pick the first approach and not overcomplicate.
For global regexps, we need to make sure to reset the .lastIndex property (or use String#replaceAll, which is one of the eslint rules, currently commented out in the PR)

To be sure, we could check the flamegraph of the startup to figure out if the simple approach significantly slows down the startup. Back in 2019, big regexps definitely did slow down: garycourt/uri-js#40. However, the v8 blog above is from half a year later, so I do think there is a chance this is much better nowadays.

Unfortunately the uri-js issue made me realize the current regexp used for checking ipv6 in valibot might not be correct, as per garycourt/uri-js#40 (comment). The fastify team is now also using a different approach to parse ipv6 addresses: https://github.com/fastify/fast-uri/blob/main/lib/utils.js#L27. And they have a fast path for rejecting invalid ipv4's (https://github.com/fastify/fast-uri/blob/main/lib/utils.js#L6).
Example invariant for valibot's regex: fe80::3:bEFf:5b:%3NG (generated via https://marketplace.visualstudio.com/items?itemName=Nixinova.inlay-regex). It validates to true with valibot's ipv6 regex, but not with the one from uri-js.

fabian-hiller · 2023-10-07T18:29:38Z

If you got time, feel free to, as you can probably judge better how much impact it has...

Yes, I will do that. I have created an issue: #203

Also, _schema: schema is not an alternative as it would break compatibility.

Backwards compat you mean?

No, the point is that schemas must be able to be passed to other schemas and methods in a type-safe manner. This requires a consistent approach. Therefore, we cannot simply change the structure. Also, the modification methods should only modify and return a schema instead of producing a new schema or format.

Through tree-shaking regexps that aren't used shouldn't execute anyway, or am I missing something? so e.g. the following should work:
const regex = /.../;

export const isABC = () => () => regex.test(...)

The problem here is that if isABC is used in a schema, but the regex is never used because a user is just clicking through a website, for example, there is still an initial cost for initializing the regex. My approach would be to check if the regex is already initialized when the validation is executed. If not, it will be initialized before it is used.

But as you already mentioned, we would check if all the extra work really improves the performance.

Unfortunately the uri-js issue made me realize the current regexp...

Can you create an issue or PR?

kurtextrem · 2023-10-08T09:39:29Z

Yes, I will do that. I have created an issue: #203

Thank you, I've also created an issue for the ipv6 issue: #206

No, the point is that schemas must be able to be passed to other schemas and methods in a type-safe manner. This requires a consistent approach. Therefore, we cannot simply change the structure. Also, the modification methods should only modify and return a schema instead of producing a new schema or format.

As far as I can tell we can control every schema, so if we were to rework every schema to use _schema instead of spreading, we'd be consistent. I might be missing something though, I'll maybe try experimenting with it and make a draft PR when I get to some results.

The problem here is that if isABC is used in a schema, but the regex is never used because a user is just clicking through a website, for example, there is still an initial cost for initializing the regex. My approach would be to check if the regex is already initialized when the validation is executed. If not, it will be initialized before it is used.

Agreed.
Speaking of, maybe that's also useful for schema creation (lazy evaluation). If parse is never called on the schema, why pay the cost of creating the schema?
I'm not sure how much impact this has in the end though, you can probably tell better how expensive schema creation could turn out.

fabian-hiller · 2023-10-08T17:13:15Z

As far as I can tell we can control every schema, so if we were to rework every schema to use _schema instead of spreading, we'd be consistent. I might be missing something though, I'll maybe try experimenting with it and make a draft PR when I get to some results.

Feel free to take a look at it. For now, I think spread is fine here. It is called only once for initialization. The difference without spread should not be noticeable in the real world, away from specific benchmarks.

Agreed. Speaking of, maybe that's also useful for schema creation (lazy evaluation). If parse is never called on the schema, why pay the cost of creating the schema? I'm not sure how much impact this has in the end though, you can probably tell better how expensive schema creation could turn out.

Either this is not possible due to dynamic arguments of a schema or I don't understand what you mean. Maybe you can explain it in more detail and provide a minimal code example.

fabian-hiller · 2023-10-08T17:58:01Z

To get a faster merge, I can offer you to remove eslint-config-canonical and eslint-plugin-unicorn and disable security/detect-unsafe-regex globally. If eslint-plugin-redos-detector is a replacement for security/detect-unsafe-regex and works, you can add it. For everything else where we are not sure yet like eslint-config-canonical and eslint-plugin-unicorn, you can create another PR.

kurtextrem · 2023-10-08T18:01:35Z

Sounds like a plan! That's fine for me.

ref: https://github.com/colinhacks/zod/pull/2849/files Co-Authored-By: Tom Jenkinson <3259993+tjenkinson@users.noreply.github.com>

kurtextrem · 2023-10-15T13:04:04Z

Either this is not possible due to dynamic arguments of a schema or I don't understand what you mean. Maybe you can explain it in more detail and provide a minimal code example.

I think I was a bit confused, I meant more like the schema itself. Think about React lazy useState.

Example (from readme):

// Create login schema with email and password
const LoginSchema = object({
  email: string([email()]),
  password: string([minLength(8)]),
});

// later on user interaction:
parse(LoginSchema, { email: '', password: '' });

vs. lazy:

// Create login schema with email and password
const LoginSchema = () => object({
  email: string([email()]),
  password: string([minLength(8)]),
});

// later on user interaction
parse(LoginSchema, { email: '', password: '' });

In the 2nd code example, the LoginSchema would only be "generated" once the parse actually needs the schema. So if e.g. a user never submits the form where email & password are validated, their browser never has to create the schema.

kurtextrem · 2023-10-15T13:11:21Z

For the avoid spread topic, I played around a bit and I'm actually confident it would work (but is not backwards compatible). Example strict.ts:

return {
    ...schema,
    _parse(input, info) {
      const result = schema._parse(input, info);
      return !result.issues &&
        Object.keys(input as object).some((key) => !(key in schema.object)) // <-- ℹ️ note the use of `schema` here
        ? getSchemaIssues(
            info,
            'object',
            'strict',
            error || 'Invalid keys',
            input
          )
        : result;
    }
}

The returned object has spread schema into the returned schema, however, the _parse function uses the original schema, which means now two objects live inside the memory heap (original schema and cloned one, next to the newly returned schema).
So when changing it to e.g.:

return {
    _schema: schema,
    _parse(input, info) {
      const result = schema._parse(input, info);
      return !result.issues &&
        Object.keys(input as object).some((key) => !(key in schema.object)) // <-- ℹ️ we're still referencing the argument `schema`
        ? getSchemaIssues(
            info,
            'object',
            'strict',
            error || 'Invalid keys',
            input
          )
        : result;
    }
}

The code keeps working as intended. Only if someone called strict(object(...)).async = true, then it'd no longer work and it should be updated to strict(object(...))._schema.async = true (so this would be semver major, breaking).

That all being said, I think lazy evaluation instead of avoiding spreads is probably more beneficial.

fabian-hiller · 2023-10-15T16:43:01Z

In the 2nd code example, the LoginSchema would only be "generated" once the parse actually needs the schema. So if e.g. a user never submits the form where email & password are validated, their browser never has to create the schema.

Compared to Zod and ArkType, Valibot's initialization is very efficient. So I would not focus on it at the moment. But in the long run, we can look at it again.

The code keeps working as intended. Only if someone called strict(object(...)).async = true, then it'd no longer work and it should be updated to strict(object(...))._schema.async = true (so this would be semver major, breaking).

I would not go down this path at this time. I think copying a schema with the spread syntax is just right in this use case. With your approach the return value is now no longer of the type ObjectSchema. This has the consequence that the schema can no longer be used in other schemas or methods without restrictions.

fabian-hiller · 2023-10-15T16:46:17Z

Thank you for your contribution. I will try to review and merge this PR next week.

fabian-hiller

Rest looks good and ready to merge. Thank you!

fabian-hiller · 2023-10-16T04:12:10Z

library/src/validations/ipv4/ipv4.ts

 *
 * @param error The error message.
 *
 * @returns A validation function.
 */
 export function ipv4<TInput extends string>(error?: ErrorMessage) {
  return (input: TInput): PipeResult<TInput> =>
-    !/^((25[0-5]|(2[0-4]|1\d|[1-9]|)\d)\.?\b){4}$/.test(input)
+    !/^(?:(?:(?=(25[0-5]))\1|(?=(2[0-4]\d))\2|(?=(1\d{2}))\3|(?=(\d{1,2}))\4)\.){3}(?:(?=(25[0-5]))\5|(?=(2[0-4]\d))\6|(?=(1\d{2}))\7|(?=(\d{1,2}))\8)$/u.test(


What exactly was changed in this regex that it is now so much longer and more complicated? Did you do this by hand?

Nope, from the zod PR: colinhacks/zod#2849. The previous regex is vulnerable to redos attacks.

I checked our current regex on devina.io and redosdetector.com and it seems to be safe. Can you check it again?

https://redosdetector.com/?pattern=%5E%28%2825%5B0-5%5D%7C%282%5B0-4%5D%7C1%5Cd%7C%5B1-9%5D%7C%29%5Cd%29%5C.%3F%5Cb%29%7B4%7D%24&caseInsensitive=false&unicode=true

redosdetector is sensitive to / (devina.io isn't), maybe you copy pasted the leading and trailing slash in?

After taking a manual look, it appears like the regex is not vulnerable (tested also via Rescue and regexploit) and top voted on stackoverflow.

with capture groups, the regex is much faster: https://esbench.com/bench/6532596e7ff73700a4debb6a than the one from e.g. zod or fastify

redosdetector says that both, the new and the old regex, are insecure: https://redosdetector.com/?pattern=%5E%28%3F%3A%28%3F%3A25%5B0-5%5D%7C%28%3F%3A2%5B0-4%5D%7C1%5Cd%7C%5B1-9%5D%29%3F%5Cd%29%5C.%3F%5Cb%29%7B4%7D%24&caseInsensitive=false&unicode=false

Therefore, I am unsure how to proceed. Basically, I could merge the PR anyway and we'll look at that in a separate issue or PR.

Yeah, only the one from the zod PR isn't flagged by redosdetector (the one you commented on). As mentioned in my comment above, I'd say let's ignore it for now

Hey!

So the reason it’s flagging it is because it’s hitting the max backtrack limit given there are so many permutations. You can see if you change the number of groups to {2} it’s fine and {3} is if you increase the number of steps with redos-detector check --unicode "^(?:(?:25[0-5]|(?:2[0-4]|1\d|[1-9])?\d)\.?\b){3}$" --maxBacktracks -1 --maxSteps -1. With {4} though it would take too long to finish.

The larger regex that pollyfills atomic groups does technically make a difference. E.g. if the input was 255.2! I think it will backtrack with the following:

255.2 = 25[0-5]\.2

255.2 = 25[0-5]\.[1-9]

255.2 = 25[0-5]\.\d

25 = 2[0-4]

2 = 2

2 = [1-9]

2 = \d

Whereas with the pollyfilled atomic group it would be just:

255.2 = 25[0-5]\.2

255.2 = 25[0-5]\.[1-9]

255.2 = 25[0-5]\.\d

With 255.255.255.25! there would be a bigger difference (although probably not a concerning level).

Also if we look at one part in reality it wouldn’t be that high because it doesn’t group the results into ones that can match the same string. E.g. with ^(1|1|2|2)$ the tool will currently report 2 backtracks even though there can only be 1.

Definitely some room for improvement there. Created tjenkinson/redos-detector#445

Thank you for checking in!

library/src/validations/ipv6/ipv6.ts

library/src/validations/ulid/ulid.ts

library/tsconfig.json

fastest variant

kurtextrem added 2 commits October 7, 2023 11:05

chore: add eslint-config-canonical, eslint-plugin-regexp, eslint-plug…

b104b29

…in-security, eslint-plugin-unicorn

apply --fix & manually fix some errors

dafc3be

correct typo in comment

4f34bad

fabian-hiller reviewed Oct 7, 2023

View reviewed changes

fabian-hiller self-assigned this Oct 7, 2023

fabian-hiller added enhancement New feature or request priority This has priority labels Oct 7, 2023

fabian-hiller mentioned this pull request Oct 7, 2023

Change .reduce to for...of loop to improve performance #203

Closed

kazizi55 mentioned this pull request Oct 9, 2023

ci: add lint and format check #208

Merged

kurtextrem and others added 2 commits October 15, 2023 14:27

remove canonical & unicorn

059c471

update ipv4 regex to the one from @tjenkinson

d52acfa

ref: https://github.com/colinhacks/zod/pull/2849/files Co-Authored-By: Tom Jenkinson <3259993+tjenkinson@users.noreply.github.com>

kurtextrem changed the title ~~chore: add more eslint plugins~~ chore: add regexp related eslint plugins Oct 15, 2023

kurtextrem mentioned this pull request Oct 15, 2023

chore: add eslint unicorn plugin #213

Draft

fabian-hiller requested changes Oct 16, 2023

View reviewed changes

kurtextrem added 4 commits October 16, 2023 18:23

undo tsconfig change

00a212b

use i mod to make regex shorter

e937d58

Update ipv4 regex

c8fc071

fastest variant

update ipv4 regex & disable redos-detector

7ce200c

fabian-hiller approved these changes Oct 20, 2023

View reviewed changes

fabian-hiller and others added 2 commits October 20, 2023 13:36

Restructure ESLint config and update changelog

31ac6d3

Merge branch 'main' into eslint

e1969e3

fabian-hiller merged commit 79dc7f3 into fabian-hiller:main Oct 20, 2023
1 check passed

tjenkinson mentioned this pull request Oct 21, 2023

Add eslint plugin for detecting redos vulnerabilities and tweak regex ip regex to prevent backtracks colinhacks/zod#2849

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore: add regexp related eslint plugins #202

chore: add regexp related eslint plugins #202

kurtextrem commented Oct 7, 2023 •

edited

Loading

netlify bot commented Oct 7, 2023 •

edited

Loading

fabian-hiller commented Oct 7, 2023

fabian-hiller left a comment

fabian-hiller Oct 7, 2023

kurtextrem Oct 7, 2023

fabian-hiller Oct 7, 2023 •

edited

Loading

kurtextrem Oct 8, 2023

kurtextrem Oct 8, 2023

fabian-hiller Oct 8, 2023

kurtextrem Oct 15, 2023

kurtextrem commented Oct 7, 2023

fabian-hiller commented Oct 7, 2023

kurtextrem commented Oct 8, 2023

fabian-hiller commented Oct 8, 2023

fabian-hiller commented Oct 8, 2023

kurtextrem commented Oct 8, 2023 •

edited

Loading

kurtextrem commented Oct 15, 2023

kurtextrem commented Oct 15, 2023 •

edited

Loading

fabian-hiller commented Oct 15, 2023

fabian-hiller commented Oct 15, 2023 •

edited

Loading

fabian-hiller left a comment

fabian-hiller Oct 16, 2023

kurtextrem Oct 16, 2023

fabian-hiller Oct 20, 2023

kurtextrem Oct 20, 2023

kurtextrem Oct 20, 2023

fabian-hiller Oct 20, 2023

kurtextrem Oct 20, 2023

tjenkinson Oct 21, 2023

kurtextrem Oct 21, 2023

chore: add regexp related eslint plugins #202

chore: add regexp related eslint plugins #202

Conversation

kurtextrem commented Oct 7, 2023 • edited Loading

netlify bot commented Oct 7, 2023 • edited Loading

✅ Deploy Preview for valibot ready!

fabian-hiller commented Oct 7, 2023

fabian-hiller left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

fabian-hiller Oct 7, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kurtextrem commented Oct 7, 2023

fabian-hiller commented Oct 7, 2023

kurtextrem commented Oct 8, 2023

fabian-hiller commented Oct 8, 2023

fabian-hiller commented Oct 8, 2023

kurtextrem commented Oct 8, 2023 • edited Loading

kurtextrem commented Oct 15, 2023

kurtextrem commented Oct 15, 2023 • edited Loading

fabian-hiller commented Oct 15, 2023

fabian-hiller commented Oct 15, 2023 • edited Loading

fabian-hiller left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kurtextrem commented Oct 7, 2023 •

edited

Loading

netlify bot commented Oct 7, 2023 •

edited

Loading

fabian-hiller Oct 7, 2023 •

edited

Loading

kurtextrem commented Oct 8, 2023 •

edited

Loading

kurtextrem commented Oct 15, 2023 •

edited

Loading

fabian-hiller commented Oct 15, 2023 •

edited

Loading