chore: add regexp related eslint plugins #202
…in-security, eslint-plugin-unicorn
Thank you for your contribution!
That's fine!
I have not tested it yet, but I am open to this change. It should have little effect on the bundle size due to compression. Do you want to take a look at it or should I do it?
The problem is that the methods that currently use spread copy the schema and only modify the
Great catch! I would first check with a benchmark whether your assumption is correct, and if so, I would initialize the regexes lazily, because then only those that are actually used get initialized.
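A minimal sketch of what lazy initialization could look like, assuming a small memoizing helper. The names `lazy`, `getDigitsRegex`, `isDigits` and `compileCount` are hypothetical and not part of valibot; the counter exists only to make the laziness observable.

```typescript
// Hypothetical helper (not a valibot API): runs `create` on the first call
// and returns the cached value on every later call.
function lazy<T>(create: () => T): () => T {
  let cached: T | undefined;
  let created = false;
  return () => {
    if (!created) {
      cached = create();
      created = true;
    }
    return cached as T;
  };
}

// Counts how often the regex is actually constructed (illustration only).
let compileCount = 0;

// Illustrative pattern, not one of valibot's regexes.
const getDigitsRegex = lazy(() => {
  compileCount += 1;
  return /^\d+$/u;
});

// The regex is only built when this function runs for the first time.
function isDigits(input: string): boolean {
  return getDigitsRegex().test(input);
}
```

With this shape, a regex whose validation is never called is never constructed. Whether that is measurably faster than a plain module-level constant is exactly what the benchmark mentioned above would have to show.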
Thanks again for your contribution!
library/src/validations/imei/imei.ts
Outdated
@@ -12,8 +12,8 @@ import { getOutput, getPipeIssues, isLuhnAlgo } from '../../utils/index.ts';
 */
export function imei<TInput extends string>(error?: ErrorMessage) {
  return (input: TInput): PipeResult<TInput> =>
    !/^\d{2}[ |/|-]?\d{6}[ |/|-]?\d{6}[ |/|-]?\d$/.test(input) ||
    !isLuhnAlgo(input)
    // eslint-disable-next-line security/detect-unsafe-regex -- false positive according to https://devina.io/redos-checker
If this rule is so buggy, we should consider removing it in general or create an issue on GitHub.
Yeah.. this is one of the longest standing issues of the security plugin: eslint-community/eslint-plugin-security#28
Two alternatives exist:
- https://makenowjust-labs.github.io/recheck/docs/usage/as-eslint-plugin/ (discussed in that thread as well and wasn't picked because of some slow runs; this might have been fixed with the async run introduced recently?)
- https://github.com/tjenkinson/eslint-plugin-redos-detector (not mentioned in the thread)
For my personal config I've opted to stick with eslint-plugin-security until they've sorted out what to use, but I wasn't aware of eslint-plugin-redos-detector until now. Maybe worth trying out. I'll also mention that one in the thread above.
As long as there is no fix, I prefer to disable or replace this linting rule globally.
I'll play with the two plugins and update the PR accordingly
colinhacks/zod#2849 the zod PR is also interesting for this PR I'd say
Yes we should keep an eye on this
I decided to go with redos-detector, since it seems to find more genuine (non-false-positive) ReDoS regexps.
If you got time, feel free to, as you can probably judge better how much impact it has. In any case, I can provide this benchmark: https://www.measurethat.net/Benchmarks/Show/2472/0/arrayreduce-vs-for-loop-vs-arrayforeach, where reduce is much slower. By removing
Backwards compat you mean?
Through tree-shaking, regexps that aren't used shouldn't execute anyway, or am I missing something?
const regex = /.../;
export const isABC = () => () => regex.test(...)
if
To be sure, we could check the flamegraph of the startup to figure out if the simple approach significantly slows down the startup. Back in 2019, big regexps definitely did slow down: garycourt/uri-js#40. However, the v8 blog above is from half a year later, so I do think there is a chance this is much better nowadays.
Unfortunately the uri-js issue made me realize the current regexp used for checking IPv6 in valibot might not be correct, as per garycourt/uri-js#40 (comment). The fastify team is now also using a different approach to parse IPv6 addresses: https://github.com/fastify/fast-uri/blob/main/lib/utils.js#L27. And they have a fast path for rejecting invalid IPv4 addresses (https://github.com/fastify/fast-uri/blob/main/lib/utils.js#L6).
Yes, I will do that. I have created an issue: #203
No, the point is that schemas must be able to be passed to other schemas and methods in a type-safe manner. This requires a consistent approach. Therefore, we cannot simply change the structure. Also, the modification methods should only modify and return a schema instead of producing a new schema or format.
The problem here is that if But as you already mentioned, we would check if all the extra work really improves the performance.
Can you create an issue or PR?
Thank you, I've also created an issue for the ipv6 issue: #206
As far as I can tell we can control every schema, so if we were to rework every schema to use
Agreed.
Feel free to take a look at it. For now, I think spread is fine here. It is called only once for initialization. The difference without spread should not be noticeable in the real world, away from specific benchmarks.
Either this is not possible due to dynamic arguments of a schema or I don't understand what you mean. Maybe you can explain it in more detail and provide a minimal code example.
To get a faster merge, I can offer you to remove |
Sounds like a plan! That's fine for me.
ref: https://github.com/colinhacks/zod/pull/2849/files Co-Authored-By: Tom Jenkinson <3259993+tjenkinson@users.noreply.github.com>
I think I was a bit confused; I meant more like the schema itself. Think about React lazy.
Example (from readme):
// Create login schema with email and password
const LoginSchema = object({
email: string([email()]),
password: string([minLength(8)]),
});
// later on user interaction:
parse(LoginSchema, { email: '', password: '' });
vs. lazy:
// Create login schema with email and password
const LoginSchema = () => object({
email: string([email()]),
password: string([minLength(8)]),
});
// later on user interaction
parse(LoginSchema, { email: '', password: '' });
In the 2nd code example, the LoginSchema would only be "generated" once the parse actually needs the schema. So if e.g. a user never submits the form where email & password are validated, their browser never has to create the schema.
On the topic of avoiding spread, I played around a bit and I'm actually confident it would work (but it is not backwards compatible). Example:
return {
...schema,
_parse(input, info) {
const result = schema._parse(input, info);
return !result.issues &&
Object.keys(input as object).some((key) => !(key in schema.object)) // <-- ℹ️ note the use of `schema` here
? getSchemaIssues(
info,
'object',
'strict',
error || 'Invalid keys',
input
)
: result;
}
}
The returned object has spread
return {
_schema: schema,
_parse(input, info) {
const result = schema._parse(input, info);
return !result.issues &&
Object.keys(input as object).some((key) => !(key in schema.object)) // <-- ℹ️ we're still referencing the argument `schema`
? getSchemaIssues(
info,
'object',
'strict',
error || 'Invalid keys',
input
)
: result;
}
}
The code keeps working as intended. Only if someone called
That all being said, I think lazy evaluation instead of avoiding spreads is probably more beneficial.
Compared to Zod and ArkType, Valibot's initialization is very efficient. So I would not focus on it at the moment. But in the long run, we can look at it again.
I would not go down this path at this time. I think copying a schema with the spread syntax is just right in this use case. With your approach the return value is now no longer of the type |
Thank you for your contribution. I will try to review and merge this PR next week.
Rest looks good and ready to merge. Thank you!
library/src/validations/ipv4/ipv4.ts
Outdated
 *
 * @param error The error message.
 *
 * @returns A validation function.
 */
export function ipv4<TInput extends string>(error?: ErrorMessage) {
  return (input: TInput): PipeResult<TInput> =>
    !/^((25[0-5]|(2[0-4]|1\d|[1-9]|)\d)\.?\b){4}$/.test(input)
    !/^(?:(?:(?=(25[0-5]))\1|(?=(2[0-4]\d))\2|(?=(1\d{2}))\3|(?=(\d{1,2}))\4)\.){3}(?:(?=(25[0-5]))\5|(?=(2[0-4]\d))\6|(?=(1\d{2}))\7|(?=(\d{1,2}))\8)$/u.test(
What exactly was changed in this regex that it is now so much longer and more complicated? Did you do this by hand?
Nope, from the zod PR: colinhacks/zod#2849. The previous regex is vulnerable to redos attacks.
I checked our current regex on devina.io and redosdetector.com and it seems to be safe. Can you check it again?
redosdetector is sensitive to / (devina.io isn't), maybe you copy-pasted the leading and trailing slash in?
After taking a manual look, the regex does not appear to be vulnerable (also tested via Rescue and regexploit), and it is the top-voted one on Stack Overflow.
With capture groups, the regex is much faster than the one from e.g. zod or fastify: https://esbench.com/bench/6532596e7ff73700a4debb6a
redosdetector says that both the new and the old regex are insecure: https://redosdetector.com/?pattern=%5E%28%3F%3A%28%3F%3A25%5B0-5%5D%7C%28%3F%3A2%5B0-4%5D%7C1%5Cd%7C%5B1-9%5D%29%3F%5Cd%29%5C.%3F%5Cb%29%7B4%7D%24&caseInsensitive=false&unicode=false
Therefore, I am unsure how to proceed. Basically, I could merge the PR anyway and we'll look at that in a separate issue or PR.
Yeah, only the one from the zod PR isn't flagged by redosdetector (the one you commented on). As mentioned in my comment above, I'd say let's ignore it for now
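For reference, a small sketch that runs the two IPv4 patterns from the ipv4.ts diff above against a few sample inputs. The variable names and the sample list are illustrative assumptions. The sketch only shows that both patterns agree on these particular inputs; it does not show that they are equivalent in general, and it says nothing about ReDoS safety.

```typescript
// Shorter pattern from the ipv4.ts diff above.
const shortIpv4 = /^((25[0-5]|(2[0-4]|1\d|[1-9]|)\d)\.?\b){4}$/;

// Longer pattern from the same diff (per the thread, taken from the zod PR).
// Each `(?=(X))\N` pair is a lookahead plus backreference, which emulates an atomic group.
const longIpv4 =
  /^(?:(?:(?=(25[0-5]))\1|(?=(2[0-4]\d))\2|(?=(1\d{2}))\3|(?=(\d{1,2}))\4)\.){3}(?:(?=(25[0-5]))\5|(?=(2[0-4]\d))\6|(?=(1\d{2}))\7|(?=(\d{1,2}))\8)$/u;

// Hand-picked sample inputs (an assumption for illustration, not a test suite).
const ipv4Samples = [
  '192.168.1.1',
  '255.255.255.255',
  '256.1.1.1',
  '1.2.3',
  '1.2.3.4.',
];

// Evaluate both patterns on every sample.
const ipv4Results = ipv4Samples.map((input) => ({
  input,
  short: shortIpv4.test(input),
  long: longIpv4.test(input),
}));

console.log(ipv4Results);
```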
Hey!
So the reason it’s flagging it is that it’s hitting the max backtrack limit, given there are so many permutations. You can see that if you change the number of groups to {2} it’s fine, and {3} is too if you increase the number of steps with redos-detector check --unicode "^(?:(?:25[0-5]|(?:2[0-4]|1\d|[1-9])?\d)\.?\b){3}$" --maxBacktracks -1 --maxSteps -1. With {4}, though, it would take too long to finish.
The larger regex that polyfills atomic groups does technically make a difference. E.g. if the input was 255.2! I think it will backtrack with the following:
- 255.2 = 25[0-5]\.2
- 255.2 = 25[0-5]\.[1-9]
- 255.2 = 25[0-5]\.\d
- 25 = 2[0-4]
- 2 = 2
- 2 = [1-9]
- 2 = \d
Whereas with the polyfilled atomic group it would be just:
- 255.2 = 25[0-5]\.2
- 255.2 = 25[0-5]\.[1-9]
- 255.2 = 25[0-5]\.\d
With 255.255.255.25! there would be a bigger difference (although probably not at a concerning level).
Also, if we look at one part, in reality the count wouldn’t be that high, because the tool doesn’t group the results into ones that can match the same string. E.g. with ^(1|1|2|2)$ the tool will currently report 2 backtracks even though there can only be 1.
Definitely some room for improvement there. Created tjenkinson/redos-detector#445
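To illustrate the atomic-group polyfill mentioned above: JavaScript has no native atomic groups, but a lookahead is not re-entered once it has matched, so `(?=(X))\1` behaves like an atomic `X`. The following is a minimal sketch with a deliberately tiny pattern; the names `backtrackingPattern` and `atomicPattern` are illustrative.

```typescript
// Plain non-capturing group: `a+` can give characters back when the rest of the
// pattern fails, so the engine retries with a shorter run of "a".
const backtrackingPattern = /^(?:a+)ab$/u;

// Emulated atomic group: the lookahead captures the longest run of "a" once,
// and `\1` must then consume exactly that text. No shorter run is ever tried.
const atomicPattern = /^(?=(a+))\1ab$/u;

// "aaab": the plain group backtracks from "aaa" to "aa" and then matches "ab".
const plainMatches = backtrackingPattern.test('aaab');

// "aaab": the atomic version keeps "aaa", then fails on the literal "a".
const atomicMatches = atomicPattern.test('aaab');

console.log({ plainMatches, atomicMatches });
```

Here the plain group matches and the emulated atomic group does not, which shows the difference in semantics: the atomic form trades some matches for fewer backtracking steps. In the longer IPv4 pattern the alternatives are written so that this trade does not reject valid addresses.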
Thank you for checking in!
As discussed, this adds a few eslint plugins.
The IP/IPv6 regexes get a bit bigger, because they had a lot of unused capture groups, so every capture group is now non-capturing (via ?:).
By the way, while going over the ruleset of the unicorn plugin, I've found two rules:
reduce calls already?
{ _schema: schema, _parse: () => (...) }. Other functions then could check e.g. !(key in schema._schema.object))
And additionally, it currently looks to me like regexes are created every time a schema with e.g. ipv4 is validated. Creating regexps is expensive and caching them would be faster. This is also true for serverless, where you're billed for run costs (and not startup costs). So moving the regex creation outside moves the regex compilation time to the startup time (so the regex is compiled only once instead of on every invocation).
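A minimal sketch of moving the regex creation outside of the validation function, assuming a simplified validation factory. `HEX_COLOR_REGEX` and `hexColor` are hypothetical names, the pattern is not one of valibot's regexes, and the boolean return type is a simplification of the library's real pipe result.

```typescript
// Created once when the module is evaluated and shared by every validator.
// Illustrative pattern only.
const HEX_COLOR_REGEX = /^#[\da-f]{6}$/iu;

// Hypothetical validation factory: the returned function closes over the
// module-level regex instead of evaluating a regex literal on every call.
function hexColor(): (input: string) => boolean {
  return (input) => HEX_COLOR_REGEX.test(input);
}

const isHexColor = hexColor();
console.log(isHexColor('#00ff7F'), isHexColor('#fff'));
```

Sharing one instance like this is safe here because the regex has neither the `g` nor the `y` flag, so `test` keeps no `lastIndex` state between calls. Whether the hoisting gives a measurable speedup would still need to be confirmed with a benchmark.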