Consider replacing Regex with simpler scanner #228
@cloneable this seems like a great idea! Any chance you'd like to open a PR on it?
Ok, I'll give it a shot. Seems like a good chance to improve my Rust in this area.
@cloneable Great to hear!
Great, thanks! The current pattern for an env var name is this: (Btw, I believe the pipes inside the brackets are matched literally too. You probably meant to use parentheses, or no pipes inside the brackets. Also, the period loses its special meaning inside brackets and doesn't need to be escaped. The first \w doesn't need to be in brackets.) Question: \w actually implies a lot, including \d. I guess the original intent was to match a letter at the beginning, followed by zero or more letters, digits, periods and underscores. One could argue that it's now too late to restrict this, but let me know if you want to do that, as it would simplify my code. https://docs.rs/regex/1.5.4/regex/#perl-character-classes-unicode-friendly
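A hand-written check for the "original intent" reading above could be as small as the sketch below. It assumes the tightened, ASCII-only interpretation (one letter, then letters, digits, periods or underscores), which is stricter than a \w-based pattern: \w is Unicode-aware and also accepts a digit or underscore as the first character. The name is_env_var_name is hypothetical and not part of log4rs.

```rust
/// Hypothetical helper (not a log4rs API): returns true if `name` is one
/// ASCII letter followed by zero or more ASCII letters, digits, '.' or '_'.
fn is_env_var_name(name: &str) -> bool {
    let mut chars = name.chars();
    // The first character must be a letter; this also rejects "".
    match chars.next() {
        Some(c) if c.is_ascii_alphabetic() => {}
        _ => return false,
    }
    // Every remaining character must be a letter, digit, '.' or '_'.
    chars.all(|c| c.is_ascii_alphanumeric() || c == '.' || c == '_')
}
```

No backtracking or character-class tables are needed: a single pass over the characters decides the match.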
@cloneable I agree with your assessment. I think the logic should basically be replaced entirely with a dumb find that scans from "$ENV{" to the first "}". I don't think we really care whether the thing inside "$ENV{...}" is valid.
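A minimal sketch of the "dumb find" approach, assuming the name is simply everything between "$ENV{" and the next "}". The function expand_env_dumb and its lookup parameter are hypothetical; lookup stands in for the environment so the logic can be tested, and keeping unresolved references verbatim is an assumption about the desired behavior, not confirmed log4rs behavior.

```rust
/// Hypothetical "dumb find" expansion: every `$ENV{...}` is replaced by
/// `lookup(name)`, where `name` is everything between `$ENV{` and the next
/// `}`. The name itself is not validated at all.
fn expand_env_dumb<F>(input: &str, lookup: F) -> String
where
    F: Fn(&str) -> Option<String>,
{
    const OPEN: &str = "$ENV{";
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(start) = rest.find(OPEN) {
        // Copy the text before the marker unchanged.
        out.push_str(&rest[..start]);
        let after_open = &rest[start + OPEN.len()..];
        match after_open.find('}') {
            Some(end) => {
                let name = &after_open[..end];
                match lookup(name) {
                    Some(value) => out.push_str(&value),
                    // Assumption: unresolved references are kept verbatim.
                    None => {
                        out.push_str(OPEN);
                        out.push_str(name);
                        out.push('}');
                    }
                }
                // Continue scanning after the closing brace.
                rest = &after_open[end + 1..];
            }
            None => {
                // No closing brace: keep the remainder as-is and stop.
                out.push_str(&rest[start..]);
                return out;
            }
        }
    }
    out.push_str(rest);
    out
}
```

In real code the lookup could be `|name| std::env::var(name).ok()`. With this approach `$ENV{}` yields an empty name, and `$ENV{endpoint-$ENV{instance_number}}` yields the name `endpoint-$ENV{instance_number`, which is the concern raised in the following comment.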
OK, I'll adjust the draft PR. |
@estk, allowing anything between { } means we allow $ENV{} too. Users might think it's possible to use nested $ENVs, e.g. $ENV{endpoint-$ENV{instance_number}}. I think it's best to replicate what the regular expression currently allows. Or tighten that to ASCII and additionally allow the current behavior via a feature. Wdyt?
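A sketch of the "replicate the regex, tightened to ASCII" option: a scanner that only accepts a reference when "$ENV{" is followed by a valid name and then immediately by "}". Anything else is copied literally and scanning resumes right after the "$ENV{", which, as far as I can tell, mirrors how a regex replace_all skips a failed candidate and keeps searching. The functions expand_env_strict and valid_name_len and the lookup parameter are hypothetical, and keeping unresolved names verbatim is an assumption.

```rust
/// Hypothetical: byte length of the longest valid name at the start of `s`
/// (ASCII letter, then ASCII letters, digits, '.' or '_'); 0 if none.
fn valid_name_len(s: &str) -> usize {
    let bytes = s.as_bytes();
    if bytes.first().map_or(true, |b| !b.is_ascii_alphabetic()) {
        return 0;
    }
    1 + bytes[1..]
        .iter()
        .take_while(|&&b| b.is_ascii_alphanumeric() || b == b'.' || b == b'_')
        .count()
}

/// Hypothetical strict expansion: only `$ENV{<valid name>}` is replaced.
fn expand_env_strict<F>(input: &str, lookup: F) -> String
where
    F: Fn(&str) -> Option<String>,
{
    const OPEN: &str = "$ENV{";
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(start) = rest.find(OPEN) {
        out.push_str(&rest[..start]);
        let after_open = &rest[start + OPEN.len()..];
        // Names are ASCII, so `len` is always on a char boundary.
        let len = valid_name_len(after_open);
        if len > 0 && after_open[len..].starts_with('}') {
            let name = &after_open[..len];
            match lookup(name) {
                Some(value) => out.push_str(&value),
                // Assumption: unresolved references are kept verbatim.
                None => {
                    out.push_str(OPEN);
                    out.push_str(name);
                    out.push('}');
                }
            }
            rest = &after_open[len + 1..];
        } else {
            // Not a valid reference: emit the marker literally and keep
            // scanning after it, so an inner `$ENV{...}` can still match.
            out.push_str(OPEN);
            rest = after_open;
        }
    }
    out.push_str(rest);
    out
}
```

With this rule `$ENV{}` stays literal, and in the nested example only the inner `$ENV{instance_number}` is expanded; there is no recursive lookup.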
Sounds good to me, certainly no need to do recursive lookup.
On Sat, Aug 7, 2021 at 04:36 Folke Behrens wrote:
@estk, allowing anything between { } means we allow $ENV{} too. Users might think it's possible to use nested $ENVs, e.g. $ENV{endpoint-$ENV{instance_number}}.
The new code would try to look for an env var "endpoint-$ENV{instance_number". (I can try to turn the code into a recursive lookup, but I kinda don't want this to turn into a rabbit hole.)
I think it's best to replicate what the regular expression currently allows. Or tighten that to ASCII and additionally allow the current behavior via a feature. Wdyt?
I was checking a binary with cargo bloat and, to my surprise, noticed that log4rs pulled in regex, which turned out to be 4x as large as log4rs itself. So I was curious what log4rs uses regex for. I could only find one line with a regex matcher, and it's not even that complex.
So I'm wondering whether you have more plans for the regex crate, or whether you would be willing to replace this one use with a handwritten scanner?