provide a helper routine for building a static regex using std::lazy #709
Comments
I don't understand your feature request and I don't understand your diff. Please include more details. Maybe show how the feature you're requesting is intended to be used? |
SyncOnceCell is a nightly std feature. I don't think the regex crate needs to introduce a macro that depends on nightly, lazy_static, or the like. |
OK, so I didn't even know that In any case, we've gone years without this and I see no reason to get impatient now. Once And to head off the question that I keep getting from people: no |
JFYI, this exists as a crate: https://crates.io/crates/regex-macro |
I'd rather wait to revisit this until With that said, it's not clear to me that this is something we want to do. It's pretty painless already to compose the
static RE_GROUP: Lazy<Regex> =
    Lazy::new(|| Regex::new(r"^[-A-Za-z0-9]+$").unwrap());
That's already pretty good IMO. So I think if someone wants to propose a new regex-specific API here, whatever they come up with is going to need to have a meaningful advantage over the above. |
I'd maybe push back against this a tiny bit. To give a concrete example from git-branchless, both this:
lazy_static! {
    static ref RE: Regex = Regex::new(r"^([^ ]+) (.+)$").unwrap();
};
match RE.captures(line) {
and this:
static RE: Lazy<Regex> = Lazy::new(|| Regex::new(r"^([^ ]+) (.+)$").unwrap());
match RE.captures(line) {
do look to me meaningfully worse than a hypothetical:
match regex!(r"^([^ ]+) (.+)$").captures(line) {
For this kind of code, the use of At the same time, yes, absolutely, at the moment this issue isn't really actionable on the regex side, because the underlying lazy evaluation primitive isn't yet available in std. And also the implementation is really tiny and is available on crates.io and even in the once_cell docs (https://docs.rs/once_cell/latest/once_cell/#lazily-compiled-regex), so there's no practical reason for |
Yeah I'd love to re-open this and revisit it. I do personally like the The other angle to put on this here is that |
EDIT: I updated this comment on 2024-01-23. I would love to hear folks' thoughts on this. OK, so now that
I believe the macro would be defined like this:
macro_rules! regex {
    ($re:literal $(,)?) => {{
        use {std::sync::OnceLock, regex::Regex};
        static RE: OnceLock<Regex> = OnceLock::new();
        RE.get_or_init(|| Regex::new($re).unwrap())
    }};
}
Its return type is a I have some other concerns too that I've thought of:
fn regex(pattern: &'static str) -> &'static regex::Regex {
use std::sync::OnceLock;
static RE: OnceLock<regex::Regex> = OnceLock::new();
RE.get_or_init(|| regex::Regex::new(pattern).unwrap())
}
(A function doesn't work. This has to be a macro. See @matklad's rebuttal below.) Note that if you're reading this and wondering "why not just make compile time regex instead," please see #1076 and #1012. This macro is more about codifying a very common pattern that is used to avoid the footgun of compiling a regex in a loop. (A compile time regex would also solve that problem, but as the issues linked explain, there is a lot more to it than the simple macro proposed here that a bunch of folks are already using in one form or another.) |
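To make the "compiling a regex in a loop" footgun concrete, here is a minimal std-only sketch of the same OnceLock pattern. `FakeRegex`, `COMPILATIONS`, `fake_regex!`, and `count_matches` are hypothetical names invented for this illustration (none of them exist in the regex crate), and the toy `is_match` is plain substring search. The only point is that the initializer runs once, however many times the loop body executes.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical stand-in for `regex::Regex`, so the sketch needs only std.
pub struct FakeRegex {
    pattern: &'static str,
}

/// Counts how many times `FakeRegex::new` has run (illustration only).
pub static COMPILATIONS: AtomicUsize = AtomicUsize::new(0);

impl FakeRegex {
    /// Pretend "compilation": records the call and stores the pattern.
    pub fn new(pattern: &'static str) -> Result<FakeRegex, String> {
        COMPILATIONS.fetch_add(1, Ordering::SeqCst);
        Ok(FakeRegex { pattern })
    }

    /// Toy matcher: substring containment, not real regex semantics.
    pub fn is_match(&self, haystack: &str) -> bool {
        haystack.contains(self.pattern)
    }
}

/// Same shape as the macro quoted above, with `FakeRegex` swapped in.
macro_rules! fake_regex {
    ($re:literal $(,)?) => {{
        static RE: OnceLock<FakeRegex> = OnceLock::new();
        RE.get_or_init(|| FakeRegex::new($re).unwrap())
    }};
}

/// The macro sits inside the loop, yet the pattern is built only once.
pub fn count_matches(lines: &[&str]) -> usize {
    let mut n = 0;
    for line in lines {
        if fake_regex!("foo").is_match(line) {
            n += 1;
        }
    }
    n
}
```

With the real crate, one would presumably swap `FakeRegex` for `regex::Regex`, as in the macro quoted above.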
I was also thinking that we can assuage some semantics concerns here by some creative naming. Something like
regex::compile_lazy!("[a-z]+")
regex::compile_once!("[a-z]+")
might read nicely and be self-explanatory. |
Yeah that might be a more precise name, but is annoyingly verbose IMO. :) |
@BurntSushi wrote:
What about adding a feature flag to enable the macro, documenting that if you set that feature flag the MSRV increases, and then making the macro unconditionally available once the crate's overall MSRV increases? |
That sadly works for exactly one regex :-)
let foo = regex("foo");
let foo_again = regex("bar");
I could imagine something like
fn regex<const pattern: &str>() -> &'static regex::Regex {
    // Somehow ask the compiler to monomorphise a version of RE for each
    // instantiation of regex
    static RE: OnceLock<regex::Regex> = OnceLock::new();
    ...
}
maybe working in Rust one day, but that seems unlikely. |
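For anyone who wants to see the "exactly one regex" problem without pulling in the crate, here is a rough std-only sketch. It stores a `String` instead of a `Regex`, and `shared_cell`, `per_site_cell!`, `function_results`, and `macro_results` are hypothetical names made up for the illustration. The function has a single `static` shared by all callers, so the first pattern wins; each macro expansion declares its own `static`, so each call site keeps its own value.

```rust
use std::sync::OnceLock;

/// Function version: one `static` shared by every caller.
/// Whichever pattern arrives first is the one everybody gets back.
pub fn shared_cell(pattern: &'static str) -> &'static String {
    static CELL: OnceLock<String> = OnceLock::new();
    CELL.get_or_init(|| pattern.to_string())
}

/// Macro version: every expansion declares a fresh `static`,
/// so each call site is initialized independently.
macro_rules! per_site_cell {
    ($pattern:literal) => {{
        static CELL: OnceLock<String> = OnceLock::new();
        CELL.get_or_init(|| $pattern.to_string())
    }};
}

/// Mirrors the two calls from the comment above, using the function.
pub fn function_results() -> (&'static str, &'static str) {
    let foo = shared_cell("foo");
    let foo_again = shared_cell("bar"); // still yields the first pattern
    (foo.as_str(), foo_again.as_str())
}

/// The same two calls, using the macro.
pub fn macro_results() -> (&'static str, &'static str) {
    let foo = per_site_cell!("foo");
    let bar = per_site_cell!("bar");
    (foo.as_str(), bar.as_str())
}
```

Assuming nothing else has called `shared_cell` beforehand, `function_results()` hands back the first pattern twice, while `macro_results()` returns the two distinct patterns. That is the reason this has to be a macro rather than a function.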
Derp. Right.
I suppose, but I'm not sure it's worth it. And that feature will have to stick around forever (or until |
I think this is an awesome idea! With regards to naming, I think
Regarding verbosity, I don't think it should be that big of a factor. I mean anything will be better than the current pattern, which is a minimum of a few lines to import OnceCell/OnceLock/lazy_static, set it up, and use it. If more clarity can exist in the api through a descriptive name, then I think prioritizing that over aesthetics isn't so bad. |
This would be the main feature I would want from a This could be achieved by defining a procedural macro that runs Of course, people always associate long compile times with procedural macros. The described macro could be very light though: the only tricky part is to get the value from the string literal (unescaping), which is implemented in Summary: I think your concerns about "compile time checking is expected" are valid and I think that should be the end-goal of a |
Pretty strongly opposed. For the following reasons:
|
I would disagree. Take this argument a few steps further and it sounds like "why type checking if you have unit tests". Your tests would need to hit the branch using that regex for this to work. When reviewing a PR where regexes are changed, a compile-time-checked regex would give me confidence that all still compile, without having to check whether all these regexes are covered by tests. But: your other points make total sense to me and I certainly don't want to derail this discussion. I didn't know about the clippy lint and yes, that would give me all the "PR review confidence" I described above. I still think |
But that's not my argument. I didn't take it a few steps further. I looked at this specific thing in context. My argument doesn't generalize. Consider also, for example, "is this regex syntax valid," is usually far less interesting of a question to ask than "does this regex have the semantics I intend." If you're testing out the latter, then you've already automatically cleared the hurdle of the former.
I'm still not a fan of a verbose name for this macro. :-\ I like Also, every time I define this macro for myself, I call it |
Introduce a macro that lazily initialises a static regex; call it lazy_regex. This is helpful whenever you want to avoid the compile-at-each-iteration pattern when using regexes across threads.
The code is simply,
or
Which is only a slight alteration from the one provided in the once_cell documentation. There are many possible variations of this, since there are many ways to lazily initialise a static variable. I think this is common enough that it should be included. Thank you.