Idea: compile-time verification of the regular expressions #607

Closed
vorner opened this issue Aug 8, 2019 · 18 comments

Comments

@vorner

vorner commented Aug 8, 2019

Hello

I know the compile-time regular expressions that were provided by regex-macros are no longer supported. I found them quite an interesting thing.

I was wondering how much work it would be to provide a subset of that functionality. I'm not really worried about runtime allocation or the performance of building the regex at runtime. But being able to validate the regular expression syntax at compile time was a nice thing.

So I was wondering, with proc-macros stabilized, how much work would it be to have just the verification? Basically, the macro would take the string, try to compile it and then spit out the very ordinary Regex::new(string) code in its place.

Does that idea make some sense?

@BurntSushi
Member

Does the invalid_regex Clippy lint help here instead?

Aside from that, it's a nice idea, but I don't think it needs to be maintained in this repo? It would need to be a separate crate anyway I think? There are some complexities to this too. Namely, whether building the regex succeeds or not doesn't just depend on the concrete syntax, but also on the options given to regex construction (such as whether the pattern is permitted to match invalid UTF-8). Clippy doesn't get this quite right---I think they are just using default options and making sure the regex is a valid parse. This is probably good enough to cover most cases though. Technically, it goes even further than this: even if the regex parses correctly, it's still not guaranteed to compile, since it may exceed size limits or hit other errors enforced only in the compiler.

@vorner
Author

vorner commented Aug 8, 2019

Oh, I didn't know about the Clippy lint. It never even crossed my mind that a generic linter would know about specific crates, but I guess it makes some sense.

but I don't think it needs to be maintained in this repo?

It wouldn't necessarily need to live here, no. But I asked here first for several reasons:

  • Many crates that have companion proc-macros (serde-derive, structopt-derive) keep them in the same repository and sometimes (maybe conditionally) depend on them.
  • If there were such a crate, it would make some sense to consider recycling the regex-macros crate name, and that one still points to this repository.
  • People around this repository would probably have the best knowledge about what already exists, if there are any other plans in that direction, etc.

Thanks for the pointers about the complexities, something to watch out for.

As a somewhat related question ‒ how far is Rust support of const fn away from being able to actually parse and build the regular expression during compile time (or some other similar mechanism, so one could have the benefits of compile-time compilation without too much trouble)?

@BurntSushi
Member

BurntSushi commented Aug 8, 2019

Yeah those are good points. In principle, I'm not opposed to bringing a hypothetical crate for this into this repo and maintaining it, but I just don't think I have the bandwidth for it right now. I could in principle maintain a minimal first version of it, but it wouldn't do much more than what Clippy gives you now. My current focus is on improving the implementation of regex.

how far is Rust support of const fn away from being able to actually parse and build the regular expression during compile time (or some other similar mechanism, so one could have the benefits of compile-time compilation without too much trouble).

I don't know. :-( I haven't been following const fn progress too closely. The last time I looked, it was still pretty primitive---I don't think it supports conditional flow yet for example? To give you some context, I'd say around 2/3 of all code in the regex source tree is dedicated to compilation time, so const fn probably would need to evolve to a point where it supports arbitrary non-I/O Rust code, including dynamic memory allocation and interior mutability. I talked with eddyb about this a long time ago, and he seemed pretty convinced that it was possible. But honestly, I'd be pretty surprised if it all Just Worked.

@vorner
Author

vorner commented Aug 8, 2019

In principle, I'm not opposed to bringing a hypothetical crate for this into this repo and maintaining it, but I just don't think I have the bandwidth for it right now.

I can't promise the bandwidth for writing it right now either, but if I found the motivation and wrote it, I guess I would feel morally obliged to take some care of it myself (like solving bugs, etc), no matter if it lived in a separate repo or here. But then again, how much maintainership could be in the minimal version anyway?

I wonder if it could, eventually, evolve into the full-featured version that does the work at compile time ‒ but that's maybe not worth the trouble.

The last time I looked, it was still pretty primitive---I don't think it supports conditional flow yet for example?

I see. In that case this is definitely not happening any time soon. This is an area where C++ is definitely ahead of Rust right now :-(.

@BurntSushi
Member

Sorry, it looks like your comment either didn't make it to my inbox, or it fell through the cracks of my email triage. :-)

But then again, how much maintainership could be in the minimal version anyway?

It used to be a lot more maintenance when regex_macros was using an unstable compiler API, but I don't think that will be the case any more. Basically, it will come down to filing bugs and folks asking for more features. I've found that to be a pretty recurring pattern: start by adding a feature that one thinks is small, but it actually winds up being a launching point for folks wanting more.

I would definitely see if you could leverage Clippy for this, honestly. If not, yeah, I'd start it as a different repo. I'd be open to having it rolled into this repo at some point in the future.

@BatmanAoD
Member

Now that control-flow is available in const fn, is it possible that compile-time regex compilation could be implemented just by applying const throughout the compiler where appropriate?

@BurntSushi
Member

Not even close unfortunately. I addressed this above:

so const fn probably would need to evolve to a point where it supports arbitrary non-I/O Rust code, including dynamic memory allocation and interior mutability
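To make the gap concrete, here is a minimal sketch of the kind of check that control flow in const fn does allow. The `parens_balanced` function and the `PATTERN` constant are hypothetical stand-ins invented for illustration; they are not part of the regex crate, and the check is far weaker than real regex parsing. Anything that needs heap allocation, as the regex compiler does, is what remains out of reach.

```rust
/// Hypothetical stand-in for a validator: it only checks that parentheses
/// are balanced. It ignores escapes and character classes, so it is not a
/// regex parser.
const fn parens_balanced(pattern: &str) -> bool {
    let bytes = pattern.as_bytes();
    let mut depth: usize = 0;
    let mut i = 0;
    // `while` and `match` are usable in const fn; no allocation is needed.
    while i < bytes.len() {
        match bytes[i] {
            b'(' => depth += 1,
            b')' => {
                if depth == 0 {
                    return false;
                }
                depth -= 1;
            }
            _ => {}
        }
        i += 1;
    }
    depth == 0
}

const PATTERN: &str = r"(\d+)-(\d+)";

// Evaluated during compilation: an unbalanced PATTERN would fail the build.
const _: () = assert!(parens_balanced(PATTERN));

fn main() {
    println!("{} passed the compile-time check", PATTERN);
}
```

Changing `PATTERN` to something like `"(a"` should turn the assertion into a build error rather than a runtime failure.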

@BatmanAoD
Member

Apologies for not noticing that sentence before posting!

@vorner
Author

vorner commented Jun 5, 2021

It seems like someone has already done it.

https://crates.io/crates/lazy-regex

@BurntSushi
Member

I'm going to close this out primarily because I think the Clippy lint is probably sufficient for most cases here. In particular, this issue is "merely" asking for verification, and Clippy can just about give you that. It is possible for the Clippy lint to pass but for the regex to fail to build, notably if your regex exceeds the default compilation limits. For example, \pL{50000} will fail to compile despite it being syntactically valid. But I still think the Clippy lint gets the most bang for your buck here.

There is a separate issue of "compile time regexes," but that's an entirely different can of worms.

BurntSushi closed this as not planned Aug 27, 2022

@schneems

Does Clippy only check the regex crate, or any crate that has a Regex::new(), like fancy_regex?

@BurntSushi
Member

BurntSushi commented Jan 15, 2023

I don't know. That's a question for the Clippy project...

@schneems

I checked manually (it didn't occur to me earlier, sorry for the question spam). The answer for everyone else is: No, it's not checked. I opened an issue against clippy to explore what it might take to get that functionality.

@mcandre

mcandre commented Mar 31, 2023

Notably, Go has compile time validated regexes.

While Clippy is helpful, not all Rust coders use Clippy. Would love to see support for compile time regex validation in the Rust regex crate, in order to catch mistakes earlier in the SDLC.

@BurntSushi
Member

@mcandre Can you show me a Go program with "compile time validated regexes"?

While Clippy is helpful, not all Rust coders use Clippy. Would love to see support for compile time regex validation in the Rust regex crate, in order to catch mistakes earlier in the SDLC.

If you want to catch mistakes earlier in the SDLC, then Clippy sounds like a great way to do it!

@mcandre

mcandre commented Mar 31, 2023

Oh, maybe I misread the Go docs.

https://pkg.go.dev/regexp#MustCompile

Looks like Go may not validate at compile time, but rather panics. So Rust is at feature parity with Go there.

Can we do better?

@BurntSushi
Member

BurntSushi commented Mar 31, 2023

Indeed, that's what I thought.

If you want compile time verification, use Clippy. Otherwise, I don't think the benefit is worth the development, maintenance and API cost for this. Compile time verification would require a whole separate API, probably proc-macro based, for building a regex. I don't see it happening. Especially when Clippy exists and does this for you already.

Now, some day, compile time regexes might happen, and a side effect of that is compile time verification. That's a lot more work, although it has some potentially very interesting benefits. The design space for it is quite large though, so it isn't happening any time soon.

@kaj

kaj commented Apr 1, 2023

Yes, Rust can do compile-time verification of regular expressions!

https://docs.rs/lazy-regex/latest/lazy_regex/

This crate uses a macro that wraps the regex in a once-cell, so it is only compiled once at runtime, but the macro also compiles the regex at build time just to check for errors. The actual regex compilation (both runtime and compile-time) is still done by the regex crate.

This is still not "real" compile-time compilation of the regexes (it still links with the entire regex crate even if only a very simple regex is used). But the compile-time checks, both of the actual regex syntax and that the correct number of capture groups is used, are very useful.
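The "compiled only once at runtime" half of that design can be sketched with the standard library alone. This is a rough illustration, not lazy-regex's actual macro expansion: `FakeRegex`, `compile`, `COMPILE_COUNT` and `date_regex` are hypothetical names, with `FakeRegex` and `compile` standing in for `regex::Regex` and `Regex::new` so that the example has no dependencies.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical stand-in for `regex::Regex`, so the sketch needs no dependencies.
struct FakeRegex {
    pattern: String,
}

/// Counts how many times `compile` runs.
static COMPILE_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for `Regex::new`.
fn compile(pattern: &str) -> FakeRegex {
    COMPILE_COUNT.fetch_add(1, Ordering::SeqCst);
    FakeRegex {
        pattern: pattern.to_string(),
    }
}

/// Returns a shared instance, compiling it on first use only.
fn date_regex() -> &'static FakeRegex {
    static CELL: OnceLock<FakeRegex> = OnceLock::new();
    CELL.get_or_init(|| compile(r"\d{4}-\d{2}-\d{2}"))
}

fn main() {
    let first = date_regex();
    let second = date_regex();
    // Both calls return the same instance, and `compile` ran only once.
    assert!(std::ptr::eq(first, second));
    assert_eq!(COMPILE_COUNT.load(Ordering::SeqCst), 1);
    println!("pattern: {}", first.pattern);
}
```

The build-time check is the part this sketch leaves out: it needs a proc-macro that runs the real regex compiler on the pattern literal while the crate is being built.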
