-
Notifications
You must be signed in to change notification settings - Fork 440
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Storing mapping from names to group indices into Regex #158
Conversation
@defuz I like it. It kind of makes me wonder why I didn't do it this way initially---recreating a hashmap on each match seems a little crazy! My best guess is that I thought it'd be nice to not create the hash map if we didn't need it (e.g., never ask for captures or don't have any named groups), but compatively speaking, one (usually small) hashmap seems pretty harmless to me. Regardless, we could lazily build it once using interior mutability, but I don't think that's necessary. With all that said, this is undoubtedly breaking and backwards incompatible. You're pretty much guaranteed this when you add a lifetime parameter to a public type, because any downstream use of that type now requires specifying the additional lifetime. (Unless, of course, lifetime elision kicks in.) In this case, I doubt there would be many breaking changes since one rarely utters the types you've changed explicitly, and instead they're usually inferred. What do you think of keeping this PR open and putting it on the 1.0 milestone? There is already at least one other breaking change cooking (#151), so I like the idea of batching them all up to reduce churn in the ecosystem. In any case, thank you! |
I think one thing this makes impossible is keeping captures around after the regex has gone out of scope, but that seems a little weird to me. |
You're right, I was too optimistic about backward compatibility. :)
This sounds reasonable to me.
Yes, this is the price for the fact that we do not create Moreover, I think we can use |
@defuz Sounds good to me. Moving the HashMap into N.B. This branch will need to be rebased before 1.0 because of changes I'm making. I'll ping you then or I can do it! Thanks! |
I think this improvement can be added backwards compatibly with |
@pczarn Probably we will need @BurntSushi I thought that if this is a backward incompatible change, can we go further? We can separate features of impl Regex {
...
pub fn captures<'t>(&self, text: &'t str) -> Option<Captures<'t>> { ... }
pub fn named_captures<'r, 't>(&'r self, text: &'t str) -> Option<NamedCaptures<'r, 't>> { ... }
...
}
impl Captures<'t> {
// Like today, but there are no more field `named` and methods `names` and `iter_names`
}
struct NamedCaptures<'r, 't> {
regex: &'r Regex,
captures: Captures<'t>
}
impl<'r, 't> NamedCaptures<'r, 't> {
fn captures(&self) -> &Captures<'t> { ... }
fn name(&self, name: &str) -> Option<&'t str> { ... }
fn iter(&self) -> SubCapturesNamed<'r, 't> { ... }
} The main thing that we are doing here is moving methods
|
@defuz I am honestly not a huge fan of increasing the API surface. IMO, there are already too many methods with too many decisions that the caller needs to make. I would be fine with an |
422a4b4
to
f9cc5d6
Compare
I made some updates:
The problems that I see now:
@BurntSushi what you think? I also do not quite understand the reason of duplicate |
use std::fmt; | ||
use std::ops::Index; | ||
#[cfg(feature = "pattern")] | ||
use std::str::pattern::{Pattern, Searcher, SearchStep}; | ||
use std::str::FromStr; | ||
use std::collections::HashMap; | ||
use std::sync::Arc; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small nit, but I try to order imports by:
- Category. (Group by
std
, then group by third party crates and then group by internal imports.) - Alphabetical.
@defuz Lookin' pretty good! I had some nits, and I would like to leave out breaking changes so that we can get the
Hmm, I don't think any of those fields are in the There may be some other refactoring to be done here, but I'd prefer holding off because my multi branch is going to mess around with it in quite a big way. (To support matching multiple regexes and reporting overlapping matches in a single pass.) |
@BurntSushi Perhaps we can release fixes for lifetimes before 1.0? According to RFC 1105 (API evolution) generalizing of function is minor change:
|
ce598d7
to
c9682d0
Compare
@defuz Right, but we're not just changing function parameters. It's adding a new lifetime parameter to a public type. In theory, it's possible to add a new type parameter if we give it a default, but you can't do that with lifetime parameters. My goal is to march towards 1.0 so that we can fix these and other issues. Hopefully it happens in the next couple months. |
03046e6
to
38fdf98
Compare
38fdf98
to
0144613
Compare
@BurntSushi Oh, you're totally right, it seems that I'm blindness to incompatible changes. :( I reverted backward incompatible changes but keep renaming of the lifetimes and clarifying in the documentation. Now everything looks good to me -- let me know if you have any comments. |
@defuz This looks exquisite! Really nice work! As soon as CI comes back this is good to merge. Thank you. :-) |
Storing mapping from names to group indices into Regex
The main goal I wanted to achieve is to get rid of creation
HashMap
that storing group name to group index mapping every time when creating a new instance ofCaptures
.Now this map is stored directly into
Regex
. ForRegex::Dynamic
it is stored asHashMap
and forRegex::Native
it is stored as sorted vectorVec<(&'static str, usize)>
, where the first element of the tuple is the group name, and the second element is group index.This change doesn't add backward incompatible changes, but changes lifetime parameters for some structs, like
Captures<'r, 't>
, where'r
refers toRegex
instance.Also, this PR is lacking documentation updates and I'm not sure about the correctness of lifetime parameters (I think we can reduce counts for some structs), so please consider this request more like a draft.
@BurntSushi what you think?