add regexp crate to Rust distribution (implements RFC 7) #13700
UtherII
commented
Apr 23, 2014
|
Maybe a silly question, but wouldn't it make sense to put Unicode character classes support into the standard rust string library? |
|
Possibly. But I'm not sure. What would they be used for in Note that the matching algorithm depends on those Unicode classes to be available in sorted non-overlapping order, so that they are amenable to binary search. One possible path forward is to leave them in |
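To illustrate why sorted, non-overlapping ranges are amenable to binary search, here is a minimal sketch in present-day Rust. The table `ASCII_ALNUM` and the function `class_contains` are hypothetical names chosen for this example and are not taken from the crate under review.

```rust
use std::cmp::Ordering;

/// Hypothetical class table: inclusive ranges, sorted and non-overlapping.
const ASCII_ALNUM: &[(char, char)] = &[('0', '9'), ('A', 'Z'), ('a', 'z')];

/// Returns true if `c` falls inside one of the ranges.
/// Correctness relies on `ranges` being sorted and non-overlapping.
fn class_contains(ranges: &[(char, char)], c: char) -> bool {
    ranges
        .binary_search_by(|&(lo, hi)| {
            if hi < c {
                Ordering::Less // this range lies entirely before `c`
            } else if lo > c {
                Ordering::Greater // this range lies entirely after `c`
            } else {
                Ordering::Equal // `c` is inside this range
            }
        })
        .is_ok()
}

fn main() {
    assert!(class_contains(ASCII_ALNUM, 'q'));
    assert!(!class_contains(ASCII_ALNUM, '_'));
}
```

If the ranges overlapped or were unsorted, the comparator would no longer define a consistent ordering and the search could miss a match.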
BurntSushi
referenced this pull request
Apr 23, 2014
Closed
Expand macros before looking for string literal #2
alexcrichton
reviewed
Apr 23, 2014
| //! | ||
| //! ## Matching one character | ||
| //! | ||
| //! <pre class="rust"> |
alexcrichton
Apr 23, 2014
Member
We've generally tried not to use HTML tags in our documentation. Was this done to avoid running the test/lexer over the contents? You may be able to get away with a notrust tag after three backticks.
BurntSushi
Apr 23, 2014
Author
Member
Actually, the reasoning is more insidious: I was unable to write a plain \ character in a fenced code block, so I resorted to the simpler solution of just writing the HTML. (I wasn't able to determine if this was a bug in the sundown parser or elsewhere...)
alexcrichton
reviewed
Apr 23, 2014
| //! | ||
| //! <pre class="rust"> | ||
| //! (exp) numbered capture group (indexed by opening parenthesis) | ||
| //! (?P<name>exp) named (also numbered) capture group (allowed chars: [_0-9a-zA-Z]) |
alexcrichton
Apr 23, 2014
Member
You may want to double check this, but I don't think the html-escapes are necessary if this is in a backtick-enclosed block.
alexcrichton
reviewed
Apr 23, 2014
| html_root_url = "http://static.rust-lang.org/doc/master")] | ||
|
|
||
| #![feature(macro_rules, phase)] | ||
| #![deny(missing_doc)] |
alexcrichton
reviewed
Apr 23, 2014
| /// syntax extension. Do not rely on it. | ||
| /// | ||
| /// See the comments for the `program` module in `lib.rs` for a more | ||
| /// detailed explanation for what `regexp!` requires. |
alexcrichton
Apr 23, 2014
Member
In an ideal world we could make each field as #[experimental] to have the compiler generate warnings. This is certainly ok for now though.
alexcrichton
reviewed
Apr 23, 2014
| } | ||
| } | ||
|
|
||
| impl Regexp { |
alexcrichton
reviewed
Apr 23, 2014
| /// ```rust | ||
| /// # #![feature(phase)] | ||
| /// # extern crate regexp; #[phase(syntax)] extern crate regexp_macros; | ||
| /// # use regexp::NoExpand; fn main() { |
alexcrichton
reviewed
Apr 23, 2014
| new.push_str(rep.reg_replace(&cap).as_slice()); | ||
| last_match = e; | ||
| } | ||
| new.push_str(unsafe { raw::slice_bytes(text, last_match, text.len()) }); |
alexcrichton
Apr 23, 2014
Member
Did you see a good perf improvement from using unsafe slice_bytes methods? I would have figured that the allocation going on would dominate the bounds checking.
BurntSushi
Apr 23, 2014
Author
Member
I suspect you're right. I think I had that in there because I was mimicking the std replace, but that's not a good reason.
I just removed them and I cannot produce a benchmark that can tell the difference.
They've been removed now. Less unsafe. Woohoo.
alexcrichton
reviewed
Apr 23, 2014
| // The following is based on the code in slice::from_iter, but | ||
| // shortened since we know we're dealing with bytes. The key is that | ||
| // we already have a Vec<u8>, so there's no reason to re-collect it | ||
| // (which is what from_iter currently does). |
alexcrichton
Apr 23, 2014
Member
Can you tag this with a FIXME pointing at #12938? You may also want to mention that this should look exactly like:
new.into_owned()
thestinger
Apr 23, 2014
Contributor
I don't understand why ~str is being returned here at all. It will have overhead when DST lands too, as the capacity is being lost and it will need to shrink the allocation. The convention in Rust is to return the type you have directly, rather than making non-free boxing choices for your caller that they can do themselves.
BurntSushi
Apr 23, 2014
Author
Member
Well, the reason why it's returning ~str is that std::str::replace also returns ~str, so I figured it'd be good to stay consistent.
Also, if it returned a StrBuf, it would be difficult for the caller to safely and efficiently transform it into a ~str. And I don't mean avoiding the shrinking, but avoiding the redundant collect in the from_iter implementation of ~[].
Could we leave it as ~str with the note to revisit it once DST happens?
thestinger
Apr 23, 2014
Contributor
The StrBuf type is more useful to the caller than ~str. It has the same functionality available along with the ability to be resized. The only reason std::str::replace returns ~str is that it's a legacy function. There's no need to be consistent with legacy design decisions pre-dating StrBuf.
This will be inefficient and unidiomatic when the DST changes happen too. You have a StrBuf internally, so you should be returning it to the caller to do as they wish with it. There's no advantage to discarding the capacity and forcing shrinking of the allocation. It's the same anti-pattern as ~T when the callee has T internally. If the caller wants to lose the excess capacity, they can do it themselves.
thestinger
Apr 23, 2014
Contributor
It's totally unnecessary because it can return StrBuf here. This avoids the unsafe code and will avoid other costs in the future from dropping the excess capacity. I don't think it's acceptable to sneak in unsafe code to push your view on the string/vector issue. I'm strongly against this and will do everything I can to stop this from landing in the current form. There's a clear and simple way to do it without any unsafe code and you're only in favour of using it because it enshrines returning ~str in the API.
alexcrichton
Apr 23, 2014
Member
You are singling out one use case of where StrBuf could be returned, but it isn't. In today's rust, it is consistent to return ~str, not StrBuf. Regardless of what your opinion about what it should be is, that is the current state of affairs.
If you would like to change values to returning StrBuf, then I recommend you do so in a separate issue or PR which discusses all return values, not just this one use case in an experimental library that hasn't been merged yet. Focusing on this one case is not very helpful.
I can understand you being strongly against returning ~str where a StrBuf is available, but I do not believe that this is the PR to make that decision.
Please do not take my comments as an endorsement of returning ~str. That is a misconception of what I am saying. I would like to merge this library because it will have significant benefit to all users of rust. Blocking this over an ongoing discussion which has no current resolution is not really helping anyone.
thestinger
Apr 23, 2014
Contributor
I'm not singling out one use case. I'm reviewing a pull request adding new code to the standard library, and am strongly opposed to merging it while it has completely unnecessary unsafe code doing the opposite of optimization. There is no rationale for why this unsafe code is used rather than returning the StrBuf and there is no rationale for why a performance hit should be taken in the future to return it. The burden of proof rests on the person proposing we add more unsafe code, not me.
> Please do not take my comments as an endorsement of returning ~str.
You already endorsed it by misrepresenting your view on the topic as the established consensus in your post to the mailing list.
> That is a misconception of what I am saying. I would like to merge this library because it will have significant benefit to all users of rust. Blocking this over an ongoing discussion which has no current resolution is not really helping anyone.
It should not go in as long as it's going to great lengths with unsafe code to back up a minority opinion on the Vec<T> issue. It could simply return the StrBuf it has internally instead of using a convoluted unsafe workaround.
BurntSushi
Apr 24, 2014
Author
Member
The only reason it's returning a ~str is because other parts of std also return ~str even when a StrBuf could be returned. I made this decision because it's consistent. The unsafe code followed from that. I did not make this decision because of an opinion on the Vec<T> issue.
With that said, I'm happy to change to StrBuf. (I'd also change the regex-dna benchmark to use a StrBuf. This would actually avoid a copy for each replacement done, so it'd probably improve performance.)
I'm not familiar with your governance model, so I'll otherwise keep quiet. But I just wanted to make sure that my point of view was clear.
thestinger
Apr 24, 2014
Contributor
The rest of the standard library uses ~str because StrBuf never existed until recently and ~str used to be resizeable. If this case had a choice between ~str and StrBuf without requiring a conversion between them, then I wouldn't have mentioned anything.
However, at the moment it's going to great lengths to avoid simply returning the StrBuf that's inside the function. It's hurting performance in the caller and it's adding unnecessary unsafety.
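For readers following this thread with present-day Rust in mind: `StrBuf` was later renamed `String`, and `Box<str>` is the closest current analogue of `~str`. The sketch below illustrates the "return the type you have" argument in those terms. The function `replace_all` is a hypothetical literal-string helper written for this example, not the regex method under review.

```rust
/// Hypothetical helper: replaces every occurrence of `from` with `to`
/// and returns the growable buffer it built, leaving any boxing to the caller.
fn replace_all(text: &str, from: &str, to: &str) -> String {
    let mut out = String::with_capacity(text.len());
    let mut last = 0;
    for (start, m) in text.match_indices(from) {
        out.push_str(&text[last..start]); // text between matches
        out.push_str(to);
        last = start + m.len();
    }
    out.push_str(&text[last..]); // trailing text after the final match
    out
}

fn main() {
    let mut s = replace_all("a-b-c", "-", "+");
    s.push('!'); // the caller can keep growing the buffer
    // Converting is the caller's choice; `into_boxed_str` discards excess capacity.
    let boxed: Box<str> = s.into_boxed_str();
    assert_eq!(&*boxed, "a+b+c!");
}
```

Returning the `String` keeps the conversion, and its possible reallocation, an explicit decision at the call site.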
alexcrichton
reviewed
Apr 23, 2014
| /// The `'a` lifetime refers to the lifetime of a borrowed string when | ||
| /// a new owned string isn't needed (e.g., for `NoExpand`). | ||
| fn reg_replace<'a>(&'a self, caps: &Captures) -> MaybeOwned<'a>; | ||
| } |
alexcrichton
reviewed
Apr 23, 2014
| /// Returns an iterator over all the non-overlapping capture groups matched | ||
| /// in `text`. This is operationally the same as `find_iter` (except it | ||
| /// yields information about submatches). | ||
| pub fn captures_iter<'r, 't>(&'r self, text: &'t str) |
alexcrichton
Apr 23, 2014
Member
If you have time (certainly not a blocker), could you add a small example to this and the above few methods? I think I understand how to use them, but examples are always super helpful!
(again, not a blocker, just a cherry on top)
alexcrichton
reviewed
Apr 23, 2014
| self.text.slice(s, e) | ||
| } | ||
| } | ||
| } |
alexcrichton
Apr 23, 2014
Member
I'm curious if this type could one day implement the Index trait (basically leverage the foo[bar] syntax). Do you know which of these methods would be most appropriate for that?
It seems a bit odd to me that pos has the same return value for an empty match and an out-of-bounds index (and that kind of leaks over to at as well). Did you find precedent in other regex engines? Just something to think about, I'm ok with it as-is due to the len() method being available.
BurntSushi
Apr 23, 2014
Author
Member
Honestly, I stayed away from the Index trait because there seems to be a lot of buzz about it being removed or substantially changed. I figured that if we're going to live with Vec not having index notation, then we should probably also live with Captures not having it either. (For the time being.) It would be nice if it could support caps[1] and caps["name"] (corresponding to the at and name methods), but I don't think that's currently possible? I didn't dig too much.
RE pos: Yes, it is a bit odd. The only alternatives I can think of are to assert that the index is in range or to encode the failure in the type. I don't think I really checked precedent for this in other libraries. Python seems to raise an IndexError. Similarly for asking for a named capture group that doesn't exist.
At the moment, I'm thinking that handling out-of-bounds like the rest of the standard lib does might be the best way to go (and this would, e.g., be consistent with Python). My least favorite option is to encode the failure into the return type.
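One of the alternatives mentioned above, encoding the failure in the return type, can be sketched as follows in present-day Rust. `Caps` is a hypothetical stand-in written for this example; it is not the `Captures` type from the pull request, and the thread does not settle on this design.

```rust
/// Hypothetical stand-in for a captures type:
/// one optional (start, end) byte span per capture group.
struct Caps<'t> {
    text: &'t str,
    spans: Vec<Option<(usize, usize)>>,
}

impl<'t> Caps<'t> {
    /// Returns `None` for an out-of-range index or a group that did not
    /// participate, so an empty match `Some((k, k))` stays distinguishable.
    fn pos(&self, i: usize) -> Option<(usize, usize)> {
        self.spans.get(i).copied().flatten()
    }

    /// Returns the matched text for group `i`, if any.
    fn at(&self, i: usize) -> Option<&'t str> {
        self.pos(i).map(|(s, e)| &self.text[s..e])
    }
}

fn main() {
    let caps = Caps { text: "ab", spans: vec![Some((0, 2)), Some((1, 1)), None] };
    assert_eq!(caps.at(0), Some("ab"));
    assert_eq!(caps.at(1), Some("")); // empty match
    assert_eq!(caps.at(7), None); // out of range
}
```

The trade-off is that every caller has to unwrap or match on the result, which is the ergonomic cost weighed in the comment above.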
thestinger
Apr 23, 2014
Contributor
I don't think any new Index implementations should be added, because they're likely all going to need to be removed before landing the new traits. It's just going to create unnecessary churn.
alexcrichton
Apr 23, 2014
Member
Oh no, I do not think that this should implement Index now, I was merely wondering about the future and how this may leverage it.
Let's leave these as-is. This is why the crate is experimental, I was just musing.
alexcrichton
reviewed
Apr 23, 2014
| Some(i) => self.at(i).to_owned(), | ||
| } | ||
| }); | ||
| text.replace("$$", "$") |
alexcrichton
Apr 23, 2014
Member
regexes used to implement regexes! (I thought bootstrapping a compiler was hard!)
alexcrichton
reviewed
Apr 23, 2014
| DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY | ||
| THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT | ||
| (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE | ||
| OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. |
alexcrichton
reviewed
Apr 23, 2014
| return StepMatch | ||
| } | ||
| Submatches => { | ||
| unsafe { groups.copy_memory(caps) } |
alexcrichton
Apr 23, 2014
Member
Did you see that manual loops didn't optimize to a memcpy? I would expect something like this to optimize to a memcpy:
    for (slot, val) in groups.mut_iter().zip(caps.iter()) {
        *slot = *val;
    }
BurntSushi
Apr 23, 2014
Author
Member
I did not realize that! Awesome. I can't seem to produce any significant and consistent change in benchmark results. I've removed all 4 unsafe blocks for using copy_memory.
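For reference, the safe copy discussed above looks like this in present-day Rust, where `mut_iter` has become `iter_mut` and slices also offer `copy_from_slice`. The function names are hypothetical, and whether the loop actually compiles down to a memcpy depends on the optimizer, so treat that as something to benchmark rather than a guarantee.

```rust
/// Element-wise copy with a zipped loop, as suggested in the review.
/// Copies only as many elements as the shorter slice holds.
fn copy_with_loop(dst: &mut [Option<usize>], src: &[Option<usize>]) {
    for (slot, val) in dst.iter_mut().zip(src.iter()) {
        *slot = *val;
    }
}

/// The same copy through the standard slice method.
/// Panics if the two slices have different lengths.
fn copy_with_slice_method(dst: &mut [Option<usize>], src: &[Option<usize>]) {
    dst.copy_from_slice(src);
}

fn main() {
    let caps = [Some(0), Some(3), None, Some(5)];
    let mut a = [None; 4];
    let mut b = [None; 4];
    copy_with_loop(&mut a, &caps);
    copy_with_slice_method(&mut b, &caps);
    assert_eq!(a, caps);
    assert_eq!(b, caps);
}
```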
alexcrichton
reviewed
Apr 23, 2014
| } | ||
| (false, Submatches) => unsafe { | ||
| t.groups.as_mut_slice().copy_memory(groups) | ||
| } |
alexcrichton
reviewed
Apr 23, 2014
| v.set_len(elts); | ||
| ::std::ptr::copy_nonoverlapping_memory( | ||
| v.as_mut_ptr(), groups.as_ptr(), elts); | ||
| } |
alexcrichton
Apr 23, 2014
Member
I'm curious why unsafe was used here rather than an iterator and a collect?
BurntSushi
Apr 23, 2014
Author
Member
Removed! Same as before: no difference in benchmarks when using, e.g., groups.iter().map(|x| *x).collect(). Awesome.
There are now only three uses of unsafe: two for using uninitialized memory for sparse sets and one for reducing allocation in string replacement. (Which will hopefully be removed at some point.)
alexcrichton
reviewed
Apr 23, 2014
| return StepMatch | ||
| } | ||
| Submatches => { | ||
| unsafe { groups.copy_memory(caps.as_slice()) } |
alexcrichton
Apr 23, 2014
Member
This seems to generate a good bit of unsafe blocks. Did you not see the common idioms optimized to essentially what the unsafe blocks are doing?
If necessary, it would be nice to have some comments about why unsafe is necessary in these locations.
alexcrichton
reviewed
Apr 23, 2014
| } | ||
| } | ||
|
|
||
| #[inline(always)] |
alexcrichton
Apr 23, 2014
Member
We're generally trying to avoid inline(always) annotations. Did you run into problems if these were tagged with #[inline]?
BurntSushi
Apr 23, 2014
Author
Member
I have no idea why I used inline(always).
Changed all of them to inline. No perf difference. Fixed.
|
This looks even better than I thought it was going to be, amazing work, and thank you so much! |
|
Ah, one more small thing, we're trying to ensure that commits can be traced back to the RFC they implemented, so could you make sure that this shows up at the bottom of the first commit message (you can wait to rebase until later)
|
zkamsler
reviewed
Apr 23, 2014
| None => "", | ||
| Some(ref h) => { | ||
| match h.find(&name.to_owned()) { | ||
| None => "", |
zkamsler
Apr 23, 2014
Contributor
Could you use h.find_equiv(name) here in order to avoid allocating an owned string?
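The same idea, looking up an owned-string key with a borrowed string so that no temporary allocation is needed, can be sketched in present-day Rust, where `HashMap::get` accepts any borrowed form of the key. The function `group_index` and the map layout are assumptions made for this example.

```rust
use std::collections::HashMap;

/// Hypothetical lookup of a named group's index.
/// `get` takes `&str` directly because `String: Borrow<str>`,
/// so no owned `String` is built just to perform the lookup.
fn group_index(names: &HashMap<String, usize>, name: &str) -> Option<usize> {
    names.get(name).copied()
}

fn main() {
    let mut names = HashMap::new();
    names.insert("year".to_string(), 1);
    names.insert("month".to_string(), 2);
    assert_eq!(group_index(&names, "month"), Some(2));
    assert_eq!(group_index(&names, "day"), None);
}
```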
|
@alexcrichton Thanks! And thanks very much for all your comments so far. Very helpful. I will make sure to add Also, when I rebase, won't it change my commit history? I assume I'll have to force push. (Just want to make sure that's what's expected.) |
|
@BurntSushi: Yeah, you'll have to force push. |
alexcrichton
reviewed
Apr 23, 2014
| // except according to those terms. | ||
|
|
||
| // ignore-stage1 | ||
| // ignore-cross-compile #12102 |
alexcrichton
Apr 23, 2014
Member
The most recently landed PR actually makes this so ignore-cross-compile isn't necessary. The stack of commits will need to get rebased anyway, so just something to include in the rebasing.
alexcrichton
reviewed
Apr 23, 2014
| Threads { | ||
| which: which, | ||
| queue: unsafe { ::std::mem::uninit() }, | ||
| sparse: unsafe { ::std::mem::uninit() }, |
alexcrichton
Apr 23, 2014
Member
How come this uninit is needed? It seems quite unsafe. If it's necessary for perf, can you add a comment explaining why?
BurntSushi
Apr 23, 2014
Author
Member
They are not needed, but these unsafe blocks actually do make a performance difference. The trick being used here is to represent sparse sets using uninitialized memory. It's described in more detail here: http://research.swtch.com/sparse
In this case, I can actually produce evidence. The first column is without unsafe and the second column is the code as you see it:
benchmark | without unsafe | with unsafe (code as submitted)
anchored_literal_long_match | 264 ns/iter (+/- 2) | 165 ns/iter (+/- 4)
anchored_literal_long_non_match | 5867 ns/iter (+/- 8) | 5822 ns/iter (+/- 45)
anchored_literal_short_match | 232 ns/iter (+/- 8) | 161 ns/iter (+/- 2)
anchored_literal_short_non_match | 495 ns/iter (+/- 1) | 424 ns/iter (+/- 3)
easy0_1K | 1808 ns/iter (+/- 111) = 566 MB/s | 1277 ns/iter (+/- 170) = 801 MB/s
easy0_32 | 330 ns/iter (+/- 2) = 96 MB/s | 276 ns/iter (+/- 3) = 115 MB/s
easy0_32K | 48878 ns/iter (+/- 650) = 670 MB/s | 33323 ns/iter (+/- 968) = 983 MB/s
easy1_1K | 1881 ns/iter (+/- 556) = 544 MB/s | 1794 ns/iter (+/- 684) = 570 MB/s
easy1_32 | 391 ns/iter (+/- 93) = 81 MB/s | 341 ns/iter (+/- 70) = 93 MB/s
easy1_32K | 49735 ns/iter (+/- 2484) = 658 MB/s | 48367 ns/iter (+/- 2864) = 677 MB/s
hard_1K | 47163 ns/iter (+/- 268) = 21 MB/s | 35070 ns/iter (+/- 169) = 29 MB/s
hard_32 | 1840 ns/iter (+/- 38) = 17 MB/s | 1389 ns/iter (+/- 17) = 23 MB/s
hard_32K | 1497950 ns/iter (+/- 5921) = 21 MB/s | 1112845 ns/iter (+/- 2605) = 29 MB/s
literal | 142 ns/iter (+/- 2) | 131 ns/iter (+/- 0)
match_class | 1403 ns/iter (+/- 6) | 1394 ns/iter (+/- 6)
match_class_in_range | 1448 ns/iter (+/- 3) | 1347 ns/iter (+/- 4)
medium_1K | 17310 ns/iter (+/- 255) = 59 MB/s | 17475 ns/iter (+/- 166) = 58 MB/s
medium_32 | 888 ns/iter (+/- 29) = 36 MB/s | 835 ns/iter (+/- 34) = 38 MB/s
medium_32K | 542510 ns/iter (+/- 2595) = 60 MB/s | 550793 ns/iter (+/- 2491) = 59 MB/s
no_exponential | 274104 ns/iter (+/- 466) | 278257 ns/iter (+/- 906)
not_literal | 1104 ns/iter (+/- 5) | 1080 ns/iter (+/- 4)
one_pass_long_prefix | 548 ns/iter (+/- 5) | 379 ns/iter (+/- 3)
one_pass_long_prefix_not | 520 ns/iter (+/- 2) | 409 ns/iter (+/- 2)
one_pass_short_a | 1326 ns/iter (+/- 18) | 1291 ns/iter (+/- 8)
one_pass_short_a_not | 1945 ns/iter (+/- 21) | 1585 ns/iter (+/- 29)
one_pass_short_b | 913 ns/iter (+/- 3) | 816 ns/iter (+/- 8)
one_pass_short_b_not | 1242 ns/iter (+/- 7) | 1401 ns/iter (+/- 9)
replace_all | 1353 ns/iter (+/- 13) | 1291 ns/iter (+/- 11)
My guess as to what's happening---particularly in the hard benchmarks---is that the mem::uninit saves a lot of time by not initializing threads that never need to be initialized, particularly with larger regexps (like the hard benchmark) with a lot of instructions.
I've included a justification in a comment and a link to Russ Cox's article.
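The sparse-set representation referenced above can be sketched in safe, present-day Rust as follows. This is an illustration of the data structure only: it zero-initializes the `sparse` array, which is exactly the cost the pull request avoids by leaving that memory uninitialized, so it does not reproduce the measured speedup. `SparseSet` and its method names are assumptions made for this example.

```rust
/// Sparse set over the integers `0..capacity`.
/// `dense` lists the members in insertion order;
/// `sparse[v]` is the position of `v` in `dense` when `v` is a member.
struct SparseSet {
    dense: Vec<usize>,
    sparse: Vec<usize>,
}

impl SparseSet {
    fn new(capacity: usize) -> Self {
        // Safe version: `sparse` is zero-filled up front.
        SparseSet { dense: Vec::with_capacity(capacity), sparse: vec![0; capacity] }
    }

    /// Membership holds only if the two arrays point at each other,
    /// so stale values left in `sparse` are harmless.
    /// Panics if `v >= capacity`.
    fn contains(&self, v: usize) -> bool {
        let i = self.sparse[v];
        i < self.dense.len() && self.dense[i] == v
    }

    fn insert(&mut self, v: usize) {
        if !self.contains(v) {
            self.sparse[v] = self.dense.len();
            self.dense.push(v);
        }
    }

    /// Constant-time clear: `sparse` is left untouched.
    fn clear(&mut self) {
        self.dense.clear();
    }

    fn len(&self) -> usize {
        self.dense.len()
    }
}

fn main() {
    let mut set = SparseSet::new(16);
    set.insert(3);
    set.insert(7);
    set.insert(3); // duplicate, ignored
    assert_eq!(set.len(), 2);
    set.clear();
    assert!(!set.contains(3));
}
```

Because `contains` cross-checks both arrays, the contents of `sparse` never need to be meaningful before first use, which is what makes the uninitialized-memory variant in the linked article possible.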
|
Just a few small nits left, and otherwise this looks fantastic. After a rebasing, I think this is good to go! |
|
Argh, I didn't notice that when RFC 7 was accepted that it kept the name |
|
@BurntSushi There still remains the question of |
|
I prefer |
|
Rust convention is CamelCase for types. |
|
@seanmonstar Depends on whether you consider |
|
If we have |
Count me for that one, I think |
|
|
You meant the crate as |
|
@chris-morgan yes absolutely! Nice catch. Edited. |
|
OK, I've changed the name of the crate to |
bors
added a commit
that referenced
this pull request
Apr 24, 2014
sfackler
reviewed
Apr 24, 2014
| }; | ||
|
|
||
| /// For the `regex!` syntax extension. Do not use. | ||
| #[macro_registrar] |
bors
added a commit
that referenced
this pull request
Apr 24, 2014
bors
added a commit
that referenced
this pull request
Apr 24, 2014
bors
added a commit
that referenced
this pull request
Apr 24, 2014
BurntSushi
added some commits
Apr 25, 2014
alexcrichton
commented on 09a8b38
Apr 25, 2014
|
r+ |
|
saw approval from alexcrichton |
|
merging BurntSushi/rust/regexp = 09a8b38 into auto |
bors
added a commit
that referenced
this pull request
Apr 25, 2014
alexcrichton
commented on 7269bc7
Apr 25, 2014
|
r+ |
|
saw approval from alexcrichton |
|
merging BurntSushi/rust/regexp = 7269bc7 into auto |
|
fast-forwarding master to auto = eea4909 |
bors
added a commit
that referenced
this pull request
Apr 25, 2014
bors
closed this
Apr 25, 2014
bors
merged commit 7269bc7
into
rust-lang:master
Apr 25, 2014
2 checks passed
BurntSushi
deleted the
BurntSushi:regexp
branch
Apr 25, 2014
|
Nice work @BurntSushi! |
pyros2097
commented
Nov 21, 2015
|
Nice work @BurntSushi! |
BurntSushi commented Apr 23, 2014
Implements RFC 7 and will hopefully resolve #3591. The crate is marked as experimental. It includes a syntax extension for compiling regexps to native Rust code.
Embeds and passes the basic, nullsubexpr and repetition tests from Glenn Fowler's (slightly modified by Russ Cox for leftmost-first semantics) testregex test suite. I've also hand written a plethora of other tests that exercise Unicode support, the parser, public API, etc. Also includes a regex-dna benchmark for the shootout.
I know the addition looks huge at first, but consider these things:
regexp!) make up the rest.