[feature request] Expose libregex's parsing/compiling internals #29

Closed
rust-highfive opened this issue Jan 25, 2015 · 4 comments · Fixed by #87

Comments

@rust-highfive

Issue by andrew-d
Thursday Nov 06, 2014 at 19:54 GMT

For earlier discussion, see rust-lang/rust#18710

This issue was labelled with: in the Rust repository


I was looking at implementing something similar to this - a trigram-index-aided search. I'd rather not reproduce the code necessary to parse the regex, considering it already lives in libregex. It'd be nice if the parsing/compiling was exposed for use - perhaps similar to how Go does it with their regexp and regexp/syntax packages.

cc @BurntSushi
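
The use case is easier to see with a concrete index. Below is a minimal std-only Rust sketch of a trigram index. It assumes the hard part has already been done: extracting a literal that every match must contain, which is the step that needs access to the parsed regex. `TrigramIndex`, `add` and `candidates` are hypothetical names, not part of `regex`, and documents returned by `candidates` still have to be confirmed with the real matcher.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical trigram index: maps each 3-byte window to the ids of the
/// documents that contain it.
struct TrigramIndex {
    postings: HashMap<[u8; 3], BTreeSet<usize>>,
    docs: Vec<String>,
}

impl TrigramIndex {
    fn new() -> Self {
        TrigramIndex { postings: HashMap::new(), docs: Vec::new() }
    }

    /// Stores a document and records every trigram it contains.
    fn add(&mut self, doc: &str) -> usize {
        let id = self.docs.len();
        for w in doc.as_bytes().windows(3) {
            self.postings
                .entry([w[0], w[1], w[2]])
                .or_insert_with(BTreeSet::new)
                .insert(id);
        }
        self.docs.push(doc.to_string());
        id
    }

    /// Ids of documents containing every trigram of `literal`.
    /// Assumes `literal` must appear in any match of the regex.
    fn candidates(&self, literal: &str) -> BTreeSet<usize> {
        let bytes = literal.as_bytes();
        if bytes.len() < 3 {
            // Too short to filter on: every document is a candidate.
            return (0..self.docs.len()).collect();
        }
        let mut result: Option<BTreeSet<usize>> = None;
        for w in bytes.windows(3) {
            let ids = self
                .postings
                .get(&[w[0], w[1], w[2]])
                .cloned()
                .unwrap_or_default();
            // Intersect the posting sets of all trigrams seen so far.
            result = Some(match result {
                None => ids,
                Some(acc) => acc.intersection(&ids).cloned().collect(),
            });
        }
        result.unwrap_or_default()
    }
}

fn main() {
    let mut idx = TrigramIndex::new();
    idx.add("fn parse_regex(pattern: &str)");
    idx.add("let re = Regex::new(r\"a+b\");");
    idx.add("unrelated text");
    // A regex such as `parse_\w+` requires the literal "parse_".
    println!("{:?}", idx.candidates("parse_")); // prints {0}
}
```

Intersecting posting sets keeps the filter conservative: it can return false positives, but it never drops a document that actually contains the literal.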

@andrew-d

Should be closed - see #23

@BurntSushi BurntSushi reopened this Apr 1, 2015
@BurntSushi
Member

So I've gotten more requests to do this and I'm also somewhat interested in it myself. Regex parsing is not easy, and it has uses beyond the matching supported in this crate.

@alexcrichton I know you were kind of down on this last time we talked about it, but what if we split the parsing into its own separate crate? (In this repository.) That way, we can keep the main regex crate stable even if the parser has a breaking change. The major version on the parser might increase more rapidly than regex, but I think I'd be OK with that since it would not be as widely used as regex proper.

@alexcrichton
Member

Creating a separate crate for the parser sounds like a great idea to me!

@BurntSushi
Member

I'm currently working on this in the new-parser branch. I'm rewriting the parser and tweaking the AST a bit before releasing it as a separate crate.

BurntSushi added a commit that referenced this issue May 25, 2015
This commit introduces a new `regex-syntax` crate that provides a
regular expression parser and an abstract syntax for regular
expressions. As part of this effort, the parser has been rewritten and
has grown a substantial number of tests.

The `regex` crate itself hasn't changed too much. I opted for the
smallest possible delta to get it working with the new regex AST.
In most cases, this simplified code because it no longer has to deal
with unwieldy flags. (Instead, flag information is baked into the AST.)
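
One way to read "flag information is baked into the AST" is that the parser resolves flags such as `(?i)` while it reads the pattern and stores their effect on each node, so nothing downstream has to track flag state. The following is a hypothetical, heavily simplified Rust sketch. The `Expr` and `Flags` types are illustrative, not the actual `regex-syntax` AST, and only literals, `.` and the `i`/`s` flags are supported.

```rust
/// Hypothetical, simplified AST: each node carries the flag state that was
/// in effect where it was parsed.
#[derive(Debug, PartialEq)]
enum Expr {
    /// A literal character; `casei` is true if `(?i)` was active.
    Literal { c: char, casei: bool },
    /// `.`; `nl` is true if `(?s)` was active (dot also matches `\n`).
    AnyChar { nl: bool },
    /// A sequence of sub-expressions.
    Concat(Vec<Expr>),
}

/// Flag state the parser threads through while reading the pattern.
#[derive(Default)]
struct Flags {
    casei: bool,
    dotnl: bool,
}

/// Parses a toy subset: literals, `.`, and flag groups like `(?i)` or `(?is)`.
fn parse(pattern: &str) -> Result<Expr, String> {
    let chars: Vec<char> = pattern.chars().collect();
    let mut flags = Flags::default();
    let mut nodes = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        match chars[i] {
            '(' if chars.get(i + 1) == Some(&'?') => {
                // A flag group updates parser state and emits no node.
                i += 2;
                loop {
                    match chars.get(i).copied() {
                        Some('i') => flags.casei = true,
                        Some('s') => flags.dotnl = true,
                        Some(')') => break,
                        Some(c) => return Err(format!("unsupported flag {:?}", c)),
                        None => return Err("unclosed flag group".to_string()),
                    }
                    i += 1;
                }
            }
            '.' => nodes.push(Expr::AnyChar { nl: flags.dotnl }),
            // Other special characters are out of scope for this sketch.
            '(' | ')' | '[' | ']' | '*' | '+' | '?' | '|' | '\\' => {
                return Err(format!("unsupported syntax {:?}", chars[i]))
            }
            c => nodes.push(Expr::Literal { c, casei: flags.casei }),
        }
        i += 1;
    }
    Ok(Expr::Concat(nodes))
}

impl Expr {
    /// Tests one character against a leaf node using only the node's own data.
    fn matches_char(&self, ch: char) -> bool {
        match *self {
            Expr::Literal { c, casei: false } => c == ch,
            Expr::Literal { c, casei: true } => c.to_lowercase().eq(ch.to_lowercase()),
            Expr::AnyChar { nl } => nl || ch != '\n',
            Expr::Concat(_) => false,
        }
    }
}

fn main() {
    let ast = parse("a(?i)b.").unwrap();
    println!("{:?}", ast);
    if let Expr::Concat(nodes) = &ast {
        // 'a' was parsed before `(?i)`, 'b' after it.
        println!("{} {}", nodes[0].matches_char('A'), nodes[1].matches_char('B')); // prints "false true"
    }
}
```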

Here is a list of public facing non-breaking changes:

* A new `regex-syntax` crate with a parser, regex AST and lots of tests.
  This closes #29 and fixes #84.
* A new flag, `x`, has been added. This allows one to write regexes with
  insignificant whitespace and comments.
* Repetition operators can now be directly applied to zero-width
  matches. For example, `\b+` was previously not allowed but now works.
  Note that one could always write `(\b)+` previously. This change
  is mostly about lifting an arbitrary restriction.
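
A rough picture of what the `x` flag permits is that whitespace and `#` comments are ignored, so a pattern can be laid out over several lines. The sketch below is a hypothetical std-only helper, `strip_extended`, that rewrites such a pattern into a compact form. It is not meant to mirror how the parser implements the flag, and it assumes whitespace is insignificant everywhere, with a backslash keeping the following character (such as `\ ` or `\#`) literal.

```rust
/// Hypothetical helper: removes insignificant whitespace and `#` comments
/// from an extended-mode (`x` flag) pattern.
/// Simplifying assumption: whitespace is dropped everywhere, even inside `[...]`.
fn strip_extended(pattern: &str) -> String {
    let mut out = String::new();
    let mut chars = pattern.chars();
    while let Some(c) = chars.next() {
        match c {
            // An escape keeps the next character as written (e.g. `\ ` or `\#`).
            '\\' => {
                out.push('\\');
                if let Some(next) = chars.next() {
                    out.push(next);
                }
            }
            // `#` starts a comment that runs to the end of the line.
            '#' => {
                for skipped in chars.by_ref() {
                    if skipped == '\n' {
                        break;
                    }
                }
            }
            c if c.is_whitespace() => {}
            c => out.push(c),
        }
    }
    out
}

fn main() {
    let verbose = r"
        (\d{4})  # year
        -
        (\d{2})  # month
    ";
    println!("{}", strip_extended(verbose)); // prints (\d{4})-(\d{2})
}
```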

And a list of breaking changes:

* A new `Regex::with_size_limit` constructor function that allows one
  to tweak the limit on the size of a compiled regex. This fixes #67.
  The new method isn't a breaking change, but regexes that exceed the
  size limit (set to 10MB by default) will no longer compile. To fix,
  simply call `Regex::with_size_limit` with a bigger limit.
* Capture group names cannot start with a number. This is a breaking
  change because regexes that previously compiled (e.g., `(?P<1a>.)`)
  will now return an error. This fixes #69.
* The `regex::Error` type has been changed to reflect the better error
  reporting in the `regex-syntax` crate, and a new error for limiting
  regexes to a certain size. This is a breaking change. Most folks just
  call `unwrap()` on `Regex::new`, so I expect this to have minimal
  impact.
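
The two new failure modes can be pictured with a small error type. The sketch below is hypothetical and std-only: `Error`, `check_capture_name` and `check_size` are illustrative names, not the crate's API. The naming rule assumed here (ASCII letters, digits and `_`, not starting with a digit) approximates the restriction described above, and the size estimate (instruction count times a fixed per-instruction cost) is an assumption rather than how `regex` actually measures a compiled program.

```rust
use std::fmt;

/// Hypothetical error type: one variant per failure mode described above.
#[derive(Debug, PartialEq)]
enum Error {
    /// The pattern could not be parsed (e.g. an invalid capture group name).
    Syntax(String),
    /// The compiled program would exceed the size limit, in bytes.
    CompiledTooBig(usize),
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::Syntax(msg) => write!(f, "syntax error: {}", msg),
            Error::CompiledTooBig(limit) => {
                write!(f, "compiled regex exceeds size limit of {} bytes", limit)
            }
        }
    }
}

impl std::error::Error for Error {}

/// Assumed naming rule: non-empty, only ASCII letters, digits and `_`,
/// and the first character must not be a digit (so `1a` is rejected).
fn check_capture_name(name: &str) -> Result<(), Error> {
    match name.chars().next() {
        None => return Err(Error::Syntax("empty capture group name".to_string())),
        Some(c) if c.is_ascii_digit() => {
            return Err(Error::Syntax(format!(
                "capture group name {:?} starts with a number",
                name
            )))
        }
        Some(_) => {}
    }
    if name.chars().all(|c| c.is_ascii_alphanumeric() || c == '_') {
        Ok(())
    } else {
        Err(Error::Syntax(format!("invalid capture group name {:?}", name)))
    }
}

/// Hypothetical size check: estimated size = instructions * bytes per instruction.
fn check_size(num_insts: usize, bytes_per_inst: usize, limit: usize) -> Result<(), Error> {
    // saturating_mul keeps the estimate from overflowing for huge programs.
    if num_insts.saturating_mul(bytes_per_inst) > limit {
        Err(Error::CompiledTooBig(limit))
    } else {
        Ok(())
    }
}

fn main() {
    // Assumption: the "10MB" default is read as 10 MiB.
    let default_limit = 10 * (1 << 20);
    for name in ["year", "_x1", "1a"] {
        match check_capture_name(name) {
            Ok(()) => println!("{:?}: ok", name),
            Err(e) => println!("{:?}: {}", name, e),
        }
    }
    // Per the commit, the fix for hitting the limit is to ask for a bigger one.
    if let Err(e) = check_size(2_000_000, 16, default_limit) {
        println!("{}", e);
    }
}
```

Keeping both failures in one enum means callers that simply `unwrap()` see no difference, while callers that match on the error can tell a malformed pattern from one that is merely too large.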

Closes #29, #67, #69, #79, #84.

[breaking-change]