Implement regexes #174
Comments
Oh, and we'll want to do this behind a feature flag until it's nice and stable. |
Do you have a list of links I could browse to have any idea how to parse/execute regexps? |
The links here are relevant to the "execute" side of your question: Implementing Regular Expressions. But the expressions they discuss are traditional Unix-style regular expressions, not the Perl 6 style used here. For the "parse" side, then, we need a grammar of Perl 6 grammars (modulo whatever differences there end up being between those and 007 grammars). The potentially big issue in applying the "Implementing" link I gave is that Perl 6 grammars are strictly more expressive than traditional regular expressions -- that is, there are patterns you can specify in a Perl 6 grammar that you cannot specify in a true regular expression. How to implement them for 007 is going to depend on where 007 regexps fall on that scale of expressiveness. |
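As one concrete illustration of the expressiveness gap mentioned above: balanced parentheses are the textbook example of a language that no true regular expression can recognize, while a recursive grammar rule handles it easily. Here is a minimal Python sketch of such a rule; the function names are made up for illustration and are not part of 007 or Perl 6.

```python
def parse_group(text: str, pos: int) -> int:
    """Recursive rule: group ::= ( '(' group ')' )*

    Returns the position after the longest run of balanced groups
    starting at pos. Raises ValueError on an unclosed '('.
    """
    while pos < len(text) and text[pos] == "(":
        pos = parse_group(text, pos + 1)  # recurse into the nested group
        if pos >= len(text) or text[pos] != ")":
            raise ValueError(f"expected ')' at position {pos}")
        pos += 1  # consume the closing paren
    return pos


def is_balanced(text: str) -> bool:
    """True if text consists solely of balanced parentheses."""
    try:
        return parse_group(text, 0) == len(text)
    except ValueError:
        return False
```

The recursion in `parse_group` is what a finite automaton lacks: it has to remember an unbounded nesting depth.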
The issue is not parsing regexps. That's "easy" to do. Dealing with the matching itself sounds a bit harder (backtracking etc., though not too hard on its own). |
I don't see us needing backtracking, really.
|
Allow me to add to @eritain++'s tips by wholeheartedly recommending Regular Expression Matching Can Be Simple And Fast. It makes excellent points about performance (which are relevant to us) and backtracking (or the lack thereof), and it has an actual implementation that is clear and aids understanding. An apposite addendum to that text is that I imagine 007 regexes as favoring simplicity of implementation. That is, we might not want to head down the NFA route with regexes themselves. We can take some amount of performance hit if it gives us a simpler regex implementation. "Oh, a We do want to go the whole NFA hog with grammars, though. It's a later concern, but it's worth pointing out already now. In the end, this will also gain us simplicity; thinking properly in terms of language braids essentially pushes one towards an NFA-based point of view. The grammar is a directed graph on which things match; sometimes nodes come and go due |
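For readers who haven't seen the NFA approach from the recommended article, here is a rough Python sketch of the core idea: compile the pattern to an NFA and advance a set of states one character at a time, so no backtracking is needed. The tuple-based AST (`"lit"`, `"cat"`, `"alt"`, `"star"`) and all names are assumptions made for this illustration; nothing here is 007's actual design.

```python
from dataclasses import dataclass, field


@dataclass
class NFA:
    eps: dict = field(default_factory=dict)   # state -> list of epsilon targets
    step: dict = field(default_factory=dict)  # state -> (char, target state)
    count: int = 0

    def new_state(self) -> int:
        self.count += 1
        self.eps[self.count] = []
        return self.count


def compile_node(nfa, node):
    """Return the (start, end) states of the NFA fragment for an AST node."""
    kind = node[0]
    if kind == "lit":
        s, e = nfa.new_state(), nfa.new_state()
        nfa.step[s] = (node[1], e)
        return s, e
    if kind == "cat":
        s1, e1 = compile_node(nfa, node[1])
        s2, e2 = compile_node(nfa, node[2])
        nfa.eps[e1].append(s2)
        return s1, e2
    if kind == "alt":
        s, e = nfa.new_state(), nfa.new_state()
        for child in node[1:]:
            cs, ce = compile_node(nfa, child)
            nfa.eps[s].append(cs)
            nfa.eps[ce].append(e)
        return s, e
    if kind == "star":
        s, e = nfa.new_state(), nfa.new_state()
        cs, ce = compile_node(nfa, node[1])
        nfa.eps[s] += [cs, e]    # enter the loop or skip it
        nfa.eps[ce] += [cs, e]   # repeat the loop or leave it
        return s, e
    raise ValueError(f"unknown node kind: {kind}")


def closure(nfa, states):
    """All states reachable from `states` through epsilon edges."""
    seen, stack = set(states), list(states)
    while stack:
        for nxt in nfa.eps[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def full_match(node, text):
    """True if the whole text matches; tracks a set of states, never backtracks."""
    nfa = NFA()
    start, accept = compile_node(nfa, node)
    current = closure(nfa, {start})
    for ch in text:
        moved = {nfa.step[s][1] for s in current
                 if s in nfa.step and nfa.step[s][0] == ch}
        current = closure(nfa, moved)
    return accept in current
```

Because each input character is processed once against a bounded set of states, nested quantifiers such as `(a*)*` don't blow up the way they can in a naive backtracking matcher.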
By the way, nice volunteering! Looking forward to collaborating on this. In the interests of encouraging @vendethiel and like-minded people, here's what I'm looking for in a first PR:
|
Right. In this case, it does not look too different from what I did in faux-combinator... It "simply" works on text rather than tokens. (It doesn't really have backtracking, though it has a simple system of storing the parser state before trying an optional match, so that the state can be restored if the match fails somewhere along the way.) Example of faux-combinator/perl5:
sub subrule($parser) {
  $parser->one_of({ $parser->expect('a'); }, { $parser->any_of('b'); });
  $parser->try({ $parser->expect(','); });
}
my $parser = FauxCombinator::Parser::new([@tokens]);
$parser->expect('*');
$parser->many_of(&subrule);
which would parse
I have so much on my plate ($work, $school, $memoir, and my little rust vm) I'm certainly not going to specify an ETA. But it does look like an interesting project! |
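The save-and-restore approach to optional matches described above can be sketched in Python, working on text rather than tokens. This is a toy under stated assumptions: the class and method names (`TextParser`, `attempt`, `parse_star_list`) are hypothetical, and it is not a port of faux-combinator's actual API.

```python
class ParseError(Exception):
    pass


class TextParser:
    """Toy parser over text; the only mutable state is `pos`."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def expect(self, literal):
        if not self.text.startswith(literal, self.pos):
            raise ParseError(f"expected {literal!r} at {self.pos}")
        self.pos += len(literal)
        return literal

    def attempt(self, rule):
        """Run rule; on failure restore the saved position and return None."""
        saved = self.pos
        try:
            return rule()
        except ParseError:
            self.pos = saved
            return None

    def one_of(self, *rules):
        """Return the result of the first rule that succeeds."""
        for rule in rules:
            saved = self.pos
            try:
                return rule()
            except ParseError:
                self.pos = saved  # undo partial progress before the next try
        raise ParseError(f"no alternative matched at {self.pos}")

    def many_of(self, rule):
        """Apply rule zero or more times, collecting results."""
        results = []
        while True:
            saved = self.pos
            try:
                results.append(rule())
            except ParseError:
                self.pos = saved
                return results
            if self.pos == saved:  # guard against rules that consume nothing
                return results


def parse_star_list(text):
    """Parse '*' followed by items 'a' or 'b', each optionally followed by ','."""
    p = TextParser(text)

    def item():
        value = p.one_of(lambda: p.expect("a"), lambda: p.expect("b"))
        p.attempt(lambda: p.expect(","))  # optional separator
        return value

    p.expect("*")
    return p.many_of(item), p.pos
```

Restoring a single saved position is much cheaper to implement than general backtracking, at the cost of never revisiting an alternative once a later rule has failed.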
I'm tempted to close this one now, as we technically have regexes in 007 (under a feature flag). The rest is just a Simple Matter of Programming. 😄 Alternatively, I can keep it open until something like |
Ok, let's to
|
I'd say
|
Then later, we can add and it means to match a regex you "just" need an |
Maybe 007-ified names for the above would be:
And
Just suggestions, but I think these fit the implicit naming scheme we have. |
Yes, that was more about structure than actual names :). Seems good to me. |
Seeing as we're reaching some complexity level, should this be its own |
Hm, not as part of this issue, I'd say. Usually I'd jump at suggestions to bring out more structure and separating into files, but right now the force keeping all the Q types in the same file is pretty strong. |
Duly noted :-). I'll just be adding some fences (... not). |
I got some stuff done: https://gist.github.com/vendethiel/4ce172f656bdec64c1d2b09857d382c6 Two questions:
|
I presume so. A
Sorry, which file are we talking about? 😄 |
Sorry, Val.pm |
Right now in Val.pm, there are two strategies for using Q:: types:
If we need something more than that, I'd consider it a signal that something is off in the design and needs us to go back to the drawing-board a little. |
Well, I have a |
My design brain is missing, but try it and see? My biggest objection for now is I don't see a big use of such I seem to vaguely recall that nqp solves this by having a single regex node type that can take on all possible regex-fragment roles, like a chameleon. But don't quote me on that. |
Made a lot of progress today. I haven't decided on a dispatch mechanism yet. For now the code uses This regex parses and works: I'll have lots of cleanup to do, and I need to write up some kind of helper to test those (or maybe just test many regexes in a single 007 test). |
I haven't implemented identifier or call yet; I need to take a look at how .eval on an identifier works. |
Woohoo! Just want to say how nice it is to see progress happen on this! 🎉 Getting to where we can do |
For reference: #239. Will push alternations shortly. |
#239 has merged. We are almost at a point (being able to implement #163) where we can close this one. What's missing concretely:
I started experimenting with |
It's also probably missing captures. |
I have a feeling the best way to proceed from here is "pull-based", that is, start from a use case such as #163, run it, find the most proximal reason it doesn't work, implement that, lather, rinse, repeat. |
Or we set the sights a little higher and just make a list of the features missing, and pick a sensible dependency order to tick them off. To wit:
I guess that's it. |
I hereby commit to championing this issue through to completion. I have this idea that I can prototype things in pure 007, and then bring them back in a nice way in the implementation. |
I'd like to write a small note about three "modes" of regex matching that I've noticed:
So, unexpectedly, invoking a regex on a bit of text seems to return a |
We're going to need them soon if we want to do `is parsed` stuff.
Let's start very simple. What are the things needed to get #163 to work?
`/ "abc" /`
`/ <EXPR> /` (Update: but see also Regex rule call wishlist #348.)
`/ <.ws> /` (or something)
There are various other things I foresee we need pretty soon after that (quantifiers, code blocks, maybe assertions, lookaheads/lookbehinds?), but the above is a modest start. I'm not sure we'll ever need backtracking, since people will mostly be implementing parser fragments with these regexes. At the very least, backtracking should be off by default.
But some care would also need to go into the APIs around a new `Val::Regex` type.
`infix:<~~>` and `infix:<!~~>` for this... but that'd take it outside of its current role, which is to type-match. It'd turn into a more generic smartmatch operator. Maybe not such a bad thing.
`.matcher` which creates an iterator-like `Matcher` object. We're not doing that.
`infix:<~~>` currently returns for type matches. (Modulo Introduce a Val::Bool type #157.) But that won't allow us to get capture group information out of the match...
`Match` object? But if we want to use `infix:<~~>` in `if` statements (and who doesn't?), then some `Match` objects would need to be falsy, and there is currently no boolification protocol to allow that. Or should we return `None` when things don't match? (Python does.)
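For reference, this is how the Python behavior mentioned above plays out with the standard `re` module: a failed match returns `None`, and a successful match returns a match object that is always truthy, even when it matched the empty string. The helper name `describe` is just for this illustration.

```python
import re


def describe(pattern, text):
    """Use the match result directly in an `if`, then read capture groups."""
    m = re.match(pattern, text)
    if m:  # None is falsy; a match object is always truthy
        return f"matched {m.group(0)!r}, captured {m.groups()}"
    return "no match"


# A successful match on the empty string is still truthy:
empty = re.match(r"a*", "bbb")
```

So in Python's design no match object ever needs to be falsy; the "no match" case is carried entirely by `None`, while capture information stays available on the success path.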