Merge subpattern references #18

Open
wants to merge 134 commits into
from

2 participants

@nbtrap

Subpattern references enable the matching of self-similar strings by
way of recursion. Unlike backreferences, which refer to the string
matched by a register, subpattern references refer to the pattern
contained within the register and cause the regex engine to recurse,
as though by an actual function call, to the referenced subpattern.

SYNTAX

A subpattern reference node has the form

(:SUBPATTERN-REFERENCE <ref>)

where is a positive fixnum denoting a register number or a
string or symbol denoting a register name.

Using the Perl syntax, a subpattern reference looks like

(?N)

or
(?&NAME)

where N is a positive (decimal) integer and NAME is a register name.

API CHANGES

There are no API changes.

KNOWN ISSUES

Perl Incompatibilities
======================
The semantics of subpattern references (or "sub calls") in Perl are
not well defined. In particular, as of version 5.19.9, the
interaction between subpattern references and backreferences is
inconsistent. This issue was recently raised on the p5p mailing
list, and the Perl devs seem to be seriously considering adopting
the semantics implemented here. See
https://rt.perl.org/Public/Bug/Display.html?id=121299 for details.

Embedded Modifiers
==================
The interaction between subpattern references and embedded modifiers
(e.g. :CASE-INSENSITIVE-P) is undefined for now and will be
addressed in a future release.

AllegroCL Compatibility Mode
===========================
So far as I know, the AllegroCL compatibility mode (enabled by
adding :USE-ACL-REGEXP2-ENGINE to FEATURES before compiling) does
not support this feature.

Other Bugs
==========
Several outstanding bugs are known to at least indirectly affect
subpattern references. Cf. #17 and #12, for example.

IMPLEMENTATION DETAILS

During the match phase, the subpattern reference closure calls the
register closure, passing it an extra argument: the match
continuation.

When the register closure sees that it has been called with an extra
argument, it knows that it has been entered via subpattern
reference. At this point, it saves the state of the local
registers' offsets and creates new dynamic "bindings" for them.
Then it calls the register's inner matcher, restoring the register
offsets state upon return therefrom. If the inner matcher has
succeeded, the subpattern reference's continuation is called.

The presence of one or more subpattern references precludes certain
optimizations. However, the performance for existing code (i.e.,
for regular expressions not containing subpattern references) should
be unaffected hereby.

OTHER CHANGES

The testing code has been overhauled. Of note:

1. The Perl script that generates many of tests has been modified,
among other things, to print results for as many capture groups as
are defined by each regex, but no more.  (The purpose of this was
to support tests involving arbitrarily many capture groups.)
Thus, much of perltestdata file seems to have changed, but it's
mostly superficial.

2. Perl tests are now run with *ALLOW-NAMED-REGISTERS* bound to T
when the regex contains one or more named registers.

3. Many more tests have been added to *TESTS-TO-SKIP*.  These are
by and large a result of Perl's undefined behavior vis-a-vis
subpattern references.
nbtrap added some commits Dec 27, 2013
@nbtrap nbtrap Add RECURSIVE-SUBPATTERN class. 90ebc6b
@nbtrap nbtrap Add lexer tokens for (?R) and (?&NAME) sequences. bf19675
@nbtrap nbtrap Use CASE option instead of IF expression. 9bf3bf0
@nbtrap nbtrap Optionally make PARSE-REGISTER-NAME-AUX parse from within (?&name) co…
…nstruct.
35c05dd
@nbtrap nbtrap Make (?<digit>+) and (?&NAME) constructs <legal-token>s. 5adc50c
@nbtrap nbtrap Parse subpattern references correctly. ca8f723
@nbtrap nbtrap Make note to support (?0) and (?R). da2bba6
@nbtrap nbtrap Rename RECURSIVE-SUBPATTERN -> SUBPATTERN-REFERENCE. 6a493f8
@nbtrap nbtrap Beginnings of code to check for invalid subpattern references. 95c400a
@nbtrap nbtrap Add to doc comment of CONVERT-AUX. b4697b6
@nbtrap nbtrap Add shell of method for handling subpattern reference parse-tree nodes. ceecaf8
@nbtrap nbtrap Give flesh to CONVERT-COMPOUND-PARSE-TREE.
Be sure to keep track of named subpattern references as well as the
highes numbered subpattern reference encountered.
561228d
@nbtrap nbtrap Rename variable NAMED-SUBPATTERN-REFS-SEEN -> NAMED-SUBPATTERN-REFS. 82fdd76
@nbtrap nbtrap Convert named subpattern refs to numbered subpattern refs inside CONV…
…ERT.

Also, keep track of which registers have been referenced by number.
514cb45
@nbtrap nbtrap Return a fifth value from CONVERT, namely, the list of numbers of sub…
…patterns referenced in the regex.
a2a9632
@nbtrap nbtrap Create binding of special variable REFERENCED-REGISTER-MATCHERS-PLIST. 8910783
@nbtrap nbtrap Define the closure that matches subpattern references, and modify the…
… register closure.

This required several things that may not have been necessary and will
have to be revisited.  First of all, for every register, we now create
two inner matchers: one that matches the contents of the register and
what follows the register, and one that only matches the contents of
the register.  Also, we now stop accumulating into STARTS-WITH once we
encounter a register or subpattern reference.

With this patch, subpattern references seem to work for the most part.
They do not yet work with repetitions.
eb3d9ec
@nbtrap nbtrap Define REGEX-LENGTH on SUBPATTERN-REFERENCE. ce1b0fe
@nbtrap nbtrap Define COPY-REGEX and COMPUTE-OFFSETS on SUBPATTERN-REFERENCE.
At this point, one thing that doesn't work quite right is the
determination of register offsets for registers accessed indirectly by
subpattern references.  For example:

  (cl-ppcre:scan "(\\([^()]*((?1)\\)|\\)))" "((()))")

says that the second register is at position (3, 6), though it should
be (1, 6).  Fixing this will require binding a special variable from
subpattern reference closures that tells register closures not to
touch the register offsets.
972dba5
@nbtrap nbtrap Be sure not to touch register offsets when matching a register indire…
…ctly from a subpattern reference.

With this patch, the following invocation:
  (cl-ppcre:scan "(\\([^()]*((?1)\\)|\\)))" "((()))")
gives the correct offset values for the second register as (1,6).

One problem that remains is the danger of infinite recursion during
backtracking.  The following invocation:
  (cl-ppcre:scan "(?1)(?2)(a|b|(?1))(c)" "acba")

causes a stack overflow because the second (?1) is called endlessly
during backtracking without the match position advancing through the
string.  Such behavior may be able to be remedied by having the
subpattern reference's closure keep track of where in *STRING* it has
been called before.
b7ab328
@nbtrap nbtrap Don't backtrack through subpattern references ad infinitum.
This is going to be reverted immediately, since apparently Perl isn't
smart enough to do this and will itself overflow the stack.
69f0d7c
@nbtrap nbtrap Revert "Don't backtrack through subpattern references ad infinitum."
This reverts commit 69f0d7c.
55a48e0
@nbtrap nbtrap Use COND instead of multi-branch IF. 38986bd
@nbtrap nbtrap Merge branch 'master' into subpattern-references d79af78
@nbtrap nbtrap Add tests for subpattern references.
1634 and 1635 currently don't work.
5de7565
@nbtrap nbtrap Remove unused variable declaration. fa60aff
@nbtrap nbtrap Skip end of string optimizations with subpattern references. 09625d5
@nbtrap nbtrap Actually, only skip the "skip" optimization when optimizing ends of s…
…trings in patterns containing subpattern references.
a4e1eaa
@nbtrap nbtrap Add forgotten type to ETYPECASE expression. e100f55
@nbtrap nbtrap Add more tests for subpattern references.
Current, the following tests fail: 1638, 1639, 1641, 1642, 1643, 1644,
1645, 1646.
ada178a
@nbtrap nbtrap Use SETF instead of SETQ. e7d4eff
@nbtrap nbtrap Make sure INSIDE-SUBPATTERN-REFERENCE gets set to NIL when NEXT-FN is…
… not called.

The NEXT-FN (or OTHER-FN) parameter is only called when the register's
inner matcher succeeds.  When the inner matcher failed,
INSIDE-SUBPATTERN-REFERENCE was not being unset, though it should have
been.
0309c76
@nbtrap nbtrap Number tests correctly. 806857a
@nbtrap nbtrap Create a temporary set of registers for each pass through a subpatter…
…n reference.

This patch fixes tests 1643-1646.
047c17e
@nbtrap nbtrap Detect whether perl tess regexes contain named registers. 22c26a8
@nbtrap nbtrap Add some additional (trivial) cases to ETYPECASE in CONVERT-NAMED-SUB…
…PATTERN-REFS.
3bbe139
@nbtrap nbtrap Add some tests for named subpattern references. b05a808
@nbtrap nbtrap Get rid of unused INSIDE-SUBPATTERN-REFERENCE variable. c70985f
@nbtrap nbtrap Only compute INNER-MATCHER-WITHOUT-NEXT-FN when it's needed.
It's actually not clear that this is faster, though it probably is.
cdb513c
@nbtrap nbtrap Remove FIXME comment from closures.lisp, and rename one of the variab…
…les.

There is another way to go about this, but there's really no telling
which way would be faster.  The advantage to contructing two matchers
is that it only happens once and compilation time.
4fe74d0
@nbtrap nbtrap Remove FIXME comment from CREATE-MATCHER-AUX for SUBPATTERN-REFERENCE.
The optimization suggested there is trivial.
56243a1
@nbtrap nbtrap Remove more FIXME comments.
Perl does not allow whitespace around numbers/names in subpattern
references.
8591bd4
@nbtrap nbtrap Make named subpattern references refer to the first subpattern with t…
…he given name, as in Perl.

This currently doesn't work for forward references.  E.g.:

(let ((ppcre::*allow-named-registers* t))
  (ppcre:scan (ppcre:parse-string "(?&foo)(?<foo>f)(?<foo>o)") "ffo"))

returns NIL.
8572c7a
@nbtrap nbtrap Don't declare type of variable that may be either FUNCTION or NIL. 4d1c609
@nbtrap nbtrap Add two tests (1652 and 1675) for testing forward subpattern referenc…
…es for special case.

The special case is where the forward reference refers to the
beginning of the constant end of string.  These tests currently fail.
e4abae6
@nbtrap nbtrap Reorder the subpattern reference tests.
The tests are now so ordered that every numbered subpattern reference
test is followed by a corresponding named subpattern reference test.
36acbdd
@nbtrap nbtrap Don't use START-OF-END-STRING-P optimization when subpattern referenc…
…es are present.
87c9afc
@nbtrap nbtrap Add tests for patterns containing illegal whitespace in subpattern re…
…ferences.
6620354
@nbtrap nbtrap Bind *ALLOW-NAMED-REGISTERS* to NIL before running simple tests.
If the user has bound this variable to T, the simple tests will not
all pass.
09ad0d4
@nbtrap nbtrap Add two more tests for dealing with START-OF-END-STRING-P and subpatt…
…ern references.
68d4215
@nbtrap nbtrap Add two more tests.
These tests make sure that CL-PPCRE uses the correct named register
when multiple registers have the same name.
ffca226
@nbtrap nbtrap Remove FIXME comment about disambiguating named subpattern references.
This question has been answered an implemented in a previous commit.
e9c94db
@nbtrap nbtrap Change FIXME comment in COMPUTE-OFFSETS method on SUBPATTERN-REFERENCE. d7d8941
@nbtrap nbtrap Remove another FIXME comment.
As with their offsets, determing the minimum lengths subpattern
references is only feasible for patterns that don't really need
subpattern references to begin with.
c5e06f7
@nbtrap nbtrap Remove the FIXME comment from the REGEX-LENGTH method for SUBPATTERN-…
…REFERENCE.
e04ccda
@nbtrap nbtrap Remove unused sub from perltest.pl eval. 1f1bba8
@nbtrap nbtrap Make perltest.pl handle arbitrarily large and variable numbers of reg…
…isters.

The perltestdata file produced by this updated perltest.pl only
reports results for registers that are actually contained in the
corresponding pattern.
0b70e23
@nbtrap nbtrap Add two tests for double- and triple-digit register number subpattern…
… references.
7bd7c22
@nbtrap nbtrap Remove comment about possibly supporting (?0) and (?R).
These are not worth the time, especially considering that they're
trivially easy to simulate.
f4f8137
@nbtrap nbtrap Remove unneeded variable NAMED-REG-SEEN.
We know a named reg has been seen when one or more of the elements of
REG-NAMES is true.
a37bf58
@nbtrap nbtrap Add some failing tests taken from the pcre distribution. 8678c2b
@nbtrap nbtrap Add HAS-SUBPATTERN-REF-P function to convert.lisp. 41cbc80
@nbtrap nbtrap Don't needlessly stop accumulating for string-beginning optimization.
Specifically, once we see a register, continue building the constant
string beginning unless the regex contains a subpattern reference.
8cdc21b
@nbtrap nbtrap Remove specific test references from comment.
The test numbers are no longer correct and are subject to change further.
7c501f0
@nbtrap nbtrap Fix indentation. e4dfb25
@nbtrap nbtrap Don't create a separate matcher for matching registers from subpatter…
…n references.

Instead, every time we descend into a register from a subpattern
reference, we first push a value onto a register-specific list and pop
it off once we return.  When STORE-END-OF-REG sees that there is a
value on the list, it knows we entered the register from a subpattern
reference and doesn't try to match the part of the regex following the
register.

It's hard to say whether this improves or degrades performance and
readability.  But it does seem simpler.
49878b0
@nbtrap nbtrap Use IF instead of COND for simple condition. da8f474
@nbtrap nbtrap Add some more tests that check for correct backtracking through subpa…
…ttern references.

These currently fail.
2805231
@nbtrap nbtrap Backtrack correctly into subpattern references.
This enables correct matching for calls such as:

  (ppcre:scan "(?1)(o(?1)?)" "oo")
5c0f673
@nbtrap nbtrap Clean up CREATE-MATCHER-AUX method for REGISTER.
Add some comments, and rename a variable.
987c5b4
@nbtrap nbtrap Clarify and remove some comments. bab5e70
@nbtrap nbtrap Add some more subpattern reference tests.
These were taken from PCRE's file testdata/testinput2 with slight
modifications.
e32cf51
@nbtrap nbtrap Add more tests for subpattern references.
These tests were taken from PCRE's testdata/testinput2 with slight
modifications.
6f0ad13
@nbtrap nbtrap Add some more tests for subpattern references.
These were taken from PCRE's testdata/testinput2 with slight modifications.
1b51a27
@nbtrap nbtrap Remove FIXME comment from CONVERT-COMPOUND-PARSE-TREE method on :SUBP…
…ATTERN-REFERENCE.
82414b7
@nbtrap nbtrap Rename REFERENCED-REGISTER-MATCHERS -> REGISTER-MATCHERS. da706f6
@nbtrap nbtrap Reformat comments in the style of other comments in the package.
(No periods, no capitalized sentences.)
5f7af85
@nbtrap nbtrap Reformat comments added in changes adding subpattern references to 70…
… columns.
6d1bee4
@nbtrap nbtrap Add FILTER and WORD-BOUNDARY to the default ETYPECASE clause in CONVE…
…RT-NAMED-SUBPATTERN-REF.

I believe this exhausts all possibilities that need to be covered.
2852e76
@nbtrap nbtrap Convert ETYPECASE -> TYPECASE, since all possibilities are accounted …
…for.

Also, add a test for using subpattern references with the :FILTER
feature.
7d1ed2b
@nbtrap nbtrap Add a test for handling back-references within subpattern references …
…referring to registers outside the referenced subpattern.

This currently fails.  When entering into a subpattern reference, Perl
only creates new registers for those capture groups located inside the
subpattern reference.  The capture groups outside the subpattern
reference retain their values.
eed3e27
@nbtrap nbtrap Add some more tests verifying correct behavior of subpattern- and bac…
…k-reference cooperation.
10a6af6
@nbtrap nbtrap Begin transitioning to the new register offsets storage model.
Instead of storing the beginning/end of register offsets directly in
the corresponding arrays, we will now store lists where the car of
each list is the offset.  The reason for this is that when we descend
into a subpattern reference, instead of making a new array where all
offsets are reset, we will push new offsets onto the front of the
lists corresponding only to those registers contained within the
register we're entering via the subpattern reference.  In other words,
we'll only be resetting certain registers.  This will fix tests 1783
and 1785.

Which registers contain which other registers will be computed during
regex compilation.
c3c5e06
@nbtrap nbtrap Continue transitioning to the new register offsets storage model. 3891ee5
@nbtrap nbtrap Add CONTAINING-REGISTERS slot to REGISTER class. 2b40341
@nbtrap nbtrap Compute CONTAINING-REGISTERS slot of REGISTER instances. 6c6771a
@nbtrap nbtrap Continue transition to new register offsets storage model.
At this points, I had expected everything to work as well as with the
old model, but there are still many tests that are failing, so
apparently there are some bugs left to iron out.
474b91a
@nbtrap nbtrap Add some test cases that illumine one of the current register offsets…
… storage model's defects.
d938c89
@nbtrap nbtrap Don't store possible register offset of register entered via subpatte…
…rn reference.

This was causing many tests to fail with bizarre error messages.  It
wasn't caught using the old register offsets storage model since the
array the value was beging stored in was a temporary array to begin
with.

With this commit, the only tests run by RUN-ALL-TESTS that fail are
those that Perl itself gets wrong.
ffeff74
@nbtrap nbtrap Tidy up the comments in CREATE-MATCHER-AUX method specialized on REGI…
…STER.
c3a7c2a
@nbtrap nbtrap Rename CONTAINING-REGISTERS -> SUBREGISTERS. 1627892
@nbtrap nbtrap Fix test 1798.
It was missing the definition of regThree.
9d63ac7
@nbtrap nbtrap Remove some redundant code in CREATE-MATCHER-AUX specialized on REGIS…
…TER.

The redundant code was moved into LABELS function definitions.
d517e82
@nbtrap nbtrap Inline POP-OFFSETS and PUSH-OFFSETS. bfe5c92
@nbtrap nbtrap Use LOOP instead of MAPCAR/MAPC for pushing/popping register offsets. eb26c38
@nbtrap nbtrap Disable some tests in test/perltestdata, but add them to test/simple.
These are tests that even Perl gets wrong.
1da422d
@nbtrap nbtrap Record SUBREGISTER-COUNT instead of a list of SUBREGISTERS.
The numbers of registers nested within a register must be contiguous
with each other and with that of the parent register, so there's no
need to compute a list of subregisters--we just need to know how many
are nested within, and we can compute the rest therefrom.
0b1b87e
@nbtrap nbtrap Merge branch 'nested-refs' into subpattern-references d3a1dee
@nbtrap nbtrap Add two more tests that currently fail.
Each fails for a different reason.  The first fails because it matches
where Perl does not.  (It's not clear whose bug this is.)  The second
fails because we're not waiting until the regex has finished compiling
before validating register names.
8db37a9
@nbtrap nbtrap Add two more tests that fail.
These are like the previous two tests added in the previous commit,
except they deal with "self-referential" backreferences rather than
forward backreferences.
f08f4e2
@nbtrap nbtrap Document subpattern references in the html documentation. f064a0c
@nbtrap nbtrap Add subpattern reference commentary to docs on *ALLOW-NAMED-REGISTERS*
Also, add some FIXME comments to return to later.
0de65d3
@nbtrap nbtrap Restore the original docstrings to *REG-STARTS*, etc.
This begins a third attempt at a register offsets storage model.

The problem with the current model is that it changed the
implementation of *REG-STARTS*, etc., variables that I didn't realize
were part of the api.  This latest attempt will restore the original
semantics to those variables and store offset stacks (for recursive
subpattern references) in separate variables.
adb9156
@nbtrap nbtrap Add three new special variables for holding the register offsets stacks. 07c8d92
@nbtrap nbtrap Bind new special variables when scanning. b1d2ec7
@nbtrap nbtrap Make *REG-STARTS*, etc., have NIL as initial value instead of (list n…
…il).
ad4d39f
@nbtrap nbtrap Access offset values directly in *REG-STARTS*, etc., instead of thoru…
…gh CAR.
00c2ca5
@nbtrap nbtrap Finish implementing the newest register offsets storage model. e64e5ba
@nbtrap nbtrap Declare type once instead of using THE repeatedly. 9aaa4da
@nbtrap nbtrap Merge branch 'nested-refs2' into subpattern-references 92b459e
@nbtrap nbtrap Add more tests to *TESTS-TO-SKIP*
These are tests that CL-PPCRE gets right but Perl gets wrong, except
fot 1812, which is caused by validating backreference names too soon.
f77686c
@nbtrap nbtrap Move more tests (1809-1812) into test/simple.
These are tests that Perl gets wrong.
f50d7a6
@nbtrap nbtrap Add url reference to comment about Perl mishandling certain construct…
…ions.
53cdefe
@nbtrap nbtrap Add FIXME comment to come back to later.
Does Perl try to match more than one capture group if more than one
have the same name?
2f4c042
@nbtrap nbtrap Add more tests for subpattern-/back-reference cooperation.
These currently fail due to Perl's incorrect behavior.  See Perl's RT
8f838eb
@nbtrap nbtrap Move more tests from test/perltestdata into test/simple. 2fc37a6
@nbtrap nbtrap Create new bindings for the referenced register upon entry to subpatt…
…ern-reference.

When entering a register x via subpattern-reference, the registers
local to x receive new dynamic bindings, which shadow the old bindings
for the duration of the subpattern call.  Previously, "local" did not
include the register itself--x in this case.  With this patch, the
referenced register now receives a new binding as well.

It's not entirely clear that this is the appropriate behavior.  In a
regex like "(.\1?)(?1)", the back-reference to '\1' now will always
fail, rather than potentially matching according to what was matched
in the first pass through the first register.
9b97c91
@nbtrap nbtrap Remove FIXME comment from convert.lisp.
The issue referred to there is one of the the subjects of #17.
46a078c
@nbtrap nbtrap Use "recurse" instead of "refer" to describe the action associated wi…
…th subpattern references.
9b0a9c0
@nbtrap nbtrap Convert calls to PUSH-OFFSETS and POP-OFFSETS to fewer calls to more …
…specific functions.
81be1e4
@nbtrap nbtrap Update comment. a941400
@nbtrap nbtrap Clarify comments in CREATE-MATCHER-AUX method specialized over regist…
…ers.
846c22e
@nbtrap nbtrap Get rid of useless declaration. b5b406c
@nbtrap nbtrap Wrap docstrings to 70 columns. 72d020e
@nbtrap nbtrap Fix lexical/special binding bug.
This went undetected for so long because of a bug in SBCL (and ECL,
apparently).  The way it was written, it shouldn't have worked, but it
did--except on CLISP, which is how the bug was caught.
498e3f0
@nbtrap nbtrap Merge tag 'v2.0.7' into subpattern-references 544fcd1
@nbtrap

Tested on SBCL, ECL, and CLISP.

The documentation says that subpattern refs were added in version 2.1.0, so you might want to change that if you don't bump the version number like that.

I wrote the pull request as plain text (ignoring GitHub's "markdown") so it could be used as the merge's commit message.

@hanshuebner hanshuebner commented on an outdated diff Mar 1, 2014
closures.lisp
- (funcall next-fn start-pos)))
+ (declare (special register-matchers))
+ (let ((num (num register))
+ (subregister-count (subregister-count register))
+ ;; a place to store the next function to call when we arrive
+ ;; here via a subpattern reference
+ subpattern-ref-continuations)
+ (declare (fixnum num subregister-count)
+ (list subpattern-ref-continuations))
+ (labels
+ ((push-registers-state (new-starts new-maybe-starts new-ends)
+ (declare (list new-starts new-maybe-starts new-ends))
+ ;; only push the register states for this register and registers
+ ;; local to it
+ (loop for idx from num upto (+ num subregister-count) do
+ (let ()
@hanshuebner
Common Lisp Libraries by Edi Weitz member
hanshuebner added a line comment Mar 1, 2014

Remove this gratitious let.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@hanshuebner hanshuebner and 1 other commented on an outdated diff Mar 1, 2014
closures.lisp
+ ;; local to it
+ (loop for idx from num upto (+ num subregister-count) do
+ (let ()
+ (declare (fixnum idx))
+ (push (svref *reg-ends* idx) (svref *reg-ends-stacks* idx))
+ (setf (svref *reg-ends* idx) (pop new-ends))
+ (push (svref *regs-maybe-start* idx) (svref *regs-maybe-start-stacks* idx))
+ (setf (svref *regs-maybe-start* idx) (pop new-maybe-starts))
+ (push (svref *reg-starts* idx) (svref *reg-starts-stacks* idx))
+ (setf (svref *reg-starts* idx) (pop new-starts)))))
+ (pop-registers-state ()
+ ;; return the state that was destroyed by this restore
+ (let (old-starts old-maybe-starts old-ends)
+ (declare (list old-starts old-maybe-starts old-ends))
+ (loop for idx from (+ num subregister-count) downto num do
+ (let ()
@hanshuebner
Common Lisp Libraries by Edi Weitz member
hanshuebner added a line comment Mar 1, 2014

Extra let, lose it.

@nbtrap
nbtrap added a line comment Mar 1, 2014

The point of those LETs is so I don't have to use THE every time I reference IDX. Do you prefer that I use LOCALLY, use THE multiple times, or get rid of the type declaration for IDX altogether?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@hanshuebner hanshuebner commented on an outdated diff Mar 1, 2014
closures.lisp
+ (let ()
+ (declare (fixnum idx))
+ (push (svref *reg-ends* idx) old-ends)
+ (setf (svref *reg-ends* idx) (pop (svref *reg-ends-stacks* idx)))
+ (push (svref *regs-maybe-start* idx) old-maybe-starts)
+ (setf (svref *regs-maybe-start* idx) (pop (svref *regs-maybe-start-stacks* idx)))
+ (push (svref *reg-starts* idx) old-starts)
+ (setf (svref *reg-starts* idx) (pop (svref *reg-starts-stacks* idx)))))
+ (values old-starts old-maybe-starts old-ends)))
+ ;; STORE-END-OF-REG is a thin wrapper around NEXT-FN which
+ ;; will update register offsets after the inner matcher has
+ ;; succeded
+ (store-end-of-reg (start-pos)
+ (declare (fixnum start-pos)
+ (function next-fn))
+ (if subpattern-ref-continuations
@hanshuebner
Common Lisp Libraries by Edi Weitz member
hanshuebner added a line comment Mar 1, 2014

Use cond instead of this let

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@hanshuebner hanshuebner commented on an outdated diff Mar 1, 2014
closures.lisp
- ;; regular expressions like /(a|\1x)*/
- (setf (svref *regs-maybe-start* num) start-pos)
- (let ((next-pos (funcall inner-matcher start-pos)))
- (unless next-pos
- ;; restore old values on failure
- (setf (svref *reg-starts* num) old-*reg-starts*
- (svref *regs-maybe-start* num) old-*regs-maybe-start*
- (svref *reg-ends* num) old-*reg-ends*))
- next-pos)))))))
+ ;; here comes the actual closure for REGISTER; save it in a
+ ;; special variable so it can be called by subpattern
+ ;; references
+ (setf (getf (car register-matchers) num)
+ (lambda (start-pos &optional other-fn)
+ (declare (fixnum start-pos))
+ (if other-fn
@hanshuebner
Common Lisp Libraries by Edi Weitz member
hanshuebner added a line comment Mar 1, 2014

Use cond instead of let

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@hanshuebner hanshuebner and 1 other commented on an outdated diff Mar 1, 2014
closures.lisp
+ (declare (fixnum start-pos))
+ (if other-fn
+ ;; the presence of OTHER-FN indicates that this
+ ;; register has been entered via a subpattern
+ ;; reference closure; save the registers state,
+ ;; creating fresh new "bindings" for the local
+ ;; register offsets; restore the state before
+ ;; returning to the caller
+ (progn
+ (push other-fn subpattern-ref-continuations)
+ (push-registers-state nil nil nil)
+ (prog1
+ (funcall inner-matcher start-pos)
+ (pop-registers-state)
+ (pop subpattern-ref-continuations)))
+ (let ((old-*reg-starts* (svref *reg-starts* num))
@hanshuebner
Common Lisp Libraries by Edi Weitz member
hanshuebner added a line comment Mar 1, 2014

confusing naming. is this special or not? earmuffs are used for special variables only.

@nbtrap
nbtrap added a line comment Mar 1, 2014

Those names were there already. Shall I change them anyway?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@hanshuebner hanshuebner commented on an outdated diff Mar 1, 2014
closures.lisp
@@ -427,6 +493,19 @@ against CHR-EXPR."
reg-start reg-end)
(funcall next-fn next-pos)))))))))
+(defmethod create-matcher-aux ((subpattern-reference subpattern-reference) next-fn)
+ (declare #.*standard-optimize-settings*)
+ (declare (special register-matchers)
+ (function next-fn))
+ ;; close over the special variable REGISTER-MATCHERS in order to
+ ;; reference it during the match phase
+ (let ((num (num subpattern-reference))
+ (register-matchers register-matchers))
+ (declare (fixnum num) (function next-fn))
+ (lambda (start-pos)
+ (let ((subpattern-matcher (getf (car register-matchers) (1- num))))
@hanshuebner
Common Lisp Libraries by Edi Weitz member
hanshuebner added a line comment Mar 1, 2014

extra let, move the expression into the funcall.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@hanshuebner hanshuebner commented on an outdated diff Mar 1, 2014
@@ -693,7 +697,36 @@ closing #\> will also be consumed."
;; also syntax error
(signal-syntax-error* (1- (lexer-pos lexer))
"Character '~A' may not follow '(?<'."
- next-char ))))))
+ next-char))))))
+ ((#\&)
+ ;; subpattern reference by register name
+ (unless *allow-named-registers*
+ (signal-syntax-error* (1- (lexer-pos lexer))
+ "Character '~A' may not follow '(?'."
+ next-char))
+ (let ((next-char (next-char lexer)))
+ (if (alpha-char-p next-char)
@hanshuebner
Common Lisp Libraries by Edi Weitz member
hanshuebner added a line comment Mar 1, 2014

use unless -> signal

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@hanshuebner hanshuebner commented on the diff Mar 1, 2014
lexer.lisp
+ "Character '~A' may not follow '(?'."
+ next-char))
+ (let ((next-char (next-char lexer)))
+ (if (alpha-char-p next-char)
+ (progn
+ ;; put the letter back
+ (decf (lexer-pos lexer))
+ (list :subpattern-reference
+ (parse-register-name-aux lexer :subpattern-reference t)))
+ (signal-syntax-error* (1- (lexer-pos lexer))
+ "Character '~A' may not follow '(?&'."
+ next-char))))
+ ((#\1 #\2 #\3 #\4 #\5 #\6 #\7 #\8 #\9)
+ ;; put the digit back
+ (decf (lexer-pos lexer))
+ (prog1
@hanshuebner
Common Lisp Libraries by Edi Weitz member
hanshuebner added a line comment Mar 1, 2014

Indentation is wrong. first expression in prog1 gets two spaces extra indentation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@hanshuebner
Common Lisp Libraries by Edi Weitz member

My review does not constitute a willingness to merge the change, which is up to Edi to decide.

@nbtrap

Suggested changes have been made.

nbtrap added some commits Mar 1, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment