documentation on backreference nesting level #14

Closed
ju1ius opened this issue Aug 3, 2016 · 1 comment
Comments

ju1ius commented Aug 3, 2016

Hi,
first thank you very much @kkos for this wonderful library! ❤️

As you may know, the PHP language uses Oniguruma (currently v5.9.6) as the regexp engine for its mbstring extension.
The PHP documentation on Oniguruma is unfortunately close to non-existent, so I'm currently trying to contribute a small reference chapter about its syntax and distinctive features.

However, I'm having a hard time understanding the "backreference with nesting level" feature.
What I currently understand is that they allow referencing the result of a subexpression up or down the subexpression call stack. Is that right?

For example, consider this simplified version of example 2 from the docs:

(?<element>
    < (?<name> [a-z]+ ) >
    (?> [^<]+ | \g<element> )*
    </ \k<name+0> >
)

It's pretty clear to me that we ask the engine to refer to the result of the <name> subexpr at the current nesting level instead of referring to its last captured value.

But in the original version from the docs, you use \k<name+1>.
So you're asking for the result of <name> one level deeper than the current nesting level.
I don't understand why this works and why \k<name+0> doesn't.

Would you mind enlightening me on the subject?
That would help me greatly in documenting the feature!

Thanks again!

kkos (Owner) commented Aug 5, 2016

Because the captured content of the start tag is inside a nested call frame.

  exec(ONIG_ENCODING_UTF8, ONIG_ENCODING_UTF8, ONIG_OPTION_EXTEND,
       "(?<element> \\g<stag> \\g<content>* \\g<etag> ){0}"
       "(?<stag> < \\g<name> \\s* > ){0}"
       "(?<name> [a-zA-Z_:]+ ){0}"
       "(?<content> [^<&]+ (\\g<element> | [^<&]+)* ){0}"
       "(?<etag> </ \\k<name+1> >){0}"
       "\\g<element>",
       "<foo>f</foo>");

Subexpression call frames are placed on the match stack.
At the point of the back-reference \k<name+1>, the stack looks like this (top of stack first):

call <etag>
return <content>
call <content>
return <stag>
return <name>
captured <name> info  <= referenced
call <name>
call <stag>
...

You can rewrite it to use \k<name+0> if you don't go through the stag call.

  exec(ONIG_ENCODING_UTF8, ONIG_ENCODING_UTF8, ONIG_OPTION_EXTEND,
       "(?<element> <\\g<name>\\s*> \\g<content>* \\g<etag> ){0}"
       "(?<name> [a-zA-Z_:]+ ){0}"
       "(?<content> [^<&]+ (\\g<element> | [^<&]+)* ){0}"
       "(?<etag> </ \\k<name+0> >){0}"
       "\\g<element>",
       "<foo>f</foo>");
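
To make the level arithmetic concrete, here is a small standalone sketch that walks a modelled stack from the top and reports the relative level at which the `<name>` capture is found. It is not Oniguruma's code: the `StackEntry` type, the `relative_level` function and the rule "a call entry lowers the level by one, a return entry raises it by one" are assumptions chosen so that the model reproduces the two cases above. The first table copies the stack listing from this comment; the second is inferred by dropping the stag entries and is not taken from the thread.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of match-stack entries.
 * These are NOT Oniguruma's real internal types. */
typedef enum { ENTRY_CALL, ENTRY_RETURN, ENTRY_CAPTURE } EntryKind;

typedef struct {
    EntryKind kind;
    const char *name; /* group the entry belongs to */
} StackEntry;

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))
#define LEVEL_NOT_FOUND INT_MIN

/* Stack at the \k<name+1> point, top of stack first (as listed above). */
static const StackEntry STACK_WITH_STAG[] = {
    { ENTRY_CALL,    "etag"    },
    { ENTRY_RETURN,  "content" },
    { ENTRY_CALL,    "content" },
    { ENTRY_RETURN,  "stag"    },
    { ENTRY_RETURN,  "name"    },
    { ENTRY_CAPTURE, "name"    }, /* <= referenced */
    { ENTRY_CALL,    "name"    },
    { ENTRY_CALL,    "stag"    },
};

/* Assumed stack for the second pattern, where <name> is called directly
 * from <element> (inferred, not shown in the thread). */
static const StackEntry STACK_WITHOUT_STAG[] = {
    { ENTRY_CALL,    "etag"    },
    { ENTRY_RETURN,  "content" },
    { ENTRY_CALL,    "content" },
    { ENTRY_RETURN,  "name"    },
    { ENTRY_CAPTURE, "name"    },
    { ENTRY_CALL,    "name"    },
};

/* Scan from the top of the stack (index 0) downward and return the relative
 * level of the first capture entry for `group`, or LEVEL_NOT_FOUND.
 * Assumed rule: call => level - 1, return => level + 1. */
static int relative_level(const StackEntry *stack, size_t n, const char *group)
{
    assert(stack != NULL && group != NULL);
    int level = 0;
    for (size_t i = 0; i < n; i++) {
        switch (stack[i].kind) {
        case ENTRY_CALL:
            level--;
            break;
        case ENTRY_RETURN:
            level++;
            break;
        case ENTRY_CAPTURE:
            if (strcmp(stack[i].name, group) == 0)
                return level;
            break;
        }
    }
    return LEVEL_NOT_FOUND;
}
```

In this model, `relative_level(STACK_WITH_STAG, COUNT(STACK_WITH_STAG), "name")` gives 1 and `relative_level(STACK_WITHOUT_STAG, COUNT(STACK_WITHOUT_STAG), "name")` gives 0, matching \k<name+1> and \k<name+0> respectively: the extra stag call adds one more return entry between the back-reference and the capture.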

kkos closed this as completed Aug 11, 2016