Optimize (?!) in regular expressions #106566
Comments
As an example of a
>>> pattern = re.compile(r'^(\[)?\d+(?:\]|;(?(1)(?!)))$')
>>> pattern.match('[123]')
<_sre.SRE_Match object; span=(0, 5), match='[123]'>
>>> pattern.match('123;')
<_sre.SRE_Match object; span=(0, 4), match='123;'>
>>> pattern.match('123') is None
True
What it does: you want to match integers either enclosed in brackets or ending with a semicolon (I didn't say it's an interesting example!). A more interesting pattern would be in Perl:
$ perl -e '"aaab" =~ /a+b?(?{print "$&\n";})(*FAIL)/;'
aaab
aaa
aa
a
aab
aa
a
ab
a
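The Python pattern at the start of this comment can be exercised with a short script. This is only a sketch: the helper name `accepts` and the extra sample `'[123;'` are not from the thread and were added for illustration.

```python
import re

# Pattern from the comment above: an optional "[" is captured in group 1.
# If group 1 took part in the match, the ";" branch is forced to fail by (?!).
pattern = re.compile(r'^(\[)?\d+(?:\]|;(?(1)(?!)))$')

def accepts(text):
    """Return True if the pattern matches the whole string (illustrative helper)."""
    return pattern.match(text) is not None

for sample in ['[123]', '123;', '123', '[123;']:
    print(f'{sample!r}: {accepts(sample)}')
```

With this pattern, `'[123;'` is rejected because group 1 matched and the conditional then reaches `(?!)`. As written, the pattern also appears to accept `'123]'`, since the `\]` branch does not check group 1.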
Thank you for your example @picnixz. But isn't it equivalent to a simpler
The Perl example is interesting, but Python does not support embedding arbitrary code (and perhaps should not; this feature looks too unsafe).
Yes, it is, but then my example won't be using
I totally agree that this feature is unsafe, but AFAIK Perl doesn't care (it only gives you the possibility to do it, although I didn't find any advice telling you not to do it due to security concerns). In .NET, you can use
^(?<stack>\[)+[^[\]]*(?<-stack>\])+(?(stack)(?!))$
What it does:
I think this feature (IIRC, .NET refers to it as balancing groups) could one day be incorporated into Python (I don't know if
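For comparison, here is a rough Python sketch of what the .NET pattern above seems to check, assuming it is meant to accept a run of `[`, some bracket-free content, and then the same number of `]`. Python's `re` has no balancing groups, so the counting is done in ordinary code; the names `_BRACKETED` and `is_balanced` are hypothetical.

```python
import re

# Hypothetical emulation: capture the run of "[", the bracket-free content,
# and the run of "]" separately, then compare the two run lengths in Python.
_BRACKETED = re.compile(r'^(\[+)([^\[\]]*)(\]+)$')

def is_balanced(text):
    """Return True if text is n "[" + bracket-free content + n "]", with n >= 1."""
    m = _BRACKETED.match(text)
    return m is not None and len(m.group(1)) == len(m.group(3))

for sample in ['[[abc]]', '[[abc]', '[abc]]', 'abc']:
    print(f'{sample!r}: {is_balanced(sample)}')
```

In the .NET version, the final `(?(stack)(?!))` plays the role of the length comparison: if anything is left on the stack, the match is forced to fail.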
It seems it is mostly useful with constructs that Python does not support. When (if) these are added, we'll see if
Some regular expression engines support (*FAIL) as a pattern which fails to match anything. (?!) is an idiomatic way to write this in engines which do not support (*FAIL).
It works pretty well, but it can be optimized. Instead of compiling it as an ASSERT_NOT opcode, it can be compiled as a FAILURE opcode.
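As a small illustration of the behaviour described above (a sketch only, not taken from the linked change), `(?!)` in Python's `re` never matches at any position and can be used to disable one branch of an alternation. The names `never` and `branch` are illustrative.

```python
import re

# An empty negative lookahead: the empty pattern always matches,
# so asserting that it does NOT match can never succeed.
never = re.compile(r'(?!)')

for text in ['', 'abc', '(*FAIL)']:
    print(f'{text!r}: {never.search(text)}')

# (?!) placed inside one alternative makes that alternative always fail,
# so only the other alternative can take part in a match.
branch = re.compile(r'a(?:b(?!)|c)')
print(branch.fullmatch('ab'))
print(branch.fullmatch('ac') is not None)
```

The observable matching behaviour should be the same whichever opcode is emitted; the proposal only concerns how cheaply the failure is reached.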
Unfortunately, I do not know of good examples of using (*FAIL) in regular expressions (without using (*SKIP)) to include in the documentation. Perhaps other patterns that use (*FAIL) could be optimized in the future, but I do not know what to optimize.