Perl - Allow any non-whitespace character to delimit regexes #974
Before merging this I want to think deeper about how we can support any non-whitespace character. It would be far safer to follow that Perl rule so we don't get into trouble later on in the lexer. Maybe It's a bit naive - doesn't allow for escaping the delimiter character. But I'm not even sure if you can escape all delimiters in Perl. Obviously you can when using That said, need to consider a bit more whether there are any negative implications here. |
@dblessing Thanks! Should we add this case to the Perl example? |
@stanhu I'd like to. It's a bit difficult to reproduce without a big example. If we only provided a small string with this then we'd never catch it as the catastrophic backtracking doesn't occur. How much of the example should we paste in? Also, I'm interested if you have any opinions on the regex in my last comment. Does it look safe? |
Hmm, that rule still seems a little sketchy. It will assume anything starting with |
@dblessing I agree, we should modify the rules to be able to handle any non-whitespace character. Right now if we change the delimiter to
I think that rule is sketchy. It matches way too much: http://rubular.com/r/Rryq4uMANA I'm not sure why you have the In addition, looking at http://perldoc.perl.org/perlop.html, I see that the pattern
I see that we use |
With Perl, it's tricky that we can't tell a variable like |
@stanhu Yeah, that should work. I don't think we need to worry about the rest of those regex rules in Perl because I'm thinking we should just leave the other rules in this lexer to handle those cases for now. That way 1) we don't break backward compatibility on anything (there are lots of different cases, and some are multiline and some aren't and idk why) 2) we don't have to handle escaped delimiter characters here if we're only handling this final case. I agree we should update the I'll update the PR. |
I also see that the regex rules are tried first before the variable rules. We may need to exclude a leading |
👍 I'll test that case. |
e5b00db to 7b76cf7
UPDATE: Oops, I forgot the
lib/rouge/lexers/perl.rb
Outdated
# Perl allows any non-whitespace character to delimit
# a regex when `m` is used.
rule %r(m(\S).*\1[msixpodualngc]*), re_tok
^[^$]?m(\S).*\1[msixpodualngc]*
seems to work: http://rubular.com/r/leuEDyo2s4
pygments was able to parse this because it took some extra steps: Line 526 first catches normal name tokens, and in line 527 a Perl is more complex than I thought @@ (with the m/ms/rx syntax) |
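The two patterns discussed in this review can be tried out in plain Ruby, outside Rouge. The sketch below copies both patterns from the thread; the sample Perl lines and the constant names are invented for illustration. Note that `String#[]` searches anywhere in the string, which is not how a lexer applies a rule at its current position, so this only shows what each pattern is capable of matching.

```ruby
# Patterns copied from the review thread; sample inputs are invented.
ORIGINAL_RULE = %r(m(\S).*\1[msixpodualngc]*)
ANCHORED_RULE = %r(^[^$]?m(\S).*\1[msixpodualngc]*)

# A comma-delimited match with a trailing flag is recognised whole.
COMMA_MATCH = 'm,foo,i'[ORIGINAL_RULE]

# Greedy `.*` backtracks only as far as the LAST delimiter on the line,
# so two matches on one line collapse into a single span.
GREEDY_MATCH = 'm,a, or m,b,'[ORIGINAL_RULE]

# Searched unanchored, the `m` inside a variable name can start a match;
# the anchored variant rejects the same line.
VARIABLE_LINE = '$mem = $mem + 1'
UNANCHORED_HIT = VARIABLE_LINE[ORIGINAL_RULE]
ANCHORED_HIT = VARIABLE_LINE[ANCHORED_RULE]

p COMMA_MATCH, GREEDY_MATCH, UNANCHORED_HIT, ANCHORED_HIT
```

The second case is the kind of over-matching raised earlier in the thread: the pattern has no notion of where the first regex ends.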
7b76cf7 to e304731
@stanhu No matter what, I couldn't get it to work with the
Ah, of course. It's because the rule doesn't match with the $ at the beginning so it keeps going. Then, the varname rule catches it and pushes the varname state. It means the regex rule doesn't ever get a chance to eval on the Cool 👍 |
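The ordering argument above can be sketched with a toy lexer built on `StringScanner`. This is only an illustration under the assumption that rules are tried in order at the current scan position; the `RULES` table, the `tokenize` helper, and the `:variable` and `:other` patterns are hypothetical and are not Rouge's actual Perl lexer.

```ruby
require 'strscan'

# Hypothetical rule table: the regex rule is listed first, as in the thread.
RULES = [
  [:regex,    /m(\S).*\1[msixpodualngc]*/],
  [:variable, /\$\w+/],
  [:other,    /./m] # fallback so the scanner always advances
].freeze

def tokenize(src)
  scanner = StringScanner.new(src)
  tokens = []
  until scanner.eos?
    RULES.each do |name, pattern|
      # `scan` only matches at the current position, never further ahead.
      text = scanner.scan(pattern)
      next unless text
      tokens << [name, text]
      break
    end
  end
  tokens
end

TOKENS = tokenize('$mem =~ m,e,')
p TOKENS
```

At position 0 the regex rule fails on `$`, the variable rule consumes all of `$mem`, and the regex rule is never tried on the `m` inside the name. That is why no leading `$` exclusion is needed in this toy setup.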
@dblessing I'm a little confused by your last comment. Is |
@stanhu No, it's not required, and actually causes things to break. The reason it's not required is because of what I said in my last comment. |
@dblessing Great, thanks. |
e304731 to f4e6051
Perl allows regexes to be delimited by any non-whitespace character when `m` is used. The new rule covers that case. Previously only certain special characters were recognized.
f4e6051 to aa42510
Released as 3.2.1 |
The previous rules for strings used a regular expression that combined different elements of the string. This could, in a pathological case, cause Rouge to get stuck trying to lex the code. This problem can be fixed by lexing a quoted string in a separate state to `:root`. In addition to adding states for quoted string, this commit also adds support for basic string interpolation. The basic code is lifted from the Ruby lexer. This fixes rouge-ruby#974.
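As a rough illustration of lexing a quoted string in its own state, here is a minimal sketch using `StringScanner`. It is not the code from the commit: the `lex_with_states` helper, the `:dq_string` state, and the token names are all hypothetical. The point is that each branch matches one small, simple pattern instead of a single combined expression for the whole string.

```ruby
require 'strscan'

# Hypothetical two-state lexer: :root and :dq_string.
def lex_with_states(src)
  scanner = StringScanner.new(src)
  state = :root
  tokens = []
  until scanner.eos?
    if state == :root
      if scanner.scan(/"/)
        tokens << [:str_open, scanner.matched]
        state = :dq_string # enter the string state
      else
        tokens << [:text, scanner.scan(/[^"]+/)]
      end
    elsif scanner.scan(/"/)
      tokens << [:str_close, scanner.matched]
      state = :root # leave the string state
    elsif scanner.scan(/\\./m)
      tokens << [:escape, scanner.matched]
    elsif scanner.scan(/\$\w+/)
      tokens << [:interp, scanner.matched] # basic interpolation
    else
      # Plain string content, or any single leftover character.
      tokens << [:str_body, scanner.scan(/[^"\\$]+|./m)]
    end
  end
  tokens
end

STRING_TOKENS = lex_with_states('print "a $x\"";')
p STRING_TOKENS
```

Every branch consumes at least one character, so the loop always advances even on an unterminated string.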
In Perl, any non-whitespace character can be a regex delimiter. This change does not
allow for every such delimiter, but it does handle a specific case seen recently where
some Perl code used a comma as the delimiter. In the absence of this fix the lexer can
get stuck later on other rules.