Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Perl 5 regexpes overreact #57

Closed
KamilaBorowska opened this issue Mar 29, 2012 · 6 comments
Closed

Perl 5 regexpes overreact #57

KamilaBorowska opened this issue Mar 29, 2012 · 6 comments

Comments

@KamilaBorowska
Copy link

Following code will see regexp / $b ... $c /, when it's obviously divide-by operator. Only second line is needed, but I needed to end regexp, so it would be visible (otherwise, everything would be just black. This doesn't happen in JavaScript and other languages with similar regexp syntax (like Ruby).

my ($a, $b, $c, $d);
my $variable = $a / $b;
my $othervar = $c / $d;
@isagalaev
Copy link
Member

I think I fixed it. Can you check your code with the latest master revision?

@KamilaBorowska
Copy link
Author

Confirmed. Broken. This change any broken non-obvious regexpes, like following:

split //, 'abc';

It was working previously. I'm aware that Perl is extremally context-sensitive language, but now it's second extreme. I don't think that parsing this without extreme ammount of work is possible, but what about checking spaces around / character (usually you would put space before regexp, but not inside it :P).

@isagalaev
Copy link
Member

It's not that extremely complex, it's a matter of carefully defining after which lexems a regexp is allowed. Right now I put only a pretty general set of symbols but as your example showed we need to add some keywords too, as javascript does. For Perl, I presume, those will be at least split and return. Can you tell after which other keywords one may expect a regexp?

@KamilaBorowska
Copy link
Author

Anything with non-empty prototype, except that in this case it would be considered as boolean value of match at $_ (for example, print /abc/ is actually print($_ =~ /abc/).

But more seriously, I would also add grep to this list, as grepping by regexp sort of makes sense... But if you would want completeness, things like my @matches = reverse /\w+/g also make sense in Perl (list of all words in $_ in reverse order)... yeah... I'm aware it can be annoying... Perl is very ambiguous... and if you will include subroutine prototypes...

It's nearly impossible to highlight Perl correctly. The syntax highlighter used by Github (Pygments) is pretty nice, but totally fails at $a / $b / $c (but it fails at lot more, like s < > { } (but it's not code you should write), oh wait... your tool also fails at it... but I think it's topic for separate issue, in fact, I've checked code like this in your highlighter just to find this issue).

Also, I have noticed that this change also broken regexes after statement modifiers (all statement modifiers should be allowed) and other literal keywords like or. This also should be fixed.

Also: Just a small note, I think you forgot typeof in that list in JavaScript.

@isagalaev
Copy link
Member

Since highlighter is not a full-blown translator, it allows more freedom in handling syntax. The point to which it really makes sense to try is, basically, highlighting that doesn't break in a spectacular way. For example, if you mistake division for a regexp you have the whole rest of the line highlighted wrong, and it's ugly. But failing to highlight the regexp in some rare case is OK. So I'd be happy with just putting all those keywords you mentioned in a regexp context starter. Or you have better ideas (I saw you forked the repository)?

totally fails at $a / $b / $c (but it fails at lot more, like s < > { } (but it's not code you should write), oh wait... your tool also fails at it...

Can you be more specific? It's better to file a new issue on that.

BTW we have a discussion group, there are more people than here :-).

@isagalaev
Copy link
Member

Added the keywords you mentioned. I'm going to close the issue for now, we can fix any remaining cases as they appear in the wild. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants