Slashes in character classes in RegExp literals #584

Closed
satyr opened this issue Aug 6, 2010 · 9 comments

Comments

@satyr
Collaborator

satyr commented Aug 6, 2010

Found some quirks with them:

coffee> /[/]/
Error: In repl, too many ] on line 1
...
coffee> /#{'/'}/
Error: In repl, SyntaxError: Unterminated ' starting on line 1
...
coffee> /#{'\\\\'}/
SyntaxError: Invalid regular expression: /\\\/: \ at end of pattern
...
coffee> /works#{/in #{'Ruby, not'} in/}Coffee/
Error: In repl, Parse error on line 1: Unexpected '/'
...

js> /[/]/
/[/]/
js> RegExp('/')
/\//
js> RegExp('\\\\')
/\\/

irb(main):001:0> /works#{/in #{'Ruby, not'} in/}Coffee/
=> /works(?-mix:in Ruby, not in)Coffee/

@StanAngeloff
Contributor

Escape the inner / with \, i.e. /[\/]/. I wouldn't consider this a bug; it doesn't seem like a valid regexp to me (why it's allowed in JS is beyond me).

@weepy

weepy commented Aug 6, 2010

It's allowed because a / inside square brackets (a character class) doesn't need to be escaped.
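
As a minimal sketch, assuming a standard JavaScript engine such as Node.js, the escaped and unescaped forms can be compared directly. The variable names here are illustrative only.

```javascript
// A slash inside a character class, written without and with a backslash.
const unescapedSlash = /[/]/;
const escapedSlash = /[\/]/;

// Both patterns should accept any text containing a slash...
console.log(unescapedSlash.test('a/b')); // true
console.log(escapedSlash.test('a/b'));   // true

// ...and reject text without one.
console.log(unescapedSlash.test('ab'));  // false
console.log(escapedSlash.test('ab'));    // false
```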

@michaelficarra
Collaborator

I'm with weepy: the examples given by satyr should be valid. This looks like a bug to me.

@jashkenas
Owner

Anyone got neat ideas for how to parse this in the lexer, without resorting to a full-blown regex parser?

@michaelficarra
Collaborator

You mean you don't want to have to write a regex-matching regex for the lexer? Sounds fun to me, but could potentially be extremely complicated. Anyway, I don't see how it could be avoided.
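
For comparison, one possible alternative to a regex-matching regex is a small hand-written scan that only tracks escapes and character classes. The sketch below is written in plain JavaScript and is not taken from the CoffeeScript lexer; the function name `scanRegexBody` is hypothetical, and interpolation and flags are deliberately ignored.

```javascript
// Hypothetical helper: given source text and the index of an opening '/',
// return the index just past the closing '/', or -1 if the literal is not
// terminated on the same line.
function scanRegexBody(source, start) {
  let inClass = false; // true while between '[' and the matching ']'
  for (let i = start + 1; i < source.length; i++) {
    const ch = source[i];
    if (ch === '\n') return -1;  // this sketch does not span lines
    if (ch === '\\') {           // skip the escaped character
      i++;
      continue;
    }
    if (ch === '[') inClass = true;
    else if (ch === ']') inClass = false;
    else if (ch === '/' && !inClass) return i + 1; // unescaped '/' outside a class
  }
  return -1;
}

console.log(scanRegexBody('/[/]/', 0));    // 5
console.log(scanRegexBody('/a\\/b/', 0));  // 6
console.log(scanRegexBody('/abc', 0));     // -1
```

The scan needs only one boolean of state, which is why it avoids a full regex parser; it does not validate the pattern itself.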

@fab13n

fab13n commented Aug 25, 2010

Given how messy regexps already are to tokenize, shouldn't you consider an alternative, cleaner regexp syntax? Something that shares most of its lexing code with string literals?

@michaelficarra
Collaborator

I just wrote the following regular expression to match regular expressions:

/\/(\\\/|\[([^\]]|\\\])*\]|[^\/])+\/[img]*/

Anyone have a valid regex that can break it or an invalid regex that it matches?

edit: Crap, I completely forgot about the regex interpolation... that makes this a much more difficult problem.
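
One way to probe the pattern above is to anchor it and feed it candidate literals as strings. This is a rough sketch in plain JavaScript; `looksLikeRegexLiteral` is a hypothetical helper name, and the samples are chosen for illustration.

```javascript
// The pattern from the comment above, copied verbatim.
const regexLiteral = /\/(\\\/|\[([^\]]|\\\])*\]|[^\/])+\/[img]*/;

// Anchor it so the whole candidate text must match, not just a substring.
const anchored = new RegExp('^(?:' + regexLiteral.source + ')$');

// Hypothetical helper: does the text look like a complete regex literal?
function looksLikeRegexLiteral(text) {
  return anchored.test(text);
}

console.log(looksLikeRegexLiteral('/[/]/')); // true
console.log(looksLikeRegexLiteral('/ab/g')); // true
console.log(looksLikeRegexLiteral('//'));    // false

// The text /a\/ is accepted, although as JavaScript source it would be an
// unterminated literal: the \/ escapes the final slash. The [^\/] branch
// lets a lone backslash through.
console.log(looksLikeRegexLiteral('/a\\/')); // true
```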

@danielribeiro

Regular expression strings are not actually regular. For instance, they include the language of matched parentheses, which requires a pushdown (stack) automaton. The pumping lemma suffices to show this easily.

On the other hand, their grammar is quite simple. This SO question shows it.
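
The nesting argument can be made concrete with a depth counter, which is the kind of unbounded state a single regular expression cannot keep. This is a minimal sketch in plain JavaScript under simplifying assumptions (only escapes, character classes, and parentheses are considered); `hasBalancedGroups` is a hypothetical name.

```javascript
// Hypothetical helper: check that the parentheses in a regex pattern are
// balanced, ignoring escaped characters and anything inside a character class.
function hasBalancedGroups(pattern) {
  let depth = 0;       // number of currently open groups
  let inClass = false; // true while between '[' and ']'
  for (let i = 0; i < pattern.length; i++) {
    const ch = pattern[i];
    if (ch === '\\') { // skip the escaped character
      i++;
      continue;
    }
    if (inClass) {
      if (ch === ']') inClass = false;
      continue;
    }
    if (ch === '[') inClass = true;
    else if (ch === '(') depth++;
    else if (ch === ')') {
      if (depth === 0) return false; // closing paren with nothing open
      depth--;
    }
  }
  // Also reject a character class that was never closed.
  return depth === 0 && !inClass;
}

console.log(hasBalancedGroups('(a(b))')); // true
console.log(hasBalancedGroups('(a'));     // false
console.log(hasBalancedGroups('[(]'));    // true
```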

@jashkenas
Owner

I've merged satyr's "heregex" branch to master, which fixes this issue (as well as adding heregexes). Closing the ticket.

This issue was closed.

7 participants