Regular expression parse error - /^\_/ and /\\{/g #69

Closed
hyunjunekim opened this issue Nov 8, 2014 · 17 comments

Comments

@hyunjunekim

Hi, I have some tests for regular expressions.

case 1

var s = "abcd";
s.search(/^\_/);

SyntaxError: invalid regexp escape (line 1)
    duk_lexer.c:1578

case 2

var s = "abcd";
s.search(/\\{/g);

SyntaxError: invalid regexp quantifier (unknown char) (line 1)
    duk_lexer.c:1468

I think the '\' (backslash) in these regular expressions causes some problem.

@svaarala
Owner

svaarala commented Nov 8, 2014

The Ecmascript specification is quite strict in what escapes are allowed and required. In your case:

  • Case 1: it's not valid to escape an underscore, so the correct form would be /^_/, without the backslash.
  • Case 2: an open curly brace begins a quantifier of the form {n}, {n,} or {n,m}. If you want to match a literal backslash followed by a literal open curly brace, the brace must also be escaped: /\\\{/.
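Both corrected forms can be tried with a short snippet in any standard JavaScript engine. It uses only `String.prototype.search`. The input "abcd" from the original report contains neither character, so two extra inputs are included that do match.

```javascript
var s = "abcd";

// Case 1: match a leading underscore without escaping it.
var r1 = s.search(/^_/);         // -1: "abcd" does not start with "_"

// Case 2: match a literal backslash followed by a literal "{".
// "\\" is an escaped backslash and "\{" is an escaped brace.
var r2 = s.search(/\\\{/g);      // -1: "abcd" contains neither character

// The same patterns against inputs that do contain the characters.
var r3 = "_abc".search(/^_/);    // 0
var r4 = "x\\{y".search(/\\\{/); // 1 (the string is x, \, {, y)
```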

Most Ecmascript engines are rather loose about regexp syntax, accepting many regexps that are technically syntax errors in the specification (as far as I've been able to determine). Because this is common behavior, Duktape may need to move in that direction, but right now it follows the E5 specification quite strictly.

@hyunjunekim
Author

@svaarala
Case 1: I think that it's valid according to 2.8.7 Regular Expression Literals:

\_ => ( \ RegularExpressionNonTerminator )
_ => SourceCharacter

So

/^\_/

is possible.

@svaarala
Owner

svaarala commented Nov 8, 2014

The Ecmascript specification defines the RegExp pattern syntax in Section 15.10.1. Ultimately it comes down to IdentityEscape:

IdentityEscape ::
SourceCharacter but not IdentifierPart
<ZWJ>
<ZWNJ>

The escaped character cannot be an identifier part, which includes a-z, A-Z, underscore, etc.
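The IdentityEscape rule can be sketched as a small predicate. `isE5IdentityEscape` is a hypothetical helper, not Duktape code. It approximates IdentifierPart with ASCII letters, digits, `$` and `_` only. The real E5 production also covers Unicode letters, combining marks, digits and connector punctuation, which this sketch deliberately ignores.

```javascript
// Hypothetical helper: ASCII-only approximation of E5 IdentifierPart
// (IdentifierStart = letters, "$", "_"; plus digits). The non-ASCII
// Unicode categories of the real grammar are not handled here.
function isAsciiIdentifierPart(ch) {
  return /^[A-Za-z0-9$_]$/.test(ch);
}

// E5 15.10.1: IdentityEscape :: SourceCharacter but not IdentifierPart,
// plus <ZWJ> (U+200D) and <ZWNJ> (U+200C), which are re-admitted explicitly.
function isE5IdentityEscape(ch) {
  if (ch === "\u200D" || ch === "\u200C") {
    return true;
  }
  return !isAsciiIdentifierPart(ch);
}

isE5IdentityEscape("_"); // false: "\_" is a syntax error under strict E5
isE5IdentityEscape("{"); // true: "\{" is a valid identity escape
isE5IdentityEscape("["); // true
```

Letters such as `d` or `n` are still legal after a backslash through other productions (CharacterClassEscape, ControlEscape). The predicate only answers whether the IdentityEscape production applies.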

@hyunjunekim
Author

IdentifierPart ::
IdentifierStart
UnicodeCombiningMark
UnicodeDigit
UnicodeConnectorPunctuation
<ZWNJ>
<ZWJ>

Since '_' is an IdentifierStart character, is Case 1 invalid?
And what do you mean by the "realworld" label?

@svaarala
Owner

svaarala commented Nov 8, 2014

Yes, IdentifierPart includes all IdentifierStart characters, e.g. underscore.

The "realworld" label means that the desired behavior is not compliant with the specification but is expected by users.

@hyunjunekim
Author

@svaarala Thank you. :)

@svaarala
Owner

svaarala commented Nov 8, 2014

I might be wrong in my interpretation of the RegExp syntax in E5 - but so far I haven't been informed otherwise :)

@hyunjunekim
Author

@svaarala
In Chrome (V8), case 1 and case 2 both work fine.
Is there a difference between Ecmascript Regex and Javascript Regex?

@hyunjunekim
Author

@svaarala
Special characters ('_', '[', ']', etc.) need a '\' (backslash).

@svaarala
Owner

svaarala commented Nov 8, 2014

No; as far as I understand, V8 and other engines don't comply strictly with the E5 specification: they accept a wider regexp syntax.
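Later editions of the specification formalise this split. From ES2015 on, the looser syntax that engines like V8 accept is standardised in Annex B (additional features for web browsers) for patterns without the `u` flag. Patterns with the `u` flag use a strict grammar that rejects both of the original patterns. The sketch below assumes a current ES2015+ engine such as Node.js. `accepts` is a hypothetical helper.

```javascript
// Hypothetical helper: true if the engine accepts the pattern source.
function accepts(source, flags) {
  try {
    new RegExp(source, flags);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) {
      return false;
    }
    throw e;
  }
}

// Pattern sources are string literals: "^\\_" is the regexp /^\_/,
// and "\\\\{" is the regexp /\\{/.
var loose1 = accepts("^\\_", "");    // true: Annex B allows "\_"
var loose2 = accepts("\\\\{", "");   // true: Annex B treats a lone "{" as a literal
var strict1 = accepts("^\\_", "u");  // false: "\_" is not a valid escape with "u"
var strict2 = accepts("\\\\{", "u"); // false: "{" must be escaped with "u"
```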

@sva-p
Contributor

sva-p commented Nov 8, 2014

Underscore ('_') is not a special character and should not require escaping.
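The characters that E5 treats as special outside a character class are exactly those excluded from PatternCharacter in Section 15.10.1. Every other character, including '_', matches itself. A minimal sketch with a hypothetical helper name:

```javascript
// E5 15.10.1: PatternCharacter :: SourceCharacter but not one of
//   ^ $ \ . * + ? ( ) [ ] { } |
// These characters cannot appear bare outside a character class;
// everything else matches itself.
var E5_PATTERN_SYNTAX = "^$\\.*+?()[]{}|";

// Hypothetical helper: is `ch` one of the E5 pattern syntax characters?
function isE5PatternSyntaxChar(ch) {
  return ch.length === 1 && E5_PATTERN_SYNTAX.indexOf(ch) !== -1;
}

isE5PatternSyntaxChar("_"); // false: "_" matches itself, no escape needed
isE5PatternSyntaxChar("["); // true
isE5PatternSyntaxChar("{"); // true
```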

@hyunjunekim
Author

Sorry, it works fine for '/a[0]/'.

@svaarala
Owner

svaarala commented Nov 8, 2014

You were missing the backslash before the closing bracket: use /a\[0\]/ if you want to match the brackets literally.
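The difference between the two forms shows up in what each one matches. Unescaped brackets form a character class, while escaped brackets match the bracket characters themselves:

```javascript
var classPattern = /a[0]/;     // "[0]" is a character class: "a" then "0"
var literalPattern = /a\[0\]/; // escaped brackets: matches the text "a[0]"

var m1 = classPattern.test("a0");     // true
var m2 = classPattern.test("a[0]");   // false: "a" is followed by "[", not "0"
var m3 = literalPattern.test("a[0]"); // true
var m4 = literalPattern.test("a0");   // false
```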

@hyunjunekim
Author

@svaarala
I learned a lot from you. Thank you. :)

@svaarala
Owner

svaarala commented Nov 8, 2014

No problem :)

Anyway, as I said, it's somewhat confusing that Ecmascript engines vary between what they accept and reject for regexps.

@hyunjunekim
Author

@svaarala
I understand that Duktape is stricter about regexps than other engines. :)

@svaarala
Owner

Since there's nothing to fix in this issue, I'll close this one. I added #74 to track all the known regexp "real world" issues so that there's a clear idea of what cases will at least need fixing.

@svaarala changed the title from "regular expression error in the duktape." to "Regular expression parse error - /^\_/ and /\\{/g" on Nov 13, 2014

3 participants