The .NET regular expression engine's capturing behavior is not the same as the ECMAScript standard. #24
Does using RegexOptions.ECMAScript help? http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regexoptions(v=VS.100).aspx
@hakanson: We are already using the ECMAScript option, which works well for the most part. It is just this little piece that is different.
I think this is something we'll have to live with for now; writing a custom regular expression implementation for this small detail is currently too much work for too little gain. I'll leave the ticket open, and we'll look into it eventually.
-1 for me for not looking at the code in Core.fs.
I'm new to F#; does this mean you are implementing your own compiled RegExp cache? I ask because there is a Regex.CacheSize property that controls an internal cache of compiled regular expressions. I assume having your own cache gave you more control, but thought I would mention it for completeness (at the risk of looking uninformed a second time on the same issue). http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.cachesize.aspx
Yes, we do maintain our own regexp cache; we actually found it to be faster.
We found that in a loop like this...
...that .NET's regex cache was not helping. When we implemented the regexp cache shown above, we saw a 50% reduction in run time on the SunSpider regexp test.
@otac0n - From the looks of it, the BCL only caches regexes used through the static methods of the Regex class, so the performance increase makes sense.
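A minimal sketch of such a cache, written in TypeScript and assuming compiled regexes are keyed by pattern source plus flags. The RegexCache class and its compile callback are hypothetical illustrations, not IronJS code; compile stands in for whatever expensive step builds the engine-level regex. Only the compiled, stateless part should be shared this way; per-object state such as lastIndex has to stay on each JavaScript RegExp object.

```typescript
// Hypothetical regexp cache keyed by flags and pattern source. compile stands
// in for the expensive step that builds an engine-level regex; it is not an
// IronJS API.
class RegexCache<T> {
  private readonly entries = new Map<string, T>();
  private readonly compile: (pattern: string, flags: string) => T;

  constructor(compile: (pattern: string, flags: string) => T) {
    this.compile = compile;
  }

  get(pattern: string, flags: string): T {
    // Flags contain only letters, so "flags/pattern" keys cannot collide.
    const key = `${flags}/${pattern}`;
    let entry = this.entries.get(key);
    if (entry === undefined) {
      entry = this.compile(pattern, flags);
      this.entries.set(key, entry);
    }
    return entry;
  }
}

// Usage: a loop that evaluates the same pattern many times compiles it once.
let compileCount = 0;
const cache = new RegexCache((pattern, flags) => {
  compileCount++;
  return new RegExp(pattern, flags);
});
for (let i = 0; i < 1000; i++) {
  cache.get("[a-z]+\\d", "i").test("abc1");
}
console.log(compileCount); // 1
```

Keying on flags as well as the source matters because the same pattern with different flags must compile to different regex objects.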
For regular expressions such as this:
((a+)?(b+)?c+)*
There are 3 capturing groups (one for each left-parenthesis).
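One way to double-check the group count in a JavaScript engine, sketched in TypeScript (Node.js assumed; countCapturingGroups is a hypothetical helper): appending an empty alternative lets the pattern match the empty string, so exec always succeeds and its result holds the whole match plus one slot per capturing group. Non-capturing groups written (?:...) are not counted, so "one per left parenthesis" holds for patterns like this one that use only plain capturing parentheses.

```typescript
// Let the engine count the groups: "source|" always matches "", and the
// exec result array has one entry for the whole match plus one per group.
function countCapturingGroups(source: string): number {
  const match = new RegExp(`${source}|`).exec("");
  return match === null ? 0 : match.length - 1;
}

console.log(countCapturingGroups("((a+)?(b+)?c+)*")); // 3
console.log(countCapturingGroups("(?:a+)(b)")); // 1: (?:...) does not capture
```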
If this is matched against a string like the following:
bbbccaac
The .NET implementation reports the following captures:
((a+)?(b+)?c+) = "aac"
(a+) = "aa"
(b+) = "bbb"
The ECMAScript spec, however, specifies the following:
((a+)?(b+)?c+) = "aac"
(a+) = "aa"
(b+) = undefined
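For comparison, running the same pattern in a spec-compliant JavaScript engine (Node.js is assumed here) gives the ECMAScript result directly, because each iteration of the outer * resets the captures of the groups inside it before trying to match again:

```typescript
// Run the issue's pattern in an ECMAScript engine.
const pattern = /((a+)?(b+)?c+)*/;
const match = pattern.exec("bbbccaac");

if (match !== null) {
  // match[0] is the whole match; match[1] to match[3] are the capturing groups.
  console.log(match[1]); // aac
  console.log(match[2]); // aa
  console.log(match[3]); // undefined: (b+) took no part in the last iteration
}
```

The first iteration matches "bbbcc" (setting (b+) to "bbb"), and the second matches "aac"; since (b+) does not match in the second iteration, its earlier value is discarded.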
The .NET implementation gives no direct indication that the (b+) capturing group did not participate in the last iteration of the outer group; it simply keeps "bbb" from the first iteration.
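One possible workaround, sketched in TypeScript rather than F# and entirely hypothetical (Span, toEcmaCaptures, and the iterationGroup table are illustrative names, not IronJS or .NET APIs): .NET does expose where each group's last capture sits (Group.Index and Group.Length), so a capture lying outside the span of the enclosing group's last iteration must be left over from an earlier iteration and can be reported as undefined. The sketch assumes the quantified part is itself a capturing group, and it does not handle edge cases such as an empty capture sitting exactly on an iteration boundary or nested quantifiers.

```typescript
// Hypothetical view of what a "last capture wins" engine such as .NET reports
// for each group: the span of its most recent successful capture, or null if
// the group never matched at all.
interface Span {
  index: number;
  length: number;
}

// Sketch: derive ECMAScript-style captures from last-capture spans.
// iterationGroup[g] is the quantified capturing group whose iterations enclose
// group g, or 0 if g is not inside a quantified group (a hypothetical input
// that a real implementation would have to compute from the pattern).
function toEcmaCaptures(
  input: string,
  spans: (Span | null)[],
  iterationGroup: number[],
): (string | undefined)[] {
  return spans.map((span, g) => {
    if (span === null) return undefined;
    const outer = iterationGroup[g];
    if (outer !== 0) {
      const last = spans[outer];
      // A capture is current only if it lies inside the last iteration's span;
      // otherwise it is stale, left over from an earlier iteration.
      const current =
        last != null &&
        span.index >= last.index &&
        span.index + span.length <= last.index + last.length;
      if (!current) return undefined;
    }
    return input.slice(span.index, span.index + span.length);
  });
}

// Spans as .NET reports them for ((a+)?(b+)?c+)* against "bbbccaac".
const input = "bbbccaac";
const spans: (Span | null)[] = [
  { index: 0, length: 8 }, // group 0: whole match "bbbccaac"
  { index: 5, length: 3 }, // group 1: "aac"
  { index: 5, length: 2 }, // group 2: "aa"
  { index: 0, length: 3 }, // group 3: "bbb", stale from the first iteration
];
const ecma = toEcmaCaptures(input, spans, [0, 0, 1, 1]);
console.log(ecma); // [ 'bbbccaac', 'aac', 'aa', undefined ]
```

Group 3's last capture ends at index 3, before the last iteration of group 1 starts at index 5, so it is reported as undefined, matching the ECMAScript result.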