-
-
Notifications
You must be signed in to change notification settings - Fork 330
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use XRegExp in order to support free-spacing mode and named captures #591
base: master
Are you sure you want to change the base?
Conversation
aa10bdf
to
22e3f77
Compare
@elia Did you have a chance to take a look at this? I'm not quite sure why the Travis CI build is failing (and only with jruby and 2.1.1). Any ideas? |
Seems that some newly supported syntax is blocked at compiler level: https://travis-ci.org/opal/opal/jobs/35417114#L173 using |
Maybe this could be made an option? Just a guess, but this is probably far slower than native regexps. |
If I understood correctly they're compiled down to regular regexp in most cases, so it shouldn't be an issue, otherwise I agree on making it optional. |
The concern is that they have to be parsed and compiled every time they are touched at runtime. An alternative solution here would be to have them converted at compile time. Obviously we can only do this with liberals. We would have to use the current approach for |
Literals*. I think we can leave liberals as they are :) |
According to my tests, the performance of the previous implementation is (mysteriously) even a little worse than when using XRegExp. |
I'm curious where we are with the decision to integrate with XRegExp or an alternative (perhaps something in Opal itself). This is going to become more and more of a problem as people start using the more advance regular expression constructions, like I like the idea of using XRegExp as a preprocessor so that it writes the expanded regular expression into the transpile source. I tend to put all regular expressions in constants, so for my purposes, I'd be totally fine with that only happening for regular expressions defined in constants. Another possibility is to simply provide a bridge with XRegExp so that if you choose to load it, you can define your regular expressions that way (using a Ruby type that maps to it). Then I can put Opal preprocessor conditionals in my source to prepare the regular expression differently based on the environment. |
I'm also pretty interested in a way to get XRegExp into Opal. I'm brand new to the Opal codebase but happy to help if I can. |
(And for what it's worth, re: @mojavelinux's points above, I've got some advanced regexes that are defined at runtime so I'd vote for a solution that's not compile-time-only.) |
Note: The slight differences between how the JavaScript code behaves and how the Ruby code behaves are all due to regex differences. Once Opal supports XRegExp, it will have the same behavior. See: opal/opal#591 To see how this code works, take a look at the ./opaltest.rb and ./lib/friends.rb files in particular. Run the JavaScript version (with some test code in ./lib/friends.rb) via: ruby opaltest.rb && node friends-opal.js Note that this setup is fairly janky. There are surely cleaner/ more standard ways of using Opal; this was just an attempt to produce *some* code to test with.
Hello! I was just working on this. It looks like the mspec tests won't run because an exception is raised by these two lines in unsupported = /[^imx]/.match flags
raise SyntaxError, "unknown regexp flag '#{unsupported[0]}' in /#{value}/#{flags}" if unsupported when they try to compile this regular expression: Adding After fixing that and dropping XRegExp.addToken(
/\\Z/,
function(match, scope, flags) {
return '$';
}
); (This ought to be right... It's The next error I get is:
My guess is that the exception is coming from But I don't really understand how we got there. Any help? c.f. boblail@982e530 |
(whoops! didn't realize my commit name would create so many mentions!) |
I am missing the option to all in the var XRegExp = require('xregexp'); to the code, will add that in for node. |
For what it's worth, I'll explain the approach we took in Asciidoctor / Asciidoctor.js. We define constants for regexp character groups and classes, then use those constants to build our regular expressions. We assign the compiled expressions to constants. In other words, we've abstracted away the creation of the regular expressions so we can redefine them in the Opal environment. Now that we've done that, I'd say Opal should have a flag to control (either at compile or runtime, I'm not sure) whether XRegExp is used. I can say with confidence that the extended groups in the regular expression (matching accented letters, for example) are not measurably slower than the ASCII ones (e.g., I'm sure it takes extra time to compile the extended groups (e.g., To clarify, I'm not making a case for or against here. I'm just documenting the strategy we took. |
Interestingly we've just run into this issue in interscript/interscript-js#10 . We're going to take the Opal-specific override approach but if this PR ever gets merged we'd love to use it. |
To note a thing - Opal 1.3 supports named captures in web browsers that support it - which means everything modern. |
This is my attempt to have Opal use XRegExp for all Regexps.
Please note that I have not included XRegExp in the Runtime because I was not sure about how that should be done.
This implementation works with the current development version of XRegExp 3.0. https://raw.githubusercontent.com/slevithan/xregexp/master/src/xregexp.js
In addition I also added the missing #options and #casefold? functions.
The new constructor for Regexp also provides support for passing flags when creating a Regexp which was previously not supported.