Use XRegExp in order to support free-spacing mode and named captures #591

Open
wants to merge 8 commits into
base: master

wmnnd
Contributor

@wmnnd wmnnd commented Sep 15, 2014

This is my attempt to have Opal use XRegExp for all Regexps.
Please note that I have not included XRegExp in the Runtime because I was not sure about how that should be done.
This implementation works with the current development version of XRegExp 3.0. https://raw.githubusercontent.com/slevithan/xregexp/master/src/xregexp.js

In addition, I added the missing #options and #casefold? methods.

The new Regexp constructor also supports passing flags when creating a Regexp, which was previously not supported.
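
A minimal sketch of the Ruby behaviour being targeted, shown in plain Ruby (MRI) rather than Opal. The pattern and the input string are made up for illustration; free-spacing mode is requested through Regexp::EXTENDED, and #options and #casefold? report the flags back.

```ruby
# Illustrative pattern only: extended mode ignores the spaces inside it.
re = Regexp.new('(?<year> \d{4} ) - (?<month> \d{2} )',
                Regexp::IGNORECASE | Regexp::EXTENDED)

m = re.match('2014-09')
puts m[:year]    # named capture "year"
puts m[:month]   # named capture "month"

# #options returns the flags as a bit mask; #casefold? checks IGNORECASE.
extended = (re.options & Regexp::EXTENDED) != 0
puts extended
puts re.casefold?
```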

@wmnnd
Contributor Author

wmnnd commented Nov 24, 2014

@elia Did you have a chance to take a look at this? I'm not quite sure why the Travis CI build is failing (and only with jruby and 2.1.1). Any ideas?

@elia
Member

elia commented Nov 24, 2014

Seems that some newly supported syntax is blocked at compiler level:

https://travis-ci.org/opal/opal/jobs/35417114#L173

Using bundle exec rake mspec_node should give you a more meaningful stack trace to identify the offending line.

@vendethiel
Contributor

Maybe this could be made an option? Just a guess, but this is probably far slower than native regexps.

@elia
Member

elia commented Nov 24, 2014

If I understood correctly they're compiled down to regular regexp in most cases, so it shouldn't be an issue, otherwise I agree on making it optional.

@elia elia added this to the 0.8 milestone Nov 24, 2014
@adambeynon
Contributor

The concern is that they have to be parsed and compiled every time they are touched at runtime. An alternative solution here would be to have them converted at compile time. Obviously we can only do this with liberals. We would have to use the current approach for Regexp.new
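
One way to picture a compile-time conversion: a rough sketch, in plain Ruby, of a hypothetical helper (strip_free_spacing is not part of Opal or XRegExp) that rewrites a free-spacing pattern source into a compact one without whitespace and comments. It assumes no nested character classes and no (?#...) comment groups, and it does not touch named captures.

```ruby
# Hypothetical helper: drop free-spacing whitespace and "#" comments from a
# regexp source, leaving escapes and character classes untouched.
def strip_free_spacing(source)
  out = +''
  in_class = false
  i = 0
  while i < source.length
    ch = source[i]
    if ch == '\\'
      out << source[i, 2]          # keep the escape and the escaped character
      i += 2
      next
    end
    if in_class
      in_class = false if ch == ']'
      out << ch                    # whitespace and "#" are literal in a class
    elsif ch == '['
      in_class = true
      out << ch
    elsif ch == '#'
      # Skip the comment up to (not including) the end of the line.
      i += 1 while i < source.length && source[i] != "\n"
      next
    elsif !ch.match?(/\s/)
      out << ch
    end
    i += 1
  end
  out
end

src = "(\\d{4})  # year\n-\n(\\d{2})  # month"
compact = strip_free_spacing(src)
puts compact
```

A conversion like this would run once per literal at build time; patterns passed to Regexp.new would still need a runtime path.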

@adambeynon
Contributor

Literals*. I think we can leave liberals as they are :)

@wmnnd
Contributor Author

wmnnd commented Nov 24, 2014

According to my tests, the performance of the previous implementation is (mysteriously) even a little worse than when using XRegExp.
However, the real performance gap is between using Opal and plain JavaScript, regardless of XRegExp. The only thing you might want to consider when using XRegExp is that compiling takes a tiny bit longer.
Of course, neither should usually be a problem unless you are processing large amounts of data. A simple Regexp such as /a.*g/ can be run 10,000 times in much less than a second.
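
A measurement of this kind could be sketched as follows. This runs in plain Ruby (MRI) with the standard benchmark library, so it does not reproduce the Opal or XRegExp numbers discussed here; the subject string is an assumption.

```ruby
require 'benchmark'

pattern    = /a.*g/
subject    = 'abcdefg'   # made-up input
iterations = 10_000

# Time how long it takes to run the same match 10,000 times.
elapsed = Benchmark.realtime do
  iterations.times { pattern.match?(subject) }
end

puts format('%d matches in %.4f s', iterations, elapsed)
```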

@elia elia modified the milestones: 0.8, 1.0 Jun 14, 2015
@elia elia mentioned this pull request Oct 3, 2015
@mojavelinux
Contributor

I'm curious where we are with the decision to integrate with XRegExp or an alternative (perhaps something in Opal itself). This is going to become more and more of a problem as people start using the more advanced regular expression constructs, like \p{Word} in Ruby 2. And that's important, because it allows text processing applications (like Asciidoctor) to have universal language support. If we match with \w instead, the regular expression only works with languages that are based on basic Latin.
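
The difference is easy to see in plain Ruby (MRI). A small sketch with a made-up sample word:

```ruby
word = "caf\u00e9"   # "café", written with an escape to avoid encoding issues

ascii_only = /\A\w+\z/.match?(word)
unicode    = /\A\p{Word}+\z/.match?(word)

puts ascii_only  # false: \w stops at the accented letter
puts unicode     # true: \p{Word} covers Unicode letters
```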

I like the idea of using XRegExp as a preprocessor so that it writes the expanded regular expression into the transpiled source. I tend to put all regular expressions in constants, so for my purposes, I'd be totally fine with that only happening for regular expressions defined in constants.

Another possibility is to simply provide a bridge with XRegExp so that if you choose to load it, you can define your regular expressions that way (using a Ruby type that maps to it). Then I can put Opal preprocessor conditionals in my source to prepare the regular expression differently based on the environment.

@JacobEvelyn

I'm also pretty interested in a way to get XRegExp into Opal. I'm brand new to the Opal codebase but happy to help if I can.

@JacobEvelyn

(And for what it's worth, re: @mojavelinux's points above, I've got some advanced regexes that are defined at runtime so I'd vote for a solution that's not compile-time-only.)

JacobEvelyn added a commit to JacobEvelyn/friends that referenced this pull request Jan 10, 2016
Note: The slight differences between how the JavaScript code
behaves and how the Ruby code behaves are all due to regex
differences. Once Opal supports XRegExp, it will have the
same behavior. See:

opal/opal#591

To see how this code works, take a look at the ./opaltest.rb
and ./lib/friends.rb files in particular. Run the JavaScript
version (with some test code in ./lib/friends.rb) via:

ruby opaltest.rb && node friends-opal.js

Note that this setup is fairly janky. There are surely cleaner/
more standard ways of using Opal; this was just an attempt to
produce *some* code to test with.
boblail added a commit to boblail/opal that referenced this pull request May 6, 2016
boblail added a commit to boblail/opal that referenced this pull request May 6, 2016
boblail added a commit to boblail/opal that referenced this pull request May 6, 2016
@boblail

boblail commented May 6, 2016

Hello!

I was just working on this.

It looks like the mspec tests won't run because an exception is raised by these two lines in literal.rb:

unsupported = /[^imx]/.match flags
raise SyntaxError, "unknown regexp flag '#{unsupported[0]}' in /#{value}/#{flags}" if unsupported

when they try to compile this regular expression: /[0-9]/g.

Adding g to unsupported = /[^imx]/.match flags gets us past that error, but I'm not sure if that's correct, since g is a valid regexp flag in JavaScript but not in Ruby.
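
The check can be reproduced outside the compiler. The sketch below wraps the two lines in a hypothetical helper (check_regexp_flags and its allowed parameter are illustrative, not Opal's API) to show why a g flag trips it and what widening the character class does.

```ruby
# Hypothetical wrapper around the two lines from literal.rb.
# `allowed` lists the flag characters that are accepted.
def check_regexp_flags(value, flags, allowed = 'imx')
  unsupported = /[^#{Regexp.escape(allowed)}]/.match(flags)
  raise SyntaxError, "unknown regexp flag '#{unsupported[0]}' in /#{value}/#{flags}" if unsupported
  flags
end

begin
  check_regexp_flags('[0-9]', 'g')            # default "imx": g is rejected
rescue SyntaxError => e
  puts e.message
end

puts check_regexp_flags('[0-9]', 'g', 'imxg') # widened set: g passes
```

Whether g should be accepted at all is the open question; the sketch only shows the mechanics.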


After fixing that and dropping XRegExp's source into runtime.js 😬, I hit an error where XRegExp didn't recognize the escape character \Z. I defined that:

XRegExp.addToken(
    /\\Z/,
    function(match, scope, flags) {
      return '$';
    }
);

(This ought to be right... It's $ that should be redefined to match how Ruby works...)
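
For comparison, this is how the anchors behave in plain Ruby (MRI), which is the behaviour a translation would need to reproduce. JavaScript's $ without the m flag matches only at the very end of the input, so mapping \Z straight to $ would miss the "before a trailing newline" case shown here. The sample string is made up.

```ruby
text = "foo\nbar\n"

dollar_mid  = text =~ /foo$/   # 0:   $ matches before any newline
big_z_mid   = text =~ /foo\Z/  # nil: \Z only applies at the end of the string
big_z_end   = text =~ /bar\Z/  # 4:   \Z also matches before a final newline
small_z_end = text =~ /bar\z/  # nil: \z requires the very end of the string

p dollar_mid, big_z_mid, big_z_end, small_z_end
```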


The next error I get is:

Exception: An error occurred while compiling: "foo = 42 if (Test)"
other.$=~ is not a function
    at Opal.defs.TMP_1 [as $new] (/private/var/folders/7b/7g1jyn3d09l17gl_mxnrmg8m0009t9/T/opal-nodejs-runner-20160506-72469-1rm4tcr:6813:15)
    at ːexception (/private/var/folders/7b/7g1jyn3d09l17gl_mxnrmg8m0009t9/T/opal-nodejs-runner-20160506-72469-1rm4tcr:6831:31)
    at $Compiler.ːraise (/private/var/folders/7b/7g1jyn3d09l17gl_mxnrmg8m0009t9/T/opal-nodejs-runner-20160506-72469-1rm4tcr:6558:31)
    at $Compiler.ːcompile (/private/var/folders/7b/7g1jyn3d09l17gl_mxnrmg8m0009t9/T/opal-nodejs-runner-20160506-72469-1rm4tcr:38047:27)
    at Object_alloc.ːexpect_compiled (/private/var/folders/7b/7g1jyn3d09l17gl_mxnrmg8m0009t9/T/opal-nodejs-runner-20160506-72469-1rm4tcr:79720:113)
    at Object_alloc.$aa.$$p.TMP_70 (/private/var/folders/7b/7g1jyn3d09l17gl_mxnrmg8m0009t9/T/opal-nodejs-runner-20160506-72469-1rm4tcr:79710:23)
    at Object_alloc.ːinstance_eval (/private/var/folders/7b/7g1jyn3d09l17gl_mxnrmg8m0009t9/T/opal-nodejs-runner-20160506-72469-1rm4tcr:5200:24)
    at module_constructor.ːprotect (/private/var/folders/7b/7g1jyn3d09l17gl_mxnrmg8m0009t9/T/opal-nodejs-runner-20160506-72469-1rm4tcr:40238:78)
    at $a.$$p.TMP_29 (/private/var/folders/7b/7g1jyn3d09l17gl_mxnrmg8m0009t9/T/opal-nodejs-runner-20160506-72469-1rm4tcr:39823:88)
    at Opal.yieldX (/private/var/folders/7b/7g1jyn3d09l17gl_mxnrmg8m0009t9/T/opal-nodejs-runner-20160506-72469-1rm4tcr:1215:18)

My guess is that the exception is coming from String#=~

But I don't really understand how we got there. Any help?

cf. boblail@982e530

@boblail

boblail commented May 6, 2016

(whoops! didn't realize my commit name would create so many mentions!)

@h4ck3rm1k3
Contributor

I am missing the option to add var XRegExp = require('xregexp'); to the code; I will add that in for Node.

@mojavelinux
Contributor

mojavelinux commented Mar 21, 2017

For what it's worth, I'll explain the approach we took in Asciidoctor / Asciidoctor.js.

We define constants for regexp character groups and classes, then use those constants to build our regular expressions. We assign the compiled expressions to constants. In other words, we've abstracted away the creation of the regular expressions so we can redefine them in the Opal environment.
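
A minimal sketch of that pattern, with hypothetical names (CharClasses, WORD and NAME_RX are not taken from the Asciidoctor source) and an assumed, deliberately incomplete list of ranges for the Opal branch:

```ruby
module CharClasses
  # On Opal the class is spelled out as explicit ranges (assumed list);
  # elsewhere Ruby's Unicode property is used.
  WORD = if RUBY_ENGINE == 'opal'
           'a-zA-Z0-9_\u00C0-\u024F'
         else
           '\p{Word}'
         end
end

# The expression is built once from the constant and stored in a constant.
NAME_RX = /\A[#{CharClasses::WORD}]+\z/

puts NAME_RX.match?("caf\u00e9")
puts NAME_RX.match?('two words')
```

Only the constant definitions differ per environment; every expression built from them stays the same.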

Now that we've done that, I'd say Opal should have a flag to control (either at compile or runtime, I'm not sure) whether XRegExp is used.

I can say with confidence that the extended groups in the regular expression (matching accented letters, for example) are not measurably slower than the ASCII ones (e.g., [A-Z]). That's probably because all the work of running the regular expression happens in compiled C code.

I'm sure it takes extra time to compile the extended groups (e.g., \p{L}). That's why I would prefer to do it at compile time. But I understand there are cases when you need to make a dynamic regular expression...so access to XRegExp is important. However, you could move that logic into a method that you override in the Opal environment and then use XRegExp yourself.

To clarify, I'm not making a case for or against here. I'm just documenting the strategy we took.

@ronaldtse

Interestingly, we've just run into this issue in interscript/interscript-js#10. We're going to take the Opal-specific override approach, but if this PR ever gets merged we'd love to use it.

@elia elia removed this from the v1.1 milestone Jan 4, 2021
@hmdne hmdne added the regexp label Jul 5, 2021
@hmdne
Member

hmdne commented Nov 3, 2021

One thing to note: Opal 1.3 supports named captures in web browsers that support them, which means every modern browser.

hmdne pushed a commit to hmdne/opal that referenced this pull request Jan 27, 2024

10 participants