Ruby -> JavaScript regexp support #6
I'm interested to see any work you do towards this. But you should know up front that it's not possible to translate all Ruby/Oniguruma features to JS using XRegExp. Shorthand classes like Oniguruma's The only thing you mentioned that isn't possible is Ruby's You could make it work whenever JS's

```javascript
// Ruby's \A always anchors at the start of the string. JavaScript's ^ does
// the same only when flag /m is off, so refuse to translate \A under /m.
XRegExp.addToken(
    /\\A/,
    function () {
        if (this.hasFlag('m')) {
            throw new SyntaxError('cannot use \\A with flag /m');
        }
        return '^';
    }
);
```

Since that doesn't specify a token scope, it will work outside of character classes only, which is what you want here. The problem with the above approach is that, if you want to emulate Ruby, you should enable Note that Ruby's Also note that the
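To see why the `\A` -> `^` mapping has to be refused under `/m`, here is a small standalone check that uses only native `RegExp` (no XRegExp). The last two checks use ES2018 lookbehind, which postdates this discussion; `(?<![\s\S])` is shown only as one possible absolute-start anchor in modern engines, not as something XRegExp does.

```javascript
// Standalone check with native RegExp only; no XRegExp involved.
const text = 'first line\nsecond line';

// Without /m, ^ matches only at the start of the input, like Ruby's \A.
const caretWithoutM = /^second/.test(text);                // false

// With /m, ^ also matches right after each line break, so it stops acting like \A.
const caretWithM = /^second/m.test(text);                  // true

// Modern engines (ES2018 lookbehind): "no character precedes this position"
// anchors at the absolute start of input whether or not /m is set.
const lookbehindWithM = /(?<![\s\S])second/m.test(text);   // false
const lookbehindAtStart = /(?<![\s\S])first/m.test(text);  // true
```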
Thanks for this comment, Steven; it's really very helpful. I'm only planning on translating as many of the Oniguruma features as I can; hopefully I can cover some of the more common cases. People will still have to take care when writing expressions for both environments, but at least it will be a little less painful. I started work on this last week and, funnily enough, the first thing I tried to tackle was the I'll let you know when I've got the project up on GitHub. I'm out of the country for the next few weeks, so it might be late June by the time anything of note is finished. Thanks again for your help!
Coolness. Yours might be the first third-party standalone XRegExp addon, so I'm looking forward to it. BTW,

```javascript
XRegExp.addToken(
    /\\([hH])/,
    function (match, scope) {
        var inv = (match[1] === 'H'); // Uppercase for inverted
        if (scope === 'class') {
            // Inside [...], emit bare ranges: hex digits, or everything else
            return inv ? '\\0-/:-@G-`g-\\uffff' : '0-9A-Fa-f';
        }
        return '[' + (inv ? '^' : '') + '0-9A-Fa-f]';
    },
    {scope: 'all'}
);
```

Also, it might be best to add your Ruby emulation only via a special constructor or function, so that standard XRegExp syntax remains as is when using the

```javascript
var RubyRegex = (function (XRegExp) {
    var rubyMode = false;

    function RubyRegex(pattern, flags) {
        // Follow Ruby's flag /m -> /s quirk, and always apply the JavaScript /m
        // flag so that ^ and $ work like Ruby
        flags = (flags || '').replace(/m/g, 's') + 'm';
        // Enable all the Ruby syntax extension tokens
        rubyMode = true;
        try {
            return XRegExp(pattern, flags);
        } finally {
            // Need to turn off rubyMode even if bad syntax caused an error
            rubyMode = false;
        }
    }

    XRegExp.install('extensibility');

    // These tokens are activated only when building a regex using RubyRegex
    // (not XRegExp), due to their trigger functions
    XRegExp.addToken(
        /\\A|other unsupported tokens/, // Update as necessary
        function (match) {
            throw new SyntaxError(match[0] + ' is not supported');
        },
        {
            // Might need another token for unsupported Ruby syntax in scope
            // 'class' or 'all'
            scope: 'default',
            trigger: function () {return !!rubyMode;}
        }
    );

    XRegExp.addToken(
        /\\z/,
        function () {
            // With /m always on, $ also matches before line breaks (all of which
            // are \s); the lookahead rejects those, leaving only the end of input
            return '$(?!\\s)';
        },
        {
            trigger: function () {return !!rubyMode;}
        }
    );

    // Add more tokens with the same trigger function...

    // Return the wrapper so code outside this closure can call it
    return RubyRegex;
}(XRegExp));
```

All of the code I've posted in these comments is untested, so beware of bugs. Hopefully this is helpful, though. :) Note that
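The flag handling in `RubyRegex` can be tried on its own with native `RegExp`, since modern JavaScript (ES2018) has a built-in dotAll `/s` flag; in 2012 that flag came from XRegExp. The helper name `rubyFlagsToJs` is hypothetical. It assumes the caller passes only flags JavaScript understands after translation: Ruby's `/x`, for example, has no native JavaScript equivalent and would make the `RegExp` constructor throw.

```javascript
// Hypothetical helper, not part of XRegExp: translate Ruby-style flags to
// native JavaScript flags, mirroring the RubyRegex wrapper above.
function rubyFlagsToJs(flags) {
  // Ruby's /m means "dot matches newline", which is JavaScript's /s (dotAll).
  const translated = (flags || '').replace(/m/g, 's');
  // Always add JavaScript's /m so ^ and $ match at line boundaries, as in Ruby.
  // A Set removes duplicate letters, which the RegExp constructor rejects.
  return Array.from(new Set(translated + 'm')).join('');
}

const lines = 'a\nb\nc';
const lineAnchors = new RegExp('^b$', rubyFlagsToJs('')).test(lines);  // true
const dotDefault = new RegExp('a.b', rubyFlagsToJs('')).test('a\nb');  // false
const dotRubyM = new RegExp('a.b', rubyFlagsToJs('m')).test('a\nb');   // true
```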
I'm going ahead with the special constructor for now, but I was just thinking: it might be cool to have a way of switching on a plugin like this through the XRegExp interface. Something like:

```javascript
XRegExp.use('ruby', true);
```

which would allow us to add new tokens like this:

```javascript
XRegExp.addToken(
    /\\z/,
    function () {
        return '$(?!\\s)';
    },
    {
        trigger: function () { return XRegExp.using('ruby'); }
    }
);
```

without a new constructor. You'd need to store the keys and values passed by the Have you considered something like this before? I'm happy to fork XRegExp to demonstrate further if you want to explore it.
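A minimal sketch of the proposed switch, kept outside XRegExp so it runs on its own. `createSyntaxSwitchboard`, `use`, and `using` are hypothetical names taken from the proposal; XRegExp does not provide them. The sketch assumes the boolean simply turns a named token set on or off, and the token object only mimics the shape of an `addToken` call.

```javascript
// Hypothetical registry for the proposed use()/using() design.
function createSyntaxSwitchboard() {
  const enabled = new Map(); // token-set name -> on/off

  return {
    // Turn a named token set on (true) or off (false).
    use(name, on) {
      enabled.set(name, Boolean(on));
    },
    // Intended to be called from token trigger functions.
    using(name) {
      return enabled.get(name) === true;
    },
  };
}

const syntax = createSyntaxSwitchboard();

// Mimics the addToken arguments from the proposal; nothing is registered anywhere.
const rubyEndToken = {
  regex: /\\z/,
  handler: () => '$(?!\\s)',
  trigger: () => syntax.using('ruby'),
};

const beforeUse = rubyEndToken.trigger(); // false: 'ruby' was never enabled
syntax.use('ruby', true);
const afterUse = rubyEndToken.trigger();  // true
syntax.use('ruby', false);
const afterOff = rubyEndToken.trigger();  // false
```

One consequence of this design is that the switch is global: every regex built while `'ruby'` is on sees the extra tokens.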
That's a nice design. I like it. But I'd rather not rush into accepting new features that are exclusively intended for running XRegExp with different token sets via addons. (To use less XRegExp-exclusive terminology, I'm talking about features for swapping regex flavors on the fly.) Another way to do something similar would be to add a method that returns a new XRegExp object that uses a fresh and discrete list of tokens. Here's some hypothetical code:

```javascript
var RubyRegex = XRegExp.gimmeAFreshXRegExpYo();
RubyRegex.addToken(
    // No trigger function needed here
);
```

That way, you wouldn't have to worry about causing conflicts with code that calls the I'm happy to look over any changes in an XRegExp fork. But if your goal is to get them accepted upstream in the near term, it might be best to leave out Also, I'm not sure what the second (boolean) argument in your

```javascript
XRegExp.use('ruby', true);
XRegExp.use('steve', false); // turn off just this one
XRegExp.use('joe', true);
// Now using a mashup of ruby and joe token sets, in addition to the default tokens
```

The biggest challenge I envision with that is that tokens can overlap or otherwise conflict. Right now, there's a simple rule: the token added latest wins. In the above scenario, I'm not sure what the semantics would be.

FYI, I don't expect that I will allow any related features (such as the design you proposed or the one I described) to permit starting with a blank slate of no tokens (i.e., reverting to native JavaScript syntax). Some of the built-in tokens are critical to the bug-free functioning of XRegExp and its official addons. In particular, I'm thinking of the built-in tokens for making empty character classes work consistently cross-browser (necessary if you want to parse regex syntax, since otherwise you can't know where character classes end), and for disallowing octals (which is relied upon by BTW, if there's a good solution to the problem of token precedence with

```javascript
// Critical tokens like those for disabling octals are always included and cannot
// be installed/uninstalled
XRegExp.install({
    namedCapture: true,
    builtinFlags: true,
    strictErrors: true,
    miscSyntax: true
});
```

That way, XRegExp's syntax would stay the same, out of the box, but logical token groups (rather than individual tokens) could be disabled upon request. Not sure how quickly any of this will come to fruition, but it's a good discussion. I do want XRegExp to have robust support for addons. However, I'd also like to keep the addon API simple and prevent it from adding significant file size to
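The "fresh token list" alternative and the "token added latest wins" rule can be sketched without XRegExp. Everything here is hypothetical: `createTokenSet` and `matchAt` are illustrative names, the octal check merely stands in for XRegExp's real critical tokens, and checking critical tokens before addon tokens (so addons cannot shadow them) is an assumption of this sketch, not a description of XRegExp's precedence rules.

```javascript
// Hypothetical sketch: each call returns an independent token set, so tokens
// added for one regex flavor cannot leak into another.
function createTokenSet() {
  // Stand-in critical token: reject octal-style escapes such as \01.
  const critical = [{
    regex: /\\0\d/y,
    handler(match) {
      throw new SyntaxError('Octal escape ' + match[0] + ' is not allowed');
    },
  }];
  const addons = [];

  return {
    addToken(regex, handler) {
      // A sticky (y) copy matches only at lastIndex, i.e. at an exact position.
      addons.push({ regex: new RegExp(regex.source, 'y'), handler });
    },
    // Returns { output, length } for the winning token at pos, or null.
    matchAt(source, pos) {
      // Critical tokens first, then addon tokens from newest to oldest.
      const candidates = critical.concat(addons.slice().reverse());
      for (const token of candidates) {
        token.regex.lastIndex = pos;
        const match = token.regex.exec(source);
        if (match) {
          return { output: token.handler(match), length: match[0].length };
        }
      }
      return null;
    },
  };
}

const ruby = createTokenSet();
const other = createTokenSet();
ruby.addToken(/\\z/, () => '$(?!\\s)');
ruby.addToken(/\\z/, () => '(?![\\s\\S])'); // added later, so it wins

const rubyResult = ruby.matchAt('abc\\z', 3);   // { output: '(?![\\s\\S])', length: 2 }
const otherResult = other.matchAt('abc\\z', 3); // null: separate token list
```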
Lots to think about here! I actually like your idea for returning a new The boolean argument for Anyway, I'll carry on with the wrapped constructor from above. I'm keen to get my teeth into writing the new tokens. I'll get back to you when I'm back at home in a few weeks with some more things to ponder! Thanks for the chat.
Honestly, that approach would probably be safer and more manageable than use/install based functionality. Not only because the use/install route potentially adds complex semantics and issues to worry about in the future, but also because XRegExp is already almost too customizable for its own good. In particular, being able to remove XRegExp's built-in syntax might not actually be in the best interest of users.
If you have any questions about whether particular Oniguruma features can currently be reproduced in XRegExp, I'd be happy to answer.
Is the reverse of this available (i.e., JavaScript regex -> Ruby regex)? I'm assuming JavaScript regex is less powerful than Ruby regex, and thus the translation from JavaScript -> Ruby might be more thorough.
@joecorcoran FYI, due to various changes in recent builds of XRegExp 3.0.0-pre, the wrapped constructor approach will no longer work. Rather, I now recommend simply using a shared flag (such as Also, the way that syntax tokens are linked to flags has been simplified. E.g., instead of this:

```javascript
XRegExp.addToken(
    /\\z/,
    function() {
        return '$(?!\\s)';
    },
    {trigger: function() {
        return this.hasFlag('R');
    }}
);
```

...in XRegExp 3 you will need to use this:

```javascript
XRegExp.addToken(
    /\\z/,
    function() {
        return '$(?!\\s)';
    },
    {flag: 'R'}
);
```
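To illustrate what linking tokens to a flag buys, here is a self-contained sketch of flag-gated tokens that does not use XRegExp. The token list, `translate`, and the use of `(?<![\s\S])` / `(?![\s\S])` for `\A` / `\z` are assumptions for illustration (the lookbehind needs an ES2018 engine); the `\h`/`\H` handler is the one posted earlier in this thread. The scanner only tracks whether it is inside a character class and copies every other backslash escape as two characters, so it is a sketch, not a full regex parser.

```javascript
// Hypothetical tokens gated by the custom flag 'R', in the spirit of {flag: 'R'}.
// scope: 'default' = outside [...], 'class' = inside [...], 'all' = both.
const tokens = [
  { regex: /\\A/y, scope: 'default', flag: 'R', handler: () => '(?<![\\s\\S])' },
  { regex: /\\z/y, scope: 'default', flag: 'R', handler: () => '(?![\\s\\S])' },
  {
    regex: /\\([hH])/y, scope: 'all', flag: 'R',
    handler(match, scope) {
      const inverted = match[1] === 'H';
      if (scope === 'class') {
        return inverted ? '\\0-/:-@G-`g-\\uffff' : '0-9A-Fa-f';
      }
      return '[' + (inverted ? '^' : '') + '0-9A-Fa-f]';
    },
  },
];

function translate(source, flags = '') {
  let output = '';
  let inClass = false;
  let pos = 0;
  while (pos < source.length) {
    const scope = inClass ? 'class' : 'default';
    let matched = false;
    // Later tokens take precedence, so walk the list from the end.
    for (const token of tokens.slice().reverse()) {
      if (!flags.includes(token.flag)) continue;                    // flag not set
      if (token.scope !== 'all' && token.scope !== scope) continue; // wrong scope
      token.regex.lastIndex = pos;
      const match = token.regex.exec(source);
      if (match) {
        output += token.handler(match, scope);
        pos += match[0].length;
        matched = true;
        break;
      }
    }
    if (matched) continue;

    const ch = source[pos];
    if (ch === '\\') {
      // Copy any other escape whole, so "\[" is not mistaken for a class opener.
      output += source.slice(pos, pos + 2);
      pos += 2;
      continue;
    }
    if (ch === '[' && !inClass) inClass = true;
    else if (ch === ']' && inClass) inClass = false;
    output += ch;
    pos += 1;
  }
  // The custom flag R means nothing to the native constructor, so drop it.
  return new RegExp(output, flags.replace(/R/g, ''));
}

const anchored = translate('\\Afoo\\h+\\z', 'R');
const notHex = translate('[\\H]', 'R');
const inactive = translate('\\z', ''); // no R flag: \z passes through unchanged
```

Without the `R` flag none of the tokens fire, which is the same opt-in behavior the `{flag: 'R'}` option gives in XRegExp 3.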
@joecorcoran Just discovered this gem and I'm way into it so far, great work. I also ran into this issue and, with some googling, I found this method in the new Rails routing inspector: I patched it in here and it seems to be working so far: I could create a pull request for it, but I figured I'd point it out to you in case you thought there was a better place to patch it in first.
Hey @waymondo, thanks for pointing that out; I'd never noticed it before. It might actually turn out to be a cheap way of achieving what this ticket was originally discussing! I definitely wouldn't want to just throw it in there, though; it would be much better as a
Closing this as part of bug triage. Feel free to continue discussion.
There are big differences between the regular expression capabilities of Ruby and JavaScript. It's not within the scope of this project to bridge the gap, but it does cause problems for certain typical uses of the format validator. See issue #5 for an example.
At the moment, I'm thinking that the best way to solve this is to write a plugin for XRegExp which will add some of the more commonly used Oniguruma features (character types and anchors, POSIX bracket syntax for character classes). We can then use XRegExp in place of the native RegExp in Judge's client-side format validator method.
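As a rough illustration of the POSIX bracket part of that plan, here is a sketch that rewrites `[:name:]` items inside a bracket expression into ASCII ranges before the pattern is handed to JavaScript. The table and `expandPosixClasses` are hypothetical; the ranges are the ASCII (POSIX locale) definitions, so they can differ from Ruby when matching non-ASCII text. The replacement is purely textual (it does not skip escaped brackets), and negated forms such as `[:^alpha:]` are rejected rather than translated.

```javascript
// Hypothetical ASCII-only expansions for a few POSIX bracket names.
const POSIX_CLASSES = {
  alnum: 'A-Za-z0-9',
  alpha: 'A-Za-z',
  blank: ' \\t',
  cntrl: '\\x00-\\x1f\\x7f',
  digit: '0-9',
  lower: 'a-z',
  space: ' \\t\\n\\v\\f\\r',
  upper: 'A-Z',
  xdigit: '0-9A-Fa-f',
};

// Replace each [:name:] (meant to appear inside [...]) with its ranges.
function expandPosixClasses(pattern) {
  return pattern.replace(/\[:(\^?)(\w+):\]/g, (whole, negated, name) => {
    if (negated) {
      throw new SyntaxError('Negated POSIX class ' + whole + ' is not handled');
    }
    if (!Object.prototype.hasOwnProperty.call(POSIX_CLASSES, name)) {
      throw new SyntaxError('Unknown POSIX class ' + whole);
    }
    return POSIX_CLASSES[name];
  });
}

const expanded = expandPosixClasses('^[[:upper:]][[:lower:][:digit:]_]*$');
const identifier = new RegExp(expanded); // /^[A-Z][a-z0-9_]*$/
```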