Skip to content

Added XRegExp.join() which joins an Array with the given separator in… #43

Closed
wants to merge 7 commits into from

2 participants

@KylePDavis

...a single XRegExp

Split the majority of XRegExp.union into a private prepareJoin() helper.
Added XRegExp.join() which joins an Array with the given separator into
a single XRegExp.
Updated XRegExp.union() to use new XRegExp.join() internally.
Rebuilt the xregexp-all.js file.

Kyle Davis Added XRegExp.join() which joins an Array with the given separator in…
…to a single XRegExp

Split the majority of XRegExp.union into a private prepareJoin() helper.
Added XRegExp.join() which joins an Array with the given separator into
a single XRegExp.
Updated XRegExp.union() to use new XRegExp.join() internally.
Rebuilt the xregexp-all.js file.
e5d6faa
@slevithan
Owner

This is very interesting, and well implemented. Thanks for making an effort to follow the conventions already found in XRegExp. The one thing missing is Jasmine tests in tests/spec/s-xregexp-methods.js, but I can add those.

That said, can you explain what the use cases for XRegExp.join might be? Apart from the alternation metacharacter | that is used by XRegExp.union, the only separator that, off the top of my head, seems like it might be useful to support here is the empty string. Keep in mind that the XRegExp.build addon can already be used for weird and unusual cases (it is essentially a more robust XRegExp.union).

Although I'm very interested to hear your arguments in favor of XRegExp.join, I think it might make more sense to add a new optional argument to XRegExp.union that lets you change the default | separator. I.e., instead of XRegExp.join([...], '', 'g'), perhaps you could use XRegExp.union([...], 'g', '').

@KylePDavis

Thank you and my apologies for the missing test cases. I'd be more than happy to implement those if you'd prefer. It should be very similar to XRegExp.union(). Either way, just let me know.

As for the use cases, I only have one so far but thought it might be helpful to provide a generic implementation so that others could join on whatever they wanted. Here's a little bit about my specific use case:

  1. I have a format String that I have tokenized (using XRegExp)
  2. I need to take each special token or literal token and generate a resulting compound XRegExp to match Strings in the format described by the format String

The biggest hang up I had with XRegExp.build() is that to use this I'd have to take my token stream that I just parsed out of a format string and then build a format string for .build() so that I could parse that and then turn it into the final compound XRegExp. This just seemed like too much run around for this use case and a .join() or .concat() or similar seemed like a reasonable expectation since it seems like the best way to ensure that the XRegExp extras are preserved.

I had considered simply extending .union() with a separator but it didn't seem like it would match up very well with the concept of boolean unions anymore if I were to go that route. So instead I thought it might be clearer to make .union() a special case of .join().

Anyhow, I'm happy to discuss this further and tweak as needed. Maybe moving the code from the private back into .join() would clean it up a bit. Just let me know how you'd like to proceed.

@slevithan
Owner

OK, it makes sense that XRegExp.build doesn't work for your case, but I'm still a bit unclear as to what you're doing. What separator are you using, if not the | used by XRegExp.union? Could you show some reduced code? I figured that the kind of thing you described (string tokenizing, etc.) would be the strongest use case for XRegExp.union, so it's good to hear from people like you who are doing similar things but not finding their cases supported.

I don't necessarily agree that a more generic solution is better. A couple examples of recent changes in the opposite direction:

  • Several versions ago, I removed the addFlags method and only offered XRegExp.globalize in its place, since flag /g is the only commonly added flag (it's needed to allow lastIndex updating, so it often makes sense to add it when a function has to work with regexes provided as arguments). Since then, I've never missed the ability to add other flags, and no one has ever asked for the old functionality back.
  • v3.0.0-pre removed XRegExp.addToken's very generic trigger function, replacing it with the flag string that covers what the trigger function was almost always used for. This rules out some possibilities, but makes XRegExp addons easier to write and understand.

As for the names join or concat, I'm a bit hesitant to use them. They're perfectly descriptive names, but my (minor) issue with them is that XRegExp includes a lot of methods (split, replace, match, exec, test) that closely mimic their String.prototype or RegExp.prototype counterparts. Thus, the name XRegExp.join might sound like a bugfixed version of Array.prototype.join with some XRegExp enhancements (similar to the other XRegExp methods styled on native String.prototype and RegExp.prototype methods), which would give a misleading impression about your regex-specific joining method. Avoiding this confusion is one of the reasons I'm currently leaning toward adding functionality to XRegExp.union to support cases it can't yet handle, rather than adding a new method.

@KylePDavis

Okay, that all makes sense. I think I understand your naming concerns a bit more now.
As for the joining separator I am indeed using empty string to join things.
I've created a Gist with a simple example that I hope illustrates my use case over here: https://gist.github.com/3755133

@slevithan
Owner

Neat! Thanks for sharing your code. :)

So, I'm going to go ahead and say that XRegExp 3.0.0 will provide some way to join regexes using an empty string joiner. However, I'm not yet certain what the best way to allow this functionality might be.

One option (as discussed previously) is to let XRegExp.union accept a third, optional, string argument, called joiner or separator. This argument would default to '|', but could be provided as ''. Perhaps XRegExp should throw an error if a value other than '|' or '' is given, since I think it might be unwise to allow arbitrary regex patterns as the joining pattern (for one thing, including capturing groups or backreferences in the pattern could cause unexpected behavior). Another option is to accept an options object with a boolean for switching from union to concat mode, using something like XRegExp.union([...], {flags: 'g', concat: true}). A third option is to add a new method as you did in this pull request (perhaps XRegExp.concat) that always joins patterns without any separator (with both XRegExp.union and XRegExp.concat relying on a private function that lets you specify the joining pattern). This third option seems like the cleanest answer, but I have an adverse reaction to adding a new method for something that I suspect will very rarely be used. :-/

I'm going to let this incubate for a while, allowing for more feedback and thought before settling on a final decision.

@KylePDavis

Great! I can use my fork for that scenario in the interim and then once it's in XRegExp 3.0.0 in some form or another I'll switch over to XRegExp proper and delete my fork. Thanks again for your help on this.

@slevithan slevithan modified the milestone: v3.1.0, v3.0.0 Aug 27, 2015
@slevithan
Owner

Closing in favor of #106.

@slevithan slevithan closed this Oct 10, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.