Add r method to StringContext (for regexes) #7496

Open
scabug opened this Issue May 18, 2013 · 15 comments

scabug commented May 18, 2013

Scala currently has a method StringLike#r for compiling strings to regexes.

Now that Scala has string interpolation and macros, I think we can do even better with an r method on StringContext. Here is why:

  1. Escaping gets simpler (before: """\d+""".r, after: r"\d+")

  2. Parameterization gets simpler (before: ("foo"+Pattern.quote(bar)).r, after: r"foo$bar")

  3. Regexes can be syntax-checked at compile-time

  4. Pattern matching against regexes gets simpler (before: val re = """\d+(.*)\d+""".r; … case re(x) => x, after: case r"\d+$x\d+" => x)
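A minimal runtime sketch of items 1 and 2, assuming a hypothetical `r` interpolator defined in an implicit class (this is neither the proof-of-concept implementation nor a standard-library API). Literal parts reach the regex engine unprocessed, and interpolated values are quoted with `java.util.regex.Pattern.quote`. Item 3 is not covered: without a macro, a malformed pattern only fails at run time with a `PatternSyntaxException`.

```scala
import java.util.regex.Pattern
import scala.util.matching.Regex

object RegexInterpolation {
  // Hypothetical `r` interpolator (not part of the standard library).
  // Literal parts are raw (no escape processing), so r"\d+" reaches the
  // regex engine as \d+. Interpolated values are quoted, so r"foo$bar"
  // matches the text of `bar` literally.
  implicit class RegexContext(val sc: StringContext) extends AnyVal {
    def r(args: Any*): Regex = {
      val quoted = args.map(a => Pattern.quote(String.valueOf(a)))
      // parts always has exactly one more element than args
      val body = sc.parts.head + quoted.zip(sc.parts.tail).map { case (q, p) => q + p }.mkString
      // A malformed pattern throws PatternSyntaxException here, at run time;
      // rejecting it at compile time would need a macro.
      new Regex(body)
    }
  }
}
```

With `bar = "a.b"`, `r"foo$bar"` matches "fooa.b" but not "fooaxb", because the dot coming from `bar` is quoted rather than treated as a wildcard.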

I have a proof of concept implementation available at https://github.com/qerub/scala-regex-stringcontext and can make a patch for Scala proper if people like this idea.

scabug commented May 18, 2013

scabug commented May 18, 2013

@qerub said:
This issue obviously doesn't have blocker priority, but the field is stuck in my browser.

scabug commented May 18, 2013

@soc said (edited on May 18, 2013 2:27:49 PM UTC):
Fixed the fields.

I think there are already a few implementations of that, including one in which you can name the capture group inline.

Would be interesting to check and compare what those implementations do and figure out what's a nice subset to include.

scabug commented May 18, 2013

@qerub said:
Thanks!

I think I found an implementation of what you're mentioning: http://dcsobral.blogspot.com/2012/01/string-interpolation-on-scala-210.html

Yes, I agree. Let's proceed by collecting all relevant ideas.

scabug commented May 18, 2013

@qerub said:
I've updated my PoC implementation with support for direct pattern matching and added a fourth item to the issue description.

Code: qerub/scala-regex-stringcontext@4228dcb#L1R41

scabug commented May 18, 2013

@qerub said (edited on May 18, 2013 10:27:41 PM UTC):
Open questions for item 4: Which of .*, .+, .*? and .+? should the interpolation points match? What should happen if the regex includes explicit capture groups?
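To make the first open question concrete, here is a sketch with two hypothetical extractor interpolators, `greedy` and `reluctant`, that differ only in whether each hole becomes `(.*)` or `(.*?)`. Because `Regex.unapplySeq` requires the whole string to match, the choice decides where the input is split when a literal separator occurs more than once.

```scala
import scala.util.matching.Regex

object HoleMatching {
  // Builds a regex from the literal parts (used as raw regex syntax),
  // putting `hole` wherever a $variable appears in the pattern.
  // Note: this recompiles the regex on every match attempt.
  private def build(parts: Seq[String], hole: String): Regex =
    new Regex(parts.mkString(hole))

  // Hypothetical extractor interpolators; the names are illustrative only.
  implicit class HoleContext(sc: StringContext) {
    // Each hole becomes a greedy group: (.*)
    object greedy {
      def unapplySeq(s: String): Option[List[String]] =
        build(sc.parts, "(.*)").unapplySeq(s)
    }
    // Each hole becomes a reluctant (lazy) group: (.*?)
    object reluctant {
      def unapplySeq(s: String): Option[List[String]] =
        build(sc.parts, "(.*?)").unapplySeq(s)
    }
  }
}
```

On "x-y-z", `case greedy"$a-$b"` binds a = "x-y" and b = "z", while `case reluctant"$a-$b"` binds a = "x" and b = "y-z". As for the second question, in this sketch an explicit group in a literal part yields one more extracted value than there are holes, so the case simply fails to match.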

scabug commented May 18, 2013

@dcsobral said:
There's an alternative version of interpolation matching where the interpolated text will always match the existing capture groups. By convention, people write things like (\d+)$x (\d+)$y, but that works just like (\d+) (\d+)$x$y. Anyway, such code was recently used by Adriaan Moors in a presentation (iirc), and can also be seen in this blog.

Personally, I think that style is error-prone, since the order in which things get assigned is not related to their position but to the numbering of the capturing groups. Add a capture group by mistake, and the match will start failing.
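A minimal sketch of the group-order style, assuming a hypothetical `g` interpolator that concatenates the literal parts into one regex and hands the k-th capture group to the k-th hole regardless of where the hole is written. It shows why both spellings bind the same values, and how one accidental extra group makes the case stop matching.

```scala
import scala.util.matching.Regex

object GroupOrderMatching {
  // Hypothetical interpolator `g`: the holes contribute nothing to the regex.
  // The literal parts are concatenated, and the k-th capture group is bound
  // to the k-th hole, no matter where that hole is written.
  implicit class GroupContext(sc: StringContext) {
    object g {
      def unapplySeq(s: String): Option[List[String]] =
        new Regex(sc.parts.mkString).unapplySeq(s)
    }
  }
}
```

For the input "12 34", both `case g"(\d+)$x (\d+)$y"` and `case g"(\d+) (\d+)$x$y"` bind x = "12" and y = "34". Writing `((\d+))` by mistake produces three groups for two holes, and the case no longer matches.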

Of these two, I do prefer my own version (as described in my blog post already mentioned), and I do like .* over any of the alternatives, but that does decrease the usefulness of matching quite a bit, since one often wants whatever is being captured to follow a pattern of its own.

What I'd really rather do is go with macros and use named patterns. Java 7 supports (?<NAME>X) to specify named capture groups. A Scala macro could check the string for named capture groups and assign them (with or without Java 7 support), and translate a string with them into one with normal interpolation, such as the one in the blog I mention. Since the macro will check the regex for named groups, it can ensure that the names are properly assigned, and give compile time errors when matching if there are unnamed capture groups.
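The named-group check can be illustrated at run time without a macro. The sketch below is an assumption, not any of the implementations mentioned in this thread: it scans a pattern for Java 7 `(?<NAME>X)` groups with a helper regex and reads each one back through `java.util.regex.Matcher.group(String)`. A macro would run the same scan at compile time and bind the names to variables; the helper regex here ignores escaping and character classes, so it only approximates a real parser.

```scala
import java.util.regex.Pattern

object NamedGroups {
  // Matches the opening of a Java 7 named group, (?<NAME>, and captures NAME.
  // Java group names are an ASCII letter followed by letters or digits, so
  // lookbehinds such as (?<= and (?<! are not picked up.
  // Approximation: escaped parentheses and character classes are not handled.
  private val GroupName = """\(\?<([a-zA-Z][a-zA-Z0-9]*)>""".r

  def groupNames(regex: String): List[String] =
    GroupName.findAllMatchIn(regex).map(_.group(1)).toList

  // Returns name -> captured text for a full match, or None if there is no match.
  // A named group that did not take part in the match maps to null.
  def extract(regex: String, input: String): Option[Map[String, String]] = {
    val m = Pattern.compile(regex).matcher(input)
    if (m.matches()) Some(groupNames(regex).map(n => n -> m.group(n)).toMap)
    else None
  }
}
```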

scabug commented May 19, 2013

@qerub said:
Thanks for your great input, Daniel!

I like your idea with named capture groups but I don't know how to implement it; it requires a code transform that's non-local to unapplySeq. Any hints?

scabug commented May 20, 2013

@adriaanm said:
Daniel, I agree named groups would be an improvement. In my presentation I was just trying to explain string interpolation on a slide. In any case, I hadn't considered the problem you point out -- good point!

scabug commented May 20, 2013

@xeno-by said:
We could always go for untyped macros ;) #5903

scabug commented Oct 7, 2013

@som-snytt said:
I'm hijacking this issue, if that's OK, for a macro that does named groups (by parsing them out, for Java 6 support, and supplying them to Regex) and as a bonus (which I think was a D.C. Sobral request on StackOverflow), optional groups are extracted as Option instead of nullable. I don't know why they call them macros instead of magic.
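For the Option-valued optional groups: `java.util.regex` reports a group that did not take part in the match as `null`, and `Regex.unapplySeq` passes that `null` through. A small sketch, with a hypothetical helper `groups` and an example `Version` regex, wraps each group in `Option` instead.

```scala
import scala.util.matching.Regex

object OptionalGroups {
  // Example regex: a version with an optional minor part, "2" or "2.10".
  val Version: Regex = """(\d+)(?:\.(\d+))?""".r

  // Hypothetical helper: requires a full match, then wraps every capture
  // group in Option so that a non-participating group (null) becomes None.
  def groups(re: Regex, s: String): Option[List[Option[String]]] = {
    val m = re.pattern.matcher(s)
    if (m.matches()) Some(List.tabulate(m.groupCount)(i => Option(m.group(i + 1))))
    else None
  }
}
```

Here "2.10" yields `Some(List(Some("2"), Some("10")))` and "2" yields `Some(List(Some("2"), None))`; a macro-based extractor could bind such `Option` values directly in a pattern.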

Still to do: interpolator ensures holes and groups balance; nail down a nicer syntax for how holes "bind" to groups. Note that (\d+)$i is kind of backward to $i%d.

Since we are used to f-interp, how about a scanf version, sf"$s%s $n%d $v%f", that would cover the common extractions.
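A rough sketch of the proposed `sf` (scanf-style) extractor, assuming `%s`, `%d` and `%f` map to `(\S+)`, `([-+]?\d+)` and `([-+]?\d*\.?\d+)`; the name `sf`, that table, and the conversions are illustrative assumptions. Without a macro the bound values are typed `Any`, and a bad conversion spec is only reported (via `require`) at match time.

```scala
import scala.util.matching.Regex

object ScanfInterpolation {
  // Hypothetical table: conversion character -> (regex for the hole, converter).
  private val specs: Map[Char, (String, String => Any)] = Map(
    ('s', ("""(\S+)""", (x: String) => x)),
    ('d', ("""([-+]?\d+)""", (x: String) => x.toInt)),
    ('f', ("""([-+]?\d*\.?\d+)""", (x: String) => x.toDouble))
  )

  implicit class ScanfContext(sc: StringContext) {
    object sf {
      def unapplySeq(input: String): Option[Seq[Any]] = {
        // In sf"$n%d items" the parts are "" and "%d items": each part after
        // the first starts with the conversion spec of the hole before it.
        val holes = sc.parts.tail.map { part =>
          require(part.length >= 2 && part.charAt(0) == '%' && specs.contains(part.charAt(1)),
            "each hole must be followed by %s, %d or %f: " + part)
          specs(part.charAt(1))
        }
        // Literal text is matched literally, as in scanf.
        val literals = (sc.parts.head +: sc.parts.tail.map(_.drop(2))).map(Regex.quote)
        val pattern = literals.head + holes.zip(literals.tail).map { case ((re, _), lit) => re + lit }.mkString
        // Full match, then convert each captured group according to its spec.
        new Regex(pattern).unapplySeq(input).map { groups =>
          groups.zip(holes).map { case (g, (_, convert)) => convert(g) }
        }
      }
    }
  }
}
```

For example, `"apple 3 0.5" match { case sf"$name%s $count%d $price%f" => ... }` binds "apple", 3 and 0.5, while "apple x 0.5" does not match because "x" is not an integer.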

scabug commented Jan 17, 2014

@som-snytt said:
wip at https://github.com/som-snytt/regextractor
https://github.com/som-snytt/regextractor/blob/master/core/src/test/scala/regex/GrTest.scala

No effort yet to turn r"(?<x>text)" into r"$x(text)", but you do get (m group "x").

I previously did some work to "fast-track" it, which I can pursue later if there's interest.

scabug commented Jul 25, 2014

@jroper said:
I don't know if this has been considered, but it would be very useful if the regex were not compiled on each invocation of unapplySeq (which seems to be the case for all the solutions proposed so far). Instead, it should be compiled once, stored statically, and referenced from the generated unapplySeq method. I'm not sure whether this is possible with macros, but compiling regexes causes significant performance issues when their compilation could otherwise be cached. For my use case for regex string-interpolated extractors (a router in Play Framework), compiling the regex on each invocation would be a performance show-stopper.
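Short of static hoisting, a runtime cache keyed by the literal parts avoids recompiling on every match. This is a sketch under assumptions: the interpolator name `p`, the router-style convention that each hole matches one path segment, and the `compilations` counter are all hypothetical, and a `TrieMap` only approximates the "store it statically" idea.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.collection.concurrent.TrieMap
import scala.util.matching.Regex

object CachedRouteInterpolation {
  // Runtime stand-in for "compile once, store statically": without macro or
  // compiler support, compiled regexes are cached keyed by the literal parts.
  private val cache = TrieMap.empty[Seq[String], Regex]

  // Counts real compilations so the caching is observable.
  val compilations = new AtomicInteger(0)

  // Compiles only on a cache miss.
  private def compiled(parts: Seq[String]): Regex =
    cache.getOrElseUpdate(parts, {
      compilations.incrementAndGet()
      // Hypothetical router convention: each hole matches one path segment.
      new Regex(parts.mkString("([^/]+)"))
    })

  implicit class RouteContext(sc: StringContext) {
    object p {
      def unapplySeq(path: String): Option[List[String]] =
        compiled(sc.parts).unapplySeq(path)
    }
  }
}
```

A fresh `StringContext` is still built for every match, so the key is rebuilt and hashed each time; that is far cheaper than compiling, but a macro or compiler change that hoists the compiled pattern into a static field would avoid even this cost.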

scabug commented Dec 8, 2015

@SethTisue said:
a SLIP on this would (I think) be welcome

scabug commented Dec 14, 2015

@soc said:
I think the issue is that this needs compiler support for hoisting regex expressions (and compilation) to ~static places. I absolutely want to see something like this, but I don't have the bandwidth (and only minor desire to go through the SLIP process).

scabug added the enhancement label Apr 7, 2017
