Multiline mode (anchors for individual lines) #16

Open
jcgoble3 opened this issue Feb 19, 2016 · 3 comments

@jcgoble3
Owner

A staple of most regex flavors. For the enhanced patterns (issue #9), this could simply be a flag, passed as a separate argument, that makes ^ and $ match at the start and end of every line instead of only at the start and end of the subject. It could also be implemented for the basic functions, if an appropriate letter code can be chosen.
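
To make the proposed flag concrete, here is a minimal Python sketch of where ^ and $ are allowed to succeed with and without a multiline flag. The helper names (`bol`, `eol`, `anchor_positions`) are hypothetical and not part of this library; offsets are 0-based positions between characters, and only `\n` is assumed to end a line.

```python
# Hypothetical helpers illustrating multiline anchor semantics.
# Not part of this library; offsets are 0-based gaps between characters.

def bol(subject, i, multiline):
    """Return True if ^ may match at offset i."""
    if i == 0:
        return True  # the start of the subject always counts
    # In multiline mode, any position right after a newline also counts.
    return multiline and subject[i - 1] == "\n"


def eol(subject, i, multiline):
    """Return True if $ may match at offset i."""
    if i == len(subject):
        return True  # the end of the subject always counts
    # In multiline mode, any position right before a newline also counts.
    return multiline and subject[i] == "\n"


def anchor_positions(subject, multiline):
    """Return (offsets where ^ matches, offsets where $ matches)."""
    offsets = range(len(subject) + 1)
    starts = [i for i in offsets if bol(subject, i, multiline)]
    ends = [i for i in offsets if eol(subject, i, multiline)]
    return starts, ends


if __name__ == "__main__":
    text = "one\ntwo\n\nfour"
    print(anchor_positions(text, multiline=False))  # ([0], [13])
    print(anchor_positions(text, multiline=True))   # ([0, 4, 8, 9], [3, 7, 8, 13])
```

With the flag on, both anchors succeed at offset 8, which is the empty line between the two consecutive newlines.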

Unfortunately, many letters are already taken by PUC Lua: a, b, c, d, f, g, l, p, s, u, w, x, and z. That's half the alphabet used by stock patterns alone, not to mention that named captures (issue #14) have taken k. So we're running low on letters, and digits are not an option because %1 through %9 are backreferences. Also, I'm guaranteeing that all patterns written for stock Lua will remain compatible with the basic functions in this library as long as they do not rely on undefined behavior, and the 5.3 manual specifies that non-magic characters match themselves literally whether escaped or not, so new punctuation characters cannot be used even with an escape. So I may have to relegate this to enhanced patterns only.
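
A quick Python tally of the letter budget described above, assuming only lowercase letters matter (the uppercase complements such as %A and %D are ignored here):

```python
import string

# Letters that stock PUC Lua patterns give a meaning after '%', as listed above.
PUC_LUA = set("abcdfglpsuwxz")
# Letter already claimed by named captures in this library (issue #14).
NAMED_CAPTURES = {"k"}

# Whatever lowercase letters are left over.
free = sorted(set(string.ascii_lowercase) - PUC_LUA - NAMED_CAPTURES)
print(len(PUC_LUA), len(free), "".join(free))  # 13 12 ehijmnoqrtvy
```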

@jcgoble3
Owner Author

As a side note, the frontier pattern (%f) can emulate this most of the time, but it's a bit clunky and doesn't handle two consecutive linebreaks (that is, an empty line).
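
The following Python sketch simulates `%f[^\n]` using the Lua 5.3 manual's definition of the frontier pattern (it matches where the previous character is not in the set and the next one is, with the start and end of the subject treated as `'\0'`) and compares it with a hypothetical multiline `^`. It is an illustration, not Lua's actual matcher.

```python
# Simulation of the Lua 5.3 frontier pattern %f[^\n], following the manual:
# %f[set] matches an empty string where the next character belongs to set
# and the previous one does not; the start and end of the subject act as '\0'.

def frontier_not_newline(subject):
    """Offsets (0-based) where %f[^\n] would match."""
    hits = []
    for i in range(len(subject) + 1):
        prev = subject[i - 1] if i > 0 else "\0"
        cur = subject[i] if i < len(subject) else "\0"
        # The set [^\n] holds every character except a newline, so the
        # condition reduces to: prev is a newline and cur is not.
        if cur != "\n" and prev == "\n":
            hits.append(i)
    return hits


def multiline_line_starts(subject):
    """Offsets where a hypothetical multiline ^ would match."""
    return [i for i in range(len(subject) + 1)
            if i == 0 or subject[i - 1] == "\n"]


text = "one\ntwo\n\nfour"
print(frontier_not_newline(text))   # [4, 9]
print(multiline_line_starts(text))  # [0, 4, 8, 9]
```

The frontier misses offset 8, the empty line produced by the two consecutive newlines, and also offset 0, because the start of the subject counts as `'\0'`, which is not a newline.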

@jcgoble3
Owner Author

I wonder if I can use an ASCII control code here. The 5.3 manual says this:

"x: (where x is not one of the magic characters ^$()%.[]*+-?) represents the character x itself."

and:

"%x: (where x is any non-alphanumeric character) represents the character x. This is the standard way to escape the magic characters. Any non-alphanumeric character (including all punctuation characters, even the non-magical) can be preceded by a '%' when used to represent itself in a pattern."

The question: how many people would interpret "any non-alphanumeric character" to include control codes, and how many would bother to escape control codes? FWIW, the escape() function I introduced in issue #12 explicitly does not escape control codes.

If I go that route, the best sequence would probably be %\027 (a % plus the ASCII escape character, code 27, written with Lua's decimal \ddd escape), with an argument after it to indicate what to do.

@jcgoble3
Owner Author

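Better yet: GNU tools such as GCC recognize \e for the escape character. So why not use %e? It's recognizable, backward compatible (stock Lua gives % followed by a letter a meaning only for the letters listed earlier, so no well-defined pattern uses %e), and not already in use.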
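
A rough sketch of how a pattern scanner might recognize `%e`, written in Python for brevity. Everything here is hypothetical: the issue does not say what the argument after `%e` looks like, so this assumes a single character, and it ignores sets (`[...]`) and the arguments of `%b`/`%f` to stay short.

```python
# Hypothetical scanner: "%e" plus one following character is treated as an
# extension directive. The one-character argument is an assumption.

def tokenize(pattern):
    tokens = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "%" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if nxt == "e" and i + 2 < len(pattern):
                # Extension: %e followed by its (assumed) one-char argument.
                tokens.append(("ext", pattern[i + 2]))
                i += 3
                continue
            # Any other escape (%a, %%, %., %1, ...) is kept as a pair,
            # exactly as stock Lua would read it.
            tokens.append(("escape", pattern[i:i + 2]))
            i += 2
            continue
        # Ordinary character (a lone trailing %, malformed in Lua, lands here too).
        tokens.append(("char", ch))
        i += 1
    return tokens


print(tokenize("a%ex%%e"))
# [('char', 'a'), ('ext', 'x'), ('escape', '%%'), ('char', 'e')]
```

Because the scanner consumes `%` escapes in pairs, `%%e` still reads as a literal `%` followed by `e`, so escaped percent signs cannot accidentally trigger the extension.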
