Multiple regex replacements at once #433
Comments
On 30 Dec 2017, at 22:37, Bruno Le Floch wrote:
In https://tex.stackexchange.com/questions/407637/ it would have been
convenient to replace all ( by \paren{ and all ) by } at once to
make parenthesis-matching rely on TeX's matching of braces. (Instead I
abused xparse's delimiter matching.) Unfortunately, l3regex only
allows one replacement at a time, and doing these two separately would
leave an unbalanced token list in between, rightfully refused by l3regex. I propose
\regex_replace_case_once:nN(TF) and \regex_replace_case_all:nN(TF)
that would perform several replacements at once:
\regex_replace_case_all:nN
{
{ \( } { \c{paren} \cB\{ }
{ \) } { \cE\} }
}
\l_tmpa_tl
Not sure about the order of words in the function's name. Another
choice is to provide a replacement command that x-expands the
replacement parts (after finding the matches):
\regex_replace_all_x:nnN
{ [()] } % match open or close parenthesis
{
\c{if:w} \( \0
\c{paren} \cB\{
\c{else:}
\cE\}
\c{fi:}
}
I might be missing some obvious alternatives and other possible interfaces.
I find this an interesting idea; I’ve already fought with this problem, solving it by doing recursive replacements.
There are possible race conditions, though.
Ciao
Enrico
|
> There are possible race conditions, though.
Could you elaborate please? Do you mean the question of whether the
replacements are done one after the other through the whole token list,
versus doing them simultaneously as if looking for matches of
"regex_1|regex_2|...|regex_n"? I strongly prefer the latter, partly
because it seems easier to implement from what I remember of the code,
and partly because it seems much more natural: to exchange x and y one
would then just need `{x} {y} {y} {x}` and not the more convoluted `{x}
{z} {y} {x} {z} {y}` hoping that `z` does not appear in the token list.
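
The difference between the two approaches can be sketched as follows. This is only an illustration, assuming an l3kernel recent enough to provide `\regex_replace_case_all:nN` under the name proposed in this issue; the sample token list `xyz` and the use of `\l_tmpa_tl` and `\l_tmpb_tl` are made up for the example.

```latex
\documentclass{article}
\begin{document}
\ExplSyntaxOn
% Hypothetical input that happens to contain the placeholder letter z.
\tl_set:Nn \l_tmpa_tl { xyz }
\tl_set_eq:NN \l_tmpb_tl \l_tmpa_tl

% Sequential replacements through a placeholder: {x}{z} {y}{x} {z}{y}.
\regex_replace_all:nnN { x } { z } \l_tmpa_tl % xyz -> zyz
\regex_replace_all:nnN { y } { x } \l_tmpa_tl % zyz -> zxz
\regex_replace_all:nnN { z } { y } \l_tmpa_tl % zxz -> yxy (the original z is lost)

% Simultaneous replacements: each match is replaced according to the
% case that matched, and replaced text is not scanned again.
\regex_replace_case_all:nN
  {
    { x } { y }
    { y } { x }
  }
  \l_tmpb_tl % xyz -> yxz

\tl_use:N \l_tmpa_tl \par
\tl_use:N \l_tmpb_tl
\ExplSyntaxOff
\end{document}
```

The sequential version only works when the placeholder is absent from the input, which is exactly the fragility described above.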
|
On 31 Dec 2017, at 02:58, Bruno Le Floch wrote:
> > There are possible race conditions, though.
> Could you elaborate please? Do you mean the question of whether the
> replacements are done one after the other through the whole token list,
> versus doing them simultaneously as if looking for matches of
> "regex_1|regex_2|...|regex_n"? I strongly prefer the latter, partly
> because it seems easier to implement from what I remember of the code,
> and partly because it seems much more natural: to exchange x and y one
> would then just need `{x} {y} {y} {x}` and not the more convoluted `{x}
> {z} {y} {x} {z} {y}` hoping that `z` does not appear in the token list.
I can imagine the latter substitution restoring part of what the
former is replacing; depending on the implementation this might
be disallowed, or left to the user as an interesting exercise in
finding where the infinite loop is popping out. ;-)
I agree with {x}{y}{y}{x}, if feasible.
Ciao
Enrico
|
> I can imagine the latter substitution restoring part of what the
> former is replacing; depending on the implementation this might
> be disallowed, or left to the user as an interesting exercise in
> finding where the infinite loop is popping out. ;-)
> I agree with {x}{y}{y}{x}, if feasible.
It's feasible: basically use the automaton that would be built for x|y
and then have a replacement that depends on which branch was taken.
It's actually a bit trickier than that, because group numbers need to be
appropriately reset for each branch.
|
Using the same machinery we should provide (each of the regex should be allowed to be an
Maybe the following (but I don't feel that this is quite the right syntax). No analogue of
The implementation strategy is roughly speaking to make a big regex |
Hum... It is clear that But for the |
In my pull request #928 I'm not implementing analogues of |
The mentioned `\exp_not:N` was dropped in commit 49c3147 (Correct computation of brace balance in regex_case functions (see latex3#433), 2021-05-16).