Multiple regex replacements at once #433

blefloch · 2017-12-30T21:37:47Z

In https://tex.stackexchange.com/questions/407637/ it would have been
convenient to replace all ( by \paren{ and all ) by } at once to
make parenthesis-matching rely on TeX's matching of braces. (Instead I
abused xparse's delimiter matching.) Unfortunately, l3regex only
allows one replacement at a time, and doing either replacement on its
own would result in an unbalanced token list, rightfully refused by
l3regex. I propose \regex_replace_case_once:nN(TF) and
\regex_replace_case_all:nN(TF), which would perform several
replacements at once:

\regex_replace_case_all:nN
  {
    { \( } { \c{paren} \cB\{ }
    { \) } { \cE\} }
  }
  \l_tmpa_tl

Not sure about the order of words in the function's name. Another
choice is to provide a replacement command that x-expands the
replacement parts (after finding the matches):

\regex_replace_all_x:nnN
  { [()] } % match open or close parenthesis
  {
    \c{if:w} \( \0
      \c{paren} \cB\{
    \c{else:}
      \cE\}
    \c{fi:}
  }
  \l_tmpa_tl

I might be missing some obvious alternatives and other possible interfaces.


eg9 · 2017-12-30T22:05:17Z


I find this an interesting idea; I’ve already fought with this problem, solving it by doing recursive replacements. There are possible race conditions, though.

Ciao
Enrico

blefloch · 2017-12-31T01:58:03Z

> There are possible race conditions, though.

Could you elaborate please? Do you mean the question of whether the replacements are done one after the other through the whole token list, versus doing them simultaneously as if looking for matches of "regex_1|regex_2|...|regex_n"? I strongly prefer the latter, partly because it seems easier to implement from what I remember of the code, and partly because it seems much more natural: to exchange x and y one would then just need `{x} {y} {y} {x}` and not the more convoluted `{x} {z} {y} {x} {z} {y}` hoping that `z` does not appear in the token list.
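To make the difference concrete, here is a minimal sketch of the two behaviours. It assumes the name under which the feature was eventually released in l3regex, `\regex_replace_case_all:nN`; at the time of this comment the function was only a proposal. The scratch variables and the sample input `xyxxy` are chosen purely for illustration.

```latex
\documentclass{article}
\begin{document}
\ExplSyntaxOn
% Simultaneous replacement: both cases are searched for in one pass,
% so a y produced by the first case is never seen by the second case.
\tl_set:Nn \l_tmpa_tl { xyxxy }
\regex_replace_case_all:nN
  {
    { x } { y }
    { y } { x }
  }
  \l_tmpa_tl
\tl_show:N \l_tmpa_tl % intended result: yxyyx

% Sequential replacement: the second call undoes the first one.
\tl_set:Nn \l_tmpb_tl { xyxxy }
\regex_replace_all:nnN { x } { y } \l_tmpb_tl % now yyyyy
\regex_replace_all:nnN { y } { x } \l_tmpb_tl % now xxxxx
\tl_show:N \l_tmpb_tl
\ExplSyntaxOff
\end{document}
```

With the sequential approach one would need the placeholder `z` mentioned above to get a true exchange.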

eg9 · 2017-12-31T09:19:13Z


I can imagine the latter substitution restoring part of what the former is replacing; depending on the implementation this might be disallowed, or left to the user as an interesting exercise in finding where the infinite loop is popping out. ;-) I agree with {x}{y}{y}{x}, if feasible.

Ciao
Enrico

blefloch · 2018-01-01T18:41:49Z

> I can imagine the latter substitution restoring part of what the former is replacing; depending on the implementation this might be disallowed, or left to the user as an interesting exercise in finding where the infinite loop is popping out. ;-) I agree with {x}{y}{y}{x}, if feasible.

It's feasible: basically use the automaton that would be built for x|y and then have a replacement that depends on which branch was taken. It's actually trickier because group numbers need to be appropriately reset for each branch.
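As a sketch of what resetting group numbers per branch means for the user, the two cases below have different numbers of capturing groups, and each replacement refers to the groups of its own case. This assumes the released `\regex_replace_case_all:nN` numbers groups separately in each case, as described in this thread; the input `abcde` and both cases are made up for illustration.

```latex
\documentclass{article}
\begin{document}
\ExplSyntaxOn
\tl_set:Nn \l_tmpa_tl { abcde }
\regex_replace_case_all:nN
  {
    % In this case \1 is "a" and \2 is "b".
    { (a)(b) }    { \2\1 }
    % Assumption: numbering restarts here, so \1, \2, \3 are "c", "d", "e"
    % rather than groups 3, 4, 5 of one combined regex.
    { (c)(d)(e) } { \3\2\1 }
  }
  \l_tmpa_tl
\tl_show:N \l_tmpa_tl % intended result if numbering restarts per case: baedc
\ExplSyntaxOff
\end{document}
```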

blefloch · 2021-04-27T15:30:35Z

Using the same machinery we should provide the functions below. Each of the regexes should be allowed to be an N-type regex variable: the overhead from testing that is tiny compared to the regex matching machinery.

\regex_case_match:nn { { regex1 } { code1 } { regex2 } { code2 } ... } { tl } and TF version
\regex_case_count:nn { { regex1 } \l_A_int { regex2 } \l_B_int ... } { tl }
\regex_case_replace_once:nN { { regex1 } { replace1 } ... } \l_my_tl and TF version
\regex_case_replace_all:nN { { regex1 } { replace1 } ... } \l_my_tl and TF version

No analogue of \regex_extract_once:nnN nor \regex_split:nnN because I don't see how to make it different from a larger regex regex1|regex2|....

Maybe also the following (but I don't feel that this is quite the right syntax):

\regex_case_extract_all:nnN { { regex1 } \l_A_seq { regex2 } \l_B_seq ... } { tl } \l_full_seq

The implementation strategy is, roughly speaking, to make a big regex (?| (?:regex1) | (?:regex2) | ... ) but change how the final state in the automaton is handled, so that it keeps track of which branch succeeded. This needs 3-5 hours of coding, testing and documentation.
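For reference, the branch-reset group `(?| ... )` is already available in ordinary l3regex patterns, so the starting point of this strategy can be tried directly. The sketch below uses only `\regex_replace_all:nnN`; the input `ab-cd` is illustrative. It also shows the limitation that motivates the extra bookkeeping: a single big regex has a single replacement, which is applied whichever alternative matched.

```latex
\documentclass{article}
\begin{document}
\ExplSyntaxOn
\tl_set:Nn \l_tmpa_tl { ab-cd }
% Branch reset: the groups in each alternative are both numbered 1 and 2,
% so the one replacement \2\1 swaps the pair in either alternative.
\regex_replace_all:nnN
  { (?| (a)(b) | (c)(d) ) }
  { \2\1 }
  \l_tmpa_tl
\tl_show:N \l_tmpa_tl % intended result: ba-dc
\ExplSyntaxOff
\end{document}
```

Choosing a different replacement for each alternative is what requires recording which branch reached the final state.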

blefloch · 2021-05-06T14:56:50Z

Hum... It is clear that \regex_case_count:nn { { A } \l_A_int { B } \l_B_int } { B A X } should store 1 and 1 in the two ints, so the two regexes have to be searched for at the same time and matches counted along the way as we find them. This means that \regex_case_count:nn { { A . } \l_A_int { B . } \l_B_int } { B A X } should store 0 and 1 in the two ints (first BA is found as a match for the second regex, then there isn't any A left).

But for the case_match function it is much less clear what \regex_case_match:nn { { A . } { code-A } { B . } { code-B } } { B A X } should do. Should it work consistently with the case_count function, hence run code-B? Should it test regexes one after the other, hence match A . later in the string and run code-A?
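The question can be tried out with the function as it was eventually released, under the name `\regex_match_case:nnTF` (an assumption relative to this comment, which predates the release). As far as I understand the released documentation, the case whose regex matches at the earliest point in the token list is selected, which would correspond to the first option above. The messages stored in `\l_tmpa_tl` are purely illustrative.

```latex
\documentclass{article}
\begin{document}
\ExplSyntaxOn
\regex_match_case:nnTF
  {
    { A . } { \tl_set:Nn \l_tmpa_tl { code-A } }
    { B . } { \tl_set:Nn \l_tmpa_tl { code-B } }
  }
  { B A X } % spaces are ignored here, so the token list is BAX
  { \tl_show:N \l_tmpa_tl } % some case matched: show which one ran
  { \tl_set:Nn \l_tmpa_tl { no-match } \tl_show:N \l_tmpa_tl }
\ExplSyntaxOff
\end{document}
```

Here `B .` matches `BA` at the first token while `A .` can only match `AX` one token later, so an earliest-match rule would run the second case.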

blefloch · 2021-05-16T20:26:22Z

In my pull request #928 I'm not implementing analogues of regex_count or regex_extract_all because it is very unclear what should be the syntax: it seems too messy to require the user to use one variable to get the result of each case. No analogue of regex_extract_once or regex_split because it is even less clear what to do.


blefloch added the enhancement and l3regex labels Dec 30, 2017

blefloch self-assigned this Dec 30, 2017

blefloch added a commit that referenced this issue May 16, 2021

Implement regex_case_match (see #433)

45836eb

blefloch mentioned this issue May 16, 2021

l3regex: get the positions of all matches? #921

Closed

blefloch added a commit that referenced this issue May 16, 2021

Tweak code and doc of \regex_case_match:nnTF (see #433)

44c7db5

blefloch added a commit that referenced this issue May 16, 2021

Implement \regex_case_replace_once:nN(TF) (see #433)

bac4378

blefloch added a commit that referenced this issue May 16, 2021

Implement \regex_case_replace_all:nN(TF) (see #433)

24646b3

blefloch added a commit that referenced this issue May 16, 2021

Correct computation of brace balance in regex_case functions (see #433)

911fedf

blefloch added a commit that referenced this issue May 16, 2021

Update changelog to record the new multi-regex functions (fixes #433)

b168e7f

blefloch mentioned this issue May 16, 2021

Tools to match/replace multiple regex concurrently (fixes #433) #928

Merged

blefloch added a commit that referenced this issue May 17, 2021

Correct regex_case for a single-token explicit regex (see #433)

ba1d113

josephwright pushed a commit that referenced this issue Jan 10, 2022

Implement regex_case_match (see #433)

663838c

josephwright pushed a commit that referenced this issue Jan 10, 2022

Tweak code and doc of \regex_case_match:nnTF (see #433)

c21bb89

josephwright pushed a commit that referenced this issue Jan 10, 2022

Implement \regex_case_replace_once:nN(TF) (see #433)

55d9f97

josephwright pushed a commit that referenced this issue Jan 10, 2022

Implement \regex_case_replace_all:nN(TF) (see #433)

3476a24

josephwright pushed a commit that referenced this issue Jan 10, 2022

Correct computation of brace balance in regex_case functions (see #433)

ffe85eb

josephwright pushed a commit that referenced this issue Jan 10, 2022

Update changelog to record the new multi-regex functions (fixes #433)

a62da3c

josephwright pushed a commit that referenced this issue Jan 10, 2022

Correct regex_case for a single-token explicit regex (see #433)

4a72b78

josephwright pushed a commit that referenced this issue Jan 10, 2022

Implement regex_case_match (see #433)

c7c2447

josephwright pushed a commit that referenced this issue Jan 10, 2022

Tweak code and doc of \regex_case_match:nnTF (see #433)

fe240ed

josephwright pushed a commit that referenced this issue Jan 10, 2022

Implement \regex_case_replace_once:nN(TF) (see #433)

25b87ef

josephwright pushed a commit that referenced this issue Jan 10, 2022

Implement \regex_case_replace_all:nN(TF) (see #433)

ef56dfb

josephwright pushed a commit that referenced this issue Jan 10, 2022

Correct computation of brace balance in regex_case functions (see #433)

43d3214

josephwright pushed a commit that referenced this issue Jan 10, 2022

Update changelog to record the new multi-regex functions (fixes #433)

20f43b9

josephwright pushed a commit that referenced this issue Jan 10, 2022

Correct regex_case for a single-token explicit regex (see #433)

f9f8b97

josephwright pushed a commit that referenced this issue Jan 10, 2022

Implement regex_case_match (see #433)

5f6e07a

josephwright pushed a commit that referenced this issue Jan 10, 2022

Tweak code and doc of \regex_case_match:nnTF (see #433)

8799381

josephwright pushed a commit that referenced this issue Jan 10, 2022

Implement \regex_case_replace_once:nN(TF) (see #433)

a6bfeb2

josephwright pushed a commit that referenced this issue Jan 10, 2022

Implement \regex_case_replace_all:nN(TF) (see #433)

4e7ad36

josephwright pushed a commit that referenced this issue Jan 10, 2022

Correct computation of brace balance in regex_case functions (see #433)

49c3147

josephwright closed this as completed in f5ef19b Jan 10, 2022

josephwright pushed a commit that referenced this issue Jan 10, 2022

Correct regex_case for a single-token explicit regex (see #433)

b8b5338

josephwright pushed a commit that referenced this issue Oct 30, 2023

Drop a sentence in comment for \__regex_replacement_put_submatch:n

7eb30a4

The mentioned `\exp_not:N` was dropped in commit 49c3147 (Correct computation of brace balance in regex_case functions (see #433), 2021-05-16).
