Multiple regex replacements at once #433

Closed
blefloch opened this issue Dec 30, 2017 · 7 comments

@blefloch
Member

In https://tex.stackexchange.com/questions/407637/ it would have been
convenient to replace every ( by \paren{ and every ) by } in a single
pass, so that parenthesis matching relies on TeX's own matching of
braces. (Instead I abused xparse's delimiter matching.) Unfortunately,
l3regex only allows one replacement at a time, and doing either
replacement on its own would produce an unbalanced token list, which
l3regex rightly refuses. I propose \regex_replace_case_once:nN(TF) and
\regex_replace_case_all:nN(TF), which would perform several
replacements at once:

\regex_replace_case_all:nN
  {
    { \( } { \c{paren} \cB\{ }
    { \) } { \cE\} }
  }
  \l_tmpa_tl
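
To make the intended use concrete, here is a minimal sketch of a complete document built around the proposed call. It assumes a kernel in which \regex_replace_case_all:nN exists with exactly the syntax proposed above; the \paren definition and the \autoparen wrapper are hypothetical names introduced only for this illustration.

```latex
\documentclass{article}
\ExplSyntaxOn
% Hypothetical target macro: typesets its argument in auto-sized parentheses.
\NewDocumentCommand \paren { m } { \left( #1 \right) }
% Hypothetical wrapper: turns ( ... ) into \paren{ ... } in one pass.
\NewDocumentCommand \autoparen { m }
  {
    \tl_set:Nn \l_tmpa_tl {#1}
    % Both replacements happen in the same pass, so only the final
    % result has to be brace-balanced, not each replacement alone.
    \regex_replace_case_all:nN
      {
        { \( } { \c{paren} \cB\{ } % "(" becomes \paren followed by a begin-group {
        { \) } { \cE\} }           % ")" becomes an end-group }
      }
      \l_tmpa_tl
    \tl_use:N \l_tmpa_tl
  }
\ExplSyntaxOff
\begin{document}
$\autoparen{ f(x + g(y)) }$
\end{document}
```

The input must have balanced parentheses for the result to be a balanced token list; with unbalanced input the replacement is still expected to be refused.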

Not sure about the order of words in the function's name. Another
choice is to provide a replacement command that x-expands the
replacement parts (after finding the matches):

\regex_replace_all_x:nnN
  { [()] } % match open or close parenthesis
  {
    \c{if:w} \( \0
      \c{paren} \cB\{
    \c{else:}
      \cE\}
    \c{fi:}
  }
  \l_tmpa_tl

I might be missing some obvious alternatives and other possible interfaces.

@blefloch blefloch added enhancement New feature or request l3regex labels Dec 30, 2017
@blefloch blefloch self-assigned this Dec 30, 2017
@eg9
Contributor

eg9 commented Dec 30, 2017 via email

@blefloch
Member Author

blefloch commented Dec 31, 2017 via email

@eg9
Contributor

eg9 commented Dec 31, 2017 via email

@blefloch
Member Author

blefloch commented Jan 1, 2018 via email

@blefloch
Member Author

Using the same machinery we should provide the following functions. Each of the regexes should be allowed to be an N-type regex: the overhead of testing for that is tiny compared to the regex matching machinery.

  • \regex_case_match:nn { { regex1 } { code1 } { regex2 } { code2 } ... } { tl } and TF version
  • \regex_case_count:n { { regex1 } \l_A_int { regex2 } \l_B_int ... } { tl }
  • \regex_case_replace_once:nN { { regex1 } { replace1 } ... } \l_my_tl and TF version
  • \regex_case_replace_all:nN { { regex1 } { replace1 } ... } \l_my_tl and TF version

No analogue of \regex_extract_once:nnN or \regex_split:nnN, because I don't see how it would differ from using a larger regex regex1|regex2|.... Maybe the following, though I don't feel that this is quite the right syntax:

  • \regex_case_extract_all:nnN { { regex1 } \l_A_seq { regex2 } \l_B_seq ... } { tl } \l_full_seq

The implementation strategy is, roughly speaking, to build one big regex (?| (?:regex1) | (?:regex2) | ... ) but change how the final state of the automaton is handled, so that it keeps track of which branch succeeded. This needs 3-5 hours of coding, testing and documentation.

@blefloch
Member Author

blefloch commented May 6, 2021

Hmm... It is clear that \regex_case_count:n { { A } \l_a_int { B } \l_B_int } { B A X } should store 1 and 1 in the two ints, so the two regexes have to be searched for at the same time, and matches counted along the way as we find them. This means that \regex_case_count:n { { A . } \l_a_int { B . } \l_B_int } { B A X } should store 0 and 1 in the two ints: first BA is found as a match for the second regex, and after that there is no A left to match.

But for the case_match function it is much less clear what \regex_case_match:nn { { A . } { code-A } { B . } { code-B } } { B A X } should do. Should it work consistently with the case_count function, hence run code-B? Should it test regexes one after the other, hence match A . later in the string and run code-A?
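
The two readings can be emulated with functions that already exist, which may help in comparing them. This is only a sketch: the \demo_... names and the scratch variables are hypothetical, the combined search is approximated by an ordinary alternation with one capture group per case, and it relies on the assumption that a group which did not take part in the match yields an empty item.

```latex
\documentclass{article}
\ExplSyntaxOn
\seq_new:N \l__demo_match_seq
\tl_new:N \l__demo_group_tl
% Reading 1: search for all cases at once; the leftmost match decides.
\cs_new_protected:Npn \demo_case_combined:n #1
  {
    \regex_extract_once:nnNTF { (A.) | (B.) } {#1} \l__demo_match_seq
      {
        % Item 1 is the whole match, item 2 is group (A.), item 3 is group (B.).
        % Assumption: an unmatched group gives an empty item.
        \tl_set:Nx \l__demo_group_tl { \seq_item:Nn \l__demo_match_seq { 2 } }
        \tl_if_empty:NTF \l__demo_group_tl
          { \iow_term:n { combined:~code-B } }
          { \iow_term:n { combined:~code-A } }
      }
      { \iow_term:n { combined:~no~match } }
  }
% Reading 2: test the regexes one after the other; the first one that
% matches anywhere in the token list decides.
\cs_new_protected:Npn \demo_case_sequential:n #1
  {
    \regex_match:nnTF { A. } {#1}
      { \iow_term:n { sequential:~code-A } }
      {
        \regex_match:nnTF { B. } {#1}
          { \iow_term:n { sequential:~code-B } }
          { \iow_term:n { sequential:~no~match } }
      }
  }
\demo_case_combined:n { BAX }
\demo_case_sequential:n { BAX }
\ExplSyntaxOff
\begin{document}
\end{document}
```

On BAX the combined search should pick the B case, since BA is the leftmost match, while the sequential test should pick the A case, since A . matches AX later in the string.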

@blefloch
Member Author

In my pull request #928 I'm not implementing analogues of regex_count or regex_extract_all, because it is very unclear what the syntax should be: requiring the user to supply one variable per case to receive the results seems too messy. There is no analogue of regex_extract_once or regex_split either, because it is even less clear what those should do.

muzimuzhi added a commit to muzimuzhi/latex3 that referenced this issue Oct 30, 2023
The mentioned `\exp_not:N` was dropped in commit
49c3147 (Correct computation of brace balance in regex_case functions (see latex3#433), 2021-05-16).
josephwright pushed a commit that referenced this issue Oct 30, 2023
The mentioned `\exp_not:N` was dropped in commit
49c3147 (Correct computation of brace balance in regex_case functions (see #433), 2021-05-16).