Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

l3text: \c_group_begin_token and \c_group_end_token are not handled correctly #874

Closed
Skillmon opened this issue Apr 24, 2021 · 14 comments
Closed
Assignees
Labels
bug Something isn't working

Comments

@Skillmon
Copy link
Contributor

Inside of \text_expand:n the tokens \c_group_begin_token and \c_group_end_token aren't handled correctly. Instead of shipping tokens to the output of \text_expand:n they will leave { \if_false: } \fi: and \if_false: { \fi: }, respectively, in the input stream in front of the \exp:w-expansion, and place the next loop iteration afterwards.

This results in a low level Missing number, treated as zero. error being thrown.

MWE:

\documentclass[]{minimal}

\begin{document}
\ExplSyntaxOn
\text_expand:n { \c_group_begin_token \c_group_end_token }
\ExplSyntaxOff
\end{document}
@PhelypeOleinik
Copy link
Member

One option to handle this would be for \__text_expand_store:n to put things in \unexpanded in the output token list, and have a \__text_expand_store_e:n that doesn't, then at the end do one run of \expanded in the whole thing:

\documentclass{minimal}

\ExplSyntaxOn
\cs_gset:Npn \__text_expand_store:n #1
  { \__text_expand_store:nw { \exp_not:n {#1} } }
\cs_gset:Npn \__text_expand_store_e:n #1
  { \__text_expand_store:nw {#1} }
\cs_gset:Npn \__text_expand_end:w #1 \__text_expand_result:n #2
  {
    \group_align_safe_end:
    \exp_after:wN \exp_end:
    \tex_expanded:D {#2}
  }
\cs_gset:Npn \__text_expand_N_type_auxii:N #1
  {
    \token_if_eq_meaning:NNTF #1 \c_group_begin_token
      {
        \__text_expand_store_e:n { { \if_false: } \fi: }
        \__text_expand_loop:w
      }
      {
        \token_if_eq_meaning:NNTF #1 \c_group_end_token
          {
            \__text_expand_store_e:n { \if_false: { \fi: } }
            \__text_expand_loop:w
          }
          { \__text_expand_N_type_auxiii:N #1 }
      }
  }

\edef\x{\text_expand:n { \bgroup é \egroup }}\show\x
\stop

But it would fail miserably if \bgroup/\egroup are unbalanced (for example if it sees something like \@ifnextchar\bgroup{}{}).

@Skillmon
Copy link
Contributor Author

Why is \bgroup/\egroup processed at all?

@PhelypeOleinik
Copy link
Member

PhelypeOleinik commented Apr 24, 2021

I guess it depends a bit on the scope of “what should \text_expand:n do”. I see it as expanding the argument to text, so maybe \c_left_brace_str and \c_right_brace_str would be appropriate

@josephwright
Copy link
Member

Why is \bgroup/\egroup processed at all?

Equivalent to {/} and therefore forms a group - we are expandable, so we can't tell the difference.

@PhelypeOleinik
Copy link
Member

we are expandable, so we can't tell the difference.

We can actually. \tl_if_head_is_group:nTF is used earlier. At this point we know it is an N-type token, so definitely \bgroup/\egroup

@Skillmon
Copy link
Contributor Author

@josephwright not true, we already know the difference, since by the time we handle \bgroup/\egroup we already know the token is a valid N-type argument, so must be an implicit and not an explicit brace.

@Skillmon
Copy link
Contributor Author

Imho, \bgroup/\egroup (or equivalent) shouldn't be handled at all and passed through as they are, just like \group_begin: and \group_end: would.

@Skillmon
Copy link
Contributor Author

@PhelypeOleinik but the argument is (per documentation) formatted text, so could contain markup (and it might be possible that some markup is mistakenly not defined as \protected and uses some boxing with \bgroup/\egroup). So \c_left_brace_str/\c_right_brace_str are definitely wrong here.

@blefloch
Copy link
Member

Related issue. The blanket statement "Implicit tokens are converted to their explicit equivalent." in the documentation of \text_purify:n seems to motivate the definition of \@@_token_to_explicit:N, which does two things I disagree with.

  • It turns for instance \chardef and \mathchardef tokens into \char_generate:nn {#1} { \char_value_catcode:n {#1} } (is that reasonable for \mathchardef??). This is fine for catcode 10, 11, 12 (space/letter/other) but not any other. For instance after \chardef\myhat=`^ the command \myhat typesets a hat, but it would get turned to a superscript token by this code. Same problem for catcode 1,2 etc.

  • It turns implicit character tokens such as \c_group_begin_token into explicit ones, which is fine for most catcodes (at least 3,4,7,8,10,11,12), but is strange for catcodes 1,2.

I think this code might historically be why \text_expand:n has the tests in \@@_expand_N_type_auxi:N, but now I don't see why the \c_@@_chardef_group_begin_token etc are filtered out. What happens if we do the following?

\cs_set_eq:NN \__text_expand_N_type_auxi:N \__text_expand_N_type_auxiii:N

@car222222
Copy link
Contributor

Note that the rationale for, use of and inputs to these 'text altering commands' and related concepts needs revisiting. So there is not much point in discussing right now most of the details of 'what should happen to xyz'.

But meanwhile, fixes should be added for gross infelicities such as the origin of this issue, and maybe a small number of the other contributions.

@blefloch blefloch added bug Something isn't working expl3 labels Apr 29, 2021
@blefloch blefloch assigned blefloch and unassigned blefloch May 12, 2021
@josephwright josephwright self-assigned this May 17, 2023
@josephwright
Copy link
Member

@blefloch wrote

  • It turns for instance \chardef and \mathchardef tokens into \char_generate:nn {#1} { \char_value_catcode:n {#1} } (is that reasonable for \mathchardef??). This is fine for catcode 10, 11, 12 (space/letter/other) but not any other. For instance after \chardef\myhat=`^ the command \myhat typesets a hat, but it would get turned to a superscript token by this code. Same problem for catcode 1,2 etc.

Indeed, this doesn't make sense. The 11/2 distinction doesn't make sense here, so I am adjusting to 'if it's a space, make a space, otherwise make an other char'. I think that should be clear enough.

@josephwright
Copy link
Member

@blefloch asked

  • (is that reasonable for \mathchardef??)

Good question! I suspect possibly not but we will need quite a bit of data, etc., to map characters otherwise. I think one to address as and when we find issues: at least at present we get 'something' out.

@josephwright
Copy link
Member

OK, reading back the code comments I remember now why this code is there. If you look at the current comments, the aim is to deal with not only \bgroup/\egroup but also implicit spaces. The reason was that when we get 'further along' to case changing, we need to know about spaces and brace groups to decide where word boundaries are: things like 'is this a final sigma' need to know if the next token is a letter, a space or something else. Because we are dealign with TeX, we can't just assume everything is 'text', so there are some compromises.

Looking back, it's probably better to leave all implicit chars alone in \text_expand:n, and if they crop in some 'text processing' context, say 'tough'. (We do say that the argument here should be 'broadly text'.) We still need to deal with them in \text_purify:n, as that has to remove all non-Unicode text and so drop stuff if there is no alternative. But that's a different question.

At least a couple of commits coming up.

@davidcarlisle
Copy link
Member

Looking back, it's probably better to leave all implicit chars alone in \text_expand:n, and if they crop in some 'text processing' context, say 'tough'. (We do say that the argument here should be 'broadly text'.) We still need to deal with them in \text_purify:n

leaving as-is in \text_expand:n and making catcode 12 or 10 in \text_purify:n sounds good to me

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

6 participants