[l3text] Case change for \i breaks with hyperref #671

moewew · 2020-02-10T07:13:18Z

When compiled with pdfLaTeX (with LaTeX2e <2020-02-02> patch level 1, L3 programming layer <2020-02-08>) the following MWE fails for me

\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{expl3}[2020/01/12]

\ExplSyntaxOn
\def\test{\text_lowercase:n}
\ExplSyntaxOff

\usepackage{hyperref}

\begin{document}
\test{\i}
\end{document}

with

! Undefined control sequence.
\GenericError  ...                                
                                                    #4  \errhelp \@err@     ...
l.13 \test{\i}
              
? X

Full log file dotlessihyperref.log

The problem goes away if hyperref is dropped.

josephwright · 2020-02-10T08:38:07Z

Ah, I forgot to cover the hyperref encodings: fix coming up.

u-fischer · 2020-02-10T08:42:13Z

@josephwright it isn't only hyperref. This here fails too:

\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{expl3}[2020/01/12]

\ExplSyntaxOn
\def\test{\text_lowercase:n}
\ExplSyntaxOff

\DeclareTextSymbol{\i}{OT1}{25}
\begin{document}

\test{\i}

\end{document}

josephwright · 2020-02-10T08:43:32Z

@u-fischer Well if you use OT1 you deserve what you get! More seriously, I'm not sure whether that should be fixed or not: the expl3 code tends to work on the basis of T1 with 8-bit engines, TU otherwise.

u-fischer · 2020-02-10T08:46:18Z

@josephwright OT1 was only an example. It fails as soon as one tries to add a definition for \i in some other encoding. (I started with PD1 from hyperref). E.g.

\documentclass{article}
\usepackage[LGR,T1]{fontenc}
\usepackage{expl3}[2020/01/12]

\ExplSyntaxOn
\def\test{\text_lowercase:n}
\ExplSyntaxOff

\DeclareTextSymbol{\i}{LGR}{25}
\begin{document}
%\tracingmacros=1
\test{\i}

\end{document}

josephwright · 2020-02-10T08:50:07Z

@u-fischer Yes, but the general point stands: encodings other than T1 and TU, plus the hyperref ones, are really not supported, certainly by the case changer. We can add more encodings to the 'known' list, but they do have to be pre-defined: there's no way to pick up \<name>-cmd other than knowing it in advance.

If we want to go for 'all the obvious encodings', we can, but then we have to start to worry about putting in the case changing data too (LGR for example would be completely wrong at present).
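
For context on `\<name>-cmd`: a minimal sketch, assuming a 2020-era LaTeX2e kernel in which \DeclareTextSymbol rewrites the top-level definition of the command so that it begins with the `-cmd` macro of the declaring encoding. It writes \meaning\i to the log before and after one extra declaration, which illustrates why a fixed list of known `\<name>-cmd` names can miss an encoding declared later. The slot number 25 is copied from the example above and is not meaningful for OT1.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}

% Log the top-level definition of \i with only T1 loaded on top of the kernel.
\typeout{Before: \meaning\i}

% Declare \i for one more encoding, as in the failing example.
\DeclareTextSymbol{\i}{OT1}{25}

% Log it again: the leading <enc>-cmd token is what the case changer sees.
\typeout{After: \meaning\i}

\begin{document}
\i
\end{document}
```

If the assumption holds, the first log line should mention \T1-cmd and the second \OT1-cmd.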

FrankMittelbach · 2020-02-10T08:52:02Z

On 10.02.20 at 09:43, Joseph Wright wrote:

if you want it to become a replacement for current \MakeUppercase then it has to, sorry

josephwright · 2020-02-10T08:56:06Z

@FrankMittelbach Fair enough. Two parts to the issue:

* Making sure the expansion code is safe: easy as it just needs to know `\<name>-cmd`
* Making the case changing itself work: more tricky as there needs to be encoding-specific code for the mappings

car222222 · 2020-02-10T08:59:17Z

if you want it to become a replacement for current \MakeUppercase then it has to, sorry

Sorry, but we do not want this to happen so this is not needed.
Whilst it could be that one day a non-OT1 version of \MakeUppercase will use this fairly directly, the current functionality thereof will need to be provided by use of a list of LICRs, or whatever.

Note that I am making no comment here on what the l3 text processing module should provide, or how it should do so, but I am pointing out that these are distinct questions.

But it does not need to provide low-level replacements for anything.

FrankMittelbach · 2020-02-10T09:03:29Z

On 10.02.20 at 09:50, Joseph Wright wrote:

Shouldn't you be able to detect if something is an encoding-specific command in a general way? I really don't think it would fly if \MakeUppercase was able to handle Cyrillic, LGR, OT1 ... and that would stop working. Maybe it is enough to restrict to "known established encodings", but in theory anything that comes along via \DeclareFontEncoding should be treated (somehow).

FrankMittelbach · 2020-02-10T09:05:08Z

if you want it to become a replacement for current \MakeUppercase then it has to, sorry

Sorry, but we do not want this to happen. It could be that one day \

????

car222222 · 2020-02-10T09:07:43Z

if you want it to become a replacement for current \MakeUppercase then it has to, sorry
Sorry, but we do not want this to happen so this is not needed.
Whilst it could be that one day a non-OT1 version of \MakeUppercase will use this fairly directly, the current functionality thereof will need to be provided by use of a list of LICRs, or whatever.

Note that I am making no comment here on what the l3 text processing module should provide, or how it should do so, but I am pointing out that these are distinct questions.

But it does not need to provide low-level replacements for anything.

car222222 · 2020-02-10T09:10:28Z

What are these bizarre beasts, and why are they supported: the hyperref ones ??

FrankMittelbach · 2020-02-10T09:12:22Z

Two parts to the issue

* Making sure the expansion code is safe: easy as it just needs to know `\<name>-cmd`

which we get from \DeclareFontEncoding (I guess)

* Making the case changing itself work: more tricky as there needs to be encoding-specific
  code for the mappings

conceptually (though not very efficient) I see this sequence

LICR (in some encoding X) -> "LICR in TU" -> do casing -> "new LICR in TU" -> "new LICR in X"

with probably a lot of headache but ...

josephwright · 2020-02-10T10:32:43Z

On 10/02/2020 09:12, Frank Mittelbach wrote:

> Two parts to the issue
>
> * Making sure the expansion code is safe: easy as it just needs to know `\<name>-cmd`
>
> which we get from \DeclareFontEncoding (I guess)

Currently I've just got a block of hard-coded statements

\cs_new:Npn \@@_expand_textcomp:NN #1#2 { \exp_not:n {#1} }
\text_declare_expand_equivalent:cn { ?-cmd } { \@@_expand_textcomp:NN }
\text_declare_expand_equivalent:cn { T1-cmd } { \@@_expand_textcomp:NN }
\text_declare_expand_equivalent:cn { TS1-cmd } { \@@_expand_textcomp:NN }
\text_declare_expand_equivalent:cn { TU-cmd } { \@@_expand_textcomp:NN }

We could generate additional entries automatically, though presumably this would be \AtBeginDocument so would be problematic in the preamble. Of course, \MakeUppercase is not expandable; it's unlikely that people are doing \MakeUppercase{\title{fo\aa{}}} in the preamble with encoding-specific commands in the text. The alternative would be to look at each cs token and look for \<thing>-cmd, then check <thing> at run-time. That's slightly more painful but is doable.

> * Making the case changing itself work: more tricky as there needs to be encoding-specific
>   code for the mappings
>
> conceptually (though not very efficient) I see this sequence
>
> LICR (in some encoding X) -> "LICR in TU" -> do casing -> "new LICR in TU" -> "new LICR in X"
>
> with probably a lot of headache but ...

My concern isn't really this, it's simple characters. For the former, presumably we can parse \@uclclist and ensure all the mappings are set up. The expl3 code starts from the assumption that we can work on UTF-8 for characters, so A-Za-z are themselves and the upper half of the 8-bit range is all \active. Now, it's possible that in LGR or whatever the changes to \lccode/\uccode mean things still 'appear' to work, but I'm not certain.

My other concern is that if we are case-changing *text*, font encoding should be irrelevant as we don't know that at the point we do the case changing (expansion vs typesetting).

Basically, I was imagining that we'd move toward using expl3 for \MakeUppercase *but* first we'd need testing.

Joseph

car222222 · 2020-02-10T11:02:47Z

My other concern is that if we are case-changing text, font encoding
should be irrelevant as we don't know that at the point we do the case
changing (expansion vs typesetting).

An important point: the text may finally get typeset more than once, in different fonts with maybe different encodings.

So best to think of this text module as dealing with pure text (Unicode streams encoded as utf-8). Very different from any text model used in 2e.

The latter model needs to be supported by the commands used with 2e text, but this support may not be appropriate for inclusion in this l3 model.

Different needs of different models. But the main Take Home is that precise and explicit definitions of such things as models, alphabets and syntax are important in software engineering.

josephwright · 2020-02-10T15:32:49Z

Suggested improved approach:

\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{expl3}[2020/01/12]
\makeatletter
%\cdp@elt {OML}{cmm}{m}{it}
\ExplSyntaxOn
\cs_set:Npn \__text_expand_cs:N #1
  {
    \str_if_eq:nnTF {#1} { \protect }
      { \__text_expand_protect:N }
      { \__text_expand_encoding:N #1 }
  }
\cs_new:Npn \__text_expand_encoding:N #1
  {
    \exp_after:wN \__text_expand_encoding:Nnnnn \exp_after:wN #1
      \cdp@elt { \q_recursion_tail } { } { } { } \q_recursion_stop
  }
\cs_new:Npn \__text_expand_encoding:Nnnnn #1#2#3#4#5
  {
    \quark_if_recursion_tail_stop_do:nn {#2} { \__text_expand_replace:N #1 }
    \str_if_eq:eeTF { \exp_not:N #1 } { \exp_not:c { #2 - cmd } }
      { \__text_expand_loop:w \__text_expand_textcomp:NN #1 }
      { \__text_expand_encoding:Nnnnn #1 }
  }
\AtBeginDocument
  {
    \cs_set:Npn \__text_expand_cs:N #1
      {
        \str_if_eq:nnTF {#1} { \protect }
          { \__text_expand_protect:N }
          { \__text_expand_replace:N #1 }
      }
    \cs_set_protected:Npn \cdp@elt #1#2#3#4
      {
        \text_declare_expand_equivalent:cn { #1 -cmd }
          { \__text_expand_textcomp:NN }
      }
    \cdp@list
  }
\ExplSyntaxOff
\makeatother

\ExplSyntaxOn
\def\test{\text_lowercase:n}
\ExplSyntaxOff

\usepackage{hyperref}

\begin{document}
\test{\i}
\end{document}

This checks dynamically for encodings in the preamble, and finalises the list for the document body. Thus load order should not be an issue: all available encodings are declared.

josephwright · 2020-02-10T16:15:32Z

I'll probably just add the above ...

u-fischer · 2020-02-10T16:38:59Z

Why is there \__text_expand_cs:N twice with slightly different definition?

josephwright · 2020-02-10T16:42:11Z

@u-fischer in the preamble, we have a dynamic \cdp@list which we therefore have to parse. Once we get to the start of the document, we can 'finalise' the list, which in any case is \@onlypreamble so vanishes if we don't capture it. I think I'll actually go for a sequence-based approach.

u-fischer · 2020-02-10T16:52:43Z

@josephwright I'm not sure if I understand the preamble reference. Are you processing \cdp@list in more places? When? It seems not to be when an encoding is defined, at least I can't do \usepackage{hyperref}\edef\temp{\test{\i}} in the preamble.

josephwright · 2020-02-10T17:19:54Z

@u-fischer There are a few gremlins in the above! Try

\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{expl3}[2020/01/12]
\makeatletter
%\cdp@elt {OML}{cmm}{m}{it}
\ExplSyntaxOn
\cs_set:Npx \__text_expand_cs:N #1
  {
    \exp_not:N \str_if_eq:nnTF {#1} { \exp_not:N \protect }
      { \exp_not:N \__text_expand_protect:N }
      {
        \cs_if_exist:NTF \cdp@list
          { \exp_not:N \__text_expand_encoding:N #1 }
          { \exp_not:N \__text_expand_expand:N #1 }
      }
  }
\cs_if_exist:NT \cdp@list
  {
    \cs_new:Npn \__text_expand_encoding:N #1
      {
        \exp_after:wN \__text_expand_encoding:NNnnnn \exp_after:wN #1
          \cdp@list \q_recursion_tail { } { } { } { } \q_recursion_stop
      }
    \cs_new:Npn \__text_expand_encoding:NNnnnn #1#2#3#4#5#6
      {
        \quark_if_recursion_tail_stop_do:Nn #2 { \__text_expand_replace:N #1 }
        \str_if_eq:eeTF { \exp_not:N #1 } { \exp_not:c { #3 - cmd } }
          {
            \use_i_delimit_by_q_recursion_stop:nw
              { \__text_expand_loop:w \__text_expand_textcomp:NN #1 }
          }
          { \__text_expand_encoding:NNnnnn #1 }
      }
    \AtBeginDocument
      {
        \cs_set:Npn \__text_expand_cs:N #1
          {
            \str_if_eq:nnTF {#1} { \protect }
              { \__text_expand_protect:N }
              { \__text_expand_replace:N #1 }
          }
        \cs_set_protected:Npn \cdp@elt #1#2#3#4
          {
            \text_declare_expand_equivalent:cn { #1 -cmd }
              { \__text_expand_textcomp:NN }
          }
        \cdp@list
      }
  }
\ExplSyntaxOff
\makeatother

\ExplSyntaxOn
\def\test{\text_lowercase:n}
\ExplSyntaxOff

\usepackage{hyperref}

\edef\temp{\test{\i}}\show\temp

\begin{document}
\test{\i}
\end{document}

which then does work in the document preamble and in the body.

josephwright · 2020-02-10T17:21:54Z

@u-fischer Yes, the idea is to process \cdp@list in two places. In the preamble, read it during case changing to pick up the currently-valid encodings and allow for them. Once we get to the start of the document, save that information as a series of control sequences, then alter the lookup as \cdp@list will no longer be available.
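
A minimal sketch of the preamble half of this idea, assuming \cdp@list holds one \cdp@elt{<encoding>}{<family>}{<series>}{<shape>} entry per declared encoding (the four-argument shape matches the \cdp@elt redefinition in the code above). It only logs the encoding names and the corresponding `-cmd` names; it is an illustration, not the l3text code.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{hyperref} % declares further encodings, e.g. PD1 as mentioned above

\makeatletter
\begingroup
  % Give \cdp@elt a temporary meaning, then run the list once.
  % #1 is the encoding name; #2-#4 (default family/series/shape) are unused.
  \def\cdp@elt#1#2#3#4{%
    \typeout{Declared encoding: #1, switch macro name: #1-cmd}}%
  \cdp@list
\endgroup
\makeatother

\begin{document}
Text
\end{document}
```

This has to run in the preamble, since, as noted above, \cdp@list is no longer available once the document starts.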

blefloch · 2020-02-10T17:34:21Z

Stupid question perhaps but in my very minimal tests it seems \T1-cmd, \OML-cmd etc only ever have two definitions. Could we simply test with \if_meaning:w?

josephwright · 2020-02-10T17:36:58Z

@blefloch Huh? I'm not sure what you mean

FrankMittelbach · 2020-02-10T17:59:07Z

@josephwright the definition is either "stay with the current encoding and use what follows" or "change to a different encoding and use what follows to determine what to do". I think @blefloch is right: it is basically two different meanings only. It is either \@current@cmd or \@changed@cmd.

josephwright · 2020-02-10T18:03:37Z

@FrankMittelbach, @blefloch Ah, right: so what you are getting at is that, rather than check for \<enc>-cmd directly, we do an \ifx to see whether the token is one of \@current@cmd or \@changed@cmd, and if so branch for that?

FrankMittelbach · 2020-02-10T18:23:11Z

I only answered your "Huh" :-) but the point is that, regardless of how many encodings are loaded, to figure out if something is an encoding-specific command all you need to do is to check against those two definitions. Whether that is enough later on, I don't know, and I haven't checked what your code does in detail.
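
The meaning-based test can be sketched as follows. This is a hedged illustration, not the code that was checked in: \demo_if_encoding_cmd:N and \DemoIsEncodingCmd are hypothetical names, and the sketch assumes that every `\<enc>-cmd` is \let to either \@current@cmd or \@changed@cmd, as described above.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{expl3}

\makeatletter
\ExplSyntaxOn
% Hypothetical helper, not part of l3text: true when #1 currently has the
% same meaning as one of the two kernel macros named in the discussion.
\prg_new_conditional:Npnn \demo_if_encoding_cmd:N #1 { p , T , F , TF }
  {
    \bool_lazy_or:nnTF
      { \cs_if_eq_p:NN #1 \@current@cmd }
      { \cs_if_eq_p:NN #1 \@changed@cmd }
      { \prg_return_true: }
      { \prg_return_false: }
  }
% Document-level wrapper: builds \<enc>-cmd from the encoding name
% (spaces are ignored under \ExplSyntaxOn, so this is "#1-cmd").
\newcommand \DemoIsEncodingCmd [1]
  { \exp_args:Nc \demo_if_encoding_cmd:NTF { #1 -cmd } { yes } { no } }
\ExplSyntaxOff
\makeatother

\begin{document}
T1: \DemoIsEncodingCmd{T1};
OT1: \DemoIsEncodingCmd{OT1};
XYZ: \DemoIsEncodingCmd{XYZ}
\end{document}
```

Under that assumption the test should answer yes for any declared encoding, whether or not it is the current one, and no for a name such as XYZ that was never declared, without needing a list of encodings at all.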

josephwright · 2020-02-10T18:41:24Z

@FrankMittelbach OK, I've checked something in that should do the job

moewew · 2020-02-10T20:41:47Z

Thank you very much for the quick fix.

I doubt I can manage to test the version from the master branch in reasonable time (does it involve rebuilding the formats?), but I will report back when the next release is out.

We are hoping to switch biblatex to the expl3 case changing functions, so I will be doing some more testing soon. Of course that means that we would be extremely happy if the intended scope of l3text would be as wide as possible.

moewew · 2020-02-17T16:59:04Z

Sorry for not getting back to you earlier. The MWE works fine after the update (but you knew that already), and the larger document where I originally noticed this issue also compiles fine.

I understand that in the following example (when compiled with pdfLaTeX) the UTF8 letters are not lowercased because they are not in T1, but it feels a bit odd that the macro versions are lowercased.

\documentclass{article}
\usepackage[T1,T2A]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{expl3}

\ExplSyntaxOn
\def\test{\text_lowercase:n}
\ExplSyntaxOff

\begin{document}
\test{\.I İ \CYRI И}
\end{document}

Obviously I would love to see the range of supported characters with pdfLaTeX extend beyond T1, but I accept that you have to draw the line somewhere.

josephwright · 2020-02-17T17:10:52Z

@moewew Nice example. Looking at the trace for \MakeLowercase, I think this can be covered with more effort. What I'll need to do is check for the \u:... definitions to get closer to \protected@edef, and to take out \IeC. That yields the same thing as the macro version, which is how this then works.

josephwright self-assigned this Feb 10, 2020

josephwright added the bug and expl3 labels Feb 10, 2020

josephwright added a commit that referenced this issue Feb 10, 2020

Allow for full range of encodings when expanding text (see #671)

b99866a

josephwright added a commit that referenced this issue Feb 10, 2020

More efficient handling of encodings (see #671)

3c84ec0

josephwright closed this as completed Feb 17, 2020

josephwright mentioned this issue Feb 17, 2020

Case changing for Cyrillic #675

Closed
