Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[l3text] Case change for \i breaks with hyperref #671

Closed
moewew opened this issue Feb 10, 2020 · 31 comments
Closed

[l3text] Case change for \i breaks with hyperref #671

moewew opened this issue Feb 10, 2020 · 31 comments
Assignees
Labels
bug Something isn't working

Comments

@moewew
Copy link
Contributor

moewew commented Feb 10, 2020

When compiled with pdfLaTeX (with LaTeX2e <2020-02-02> patch level 1, L3 programming layer <2020-02-08>) the following MWE fails for me

\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{expl3}[2020/01/12]

\ExplSyntaxOn
\def\test{\text_lowercase:n}
\ExplSyntaxOff

\usepackage{hyperref}

\begin{document}
\test{\i}
\end{document}

with

! Undefined control sequence.
\GenericError  ...                                
                                                    #4  \errhelp \@err@     ...
l.13 \test{\i}
              
? X

Full log file dotlessihyperref.log

The problem goes away if hyperref is dropped.

@josephwright josephwright self-assigned this Feb 10, 2020
@josephwright josephwright added bug Something isn't working expl3 labels Feb 10, 2020
@josephwright
Copy link
Member

Ah, I forgot to cover the hyperref encodings: fix coming up.

@u-fischer
Copy link
Member

@josephwright it isn't only hyperref. This here fails too:

\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{expl3}[2020/01/12]

\ExplSyntaxOn
\def\test{\text_lowercase:n}
\ExplSyntaxOff

\DeclareTextSymbol{\i}{OT1}{25}
\begin{document}

\test{\i}

\end{document}

@josephwright
Copy link
Member

@u-fischer Well if you use OT1 you deserve what you get! More seriously, I'm not sure whether that should be fixed or not: the expl3 code tends to work on the basis of T1 with 8-bit engines, TU otherwise.

@u-fischer
Copy link
Member

@josephwright OT1 was only an example. It fails as soon as one try to add a definition for \i in some other encoding. (I started with PD1 from hyperref). E.g.

\documentclass{article}
\usepackage[LGR,T1]{fontenc}
\usepackage{expl3}[2020/01/12]

\ExplSyntaxOn
\def\test{\text_lowercase:n}
\ExplSyntaxOff

\DeclareTextSymbol{\i}{LGR}{25}
\begin{document}
%\tracingmacros=1
\test{\i}

\end{document}

@josephwright
Copy link
Member

@u-fischer Yes, but the general point stands: encodings other than T1 and TU, plus the hyperref ones, are really not supported, certainly by the case changer. We can add more encodings to the 'known' list, but they do have to be pre-defined: there's no way to pick up \<name>-cmd other than knowing it in advance.

If we want to go for 'all the obvious encodings', we can, but then we have to start to worry about putting in the case changing data too (LGR for example would be completely wrong at present).

@FrankMittelbach
Copy link
Member

FrankMittelbach commented Feb 10, 2020 via email

@josephwright
Copy link
Member

@FrankMittelbach Fair enough. Two parts to the issue

  • Making sure the expansion code is safe: easy as it just needs to know \<name>-cmd
  • Making the case changing itself work: more tricky as there needs to be encoding-specific
    code for the mappings

@car222222
Copy link
Contributor

car222222 commented Feb 10, 2020

if you want it to become a replacement for current \MakeUppercase then it has to, sorry

Sorry, but we do not want this to happen so this is not needed.
Whilst It could be that one day a non-OT1 version of \MakeUpperCase will use this fairly directly, the current functionality thereof will need to be provided by use of a list of LICRs, or whatever.

Note that I am making no comment here on what the l3 text processing module should provide, or how it should do so, but I am pointing out that these are distinct questions.

But It does not need to provide low-level replacements for anything.

@FrankMittelbach
Copy link
Member

FrankMittelbach commented Feb 10, 2020 via email

@FrankMittelbach
Copy link
Member

if you want it to become a replacement for current \MakeUppercase then it has to, sorry

Sorry, but we do not want this to happen. It could be that one day \

????

@car222222
Copy link
Contributor

car222222 commented Feb 10, 2020

if you want it to become a replacement for current \MakeUppercase then it has to, sorry
Sorry, but we do not want this to happen so this is not needed.
Whilst It could be that one day a non-OT1 version of \MakeUpperCase will use this fairly directly, the current functionality thereof will need to be provided by use of a list of LICRs, or whatever.

Note that I am making no comment here on what the l3 text processing module should provide, or how it should do so, but I am pointing out that these are distinct questions.

But It does not need to provide low-level replacements for anything.

@car222222
Copy link
Contributor

What are these bizarre beasts, and why are they supported: the hyperref ones ??

@FrankMittelbach
Copy link
Member

Two parts to the issue

* Making sure the expansion code is safe: easy as it just needs to know `\<name>-cmd`

which we get from \DeclareFontEncoding (I guess)

* Making the case changing itself work: more tricky as there needs to be encoding-specific
  code for the mappings

conceptually (though not very efficient) I see this sequence

LICR (in some encoding X) -> "LICR in TU" -> do casing -> "new LICR in TU" -> "new LICR in X"

with probably a lot of headache but ...

@josephwright
Copy link
Member

josephwright commented Feb 10, 2020 via email

@car222222
Copy link
Contributor

My other concern is that if we are case-changing text, font encoding
should be irrelevant as we don't know that at the point we do the case
changing (expansion vs typesetting).

An important point: the text may finally get typeset more than once, in different fonts worth maybe different encodings.

So best to think of this text module as dealing with pure text (Unicode streams encoded as utf-8). Very different from any text model used in 2e.

The latter model needs to be supported by the commands used with 2e text, but this support may not be appropriate for inclusion in this l3 model.

Different needs of different models. But the main Take Home is that precise and explicit definitions of such things as models, alphabets and syntax are important in software engineering.

@josephwright
Copy link
Member

Suggested improved approach:

\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{expl3}[2020/01/12]
\makeatletter
%\cdp@elt {OML}{cmm}{m}{it}
\ExplSyntaxOn
\cs_set:Npn \__text_expand_cs:N #1
  {
    \str_if_eq:nnTF {#1} { \protect }
      { \__text_expand_protect:N }
      { \__text_expand_encoding:N #1 }
  }
\cs_new:Npn \__text_expand_encoding:N #1
  {
    \exp_after:wN \__text_expand_encoding:Nnnnn \exp_after:wN #1
      \cdp@elt { \q_recursion_tail } { } { } { } \q_recursion_stop
  }
\cs_new:Npn \__text_expand_encoding:Nnnnn #1#2#3#4#5
  {
    \quark_if_recursion_tail_stop_do:nn {#2} { \__text_expand_replace:N #1 }
    \str_if_eq:eeTF { \exp_not:N #1 } { \exp_not:c { #2 - cmd } }
      { \__text_expand_loop:w \__text_expand_textcomp:NN #1 }
      { \__text_expand_encoding:Nnnnn #1 }
  }
\AtBeginDocument
  {
    \cs_set:Npn \__text_expand_cs:N #1
      {
        \str_if_eq:nnTF {#1} { \protect }
          { \__text_expand_protect:N }
          { \__text_expand_replace:N #1 }
      }
    \cs_set_protected:Npn \cdp@elt #1#2#3#4
      {
        \text_declare_expand_equivalent:cn { #1 -cmd }
          { \__text_expand_textcomp:NN }
      }
    \cdp@list
  }
\ExplSyntaxOff
\makeatother

\ExplSyntaxOn
\def\test{\text_lowercase:n}
\ExplSyntaxOff

\usepackage{hyperref}

\begin{document}
\test{\i}
\end{document}

This checks dynamically for encoding in the preamble, and finalised the list for the document body. Thus load order should not be an issue: all available encodings are declared.

@josephwright
Copy link
Member

I'll probably just add the above ...

@u-fischer
Copy link
Member

Why is there \__text_expand_cs:N twice with slightly different definition?

@josephwright
Copy link
Member

@u-fischer in the preamble, we have a dynamic \cdp@list which we therefore have to parse. Once we get to the start of the document, we can 'finalise' the list, which in any case is \@onlypreamble so vanishes if we don't capture it. I think I'll actually go for a sequence-based approach.

@u-fischer
Copy link
Member

@josephwright I'm not sure if I understand the preamble reference. Are you processing \cdp@list in more places? When? It seems not to be when an encoding is defined, at least I can't do \usepackage{hyperref}\edef\temp{\test{\i}} in the preamble.

@josephwright
Copy link
Member

@u-fischer There are a few gremlins in the above! Try

\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{expl3}[2020/01/12]
\makeatletter
%\cdp@elt {OML}{cmm}{m}{it}
\ExplSyntaxOn
\cs_set:Npx \__text_expand_cs:N #1
  {
    \exp_not:N \str_if_eq:nnTF {#1} { \exp_not:N \protect }
      { \exp_not:N \__text_expand_protect:N }
      {
        \cs_if_exist:NTF \cdp@list
          { \exp_not:N \__text_expand_encoding:N #1 }
          { \exp_not:N \__text_expand_expand:N #1 }
      }
  }
\cs_if_exist:NT \cdp@list
  {
    \cs_new:Npn \__text_expand_encoding:N #1
      {
        \exp_after:wN \__text_expand_encoding:NNnnnn \exp_after:wN #1
          \cdp@list \q_recursion_tail { } { } { } { } \q_recursion_stop
      }
    \cs_new:Npn \__text_expand_encoding:NNnnnn #1#2#3#4#5#6
      {
        \quark_if_recursion_tail_stop_do:Nn #2 { \__text_expand_replace:N #1 }
        \str_if_eq:eeTF { \exp_not:N #1 } { \exp_not:c { #3 - cmd } }
          {
            \use_i_delimit_by_q_recursion_stop:nw
              { \__text_expand_loop:w \__text_expand_textcomp:NN #1 }
          }
          { \__text_expand_encoding:NNnnnn #1 }
      }
    \AtBeginDocument
      {
        \cs_set:Npn \__text_expand_cs:N #1
          {
            \str_if_eq:nnTF {#1} { \protect }
              { \__text_expand_protect:N }
              { \__text_expand_replace:N #1 }
          }
        \cs_set_protected:Npn \cdp@elt #1#2#3#4
          {
            \text_declare_expand_equivalent:cn { #1 -cmd }
              { \__text_expand_textcomp:NN }
          }
        \cdp@list
      }
  }
\ExplSyntaxOff
\makeatother

\ExplSyntaxOn
\def\test{\text_lowercase:n}
\ExplSyntaxOff

\usepackage{hyperref}

\edef\temp{\test{\i}}\show\temp

\begin{document}
\test{\i}
\end{document}

which then does work in the document preamble and in the body.

@josephwright
Copy link
Member

@u-fischer Yes, the idea is to process \cdp@list in two places. In the preamble, read it during case changing to pick up the currently-valid encodings and allow for them. One we get to the start of the document, save that information as a series of control sequences, then alter the lookup as \cdp@list will no longer be available.

@blefloch
Copy link
Member

blefloch commented Feb 10, 2020 via email

@josephwright
Copy link
Member

@blefloch Huh? I'm not sure what you mean

@FrankMittelbach
Copy link
Member

@josephwright the definition is either "stay with the current encoding and use what follows" or change t oa different encoding and use what follows to determine what to do. I thin @blefloch is right it is basically 2 different meanings only only. it is either \@current@cmd or \@changed@cmd

@josephwright
Copy link
Member

@FrankMittelbach, @blefloch Ah, right: so what you are getting at is rather than check for \<enc>-cmd directly, do an \ifx to see the token is one of \@current@cmd or \@changed@cmd, and if so branch for that?

@FrankMittelbach
Copy link
Member

I only answered your "Huh" :-) but the point is regardless of how many encodings are loaded to figure out if something is an encoding-specific command all you need to do is to check against those two definitions. If that is enough later on, I don't know and I haven't checked what you code does in detail.

@josephwright
Copy link
Member

@FrankMittelbach OK, I've checked something in that should do the job

@moewew
Copy link
Contributor Author

moewew commented Feb 10, 2020

Thank you very much for the quick fix.

I doubt I can manage to test the version from the master branch in reasonable time (does it involve rebuilding the formats?), but I will report back when the next release is out.

We are hoping to switch biblatex to the expl3 case changing functions, so I will be doing some more testing soon. Of course that means that we would be extremely happy if the intended scope of l3text would be as wide as possible.

@moewew
Copy link
Contributor Author

moewew commented Feb 17, 2020

Sorry for not getting back to you earlier. The MWE works fine after the update (but you knew that already), the larger document, where I noticed this issue originally also compiles fine.

I understand that in the following example (when compiled with pdfLaTeX) the UTF8 letters are not lowercased because they are not in T1, but it feels a bit odd that the macro versions are lowercased.

\documentclass{article}
\usepackage[T1,T2A]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{expl3}

\ExplSyntaxOn
\def\test{\text_lowercase:n}
\ExplSyntaxOff

\begin{document}
\test{\.I İ \CYRI И}
\end{document}

Obviously I would love to see the range of supported characters with pdfLaTeX extend beyond T1, but I accept that you have to draw the line somewhere.

@josephwright
Copy link
Member

@moewew Nice example. Looking at the trace for \MakeLowercase, I think this can be covered with more effort. What I'll need to do is check for the \u:... definitions to get closer to \protected@edef, and to take out \IeC. That yields the same thing as the macro version, which is how this then works.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

6 participants