[l3text] Case change for \i breaks with hyperref #671
Comments
Ah, I forgot to cover the |
@josephwright it isn't only hyperref. This here fails too:
|
@u-fischer Well if you use OT1 you deserve what you get! More seriously, I'm not sure whether that should be fixed or not: the `expl3` code tends to work on the basis of `T1` with 8-bit engines, `TU` otherwise. |
@josephwright OT1 was only an example. It fails as soon as one tries to add a definition for \i in some other encoding. (I started with PD1 from hyperref). E.g.
|
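@u-fischer Yes, but the general point stands: encodings other than `T1` and `TU`, plus the `hyperref` ones, are really not supported, certainly by the case changer. We can add more encodings to the 'known' list, but they do have to be pre-defined: there's no way to pick up `\<name>-cmd` other than knowing it in advance. If we want to go for 'all the obvious encodings', we can, but then we have to start to worry about putting in the case changing data too ( |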
On 10.02.20 at 09:43, Joseph Wright wrote:
> @u-fischer <https://github.com/u-fischer> Well if you use OT1 you
> deserve what you get! More seriously, I'm not sure whether that should
> be fixed or not: the |expl3| code tends to work on the basis of |T1|
> with 8-bit engines, |TU| otherwise.
if you want it to become a replacement for current \MakeUppercase then
it has to, sorry
|
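@FrankMittelbach Fair enough. Two parts to the issue
* Making sure the expansion code is safe: easy as it just needs to know `\<name>-cmd`
* Making the case changing itself work: more tricky as there needs to be encoding-specific code for the mappings |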
|
Sorry, but we do not want this to happen so this is not needed. Note that I am making no comment here on what the l3 text processing module should provide, or how it should do so, but I am pointing out that these are distinct questions. But it does not need to provide low-level replacements for anything. |
On 10.02.20 at 09:50, Joseph Wright wrote:
> @u-fischer <https://github.com/u-fischer> Yes, but the general point
> stands: encodings other than |T1| and |TU|, plus the |hyperref| ones,
> are really not supported, certainly by the case changer. We can add more
> encodings to the 'known' list, but they do have to be pre-defined:
> there's no way to pick up |\<name>-cmd| other than knowing it in advance.
shouldn't you be able to detect if something is an encoding specific
command in a general way?
I really don't think it would fly if \MakeUppercase was able to handle
Cyrillic LGR OT1 ... and that would stop working
Maybe it is enough to restrict to "known established encodings" but in
theory anything that comes along via
\DeclareFontEncoding
should be treated (somehow).
|
???? |
Note that I am making no comment here on what the l3 text processing module should provide, or how it should do so, but I am pointing out that these are distinct questions. But it does not need to provide low-level replacements for anything. |
What are these bizarre beasts, and why are they supported: the hyperref ones ?? |
which we get from \DeclareFontEncoding (I guess)
conceptually (though not very efficient) I see this sequence LICR (in some encoding X) -> "LICR in TU" -> do casing -> "new LICR in TU" -> "new LICR in X" with probably a lot of headache but ... |
On 10/02/2020 09:12, Frank Mittelbach wrote:
>> Two parts to the issue
>>
>> * Making sure the expansion code is safe: easy as it just needs to know `\<name>-cmd`
> which we get from \DeclareFontEncoding (I guess)
Currently I've just got a block of hard-coded statements
\cs_new:Npn \@@_expand_textcomp:NN #1#2 { \exp_not:n {#1} }
\text_declare_expand_equivalent:cn { ?-cmd } { \@@_expand_textcomp:NN }
\text_declare_expand_equivalent:cn { T1-cmd } { \@@_expand_textcomp:NN }
\text_declare_expand_equivalent:cn { TS1-cmd } { \@@_expand_textcomp:NN }
\text_declare_expand_equivalent:cn { TU-cmd } { \@@_expand_textcomp:NN }
We could generate additional entries automatically, though presumably
this would be \AtBeginDocument so would be problematic in the preamble.
Of course, \MakeUppercase is not expandable; it's unlikely that people
are doing
\MakeUppercase{\title{fo\aa{}}}
in the preamble with encoding-specific commands in the text.
The alternative would be to look at each cs token and look for
\<thing>-cmd, then check <thing> at run-time. That's slightly more
painful but is doable.
>> * Making the case changing itself work: more tricky as there needs to be encoding-specific
>> code for the mappings
> conceptually (though not very efficient) I see this sequence
> LICR (in some encoding X) -> "LICR in TU" -> do casing -> "new LICR in TU" -> "new LICR in X"
> with probably a lot of headache but ...
My concern isn't really this, it's simple characters. For the former,
presumably we can parse \@uclclist and ensure all the mappings are set up.
The expl3 code starts from the assumption that we can work on UTF-8 for
characters, so A-Za-z are themselves and the upper half of the 8-bit
range is all \active. Now, it's possible that in LGR or whatever the
changes to \lccode/\uccode mean things still 'appear' to work, but I'm
not certain.
My other concern is that if we are case-changing *text*, font encoding
should be irrelevant as we don't know that at the point we do the case
changing (expansion vs typesetting).
Basically, I was imagining that we'd move toward using expl3 for
\MakeUppercase *but* first we'd need testing.
Joseph
|
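The remark above about parsing `\@uclclist` can be illustrated with a small sketch. It assumes a kernel in which `\@uclclist` is a flat list of lowercase/uppercase command pairs, as it was at the time of this thread; `\my@showpairs` is a hypothetical helper name, not part of any package. It only prints the pairs, which is the information a case changer would need to register its mappings.

```latex
\documentclass{article}
\makeatletter
% Hypothetical helper: read the list two tokens at a time and
% write each lower/upper pair to the log and terminal.
\def\my@showpairs#1#2{%
  \ifx#1\@nnil
    % End marker reached: stop the recursion.
  \else
    \typeout{\string#1\space -> \string#2}%
    \expandafter\my@showpairs
  \fi}
% Expand \@uclclist once, then terminate it with two end markers
% so that the final call still finds two arguments.
\expandafter\my@showpairs\@uclclist\@nnil\@nnil
\makeatother
\begin{document}
\end{document}
```

A real implementation would replace the `\typeout` with whatever declaration the case-changing code uses for its mappings; that part is deliberately left out here.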
An important point: the text may finally get typeset more than once, in different fonts with maybe different encodings. So best to think of this text module as dealing with pure text (Unicode streams encoded as utf-8). Very different from any text model used in 2e. The latter model needs to be supported by the commands used with 2e text, but this support may not be appropriate for inclusion in this l3 model. Different needs of different models. But the main Take Home is that precise and explicit definitions of such things as models, alphabets and syntax are important in software engineering. |
Suggested improved approach:
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{expl3}[2020/01/12]
\makeatletter
%\cdp@elt {OML}{cmm}{m}{it}
\ExplSyntaxOn
\cs_set:Npn \__text_expand_cs:N #1
  {
    \str_if_eq:nnTF {#1} { \protect }
      { \__text_expand_protect:N }
      { \__text_expand_encoding:N #1 }
  }
\cs_new:Npn \__text_expand_encoding:N #1
  {
    \exp_after:wN \__text_expand_encoding:Nnnnn \exp_after:wN #1
      \cdp@elt { \q_recursion_tail } { } { } { } \q_recursion_stop
  }
\cs_new:Npn \__text_expand_encoding:Nnnnn #1#2#3#4#5
  {
    \quark_if_recursion_tail_stop_do:nn {#2} { \__text_expand_replace:N #1 }
    \str_if_eq:eeTF { \exp_not:N #1 } { \exp_not:c { #2 - cmd } }
      { \__text_expand_loop:w \__text_expand_textcomp:NN #1 }
      { \__text_expand_encoding:Nnnnn #1 }
  }
\AtBeginDocument
  {
    \cs_set:Npn \__text_expand_cs:N #1
      {
        \str_if_eq:nnTF {#1} { \protect }
          { \__text_expand_protect:N }
          { \__text_expand_replace:N #1 }
      }
    \cs_set_protected:Npn \cdp@elt #1#2#3#4
      {
        \text_declare_expand_equivalent:cn { #1 -cmd }
          { \__text_expand_textcomp:NN }
      }
    \cdp@list
  }
\ExplSyntaxOff
\makeatother
\ExplSyntaxOn
\def\test{\text_lowercase:n}
\ExplSyntaxOff
\usepackage{hyperref}
\begin{document}
\test{\i}
\end{document}
This checks dynamically for encodings in the preamble, and finalises the list for the document body. Thus load order should not be an issue: all available encodings are declared. |
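For readers unfamiliar with `\cdp@list`, the following minimal sketch shows the mechanism the listing above relies on. It assumes that `\cdp@list` consists of entries of the form `\cdp@elt{<encoding>}{<family>}{<series>}{<shape>}`, as the commented-out line in the listing suggests; the `\typeout` text is purely illustrative.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\makeatletter
\begingroup
  % Locally redefine the list separator so that running the list
  % reports the first argument of every entry (the encoding name).
  \def\cdp@elt#1#2#3#4{\typeout{Declared encoding: #1}}%
  \cdp@list
\endgroup
\makeatother
\begin{document}
\end{document}
```

The approach in the listing does the same walk, but registers `\<encoding>-cmd` with the expansion code instead of printing the name.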
I'll probably just add the above ... |
Why is there |
@u-fischer in the preamble, we have a dynamic |
@josephwright I'm not sure if I understand the preamble reference. Are you processing |
@u-fischer There are a few gremlins in the above! Try
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{expl3}[2020/01/12]
\makeatletter
%\cdp@elt {OML}{cmm}{m}{it}
\ExplSyntaxOn
\cs_set:Npx \__text_expand_cs:N #1
  {
    \exp_not:N \str_if_eq:nnTF {#1} { \exp_not:N \protect }
      { \exp_not:N \__text_expand_protect:N }
      {
        \cs_if_exist:NTF \cdp@list
          { \exp_not:N \__text_expand_encoding:N #1 }
          { \exp_not:N \__text_expand_expand:N #1 }
      }
  }
\cs_if_exist:NT \cdp@list
  {
    \cs_new:Npn \__text_expand_encoding:N #1
      {
        \exp_after:wN \__text_expand_encoding:NNnnnn \exp_after:wN #1
          \cdp@list \q_recursion_tail { } { } { } { } \q_recursion_stop
      }
    \cs_new:Npn \__text_expand_encoding:NNnnnn #1#2#3#4#5#6
      {
        \quark_if_recursion_tail_stop_do:Nn #2 { \__text_expand_replace:N #1 }
        \str_if_eq:eeTF { \exp_not:N #1 } { \exp_not:c { #3 - cmd } }
          {
            \use_i_delimit_by_q_recursion_stop:nw
              { \__text_expand_loop:w \__text_expand_textcomp:NN #1 }
          }
          { \__text_expand_encoding:NNnnnn #1 }
      }
    \AtBeginDocument
      {
        \cs_set:Npn \__text_expand_cs:N #1
          {
            \str_if_eq:nnTF {#1} { \protect }
              { \__text_expand_protect:N }
              { \__text_expand_replace:N #1 }
          }
        \cs_set_protected:Npn \cdp@elt #1#2#3#4
          {
            \text_declare_expand_equivalent:cn { #1 -cmd }
              { \__text_expand_textcomp:NN }
          }
        \cdp@list
      }
  }
\ExplSyntaxOff
\makeatother
\ExplSyntaxOn
\def\test{\text_lowercase:n}
\ExplSyntaxOff
\usepackage{hyperref}
\edef\temp{\test{\i}}\show\temp
\begin{document}
\test{\i}
\end{document}
which then does work in the document preamble and in the body. |
@u-fischer Yes, the idea is to process |
Stupid question perhaps but in my very minimal tests it seems \T1-cmd,
\OML-cmd etc only ever have two definitions. Could we simply test with
\if_meaning:w?
|
@blefloch Huh? I'm not sure what you mean |
@josephwright the definition is either "stay with the current encoding and use what follows" or "change to a different encoding and use what follows to determine what to do". I think @blefloch is right: it is basically 2 different meanings only. It is either |
@FrankMittelbach, @blefloch Ah, right: so what you are getting at is rather than check for |
I only answered your "Huh" :-) but the point is: regardless of how many encodings are loaded, to figure out if something is an encoding-specific command all you need to do is to check against those two definitions. If that is enough later on, I don't know, and I haven't checked what your code does in detail. |
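A rough sketch of the "check against two definitions" idea follows; it is not the code that was eventually committed. It assumes, from my reading of the LaTeX2e kernel, that every `\<encoding>-cmd` is `\let` to either `\@current@cmd` or `\@changed@cmd`. The names `\my_if_encoding_cmd:NTF`, `\my_if_encoding_specific:NTF` and `\__my_if_encoding_specific:Nw` are hypothetical.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{expl3}
\makeatletter
\ExplSyntaxOn
% Hypothetical conditional: true if #1 has the same meaning as one of
% the two kernel macros assumed to underlie every \<encoding>-cmd.
\prg_new_conditional:Npnn \my_if_encoding_cmd:N #1 { TF }
  {
    \cs_if_eq:NNTF #1 \@current@cmd
      { \prg_return_true: }
      {
        \cs_if_eq:NNTF #1 \@changed@cmd
          { \prg_return_true: }
          { \prg_return_false: }
      }
  }
% Hypothetical helper: expand #1 once and test the first token of the
% result. Only safe for parameterless macros with a non-empty
% replacement text that does not start with a brace group.
\cs_new:Npn \my_if_encoding_specific:NTF #1
  { \exp_after:wN \__my_if_encoding_specific:Nw #1 \q_mark }
\cs_new:Npn \__my_if_encoding_specific:Nw #1 #2 \q_mark
  { \my_if_encoding_cmd:NTF #1 }
% Usage: \i should take the first branch under these assumptions.
\my_if_encoding_specific:NTF \i
  { \typeout{encoding-specific} }
  { \typeout{ordinary} }
\ExplSyntaxOff
\makeatother
\begin{document}
\end{document}
```

The point of the design is that the test does not depend on which encodings are loaded, or in what order, so no list of `\<encoding>-cmd` names has to be maintained.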
@FrankMittelbach OK, I've checked something in that should do the job |
Thank you very much for the quick fix. I doubt I can manage to test the version from the master branch in reasonable time (does it involve rebuilding the formats?), but I will report back when the next release is out. We are hoping to switch |
Sorry for not getting back to you earlier. The MWE works fine after the update (but you knew that already), and the larger document where I originally noticed this issue also compiles fine. I understand that in the following example (when compiled with pdfLaTeX) the UTF8 letters are not lowercased because they are not in
\documentclass{article}
\usepackage[T1,T2A]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{expl3}
\ExplSyntaxOn
\def\test{\text_lowercase:n}
\ExplSyntaxOff
\begin{document}
\test{\.I İ \CYRI И}
\end{document}
Obviously I would love to see the range of supported characters with pdfLaTeX extend beyond |
@moewew Nice example. Looking at the trace for |
When compiled with pdfLaTeX (with LaTeX2e <2020-02-02> patch level 1, L3 programming layer <2020-02-08>) the following MWE fails for me with
Full log file dotlessihyperref.log
The problem goes away if hyperref is dropped.