robustifying \ensureascii and adding \asciiensure #263
Conversation
See this conversation for a use of |
The original explanation for this set of macros was:
Which clearly shows how accessory it is, and how tied to encodings like
I don’t fully understand the last two comments in the linked discussion. Can you provide an example? |
The main issue Günter, the maintainer of babel-greek, and I are facing is how to set the correct font encoding when switching from Hebrew/Greek to another language. The example file I posted at the start of the linked ticket is the following (process with pdfTeX):

\documentclass[english,greek]{article}
\usepackage[T1,LGR]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{babel}
\begin{document}
Hello
\selectlanguage{english}
Hello
\end{document}

As you can see, the font encoding did not switch to an ASCII-compatible one, so the output is wrong. There are several ways to solve it:

- I suggested to Günter to use
- A second option would be to require the users to load encodings such as
- As a third option, we can require from all the
- And lastly, we can drop all support from

Just in case it will be useful, here is how some of the non-Latin languages are currently dealing with this problem.

Hebrew:

\if@rl%
\let\encodingdefault=\lr@encodingdefault%
\fi%
\fontencoding{\encodingdefault}%
\selectfont%
\@rlfalse

which is not really a good solution (I'm not sure why it is a part of

Arabic:

\addto\noextrasarabic{%
\@rlfalse
\@arabicfalse
\latintext\normalfont %enough ??
% Restore the lplain.tex penalties??
\hyphenpenalty=50%
\binoppenalty=700%
\relpenalty=500%
}

Which I'm not sure is really good; maybe with

Greek:

\def\BabelGreekRestoreFontEncoding{%
\ifx\cf@encoding\BabelGreekPreviousFontEncoding
\else
\let\encodingdefault\BabelGreekPreviousFontEncoding
\fontencoding{\encodingdefault}\selectfont
\fi
}
\addto\extrasgreek{%
\let\BabelGreekPreviousFontEncoding\cf@encoding
\greekscript}

Which is facing the problem demonstrated above. |
In any case, guessing what the encoding should be when exiting the language is hard; the best solution would be for each language to ensure the correct encoding for itself. If that were the case, there could be a uniform solution, as part of the interface provided by |
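To sketch that idea concretely (a hypothetical illustration, not babel's actual code; \addto and the \extras<language> hooks are babel's standard interface): each language could unconditionally set the encoding it needs on entry, so no language has to guess what to restore on exit:

```latex
% Hypothetical sketch: every language declares its own encoding,
% so none needs to restore the previous one when exiting.
\addto\extrasenglish{\fontencoding{T1}\selectfont}
\addto\extrasgreek{\fontencoding{LGR}\selectfont}
```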
Interestingly, if you don’t explicitly load |
Do you mean using:

\documentclass{article}
\usepackage[T1,LGR]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[greek]{babel}
\babelprovide[import]{english}
\begin{document}
Hello
\selectlanguage{english}
Hello
\end{document}

I get

This is with version
It looks as if each language maintainer dealt with it differently. One of the main advantages of the new
as always... |
No, no, without any explicit declaration (see the manual, sec. “Mostly monolingual documents”):

\documentclass[greek]{article}
\usepackage[T2A,T1,LGR]{fontenc}
\usepackage{babel}
\begin{document}
Ελληνικά \foreignlanguage{bulgarian}{български} Ελληνικά
\selectlanguage{english}
English \foreignlanguage{greek}{Ελληνικά} English
\end{document} |
a) The problem is new: Up to 2023/03/04,
b) It only happens if LGR is the document's main font encoding (loaded as last font encoding with fontenc):
¹ Simple in hand-authored documents but requires special-casing in LyX. |
Interesting.
Actually, while |
The font encodings for each language are declared in the corresponding .ini file,
If I understand correctly, this is what I tried to do with |
The rules are here: https://latex3.github.io/babel/news/whats-new-in-babel-3.84.html. Perhaps a new rule is necessary: if no encoding is found, fall back to

Something like:
doesn’t work if the main encoding is

I’m thinking of some solution – perhaps making semi-public the code for the on-the-fly loading, so that it can be used in

In the meanwhile, well, we have a non-standard encoding, loaded in a non-standard way, so a non-standard solution (with a deprecated macro) doesn’t seem too severe 🙂.

Note the ASCII encoding can be, for example, T2A (for Cyrillic), because slots 32–127 hold the ASCII characters, and therefore it’s not the same as a Latin encoding. In fact, a document can have several Latin encodings ( |
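A minimal illustration (my own sketch, not from the thread) of the point that a non-Latin encoding such as T2A still renders plain ASCII correctly, since slots 32–127 hold the ASCII characters:

```latex
\documentclass{article}
\usepackage[T2A,T1]{fontenc}
\begin{document}
\fontencoding{T2A}\selectfont
Hello, plain ASCII text typeset in the T2A (Cyrillic) encoding.
\end{document}
```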
You probably need an additional test:
|
Generally, a Babel language should save/restore the previous state. However, with LGR, we have an exception: We know that it is almost always an error to use LGR after switching to a secondary language in a Greek document. I propose a test "AtBeginDocument":
For this, I need a) to know how to check in an *.ldf file whether "greek" is the main language of the document, and

For b), it would be good to re-use the logic hidden in the

I don't need a font-encoding-changing command for greek.ldf, only the font encoding's name.

¹ Actually, it would be OK if the main language is belarusian, bulgarian, |
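For a), a hedged sketch of what such an \AtBeginDocument test might look like inside an .ldf file (it assumes babel's internals \bbl@main@language and \latinencoding; the name \BabelGreekFallbackEncoding is my invention for illustration):

```latex
% Sketch only: if greek is the document's main language, record a
% sensible non-Greek fallback encoding for later use.
\AtBeginDocument{%
  \def\bbl@tempa{greek}%
  \ifx\bbl@main@language\bbl@tempa
    \edef\BabelGreekFallbackEncoding{\latinencoding}%
  \fi}
```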
Generally, but not in this case, before when the first (implicit)
|
Babel-greek cannot use the font settings from ".ini" files, as it does not know which language is used after exiting "greek".
The repository version of babel-greek has now implemented a fallback solution for the case that the |
This is indeed the problem. I’m working on extending the encoding selector to
👌 |
The problem is even more basic: We have several possible scenarios for the font encoding switches:

- traditional: All languages can safely assume OT1, a standard text font encoding (T1, T2A, ...), or a compatible one (LY1, QX).
- clean up after you: switch if required, switch back to the previous font encoding when leaving (greek, hebrew).
- check before use: switch to one of the supported font encodings (imported languages, languages on-the-fly).

Changing the scenario has consequences when document authors or classes select a special font encoding (e.g. QX for Polish or L7x for Lithuanian). It may render documents uncompilable or lead to strange font substitutions after a section in a "foreign" language.

For backwards-compatibility reasons, I would recommend sticking with the traditional scenario. I am considering whether to revert "greek.ldf" to the traditional scenario, too. If deemed generally useful, the check before use scenario may be offered as an opt-in variant (but this would not help "greek.ldf" decide which font encoding to switch to when leaving "greek").

² macedonian.ldf has a line |
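A minimal sketch of the clean up after you scenario for a hypothetical language mylang (the \mylang@saved@enc name is my assumption; \addto and the \extras.../\noextras... hooks are babel's standard interface):

```latex
\addto\extrasmylang{%
  \let\mylang@saved@enc\cf@encoding % remember the encoding in force
  \fontencoding{LGR}\selectfont}    % switch to what mylang needs
\addto\noextrasmylang{%
  \fontencoding{\mylang@saved@enc}\selectfont} % restore on exit
```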
@gmilde Good analysis. The ‘traditional’ way is fine for me. |
The example below shows the problem with the "traditional" approach:
This is why I prefer the clean up after you scenario. |
Digression. For completeness' sake, it is not only about the hyphenation; it is about the font itself as well. Check the following example and the differences.

\documentclass{article}
\usepackage{lmodern}
\usepackage[L7x,T1]{fontenc}
\usepackage[utf8]{inputenc}
\input{glyphtounicode}% EDIT
\pdfgentounicode=1% EDIT
\begin{document}
T1
\fontencoding{T1}\selectfont
ŲųĮįĄąĘę
L7x
\fontencoding{L7x}\selectfont
ŲųĮįĄąĘę
\end{document} |
To fix the issue with non-Greek text parts in Greek documents,

Unfortunately, the fix has to make use of the deprecated

The other two issues don't apply for the use in
The |
Yes, pre-composed characters have several advantages. One more: drag and drop from the PDF generated by your example:
This also affects text search in the PDF. One example where babel-greek using the

OTOH, the workaround to use

Edited: I mixed up the problematic font encoding order and the fix. Corrected. |
@gmilde I added two more lines in my original comment that resolve the copy/paste issues. EDIT: Actually, they do not. Good point! |
Problems related to how the

I'm closing this pull request because it has been merged (partially) by hand. With existing
There is no encoding to switch back, and the new encoding isn’t known until either Vietnamese or English is selected. |
@gmilde I just want to verify one thing (using Ų and LM, for example)...
For L7x we have the following "chain":

\DeclareUnicodeCharacter{0172}{\k U}
=>
\DeclareTextCommand{\k}{L7x}[1]{\oalign{\null#1\crcr\hidewidth\char12}}
% the latter is then "overridden" by (which uses the pre-composed glyph for Ų)
\DeclareTextComposite{\k}{L7x}{U}{216}% "D8
=>
% and has a name
/enclml7x[... /Uogonek ...] % at position 216

On the other hand, for T1 we have only

\DeclareUnicodeCharacter{0172}{\k U}
=>
\DeclareTextCommand{\k}{T1}[1]{\hmode@bgroup\ooalign{\null#1\crcr\hidewidth\char12}\egroup}
% and there is no \DeclareTextComposite{\k} for U defined so the latter is used to mimic the Ų
% additionally, /enclmec[...] does not contain anything about Ų

which is insufficient. On top of everything, \pdfglyphtounicode{Uogonek}{0172} as

Am I right? |
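Regarding the ToUnicode side of this, the mapping can also be supplied explicitly with the pdfTeX commands already used in the earlier example (the explicit \pdfglyphtounicode line below is just an illustration of the mechanism):

```latex
\input{glyphtounicode}  % standard glyph-name -> Unicode mappings
\pdfgentounicode=1      % embed ToUnicode CMaps in the PDF
% map the glyph name explicitly, so copy/paste yields U+0172:
\pdfglyphtounicode{Uogonek}{0172}
```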
I don't know the details of

For languages that use accented letters, I would recommend the "*.ldf" file to switch to an encoding with

Example
vs.
|
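For illustration, this is roughly how an encoding makes a letter use a precomposed glyph rather than an accent built from pieces (the declaration below is the L7x one quoted earlier; an encoding lacking such a declaration falls back to the generic accent construction):

```latex
% Use the precomposed glyph in slot 216 of L7x fonts for \k{U}:
\DeclareTextComposite{\k}{L7x}{U}{216}
```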
Although Polish is covered by

I’m still ruminating about this whole |
I thought it would be useful to have \asciiensure (similar to \latintext). I also think \ensureascii should be robust, as it is not expandable (maybe some variant of \protected would be preferable? I'm not really sure what is the best way to protect macros in LaTeX these days, but \DeclareRobustCommand is how \latintext is defined...)

BTW, why is \latintext considered deprecated? Just because it does not consider all font encodings, or is there anything more fundamental?
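For what it's worth, a minimal sketch of how \asciiensure could be made robust (my own illustration, not proposed as the actual implementation; it only checks for T1/OT1 and falls back to OT1 otherwise):

```latex
% Hypothetical sketch using \DeclareRobustCommand, as \latintext does:
\DeclareRobustCommand{\asciiensure}{%
  \def\ascii@tmp{T1}%
  \ifx\cf@encoding\ascii@tmp\else
    \def\ascii@tmp{OT1}%
    \ifx\cf@encoding\ascii@tmp\else
      \fontencoding{OT1}\selectfont
    \fi
  \fi}
```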