# Default to UTF-8 in 8-bit TeX #24

Closed
davidcarlisle opened this issue Mar 25, 2018 · 17 comments


### davidcarlisle commented Mar 25, 2018

 Placeholder issue for updating LaTeX to default to \usepackage[utf8]{inputenc} processing.
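
 As a minimal sketch of what this change means in practice (assuming an 8-bit engine such as pdfLaTeX, a source file saved as UTF-8, and a LaTeX release that includes this change), a document like the following should compile without loading inputenc explicitly. The sample text is illustrative only.

```latex
\documentclass{article}
\usepackage[T1]{fontenc} % optional: provides real glyphs for accented letters
% No \usepackage[utf8]{inputenc} here: the kernel is assumed to treat
% the source as UTF-8 by default.
\begin{document}
Café, naïve, Ångström
\end{document}
```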

### davidcarlisle added a commit that referenced this issue Mar 25, 2018

 update to default to UTF-8 see issue #24 
 4090b0b 

### davidcarlisle commented Mar 25, 2018

 @aminophen If defaulting to utf8 inputenc causes issues with *platex processing, can you let us know? I did some basic sanity checks, but it is possible I missed something.

### aminophen commented Mar 25, 2018

 Thanks for the information; OK, I'll look at it later.


### davidcarlisle added a commit that referenced this issue Mar 25, 2018

 do not change the default encoding with mltex (issue #24) 
 fd8ef2a 

### aminophen commented Mar 25, 2018

 I examined whether anything incompatible happens, and found one case. The following source compiles fine with latex + dvips + ps2pdf on current TeX Live (at least on macOS 10.11):

 `\documentclass{article} \usepackage{graphicx} \begin{document} [\includegraphics{image-å.eps}] \end{document}`

 but if I add `\usepackage[utf8]{inputenc}`, then I get:

 ``LaTeX Warning: File `image-\IeC {\r a}.eps' not found on input line 5.`` `! Missing \endcsname inserted. \unhbox`

 The workaround is adding

 `\usepackage[space]{grffile} \grffilesetup{ encoding, inputencoding=utf8, filenameencoding=utf8, }`

 (I don't think anybody would have done this, but just for information.)

### davidcarlisle commented Mar 25, 2018

 @aminophen thanks for this comment. There were some plans for a more extensive change that would also make utf8 filenames work "out of the box" (generally, not just in graphics), but we decided to be a bit more cautious here and "just" work as if inputenc had been used. Another way, without using grffile (and one that would also work with `\input` etc.), would be to use `\includegraphics{\detokenize{image-å}.eps}`. But maybe we can adjust things here (at the very least we should document this in the notes for this release). Thanks for the example.
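
 For reference, the `\detokenize` workaround can be sketched as a complete document. This assumes a file named `image-å.eps` exists next to the source, as in the earlier example; it is an illustration of the suggestion, not a tested fix for every setup.

```latex
\documentclass{article}
\usepackage{graphicx}
\begin{document}
% \detokenize converts the UTF-8 bytes of "å" into plain "other"
% character tokens, so inputenc's active-character definitions are
% not expanded inside the file name.
\includegraphics{\detokenize{image-å}.eps}
\end{document}
```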

### davidcarlisle added a commit that referenced this issue Mar 25, 2018

 update test results for issue #24 
 0352e3e 

### aminophen commented Mar 26, 2018

 Now we (Japanese) concluded that there would be no issue for “valid” input sources for platex and uplatex. In pTeX/upTeX, Latin characters and non-Latin (= CJK) characters are distinguished from each other (based on the \kcatcode value) during tokenization. It should mean that inputenc status does not affect CJK token processing. There might be a few users who need to make adjustments; those who have relied on the default will get an error (for example) with Shift_JIS half-width katakana characters. However, the raw input of such characters is no more “valid” than that of Latin-1, so we should ignore that.

### davidcarlisle commented Mar 26, 2018

 Thank you for that, so that sounds good.


### aminophen commented Apr 6, 2018

 Slightly off-topic, but let me ask a simple question: is it known to the ucs.sty authors that utf8x.def (or the package "ucs") is incompatible with the new UTF-8 LaTeX? There has long been an issue that ucs.sty and utf8.def cannot be mixed. It rarely happened before, but now it matters since utf8.def is already loaded by LaTeX.

 `\documentclass{article} \usepackage[utf8x]{inputenc} \usepackage[LY1]{fontenc} \begin{document} La lingvo {\TeX} estas danĝera, ne alproksimiĝu! \end{document}`

 This causes a "Missing \begin{document}" error with the new UTF-8 LaTeX. Both utf8.def and utf8x.def (or ucs.sty) provide a command `\DeclareUnicodeCharacter`, and the usages are incompatible (the utf8x.def one requires a decimal character code instead of a hexadecimal one). utf8.def redefines `\DeclareFontEncoding` to load a .dfu file, but utf8x.def does not handle that; as a result, the `\DeclareUnicodeCharacter` calls in the .dfu file are read with the wrong syntax from utf8x.def's point of view.
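
 To make the syntax clash concrete, here is a small sketch of the utf8.def form, with the ucs-style call shown only as a comment. The decimal form is taken from the description in this comment rather than checked against the ucs documentation, and the mapping to `\^g` is chosen purely for illustration.

```latex
\documentclass{article}
\usepackage[utf8]{inputenc}
% utf8.def syntax: the code point is given in hexadecimal
% (U+011D is the letter "ĝ").
\DeclareUnicodeCharacter{011D}{\^g}
% According to the comment above, the utf8x.def/ucs.sty command of the
% same name expects a decimal code instead; U+011D is 285 in decimal:
%   \DeclareUnicodeCharacter{285}{\^g}
\begin{document}
danĝera
\end{document}
```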

### FrankMittelbach commented Apr 6, 2018

 Thanks @aminophen for the heads-up. No, they aren't aware, and we overlooked that case. I have added code to handle utf8x specially, so the above should work now. In the medium term they should make a small change (it might be enough to use `\UseRawInputEncoding` in utf8x). We'll contact them.

### aminophen commented Apr 6, 2018

 > I have added code to handle utf8x specially so the above should work now.

 Confirmed, thank you very much.

 > Midterm they should change slightly (might be enough to use \UseRawInputEncoding in utf8x. We'll contact them.

 Sounds good!

### aminophen commented Apr 6, 2018

 Checked 7e5a425 and noticed that `\RequirePackage[2017/04/15]{latexrelease} \documentclass{article} \usepackage[utf8x]{inputenc}` causes "! Undefined control sequence." for `\UseRawInputEncoding`.

### FrankMittelbach commented Apr 6, 2018

 good catch :-(

### aminophen commented Apr 6, 2018

 `\DeclareOption{utf8x}{\ifdefined\UseRawInputEncoding \UseRawInputEncoding \fi \inputencoding{\CurrentOption}}` maybe?

### moewew commented Apr 7, 2018

 I was just looking at this for biblatex (plk/biblatex#734) and I was wondering what the expected output for `\UseRawInputEncoding \documentclass{article} \begin{document} \inputencodingname \end{document}` was. In my tests with TL2018 this somewhat unexpectedly gave "utf8". Should `\UseRawInputEncoding` maybe also do `\let\inputencodingname\@undefined`, or am I missing something here?

### davidcarlisle commented Apr 7, 2018

 Yes, I think it should; we'll get that into the next patch release. As a workaround, if you detect that `\inputencodingname` is utf8, you could check the catcode of any 8-bit character: if it is 12, then inputenc has been disabled. But having `\UseRawInputEncoding` undefine `\inputencodingname` (or define it to `raw`?) would be more natural.
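
 The catcode check described here could be sketched roughly as follows. `\myinputstatus` is a hypothetical helper name, and the choice of byte `"C3` (a common lead byte of two-byte UTF-8 sequences) is an assumption; any 8-bit character should serve on an 8-bit engine.

```latex
\documentclass{article}
% \myinputstatus is a hypothetical helper, not a kernel command.
% It reports whether byte "C3 has catcode 12 ("other"), which per the
% comment above indicates that inputenc processing has been disabled.
\newcommand*\myinputstatus{%
  \ifnum\catcode"C3=12 %
    raw 8-bit input (inputenc disabled)%
  \else
    8-bit characters are not plain ``other'' characters%
  \fi}
\begin{document}
\myinputstatus
\end{document}
```

 Placing `\UseRawInputEncoding` before `\documentclass` would be one way to exercise the first branch.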

### moewew commented Apr 7, 2018

 Ah, good. Thank you for the quick reply. For biblatex it would be more convenient if `\UseRawInputEncoding` were to do `\let\inputencodingname\@undefined` rather than setting it to a value such as `raw`. But I'm not sure about other packages and their uses for `\inputencodingname`.

### davidcarlisle added a commit that referenced this issue Apr 7, 2018

 undefine \inputencodingname in RawInputEncoding for issue #24 
 10f0961 

### davidcarlisle commented Apr 8, 2018

 @moewew we have (with exceptional service from the CTAN team) got a 2018-04-01 patch level 1 release onto CTAN, which undefines `\inputencodingname` when rolling back via the latexrelease package or when using `\UseRawInputEncoding`.

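
 With that behaviour, package code can test directly whether an input encoding is declared. The following is a minimal sketch, assuming an e-TeX-enabled engine (for `\ifdefined`) and the patch-level-1 behaviour described above; the printed wording is illustrative.

```latex
% Add \UseRawInputEncoding before \documentclass to try the raw case.
\documentclass{article}
\begin{document}
\ifdefined\inputencodingname
  Declared input encoding: \inputencodingname
\else
  No input encoding declared (raw input or rollback).
\fi
\end{document}
```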

### moewew commented Apr 8, 2018

 Brilliant! Thank you very much.
