non-ascii file name regression #32

Closed
edocevoli opened this issue Apr 8, 2018 · 9 comments
Comments

@edocevoli

Brief outline of the bug

One of my test cases breaks with the current 2018-04-01 release.

What I have done:

  1. update only the LaTeX base package (MiKTeX: ltxbase)
  2. open a command-prompt window and then:
chcp 65001    
pdflatex tèst.tex

This gives:

This is pdfTeX, Version 3.14159265-2.6-1.40.19 (MiKTeX 2.9.6655 NEXT 64-bit)
entering extended mode
! I can't find file `./t'.
<to be read again> 
                   \global 
<*> ./tè
         st.tex
Please type another input file name: 
! Emergency stop.
<to be read again> 
                   \global 
<*> ./tè
         st.tex
!  ==> Fatal error occurred, no output PDF file produced!
Transcript written on texput.log.

Maybe this is a MiKTeX-specific Windows bug. I will do further tests on macOS and Linux.

Minimal example showing the bug

\documentclass{article}
\begin{document}
Hallo Welt
\end{document}

Log file (required) and possibly PDF file

texput.log

@aminophen
Contributor

I already reported this; \detokenize should be used to avoid it (see #24 (comment)).

@josephwright
Member

@aminophen Not quite that simple ... but I have to say I'm surprised that the binaries treat the file name argument at the TeX level (it's not \input tést, after all).

@josephwright
Member

Something like pdflatex \input\detokenize{tést}\relax works, but that's not ideal.

@aminophen
Contributor

> the binaries treat the file name argument at the TeX level

Any argument to *tex is treated as TeX code ;-) When the first token is a character, *tex behaves as if \input had been prefixed; when the first token is a control sequence, no \input is prefixed.
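
For illustration only, here is a minimal plain TeX sketch of why an active character breaks the file name scan and why \detokenize helps. It uses "!" as an ASCII stand-in for a UTF-8 lead byte; the macro \name and the definition given to "!" are assumptions made up for this example, not kernel code. The log above shows the real scan stopping at \global; the sketch uses \relax to get the same effect.

```tex
% demo.tex: compile with `pdftex demo.tex` (plain TeX, not LaTeX).
% "!" stands in for a byte in the range 128-255, which the
% 2018-04-01 format makes active (catcode 13).
\catcode`\!=13
% Hypothetical definition: the expansion starts with an
% unexpandable, non-character token.
\def!{\relax}
% \name is an illustrative macro holding a "file name".
\def\name{t!st}
% Left commented out on purpose: \input expands "!" while scanning
% the name, stops at \relax, and looks for a file called "t".
% \input t!st
% \detokenize turns the active "!" into an ordinary character, so
% nothing is left to expand during the file name scan.
\message{[\detokenize\expandafter{\name}]}
\bye
```

Running this should write [t!st] to the terminal and the log, which is the form a file name needs to have before \input scans it.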

@josephwright
Member

@aminophen I have to say I've always imagined the logic differently :) 'If the first char is the escape char, treat as TeX code, otherwise read as a filename'

@davidcarlisle
Member

davidcarlisle commented Apr 8, 2018 via email

@aminophen
Contributor

aminophen commented Apr 8, 2018

Delaying utf8.def etc. until \everyjob might be the only solution to this (not well tested on all engines; if it is adopted, I will have to adjust platex as well).

(edit: it will change the name of the log file opened by $ pdflatex \\relax from texput.log to utf8.log)

--- latex.ltx.orig	2018-04-07 06:33:45.000000000 +0900
+++ latex.ltx	2018-04-08 19:15:09.000000000 +0900
@@ -8641,12 +8641,6 @@
 \catcode10=12 % ctrl J
 \catcode12=13 % ctrl L
 \catcode13=5  % newline
-\@tempcnta=128
-\loop
-  \catcode\@tempcnta=13
-  \advance\@tempcnta\@ne
-\ifnum\@tempcnta<256
-\repeat
 \def\UseRawInputEncoding{%
 \let\DeclareFontEncoding@\DeclareFontEncoding@saved   % revert
 \let\DeclareUnicodeCharacter\@undefined               % revert
@@ -8669,10 +8663,6 @@
 \repeat
 }
 \let\DeclareFontEncoding@saved\DeclareFontEncoding@
-\edef\inputencodingname{utf8}%
-\input{utf8.def}
-\let\@inpenc@test\@undefined
-\let\saved@space@catcode\@undefined
 \else
 \@tempcnta=0
 \loop
@@ -8793,6 +8783,18 @@
   \endgroup}
 \let\@filelist\@gobble
 \def\@addtofilelist#1{\xdef\@filelist{\@filelist,#1}}%
+\everyjob\expandafter{\the\everyjob
+\@tempcnta=128
+\loop
+  \catcode\@tempcnta=13
+  \advance\@tempcnta\@ne
+\ifnum\@tempcnta<256
+\repeat
+\edef\inputencodingname{utf8}%
+\input{utf8.def}
+\let\@inpenc@test\@undefined
+\let\saved@space@catcode\@undefined
+}
 \makeatother
 \errorstopmode
 \dump

@davidcarlisle
Member

davidcarlisle commented Apr 8, 2018

@aminophen Yes, I'm currently running some tests with ltfinal changed as follows:

%    \begin{macrocode}
\edef\inputencodingname{utf8}%
\input{utf8.def}
\let\UTFviii@two@octets@@\UTFviii@two@octets
\long\def\UTFviii@two@octets#1#2{\string#1\string#2}
\everyjob\expandafter{\the\everyjob
\let\UTFviii@two@octets\UTFviii@two@octets@@
}
\let\@inpenc@test\@undefined
\let\saved@space@catcode\@undefined
%    \end{macrocode}

This would need the longer cases as well, of course, not just the two-byte one. Delaying the catcode activation until \everyjob would work on the command line, but if we can make it work without that, it may give a path to accepting UTF-8 file names more generally in the document (which did not work in previous releases once inputenc was loaded).
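
As a rough sketch of what "the longer cases" might look like, the same pattern could be repeated for the three- and four-octet macros. This is an extrapolation from the snippet above, not the change that was eventually committed. It assumes that \UTFviii@three@octets and \UTFviii@four@octets in utf8.def take three and four byte arguments in the same way the two-octet macro takes two; the saved names ending in @@ are illustrative. Like the snippet above, it is a fragment for ltfinal, to be placed after \input{utf8.def}.

```tex
% Assumption: the three- and four-octet macros mirror the two-octet one.
% Save the real definitions under illustrative @@ names.
\let\UTFviii@three@octets@@\UTFviii@three@octets
\let\UTFviii@four@octets@@\UTFviii@four@octets
% Before the job starts (i.e. while the command line is read),
% just output the bytes as plain characters.
\long\def\UTFviii@three@octets#1#2#3{\string#1\string#2\string#3}
\long\def\UTFviii@four@octets#1#2#3#4{%
  \string#1\string#2\string#3\string#4}
% Restore the real definitions once the job has started.
\everyjob\expandafter{\the\everyjob
\let\UTFviii@three@octets\UTFviii@three@octets@@
\let\UTFviii@four@octets\UTFviii@four@octets@@
}
```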

@aminophen
Contributor

724013b works as expected for pdfLaTeX. I committed support for that change in pLaTeX (texjporg/platex@8b6c518), and it works on both pLaTeX and upLaTeX. I’ll upload the new version of pLaTeX when LaTeX is ready.
