non-ascii file name regression #32

Closed
edocevoli opened this Issue Apr 8, 2018 · 9 comments

edocevoli commented Apr 8, 2018

Brief outline of the bug

One of my test cases breaks with the current 2018-04-01 release.

What I have done:

  1. update only the LaTeX base package (MiKTeX: ltxbase)
  2. open a command-prompt window and run:

     chcp 65001
     pdflatex tèst.tex

This gives:

This is pdfTeX, Version 3.14159265-2.6-1.40.19 (MiKTeX 2.9.6655 NEXT 64-bit)
entering extended mode
! I can't find file `./t'.
<to be read again> 
                   \global 
<*> ./tè
         st.tex
Please type another input file name: 
! Emergency stop.
<to be read again> 
                   \global 
<*> ./tè
         st.tex
!  ==> Fatal error occurred, no output PDF file produced!
Transcript written on texput.log.

Maybe this is a MiKTeX-specific Windows bug. I will do further tests on macOS and Linux.

Minimal example showing the bug

\documentclass{article}
\begin{document}
Hallo Welt
\end{document}

Log file (required) and possibly PDF file

texput.log

@aminophen

Contributor

aminophen commented Apr 8, 2018

I have already reported this; \detokenize should be used to avoid it (from #24 (comment))

@josephwright

Member

josephwright commented Apr 8, 2018

@aminophen Not quite that simple ... but I have to say I'm surprised that the binaries treat the file name argument at the TeX level (it's not \input tést, after all).

@josephwright

Member

josephwright commented Apr 8, 2018

Something like pdflatex \input\detokenize{tést}\relax works, but that's not ideal.
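
As an annotated sketch of why this form gets through (the line breaks and comments are an interpretation added for illustration, not part of the original comment, and shell quoting is left out):

```latex
% The three pieces of the command-line argument, one per line.
\input% first token is a control sequence, so the line is run as TeX code
\detokenize{tést}% the bytes of the name become inert (catcode 12) characters,
                 % so no active character is expanded while the name is read
\relax% unexpandable token that ends the file name
```

The sketch assumes a file named tést.tex exists in the current directory; the jobname and log name then follow from that file as usual.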

@aminophen

Contributor

aminophen commented Apr 8, 2018

the binaries treat the file name argument at the TeX level

Any argument to *tex is treated as TeX code ;-) When the first token is a character, *tex behaves as if \input were prefixed; when the first token is a control sequence, \input is not prefixed.

@josephwright

Member

josephwright commented Apr 8, 2018

@aminophen I have to say I've always imagined the logic differently :) 'If the first char is the escape char, treat as TeX code, otherwise read as a filename'

@davidcarlisle

Contributor

davidcarlisle commented Apr 8, 2018

@aminophen

Contributor

aminophen commented Apr 8, 2018

Delaying utf8.def etc. to \everyjob might be the only solution to this (not well tested on all engines; if it is adopted, I will have to adjust platex as well).

(edit: it will change the name of the log file opened by $pdflatex \\relax to utf8.log instead of texput.log)

--- latex.ltx.orig	2018-04-07 06:33:45.000000000 +0900
+++ latex.ltx	2018-04-08 19:15:09.000000000 +0900
@@ -8641,12 +8641,6 @@
 \catcode10=12 % ctrl J
 \catcode12=13 % ctrl L
 \catcode13=5  % newline
-\@tempcnta=128
-\loop
-  \catcode\@tempcnta=13
-  \advance\@tempcnta\@ne
-\ifnum\@tempcnta<256
-\repeat
 \def\UseRawInputEncoding{%
 \let\DeclareFontEncoding@\DeclareFontEncoding@saved   % revert
 \let\DeclareUnicodeCharacter\@undefined               % revert
@@ -8669,10 +8663,6 @@
 \repeat
 }
 \let\DeclareFontEncoding@saved\DeclareFontEncoding@
-\edef\inputencodingname{utf8}%
-\input{utf8.def}
-\let\@inpenc@test\@undefined
-\let\saved@space@catcode\@undefined
 \else
 \@tempcnta=0
 \loop
@@ -8793,6 +8783,18 @@
   \endgroup}
 \let\@filelist\@gobble
 \def\@addtofilelist#1{\xdef\@filelist{\@filelist,#1}}%
+\everyjob\expandafter{\the\everyjob
+\@tempcnta=128
+\loop
+  \catcode\@tempcnta=13
+  \advance\@tempcnta\@ne
+\ifnum\@tempcnta<256
+\repeat
+\edef\inputencodingname{utf8}%
+\input{utf8.def}
+\let\@inpenc@test\@undefined
+\let\saved@space@catcode\@undefined
+}
 \makeatother
 \errorstopmode
 \dump
@davidcarlisle

Contributor

davidcarlisle commented Apr 8, 2018

@aminophen Yes, I'm currently running some tests with ltfinal changed as follows:

%    \begin{macrocode}
\edef\inputencodingname{utf8}%
\input{utf8.def}
\let\UTFviii@two@octets@@\UTFviii@two@octets
\long\def\UTFviii@two@octets#1#2{\string#1\string#2}
\everyjob\expandafter{\the\everyjob
\let\UTFviii@two@octets\UTFviii@two@octets@@
}
\let\@inpenc@test\@undefined
\let\saved@space@catcode\@undefined
%    \end{macrocode}

This would need the longer cases as well, not just the two-byte one, of course. Delaying the catcode activation until \everyjob would work on the command line, but if we can make it work without that, it may give a path to accepting UTF-8 file names more generally in the document (which did not work in previous releases once inputenc was loaded).
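
The longer cases could presumably follow the same pattern as the two-octet code above. The fragment below is only a sketch under that assumption: the saved names ending in @@ are made up here by analogy, and this is not necessarily what was eventually committed.

```latex
% Sketch only: the two-octet pattern extended to three and four octets.
% Assumes utf8.def has already been input (so both macros exist) and that
% @ has letter catcode, as it does inside latex.ltx.
\let\UTFviii@three@octets@@\UTFviii@three@octets   % hypothetical saved name
\long\def\UTFviii@three@octets#1#2#3{\string#1\string#2\string#3}
\let\UTFviii@four@octets@@\UTFviii@four@octets     % hypothetical saved name
\long\def\UTFviii@four@octets#1#2#3#4{\string#1\string#2\string#3\string#4}
% Restore the real definitions once the job starts, as in the two-octet case.
\everyjob\expandafter{\the\everyjob
  \let\UTFviii@three@octets\UTFviii@three@octets@@
  \let\UTFviii@four@octets\UTFviii@four@octets@@
}
```

While the format is being built and the command-line file name is scanned, each multi-byte sequence then expands to its raw bytes; the normal utf8.def behaviour returns when \everyjob runs.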

@aminophen

Contributor

aminophen commented Apr 8, 2018

724013b works as expected for pdfLaTeX; I committed support for that change in pLaTeX (texjporg/platex@8b6c518), and it works on both pLaTeX and upLaTeX. I'll upload the new version of pLaTeX when LaTeX is ready.
