l3text-case: Undefined \@@_change_case_char_UTFviii:nnnNNNN #939

Closed
aminophen opened this issue May 23, 2021 · 13 comments
Labels
bug Something isn't working

Comments

@aminophen
Contributor

The following code generates an error on pLaTeX:

\RequirePackage{latexbug}
\documentclass{article}
\begin{document}
\ExplSyntaxOn
\text_lowercase:n{日本語}
\ExplSyntaxOff
\end{document}

The log shows:

! Undefined control sequence.
<argument> ...xt_change_case_char_UTFviii:nnnNNNN 

Here \@@_change_case_char_UTFviii:nnnNNNN appears:

{ \@@_change_case_char_UTFviii:nnnNNNN }

However, the defined name is wrong: it is \@@_change_case_char_UTFviii:nnnNNNNN (surplus "N"):

% \begin{macro}[EXP]{\@@_change_case_char_UTFviii:nnnNNNNN}

\cs_new:Npn \@@_change_case_char_UTFviii:nnnNNNNN #1#2#3#4#5#6#7
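The mismatch can be reproduced in isolation. The following is a minimal sketch with hypothetical names (\demo_char:... is not part of l3text). It assumes that \cs_new:Npn takes the parameter text as given and does not compare it with the signature letters, so the surplus "N" only shows up when the shorter name is used.

```latex
\documentclass{article}
\begin{document}
\ExplSyntaxOn
% Hypothetical function: seven parameters, but eight signature letters
\cs_new:Npn \demo_char:nnnNNNNN #1#2#3#4#5#6#7 { #1 }
% The caller uses the seven-letter name, which was never defined,
% so this test takes the false branch
\cs_if_exist:NTF \demo_char:nnnNNNN
  { \iow_term:n { defined } }
  { \iow_term:n { undefined:~caller~and~definition~disagree } }
\ExplSyntaxOff
\end{document}
```

Calling \demo_char:nnnNNNN directly instead of guarding it would raise the same kind of "Undefined control sequence" error as in the report.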

@car222222
Contributor

But you do not have cases for such glyphs:-)!

Thanks for discovering this and counting the Ns.

@aminophen
Contributor Author

Actually I encountered this error when using biblatex; some BIB files containing Japanese entries cannot be processed by biblatex+biber, and adding the option casechanger=latex2e resolved the issue.
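For readers hitting the same error, the workaround might look like the sketch below. The file name refs.bib is a placeholder, and the sketch assumes a biblatex version that provides the casechanger option as a package load-time option.

```latex
\documentclass{article}
% casechanger=latex2e avoids the expl3 case-changing code path
\usepackage[backend=biber, casechanger=latex2e]{biblatex}
\addbibresource{refs.bib} % placeholder: a BIB file with Japanese entries
\begin{document}
\nocite{*}
\printbibliography
\end{document}
```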

@FrankMittelbach
Member

@aminophen what exactly is the intended result of a lowercased kanji? unchanged? or is there a convention?

@aminophen
Contributor Author

@FrankMittelbach Unchanged. Actually the \lowercase / \uppercase primitives have no effect on Japanese characters on (u)pTeX.
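A quick way to check this on (u)pLaTeX is the sketch below. It only relies on the standard \lccode settings for Latin letters; the behaviour for the Japanese characters is as described in the comment above, not something the sketch itself guarantees.

```latex
\documentclass{article}
\begin{document}
% Latin letters have \lccode entries and are mapped to lowercase;
% the Japanese characters are expected to be written out unchanged
\lowercase{\typeout{ABC日本語}}
\end{document}
```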

@FrankMittelbach
Member

@aminophen so I thought, was just checking, but of course unchanged \neq error :-)

FrankMittelbach added the bug label on May 23, 2021
josephwright self-assigned this on May 23, 2021
@josephwright
Member

There's also some other bug ... I'll fix the lot

@aminophen
Contributor Author

aminophen commented May 23, 2021

Regarding "some other bug": that may be specific to pTeX/upTeX and not occur on 8-bit pdfTeX. The problem lies in the unnecessary and wrong handling of JP character tokens, which should simply be passed as-is.

It may be necessary to consider how JP tokens are handled: the safe way differs a bit between pTeX (simple) and upTeX (extended so that catcode information can also be stored for JP tokens).

for pTeX

\documentclass{article}
\makeatletter
\def\CHARS#1{%
  \@tfor\xx@char:=#1\do{%
    % from here on, \xx@char is a \def'ed single character token
    % ===== case of pTeX
    % it's really simple:
    %   * 2 byte code = JP token
    %   * 1 byte code = Latin token
    \expandafter\@tempcnta\expandafter=\expandafter`\xx@char\relax
    \ifnum\@tempcnta>255\relax
      \typeout{[\xx@char]: 2 byte = JP}%
    \else
      \typeout{[\xx@char]: 1 byte = Latin}%
    \fi
    % =====
  }%
}
\makeatother
\begin{document}

\CHARS{日あ、αA}% => JP, JP, JP, JP, Latin

\end{document}

for upTeX

\documentclass{article}
\makeatletter
\def\CHARS#1{%
  \@tfor\xx@char:=#1\do{%
    % from here on, \xx@char is a \def'ed single character token
    % ===== case of upTeX
    % concept: using \Ucharcat, generate a character token
    % which has a charcode=256 (outside ASCII) and
    % a kcatcode=16,17,18,19 which represents a JP token
    \expandafter\ifcat\Ucharcat256 16 \xx@char\relax
      \typeout{[\xx@char]: This is 16 = JP}%
    \else
    \expandafter\ifcat\Ucharcat256 17 \xx@char\relax
      \typeout{[\xx@char]: This is 17 = JP}%
    \else
    \expandafter\ifcat\Ucharcat256 18 \xx@char\relax
      \typeout{[\xx@char]: This is 18 = JP}%
    \else
    \expandafter\ifcat\Ucharcat256 19 \xx@char\relax
      \typeout{[\xx@char]: This is 19 = JP}%
    \else
      \typeout{This is not 16--19 = Latin}%
    \fi\fi\fi\fi
    % =====
  }%
}
\makeatother
\begin{document}

\CHARS{日、あ☃é}% => 16, 18, 17, 18, Latin (2 bytes)

\def\JP{あ}% kcatcode is stored as 17

\expandafter\CHARS\JP % => 17

\kcatcode`あ=15 % change "あ" into non-JP

\CHARS{あ}% => Latin (3 bytes)
\expandafter\CHARS\JP % =>17

\end{document}

@blefloch
Member

@aminophen Perhaps you could also comment on the question I just asked on stackexchange about kcatcode? I'm hoping to make various pieces of expl3 (such as l3tl-analysis, the peek analysis code, and l3regex) "do the right thing" in pTeX and upTeX.

@aminophen
Contributor Author

Within expl3, you can simply pass Japanese character tokens as-is (without changing anything). Anyway, OK, I will answer on TeX.SX.
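As a rough illustration of "pass as-is", the pTeX test shown earlier in this thread (character code above 255 means JP token) can be written in expl3 style. This is only a sketch: \demo_classify:n is a hypothetical name, not an expl3 function, and the criterion is the pTeX one, so it does not carry over to upTeX, where the kcatcode has to be inspected instead.

```latex
\documentclass{article}
\begin{document}
\ExplSyntaxOn
% Hypothetical helper: classify one character token by its character code
\cs_new_protected:Npn \demo_classify:n #1
  {
    \int_compare:nNnTF { `#1 } > { 255 }
      { \iow_term:n { [#1]:~JP,~pass~as-is } }
      { \iow_term:n { [#1]:~Latin,~may~be~case-changed } }
  }
% Apply the helper to each token of the argument in turn
\tl_map_function:nN { 日あA } \demo_classify:n
\ExplSyntaxOff
\end{document}
```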

@josephwright
Member

I'm hoping to at least avoid the hard error - I should have time today

josephwright added a commit that referenced this issue Jun 14, 2021
@josephwright
Member

Hmm, just changing the incorrect name looks OK.

@josephwright
Member

Ah, I see ...

@josephwright
Member

Could someone check my idea ... I think I've excluded the right things
