\ifnumeral chokes on a single Cyrillic letter #559

Closed
eg9 opened this issue Apr 8, 2017 · 8 comments

@eg9

commented Apr 8, 2017

Consider the example file

%\errorcontextlines=1000
\documentclass{article}

\usepackage[utf8]{inputenc}
\usepackage[T2A]{fontenc}
\usepackage[russian]{babel}

\usepackage{biblatex}

\begin{document}

\makeatletter
\blx@imc@ifnumeral{Н}{\typeout{numeral}}{\typeout{not numeral}}
\blx@imc@ifnumeral{О}{\typeout{numeral}}{\typeout{not numeral}}
\blx@imc@ifnumeral{П}{\typeout{numeral}}{\typeout{not numeral}}
\blx@imc@ifnumeral{н}{\typeout{numeral}}{\typeout{not numeral}}
\blx@imc@ifnumeral{о}{\typeout{numeral}}{\typeout{not numeral}}
\blx@imc@ifnumeral{п}{\typeout{numeral}}{\typeout{not numeral}}
\makeatother

\end{document}

As far as I understand, \blx@imc@ifnumeral becomes \ifnumeral in certain contexts. However, this produces an error with О (U+041E CYRILLIC CAPITAL LETTER O), but with no other Cyrillic letter.

Source: http://tex.stackexchange.com/questions/362762/biblatex-a-problem-with-cyrillic-О-in-cite

The error message:

! Missing ) inserted for expression.
<to be read again> 
                   (
l.14 \blx@imc@ifnumeral{О}
                           {\typeout{numeral}}{\typeout{not numeral}}

When the code in the TeX.StackExchange question is reduced to

\cite[О]{A}

the error message is (with a high value for \errorcontextlines)

! Missing ) inserted for expression.
<to be read again> 
                   (
\dec@de@UTFviii ...((`#1-"E0)\else \ifnum `#1>"BF(
                                                  (`#1-"C0)\else \ifnum `#1>...

\decode@UTFviii ...the \numexpr \dec@de@UTFviii #1
                                                  \relax )))))\@empty 
\UTFviii@splitcsname ... \decode@UTFviii #2\relax 
                                                  })
<argument> ...i@splitcsname \string \u8:?? \relax 
                                                  \MessageBreak not\space se...

\PackageError ...s \@spaces }{Package #1 Error: #2
                                                  }{See the #1 package docum...

\UTFviii@defined ...\space with\space LaTeX}\@eha 
                                                  \else \expandafter #1\fi 
<to be read again> \edef \blx@tempa {??
                                       }
\blx@ifnum ...t \uppercase {\edef \blx@tempa {#2}}
                                                  \ifx \blx@tempa \@empty \a...

\blx@mkpageprefix #1[#2]#3->\ifnumeral {#3}
                                            {\bibstring {#1}\ppspace } {\ifn...
<argument> ...sname abx@field@postnote\endcsname }
                                                  \blx@endunit 
\@secondoftwo #1#2->#2
                      
\blx@citeprint ...estcite \@gobble }\blx@postcode 
                                                  \fi \blx@endlangcite \endg...

\etb@forlistloop@i ...&->\ifblank {#2} {} {#1{#2}}
                                                  \ifblank {#3} {\listbreak ...

\blx@citei@cite ...{postnote}#4}\blx@citeloop {#3}
                                                  \endgroup 
\mkbibbrackets ...lx@setsfcodes \bibopenbracket #1
                                                  \bibclosebracket \endgroup 
\blx@cite@cite ...lxciteicmd {cite}{#1}{#2}{#3}{}}
                                                  #4\endgroup 
l.36 \cite[О]{A}.

making it clear that \ifnumeral is the culprit.

@plk

Owner

commented Aug 12, 2017

This is another UTF-8 issue with pdflatex; with lualatex it's fine. I am loath to mess about with \ifnumeral for this, since there is a workaround and the long-term change to a UTF-8-aware engine for everyone is inevitable.

@plk plk closed this Aug 12, 2017

@moewew moewew added the wontfix label Aug 13, 2017

@moewew

Collaborator

commented Dec 27, 2017

Mhhh, the problem is really that with pdfLaTeX

\uppercase{О}

throws an error, while its Cyrillic friends П and others are OK.
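A possible explanation, offered here as an assumption rather than something established in this thread: under pdfLaTeX with inputenc, О is the two bytes "D0 "9E, and \uppercase maps each byte separately through its \uccode. If the format's \uccode for "9E points at a different byte (the kernel's T1-based tables are believed to map it to "D0), the result is no longer a valid UTF-8 sequence, which would match the `\u8:??` in the trace above. The following minimal sketch only reports the \uccode values of the second bytes of Н, О and П so that this can be checked on a given installation:

```latex
\documentclass{article}
\begin{document}
% Н, О, П are encoded in UTF-8 as "D0 "9D, "D0 "9E, "D0 "9F.
% \uppercase replaces a byte by its \uccode when that value is nonzero,
% so a value that is neither 0 nor the byte itself changes the sequence.
\typeout{uccode of "9D = \the\uccode"9D}
\typeout{uccode of "9E = \the\uccode"9E}
\typeout{uccode of "9F = \the\uccode"9F}
\end{document}
```

If the value reported for "9E is neither 0 nor 158, \uppercase rewrites the second byte of О and the decoder sees a malformed character.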

@moewew

Collaborator

commented Dec 27, 2017

It seems as though applying \uppercase or \lowercase to Cyrillic letters might be a bit of a risk in general.
See

\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[T2A]{fontenc}

\def\procchar#1{#1--\uppercase{#1}--\lowercase{#1}\par}

\begin{document}
\procchar{А}\procchar{а}\procchar{Б}\procchar{б}\procchar{В}\procchar{в}
\procchar{Г}\procchar{г}\procchar{Д}\procchar{д}\procchar{Е}\procchar{е}
\procchar{Ё}\procchar{ё}\procchar{Ж}\procchar{ж}\procchar{З}\procchar{з}
\procchar{И}\procchar{и}\procchar{Й}\procchar{й}\procchar{К}\procchar{к}
\procchar{Л}\procchar{л}\procchar{М}\procchar{м}\procchar{Н}\procchar{н}
\procchar{О}\procchar{о}\procchar{П}\procchar{п}\procchar{Р}\procchar{р}
\procchar{С}\procchar{с}\procchar{Т}\procchar{т}\procchar{У}\procchar{у}
\procchar{Ф}\procchar{ф}\procchar{Х}\procchar{х}\procchar{Ц}\procchar{ц}
\procchar{Ч}\procchar{ч}\procchar{Ш}\procchar{ш}\procchar{Щ}\procchar{щ}
\procchar{Ъ}\procchar{ъ}\procchar{Ы}\procchar{ы}\procchar{Ь}\procchar{ь}
\procchar{Э}\procchar{э}\procchar{Ю}\procchar{ю}\procchar{Я}\procchar{я}
\end{document}

As I understand it, we need \uppercase to perform our \blx@ifnum checks:

\long\def\blx@ifnum#1#2{%
  \begingroup
  \let\protect\@unexpandable@protect
  \uppercase{\edef\blx@tempa{#2}}%
  \ifx\blx@tempa\@empty
    \aftergroup\@secondoftwo
  \else
    \makeatletter
    \catcode`\%=9
    \endlinechar\m@ne
    \everyeof{\noexpand}#1%
    \uppercase{\edef\blx@tempa{\scantokens{#2}}}%
    \ifx\blx@tempa\@empty
      \aftergroup\@firstoftwo
    \else
      \aftergroup\@secondoftwo
    \fi
  \fi
  \endgroup}
\def\blx@hook@ifnum{%
  \def\do##1{\uccode`##1=`\%}%
  \do\ \do\0\do\1\do\2\do\3\do\4\do\5\do\6\do\7\do\8\do\9%
  \do\i\do\v\do\x\do\l\do\c\do\d\do\m
  \do\I\do\V\do\X\do\L\do\C\do\D\do\M
  \blx@donumchars
  \let\RN\@firstofone
  \let\Rn\@firstofone}
\def\blx@hook@ifnums{%
  \blx@hook@ifnum
  \def\do##1{\uccode`##1=`\%}%
  \blx@dorangechars
  \def\do##1{\let##1\@empty}%
  \blx@dorangecmds}
\def\blx@hook@ifpages{%
  \blx@hook@ifnum
  \blx@hook@ifnums
  \def\do##1{\let##1\@empty}%
  \blx@dopagecmds}

Any idea what we should do to get this right @eg9?

@eg9

Author

commented Dec 27, 2017

\uppercase is definitely wrong with non-ASCII characters. FWIW, here's a working expl3 version:

\documentclass[twocolumn]{article}
\usepackage[utf8]{inputenc}
\usepackage[T2A]{fontenc}
\usepackage{xparse}

\ExplSyntaxOn
\NewDocumentCommand{\procchar}{m}
 {
  #1--\tl_upper_case:n{#1}--\tl_lower_case:n{#1}
  \par
 }
\ExplSyntaxOff

\begin{document}
\procchar{А}\procchar{а}\procchar{Б}\procchar{б}\procchar{В}\procchar{в}
\procchar{Г}\procchar{г}\procchar{Д}\procchar{д}\procchar{Е}\procchar{е}
\procchar{Ё}\procchar{ё}\procchar{Ж}\procchar{ж}\procchar{З}\procchar{з}
\procchar{И}\procchar{и}\procchar{Й}\procchar{й}\procchar{К}\procchar{к}
\procchar{Л}\procchar{л}\procchar{М}\procchar{м}\procchar{Н}\procchar{н}
\procchar{О}\procchar{о}\procchar{П}\procchar{п}\procchar{Р}\procchar{р}
\procchar{С}\procchar{с}\procchar{Т}\procchar{т}\procchar{У}\procchar{у}
\procchar{Ф}\procchar{ф}\procchar{Х}\procchar{х}\procchar{Ц}\procchar{ц}
\procchar{Ч}\procchar{ч}\procchar{Ш}\procchar{ш}\procchar{Щ}\procchar{щ}
\procchar{Ъ}\procchar{ъ}\procchar{Ы}\procchar{ы}\procchar{Ь}\procchar{ь}
\procchar{Э}\procchar{э}\procchar{Ю}\procchar{ю}\procchar{Я}\procchar{я}
\end{document}
@moewew

Collaborator

commented Dec 27, 2017

@eg9 Thank you for confirming this.

That's not brilliant. \ifnumerals and friends rely on \uppercase to process text input.

Since we are stuck with LaTeX2e we can't use \tl_upper_case:n (plus, I don't know if it would give us what we need anyway).

So I guess we will have to stand by 'if you need full Unicode support, use a Unicode engine'.

@perstar


commented Dec 27, 2017

Is it not possible to roll your own \uppercase and \lowercase that takes the argument apart like the one in the answer here?

@moewew

Collaborator

commented Dec 28, 2017

@perstar Thanks for looking into this. I'm not sure, however, if that would help us. I'm no TeX wizard and I certainly can't say I understand what the answer you linked to does exactly, but here we are not really looking for a macro that turns letters into their actual uppercase counterparts. \uppercase is rather used as a sly trick to remove certain characters from the argument.
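To make the "removal" trick concrete, here is a stripped-down sketch of the idea. It is not biblatex's code: \ifdigitsonly is a hypothetical helper written for illustration, it only knows about the digits 0 to 9, and unlike \blx@ifnum it does not treat an empty argument specially. The allowed characters get their \uccode set to that of %, \uppercase turns them into % characters, and re-scanning the text with % as an ignored character (catcode 9) makes them vanish. If nothing is left, the argument consisted of allowed characters only.

```latex
\documentclass{article}
\makeatletter
% Hypothetical helper for illustration, not part of biblatex:
% \ifdigitsonly{<text>}{<true>}{<false>}
\newcommand*\ifdigitsonly[1]{%
  \begingroup
  % Map every allowed character to % under \uppercase.
  \def\do##1{\uccode`##1=`\%}%
  \do\0\do\1\do\2\do\3\do\4\do\5\do\6\do\7\do\8\do\9%
  % When re-scanned, % is now an ignored character.
  \catcode`\%=9 %
  \endlinechar\m@ne
  \everyeof{\noexpand}%
  % \uppercase rewrites the characters, \scantokens re-reads them.
  \uppercase{\edef\x{\scantokens{#1}}}%
  \ifx\x\@empty
    \aftergroup\@firstoftwo   % nothing survived: allowed characters only
  \else
    \aftergroup\@secondoftwo  % something else was present
  \fi
  \endgroup}
\makeatother
\begin{document}
\ifdigitsonly{123}{\typeout{numeral}}{\typeout{not numeral}}
\ifdigitsonly{12a}{\typeout{numeral}}{\typeout{not numeral}}
\end{document}
```

Because \uppercase acts on every character token, with pdfLaTeX it also acts on the individual bytes of a UTF-8 character, which is where input such as О can get mangled before the test is evaluated.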

@moewew

Collaborator

commented Oct 8, 2018

All biblatex-related examples in this thread seem to work with a current LaTeX installation. So I guess this was fixed by changes beyond our control. The relevant code in biblatex.sty has not changed, and it is still applying \uppercase to its input.
