Certain unicode in Matplotlib + LaTeX #20262

Open
numpde opened this issue May 19, 2021 · 4 comments

numpde commented May 19, 2021

The following code fails at set_ylabel:

import matplotlib.pyplot as plt

with plt.style.context({'text.usetex': True}):
    (fig, ax) = plt.subplots()
    ax.set_xlabel("€")
    ax.set_ylabel("Δ")
    plt.show()

It works with $\Delta$ instead of Δ and it works if I unset text.usetex.

The generated tex is this:

\documentclass{article}
\newcommand{\mathdefault}[1]{#1}
\usepackage{type1cm}
\usepackage{type1ec}
\usepackage{type1ec}
\usepackage{type1ec}
\usepackage[utf8]{inputenc}
\DeclareUnicodeCharacter{2212}{\ensuremath{-}}
\usepackage[papersize=72in, margin=1in]{geometry}

\makeatletter\@ifpackageloaded{textcomp}{}{\usepackage{textcomp}}\makeatother
\pagestyle{empty}
\begin{document}
% The empty hbox ensures that a page is printed even for empty inputs, except
% when using psfrag which gets confused by it.
\fontsize{10.000000}{12.500000}%
\ifdefined\psfrag\else\hbox{}\fi%
{\sffamily Δ}
\end{document}

It compiles fine with pdflatex if I replace \usepackage[utf8]{inputenc} with \usepackage[utf8x]{inputenc}. Without that change, pdflatex reports the same error as Matplotlib:

! Package inputenc Error: Unicode character Δ (U+0394)
(inputenc)                not set up for use with LaTeX.

I think this is on the side of unexpected behavior.

Versions: Python 3.8.5 and matplotlib==3.4.1.
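
Since the label renders when written as $\Delta$, one user-side workaround is to substitute known characters before the string reaches LaTeX. This is only a minimal sketch: `texify` and `UNICODE_TO_TEX` are hypothetical names, not part of Matplotlib, and the table holds just a few hand-picked entries.

```python
# Hypothetical helper, not part of Matplotlib: replace selected non-ASCII
# characters with TeX math-mode equivalents that LaTeX accepts by default.
UNICODE_TO_TEX = {
    "Δ": r"$\Delta$",
    "α": r"$\alpha$",
    "π": r"$\pi$",
}


def texify(label):
    """Return label with every mapped character replaced by its TeX form."""
    # Characters without an entry (e.g. "€") pass through unchanged.
    return "".join(UNICODE_TO_TEX.get(ch, ch) for ch in label)


print(texify("Δx / Δt"))  # $\Delta$x / $\Delta$t
```

Usage would look like ax.set_ylabel(texify("Δ")). The sketch assumes the label is plain text; a character that already sits inside $...$ would end up with nested dollar signs and break.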

Member

QuLogic commented May 20, 2021

The general advice is to avoid utf8x. Since Python strings are pretty much Unicode-safe, I do wonder if maybe we should start using lualatex or xelatex instead of trying to hack around pdflatex deficiencies.

Contributor

anntzer commented May 23, 2021

We actually use latex, not pdflatex. The reason is that we need dvi output, not pdf output, so that we can parse it, extract the glyphs, and position them correctly for pdf and svg output. Currently agg uses dvipng to convert the dvi to png and embeds the bitmap, and ps uses a completely different codepath, but I would like both of them to switch to the "extract-glyphs-from-dvi" approach as well; I guess that may be part of @aitikgupta's GSoC (mplcairo uses that approach for everything).

In fact both xetex and luatex can generate output in a dvi-like format (xdv for xetex) with extensions to support specifying any font on the filesystem, so they should be usable for that purpose as well, but that would require quite a bit of reworking of dviread to support their extensions (I have some patches lying around exploring that idea). So while I certainly support the idea of moving to xetex/luatex, it would be quite a bit of work...
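
To make the latex -> dvi -> dvipng route concrete, here is a rough sketch that only builds the two command lines and does not run them. The function name `latex_to_png_commands` and the file naming are assumptions for illustration, and this is not a copy of what Matplotlib's TeX handling actually executes.

```python
import os


def latex_to_png_commands(texfile, dpi=100):
    """Build (but do not run) the commands for a tex -> dvi -> png conversion."""
    base, _ = os.path.splitext(texfile)
    dvifile = base + ".dvi"
    pngfile = base + ".png"
    return [
        # Plain `latex` writes a .dvi file, where pdflatex would write a .pdf.
        ["latex", "-interaction=nonstopmode", "--halt-on-error", texfile],
        # dvipng rasterizes the dvi; -T tight crops to the ink extents.
        ["dvipng", "-D", str(dpi), "-T", "tight", "-o", pngfile, dvifile],
    ]


for command in latex_to_png_commands("label.tex", dpi=200):
    print(" ".join(command))
```

Each list could be passed to subprocess.run if the TeX tools are installed. Swapping in xelatex or lualatex would change the intermediate format, which is where the dviread rework mentioned above comes in.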

Contributor

anntzer commented May 27, 2021

One simpler possibility, though, may be to move the user preamble to the top and rewrite all our \usepackage to the same format as used for textcomp (i.e. skip the usepackage if the package has already been loaded). This way users can always use \usepackage[utf8x]{inputenc} as custom preamble if they want to live dangerously.
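
A sketch of what that could look like, assuming the preamble is assembled as a plain string. `guarded_usepackage` and `build_preamble` are hypothetical names for illustration, not existing Matplotlib functions.

```python
def guarded_usepackage(package, options=""):
    """Build a usepackage line that is skipped if the package is already loaded."""
    opts = "[" + options + "]" if options else ""
    guard = r"\@ifpackageloaded{" + package + "}{}"
    load = r"{\usepackage" + opts + "{" + package + "}}"
    return r"\makeatletter" + guard + load + r"\makeatother"


def build_preamble(user_preamble, defaults):
    """Put the user preamble first, then the guarded default packages."""
    lines = [user_preamble] if user_preamble else []
    lines += [guarded_usepackage(pkg, opts) for pkg, opts in defaults]
    return "\n".join(lines)


print(build_preamble(
    r"\usepackage[utf8x]{inputenc}",              # user's choice wins
    [("inputenc", "utf8"), ("textcomp", "")],     # defaults, loaded only if absent
))
```

Because the user's \usepackage[utf8x]{inputenc} runs first, the guarded default for inputenc becomes a no-op instead of raising an option clash.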

Contributor

anntzer commented Oct 6, 2021

Looking at this again, one solution may be to just add a bunch of \DeclareUnicodeCharacter entries to our tex preamble, e.g.

import matplotlib.pyplot as plt
plt.rcParams["text.latex.preamble"] = r"\DeclareUnicodeCharacter{0394}{\ensuremath{\Delta}}"
plt.figtext(.5, .5, "Δ", usetex=True)

works. In fact we already mostly have a tex<->unicode mapping in _mathtext_data (tex2uni), so we could use that to generate the preamble. The main problem I can see is that this table actually contains too many entries, some of which (e.g. \minus) only work if specific packages are loaded; I think we should split that table into two parts, one of which corresponds to all glyphs that exist by default, without loading any extra tex packages. (From some archeology, the current table seems to have been autogenerated from http://www.ams.org/STIX/bnb/stix-tbl.ascii-2005-09-24; see https://discourse.matplotlib.org/t/unicode-to-tex-symbols-type1-names-and-vice-versa/5407 and https://discourse.matplotlib.org/t/mathtext-patch/5423)
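
Generating the preamble from such a table could look roughly like this. The sketch assumes a mapping from macro name (without the leading backslash) to integer code point; `TEX_TO_CODEPOINT` is a hand-picked stand-in for the "exists by default" half of the split table, and `unicode_declarations` is a hypothetical name.

```python
# Stand-in for a vetted subset of a tex -> code point table. These three
# entries are chosen by hand for illustration only.
TEX_TO_CODEPOINT = {
    "Delta": 0x0394,
    "alpha": 0x03B1,
    "pi": 0x03C0,
}


def unicode_declarations(table):
    """Return one DeclareUnicodeCharacter line per entry, sorted by code point."""
    lines = []
    for name, codepoint in sorted(table.items(), key=lambda item: item[1]):
        # inputenc expects the code point as uppercase hex, e.g. 0394.
        lines.append(
            r"\DeclareUnicodeCharacter{%04X}{\ensuremath{\%s}}" % (codepoint, name)
        )
    return "\n".join(lines)


print(unicode_declarations(TEX_TO_CODEPOINT))
```

The first printed line is the same declaration as in the manual example above, so the result could be appended to the tex preamble. Entries that need extra packages would have to be filtered out before this step.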
