Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected replacement of \right) with exlamation point in MathTextParser output #5210

Closed
rsnape opened this issue Oct 8, 2015 · 27 comments
Closed
Assignees
Milestone

Comments

@rsnape
Copy link
Contributor

rsnape commented Oct 8, 2015

This bug report originates from my attempts to answer this question on Stack Overflow. Minimal example thanks to the other answer there by Baptiste.


With certain LaTeX strings, the right paranthesis indicates by \right) is replaced with an exclamation point when passed to MathTextParser and rendered as a bitmap or png. This behaviour appears to occur when the parantheses are around an expression comprising multiple layered fractions. It does not occur if \right] or \right\} are used

The minimal code to reproduce this is as follows:

import matplotlib.mathtext as mt

s = r'$\left(\frac{\frac{\frac{M}{I}}{N}}' \
     r'{\frac{\frac{B}{U}}{G}}\right)$'
parser = mt.MathTextParser("Bitmap")
for size in range(1, 30):
    filename = "figure{0}.png".format(size)
    parser.to_png(filename, s, fontsize=size)

This gives output like

Image with exclamation in

Investigations indicate that this occurs only for certain combinations of expression and font size, but the example given above seems to give the most consistent failure conditions. It may be related to dpi in the rendering, however this has not yet been conclusively proven or disproven.

@rsnape
Copy link
Contributor Author

rsnape commented Oct 8, 2015

A bit more investigation - I monkey patched my matplotlib installation - putting in a print(result) directly after the call to parseString() here. With a working expression that goes fine and prints out a textual representation. With the bugged scenario, I see:

Traceback (most recent call last):
  File "D:\baptiste_test.py", line 9, in <module>
    parser.to_png(filename, s, fontsize=size)
  File "C:\Python27\lib\site-packages\matplotlib\mathtext.py", line 3101, in to_
png
    rgba, depth = self.to_rgba(texstr, color=color, dpi=dpi, fontsize=fontsize)
  File "C:\Python27\lib\site-packages\matplotlib\mathtext.py", line 3066, in to_
rgba
    x, depth = self.to_mask(texstr, dpi=dpi, fontsize=fontsize)
  File "C:\Python27\lib\site-packages\matplotlib\mathtext.py", line 3039, in to_
mask
    ftimage, depth = self.parse(texstr, dpi=dpi, prop=prop)
  File "C:\Python27\lib\site-packages\matplotlib\mathtext.py", line 3012, in par
se
    box = self._parser.parse(s, font_output, fontsize, dpi)
  File "C:\Python27\lib\site-packages\matplotlib\mathtext.py", line 2339, in par
se
    print(result[0])
  File "C:\Python27\lib\site-packages\matplotlib\mathtext.py", line 1403, in __r
epr__
    ' '.join([repr(x) for x in self.children]))
  File "C:\Python27\lib\site-packages\matplotlib\mathtext.py", line 1403, in __r
epr__
    ' '.join([repr(x) for x in self.children]))
  File "C:\Python27\lib\site-packages\matplotlib\mathtext.py", line 1403, in __r
epr__
    ' '.join([repr(x) for x in self.children]))
  File "C:\Python27\lib\site-packages\matplotlib\mathtext.py", line 1403, in __r
epr__
    ' '.join([repr(x) for x in self.children]))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xb3' in position 1:
ordinal not in range(128)

This indicates that the bug might originate in a mis-translated character - maybe a missing codepoint in the font?

@tacaswell tacaswell added this to the next bug fix release (2.0.1) milestone Oct 8, 2015
@tacaswell
Copy link
Member

attn @mdboom @zblz

@rsnape
Copy link
Contributor Author

rsnape commented Oct 8, 2015

For info - I just built off master and replicated this. It's also been replicated on the versions mentioned on SO - mpl 1.4.3 and 1.5.0. I don't think it's relevant, but I've seen it on both Windows and Linux.

@QuLogic
Copy link
Member

QuLogic commented Oct 8, 2015

Strangely, it works correctly at fontsize=5, but no other sizes.

@rsnape
Copy link
Contributor Author

rsnape commented Oct 8, 2015

@QuLogic ha - yes - I hadn't noticed that but you're right certainly on the build I just did from master

@mdboom mdboom self-assigned this Oct 8, 2015
@rsnape
Copy link
Contributor Author

rsnape commented Oct 9, 2015

I've done some further digging - sticking some debug prints in _get_glyph within the BakomaFonts class. In the failing case, the code seems to be looking up an exclamation (u'!') when you'd expect it to be looking for u'\xc4' and returning parenrightBigg (i.e. where the corresponding left bracket is looking up u'\xc3' and returning parenleftBigg). In situations where it only uses parenrightbigg, there is no problem (this occurs for fontsize=5 in the given example but no others). The debug line I put in _get_glyph was:

    print('Getting glyph for symbol',repr(unicode(sym)))
    print('Returning',cached_font, num, symbol_name, fontsize, slanted)

I guess whether it needs the bigg or Bigg version is based on a combination of fontsize and dpi

@rsnape
Copy link
Contributor Author

rsnape commented Oct 9, 2015

OK - I think problem is in this line : https://github.com/matplotlib/matplotlib/blob/master/lib/matplotlib/mathtext.py#L727

this reads

 ('ex', '\xb6'), ('ex', '\x21')],

the '\x21' is wrong - but I can't work out what the right one is '\x29' is close, but "too curved". Hopefully one of the core devs can easily lookup this number (decimal 195) against the glyph it renders and correct.

@zblz
Copy link
Member

zblz commented Oct 9, 2015

@rsnape: Your comment that \x29 was not the correct one had me thinking: a couple of lines underneath \x29 is given as a large size alternative for }, but it is obviously a large parens, no a bracket, and sticking \left\{ .... \right\} around your code gives this at size 5:
figure05

but this at larger sizes:
figure08

So, yeah, we should check that all those codes are correct. I don't know exactly how to do that, though; I haven found a list of the Bakoma glyphs anywhere. @mdboom?

@zblz
Copy link
Member

zblz commented Oct 9, 2015

In L735 of mathtext.py there is this comment:

# The fourth size of '[' is mysteriously missing from the BaKoMa
# font, so I've ommitted it for both '[' and ']'

It seemed to me that the fourth size was the one giving problems in {} and () as well, so removing them as initially done for [], I got:

figure11

at all sizes! Maybe we should consider removing the fourth size outright.

@rsnape
Copy link
Contributor Author

rsnape commented Oct 9, 2015

@zblz - my understanding is that 28 and 29 are just the basic paren glyphs and the \Bigg braces fall back on that. They actually give the same glyph as the lookup for '(' and ')' as far as I can tell. I'd like to find the table too - having done quite a bit of legwork to isolate, I'd like to submit a PR for this one :)

@rsnape
Copy link
Contributor Author

rsnape commented Oct 9, 2015

@zblz Yes - maybe a good call to remove them for all, unless it's easy to find exactly which are available and why. The removal solution seems better than falling back on () for braces too (IMHO)

@rsnape
Copy link
Contributor Author

rsnape commented Oct 9, 2015

Well - I found this which seems to show the lookup, but it does show 33 or 0x21 mapping to rightparenBigg - and 0x28 and 0x29 mapping to leftbraceBigg and rightbraceBigg respectively. This implies that the line I highlighted is correct. So maybe the bug is in the lookup or font definition, rather than that line mapping sized delimiters to glyphs.

_EDIT_ Pretty sure this may be relevant - in the ascii table 0x21 is !, 0x28 is ( and 0x29 is ). This looks like a lookup defaulting to ascii lookup rather than glyph index, I think.

@mdboom
Copy link
Member

mdboom commented Oct 9, 2015

To see the glyph table, I usually just open up the fonts in FontForge. You can open the glyph properties for a glyph and it will display the glyph index.

@zblz
Copy link
Member

zblz commented Oct 9, 2015

Looking at the cmex10.ttf shipped with matplotlib in FontForge, parenrightBigg is indeed 0x21, and the big brackets are 0x28 and 0x29, so the mapping in mathtext.py is correct. As you mention, there might be a bug in the lookup.

@mdboom
Copy link
Member

mdboom commented Oct 9, 2015

In the cmex10 font, parenrightBigg is actually mapped to ASCII code point x21 for "!". The source of the bug here is that "get_size_alternatives" returns "!" for this (which is actually correct), but then when the glyph is actually looked up (inside BakomaFonts._get_glyph), that "!" is going through the latex_to_bakoma mapping again and being mapped to the exclamation point glyph in cmr10, rather than remaining as the intended character from cmex10.

As for the fix, I think the thing to do is to add the special names for all of the sized delimiters to latex_to_bakoma (some, but not all are there), and then change the _size_alternatives mapping so that rather than returning individual ascii characters for a particular font (which can be ambiguous), it returns full LaTeX names. A fair bit of busy work in that solution, but I can't see a non-hacky alternative.

Thanks for finding this! It's been lurking there in plain sight for a long time!

@rsnape
Copy link
Contributor Author

rsnape commented Oct 12, 2015

I guess it depends how deeply you want to change this, but I have got a working version by adding to latex_to_bakoma

r'\parenleftBigg'           : ('cmex10', 88),
r'\parenrightBigg'          : ('cmex10', 126),
r'\bracketleftBigg'         : ('cmex10', 94),
r'\bracketrightBigg'        : ('cmex10', 132),
r'\braceleftBigg'           : ('cmex10', 108),
r'\bracerightBigg'          : ('cmex10', 77),

and in _size_alternatives

    '('          : [('rm', '('), ('ex', '\xa1'), ('ex', '\xb3'),
                    ('ex', '\xb5'), ('ex', r'\parenleftBigg')],
    ')'          : [('rm', ')'), ('ex', '\xa2'), ('ex', '\x2e'),
                    ('ex', '\xb6'), ('ex', r'\parenrightBigg')],
    '{'          : [('cal', '{'), ('ex', '\xa9'), ('ex', '\x6e'),
                    ('ex', '\xbd'), ('ex', r'\braceleftBigg')],
    '}'          : [('cal', '}'), ('ex', '\xaa'), ('ex', '\x6f'),
                    ('ex', '\xbe'), ('ex', r'\bracerightBigg')],
    # The fourth size of '[' is mysteriously missing from the BaKoMa
    # font, so I've ommitted it for both '[' and ']' <- COMMENT NO LONGER NEEDED
    '['          : [('rm', '['), ('ex', '\xa3'), ('ex', '\x68'),
                    ('ex', '\x22'), ('ex', r'\bracketleftBigg')],
    ']'          : [('rm', ']'), ('ex', '\xa4'), ('ex', '\x69'),
                    ('ex', '\x23'), ('ex', r'\bracketrightBigg')],

Tested locally. Happy to turn that into a PR if you like - can get rid of the caveat comment about brackets with that change, too - which I guess adds some commonality.

However - I understand if you want to go through doing this for _all_ LaTeX symbol names, or at least checking for potential ASCII clashes. If I'm honest - I'm not exactly sure why I don't see more of the clashes - e.g. I can't get \leftceil to produce an & that I might expect through the same process.

@zblz
Copy link
Member

zblz commented Oct 12, 2015

Yes, it's probably better if you turn this into a PR and we discuss the implementation there. Thanks!

@mdboom
Copy link
Member

mdboom commented Oct 12, 2015

Yes -- go ahead and make a PR. I think what you have above is the right idea. I'd like to do this for everything, though, not just the last in each set (just for consistency and to avoid falling into this trap in the future). And, for example, there is an analogous problem with the third size of ) which is mapped to .. Once doing that, one could probably remove the first part of all the tuple pairs (ex, rm, etc.) and just have the symbol names. That will require some Python code changes in a few places, however.

@mdboom
Copy link
Member

mdboom commented Oct 12, 2015

Also, I'm not sure I follow why you think the comment is no longer needed. I don't see 5 different sizes of open square brackets in the Bakoma fonts... It's bracketleftbigg that seems to be missing.

@rsnape
Copy link
Contributor Author

rsnape commented Oct 12, 2015

Yes - you're right - I had made a flawed assumption that it was the largest that was believed to be missing (counting the first size as zero) - I didn't notice that the last two in the list were already the Bigg versions. If I can find some time, I'll make a more considered PR.

@rsnape
Copy link
Contributor Author

rsnape commented Oct 13, 2015

@mdboom - \bracketleftbigg is defined - it's in an isolated point in the ttf downloaded from the repo - position 8729 == U+2219. Its counterpart bracketrightbigg is at 184 == U+00B8. I cannot imagine why the ordering is so strange - all the other defined glyphs finish at U+00C4.

@mdboom
Copy link
Member

mdboom commented Oct 13, 2015

Wow. Good find. Seems like a bug in the font, particularly since there's a missing glyph at U+00B7 where I expected it to be. I was going to suggest seeing if there was a newer version of the font available, but it hasn't been updated since 1995, so I think we can safely say we already are using the latest. There have been other Computer Modern TTF fonts made since then (e.g. http://cm-unicode.sourceforge.net/), but they use different codepoint layouts so moving to that is well beyond the scope of this PR.

I think using the character at U+2219 is a totally acceptable solution -- matplotlib should have no problem addressing a 16-bit codepoint like that (but who knows what broken assumption that may reveal).

Thanks for digging into this rat's nest!

@mdboom
Copy link
Member

mdboom commented Oct 27, 2015

@rsnape: Are you still planning on submitting a PR for this?

@rsnape
Copy link
Contributor Author

rsnape commented Oct 27, 2015

@mdboom Hi. I was, but I have been given a hard deadline to submit my PhD thesis by end of next week - so won't be able to get to it until late in November. I have a rough PR done, but haven't tested properly. If you can wait, I'll do it late November..

@mdboom
Copy link
Member

mdboom commented Oct 27, 2015

Cool, thanks. That's great.

@QuLogic
Copy link
Member

QuLogic commented Dec 11, 2016

This appears to be working as of latest v2.x; perhaps it is due to changing default fonts.

@QuLogic
Copy link
Member

QuLogic commented Feb 6, 2017

This is working in 2.0 for me, and there hasn't been any reply for 3 months, so I'm going to close this.

@QuLogic QuLogic closed this as completed Feb 6, 2017
@QuLogic QuLogic modified the milestones: 2.0 (style change major release), 2.0.1 (next bug fix release) Feb 6, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants