MacOSX backend unicode problems in python 3.3 #1737

Closed
dougalsutherland opened this Issue Feb 4, 2013 · 8 comments

6 participants

@dougalsutherland

My matplotlib install on Python 3.3.0 seems to be cutting basically all strings in half in the MacOSX backend. This happens with both matplotlib 1.2.0 and the latest git; it doesn't happen on 2.7. I don't have a 3.2 install anymore, but I didn't see this problem with slightly earlier versions of matplotlib.

For example, if I simply run plt.xlabel('xlabel'), only "xla" shows up in the xlabel position. The same problem affects the tick labels, so a label that would normally read "0.2" shows just "0.". In general, the displayed string appears to be half the length of the passed string, rounded up. If I try to save the resulting figure through the backend, I get this exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.3/site-packages/matplotlib/backends/backend_macosx.py", line 467, in save_figure
    self.canvas.get_default_filename())
ValueError: character U+55002f is not in range [U+0000; U+10ffff]

The Qt4Agg backend seems normal, and matplotlib.test() succeeds with only known failures and irrelevant warnings.

I'm on OSX 10.8.2 with the most recent Xcode command line tools (clang --version says Apple LLVM version 4.2 (clang-425.0.24) (based on LLVM 3.2svn)); dependencies are installed via homebrew.

The full log of setup.py build is here, but only this rather suspicious-looking warning stands out:

src/_macosx.m:4979:51: warning: incompatible pointer types passing 'unichar *' (aka 'unsigned short *') to parameter of type
      'const Py_UNICODE *' (aka 'const int *') [-Wincompatible-pointer-types]
        PyObject* string =  PyUnicode_FromUnicode(buffer, n);
                                                  ^~~~~~
/usr/local/Cellar/python3/3.3.0/Frameworks/Python.framework/Versions/3.3/include/python3.3m/unicodeobject.h:702:23: note: passing argument to
      parameter 'u' here
    const Py_UNICODE *u,        /* Unicode buffer */
                      ^

Since the referenced line is in the function choose_save_file, this probably explains at least the problem with saving files. And since unichar is an unsigned short (2 bytes) while Py_UNICODE here is an int (4 bytes), the same mismatch seems likely to explain the general problem as well.
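The size mismatch can be reproduced byte for byte with Python's codecs. The sketch below is a simulation, not matplotlib code: it assumes Python 3.3 hands C 4-byte code units, that Cocoa's unichar is 2 bytes, and that the machine is little-endian (as x86 Macs are). The function names are hypothetical.

```python
# Simulation of the unichar / Py_UNICODE size mismatch.
# Assumptions: 4-byte Py_UNICODE, 2-byte unichar, little-endian byte order.


def ucs4_read_as_unichar(text):
    """Python -> Cocoa: n four-byte Py_UNICODE values read as n two-byte unichars."""
    raw = text.encode("utf-32-le")  # 4 bytes per character, low byte first
    units = raw[: 2 * len(text)]  # only n * 2 bytes are consumed
    # Every other 16-bit unit is the zero upper half of a code point; for
    # ASCII text, dropping those NULs leaves the first ceil(n / 2) characters.
    return units.decode("utf-16-le").replace("\x00", "")


def unichar_read_as_ucs4(text):
    """Cocoa -> Python: two-byte unichars read back as four-byte Py_UNICODE values.
    Returns only the first (misassembled) code point, formatted like the traceback."""
    raw = text.encode("utf-16-le")  # 2 bytes per UTF-16 code unit
    first = int.from_bytes(raw[:4], "little")  # two unichars fused into one value
    return "U+%x" % first


print(ucs4_read_as_unichar("xlabel"))  # xla
print(ucs4_read_as_unichar("0.2"))  # 0.
print(unichar_read_as_ucs4("/Users/"))  # U+55002f
```

The last line assumes the default save path starts with "/Users/". Under that assumption the fused value 0x0055002F ('U' in the high half, '/' in the low half) is exactly the out-of-range character reported in the traceback, and the first two lines reproduce the "xla" and "0." symptoms.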

This presumably broke in 3.3 because of something related to PEP 393, which notes that "The Py_UNICODE type is still supported but deprecated."

@mdboom
Matplotlib Developers member

I'm not a Mac user, so I'm only speculating based on reading the code, but it appears that _macosx.m assumes that unichar (from Apple's libraries) and Py_UNICODE (from Python) are the same size. That held for the narrow (2-byte) Python builds prior to 3.3, but with 3.3, Python went 4 bytes across the board (at least at the API level). I suspect what is needed is the appropriate UTF-16 to/from UCS-4 conversion everywhere Python talks to Cocoa.
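The conversion suggested here is more than a widening or narrowing cast, because characters outside the Basic Multilingual Plane occupy two UTF-16 units (a surrogate pair) but one UCS-4 value. The following is a minimal pure-Python sketch of both directions over lists of integers; the function names are hypothetical. In the C extension itself, Python's codec API (for example PyUnicode_DecodeUTF16) could do the UTF-16 to str step instead of a handwritten loop.

```python
def utf16_to_code_points(units):
    """Combine UTF-16 code units (e.g. a unichar buffer) into code points."""
    points = []
    i = 0
    while i < len(units):
        hi = units[i]
        is_pair = (
            0xD800 <= hi <= 0xDBFF
            and i + 1 < len(units)
            and 0xDC00 <= units[i + 1] <= 0xDFFF
        )
        if is_pair:
            lo = units[i + 1]
            points.append(0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00))
            i += 2
        else:
            points.append(hi)  # BMP character (unpaired surrogates pass through)
            i += 1
    return points


def code_points_to_utf16(points):
    """Split code points (UCS-4 values) into UTF-16 code units."""
    units = []
    for cp in points:
        if cp >= 0x10000:
            cp -= 0x10000
            units.append(0xD800 + (cp >> 10))  # high surrogate
            units.append(0xDC00 + (cp & 0x3FF))  # low surrogate
        else:
            units.append(cp)
    return units


text = "x\u00e9\U0001F600"  # ASCII, Latin-1 and a non-BMP emoji
points = [ord(c) for c in text]
units = code_points_to_utf16(points)
print(len(points), len(units))  # 3 4
print(utf16_to_code_points(units) == points)  # True
```

The length difference (3 code points vs. 4 UTF-16 units) is why any buffer length passed across the boundary has to be converted along with the data.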

@mdboom
Matplotlib Developers member

Any Mac users want to take this on?

@pelson
Matplotlib Developers member

> Any Mac users want to take this on?

@mdehoon is an obvious candidate, although that doesn't mean that others shouldn't step up. 😄

@mdehoon

I'd be happy if somebody else gives it a try. This doesn't sound like a very complicated bug, at least not one that would need much code reorganization. If nobody steps up over the next couple of months or so, I can have a look at it.

@bingjeff

I came across this yesterday and made a few changes to the v1.2.0 commit of _macosx.m that resolve the problems as far as I can tell. My disclaimer is that I have never worked with Objective-C and am just beginning with Python. My changes were to GraphicsContext_get_text_width_height_descent and GraphicsContext_draw_text: I changed PyArg_ParseTuple to pass back UTF-8 encoded strings (s# instead of u#).

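// "s#" yields the label as a UTF-8 buffer (text) plus its length in bytes (n)
if(!PyArg_ParseTuple(args, "ffs#Ofssf", &x, &y, &text, &n, &family, &size, &weight, &italic, &angle)) return NULL;
// Build the CFString from the NUL-terminated UTF-8 buffer
CFStringRef s = CFStringCreateWithCString(kCFAllocatorDefault, text, kCFStringEncodingUTF8);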

This would need a review, as I am not confident about best practices. I have opened a pull request with my changes in case it is useful to anyone.
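One detail worth checking in a review: on Python 3, "s#" hands C the UTF-8 encoding of a str, so n is a byte count, not a character count, and the two differ for non-ASCII labels. The sketch below mimics that in Python (the helper name is hypothetical). It suggests n should only be treated as a byte length, e.g. if the code switched to CFStringCreateWithBytes, and never as a number of characters.

```python
def parse_s_hash(text):
    """Hypothetical mimic of what PyArg_ParseTuple's "s#" yields for a str on
    Python 3: the UTF-8 encoded buffer and its length in bytes."""
    buf = text.encode("utf-8")
    return buf, len(buf)


for label in ["xlabel", "\u00e9t\u00e9", "\u03b1\u03b2"]:
    buf, n = parse_s_hash(label)
    # Character count and byte count only agree for pure-ASCII labels.
    print(repr(label), "chars:", len(label), "bytes:", n)
```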

@mdehoon

@bingjeff : Could you post a diff so we can have a look at your changes? (or better yet, create a github branch with the fix?)

@bingjeff

@mdehoon : Thanks for taking a look. I posted a pull request here: #1753

@mdehoon

@bingjeff : Thanks for providing the pull request. It seems to work fine for Python 3.3, but it breaks unicode handling for Python 2. For example, try running unicode_demo.py in examples/pylab_examples. So I think we need to add #ifdefs to GraphicsContext_draw_text to keep using the old code for Python 2.
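A plausible explanation for the Python 2 breakage (an assumption, not confirmed in this thread): on Python 2, "s#" converts a unicode argument using the interpreter's default encoding, which is normally ASCII, so any non-ASCII label fails before it reaches Cocoa. Meanwhile "u#" on a narrow Python 2 build already yields 2-byte data matching unichar, which is why the old code path is worth keeping behind a check such as #if PY_MAJOR_VERSION >= 3. The sketch below imitates that ASCII conversion; the helper name is hypothetical.

```python
def py2_s_hash(text, default_encoding="ascii"):
    """Hypothetical mimic of Python 2's "s#" on a unicode object: encode with
    the default encoding, raising if a character cannot be represented."""
    buf = text.encode(default_encoding)
    return buf, len(buf)


print(py2_s_hash("xlabel"))  # ASCII labels convert without trouble
try:
    py2_s_hash("\u00e9")  # a non-ASCII label, as in unicode_demo.py
except UnicodeEncodeError as exc:
    print("Python 2 s# would fail:", exc.reason)
```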

@dmcdougall dmcdougall was assigned Mar 29, 2013
@mdboom mdboom closed this May 20, 2013