Use io.open to enable encoding kwarg in Py2 open() #9692

Merged
merged 4 commits into from Aug 5, 2015

Projects

None yet

6 participants

@bjodah
Member
bjodah commented Jul 18, 2015

This was failing in my Python 2 jupyter notebook when using sympy git master:

from IPython.display import display, Latex
from sympy.interactive import printing
printing.init_printing()

from sympy import *
params = symbols('k l m'.split())
display(params)

encoding='utf-8' kwarg passed to open() was to blame. This commit should fix this bug and adds a test for sympy.printing.preview (maybe there was some other test(s) somewhere but I failed to find those).

Note the use of unicode_literals. There maybe gotchas associated with this method if people try to pass str objects (in e.g. packages or preamble kwarg in preview) which will not mix with unicode.

The encoding kwarg was added by @dustyrockpyle in 723289c introduced in #9108 which fixed #9107

@hargup
Member
hargup commented Jul 27, 2015

@asmeurer maybe you can review this.

@bsdz
Contributor
bsdz commented Jul 29, 2015

I can confirm this patch fixes encoding exceptions on Jupyter using IPython 4.0.0-dev / Python 2.7.10 when outputing latex.

@hargup
Member
hargup commented Jul 29, 2015

I can confirm this patch fixes encoding exceptions on Jupyter using IPython 4.0.0-dev / Python 2.7.10 when outputing latex.

Thanks!

@asmeurer
Member

-1 to using unicode_literals.

@asmeurer
Member

Also, it's better to just import io and use io.open. There can be some bugs with open in Python 2.6 when using binary mode (see http://stackoverflow.com/a/4897088/161801), and indeed, some open calls in this file do use 'b'.

@asmeurer
Member

Why does the latex file need to be encoded utf-8? Are latex strings ever not ASCII? Would they even work (are we using XeTeX)?

@asmeurer
Member
asmeurer commented Jul 30, 2015 edited

And to reiterate, please don't use unicode_literals. from __future__ import unicode_literals is a terrible feature that breaks strings in Python 2. It sounds like a good idea, until you realize that unicode and str objects are incompatible with one another, and code that isn't using unicode_literals (or just basic standard Python functions) return str objects, not unicode.

If you just use str objects in Python 3, Unicode (read: non-ASCII) stuff sort of just works, as long as you decode('utf-8') or encode('utf-8') in the right places (Python 3 will guide you to what those "right" places are because decode and encode move from bytes to str, and it's impossible to mix the two without using them). For example (in Python 2):

In [14]: 'γ' + 'a'
Out[14]: '\xce\xb3a'

In [15]: print 'γ' + 'a'
γa

The 'γ' is being read in as literal bytes, which happen to be utf-8 bytes because that's what my terminal is using (I believe). I can decode it to get a unicode string:

In [17]: 'γ'.decode('utf-8')
Out[17]: u'\u03b3'

Note that Python 2 str sort of just handles the Unicode characters correctly, because it keeps them as bytes. That's fine, but what happens if you try to mix "bytes" strings (str) with encoded "unicode" strings (unicode):

In [13]: 'γ' + u'a'
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-13-a72ecf37f203> in <module>()
----> 1 'γ' + u'a'

UnicodeDecodeError: 'ascii' codec can't decode byte 0xce in position 0: ordinal not in range(128)

In Python 3 this would be like b'\xce\xb3' + 'a', except the reason that it can't work is more obvious (adding bytes and a string is like adding an array and a string).

Again, though, in Python 2, things sort of work out fine because everything is just "bytes" str objects, which, if they do happen to contain non-ASCII utf-8 characters, work out in the end when you decode them.

But this is all spoiled if code is using unicode_literals, because you end up trying to mix unicode strings with str bytes strings. unicode strings are like a virus. They infect everything they touch:

In [18]: u'a' + 'b'
Out[18]: u'ab'

In [19]: '%s' % u'a'
Out[19]: u'a'

What you invariably end up with is code breaking because code that would "just work" with bytes strings breaks because an "infected" unicode string gets pulled in from somewhere. You end up having to "sanitize" every. single. string. you encounter to make sure it is either all str (and bytes in Python 3, to make things work the best) or unicode (and str in Python 2). You might just think "oh, that's fine, we'll just use unicode_literals everywhere," but remember that functions in the Python standard library and builtins like repr all return str, not unicode.

Plus, those u prefixes on every string that is printed anywhere start to get real ugly.

SymPy doesn't deal very heavily with strings outside of the printing module, so it's unlikely to be a huge issue. But having dealt with this in a huge way with conda (to the point that conda is completely broken for people with non-ASCII usernames (which is common on Windows) in Python 2 because of requests' use of unicode_literals and I'm not going to even attempt to fix it), I want to warn against it.

io.open is great, but be aware that it is different from open in Python 2, and (apparently) has some issues in Python 2.6, which we still support.

@bjodah
Member
bjodah commented Jul 30, 2015

Thank you @asmeurer for the feedback.
I tried to come up with a fallback solution using a try/except block.
In #9108 you wrote that you were able to verify the problem, I haven't managed to write a failing test case corresponding to #9107. The test I wrote here seem to work on Python 3.4 with SymPy 0.7.6. How did you verify #9107?

@dustyrockpyle
Contributor

Hi @bjodah - the error occurs when you use a unicode literal in python3 - In the #9107 test case it's actually using the greek letter nu, not the letter v (I probably should have picked a more distinct character...).

@bjodah
Member
bjodah commented Jul 30, 2015

Hmm... Did #9107 get fixed going from IPython 3.0.0 to IPython 3.1.0?
I am unable to reproduce #9107 with IPython 3.1.0:

http://nbviewer.ipython.org/urls/gist.githubusercontent.com/bjodah/1708eae0fb1e27e7a5fa/raw/5209b9280056a6493d4b8fec0e438b7b2446236f/unable_to_reproduce_9107.ipynb

We really need to add tests when fixing issues...

@bjodah
Member
bjodah commented Jul 30, 2015

@dustyrockpyle thank you for your feedback. I tried with ASCII and non-ASCII characters.
I am still unable to reproduce this even using IPython 3.0.0:

http://nbviewer.ipython.org/urls/gist.githubusercontent.com/bjodah/f9aa769e28c80e5b2bc0/raw/64741b142bd3689305e6dee3d75b927284cb30b2/unable_to_reproduce_9107_also_3.0.0.ipynb

edit: btw, the test contains an "alpha" character, i.e. #x3b1

@bsdz
Contributor
bsdz commented Jul 30, 2015

For what it's worth, I reverted the patch and back to master and the issue re-appeared. I am using Jupyter with IPython 4.0.0-dev / Python 2.7.10.

from sympy import init_session
init_session(quiet=True, ipython=True) 
a, b = symbols("a b")
expr = (1-a*b)**2 - (1-((a+b)**2)/4)**2
expr

Which outputs the following error along with the correctly rendered latex:

IPython console for SymPy 0.7.7.dev (Python 2.7.10-32-bit) (ground types: python)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
C:\Python27\lib\site-packages\IPython\core\formatters.pyc in __call__(self, obj)
    335                 pass
    336             else:
--> 337                 return printer(obj)
    338             # Finally look for special method names
    339             method = _safe_get_formatter_method(obj, self.print_method)

C:\Python27\lib\site-packages\sympy-0.7.7.dev0-py2.7.egg\sympy\interactive\printing.pyc in _print_latex_png(o)
    121             s = latex(o, mode=latex_mode)
    122             try:
--> 123                 return _preview_wrapper(s)
    124             except RuntimeError:
    125                 if latex_mode != 'inline':

C:\Python27\lib\site-packages\sympy-0.7.7.dev0-py2.7.egg\sympy\interactive\printing.pyc in _preview_wrapper(o)
     75             preview(o, output='png', viewer='BytesIO',
     76                     outputbuffer=exprbuffer, preamble=preamble,
---> 77                     dvioptions=dvioptions)
     78         except Exception as e:
     79             # IPython swallows exceptions

C:\Python27\lib\site-packages\sympy-0.7.7.dev0-py2.7.egg\sympy\printing\preview.pyc in preview(expr, output, viewer, euler, packages, filename, outputbuffer, preamble, dvioptions, outputTexFile, **latex_settings)
    184         workdir = tempfile.mkdtemp()
    185 
--> 186         with open(join(workdir, 'texput.tex'), 'w', encoding='utf-8') as fh:
    187             fh.write(latex_main % latex_string)
    188 

TypeError: 'encoding' is an invalid keyword argument for this function

Here's a snapshot of the output from Jupyter.

mezer_07-30_14-58-42

@bjodah
Member
bjodah commented Jul 30, 2015

@bsdz yes, this is the behaviour I got with master branch after 723289c, hence my PR.
Does this PR in its current state (aff09d1) solve the problem for you?

Now I am worried that my fix might reintroduce #9107 since I don't know what caused that issue in the first place. @dustyrockpyle and @asmeurer have both experienced #9107, but I am unable to reproduce it.

@asmeurer asmeurer and 1 other commented on an outdated diff Jul 30, 2015
sympy/printing/preview.py
@@ -183,8 +183,12 @@ def preview(expr, output='png', viewer=None, euler=True, packages=(),
try:
workdir = tempfile.mkdtemp()
- with open(join(workdir, 'texput.tex'), 'w', encoding='utf-8') as fh:
- fh.write(latex_main % latex_string)
+ try:
+ with open(join(workdir, 'texput.tex'), 'w', encoding='utf-8') as fh:
+ fh.write(latex_main % latex_string)
+ except TypeError: # Python 2's open() does not support encoding
@asmeurer
asmeurer Jul 30, 2015 Member

I think it's better to do it as you had it before. import io at the top and use io.open here.

@bsdz
bsdz Jul 30, 2015 Contributor

@bjodah, I tested the new patch and I don't think it works as you expect. It will definitely use encoding support on Python 3+ where open defaults to io.open. However, in Python 2.7, one must import io before it overrides the default open method. That said, Python 2.6 doesn't support io.open at all. The io module only being released in 2.7 or greater.

I've replaced the whole open code as follows and this appears to work and should properly fallback to codec.open in Python 2.6 (this part untested).

        try:
            from io import open
            with open(join(workdir, 'texput.tex'), 'w', encoding='utf-8') as fh:
                fh.write(latex_main % latex_string)
        except:
            # use 2.6 codecs
            from codecs import open
            with open(join(workdir, 'texput.tex'), mode='w', encoding='utf-8') as fh:
                fh.write(latex_main % latex_string)
@asmeurer
Member

Going back to a question I asked above, why is the encoding call even there? Does this even work with non-ASCII characters (in Python 3) in the first place?

@bjodah
Member
bjodah commented Jul 30, 2015

I was hoping you could tell me, you merged #9108 which added the encoding kwarg.
As you pointed out, only xetex supports unicode, not latex which we use.
I know very little about how sympy interacts with the IPython notebook, but I think whatever is used to render latex in the notebook supports unicode.
And I guess the call to preview is redundant since we still get the output (even though the exception is raised).

@asmeurer
Member

I see. If preview works for you (it doesn't actually work for me, other than using BytesIO), and you have latex installed, can you see what preview(Symbol(u'ν')) gives you? Really, I would expect latex(Symbol(''ν'')) to give '\\nu' (same as latex(Symbol('nu'))), but it just gives'ν'`.

If Unicode characters actually do work with preview, then this is needed. If not, then this probably needs to be fixed in another location.

@bjodah
Member
bjodah commented Jul 30, 2015

@bsdz are you sure about that? https://docs.python.org/2.6/library/io.html#io.open
I don't think it will work for Py2 the way you wrote it, if we specify encoding='utf-8' we will need to pass
unicode to it (that was the reason I imported unicode_literals at the top, which I was hesitant about and @asmeurer adviced against)

@asmeurer
Member

unicode_literals wouldn't even convert the input to unicode. It only affects string literals. Use the u() function from sympy.core.compatibility if you need to ensure your string is unicode.

@bsdz
Contributor
bsdz commented Jul 30, 2015

Mea culpa. Yes it is supported in 2.6. That makes it even easier. No need to try import. Just a simple import io would do. I believe later versions of LaTeX support Unicode.

@bjodah
Member
bjodah commented Jul 30, 2015

Python 2.7.6, SymPy 0.7.6:

>>> import sympy
>>> sympy.__version__, sympy.__file__
('0.7.6', 'sympy/__init__.pyc')
>>> from sympy.printing.preview import preview
>>> preview(sympy.Symbol(u'ν'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "sympy/printing/preview.py", line 181, in preview
    latex_string = latex(expr, mode='inline', **latex_settings)
  File "sympy/printing/latex.py", line 1911, in latex
    return LatexPrinter(settings).doprint(expr)
  File "sympy/printing/latex.py", line 152, in doprint
    tex = Printer.doprint(self, expr)
  File "sympy/printing/printer.py", line 233, in doprint
    return self._str(self._print(expr))
UnicodeEncodeError: 'ascii' codec can't encode character u'\u03bd' in position 0: ordinal not in range(128)

Using Python 3.4, SymPy 0.7.6 the above code works (except the preview is a blank white image). In Python3 I also get back the unicode character from applying sympy.latex on our unicode Symbol:

>>> sympy.latex(sympy.Symbol(u'ν'))
'ν'

The reason for why the preview is blank is that latex (at least the version shipped with TeX Live 2013) doesn't support unicode. @bsdz I'd love to be proven wrong, I was trying to render this (this is as far as my LaTeX skills gets me) but I cannot make it work with either latex, pdflatex or xelatex:

\documentclass{article}
\usepackage{fixltx2e}[2006/09/13]
\usepackage[T1]{fontenc}
\usepackage{ifpdf,ifxetex}
\ifxetex
  \usepackage{fontspec}           % XeLaTeX specific
  %\setmainfont{Linux Libertine O}  % LaTeX default (Computer Modern Unicode)
\else
  \usepackage[utf8]{inputenc}     % support utf8 (if possible: use xetex)
  \usepackage{lmodern}
  \usepackage{newunicodechar}
\fi
\begin{document}
\begin{math}
  x = α^2
\end{math}
\end{document}
@bjodah
Member
bjodah commented Jul 30, 2015

Maybe we should figure out why preview is called at all in the IPython notebook (is it maybe a fallback if mathjax isn't available?)
Should we make the latex printer "smart" in the sense it converts modern unicode to old-school latex code? (I am skeptical since it would mean that we need to add lots of code and eventually xetex or whatever comes after will support unicode).
As a temporary fix I try to use io.open and u().

@bsdz
Contributor
bsdz commented Jul 30, 2015

@bjodah I think they introduced Unicode support last May. See http://latex-project.org/ltnews/ltnews21.pdf. I don't really use LaTeX much these days so I can't really comment how much it is used. If we're outputting Unicode TeX then perhaps the following preamble should be in the output.

\usepackage[utf8]{inputenc}  
@bjodah
Member
bjodah commented Jul 30, 2015

@bsdz I see, thanks for the link. The preamble in my example should be correct with respect to the inputenc package.

regarding using u() - this was tricky:

In [4]: u(r'\usepackage{amsfont}')
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-4-776101d04d06> in <module>()
----> 1 u(r'\usepackage{amsfont}')

/home/bjorn/vc/sympy/sympy/core/compatibility.pyc in u(x)
    107     unichr = unichr
    108     def u(x):
--> 109         return codecs.unicode_escape_decode(x)[0]
    110     def u_decode(x):
    111         return x.decode('utf-8')

UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in position 0-1: truncated \uXXXX escape

I did a .replace(r'\u', r'\\u') to escape latex's \u

@hargup hargup referenced this pull request in hargup/WIGI-website Aug 2, 2015
Merged

Refactor code, add README #12

@asmeurer
Member
asmeurer commented Aug 5, 2015

Preview is a fallback from mathjax, for things like the qtconsole which don't support it. The way the notebook works, it generates all renderings of an expression even if they aren't used by the current frontend.

@bjodah
Member
bjodah commented Aug 5, 2015

OK, then I think this is good for now. What do you think?
I could rebase away intermediate commits.

@asmeurer
Member
asmeurer commented Aug 5, 2015

No, don't rebase.

@asmeurer asmeurer commented on the diff Aug 5, 2015
sympy/printing/preview.py
@@ -183,8 +186,9 @@ def preview(expr, output='png', viewer=None, euler=True, packages=(),
try:
workdir = tempfile.mkdtemp()
- with open(join(workdir, 'texput.tex'), 'w', encoding='utf-8') as fh:
- fh.write(latex_main % latex_string)
+ with io.open(join(workdir, 'texput.tex'), 'w', encoding='utf-8') as fh:
+ rendered = latex_main % latex_string
+ fh.write(u(rendered.replace(r'\u', r'\\u')))
@asmeurer
asmeurer Aug 5, 2015 Member

What is this doing?

@bjodah
bjodah Aug 5, 2015 Member

u() parses \uXXXX as unicode. hence this line guards LaTeX commands such as \usepackage

@thisch
thisch Sep 5, 2015 Contributor

I have to remove this call to .replace(). Otherwise I get the following latex error: LaTeX Error: There's no line here to end., which is due to the additional backslash before the \usepackage command.

(Tested on python3.4, ipython 4.0)

@thisch
thisch Sep 5, 2015 Contributor

It looks like #9801 might fix the LaTex Error as well.

@bjodah
bjodah Sep 5, 2015 Member

Have you tried that branch?

@thisch
thisch Sep 5, 2015 Contributor

I have just tested the branch and it doesn't seem to work. I get the same error message as before.

here is my test example:

import os
os.environ['SYMPY_DEBUG'] = 'True'
from IPython.display import display, Latex
from sympy.interactive import printing
printing.init_printing(use_latex='png', forecolor='Red')
from sympy import *
params = symbols('k l m'.split())
display(params)
@thisch
thisch Sep 5, 2015 Contributor

and here the content of the generated tex file

\\documentclass[10pt]{article}
\\pagestyle{empty}
\\usepackage{amsmath,amsfonts}\\begin{document}
\\begin{equation*}\\left [ k, \\quad l, \\quad m\\right ]\\end{equation*}

\\end{document}

As I've written above, by removing the .replace() code, preview generates a valid tex file.

@bjodah
Member
bjodah commented Aug 5, 2015

I won't. Any other changes needed?

@asmeurer
Member
asmeurer commented Aug 5, 2015

The \u thing seems weird, but assuming you tested it, I'm +1.

@asmeurer asmeurer merged commit 21b106e into sympy:master Aug 5, 2015

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
@bjodah bjodah deleted the bjodah:fix_open_encoding branch Aug 5, 2015
@bjodah
Member
bjodah commented Aug 5, 2015

Thank you for the review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment