New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use io.open to enable encoding kwarg in Py2 open() #9692

Merged
merged 4 commits into from Aug 5, 2015

Conversation

Projects
None yet
6 participants
@bjodah
Copy link
Member

bjodah commented Jul 18, 2015

This was failing in my Python 2 jupyter notebook when using sympy git master:

from IPython.display import display, Latex
from sympy.interactive import printing
printing.init_printing()

from sympy import *
params = symbols('k l m'.split())
display(params)

encoding='utf-8' kwarg passed to open() was to blame. This commit should fix this bug and adds a test for sympy.printing.preview (maybe there was some other test(s) somewhere but I failed to find those).

Note the use of unicode_literals. There maybe gotchas associated with this method if people try to pass str objects (in e.g. packages or preamble kwarg in preview) which will not mix with unicode.

The encoding kwarg was added by @dustyrockpyle in 723289c introduced in #9108 which fixed #9107

@hargup

This comment has been minimized.

Copy link
Member

hargup commented Jul 27, 2015

@asmeurer maybe you can review this.

@bsdz

This comment has been minimized.

Copy link
Contributor

bsdz commented Jul 29, 2015

I can confirm this patch fixes encoding exceptions on Jupyter using IPython 4.0.0-dev / Python 2.7.10 when outputing latex.

@hargup

This comment has been minimized.

Copy link
Member

hargup commented Jul 29, 2015

I can confirm this patch fixes encoding exceptions on Jupyter using IPython 4.0.0-dev / Python 2.7.10 when outputing latex.

Thanks!

@asmeurer

This comment has been minimized.

Copy link
Member

asmeurer commented Jul 30, 2015

-1 to using unicode_literals.

@asmeurer

This comment has been minimized.

Copy link
Member

asmeurer commented Jul 30, 2015

Also, it's better to just import io and use io.open. There can be some bugs with open in Python 2.6 when using binary mode (see http://stackoverflow.com/a/4897088/161801), and indeed, some open calls in this file do use 'b'.

@asmeurer

This comment has been minimized.

Copy link
Member

asmeurer commented Jul 30, 2015

Why does the latex file need to be encoded utf-8? Are latex strings ever not ASCII? Would they even work (are we using XeTeX)?

@asmeurer

This comment has been minimized.

Copy link
Member

asmeurer commented Jul 30, 2015

And to reiterate, please don't use unicode_literals. from __future__ import unicode_literals is a terrible feature that breaks strings in Python 2. It sounds like a good idea, until you realize that unicode and str objects are incompatible with one another, and code that isn't using unicode_literals (or just basic standard Python functions) return str objects, not unicode.

If you just use str objects in Python 3, Unicode (read: non-ASCII) stuff sort of just works, as long as you decode('utf-8') or encode('utf-8') in the right places (Python 3 will guide you to what those "right" places are because decode and encode move from bytes to str, and it's impossible to mix the two without using them). For example (in Python 2):

In [14]: 'γ' + 'a'
Out[14]: '\xce\xb3a'

In [15]: print 'γ' + 'a'
γa

The 'γ' is being read in as literal bytes, which happen to be utf-8 bytes because that's what my terminal is using (I believe). I can decode it to get a unicode string:

In [17]: 'γ'.decode('utf-8')
Out[17]: u'\u03b3'

Note that Python 2 str sort of just handles the Unicode characters correctly, because it keeps them as bytes. That's fine, but what happens if you try to mix "bytes" strings (str) with encoded "unicode" strings (unicode):

In [13]: 'γ' + u'a'
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-13-a72ecf37f203> in <module>()
----> 1 'γ' + u'a'

UnicodeDecodeError: 'ascii' codec can't decode byte 0xce in position 0: ordinal not in range(128)

In Python 3 this would be like b'\xce\xb3' + 'a', except the reason that it can't work is more obvious (adding bytes and a string is like adding an array and a string).

Again, though, in Python 2, things sort of work out fine because everything is just "bytes" str objects, which, if they do happen to contain non-ASCII utf-8 characters, work out in the end when you decode them.

But this is all spoiled if code is using unicode_literals, because you end up trying to mix unicode strings with str bytes strings. unicode strings are like a virus. They infect everything they touch:

In [18]: u'a' + 'b'
Out[18]: u'ab'

In [19]: '%s' % u'a'
Out[19]: u'a'

What you invariably end up with is code breaking because code that would "just work" with bytes strings breaks because an "infected" unicode string gets pulled in from somewhere. You end up having to "sanitize" every. single. string. you encounter to make sure it is either all str (and bytes in Python 3, to make things work the best) or unicode (and str in Python 2). You might just think "oh, that's fine, we'll just use unicode_literals everywhere," but remember that functions in the Python standard library and builtins like repr all return str, not unicode.

Plus, those u prefixes on every string that is printed anywhere start to get real ugly.

SymPy doesn't deal very heavily with strings outside of the printing module, so it's unlikely to be a huge issue. But having dealt with this in a huge way with conda (to the point that conda is completely broken for people with non-ASCII usernames (which is common on Windows) in Python 2 because of requests' use of unicode_literals and I'm not going to even attempt to fix it), I want to warn against it.

io.open is great, but be aware that it is different from open in Python 2, and (apparently) has some issues in Python 2.6, which we still support.

@bjodah

This comment has been minimized.

Copy link
Member

bjodah commented Jul 30, 2015

Thank you @asmeurer for the feedback.
I tried to come up with a fallback solution using a try/except block.
In #9108 you wrote that you were able to verify the problem, I haven't managed to write a failing test case corresponding to #9107. The test I wrote here seem to work on Python 3.4 with SymPy 0.7.6. How did you verify #9107?

@bjodah bjodah force-pushed the bjodah:fix_open_encoding branch from eed38ae to aff09d1 Jul 30, 2015

@dustyrockpyle

This comment has been minimized.

Copy link
Contributor

dustyrockpyle commented Jul 30, 2015

Hi @bjodah - the error occurs when you use a unicode literal in python3 - In the #9107 test case it's actually using the greek letter nu, not the letter v (I probably should have picked a more distinct character...).

@bjodah

This comment has been minimized.

Copy link
Member

bjodah commented Jul 30, 2015

Hmm... Did #9107 get fixed going from IPython 3.0.0 to IPython 3.1.0?
I am unable to reproduce #9107 with IPython 3.1.0:

http://nbviewer.ipython.org/urls/gist.githubusercontent.com/bjodah/1708eae0fb1e27e7a5fa/raw/5209b9280056a6493d4b8fec0e438b7b2446236f/unable_to_reproduce_9107.ipynb

We really need to add tests when fixing issues...

@bjodah

This comment has been minimized.

Copy link
Member

bjodah commented Jul 30, 2015

@dustyrockpyle thank you for your feedback. I tried with ASCII and non-ASCII characters.
I am still unable to reproduce this even using IPython 3.0.0:

http://nbviewer.ipython.org/urls/gist.githubusercontent.com/bjodah/f9aa769e28c80e5b2bc0/raw/64741b142bd3689305e6dee3d75b927284cb30b2/unable_to_reproduce_9107_also_3.0.0.ipynb

edit: btw, the test contains an "alpha" character, i.e. #x3b1

@bsdz

This comment has been minimized.

Copy link
Contributor

bsdz commented Jul 30, 2015

For what it's worth, I reverted the patch and back to master and the issue re-appeared. I am using Jupyter with IPython 4.0.0-dev / Python 2.7.10.

from sympy import init_session
init_session(quiet=True, ipython=True) 
a, b = symbols("a b")
expr = (1-a*b)**2 - (1-((a+b)**2)/4)**2
expr

Which outputs the following error along with the correctly rendered latex:

IPython console for SymPy 0.7.7.dev (Python 2.7.10-32-bit) (ground types: python)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
C:\Python27\lib\site-packages\IPython\core\formatters.pyc in __call__(self, obj)
    335                 pass
    336             else:
--> 337                 return printer(obj)
    338             # Finally look for special method names
    339             method = _safe_get_formatter_method(obj, self.print_method)

C:\Python27\lib\site-packages\sympy-0.7.7.dev0-py2.7.egg\sympy\interactive\printing.pyc in _print_latex_png(o)
    121             s = latex(o, mode=latex_mode)
    122             try:
--> 123                 return _preview_wrapper(s)
    124             except RuntimeError:
    125                 if latex_mode != 'inline':

C:\Python27\lib\site-packages\sympy-0.7.7.dev0-py2.7.egg\sympy\interactive\printing.pyc in _preview_wrapper(o)
     75             preview(o, output='png', viewer='BytesIO',
     76                     outputbuffer=exprbuffer, preamble=preamble,
---> 77                     dvioptions=dvioptions)
     78         except Exception as e:
     79             # IPython swallows exceptions

C:\Python27\lib\site-packages\sympy-0.7.7.dev0-py2.7.egg\sympy\printing\preview.pyc in preview(expr, output, viewer, euler, packages, filename, outputbuffer, preamble, dvioptions, outputTexFile, **latex_settings)
    184         workdir = tempfile.mkdtemp()
    185 
--> 186         with open(join(workdir, 'texput.tex'), 'w', encoding='utf-8') as fh:
    187             fh.write(latex_main % latex_string)
    188 

TypeError: 'encoding' is an invalid keyword argument for this function

Here's a snapshot of the output from Jupyter.

mezer_07-30_14-58-42

@bjodah

This comment has been minimized.

Copy link
Member

bjodah commented Jul 30, 2015

@bsdz yes, this is the behaviour I got with master branch after 723289c, hence my PR.
Does this PR in its current state (aff09d1) solve the problem for you?

Now I am worried that my fix might reintroduce #9107 since I don't know what caused that issue in the first place. @dustyrockpyle and @asmeurer have both experienced #9107, but I am unable to reproduce it.

try:
with open(join(workdir, 'texput.tex'), 'w', encoding='utf-8') as fh:
fh.write(latex_main % latex_string)
except TypeError: # Python 2's open() does not support encoding

This comment has been minimized.

@asmeurer

asmeurer Jul 30, 2015

Member

I think it's better to do it as you had it before. import io at the top and use io.open here.

This comment has been minimized.

@bsdz

bsdz Jul 30, 2015

Contributor

@bjodah, I tested the new patch and I don't think it works as you expect. It will definitely use encoding support on Python 3+ where open defaults to io.open. However, in Python 2.7, one must import io before it overrides the default open method. That said, Python 2.6 doesn't support io.open at all. The io module only being released in 2.7 or greater.

I've replaced the whole open code as follows and this appears to work and should properly fallback to codec.open in Python 2.6 (this part untested).

        try:
            from io import open
            with open(join(workdir, 'texput.tex'), 'w', encoding='utf-8') as fh:
                fh.write(latex_main % latex_string)
        except:
            # use 2.6 codecs
            from codecs import open
            with open(join(workdir, 'texput.tex'), mode='w', encoding='utf-8') as fh:
                fh.write(latex_main % latex_string)
@asmeurer

This comment has been minimized.

Copy link
Member

asmeurer commented Jul 30, 2015

Going back to a question I asked above, why is the encoding call even there? Does this even work with non-ASCII characters (in Python 3) in the first place?

@bjodah

This comment has been minimized.

Copy link
Member

bjodah commented Jul 30, 2015

I was hoping you could tell me, you merged #9108 which added the encoding kwarg.
As you pointed out, only xetex supports unicode, not latex which we use.
I know very little about how sympy interacts with the IPython notebook, but I think whatever is used to render latex in the notebook supports unicode.
And I guess the call to preview is redundant since we still get the output (even though the exception is raised).

@asmeurer

This comment has been minimized.

Copy link
Member

asmeurer commented Jul 30, 2015

I see. If preview works for you (it doesn't actually work for me, other than using BytesIO), and you have latex installed, can you see what preview(Symbol(u'ν')) gives you? Really, I would expect latex(Symbol(''ν'')) to give '\\nu' (same as latex(Symbol('nu'))), but it just gives'ν'`.

If Unicode characters actually do work with preview, then this is needed. If not, then this probably needs to be fixed in another location.

@bjodah

This comment has been minimized.

Copy link
Member

bjodah commented Jul 30, 2015

@bsdz are you sure about that? https://docs.python.org/2.6/library/io.html#io.open
I don't think it will work for Py2 the way you wrote it, if we specify encoding='utf-8' we will need to pass
unicode to it (that was the reason I imported unicode_literals at the top, which I was hesitant about and @asmeurer adviced against)

@asmeurer

This comment has been minimized.

Copy link
Member

asmeurer commented Jul 30, 2015

unicode_literals wouldn't even convert the input to unicode. It only affects string literals. Use the u() function from sympy.core.compatibility if you need to ensure your string is unicode.

@bsdz

This comment has been minimized.

Copy link
Contributor

bsdz commented Jul 30, 2015

Mea culpa. Yes it is supported in 2.6. That makes it even easier. No need to try import. Just a simple import io would do. I believe later versions of LaTeX support Unicode.

@bjodah

This comment has been minimized.

Copy link
Member

bjodah commented Jul 30, 2015

Python 2.7.6, SymPy 0.7.6:

>>> import sympy
>>> sympy.__version__, sympy.__file__
('0.7.6', 'sympy/__init__.pyc')
>>> from sympy.printing.preview import preview
>>> preview(sympy.Symbol(u'ν'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "sympy/printing/preview.py", line 181, in preview
    latex_string = latex(expr, mode='inline', **latex_settings)
  File "sympy/printing/latex.py", line 1911, in latex
    return LatexPrinter(settings).doprint(expr)
  File "sympy/printing/latex.py", line 152, in doprint
    tex = Printer.doprint(self, expr)
  File "sympy/printing/printer.py", line 233, in doprint
    return self._str(self._print(expr))
UnicodeEncodeError: 'ascii' codec can't encode character u'\u03bd' in position 0: ordinal not in range(128)

Using Python 3.4, SymPy 0.7.6 the above code works (except the preview is a blank white image). In Python3 I also get back the unicode character from applying sympy.latex on our unicode Symbol:

>>> sympy.latex(sympy.Symbol(u'ν'))
'ν'

The reason for why the preview is blank is that latex (at least the version shipped with TeX Live 2013) doesn't support unicode. @bsdz I'd love to be proven wrong, I was trying to render this (this is as far as my LaTeX skills gets me) but I cannot make it work with either latex, pdflatex or xelatex:

\documentclass{article}
\usepackage{fixltx2e}[2006/09/13]
\usepackage[T1]{fontenc}
\usepackage{ifpdf,ifxetex}
\ifxetex
  \usepackage{fontspec}           % XeLaTeX specific
  %\setmainfont{Linux Libertine O}  % LaTeX default (Computer Modern Unicode)
\else
  \usepackage[utf8]{inputenc}     % support utf8 (if possible: use xetex)
  \usepackage{lmodern}
  \usepackage{newunicodechar}
\fi
\begin{document}
\begin{math}
  x = α^2
\end{math}
\end{document}
@bjodah

This comment has been minimized.

Copy link
Member

bjodah commented Jul 30, 2015

Maybe we should figure out why preview is called at all in the IPython notebook (is it maybe a fallback if mathjax isn't available?)
Should we make the latex printer "smart" in the sense it converts modern unicode to old-school latex code? (I am skeptical since it would mean that we need to add lots of code and eventually xetex or whatever comes after will support unicode).
As a temporary fix I try to use io.open and u().

@bsdz

This comment has been minimized.

Copy link
Contributor

bsdz commented Jul 30, 2015

@bjodah I think they introduced Unicode support last May. See http://latex-project.org/ltnews/ltnews21.pdf. I don't really use LaTeX much these days so I can't really comment how much it is used. If we're outputting Unicode TeX then perhaps the following preamble should be in the output.

\usepackage[utf8]{inputenc}  
@bjodah

This comment has been minimized.

Copy link
Member

bjodah commented Jul 30, 2015

@bsdz I see, thanks for the link. The preamble in my example should be correct with respect to the inputenc package.

regarding using u() - this was tricky:

In [4]: u(r'\usepackage{amsfont}')
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-4-776101d04d06> in <module>()
----> 1 u(r'\usepackage{amsfont}')

/home/bjorn/vc/sympy/sympy/core/compatibility.pyc in u(x)
    107     unichr = unichr
    108     def u(x):
--> 109         return codecs.unicode_escape_decode(x)[0]
    110     def u_decode(x):
    111         return x.decode('utf-8')

UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in position 0-1: truncated \uXXXX escape

I did a .replace(r'\u', r'\\u') to escape latex's \u

@hargup hargup referenced this pull request Aug 2, 2015

Merged

Refactor code, add README #12

@asmeurer

This comment has been minimized.

Copy link
Member

asmeurer commented Aug 5, 2015

Preview is a fallback from mathjax, for things like the qtconsole which don't support it. The way the notebook works, it generates all renderings of an expression even if they aren't used by the current frontend.

@bjodah

This comment has been minimized.

Copy link
Member

bjodah commented Aug 5, 2015

OK, then I think this is good for now. What do you think?
I could rebase away intermediate commits.

@asmeurer

This comment has been minimized.

Copy link
Member

asmeurer commented Aug 5, 2015

No, don't rebase.

fh.write(latex_main % latex_string)
with io.open(join(workdir, 'texput.tex'), 'w', encoding='utf-8') as fh:
rendered = latex_main % latex_string
fh.write(u(rendered.replace(r'\u', r'\\u')))

This comment has been minimized.

@asmeurer

asmeurer Aug 5, 2015

Member

What is this doing?

This comment has been minimized.

@bjodah

bjodah Aug 5, 2015

Member

u() parses \uXXXX as unicode. hence this line guards LaTeX commands such as \usepackage

This comment has been minimized.

@thisch

thisch Sep 5, 2015

Contributor

I have to remove this call to .replace(). Otherwise I get the following latex error: LaTeX Error: There's no line here to end., which is due to the additional backslash before the \usepackage command.

(Tested on python3.4, ipython 4.0)

This comment has been minimized.

@thisch

thisch Sep 5, 2015

Contributor

It looks like #9801 might fix the LaTex Error as well.

This comment has been minimized.

@bjodah

bjodah Sep 5, 2015

Member

Have you tried that branch?

This comment has been minimized.

@thisch

thisch Sep 5, 2015

Contributor

I have just tested the branch and it doesn't seem to work. I get the same error message as before.

here is my test example:

import os
os.environ['SYMPY_DEBUG'] = 'True'
from IPython.display import display, Latex
from sympy.interactive import printing
printing.init_printing(use_latex='png', forecolor='Red')
from sympy import *
params = symbols('k l m'.split())
display(params)

This comment has been minimized.

@thisch

thisch Sep 5, 2015

Contributor

and here the content of the generated tex file

\\documentclass[10pt]{article}
\\pagestyle{empty}
\\usepackage{amsmath,amsfonts}\\begin{document}
\\begin{equation*}\\left [ k, \\quad l, \\quad m\\right ]\\end{equation*}

\\end{document}

As I've written above, by removing the .replace() code, preview generates a valid tex file.

@bjodah

This comment has been minimized.

Copy link
Member

bjodah commented Aug 5, 2015

I won't. Any other changes needed?

@asmeurer

This comment has been minimized.

Copy link
Member

asmeurer commented Aug 5, 2015

The \u thing seems weird, but assuming you tested it, I'm +1.

asmeurer added a commit that referenced this pull request Aug 5, 2015

Merge pull request #9692 from bjodah/fix_open_encoding
Use io.open to enable encoding kwarg in Py2 open()

@asmeurer asmeurer merged commit 21b106e into sympy:master Aug 5, 2015

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

@bjodah bjodah deleted the bjodah:fix_open_encoding branch Aug 5, 2015

@bjodah

This comment has been minimized.

Copy link
Member

bjodah commented Aug 5, 2015

Thank you for the review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment