# Use io.open to enable encoding kwarg in Py2 open() #9692

Merged
merged 4 commits into from Aug 5, 2015

## Conversation

Projects
None yet
6 participants
Member

### bjodah commented Jul 18, 2015

 This was failing in my Python 2 jupyter notebook when using sympy git master: from IPython.display import display, Latex from sympy.interactive import printing printing.init_printing() from sympy import * params = symbols('k l m'.split()) display(params)  encoding='utf-8' kwarg passed to open() was to blame. This commit should fix this bug and adds a test for sympy.printing.preview (maybe there was some other test(s) somewhere but I failed to find those). Note the use of unicode_literals. There maybe gotchas associated with this method if people try to pass str objects (in e.g. packages or preamble kwarg in preview) which will not mix with unicode. The encoding kwarg was added by @dustyrockpyle in 723289c introduced in #9108 which fixed #9107

### bjodah added some commits Jul 18, 2015

 Use io.open to enable encoding kwarg in Py2 open() 
 7e9d14b 
 Allow preview to raise RuntimeError in test_preview. 
 efd8b63 

Member

### hargup commented Jul 27, 2015

 @asmeurer maybe you can review this.
Contributor

### bsdz commented Jul 29, 2015

 I can confirm this patch fixes encoding exceptions on Jupyter using IPython 4.0.0-dev / Python 2.7.10 when outputing latex.
Member

### hargup commented Jul 29, 2015

 I can confirm this patch fixes encoding exceptions on Jupyter using IPython 4.0.0-dev / Python 2.7.10 when outputing latex. Thanks!
Member

### asmeurer commented Jul 30, 2015

 -1 to using unicode_literals.
Member

### asmeurer commented Jul 30, 2015

 Also, it's better to just import io and use io.open. There can be some bugs with open in Python 2.6 when using binary mode (see http://stackoverflow.com/a/4897088/161801), and indeed, some open calls in this file do use 'b'.
Member

### asmeurer commented Jul 30, 2015

 Why does the latex file need to be encoded utf-8? Are latex strings ever not ASCII? Would they even work (are we using XeTeX)?
Member

### asmeurer commented Jul 30, 2015 • edited

 And to reiterate, please don't use unicode_literals. from __future__ import unicode_literals is a terrible feature that breaks strings in Python 2. It sounds like a good idea, until you realize that unicode and str objects are incompatible with one another, and code that isn't using unicode_literals (or just basic standard Python functions) return str objects, not unicode. If you just use str objects in Python 3, Unicode (read: non-ASCII) stuff sort of just works, as long as you decode('utf-8') or encode('utf-8') in the right places (Python 3 will guide you to what those "right" places are because decode and encode move from bytes to str, and it's impossible to mix the two without using them). For example (in Python 2): In [14]: 'γ' + 'a' Out[14]: '\xce\xb3a' In [15]: print 'γ' + 'a' γa The 'γ' is being read in as literal bytes, which happen to be utf-8 bytes because that's what my terminal is using (I believe). I can decode it to get a unicode string: In [17]: 'γ'.decode('utf-8') Out[17]: u'\u03b3' Note that Python 2 str sort of just handles the Unicode characters correctly, because it keeps them as bytes. That's fine, but what happens if you try to mix "bytes" strings (str) with encoded "unicode" strings (unicode): In [13]: 'γ' + u'a' --------------------------------------------------------------------------- UnicodeDecodeError Traceback (most recent call last) in () ----> 1 'γ' + u'a' UnicodeDecodeError: 'ascii' codec can't decode byte 0xce in position 0: ordinal not in range(128)  In Python 3 this would be like b'\xce\xb3' + 'a', except the reason that it can't work is more obvious (adding bytes and a string is like adding an array and a string). Again, though, in Python 2, things sort of work out fine because everything is just "bytes" str objects, which, if they do happen to contain non-ASCII utf-8 characters, work out in the end when you decode them. But this is all spoiled if code is using unicode_literals, because you end up trying to mix unicode strings with str bytes strings. unicode strings are like a virus. They infect everything they touch: In [18]: u'a' + 'b' Out[18]: u'ab' In [19]: '%s' % u'a' Out[19]: u'a' What you invariably end up with is code breaking because code that would "just work" with bytes strings breaks because an "infected" unicode string gets pulled in from somewhere. You end up having to "sanitize" every. single. string. you encounter to make sure it is either all str (and bytes in Python 3, to make things work the best) or unicode (and str in Python 2). You might just think "oh, that's fine, we'll just use unicode_literals everywhere," but remember that functions in the Python standard library and builtins like repr all return str, not unicode. Plus, those u prefixes on every string that is printed anywhere start to get real ugly. SymPy doesn't deal very heavily with strings outside of the printing module, so it's unlikely to be a huge issue. But having dealt with this in a huge way with conda (to the point that conda is completely broken for people with non-ASCII usernames (which is common on Windows) in Python 2 because of requests' use of unicode_literals and I'm not going to even attempt to fix it), I want to warn against it. io.open is great, but be aware that it is different from open in Python 2, and (apparently) has some issues in Python 2.6, which we still support.

Member

### bjodah commented Jul 30, 2015

 Thank you @asmeurer for the feedback. I tried to come up with a fallback solution using a try/except block. In #9108 you wrote that you were able to verify the problem, I haven't managed to write a failing test case corresponding to #9107. The test I wrote here seem to work on Python 3.4 with SymPy 0.7.6. How did you verify #9107?

### bjodahforce-pushed the bjodah:fix_open_encoding branch from eed38ae to aff09d1Jul 30, 2015

 Use try/except for open in preview. Test issue 9107. 
 aff09d1 
Contributor

### dustyrockpyle commented Jul 30, 2015

 Hi @bjodah - the error occurs when you use a unicode literal in python3 - In the #9107 test case it's actually using the greek letter nu, not the letter v (I probably should have picked a more distinct character...).
Member

### bjodah commented Jul 30, 2015

 Hmm... Did #9107 get fixed going from IPython 3.0.0 to IPython 3.1.0? I am unable to reproduce #9107 with IPython 3.1.0: http://nbviewer.ipython.org/urls/gist.githubusercontent.com/bjodah/1708eae0fb1e27e7a5fa/raw/5209b9280056a6493d4b8fec0e438b7b2446236f/unable_to_reproduce_9107.ipynb We really need to add tests when fixing issues...
Member

### bjodah commented Jul 30, 2015

 @dustyrockpyle thank you for your feedback. I tried with ASCII and non-ASCII characters. I am still unable to reproduce this even using IPython 3.0.0: http://nbviewer.ipython.org/urls/gist.githubusercontent.com/bjodah/f9aa769e28c80e5b2bc0/raw/64741b142bd3689305e6dee3d75b927284cb30b2/unable_to_reproduce_9107_also_3.0.0.ipynb edit: btw, the test contains an "alpha" character, i.e. #x3b1
Contributor

### bsdz commented Jul 30, 2015

 For what it's worth, I reverted the patch and back to master and the issue re-appeared. I am using Jupyter with IPython 4.0.0-dev / Python 2.7.10. from sympy import init_session init_session(quiet=True, ipython=True) a, b = symbols("a b") expr = (1-a*b)**2 - (1-((a+b)**2)/4)**2 expr  Which outputs the following error along with the correctly rendered latex: IPython console for SymPy 0.7.7.dev (Python 2.7.10-32-bit) (ground types: python) --------------------------------------------------------------------------- TypeError Traceback (most recent call last) C:\Python27\lib\site-packages\IPython\core\formatters.pyc in __call__(self, obj) 335 pass 336 else: --> 337 return printer(obj) 338 # Finally look for special method names 339 method = _safe_get_formatter_method(obj, self.print_method) C:\Python27\lib\site-packages\sympy-0.7.7.dev0-py2.7.egg\sympy\interactive\printing.pyc in _print_latex_png(o) 121 s = latex(o, mode=latex_mode) 122 try: --> 123 return _preview_wrapper(s) 124 except RuntimeError: 125 if latex_mode != 'inline': C:\Python27\lib\site-packages\sympy-0.7.7.dev0-py2.7.egg\sympy\interactive\printing.pyc in _preview_wrapper(o) 75 preview(o, output='png', viewer='BytesIO', 76 outputbuffer=exprbuffer, preamble=preamble, ---> 77 dvioptions=dvioptions) 78 except Exception as e: 79 # IPython swallows exceptions C:\Python27\lib\site-packages\sympy-0.7.7.dev0-py2.7.egg\sympy\printing\preview.pyc in preview(expr, output, viewer, euler, packages, filename, outputbuffer, preamble, dvioptions, outputTexFile, **latex_settings) 184 workdir = tempfile.mkdtemp() 185 --> 186 with open(join(workdir, 'texput.tex'), 'w', encoding='utf-8') as fh: 187 fh.write(latex_main % latex_string) 188 TypeError: 'encoding' is an invalid keyword argument for this function  Here's a snapshot of the output from Jupyter.

Closed

Member

### bjodah commented Jul 30, 2015

 @bsdz yes, this is the behaviour I got with master branch after 723289c, hence my PR. Does this PR in its current state (aff09d1) solve the problem for you? Now I am worried that my fix might reintroduce #9107 since I don't know what caused that issue in the first place. @dustyrockpyle and @asmeurer have both experienced #9107, but I am unable to reproduce it.

### asmeurer reviewed Jul 30, 2015

 try: with open(join(workdir, 'texput.tex'), 'w', encoding='utf-8') as fh: fh.write(latex_main % latex_string) except TypeError: # Python 2's open() does not support encoding

#### asmeurer Jul 30, 2015

Member

I think it's better to do it as you had it before. import io at the top and use io.open here.

#### bsdz Jul 30, 2015

Contributor

@bjodah, I tested the new patch and I don't think it works as you expect. It will definitely use encoding support on Python 3+ where open defaults to io.open. However, in Python 2.7, one must import io before it overrides the default open method. That said, Python 2.6 doesn't support io.open at all. The io module only being released in 2.7 or greater.

I've replaced the whole open code as follows and this appears to work and should properly fallback to codec.open in Python 2.6 (this part untested).

        try:
from io import open
with open(join(workdir, 'texput.tex'), 'w', encoding='utf-8') as fh:
fh.write(latex_main % latex_string)
except:
# use 2.6 codecs
from codecs import open
with open(join(workdir, 'texput.tex'), mode='w', encoding='utf-8') as fh:
fh.write(latex_main % latex_string)
Member

### asmeurer commented Jul 30, 2015

 Going back to a question I asked above, why is the encoding call even there? Does this even work with non-ASCII characters (in Python 3) in the first place?
Member

### bjodah commented Jul 30, 2015

 I was hoping you could tell me, you merged #9108 which added the encoding kwarg. As you pointed out, only xetex supports unicode, not latex which we use. I know very little about how sympy interacts with the IPython notebook, but I think whatever is used to render latex in the notebook supports unicode. And I guess the call to preview is redundant since we still get the output (even though the exception is raised).
Member

### asmeurer commented Jul 30, 2015

 I see. If preview works for you (it doesn't actually work for me, other than using BytesIO), and you have latex installed, can you see what preview(Symbol(u'ν')) gives you? Really, I would expect latex(Symbol(''ν'')) to give '\\nu' (same as latex(Symbol('nu'))), but it just gives'ν'. If Unicode characters actually do work with preview, then this is needed. If not, then this probably needs to be fixed in another location.
Member

### bjodah commented Jul 30, 2015

 @bsdz are you sure about that? https://docs.python.org/2.6/library/io.html#io.open I don't think it will work for Py2 the way you wrote it, if we specify encoding='utf-8' we will need to pass unicode to it (that was the reason I imported unicode_literals at the top, which I was hesitant about and @asmeurer adviced against)
Member

### asmeurer commented Jul 30, 2015

 unicode_literals wouldn't even convert the input to unicode. It only affects string literals. Use the u() function from sympy.core.compatibility if you need to ensure your string is unicode.
Contributor

### bsdz commented Jul 30, 2015

 Mea culpa. Yes it is supported in 2.6. That makes it even easier. No need to try import. Just a simple import io would do. I believe later versions of LaTeX support Unicode.
Member

### bjodah commented Jul 30, 2015

 Python 2.7.6, SymPy 0.7.6: >>> import sympy >>> sympy.__version__, sympy.__file__ ('0.7.6', 'sympy/__init__.pyc') >>> from sympy.printing.preview import preview >>> preview(sympy.Symbol(u'ν')) Traceback (most recent call last): File "", line 1, in File "sympy/printing/preview.py", line 181, in preview latex_string = latex(expr, mode='inline', **latex_settings) File "sympy/printing/latex.py", line 1911, in latex return LatexPrinter(settings).doprint(expr) File "sympy/printing/latex.py", line 152, in doprint tex = Printer.doprint(self, expr) File "sympy/printing/printer.py", line 233, in doprint return self._str(self._print(expr)) UnicodeEncodeError: 'ascii' codec can't encode character u'\u03bd' in position 0: ordinal not in range(128)  Using Python 3.4, SymPy 0.7.6 the above code works (except the preview is a blank white image). In Python3 I also get back the unicode character from applying sympy.latex on our unicode Symbol: >>> sympy.latex(sympy.Symbol(u'ν')) 'ν'  The reason for why the preview is blank is that latex (at least the version shipped with TeX Live 2013) doesn't support unicode. @bsdz I'd love to be proven wrong, I was trying to render this (this is as far as my LaTeX skills gets me) but I cannot make it work with either latex, pdflatex or xelatex: \documentclass{article} \usepackage{fixltx2e}[2006/09/13] \usepackage[T1]{fontenc} \usepackage{ifpdf,ifxetex} \ifxetex \usepackage{fontspec} % XeLaTeX specific %\setmainfont{Linux Libertine O} % LaTeX default (Computer Modern Unicode) \else \usepackage[utf8]{inputenc} % support utf8 (if possible: use xetex) \usepackage{lmodern} \usepackage{newunicodechar} \fi \begin{document} \begin{math} x = α^2 \end{math} \end{document} 
Member

### bjodah commented Jul 30, 2015

 Maybe we should figure out why preview is called at all in the IPython notebook (is it maybe a fallback if mathjax isn't available?) Should we make the latex printer "smart" in the sense it converts modern unicode to old-school latex code? (I am skeptical since it would mean that we need to add lots of code and eventually xetex or whatever comes after will support unicode). As a temporary fix I try to use io.open and u().
Contributor

### bsdz commented Jul 30, 2015

 @bjodah I think they introduced Unicode support last May. See http://latex-project.org/ltnews/ltnews21.pdf. I don't really use LaTeX much these days so I can't really comment how much it is used. If we're outputting Unicode TeX then perhaps the following preamble should be in the output. \usepackage[utf8]{inputenc} 
 Use io.open in preview. Convert to unicode. 
 a2e1585 
Member

### bjodah commented Jul 30, 2015

 @bsdz I see, thanks for the link. The preamble in my example should be correct with respect to the inputenc package. regarding using u() - this was tricky: In [4]: u(r'\usepackage{amsfont}') --------------------------------------------------------------------------- UnicodeDecodeError Traceback (most recent call last) in () ----> 1 u(r'\usepackage{amsfont}') /home/bjorn/vc/sympy/sympy/core/compatibility.pyc in u(x) 107 unichr = unichr 108 def u(x): --> 109 return codecs.unicode_escape_decode(x)[0] 110 def u_decode(x): 111 return x.decode('utf-8') UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in position 0-1: truncated \uXXXX escape  I did a .replace(r'\u', r'\\u') to escape latex's \u

Merged

Member

### asmeurer commented Aug 5, 2015

 Preview is a fallback from mathjax, for things like the qtconsole which don't support it. The way the notebook works, it generates all renderings of an expression even if they aren't used by the current frontend.
Member

### bjodah commented Aug 5, 2015

 OK, then I think this is good for now. What do you think? I could rebase away intermediate commits.
Member

### asmeurer commented Aug 5, 2015

 No, don't rebase.

### asmeurer reviewed Aug 5, 2015

 fh.write(latex_main % latex_string) with io.open(join(workdir, 'texput.tex'), 'w', encoding='utf-8') as fh: rendered = latex_main % latex_string fh.write(u(rendered.replace(r'\u', r'\\u')))

#### asmeurer Aug 5, 2015

Member

What is this doing?

#### bjodah Aug 5, 2015

Member

u() parses \uXXXX as unicode. hence this line guards LaTeX commands such as \usepackage

#### thisch Sep 5, 2015

Contributor

I have to remove this call to .replace(). Otherwise I get the following latex error: LaTeX Error: There's no line here to end., which is due to the additional backslash before the \usepackage command.

(Tested on python3.4, ipython 4.0)

#### thisch Sep 5, 2015

Contributor

It looks like #9801 might fix the LaTex Error as well.

#### bjodah Sep 5, 2015

Member

Have you tried that branch?

#### thisch Sep 5, 2015

Contributor

I have just tested the branch and it doesn't seem to work. I get the same error message as before.

here is my test example:

import os
os.environ['SYMPY_DEBUG'] = 'True'
from IPython.display import display, Latex
from sympy.interactive import printing
printing.init_printing(use_latex='png', forecolor='Red')
from sympy import *
params = symbols('k l m'.split())
display(params)

#### thisch Sep 5, 2015

Contributor

and here the content of the generated tex file

\\documentclass[10pt]{article}
\\pagestyle{empty}
\\usepackage{amsmath,amsfonts}\\begin{document}

\\end{document}


As I've written above, by removing the .replace() code, preview generates a valid tex file.

Member

### bjodah commented Aug 5, 2015

 I won't. Any other changes needed?
Member

### asmeurer commented Aug 5, 2015

 The \u thing seems weird, but assuming you tested it, I'm +1.

### asmeurer added a commit that referenced this pull request Aug 5, 2015

 Merge pull request #9692 from bjodah/fix_open_encoding 
Use io.open to enable encoding kwarg in Py2 open()
 21b106e 

### asmeurer merged commit 21b106e` into sympy:master Aug 5, 2015 1 check passed

#### 1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

Member

### bjodah commented Aug 5, 2015

 Thank you for the review.

Closed

Merged

Merged

Closed