Find encoding for Python files #1526

takluyver · 2012-03-25T22:05:00Z

As I promised some time ago, this uses the encoding cookie defined in PEP 263 to correctly decode .py files in encodings other than utf-8.

Most of the code is borrowed from Python 3.2's tokenize module, but there are some additions.
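For readers who haven't looked at that module, here is a minimal sketch of the cookie detection it provides. It uses only the standard library's `tokenize.detect_encoding`, not the code in this PR. `detect_source_encoding` is a hypothetical helper name used for illustration.

```python
import io
import tokenize


def detect_source_encoding(source_bytes):
    """Return the encoding that a PEP 263 cookie declares in *source_bytes*.

    Hypothetical helper. Falls back to 'utf-8' (the Python 3 default)
    when neither of the first two lines carries a cookie.
    """
    # detect_encoding() pulls at most two lines through the readline callable.
    encoding, _consumed = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


print(detect_source_encoding(b"# -*- coding: cp1252 -*-\nx = 1\n"))            # cp1252
print(detect_source_encoding(b"#!/usr/bin/env python\n# coding: iso-8859-5\n"))  # iso-8859-5
print(detect_source_encoding(b"x = 1\n"))                                      # utf-8
```

The cookie is only honoured on the first or second line, which is why a shebang line does not hide it.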

Try, for example, %loadpy https://raw.github.com/python-excel/xlrd/master/xlrd/examples/xlrdnameAPIdemo.py (which is encoded in cp1252).
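The URL case boils down to decoding a downloaded byte string with whatever encoding its cookie declares. This is a rough sketch under that assumption. `decode_py_bytes` is a hypothetical helper, the network fetch is left out, and this is not the PR's `read_py_url`.

```python
import io
import tokenize


def decode_py_bytes(data, errors="replace"):
    """Decode raw Python source bytes (e.g. fetched from a URL) via its cookie.

    Hypothetical helper: only the decoding step is shown, not the download.
    """
    encoding, _ = tokenize.detect_encoding(io.BytesIO(data).readline)
    # errors='replace' keeps a badly encoded file loadable instead of raising.
    return data.decode(encoding, errors)


original = "# -*- coding: cp1252 -*-\ngreeting = 'café “quoted”'\n"
raw = original.encode("cp1252")  # what the server would actually send

print(decode_py_bytes(raw) == original)                # True
print(raw.decode("utf-8", "replace") == original)     # False: the old 'call it utf-8' approach
```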

takluyver · 2012-03-25T22:07:34Z

IPython/core/tests/nonascii.py

+# (Unlikely to be the default encoding for most testers.)
+# ±¶ÿàáâãäåæçèéêëìíîï <- Cyrillic characters
+from __future__ import unicode_literals
+u = '®âðÄ'


Note that GitHub displays this file using a default encoding (probably latin-1 or cp1252), so these characters don't look like Cyrillic characters. They are compared with a literal in the UTF-8-encoded test file.
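The idea behind that test can be sketched with the standard library alone. The sketch writes a module in ISO-8859-5 with a matching cookie, reads it back with `tokenize.open` (which applies the cookie), and compares the result against a literal in this UTF-8 source. The file name and the string `Привет` are illustrative assumptions, not the contents of the PR's `nonascii.py`.

```python
import os
import tempfile
import tokenize

EXPECTED = "Привет"  # literal stored in this (UTF-8) file

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "nonascii_sketch.py")  # hypothetical test module
    # Encode the module in ISO-8859-5 and declare that with a cookie.
    source = "# coding: iso-8859-5\nu = '" + EXPECTED + "'\n"
    with open(path, "wb") as f:
        f.write(source.encode("iso-8859-5"))

    # tokenize.open() detects the cookie and decodes the file accordingly.
    with tokenize.open(path) as f:
        text = f.read()

print(EXPECTED in text)  # True
```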

takluyver · 2012-03-25T22:13:41Z

I also noticed that %run on Python 3 was choking on non-UTF-8 files. The fix there was simply to read files as bytes and let compile() find the encoding.
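A minimal sketch of that approach follows. The built-in `compile()` accepts a byte string and decodes it according to its PEP 263 cookie, falling back to UTF-8, so no text-mode guess is needed. `run_py_file` is a hypothetical helper, not IPython's `%run` implementation.

```python
import os
import tempfile


def run_py_file(path, namespace=None):
    """Execute a Python file, letting compile() honour its encoding cookie.

    Hypothetical helper for illustration only.
    """
    if namespace is None:
        namespace = {"__name__": "__main__", "__file__": path}
    # Read raw bytes: compile() decodes them itself using the cookie (or UTF-8).
    with open(path, "rb") as f:
        source = f.read()
    exec(compile(source, path, "exec"), namespace)
    return namespace


with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "cp1252_script.py")
    with open(script, "wb") as f:
        f.write("# -*- coding: cp1252 -*-\ngreeting = 'café'\n".encode("cp1252"))
    ns = run_py_file(script)

print(ns["greeting"])  # café
```

Opening the same file in text mode as UTF-8 would raise `UnicodeDecodeError` on the cp1252 byte for "é".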

fperez · 2012-04-14T04:35:38Z

IPython/utils/openpy.py

+        yield line
+
+def read_py_file(filename, errors='replace', skip_encoding_cookie=True):
+    with open(filename) as f:   # the open function defined in this module.


For this function and the next, let's at least add proper docstrings (i.e., with full Parameters and Returns sections), as they are likely to be generally useful to others.
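To illustrate the kind of docstring being asked for, here is a sketch that mirrors the signature shown in the diff, `read_py_file(filename, errors='replace', skip_encoding_cookie=True)`. The body, the names `read_py_file_sketch` and `COOKIE_RE`, and this `strip_encoding_cookie` generator are hypothetical stand-ins, not the code that was merged. The cookie pattern is a simplified form of the one in PEP 263.

```python
import os
import re
import tempfile
import tokenize

# Simplified form of the cookie regex from PEP 263 (hypothetical constant).
COOKIE_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)")


def strip_encoding_cookie(lines):
    """Yield *lines*, skipping an encoding cookie on either of the first two."""
    for lineno, line in enumerate(lines):
        if lineno < 2 and COOKIE_RE.match(line):
            continue
        yield line


def read_py_file_sketch(filename, errors="replace", skip_encoding_cookie=True):
    """Read a Python source file as text, decoding it per its PEP 263 cookie.

    Parameters
    ----------
    filename : str
        Path to the Python source file.
    errors : str, optional
        Error handler passed to ``bytes.decode`` (default ``'replace'``).
    skip_encoding_cookie : bool, optional
        If True (default), drop a coding cookie found on the first two
        lines, since it no longer describes the already-decoded text.

    Returns
    -------
    str
        The decoded source code.
    """
    with open(filename, "rb") as f:
        encoding, _ = tokenize.detect_encoding(f.readline)
        f.seek(0)  # detect_encoding consumed up to two lines
        text = f.read().decode(encoding, errors)
    if not skip_encoding_cookie:
        return text
    return "".join(strip_encoding_cookie(text.splitlines(keepends=True)))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cp1252_module.py")
    with open(path, "wb") as f:
        f.write("# -*- coding: cp1252 -*-\nname = 'café'\n".encode("cp1252"))
    decoded = read_py_file_sketch(path)
    raw_text = read_py_file_sketch(path, skip_encoding_cookie=False)

print(repr(decoded))  # "name = 'café'\n"
```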

fperez · 2012-04-14T04:41:26Z

This looks very good, and will be far more robust and useful than our current 'punt and call everything utf-8' approach used for URLs in %loadpy. I only had one minor nit, which should take just a minute to address; otherwise it looks good to go. Thanks for the work!

takluyver · 2012-04-14T18:45:53Z

Added those docstrings and threw in a couple of tests for the openpy module.

fperez · 2012-04-14T20:25:55Z

Looks great, thanks! Merging now.


takluyver added 4 commits March 14, 2012 00:24

Add IPython.utils.openpy to decode Python files.

8ec813a

Fix for %run on a Python file using non-default encoding.

bb13c29

Use openpy module for %loadpy magic.

6357257

Add file required for Unicode test.

198fed8

takluyver reviewed Mar 25, 2012

takluyver added 2 commits March 25, 2012 23:09

Add encoding cookie to test_run.

9c22fc1

Remove unused encoding declaration regex in IPython.core.magic.

0cb09db

fperez reviewed Apr 14, 2012

takluyver added 2 commits April 14, 2012 19:15

Add docstrings for read_py_file and read_py_url.

6811272

Add tests for IPython.utils.openpy

b9b8a6f

fperez added a commit that referenced this pull request Apr 14, 2012

Merge pull request #1526 from takluyver/openpy

0fed70c

Find encoding for Python files; this uses the encoding cookie defined in PEP 263 to correctly decode .py files in encodings other than utf-8.

fperez merged commit 0fed70c into ipython:master Apr 14, 2012
