Find encoding for Python files #1526

Merged
merged 8 commits into from Apr 14, 2012

takluyver
Member

As I promised some time ago, this uses the encoding cookie defined in PEP 263 to correctly decode .py files in encodings other than utf-8.

Most of the code is borrowed from Python 3.2's tokenize module, but there are some additions.

Try, for example, %loadpy https://raw.github.com/python-excel/xlrd/master/xlrd/examples/xlrdnameAPIdemo.py (encoded in cp1252).
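To illustrate the PEP 263 lookup described above, here is a minimal sketch using the standard library's `tokenize.detect_encoding` (Python 3). It is not the code from this pull request; the sample source bytes and variable names are made up for the example.

```python
import io
import tokenize

# Hypothetical sample: a tiny source file stored as cp1252,
# with a PEP 263 encoding cookie on its first line.
source_bytes = "# -*- coding: cp1252 -*-\nname = 'café'\n".encode("cp1252")

# detect_encoding() calls readline at most twice, looking for a
# UTF-8 BOM or an encoding cookie, and falls back to utf-8.
encoding, first_lines = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)

# Decode the whole file with the encoding that was found.
text = source_bytes.decode(encoding)
print(encoding)
print(text.splitlines()[1])
```

Decoding the same bytes as utf-8 would fail on the `é`, which is the situation this change is meant to handle.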

# (Unlikely to be the default encoding for most testers.)
# ±¶ÿàáâãäåæçèéêëìíîï <- Cyrillic characters
from __future__ import unicode_literals
u = '®âðÄ'
Member Author

Note that Github displays this file using a default encoding (probably latin-1 or cp1252), so these characters don't look like cyrillic characters. They are compared with a literal in the UTF-8 encoded test file.

@takluyver
Member Author

I also noticed %run on Python 3 was choking on non-utf8 files. The fix there was simply to read files as bytes, and let compile() find the encoding.
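The read-as-bytes approach can be sketched roughly as follows. This is an illustration only, not the actual `%run` implementation; `run_py_file` and `demo` are hypothetical names.

```python
import os
import tempfile


def run_py_file(path):
    """Execute a .py file, letting compile() work out its encoding."""
    # Read raw bytes: no decoding happens here, so no utf-8 assumption.
    with open(path, "rb") as f:
        source = f.read()
    # Given bytes, compile() honours a BOM or a PEP 263 cookie itself.
    code = compile(source, path, "exec")
    namespace = {}
    exec(code, namespace)
    return namespace


def demo():
    """Run a latin-1 encoded file and return the string it defines."""
    source = "# -*- coding: latin-1 -*-\ns = 'é'\n".encode("latin-1")
    with tempfile.NamedTemporaryFile(suffix=".py", delete=False) as tmp:
        tmp.write(source)
    try:
        return run_py_file(tmp.name)["s"]
    finally:
        os.unlink(tmp.name)
```

Calling `demo()` returns the decoded string even though the file on disk is not valid utf-8.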

        yield line

def read_py_file(filename, errors='replace', skip_encoding_cookie=True):
    with open(filename) as f:  # the open function defined in this module.
Member

For this function and the next, let's add at least proper docstrings (i.e. with full Parameters and Returns descriptions), as they are likely to be useful for others in general.
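As a rough idea of what such a docstring could look like, here is a sketch with Parameters and Returns sections. The signature is taken from the diff above, but the body is an assumption built on `tokenize.detect_encoding`, not the merged code, and `COOKIE_RE` is a hypothetical helper.

```python
import re
import tokenize

# Hypothetical helper: matches a PEP 263 encoding cookie line.
COOKIE_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)")


def read_py_file(filename, errors="replace", skip_encoding_cookie=True):
    """Read a Python source file and return its contents as text.

    Parameters
    ----------
    filename : str
        Path of the file to read.
    errors : str
        Error handling scheme passed to the decoder, e.g. 'replace'
        or 'strict'.
    skip_encoding_cookie : bool
        If True (the default), drop an encoding cookie found in the
        first two lines of the file.

    Returns
    -------
    str
        The decoded source code.
    """
    with open(filename, "rb") as f:
        # Look for a BOM or cookie, then rewind and decode everything.
        encoding, _ = tokenize.detect_encoding(f.readline)
        f.seek(0)
        text = f.read().decode(encoding, errors)
    lines = text.splitlines(True)
    if skip_encoding_cookie:
        # A cookie is only meaningful on line 1 or 2 (PEP 263).
        lines = [
            line
            for index, line in enumerate(lines)
            if not (index < 2 and COOKIE_RE.match(line))
        ]
    return "".join(lines)
```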

@fperez
Member

fperez commented Apr 14, 2012

This looks very good, and will be far more robust and useful than our current 'punt and call everything utf-8' approach used for urls in %loadpy. I only had one minor nit, which should just take a minute to address, and otherwise it looks good to go. Thanks for the work!

@takluyver
Member Author

Added those docstrings, and threw in a couple of tests for the openpy module.

@fperez
Member

fperez commented Apr 14, 2012

Looks great, thanks! Merging now.

fperez added a commit that referenced this pull request Apr 14, 2012
Find encoding for Python files; this uses the encoding cookie defined in PEP 263 to correctly decode .py files in encodings other than utf-8.
@fperez fperez merged commit 0fed70c into ipython:master Apr 14, 2012
mattvonrocketstein pushed a commit to mattvonrocketstein/ipython that referenced this pull request Nov 3, 2014
Find encoding for Python files; this uses the encoding cookie defined in PEP 263 to correctly decode .py files in encodings other than utf-8.