Find encoding for Python files #1526
# (Unlikely to be the default encoding for most testers.)
# ±¶ÿàáâãäåæçèéêëìíîï <- Cyrillic characters
from __future__ import unicode_literals
u = '®âðÄ'
Note that GitHub displays this file using a default encoding (probably latin-1 or cp1252), so these characters don't look like Cyrillic characters. They are compared with a literal in the UTF-8-encoded test file.
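A minimal illustration of the display problem, assuming the test file is in cp1251 (a common Cyrillic code page; the comment above does not name the encoding) and using hypothetical sample letters. The same bytes read as latin-1 or cp1252 come out as accented Latin letters, matching the kind of run shown in the test file's comment line, and only the correct codec recovers the Cyrillic text.

```python
# Hypothetical sample: the first five Cyrillic lowercase letters.
cyrillic = "абвгд"

# The bytes a cp1251-encoded file would contain (assumed encoding).
raw = cyrillic.encode("cp1251")

# Viewed with a Western default codec, the bytes become accented Latin letters.
as_latin1 = raw.decode("latin-1")
as_cp1252 = raw.decode("cp1252")

# Decoding with the file's real encoding recovers the original text.
as_cp1251 = raw.decode("cp1251")

print(as_latin1)  # àáâãä
print(as_cp1251)  # абвгд
```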
I also noticed that %run on Python 3 was choking on non-UTF-8 files. The fix there was simply to read files as bytes, and let …
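The comment is cut off, but one standard-library way to read a file as bytes and still respect its declared encoding is to pass the raw bytes straight to compile(), which applies the PEP 263 cookie itself when given bytes rather than str. This is a sketch, not necessarily the actual %run fix; run_source_bytes and the sample source are hypothetical.

```python
def run_source_bytes(data, filename="<demo>"):
    """Compile and run Python source given as bytes; return its namespace.

    Because the source is bytes rather than str, compile() decodes it
    itself, honoring any PEP 263 encoding cookie.
    """
    code = compile(data, filename, "exec")
    namespace = {}
    exec(code, namespace)
    return namespace


# Hypothetical input; in practice this would be open(path, "rb").read().
data = "# -*- coding: cp1251 -*-\nu = 'абв'\n".encode("cp1251")
ns = run_source_bytes(data)
print(ns["u"])  # абв
```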
        yield line

def read_py_file(filename, errors='replace', skip_encoding_cookie=True):
    with open(filename) as f:   # the open function defined in this module.
For this function and the next, let's at least add proper docstrings (i.e. with full Parameters and Returns sections), as they are likely to be generally useful to others.
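A self-contained sketch of read_py_file with the numpydoc-style Parameters and Returns sections requested here. It is not the merged code: instead of the module's own open (not shown above), it assumes Python 3's tokenize.detect_encoding on a binary file, and the cookie-stripping helper strip_encoding_cookie is a hypothetical stand-in for the other function under review.

```python
import os
import re
import tempfile
import tokenize

# PEP 263 cookie pattern (same form as the regex given in the PEP).
COOKIE_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)", re.ASCII)


def strip_encoding_cookie(lines):
    """Yield lines, skipping an encoding cookie in the first two lines.

    Parameters
    ----------
    lines : iterable of str
        Source lines, with line endings kept.

    Returns
    -------
    generator of str
        The same lines, minus any PEP 263 cookie line.
    """
    for lineno, line in enumerate(lines):
        if lineno < 2 and COOKIE_RE.match(line):
            continue
        yield line


def read_py_file(filename, errors="replace", skip_encoding_cookie=True):
    """Read a Python file, decoding it with the encoding it declares.

    Parameters
    ----------
    filename : str
        Path to the .py file.
    errors : str
        Error handler passed to bytes.decode (e.g. 'replace', 'strict').
    skip_encoding_cookie : bool
        If True, omit the encoding cookie from the returned text.

    Returns
    -------
    str
        The decoded source code.
    """
    with open(filename, "rb") as f:  # builtin open, binary mode
        encoding, _ = tokenize.detect_encoding(f.readline)
        f.seek(0)
        text = f.read().decode(encoding, errors)
    lines = text.splitlines(keepends=True)
    if skip_encoding_cookie:
        return "".join(strip_encoding_cookie(lines))
    return "".join(lines)


# Demo: write a cp1251 file with a cookie to a temp path, then read it back.
source = "# -*- coding: cp1251 -*-\nu = 'абв'\n"
with tempfile.NamedTemporaryFile("wb", suffix=".py", delete=False) as tmp:
    tmp.write(source.encode("cp1251"))
with_cookie = read_py_file(tmp.name, skip_encoding_cookie=False)
without_cookie = read_py_file(tmp.name)
os.remove(tmp.name)
```

Removing the cookie line shifts every later line number by one, which matters for tracebacks; the skip_encoding_cookie flag lets callers keep it when that is a concern.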
This looks very good, and will be far more robust and useful than our current 'punt and call everything utf-8' approach used for URLs in
I've added those docstrings and thrown in a couple of tests for the
Looks great, thanks! Merging now.
Find encoding for Python files; this uses the encoding cookie defined in PEP 263 to correctly decode .py files in encodings other than utf-8.
As I promised some time ago, this uses the encoding cookie defined in PEP 263 to correctly decode .py files in encodings other than utf-8.
Most of the code is borrowed from Python 3.2's tokenize module, but there are some additions. Try, for example:
%loadpy https://raw.github.com/python-excel/xlrd/master/xlrd/examples/xlrdnameAPIdemo.py
(encoded in cp1252).
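For URLs, the same cookie lookup can be applied to downloaded bytes instead of assuming UTF-8. Below is a sketch using Python 3's tokenize.detect_encoding on an in-memory buffer; decode_source and load_source_from_url are hypothetical names, not the %loadpy implementation, and the demo uses local sample bytes rather than fetching the xlrd URL above.

```python
import io
import tokenize
import urllib.request


def decode_source(data):
    """Decode Python source bytes using its PEP 263 cookie (UTF-8 by default)."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(data).readline)
    return data.decode(encoding)


def load_source_from_url(url):
    """Hypothetical helper: fetch a .py file and decode it like a local file."""
    with urllib.request.urlopen(url) as response:
        return decode_source(response.read())


# Offline demo: a cp1252 file that declares its encoding.
sample = "# -*- coding: cp1252 -*-\nname = 'café'\n".encode("cp1252")

# The 'call everything utf-8' approach fails on the lone 0xE9 byte.
try:
    sample.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

decoded = decode_source(sample)
```

detect_encoding falls back to UTF-8 when there is no cookie or BOM, so undeclared files are still decoded as UTF-8.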