Python 2.3 encoding parsing bug #39944

edream · 2004-02-17T14:36:28Z

BPO	898757
Nosy	@malemburg, @loewis

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = <Date 2004-02-17.22:59:30.000>
created_at = <Date 2004-02-17.14:36:28.000>
labels = ['interpreter-core', 'invalid']
title = 'Python 2.3 encoding parsing bug'
updated_at = <Date 2004-02-17.22:59:30.000>
user = 'https://bugs.python.org/edream'

bugs.python.org fields:

activity = <Date 2004-02-17.22:59:30.000>
actor = 'edream'
assignee = 'none'
closed = True
closed_date = None
closer = None
components = ['Interpreter Core']
creation = <Date 2004-02-17.14:36:28.000>
creator = 'edream'
dependencies = []
files = []
hgrepos = []
issue_num = 898757
keywords = []
message_count = 4.0
messages = ['20020', '20021', '20022', '20023']
nosy_count = 3.0
nosy_names = ['lemburg', 'loewis', 'edream']
pr_nums = []
priority = 'normal'
resolution = 'not a bug'
stage = None
status = 'closed'
superseder = None
type = None
url = 'https://bugs.python.org/issue898757'
versions = []

edream · 2004-02-17T14:36:28Z

The documentation for encoding lines at

C:\Python23\Doc\Python-Docs-2.3.1\whatsnew\section-encodings.html

states:

"Encodings are declared by including a specially
formatted comment in the first or second line of the
source file."

In fact, contrary to the implication, the Python 2.3
parser does not look only for lines of the form:

# -*- coding: <encoding> -*-

For example, Python improperly scans the following line
for an encoding:

#@+leo-ver=4-encoding=iso-8859-1.

and reports that "iso-8859-1." (note the trailing dot) is an
invalid encoding!

The workaround for my app is to precede this line with
the following line:

# -*- coding: iso-8859-1 -*-

This makes Python 2.3 happy.
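
Why the workaround above helps can be sketched with a simplified model. It assumes the parser searches comments on the first two lines for the regular expression `coding[:=]\s*([-\w.]+)` and stops at the first hit. The real check lives in the C tokenizer; `declared_encoding` and the sample strings are hypothetical names used only for illustration.

```python
import re

# Assumption: this is the pattern the parser applies; the real code is in C.
CODING_RE = re.compile(r"coding[:=]\s*([-\w.]+)")


def declared_encoding(source):
    """Return the first encoding found in a comment on line 1 or 2, else None."""
    for line in source.splitlines()[:2]:
        if not line.lstrip().startswith("#"):
            continue  # only comment lines are considered in this model
        match = CODING_RE.search(line)
        if match:
            return match.group(1)  # first hit wins; later lines are ignored
    return None


leo_line = "#@+leo-ver=4-encoding=iso-8859-1."
bare = leo_line + "\nprint 'hello'\n"
patched = "# -*- coding: iso-8859-1 -*-\n" + leo_line + "\nprint 'hello'\n"

print(declared_encoding(bare))     # iso-8859-1.
print(declared_encoding(patched))  # iso-8859-1
```

In this model the explicit declaration on line 1 is found first, so the Leo sentinel on line 2 is never examined.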

To make myself perfectly clear: Python has absolutely
no right to complain about comment lines that do not
have the form:

# -*- coding: <encoding> -*-

Python 2.3.1
Windows XP

Edward K. Ream
edreamleo@charter.net

malemburg · 2004-02-17T21:14:05Z

Logged In: YES
user_id=38388

Python is behaving correctly and according to the PEP.

The encoding declaration parser will look for
"coding[:=][ \t]*<encoding>" to make it play nice with the various
editor encoding comments in use today. The format you are quoting is
Emacs-style, but there are also vi-style and various other formats.
Most of them use the "coding[:=]" declaration, which is why this
parsing method was chosen.
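
A rough illustration of how that pattern treats different comment styles follows. It assumes `<encoding>` means a run of letters, digits, "-", "_" and "."; that character class, the sample lines, and the helper name `find_encoding` are assumptions made for this sketch and are not taken from the interpreter source.

```python
import re

# Assumption: "<encoding>" is approximated by the character class [-\w.]+
PATTERN = re.compile(r"coding[:=][ \t]*([-\w.]+)")

samples = {
    "emacs": "# -*- coding: utf-8 -*-",
    "vim": "# vim: set fileencoding=latin-1 :",
    "leo": "#@+leo-ver=4-encoding=iso-8859-1.",
    "plain": "# just a comment about encoding",
}


def find_encoding(line):
    """Return the encoding name the pattern extracts from line, or None."""
    match = PATTERN.search(line)
    return match.group(1) if match else None


results = {name: find_encoding(line) for name, line in samples.items()}
for name, encoding in results.items():
    print(name, "->", encoding)
```

Because the pattern is searched for anywhere in the line, "fileencoding=" and "-encoding=" both satisfy it, while a comment that merely mentions the word "encoding" does not.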

Does Leo need the trailing dot in the comment?

loewis · 2004-02-17T21:47:05Z

Logged In: YES
user_id=21627

Actually, what Python should (and does) really do is to
follow the language specification (the PEP becomes
irrelevant once implemented):

http://www.python.org/doc/current/ref/encodings.html

This gives the precise regexp that is used.

Differences between the language spec and the implementation
would be considered a bug. Closing this report as not-a-bug.

edream · 2004-02-17T22:59:30Z

Logged In: YES
user_id=14056

> Does leo need the trailing dot in the comment?

In general, Leo needs to know where the encoding
specification ends and a possible end-block-comment delimiter
begins. In specific languages, and in particular Python, Leo
would not have needed the trailing dot. Alas, this is a moot
point. The only options available to Leo now are:

Have the user insert encoding comments by hand or
Change the format of files created by Leo.

In other words, no previous 4.x version of Leo (including 4.1
final, due tomorrow) can ever work with Python 2.3 without
the user inserting a workaround.

I am most upset that the PEP said one thing in English and
something almost completely different in the re. Furthermore,
what the re implies is a very bad idea: having a _restricted_
kind of special-purpose comment is one thing; having a
way-too-general kind of special-purpose comment is wrong, wrong,
wrong. It needlessly invalidates comments that _should_
have been none of Python's business. Yes, I know there was
a reason for this bad idea; there always is.

Edward
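
To make the contrast between a restricted and a general declaration comment concrete, here is a sketch comparing two patterns. `STRICT_RE` is a hypothetical rule that accepts only the Emacs-style form; it was never part of Python. `LOOSE_RE` is assumed to be the pattern from the language reference.

```python
import re

# Hypothetical restricted rule: only "# -*- coding: <encoding> -*-" counts.
STRICT_RE = re.compile(r"^#\s*-\*-\s*coding:\s*([-\w.]+)\s*-\*-\s*$")
# Assumption: the general pattern given in the language reference.
LOOSE_RE = re.compile(r"coding[:=]\s*([-\w.]+)")


def extract(pattern, line):
    """Return the encoding name pattern finds in line, or None."""
    match = pattern.search(line)
    return match.group(1) if match else None


emacs_line = "# -*- coding: iso-8859-1 -*-"
leo_line = "#@+leo-ver=4-encoding=iso-8859-1."

for label, pattern in (("strict", STRICT_RE), ("loose", LOOSE_RE)):
    print(label, extract(pattern, emacs_line), extract(pattern, leo_line))
```

Under the restricted rule the Leo sentinel is ignored. The cost is that vi-style and other editor comments would no longer be recognized either, which is the trade-off described earlier in the thread.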

edream mannequin closed this as completed Feb 17, 2004

edream mannequin added invalid interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Feb 17, 2004

ezio-melotti transferred this issue from another repository Apr 9, 2022
