Python 2.3 encoding parsing bug #39944

Closed
edream mannequin opened this issue Feb 17, 2004 · 4 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs)


@edream
Mannequin

edream mannequin commented Feb 17, 2004

BPO 898757
Nosy @malemburg, @loewis

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = <Date 2004-02-17.22:59:30.000>
created_at = <Date 2004-02-17.14:36:28.000>
labels = ['interpreter-core', 'invalid']
title = 'Python 2.3 encoding parsing bug'
updated_at = <Date 2004-02-17.22:59:30.000>
user = 'https://bugs.python.org/edream'

bugs.python.org fields:

activity = <Date 2004-02-17.22:59:30.000>
actor = 'edream'
assignee = 'none'
closed = True
closed_date = None
closer = None
components = ['Interpreter Core']
creation = <Date 2004-02-17.14:36:28.000>
creator = 'edream'
dependencies = []
files = []
hgrepos = []
issue_num = 898757
keywords = []
message_count = 4.0
messages = ['20020', '20021', '20022', '20023']
nosy_count = 3.0
nosy_names = ['lemburg', 'loewis', 'edream']
pr_nums = []
priority = 'normal'
resolution = 'not a bug'
stage = None
status = 'closed'
superseder = None
type = None
url = 'https://bugs.python.org/issue898757'
versions = []

@edream
Mannequin Author

edream mannequin commented Feb 17, 2004

The documentation for encoding lines at

C:\Python23\Doc\Python-Docs-2.3.1\whatsnew\section-encodings.html

states:

"Encodings are declared by including a specially
formatted comment in the first or second line of the
source file."

In fact, contrary to what this implies, the Python 2.3
parser does not look only for lines of the form:

# -*- coding: <encoding> -*-

For example, Python improperly scans the following line
for an encoding:

#@+leo-ver=4-encoding=iso-8859-1.

and reports that iso-8859-1. (note trailing dot) is an
invalid encoding!

The workaround for my app is to precede this line with
the following line:

# -*- coding: iso-8859-1 -*-

This makes Python 2.3 happy.
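
A rough sketch of why this workaround helps. It assumes the parser examines only the first two lines, considers only comment lines, and stops at the first line that matches. The pattern is the one given in the language reference; the helper name `find_declared_encoding` and the sample file contents are hypothetical, and the real check lives in the C tokenizer, not in Python code like this.

```python
import re

# Pattern from the language reference; used here only for illustration.
CODING_RE = re.compile(r"coding[=:]\s*([-\w.]+)")

def find_declared_encoding(source):
    """Return the encoding named in the first two lines, or None.

    Assumption: only comment lines count, and the first match wins.
    """
    for line in source.splitlines()[:2]:
        if not line.lstrip().startswith("#"):
            continue
        match = CODING_RE.search(line)
        if match:
            return match.group(1)
    return None

# Hypothetical file contents: a Leo sentinel line followed by code.
leo_file = "#@+leo-ver=4-encoding=iso-8859-1.\nprint 'hello'\n"

# The same file with an explicit declaration placed in front.
patched = "# -*- coding: iso-8859-1 -*-\n" + leo_file

print(find_declared_encoding(leo_file))  # iso-8859-1.
print(find_declared_encoding(patched))   # iso-8859-1
```

With the explicit declaration on line 1, the sentinel on line 2 is never consulted in this sketch, so the name with the trailing dot is never looked up.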

To make myself perfectly clear: Python has absolutely
no right to complain about comment lines that do not
have the form:

# -*- coding: <encoding> -*-

Python 2.3.1
Windows XP

Edward K. Ream
edreamleo@charter.net

@edream edream mannequin closed this as completed Feb 17, 2004
@edream edream mannequin added invalid interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Feb 17, 2004
@malemburg
Member

Logged In: YES
user_id=38388

Python is behaving correctly and according to the PEP.

The encoding declaration parser looks for "coding[:=][ \t]*<encoding>"
so that it plays nicely with the various editor encoding comments
in use today. The format you are quoting is Emacs-style, but
there are also vi-style and various other formats. Most of them
use the "coding[:=]" declaration, which is why this parsing method
was chosen.

Does Leo need the trailing dot in the comment?

@loewis
Mannequin

loewis mannequin commented Feb 17, 2004

Logged In: YES
user_id=21627

Actually, what Python should (and does) really do is to
follow the language specification (the PEP becomes
irrelevant once implemented):

http://www.python.org/doc/current/ref/encodings.html

This gives the precise regexp that is used.

Differences between the language spec and the implementation
would be considered bugs. Closing this report as not-a-bug.
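
For readers following along, the regular expression given in the language reference is `coding[=:]\s*([-\w.]+)`. The minimal sketch below applies it with Python's `re` module to the Leo sentinel line from the report. The helper name `declared_encoding` is hypothetical and the actual check is implemented in C, so treat this as an illustration of the pattern rather than of the tokenizer.

```python
import re

# Regular expression as given in the language reference.
CODING_RE = re.compile(r"coding[=:]\s*([-\w.]+)")

def declared_encoding(line):
    """Return the encoding name the pattern extracts from a line, or None."""
    match = CODING_RE.search(line)
    return match.group(1) if match else None

leo_sentinel = "#@+leo-ver=4-encoding=iso-8859-1."
emacs_style = "# -*- coding: iso-8859-1 -*-"

# "encoding=" contains "coding=", and "." is allowed in the name,
# so the trailing dot becomes part of the captured encoding.
print(declared_encoding(leo_sentinel))  # iso-8859-1.
print(declared_encoding(emacs_style))   # iso-8859-1
```

The captured name ends at the first character outside `[-\w.]`, which is why a space ends it in the Emacs-style line while the trailing dot does not end it in the sentinel line.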

@edream
Mannequin Author

edream mannequin commented Feb 17, 2004

Logged In: YES
user_id=14056

> Does leo need the trailing dot in the comment?

In general, Leo needs to know where the encoding
specification ends and a possible end-block-comment delimiter
begins. In specific languages, and in particular Python, Leo
would not have needed the trailing dot. Alas, this is a moot
point. The only options available to Leo now are:

  1. Have the user insert encoding comments by hand or
  2. Change the format of files created by Leo.

In other words, no previous 4.x version of Leo (including 4.1
final, due tomorrow) can ever work with Python 2.3 without
the user inserting a workaround.

I am most upset that the PEP said one thing in English and
something almost completely different in the regular expression.
Furthermore, what the regular expression implies is a very bad idea:
having a _restricted_ kind of special-purpose comment is one thing;
having a way-too-general kind of special-purpose comment is wrong,
wrong, wrong. It needlessly invalidates comments that _should_
have been none of Python's business. Yes, I know there was
a reason for this bad idea; there always is.

Edward

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 9, 2022