source files using encoding ./. universal newlines #38984

atsuoishimoto · 2003-07-31T09:16:56Z

BPO	780730
Nosy	@malemburg, @loewis, @doerwalter, @jackjansen, @birkenfeld, @atsuoishimoto
Files	test.py

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = 'https://github.com/malemburg'
closed_at = <Date 2006-02-21.09:45:42.000>
created_at = <Date 2003-07-31.09:16:56.000>
labels = ['interpreter-core']
title = 'source files using encoding ./. universal newlines'
updated_at = <Date 2006-02-21.09:45:42.000>
user = 'https://github.com/atsuoishimoto'

bugs.python.org fields:

activity = <Date 2006-02-21.09:45:42.000>
actor = 'lemburg'
assignee = 'lemburg'
closed = True
closed_date = None
closer = None
components = ['Interpreter Core']
creation = <Date 2003-07-31.09:16:56.000>
creator = 'ishimoto'
dependencies = []
files = ['990']
hgrepos = []
issue_num = 780730
keywords = []
message_count = 19.0
messages = ['17521', '17522', '17523', '17524', '17525', '17526', '17527', '17528', '17529', '17530', '17531', '17532', '17533', '17534', '17535', '17536', '17537', '17538', '17539']
nosy_count = 8.0
nosy_names = ['lemburg', 'loewis', 'doerwalter', 'nnorwitz', 'jackjansen', 'georg.brandl', 'ishimoto', 'johnjsmith']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = None
status = 'closed'
superseder = None
type = None
url = 'https://bugs.python.org/issue780730'
versions = ['Python 2.5']

atsuoishimoto · 2003-07-31T09:16:56Z

Universal Newline Support doesn't work for source files
that contain encoding definition.

loewis · 2003-08-01T07:04:25Z

Logged In: YES
user_id=21627

It's not clear to me what the problem is: Source code files
and universal newline support have nothing to do with each
other.

Can you attach a small example that demonstrates the problem?

atsuoishimoto · 2003-08-01T07:35:39Z

Logged In: YES
user_id=463672

The attached file has an encoding definition and its newline is '\r'.
Executing this script on Windows 2000, I got the following error.
But it runs fine if I remove the encoding definition.

C:\Python23>python .\test.py
File ".\test.py", line 2
a() print 'hi'
^
SyntaxError: invalid syntax

loewis · 2003-08-01T08:22:40Z

Logged In: YES
user_id=21627

I see. This is not about PEP-278, though, as you are not
calling any open() function, and passing no 'U' argument to
it - cross-platform newlines should work independent of that
PEP.

jackjansen · 2003-08-01T09:45:38Z

Logged In: YES
user_id=45365

The submitter isn't calling open explicitly, but PEP-278 is also about
the implicit opening of sourcefiles by the interpreter.

My guess (but I don't know the PEP-263 implementation at all) is
that when you specify an encoding the normal code that the
interpreter uses to read sourcefiles is bypassed, and replaced by
something that does the encoding magic. Apparently this code
does not do universal newline conversion, which it should.
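To illustrate the symptom described above, here is a minimal sketch in modern Python 3 (not the Python 2.3 code path under discussion). It assumes `io.TextIOWrapper` as a stand-in for "a decoding reader", and the helper name `decoded_lines` is made up for this example. It shows how the same '\r'-terminated bytes collapse into one logical line when only '\n' is recognised, and split correctly under universal newlines:

```python
import io

# Same shape as the reported file: a PEP 263 coding line and '\r' newlines.
SOURCE = b"# -*- coding: iso-8859-1 -*-\rprint 17\rprint 23\r"


def decoded_lines(raw, newline):
    """Decode raw bytes and split them into lines using the given newline mode.

    Hypothetical helper for this illustration only.
    """
    text = io.TextIOWrapper(io.BytesIO(raw), encoding="iso-8859-1", newline=newline)
    return list(text)


# newline="\n": only '\n' ends a line, so the whole file is one logical line.
lf_only = decoded_lines(SOURCE, "\n")

# newline=None: universal newlines; '\r' and '\r\n' are translated to '\n'.
universal = decoded_lines(SOURCE, None)

print(len(lf_only), len(universal))
```

With only '\n' recognised, the tokenizer would see `a() print 'hi'`-style input on a single line, which matches the SyntaxError in the report.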

loewis · 2003-08-03T19:19:24Z

Logged In: YES
user_id=21627

The only way to implement that would be to add 'U' support
for all codecs, and open the codec with the 'U' flag. As
this will also affect codecs not part of Python, we cannot
fix this bug in Python 2.3, but have to defer a solution to
Python 2.4.

malemburg · 2003-08-03T19:31:00Z

Logged In: YES
user_id=38388

I'm not sure this is correct: unless the codecs implement
their own .readline() implementation, the one in codecs.py
is used and that simply delegates the readline request to
the underlying stream object.

Now, since the stream object in the source code reader is
a plain Python file object, currently opened in "rb" mode,
changing the mode to "rbU" should be enough to get
universal readline support for all such codecs.

The relevant code is in Parser/tokenizer.c:fp_setreadl():

static int
fp_setreadl(struct tok_state *tok, const char* enc)
{
	PyObject *reader, *stream, *readline;

	/* XXX: constify filename argument. */
	stream = PyFile_FromFile(tok->fp, (char*)tok->filename,
				 "rb", NULL);
	if (stream == NULL)
		return 0;

	reader = PyCodec_StreamReader(enc, stream, NULL);
	Py_DECREF(stream);
	if (reader == NULL)
		return 0;

	readline = PyObject_GetAttrString(reader, "readline");
	Py_DECREF(reader);
	if (readline == NULL)
		return 0;

	tok->decoding_readline = readline;
	return 1;
}
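The delegation argument above can be sketched in Python. This is an illustration under stated assumptions, not the actual `codecs.py` code: `DelegatingReader` and `UniversalNewlineBytes` are hypothetical classes written for this example, and the byte-level newline replacement is only valid for ASCII-compatible encodings such as iso-8859-1. The point is that a reader which merely decodes whatever `stream.readline()` returns inherits its line splitting from the stream it is given:

```python
import io


class DelegatingReader:
    """Hypothetical reader: decodes whatever stream.readline() hands back."""

    def __init__(self, stream, encoding):
        self.stream = stream
        self.encoding = encoding

    def readline(self):
        # No line splitting here; the underlying stream decides what a line is.
        return self.stream.readline().decode(self.encoding)


class UniversalNewlineBytes:
    """Hypothetical byte stream that normalises '\\r\\n' and '\\r' to '\\n'.

    Assumes an ASCII-compatible encoding; this would be wrong for UTF-16.
    """

    def __init__(self, raw):
        normalised = raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
        self._buf = io.BytesIO(normalised)

    def readline(self):
        return self._buf.readline()


SOURCE = b"# -*- coding: iso-8859-1 -*-\rprint 17\rprint 23\r"

# Plain binary stream: readline() only stops at b"\n", so we get everything.
plain = DelegatingReader(io.BytesIO(SOURCE), "iso-8859-1").readline()

# Newline-aware stream: the same reader now returns just the first line.
aware = DelegatingReader(UniversalNewlineBytes(SOURCE), "iso-8859-1").readline()

print(repr(plain))
print(repr(aware))
```

The reader class is unchanged between the two calls; only the stream differs, which is the sense in which the codec is handed newline support rather than being required to implement it.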

loewis · 2003-08-03T20:22:21Z

Logged In: YES
user_id=21627

That would work indeed. The question is whether we can
impose support for universal newlines on all codecs "out
there", for Python 2.3.1, when 2.3 makes no such requirement.

malemburg · 2003-08-04T08:21:51Z

Logged In: YES
user_id=38388

Uhm, we don't impose Universal readline support on the codecs.
They would just get a stream object that happens to know
about universal newlines and work with it. That's completely
in line with the codec spec.

I'm +1 on adding this to 2.3.1.

jackjansen · 2003-08-04T08:24:29Z

Logged In: YES
user_id=45365

There's no such thing as "rbU", I think, but simply "rU" should
work. As far as I know the only difference between "r" and "rb" is
newline conversion, right? If there are C libraries that do more
then we should implement "rbU".

About 2.3.1 compatibility: technically we could break workarounds
people have done themselves, but I think the chances are slim.

malemburg · 2003-08-04T15:57:58Z

Logged In: YES
user_id=38388

Jack, I was just looking at the code I posted and the one
in fileobject.c. The latter enables universal newline support
whenever it sees a 'U' in the mode string, so I thought that
adding a 'U' to the mode would be enough.

The only system where 'b' does make a difference that I'm
aware of is Windows, so you may want to check whether it
makes a difference there.

johnjsmith · 2003-08-04T19:34:28Z

Logged In: YES
user_id=830565

In MS Windows, a '\x1a' (Ctrl-Z) in a file will be treated
as EOF, unless the file is opened with 'rb'.

malemburg · 2003-08-05T07:31:33Z

Logged In: YES
user_id=38388

Thanks John.

Not sure whether any of the codecs would actually use 0x1A,
but using "rbU" sounds like the safer approach then.

jackjansen · 2003-08-05T10:48:50Z

Logged In: YES
user_id=45365

You misunderstand what I tried to say (or I mis-said it:-): there is
no such thing as mode "rbU", check the code in fileobject.c.

There is "r" == "rt" for text mode, "rb" for binary mode,
"U"=="rU" for universal newline textmode. With "rU" the
underlying file is opened in binary mode, so I don't think we'll
have the control-Z problem.

nnorwitz · 2005-10-02T05:50:39Z

Logged In: YES
user_id=33168

I don't see a clear resolution here. Is there something we
can/should do to fix this problem in 2.5?

doerwalter · 2006-02-20T21:42:19Z

Logged In: YES
user_id=89016

The changes to the codecs done in Python 2.4 added support
for universal newlines:

Python 2.4.1 (#2, Mar 31 2005, 00:05:10) 
[GCC 3.3 20030304 (Apple Computer, Inc. build 1666)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> open("foo.py", "wb").write("# -*- coding: iso-8859-1 -*-\rprint 17\rprint 23\r")
>>> import foo
17
23
>>>
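For readers on Python 3, a rough equivalent of the check above can be sketched without touching the filesystem. This is an assumption-laden adaptation rather than the original test: it passes the bytes straight to the built-in `compile()` instead of importing a file, and it uses assignments in place of the Python 2 `print` statements. The filename "foo.py" is only a label for error messages here.

```python
# Source bytes with a PEP 263 coding line and '\r' as the only newline.
SOURCE = b"# -*- coding: iso-8859-1 -*-\rx = 17\ry = 23\r"

# compile() accepts bytes, honours the coding line, and accepts Mac-style
# '\r' newlines (documented as allowed since Python 3.2).
code = compile(SOURCE, "foo.py", "exec")

namespace = {}
exec(code, namespace)

print(namespace["x"], namespace["y"])
```

If '\r' were not treated as a line ending, the whole source would be a single comment line and neither name would be defined.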

birkenfeld · 2006-02-20T21:48:16Z

Logged In: YES
user_id=849994

So this is resolved now?

doerwalter · 2006-02-20T22:06:10Z

Logged In: YES
user_id=89016

It looks to me that way.

Any comments from the OP?

malemburg · 2006-02-21T09:45:42Z

Logged In: YES
user_id=38388

Closing the bug. Thanks Walter.

atsuoishimoto mannequin closed this as completed Jul 31, 2003

atsuoishimoto mannequin assigned malemburg Jul 31, 2003

atsuoishimoto mannequin added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Jul 31, 2003


ezio-melotti transferred this issue from another repository Apr 9, 2022
