Make codecs.StreamReader.read() more compatible with read() of other files #76291

serhiy-storchaka · 2017-11-22T07:40:17Z

BPO	32110
Nosy	@malemburg, @serhiy-storchaka
PRs	bpo-32110: codecs.StreamReader.read(n) now returns not more than n #4499; [3.6] bpo-32110: codecs.StreamReader.read(n) now returns not more than n (GH-4499) #4622; [2.7] bpo-32110: codecs.StreamReader.read(n) now returns not more than n (GH-4499) #4623

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = 'https://github.com/serhiy-storchaka'
closed_at = <Date 2017-11-29.00:16:20.405>
created_at = <Date 2017-11-22.07:40:16.875>
labels = ['3.7', 'type-bug', 'library', 'expert-IO']
title = 'Make codecs.StreamReader.read() more compatible with read() of other files'
updated_at = <Date 2017-11-29.00:16:20.404>
user = 'https://github.com/serhiy-storchaka'

bugs.python.org fields:

activity = <Date 2017-11-29.00:16:20.404>
actor = 'serhiy.storchaka'
assignee = 'serhiy.storchaka'
closed = True
closed_date = <Date 2017-11-29.00:16:20.405>
closer = 'serhiy.storchaka'
components = ['Library (Lib)', 'IO']
creation = <Date 2017-11-22.07:40:16.875>
creator = 'serhiy.storchaka'
dependencies = []
files = []
hgrepos = []
issue_num = 32110
keywords = ['patch']
message_count = 6.0
messages = ['306701', '306705', '306710', '307191', '307194', '307196']
nosy_count = 2.0
nosy_names = ['lemburg', 'serhiy.storchaka']
pr_nums = ['4499', '4622', '4623']
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue32110'
versions = ['Python 2.7', 'Python 3.6', 'Python 3.7']

serhiy-storchaka · 2017-11-22T07:40:16Z

Usually the read() method of a file-like object takes one optional argument which limits the amount of data (the number of bytes or characters) returned if specified.

codecs.StreamReader.read() also has such a parameter, but it is the second one. The first parameter limits the number of bytes read for decoding. As a result, read(1) can return 70 characters, which will confuse most callers, since they expect either a single character or an empty string (at the end of the stream).
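A minimal sketch of the two signatures being compared, assuming UTF-8 data held in an in-memory `io.BytesIO` (the sample string and variable names are illustrative only). It contrasts the single-argument `read()` of a text file object with the `size`/`chars` pair accepted by `codecs.StreamReader.read()`.

```python
import codecs
import io

data = "héllo wörld\n".encode("utf-8")

# Text-mode file object: the single argument caps the number of
# characters returned.
text = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8")
one_char = text.read(1)

# StreamReader: `size` is about how many bytes to read for decoding,
# while `chars` caps the number of decoded characters returned.
reader = codecs.getreader("utf-8")(io.BytesIO(data))
capped = reader.read(size=-1, chars=3)

print(repr(one_char), repr(capped))
```

Here `chars` is passed explicitly, so the result is limited to three characters regardless of how many bytes were pulled from the underlying stream.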

Some time ago, codecs.open() was recommended as a replacement for the builtin open() in programs that should work in both 2.x and 3.x (this was before io.open() was added), and it is still used in many programs. But this peculiarity makes it a bad replacement for the builtin open().

I wanted to fix this issue a long time ago but forgot, and a question on Stack Overflow has reminded me of it: https://stackoverflow.com/questions/46437761/codecs-openutf-8-fails-to-read-plain-ascii-file

malemburg · 2017-11-22T09:20:56Z

On 22.11.2017 08:40, Serhiy Storchaka wrote:

> Usually the read() method of a file-like object takes one optional argument which limits the amount of data (the number of bytes or characters) returned if specified.
>
> codecs.StreamReader.read() also has such a parameter, but it is the second one. The first parameter limits the number of bytes read for decoding. As a result, read(1) can return 70 characters, which will confuse most callers, since they expect either a single character or an empty string (at the end of the stream).

That's not true. .read(1) will at most read 1 byte from the stream
and decode it. There's no way it will return 70 characters. It will
usually return fewer chars than the number of bytes read.

The reasoning here is the same as for .read() on regular byte
streams in Python 2.x: the first argument size tells the reader how
many bytes to read for decoding, since this is needed to properly
work together with .seek().

The optional second parameter chars was added as a convenience,
since the user may not know how many bytes need to be read in
order to decode a certain number of characters.

That said, I see in your patch that you want to bind chars
to size. That will work and also protect the user from the
unlikely case where the codec returns more chars than bytes
read.
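The interplay of `size` and `chars` described above can be sketched as follows, assuming a UTF-8 stream that starts with a three-byte character (the sample text is an arbitrary choice). With `size=1` the reader pulls one byte at a time from the underlying stream and keeps going until it can return the single character requested via `chars`.

```python
import codecs
import io

raw = io.BytesIO("€uro".encode("utf-8"))  # "€" takes three bytes in UTF-8
reader = codecs.getreader("utf-8")(raw)

# Ask for one character while reading the stream one byte at a time.
first = reader.read(size=1, chars=1)

# Number of bytes consumed from the underlying stream so far.
consumed = raw.tell()

print(repr(first), consumed)
```

The caller does not need to know in advance that three bytes are required: `chars` states the goal and `size` only controls the granularity of the underlying reads.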

serhiy-storchaka · 2017-11-22T09:56:58Z

> That's not true. .read(1) will at most read 1 byte from the stream
> and decode it. There's no way it will return 70 characters.

See the added tests. They fail without the change to the read() method.

.read(1) currently returns all characters from the character buffer, and this buffer can be non-empty after .readline().
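A small reproduction of the scenario, assuming a two-line ASCII payload (the data is made up for illustration). On interpreters that include the fix from this issue, `read(1)` after `readline()` returns a single character; on earlier versions the same call could return the whole buffered remainder instead.

```python
import codecs
import io

raw = io.BytesIO(b"first line\nsecond line\n")
reader = codecs.getreader("utf-8")(raw)

# readline() may read past the newline and keep the extra decoded
# text in the reader's internal character buffer.
line = reader.readline()

# With the fix applied, the single argument also caps the number of
# characters returned, so only one buffered character comes back.
chunk = reader.read(1)

print(repr(line), repr(chunk))
```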

I understand the reason for having two limiting parameters in StreamReader.read(). But currently its behavior does not completely match the expected behavior of a read() method with one argument.

Actually, size is already used in place of chars when chars < 0 for reading in a loop, so the code can be simplified.
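The fallback from `chars` to `size` can be modeled with a toy function. This is only a sketch of the idea, not the code in Lib/codecs.py; `take` is a hypothetical helper that operates on an already decoded buffer.

```python
def take(charbuffer, size=-1, chars=-1):
    """Hypothetical helper: split charbuffer into (result, remaining)."""
    if chars < 0:
        # Single-argument calls: let size also cap the characters.
        chars = size
    if chars < 0:
        # No limit at all: hand back everything that is buffered.
        return charbuffer, ""
    return charbuffer[:chars], charbuffer[chars:]


print(take("second line\n", 1))
print(take("abcdef", size=10, chars=2))
print(take("abc"))
```

Once `chars` falls back to `size`, a single limit governs both the loop condition and the final slice, which is what makes a separate branch for `size` unnecessary in this model.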

serhiy-storchaka · 2017-11-28T23:30:03Z

New changeset 219c2de by Serhiy Storchaka in branch 'master':
bpo-32110: codecs.StreamReader.read(n) now returns not more than n (#4499)
219c2de

serhiy-storchaka · 2017-11-29T00:06:55Z

New changeset 230ffea by Serhiy Storchaka (Miss Islington (bot)) in branch '3.6':
bpo-32110: codecs.StreamReader.read(n) now returns not more than n (GH-4499) (#4622)
230ffea

serhiy-storchaka · 2017-11-29T00:15:46Z

New changeset fc73c54 by Serhiy Storchaka (Miss Islington (bot)) in branch '2.7':
bpo-32110: codecs.StreamReader.read(n) now returns not more than n (GH-4499) (#4623)
fc73c54

serhiy-storchaka added the 3.7 (EOL) end of life label Nov 22, 2017

serhiy-storchaka self-assigned this Nov 22, 2017

serhiy-storchaka added the stdlib (Python modules in the Lib dir), topic-IO, and type-bug (An unexpected behavior, bug, or error) labels Nov 22, 2017

serhiy-storchaka closed this as completed Nov 29, 2017

ezio-melotti transferred this issue from another repository Apr 10, 2022
