codecs.StreamReader.read behaves differently from regular files #58680

tdb · 2012-04-02T13:13:34Z

BPO	14475
Nosy	@vstinner, @serhiy-storchaka
Superseder	bpo-8260: When I use codecs.open(...) and f.readline() follow up by f.read() return bad result

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = <Date 2012-12-07.20:03:38.891>
created_at = <Date 2012-04-02.13:13:33.598>
labels = ['type-bug']
title = 'codecs.StreamReader.read behaves differently from regular files'
updated_at = <Date 2012-12-07.20:03:38.889>
user = 'https://bugs.python.org/tdb'

bugs.python.org fields:

activity = <Date 2012-12-07.20:03:38.889>
actor = 'serhiy.storchaka'
assignee = 'none'
closed = True
closed_date = <Date 2012-12-07.20:03:38.891>
closer = 'serhiy.storchaka'
components = []
creation = <Date 2012-04-02.13:13:33.598>
creator = 'tdb'
dependencies = []
files = []
hgrepos = []
issue_num = 14475
keywords = []
message_count = 4.0
messages = ['157355', '157380', '160941', '177122']
nosy_count = 4.0
nosy_names = ['vstinner', 'serhiy.storchaka', 'tdb', 'A.S']
pr_nums = []
priority = 'normal'
resolution = 'duplicate'
stage = 'resolved'
status = 'closed'
superseder = '8260'
type = 'behavior'
url = 'https://bugs.python.org/issue14475'
versions = ['Python 2.6', 'Python 2.7']

tdb · 2012-04-02T13:13:33Z

For regular files, a read() call without arguments reads until EOF. codecs.StreamReader does its own buffering, and if there are characters in the buffer, a read() call is satisfied from the buffer without any attempt to read the rest of the file. This discrepancy causes code that works with regular open() to fail when codecs.open() is substituted.

The easiest way to reproduce this is to first call readline() and then read(). Since readline() can't know how many characters are on the line, it will almost always leave some characters in the buffer, triggering the problem with read().
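The reproduction described above can be sketched roughly as follows. This is a minimal sketch, not the reporter's original script: it wraps an in-memory io.BytesIO with codecs.getreader() instead of calling codecs.open() on a real file, and the helper name readline_then_read and the sample data are made up for illustration. On an affected interpreter the second value would be expected to hold only the buffered characters; on an interpreter where the problem is fixed, the two pieces add up to the whole input.

```python
import codecs
import io


def readline_then_read(text, encoding="utf-8"):
    """Return (first_line, rest) from readline() followed by a bare read().

    Hypothetical helper: an in-memory stream stands in for a file opened
    with codecs.open().
    """
    raw = io.BytesIO(text.encode(encoding))
    reader = codecs.getreader(encoding)(raw)  # a codecs.StreamReader
    first_line = reader.readline()  # usually buffers more than one line
    rest = reader.read()  # no size argument: should read until EOF
    return first_line, rest


# Made-up sample input, long enough that one readline() cannot buffer it all.
sample = "".join("line %d\n" % i for i in range(1000))
first_line, rest = readline_then_read(sample)
print(len(first_line) + len(rest), "of", len(sample), "characters returned")
```

If the printed count is smaller than the input length, the read() call was satisfied from the buffer alone.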

vstinner · 2012-04-02T20:03:59Z

Oh, yet another bug in codecs.StreamReader. I should add it to the PEP :-)
http://www.python.org/dev/peps/pep-0400/

Use io.TextIOWrapper (open) instead of codecs.StreamReader (codecs.open); it's bug-free :-)
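For comparison, the same readline()-then-read() sequence with io.TextIOWrapper might look like the sketch below. The helper name readline_then_read_textio and the sample data are hypothetical, and an io.BytesIO again stands in for a real file.

```python
import io


def readline_then_read_textio(text, encoding="utf-8"):
    """Return (first_line, rest) using io.TextIOWrapper.

    Hypothetical helper for illustration; the wrapper closes the
    underlying BytesIO when the with-block exits.
    """
    raw = io.BytesIO(text.encode(encoding))
    with io.TextIOWrapper(raw, encoding=encoding) as reader:
        first_line = reader.readline()
        rest = reader.read()  # reads everything up to EOF
    return first_line, rest


# Made-up sample input with plain "\n" line endings.
textio_sample = "".join("line %d\n" % i for i in range(1000))
textio_first, textio_rest = readline_then_read_textio(textio_sample)
print(textio_first + textio_rest == textio_sample)
```

With a real file, io.open(path, encoding=...) returns the same kind of wrapper.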

AS · 2012-05-16T23:37:34Z

Just got this behavior, with readlines(), which is unsurprising since it internally uses read() as described in the original bug report.

The break on line 468 of codecs.py seems to be the problem; removing this conditional locally fixes it.

http://hg.python.org/cpython/file/f6a207d86154/Lib/codecs.py#l466

I may be overlooking something, but I would assume this should be checking whether the character buffer extends to the EOF of the underlying stream at this point?

As stated before, this can be reproduced by:
f = codecs.open(...)
f.read()
f.readlines()
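For code that has to keep using codecs.StreamReader, one possible workaround (not proposed anywhere in this thread, so treat it as an assumption) is to keep calling read() until it returns an empty string, rather than trusting a single call to reach EOF. The helper name read_to_eof is hypothetical.

```python
import codecs
import io


def read_to_eof(reader):
    """Call read() repeatedly until it returns an empty string.

    Hypothetical workaround: if one read() only returns buffered
    characters, the following calls pick up the rest of the stream.
    """
    chunks = []
    while True:
        chunk = reader.read()
        if not chunk:
            break
        chunks.append(chunk)
    return "".join(chunks)


# Usage with an in-memory stream standing in for codecs.open(...).
reader = codecs.getreader("utf-8")(io.BytesIO(b"a\nb\nc\n"))
first = reader.readline()
remaining = read_to_eof(reader)
```

The loop costs one extra read() call when the first call already returned everything, which keeps it harmless where the behavior is fixed.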

serhiy-storchaka · 2012-12-07T20:03:39Z

This is obviously a duplicate of bpo-8260 and bpo-12446.

tdb mannequin added the type-bug label Apr 2, 2012

serhiy-storchaka closed this as completed Dec 7, 2012

ezio-melotti transferred this issue from another repository Apr 10, 2022
