file iterator "deemed broken"; can resume after StopIteration #67643

Open
dalke mannequin opened this issue Feb 12, 2015 · 1 comment
Labels
docs Documentation in the Doc dir type-bug An unexpected behavior, bug, or error

Comments

@dalke
Mannequin

dalke mannequin commented Feb 12, 2015

BPO 23455
Nosy @pitrou

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2015-02-12.18:48:54.422>
labels = ['type-bug', 'docs']
title = 'file iterator "deemed broken"; can resume after StopIteration'
updated_at = <Date 2015-07-21.07:29:07.686>
user = 'https://bugs.python.org/dalke'

bugs.python.org fields:

activity = <Date 2015-07-21.07:29:07.686>
actor = 'ethan.furman'
assignee = 'docs@python'
closed = False
closed_date = None
closer = None
components = ['Documentation']
creation = <Date 2015-02-12.18:48:54.422>
creator = 'dalke'
dependencies = []
files = []
hgrepos = []
issue_num = 23455
keywords = []
message_count = 1.0
messages = ['235850']
nosy_count = 3.0
nosy_names = ['dalke', 'pitrou', 'docs@python']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue23455'
versions = ['Python 3.5']

@dalke
Mannequin Author

dalke mannequin commented Feb 12, 2015

The file iterator is "deemed broken". As I don't think it should be made non-broken, I suggest the documentation should be changed to point out when file iteration is broken. I also think the term 'broken' is a label with needlessly harsh connotations and should be softened.

The iterator documentation uses the term 'broken' like this (quoting here from https://docs.python.org/3.4/library/stdtypes.html):

  Once an iterator’s __next__() method raises StopIteration,
  it must continue to do so on subsequent calls. Implementations
  that do not obey this property are deemed broken.

(Older versions comment "This constraint was added in Python 2.3; in Python 2.2, various iterators are broken according to this rule.")

An IOBase is supposed to support the iterator protocol (says https://docs.python.org/3.4/library/io.html#io.IOBase ). However, it does not, and the documentation does not say that the iterator is broken in the face of a changing file (e.g., when another process appends to a log file).

  % ./python.exe 
  Python 3.5.0a1+ (default:4883f9046b10, Feb 11 2015, 04:30:46) 
  [GCC 4.8.4] on darwin
  Type "help", "copyright", "credits" or "license" for more information.
  >>> f = open("empty")
  >>> next(f)
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  StopIteration
  >>>
  >>> ^Z
  Suspended
  % echo "Hello!" >> empty
  % fg
  ./python.exe

  >>> next(f)
  'Hello!\n'
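
The same behavior can be reproduced inside a single process, without suspending the interpreter. This is a minimal sketch, assuming that appending through a second file handle is an adequate stand-in for the other process; the helper name `resume_after_stop_iteration` and the temporary file are illustrative and not part of the session above.

```python
import os
import tempfile


def resume_after_stop_iteration():
    """Return (stopped_on_empty_file, line_read_after_append)."""
    # Create an empty temporary file; it plays the role of "empty" above.
    fd, path = tempfile.mkstemp(suffix=".log")
    os.close(fd)
    try:
        with open(path) as f:
            # First call: the file is empty, so next() raises StopIteration.
            try:
                next(f)
                stopped = False
            except StopIteration:
                stopped = True

            # Assumption: a second handle stands in for another process
            # appending to the file.
            with open(path, "a") as writer:
                writer.write("Hello!\n")

            # Same file object, no seek(): the iterator yields a line again.
            line = next(f, None)
        return stopped, line
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(resume_after_stop_iteration())
```

If the iterator obeyed the rule quoted from stdtypes.html, the second element of the returned tuple would be `None` rather than a line of text.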

This is apparently well-known behavior, as I've come across several references to it on various Python-related lists, including this one from Miles in 2008: https://mail.python.org/pipermail/python-list/2008-September/491920.html .

  Strictly speaking, file objects are broken iterators:

Fredrik Lundh in the same thread ( https://mail.python.org/pipermail/python-list/2008-September/521090.html ) says:

  it's a design guideline, not an absolute rule

The 7+ years of 'broken' behavior in Python suggests that /F is correct. But while 'broken' could be considered a meaningless label, it carries with it some rather negative connotations. It sounds like developers are supposed to make every effort to avoid broken code, when that's not something Python itself does. It also means that my code can be called "broken" solely because it assumed Python file iterators are non-broken. I am not happy when people say my code is broken.

It is entirely reasonable that a seek(0) would reset the state and cause next(it) to not continue to raise a StopIteration exception. However, errors can arise when using Python file objects, as an iterator, to parse a log file or any other files which are appended to by another process.
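
The seek(0) case can be illustrated with a short sketch. It assumes an in-memory `io.StringIO` stream (an IOBase subclass) as a stand-in for a file on disk; the function name `reiterate_after_seek` is hypothetical.

```python
import io


def reiterate_after_seek():
    """Exhaust a text stream, rewind it, and iterate again."""
    # Assumption: an in-memory text stream stands in for a real file.
    f = io.StringIO("first\nsecond\n")

    # Consume every line; the iterator has now raised StopIteration.
    lines = list(f)
    exhausted = next(f, None) is None

    # Rewinding resets the position, so the same iterator yields lines again.
    f.seek(0)
    return lines, exhausted, next(f)


if __name__ == "__main__":
    print(reiterate_after_seek())
```

Here the resumption is requested explicitly by the caller, which is why it is unsurprising; the log-file case differs because nothing in the reading code asks for it.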

Here's an example of code that can break. It extracts the first and last elements of an iterator; more specifically, the first and last lines of a file. If there are no lines it returns None for both values; and if there's only one line then it returns the same line as both values.

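  def get_first_and_last_elements(it):
      # Take the first element, or None if the iterator is already exhausted.
      first = last = next(it, None)
      # Consume the rest; `last` ends up bound to the final element.
      for last in it:
          pass
      return first, last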

This code expects a non-broken iterator. If it is passed a file, and the file is 1) initially empty when next() is called, and 2) appended to by the time Python reaches the for loop, then it is possible for the first value to be None while the last is a string.

This is unexpected, undocumented, and may lead to subtle errors.
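
The race is hard to trigger on demand with a real file, so the sketch below simulates it. `ResumingIterator` is a hypothetical stand-in, not a real file object: it assumes the appended data arrives exactly when the iterator first reports exhaustion.

```python
class ResumingIterator:
    """Hypothetical iterator that resumes after raising StopIteration."""

    def __init__(self, late_items):
        self._available = []                 # nothing to read at first
        self._late_items = list(late_items)  # data "appended" later

    def __iter__(self):
        return self

    def __next__(self):
        if self._available:
            return self._available.pop(0)
        # Simulate another process appending just as we report end-of-data.
        self._available, self._late_items = self._late_items, []
        raise StopIteration


def get_first_and_last_elements(it):
    first = last = next(it, None)
    for last in it:
        pass
    return first, last


if __name__ == "__main__":
    # A well-behaved iterator gives a consistent pair.
    print(get_first_and_last_elements(iter(["a\n", "b\n"])))
    # The resuming iterator gives None for first but a string for last.
    print(get_first_and_last_elements(ResumingIterator(["a\n", "b\n"])))
```

With a list iterator the result is `('a\n', 'b\n')`; with the resuming stand-in it is `(None, 'b\n')`, the inconsistent pair described above.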

There are work-arounds, like ensuring that the StopIteration only occurs once:

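  def get_first_and_last_elements(it):
      first = last = next(it, None)
      # Only keep iterating if the first next() produced an element, so the
      # iterator is never asked again once it has raised StopIteration.
      if last is not None:
          for last in it:
              pass
      return first, last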

but much existing code expects non-broken iterators, such as the Python example implementation at https://docs.python.org/2/library/itertools.html#itertools.dropwhile . (I have a reproducible failure using it, a fork(), and a file iterator with a sleep() if that would prove useful.)

Another option is to have a wrapper around file object iterators to keep raising StopIteration, like:

  def safe_iter(it):
      # A generator stays exhausted once it finishes, even if `it` resumes.
      yield from it

  # -or- (line for line in file_iter)

but people need to know to do this with file iterators or other potentially broken iterators. The current documentation does not say when file iterators are broken, and I don't know which other iterators are also broken.
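
A brief sketch of what the wrapper buys, assuming a temporary file and a second handle that plays the appending process; `compare_wrapped_and_raw` is an illustrative name only.

```python
import os
import tempfile


def safe_iter(it):
    # A finished generator keeps raising StopIteration on later calls.
    yield from it


def compare_wrapped_and_raw():
    """Return (value_from_wrapper, value_from_raw_file) after an append."""
    fd, path = tempfile.mkstemp(suffix=".log")
    os.close(fd)
    try:
        with open(path) as f:
            lines = safe_iter(f)
            # The file is empty, so the wrapper finishes immediately.
            assert next(lines, None) is None

            # Assumption: this second handle stands in for another process.
            with open(path, "a") as writer:
                writer.write("Hello!\n")

            wrapped = next(lines, None)  # generator is done for good
            raw = next(f, None)          # underlying file iterator resumes
        return wrapped, raw
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(compare_wrapped_and_raw())
```

The wrapper stays exhausted while the underlying file object goes on to return the appended line, so code that needs the "once stopped, always stopped" guarantee should hold on to the wrapper rather than the file.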

I realize this is a tricky issue.

I don't think it's possible now to change the file's StopIteration behavior. I expect that there is code which depends on the current brokenness, the ability to seek() and re-iterate is useful, and the idea that next() returns text if and only if readline() is not empty is useful and well-entrenched. PyPy has the same behavior as CPython, so any change will take some time to propagate to the other implementations.

Instead, I'm fine with a documentation change in io.html . It currently says:

  IOBase (and its subclasses) support the iterator protocol,
  meaning that an IOBase object can be iterated over yielding
  the lines in a stream. Lines are defined slightly differently
  depending on whether the stream is a binary stream (yielding
  bytes), or a text stream (yielding unicode strings). See
  readline() below.

I suggest adding something like:

  The file iterator does not completely follow the iterator protocol.
  If new data is added to the file after the iterator raises
  StopIteration, then next(file) will resume returning lines.
  The safest way to iterate over lines in a log file or other
  changing file is to use a generator expression:

    (line for line in file)

  The iterator may also resume after using seek() to move
  the file position.

You'll note that I failed to use the term "broken". This should really start:

  The file iterator is broken.

I find that term rather harsh, and since broken iterators are acceptable in Python, I suggest toning down or qualifying the use of "broken" in stdtypes.html. I have no suggestions for an improved version.

@dalke dalke mannequin assigned docspython Feb 12, 2015
@dalke dalke mannequin added the docs Documentation in the Doc dir label Feb 12, 2015
@ezio-melotti ezio-melotti added the type-bug An unexpected behavior, bug, or error label Mar 2, 2015
@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022