Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify documentation of itertools.zip_longest #75453

Closed
raphaelm mannequin opened this issue Aug 24, 2017 · 5 comments
Closed

Simplify documentation of itertools.zip_longest #75453

raphaelm mannequin opened this issue Aug 24, 2017 · 5 comments
Assignees
Labels
3.7 (EOL) end of life docs Documentation in the Doc dir type-feature A feature request or enhancement

Comments

@raphaelm
Copy link
Mannequin

raphaelm mannequin commented Aug 24, 2017

BPO 31270
Nosy @rhettinger, @bitdancer, @raphaelm
PRs
  • bpo-31270: Simplify documentation of itertools.zip_longest #3200
  • bpo-31270: Modification of Pr 3200 #3427
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/rhettinger'
    closed_at = <Date 2017-09-07.21:03:37.023>
    created_at = <Date 2017-08-24.16:03:33.339>
    labels = ['type-feature', '3.7', 'docs']
    title = 'Simplify documentation of itertools.zip_longest'
    updated_at = <Date 2017-09-07.21:51:11.984>
    user = 'https://github.com/raphaelm'

    bugs.python.org fields:

    activity = <Date 2017-09-07.21:51:11.984>
    actor = 'rhettinger'
    assignee = 'rhettinger'
    closed = True
    closed_date = <Date 2017-09-07.21:03:37.023>
    closer = 'rhettinger'
    components = ['Documentation']
    creation = <Date 2017-08-24.16:03:33.339>
    creator = 'rami'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 31270
    keywords = []
    message_count = 5.0
    messages = ['300788', '300790', '300799', '300830', '301635']
    nosy_count = 4.0
    nosy_names = ['rhettinger', 'r.david.murray', 'docs@python', 'rami']
    pr_nums = ['3200', '3427']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue31270'
    versions = ['Python 3.6', 'Python 3.7']

    @raphaelm
    Copy link
    Mannequin Author

    raphaelm mannequin commented Aug 24, 2017

    The documentation given for itertools.zip_longest contains a "roughly equivalent" pure-python implementation of the function that is intended to help the user understand what zip_longest does on a functional level.

    However, the given implementation is very complicated to read for newcomers and experienced Python programmers alike, as it uses a custom-defined exception for control flow handling, a nested function, a condition that always is true if any arguments are passed ("while iterators"), as well as two other non-trivial functions from itertools (chain and repeat).

    For future reference, this is the currently given implementation:

        def zip_longest(*args, **kwds):
            # zip_longest('ABCD', 'xy', fillvalue='-') --> Ax By C- D-
            fillvalue = kwds.get('fillvalue')
            iterators = [iter(it) for it in args]
    
            while True:
                exhausted = 0
                values = []
    
                for it in iterators:
                    try:
                        values.append(next(it))
                    except StopIteration:
                        values.append(fillvalue)
                        exhausted += 1
    
                if exhausted < len(args):
                    yield tuple(values)
                else:
                    break

    This is way more complex than necessary to teach the concept of zip_longest. With this issue, I will submit a pull request with a new example implementation that seems to be the same level of "roughly equivalent" but is much easier to read, since it only uses two loops and now complicated flow

        def zip_longest(*args, **kwds):
            # zip_longest('ABCD', 'xy', fillvalue='-') --> Ax By C- D-
            fillvalue = kwds.get('fillvalue')
            iterators = [iter(it) for it in args]
    
            while True:
                exhausted = 0
                values = []
    
                for it in iterators:
                    try:
                        values.append(next(it))
                    except StopIteration:
                        values.append(fillvalue)
                        exhausted += 1
    
                if exhausted < len(args):
                    yield tuple(values)
                else:
                    break

    Looking at the C code of the actual implementation, I don't see that any one of the two implementations is obviously "more equivalent". I'm unsure about performance -- I haven't tried them on that but I don't think that's the point of this learning implementation.

    I ran all tests from Lib/test/test_itertools.py against both the old and the new implementation. The new implementation fails at 3 tests, while the old implementation failed at four. Two of the remaining failures are related to TypeErrors not being thrown on invalid input, one of them is related to pickling the resulting object. I believe all three of them are fine to ignore in this sample, as it is not relevant to the documentation purpose.

    Therefore, I believe the documentation should be changed like suggested. I'd be happy for any feedback or further ideas to improve its readability!

    @raphaelm raphaelm mannequin assigned docspython Aug 24, 2017
    @raphaelm raphaelm mannequin added 3.7 (EOL) end of life docs Documentation in the Doc dir type-feature A feature request or enhancement labels Aug 24, 2017
    @raphaelm
    Copy link
    Mannequin Author

    raphaelm mannequin commented Aug 24, 2017

    I just noticed that in my post I accidentally pasted MY implementation twice instead of the old one, sorry for that. Here's the old one for quick comparison:

    class ZipExhausted(Exception):
        pass
    
    def zip_longest(*args, **kwds):
        # zip_longest('ABCD', 'xy', fillvalue='-') --> Ax By C- D-
        fillvalue = kwds.get('fillvalue')
        counter = len(args) - 1
        def sentinel():
            nonlocal counter
            if not counter:
                raise ZipExhausted
            counter -= 1
            yield fillvalue
        fillers = repeat(fillvalue)
        iterators = [chain(it, sentinel(), fillers) for it in args]
        try:
            while iterators:
                yield tuple(map(next, iterators))
        except ZipExhausted:
            pass

    @rhettinger rhettinger assigned rhettinger and unassigned docspython Aug 24, 2017
    @bitdancer
    Copy link
    Member

    Thanks for wanting to improve the documentation.

    Raymond will address this definitively, but unless I'm mistaken part of the purpose of the examples is to show how the various itertools can be used. If that is true, then in the context of the overall itertools documentation I think the current example has more teaching value than your suggested revision.

    @raphaelm
    Copy link
    Mannequin Author

    raphaelm mannequin commented Aug 25, 2017

    Well, I could think of a way to still use repeat() here that also is pretty clean except for the fact that it fails if all inputs to zip_longest are repeat() iterators themselves (which would here lead to an empty iterator while it would otherwise lead to an infinite one):

        def zip_longest(*args, **kwds):
            # zip_longest('ABCD', 'xy', fillvalue='-') --> Ax By C- D-
            fillvalue = kwds.get('fillvalue')
            iterators = [iter(it) for it in args]
    
            while True:
                values = []
    
                for i, it in enumerate(iterators):
                    try:
                        values.append(next(it))
                    except StopIteration:
                        values.append(fillvalue)
                        iterators[i] = repeat(fillvalue)
    
                if all(isinstance(it, repeat) for it in iterators):
                    break
                else:
                    yield tuple(values)

    Keeping chain() in use here just for the sake of using it is not worth it, I believe.

    @rhettinger
    Copy link
    Contributor

    New changeset 3147b04 by Raymond Hettinger in branch 'master':
    bpo-31270: Modification of Pr 3200 (bpo-3427)
    3147b04

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life docs Documentation in the Doc dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants