Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add programming FAQ entry: remove multiple entries from list #85940

Closed
sreedeviha mannequin opened this issue Sep 12, 2020 · 23 comments
Closed

Add programming FAQ entry: remove multiple entries from list #85940

sreedeviha mannequin opened this issue Sep 12, 2020 · 23 comments
Assignees
Labels
3.8 (EOL) end of life 3.9 only security fixes 3.10 only security fixes docs Documentation in the Doc dir type-feature A feature request or enhancement

Comments

@sreedeviha
Copy link
Mannequin

sreedeviha mannequin commented Sep 12, 2020

BPO 41774
Nosy @rhettinger, @terryjreedy, @pfmoore, @ericvsmith, @stevendaprano, @miss-islington
PRs
  • bpo-41774: Add programming FAQ entry #22402
  • [3.9] bpo-41774: Add programming FAQ entry (GH-22402) #22447
  • [3.8] bpo-41774: Add programming FAQ entry (GH-22402) #22448
  • bpo-41774: Tweak new programming FAQ entry #22562
  • [3.9] bpo-41774: Tweak new programming FAQ entry (GH-22562) #22563
  • [3.8] bpo-41774: Tweak new programming FAQ entry (GH-22562) #22564
  • Files
  • listrepxxx.py: Program to remove 20 from list
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/terryjreedy'
    closed_at = <Date 2020-10-05.15:16:04.912>
    created_at = <Date 2020-09-12.16:20:35.749>
    labels = ['type-feature', '3.8', '3.9', '3.10', 'docs']
    title = 'Add programming FAQ entry: remove multiple entries from list'
    updated_at = <Date 2020-10-05.15:16:04.911>
    user = 'https://bugs.python.org/sreedeviha'

    bugs.python.org fields:

    activity = <Date 2020-10-05.15:16:04.911>
    actor = 'terry.reedy'
    assignee = 'terry.reedy'
    closed = True
    closed_date = <Date 2020-10-05.15:16:04.912>
    closer = 'terry.reedy'
    components = ['Documentation']
    creation = <Date 2020-09-12.16:20:35.749>
    creator = 'sreedevi.ha'
    dependencies = []
    files = ['49454']
    hgrepos = []
    issue_num = 41774
    keywords = ['patch']
    message_count = 23.0
    messages = ['376803', '376804', '376805', '376806', '376807', '376808', '376810', '376822', '377456', '377464', '377471', '377477', '377478', '377484', '377486', '377497', '377654', '377655', '377656', '377673', '378034', '378037', '378038']
    nosy_count = 7.0
    nosy_names = ['rhettinger', 'terry.reedy', 'paul.moore', 'eric.smith', 'steven.daprano', 'miss-islington', 'sreedevi.ha']
    pr_nums = ['22402', '22447', '22448', '22562', '22563', '22564']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue41774'
    versions = ['Python 3.8', 'Python 3.9', 'Python 3.10']

    @sreedeviha
    Copy link
    Mannequin Author

    sreedeviha mannequin commented Sep 12, 2020

    Define a list which has all same elements.
    While removing the same element from list using for loop and by using remove() method, output should be empty list, but the output we get is one element is unremoved

    @sreedeviha sreedeviha mannequin assigned terryjreedy Sep 12, 2020
    @sreedeviha sreedeviha mannequin added type-bug An unexpected behavior, bug, or error topic-IDLE OS-windows 3.8 (EOL) end of life labels Sep 12, 2020
    @sreedeviha sreedeviha mannequin assigned terryjreedy Sep 12, 2020
    @sreedeviha sreedeviha mannequin added the type-bug An unexpected behavior, bug, or error label Sep 12, 2020
    @ericvsmith
    Copy link
    Member

    You should not mutate a list while iterating over it.

    See, for example https://stackoverflow.com/questions/6260089/strange-result-when-removing-item-from-a-list

    I couldn't find a place where this is mentioned in the python list docs. If it's not there, we should add something to the programming FAQ.

    @ericvsmith ericvsmith added docs Documentation in the Doc dir and removed topic-IDLE OS-windows labels Sep 12, 2020
    @ericvsmith
    Copy link
    Member

    Removing Windows and IDLE devs from nosy list, since this isn't related to either of those areas.

    @stevendaprano
    Copy link
    Member

    You say:

    "output should be empty list"

    but that's not actually correct. You intend the output to be the empty list, but if you think carefully about what happens during iteration, you should see that the behaviour is correct.

    To make it easier to see what is happening, let's use different values:

    L = [20, 21, 22]

    On the first iteration, the interpreter looks at position 0, which is 20. You remove 20 from the list, which shrinks the list down to:

    L = [21, 22]

    Now it is 21 in position 0. But that iteration of the loop is finished, so the interpreter looks at position 1, which is 22. You remove 22 from the list, giving:

    L = [21]

    and the interpreter looks at position 2, which does not exist, so the loop finishes.

    >>> L = [20, 21, 22]
    >>> for i in L:
    ...     L.remove(i)
    ...     print('i', i, 'L is now:', L)
    ... 
    i 20 L is now: [21, 22]
    i 22 L is now: [21]
    >>> L
    [21]

    So the interpreter did exactly what you told it to do. The problem is that what you told it to do is not what you wanted it to do.

    @pfmoore
    Copy link
    Member

    pfmoore commented Sep 12, 2020

    There is a bug in your program. You should not mutate the list while looping over it.

    @stevendaprano
    Copy link
    Member

    I think Eric left the issue open because he was hoping to update the FAQs and/or docs with information about this issue.

    I will leave it to someone else to decide whether or not to reopen it.

    @pfmoore
    Copy link
    Member

    pfmoore commented Sep 12, 2020

    Yeah, apologies - I saw the email notification, but Eric removed me from the nosy list (quite reasonably) and I didn't notice there had been extra discussion :-(

    If someone wants to re-open the issue, feel free.

    @terryjreedy
    Copy link
    Member

    Sreedevi: the Windows tag is meant for issues that involve Windows behavior that is different from other OSes; the IDLE tag is meant for issues that involve the behavior of IDLE, as distinct from other ways of running Python code. They should not be used just because you ran code from IDLE on Windows.

    There are two aspects to mutating while iteration: correctness and speed. For beginners, the general advice "Don't do it" is not bad. However:

    Removing items from a list *works* when iterating in reverse if removal is done by index (del mylist[i] rather than value (mylist.remove(value)). But it turns an inherently O(n:=len(list)) operation into a slow O(n*n) operation.

    The usually recommended alternative is to make a new list of things not deleted. But one can do this in-place by writing the new list on top of the old by using a explicit second index to move items just once.

    I reopened to add to the programming FAQ Sequences (Tuples/Lists) section
    How do you remove multiple items from a list?
    after the current
    How do you remove duplicates from a list?

    @terryjreedy terryjreedy added 3.10 only security fixes and removed 3.8 (EOL) end of life labels Sep 13, 2020
    @terryjreedy
    Copy link
    Member

    You are right; the replacement index I called 'j' is better buried as a C index or pointer within a slice replacement. In fact, a generator expression, if one has a keep expression, or a filter call, if one has filter function, work, without the intermediate list. Both also incorporate the keep scan index/pointer in C. I verified that this works by defining 3 functions.

    def fc(n, keep):
        mylist = list(range(n))
        mylist[:] = [x for x in mylist if keep(x)]
        return mylist
    
    def fg(n, keep):
        mylist = list(range(n))
        mylist[:] = (x for x in mylist if keep(x))
        return mylist
    
    def fl(n, keep):
        mylist = list(range(n))
        mylist[:] = filter(keep, mylist)
        return mylist

    I added a second test expression.

    print(fc(i, keep) == fg(i, keep) == fl(i, keep) == expect)
    

    at the 3 obvious places in the test loop above.
    ---

    In the existing question about removing duplicates, the existing all-hashable answer
    mylist = list(set(mylist))
    could be replaced by
    mylist[:] = set(mylist)

    @rhettinger
    Copy link
    Contributor

    Try running some timings. I suspect the genexp/iterator versions run more slowly. The speed of the slice replacement relies on knowing the size of the input.

    @terryjreedy
    Copy link
    Member

    Timings depend on multiple factors, including implementation (cpython versus pypy versus cython, etc) and complexity of the keep/discard decision. I said in the proposed text that a listcomp may be faster. I think that sufficient; anyone who cares can test their specific case.

    I like the simple ad easy 'slice replacement = iterator' form because it illustrates to me that we have done something right with Python's design.

    @rhettinger
    Copy link
    Contributor

    I like the simple ad easy 'slice replacement = iterator' form
    because it illustrates to me that we have done something
    right with Python's design.

    I understand that you like it and that it reflects the way you think the world should work, , but that doesn't warrant putting it in the FAQ. We should steer users down a path of feeding unsizable inputs into tooling that needs a size to work well (the receiving code either has to implicitly build a list first before it can start or it will have to have periodic resizes). A straight list comprehension will suffice to answer the question cleanly.

    FWIW, the same issue occurs with str.join(). It works better with a list comprehension than an iterator. Given an iterator, it has to build an internal list first before it can start. That is slower than starting with a list in the first place and makes the memory consumption implicit when it should be explicit (a generator would create the illusion that a list isn't being formed which is misleading).

    @terryjreedy
    Copy link
    Member

    Raymond, please take what I have written and rewrite it to your satisfaction. I have lots else to do, including investigating the IDLE bug you just reported.

    @rhettinger
    Copy link
    Contributor

    Sorry for the wording of the last message. Go ahead with whatever you would like do :-)

    Was only trying to point-out that the generator semantics don't interact nicely with slice assignments:

        $ python3.9 -m timeit -s 'a=list(range(1000))' 'b=a[:]' 'b[:]=(x for x in a)'
        5000 loops, best of 5: 46.6 usec per loop
    
        $ python3.9 -m timeit -s 'a=list(range(1000))' 'b=a[:]' 'b[:]=[x for x in a]'
        10000 loops, best of 5: 26.3 usec per loop

    @terryjreedy
    Copy link
    Member

    New changeset 5b0181d by Terry Jan Reedy in branch 'master':
    bpo-41774: Add programming FAQ entry (GH-22402)
    5b0181d

    @miss-islington
    Copy link
    Contributor

    New changeset d50a070 by Miss Islington (bot) in branch '3.8':
    bpo-41774: Add programming FAQ entry (GH-22402)
    d50a070

    @miss-islington
    Copy link
    Contributor

    New changeset 868c8e4 by Miss Islington (bot) in branch '3.9':
    bpo-41774: Add programming FAQ entry (GH-22402)
    868c8e4

    @rhettinger
    Copy link
    Contributor

    If space is not an issue, the list comprehension may be fastest.

    I think there is still a misunderstanding here. The generator variant does not save space. It uses slightly *more* space than the list variant.

    The list_ass_slice()¹ function runs PySequence_Fast² on its input. If the input is already a list or tuple, it is returned immediately. If not, the iterator is run to exhaustion and a new list is built. Either way, when the actual update occurs, the source will be a list or tuple.

    ¹ https://github.com/rhettinger/cpython/blob/master/Objects/listobject.c#L597
    ² https://github.com/rhettinger/cpython/blob/master/Objects/abstract.c#L2014

    @terryjreedy
    Copy link
    Member

    New changeset 060937d by Terry Jan Reedy in branch 'master':
    bpo-41774: Tweak new programming FAQ entry (GH-22562)
    060937d

    @miss-islington
    Copy link
    Contributor

    New changeset 7e941fa by Miss Skeleton (bot) in branch '3.8':
    bpo-41774: Tweak new programming FAQ entry (GH-22562)
    7e941fa

    @miss-islington
    Copy link
    Contributor

    New changeset 75dd70e by Miss Skeleton (bot) in branch '3.9':
    bpo-41774: Tweak new programming FAQ entry (GH-22562)
    75dd70e

    @terryjreedy terryjreedy added 3.8 (EOL) end of life 3.9 only security fixes labels Oct 5, 2020
    @terryjreedy terryjreedy changed the title While removing element from list using for and remove(), which has same items output is not right Add programming FAQ entry: remove multiple entries from list Oct 5, 2020
    @terryjreedy terryjreedy added type-feature A feature request or enhancement 3.8 (EOL) end of life 3.9 only security fixes and removed type-bug An unexpected behavior, bug, or error labels Oct 5, 2020
    @terryjreedy terryjreedy changed the title While removing element from list using for and remove(), which has same items output is not right Add programming FAQ entry: remove multiple entries from list Oct 5, 2020
    @terryjreedy terryjreedy added type-feature A feature request or enhancement and removed type-bug An unexpected behavior, bug, or error labels Oct 5, 2020
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 (EOL) end of life 3.9 only security fixes 3.10 only security fixes docs Documentation in the Doc dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    6 participants