-
-
Notifications
You must be signed in to change notification settings - Fork 30.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add programming FAQ entry: remove multiple entries from list #85940
Comments
Define a list which has all same elements. |
You should not mutate a list while iterating over it. See, for example https://stackoverflow.com/questions/6260089/strange-result-when-removing-item-from-a-list I couldn't find a place where this is mentioned in the python list docs. If it's not there, we should add something to the programming FAQ. |
Removing Windows and IDLE devs from nosy list, since this isn't related to either of those areas. |
You say: "output should be empty list" but that's not actually correct. You intend the output to be the empty list, but if you think carefully about what happens during iteration, you should see that the behaviour is correct. To make it easier to see what is happening, let's use different values: L = [20, 21, 22] On the first iteration, the interpreter looks at position 0, which is 20. You remove 20 from the list, which shrinks the list down to: L = [21, 22] Now it is 21 in position 0. But that iteration of the loop is finished, so the interpreter looks at position 1, which is 22. You remove 22 from the list, giving: L = [21] and the interpreter looks at position 2, which does not exist, so the loop finishes. >>> L = [20, 21, 22]
>>> for i in L:
... L.remove(i)
... print('i', i, 'L is now:', L)
...
i 20 L is now: [21, 22]
i 22 L is now: [21]
>>> L
[21] So the interpreter did exactly what you told it to do. The problem is that what you told it to do is not what you wanted it to do. |
There is a bug in your program. You should not mutate the list while looping over it. |
I think Eric left the issue open because he was hoping to update the FAQs and/or docs with information about this issue. I will leave it to someone else to decide whether or not to reopen it. |
Yeah, apologies - I saw the email notification, but Eric removed me from the nosy list (quite reasonably) and I didn't notice there had been extra discussion :-( If someone wants to re-open the issue, feel free. |
Sreedevi: the Windows tag is meant for issues that involve Windows behavior that is different from other OSes; the IDLE tag is meant for issues that involve the behavior of IDLE, as distinct from other ways of running Python code. They should not be used just because you ran code from IDLE on Windows. There are two aspects to mutating while iteration: correctness and speed. For beginners, the general advice "Don't do it" is not bad. However: Removing items from a list *works* when iterating in reverse if removal is done by index (del mylist[i] rather than value (mylist.remove(value)). But it turns an inherently O(n:=len(list)) operation into a slow O(n*n) operation. The usually recommended alternative is to make a new list of things not deleted. But one can do this in-place by writing the new list on top of the old by using a explicit second index to move items just once. I reopened to add to the programming FAQ Sequences (Tuples/Lists) section |
You are right; the replacement index I called 'j' is better buried as a C index or pointer within a slice replacement. In fact, a generator expression, if one has a keep expression, or a filter call, if one has filter function, work, without the intermediate list. Both also incorporate the keep scan index/pointer in C. I verified that this works by defining 3 functions. def fc(n, keep):
mylist = list(range(n))
mylist[:] = [x for x in mylist if keep(x)]
return mylist
def fg(n, keep):
mylist = list(range(n))
mylist[:] = (x for x in mylist if keep(x))
return mylist
def fl(n, keep):
mylist = list(range(n))
mylist[:] = filter(keep, mylist)
return mylist I added a second test expression.
at the 3 obvious places in the test loop above. In the existing question about removing duplicates, the existing all-hashable answer |
Try running some timings. I suspect the genexp/iterator versions run more slowly. The speed of the slice replacement relies on knowing the size of the input. |
Timings depend on multiple factors, including implementation (cpython versus pypy versus cython, etc) and complexity of the keep/discard decision. I said in the proposed text that a listcomp may be faster. I think that sufficient; anyone who cares can test their specific case. I like the simple ad easy 'slice replacement = iterator' form because it illustrates to me that we have done something right with Python's design. |
I understand that you like it and that it reflects the way you think the world should work, , but that doesn't warrant putting it in the FAQ. We should steer users down a path of feeding unsizable inputs into tooling that needs a size to work well (the receiving code either has to implicitly build a list first before it can start or it will have to have periodic resizes). A straight list comprehension will suffice to answer the question cleanly. FWIW, the same issue occurs with str.join(). It works better with a list comprehension than an iterator. Given an iterator, it has to build an internal list first before it can start. That is slower than starting with a list in the first place and makes the memory consumption implicit when it should be explicit (a generator would create the illusion that a list isn't being formed which is misleading). |
Raymond, please take what I have written and rewrite it to your satisfaction. I have lots else to do, including investigating the IDLE bug you just reported. |
Sorry for the wording of the last message. Go ahead with whatever you would like do :-) Was only trying to point-out that the generator semantics don't interact nicely with slice assignments: $ python3.9 -m timeit -s 'a=list(range(1000))' 'b=a[:]' 'b[:]=(x for x in a)'
5000 loops, best of 5: 46.6 usec per loop
$ python3.9 -m timeit -s 'a=list(range(1000))' 'b=a[:]' 'b[:]=[x for x in a]'
10000 loops, best of 5: 26.3 usec per loop |
I think there is still a misunderstanding here. The generator variant does not save space. It uses slightly *more* space than the list variant. The list_ass_slice()¹ function runs PySequence_Fast² on its input. If the input is already a list or tuple, it is returned immediately. If not, the iterator is run to exhaustion and a new list is built. Either way, when the actual update occurs, the source will be a list or tuple. ¹ https://github.com/rhettinger/cpython/blob/master/Objects/listobject.c#L597 |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: