## Deleting Elements from a List

Sometimes we want to remove elements from a list that satisfy some criteria.

In general, we may have some predicate function that determines the match.

Suppose we have the following list:

In [1]:
l = list(range(10))
l

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Now we want to remove all the odd elements from this list.

Let's create a predicate function that returns `True` for the elements we want to keep (we really don't need it for this simple case, but I'm trying to keep things as generic as possible):

In [2]:
def include(value):
    return value % 2 == 0

We could also write it this way, but it is slightly harder to read, so I prefer the version above:

In [3]:
def include(value):
    return not value % 2

You may be tempted to iterate over the list and delete elements as you encounter the ones that need to be removed from the list.

In general you'll run into trouble if you modify a collection you are iterating over.
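To see the kind of trouble this causes, here is a small sketch that removes odd values while iterating forward. The list `data` is a hypothetical sample, not part of the example above. Each removal shifts the remaining items left, so the loop skips whichever item slides into the slot it just visited:

```python
data = [1, 3, 5, 7, 2, 4]  # hypothetical sample list

for value in data:
    if value % 2:           # odd, so we try to remove it
        data.remove(value)  # shifts every later item one slot to the left

print(data)  # [3, 7, 2, 4] - the odd values 3 and 7 were skipped
```

No exception is raised here, which makes this kind of bug easy to miss.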

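This can be mitigated in several ways. One is to iterate over the indices in reverse: deleting the item at index `i` only shifts the items after `i`, and those have already been visited. Something like this: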

In [4]:
l = list(range(10))

for i in range(len(l)-1, -1, -1):
    if not include(l[i]):
        del l[i]
        
l

[0, 2, 4, 6, 8]

However, this loop approach is actually quite slow. Each `del` shifts every item to the right of the deleted one a position to the left, so looping over the list and deleting as we go results in quadratic behavior in the worst case.

While this approach works, it's not optimal, especially for large lists.
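To get a feel for that quadratic growth, the sketch below counts how many items get shifted when the reverse loop removes the odd values from `list(range(n))`. The function `count_shifts` is a hypothetical helper with a simplified cost model (one move per item to the right of the deleted one). It tracks only the list length instead of building a real list:

```python
def count_shifts(n):
    """Count item moves when deleting odd values from list(range(n)) in reverse."""
    length = n   # current length of the simulated list
    shifts = 0
    for i in range(n - 1, -1, -1):
        # Items at or before index i are untouched so far, so the value at i is i.
        if i % 2:
            shifts += length - 1 - i  # items to the right of i move one slot left
            length -= 1
    return shifts


print(count_shifts(10))       # 10
print(count_shifts(200_000))  # 4999950000
```

Under these assumptions, a list of 200,000 items needs about 5 billion item moves, even though only 100,000 items are deleted.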

There are a few more efficient alternatives.

The first is using the `filter` function Python provides:

In [5]:
l = list(range(10))
l = list(filter(include, l))
l

[0, 2, 4, 6, 8]
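Note that `filter` returns a lazy iterator rather than a list, which is why the result is wrapped in `list(...)` above. A minimal sketch of that behavior follows; the name `evens` is just for illustration:

```python
evens = filter(lambda value: value % 2 == 0, range(10))

print(type(evens).__name__)  # filter
first = next(evens)          # items are produced one at a time, on demand
print(first)                 # 0
rest = list(evens)           # consumes whatever is left in the iterator
print(rest)                  # [2, 4, 6, 8]
```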

Another approach is simply to use a comprehension:

In [6]:
l = list(range(10))
l = [el for el in l if include(el)]
l

[0, 2, 4, 6, 8]

One thing you may have noticed is that the first approach (using `del`) mutated the original list, whereas the last two methods we saw just created a new list, and replaced the reference for the symbol `l`.

In [7]:
l = list(range(10))
print(id(l))
l = [el for el in l if include(el)]
print(id(l))
print(l)

4395732672
4395784512
[0, 2, 4, 6, 8]
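The difference matters as soon as some other name refers to the same list. In this sketch, `alias` is a hypothetical second reference; it keeps pointing at the original, unfiltered list after `l` is rebound:

```python
l = list(range(10))
alias = l  # hypothetical second name bound to the same list object

l = [el for el in l if el % 2 == 0]  # builds a new list and rebinds l

print(l)           # [0, 2, 4, 6, 8]
print(alias)       # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(alias is l)  # False
```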


Rebinding `l` to a new list may not be what you are looking for: you may truly want to mutate your original list, so that `l` keeps referring to the same object.

This is still achievable using slice assignment. Note that we still build a new list first and then copy its contents into the original, so technically the `filter` and comprehension methods both use more memory than the `del` loop. As is often the case, we trade memory for speed.

In [8]:
l = list(range(10))
print(id(l))
l[:] = [el for el in l if include(el)]
print(id(l))
print(l)

4395732800
4395732800
[0, 2, 4, 6, 8]
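If the extra memory is a concern, it is possible to avoid both the temporary list and the quadratic cost. This is a different technique from the ones shown here, and `remove_in_place` is a hypothetical helper. It copies each kept item forward to a write position and truncates the tail once at the end:

```python
def remove_in_place(l, keep):
    """Keep only the items for which keep(item) is true, mutating l in one pass."""
    write = 0                # next slot to fill with a kept item
    for item in l:           # the read position is never behind the write position
        if keep(item):
            l[write] = item  # overwriting does not change the list's length
            write += 1
    del l[write:]            # drop the leftover tail in a single operation


l = list(range(10))
before = id(l)
remove_in_place(l, lambda value: value % 2 == 0)
print(l)                # [0, 2, 4, 6, 8]
print(id(l) == before)  # True
```

This loop is pure Python, so it is not necessarily faster than the slice assignment above. It mainly trades the temporary list for a bit more code.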


Now let's look at some timings for these approaches.

In [9]:
from timeit import timeit

In [10]:
def remove_loop(l):
    for i in range(len(l)-1, -1, -1):
        if not include(l[i]):
            del l[i]

In [11]:
def remove_filter(l):
    l[:] = filter(include, l)

In [12]:
def remove_comprehension(l):
    l[:] = [el for el in l if include(el)]

Let's test these functions and make sure they work as expected:

In [13]:
l = list(range(10))
print(id(l))
remove_loop(l)
print(id(l))
print(l)

4395737344
4395737344
[0, 2, 4, 6, 8]


In [14]:
l = list(range(10))
print(id(l))
remove_filter(l)
print(id(l))
print(l)

4394418496
4394418496
[0, 2, 4, 6, 8]


In [15]:
l = list(range(10))
print(id(l))
remove_comprehension(l)
print(id(l))
print(l)

4395749952
4395749952
[0, 2, 4, 6, 8]


Now let's time them on some large lists. One thing to be careful with: we need to re-create the original list for every test run. Otherwise our first test run will remove half the elements of the list, and the remaining runs will have nothing left to remove.

To get around this, I'm going to pass a copy of the original list `l` every time. The cost of the copy is included in each timing, but it is the same for all three functions.
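If you would rather keep the copy out of the measurement altogether, one option is to time each call by hand. The helper `time_excluding_copy` and the stand-in function `drop_odds` below are hypothetical names for illustration, a sketch rather than a replacement for `timeit`:

```python
from time import perf_counter


def time_excluding_copy(func, original, number=10):
    """Total seconds spent in func over `number` runs, each on a fresh copy."""
    total = 0.0
    for _ in range(number):
        fresh = original.copy()  # copied outside the timed region
        start = perf_counter()
        func(fresh)
        total += perf_counter() - start
    return total


def drop_odds(l):
    # Stand-in for any of the removal functions defined above.
    l[:] = [el for el in l if el % 2 == 0]


data = list(range(1_000))
elapsed = time_excluding_copy(drop_odds, data, number=3)
print(elapsed)
```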

In [16]:
l = list(range(200_000))
timeit("remove_loop(l.copy())", globals=globals(), number=10)

8.087936125

In [17]:
l = list(range(200_000))
timeit("remove_filter(l.copy())", globals=globals(), number=10)

0.08694020799999969

In [18]:
l = list(range(200_000))
timeit("remove_comprehension(l.copy())", globals=globals(), number=10)

0.12086675000000113

As you can see, the `filter` and comprehension approaches run far faster: roughly 93 and 67 times faster than the loop in these runs.

The usual caveat I give when I discuss optimizing your code: **do not optimize prematurely**.

Write your code in the most readable manner possible (without a total disregard for efficiency of course!), but don't start optimizing and refactoring your code until you understand **where** it is slow. In the above example, we saved about 8 seconds over 10 runs, but if your code takes 10 minutes to run, then shaving off 8 seconds might be meaningless (by itself).

**First** identify the bottlenecks in your code, **then** optimize your code.