# Lists

A *list* is a **mutable sequence** (= ordered collection) of **arbitrary objects** enclosed by brackets and separated by commas. The objects are also referred to as **elements** or **items**.

In the simplest case, to create a list object, we just "list" all elements within brackets.

In [1]:
empty_list = []
simple_list = [40, 50]

The elements do not need to be of the same type and lists can also be **nested**.

In [2]:
numbers = [empty_list, 10, 20.0, "Thirty", simple_list]

In [3]:
numbers

[[], 10, 20.0, 'Thirty', [40, 50]]

Lists are objects as well.

In [4]:
id(numbers)

140250718758728

In [5]:
type(numbers)

list

## Sequence

The number of elements can be obtained with the [len()](https://docs.python.org/3/library/functions.html#len) function. Observe that the contained `empty_list` and `simple_list` count as only one element each.

In [6]:
len(numbers)

5

Using a `for` loop, we can traverse all elements.

In [7]:
for element in numbers:
    print(element, type(element))

[] <class 'list'>
10 <class 'int'>
20.0 <class 'float'>
Thirty <class 'str'>
[40, 50] <class 'list'>


## Membership Testing

The boolean `in` operator checks if a given object is a member of a list. Behind the scenes, it compares the object with the list items by equality (`==`), after a quick identity check (`is`), so an item only needs to *compare equal* to the object to count as a match. This is a so-called **[linear search](https://en.wikipedia.org/wiki/Linear_search)**: Python implicitly loops over the list items in order until it finds a match, so in the worst case it visits *all* of them, which is relatively slow for big lists.

In [8]:
10 in numbers

True

The integer $20$ and the float $20.0$ compare equal.

In [9]:
20 in numbers

True

In [10]:
30 in numbers

False
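To make the linear search concrete, the following sketch spells out in plain Python roughly what `in` does for a list. The function name `linear_search` is our own; the identity-then-equality check follows the description of membership tests in the Python language reference. The special float `nan` is used only because it does not compare equal to itself, which exposes the identity check.

```python
def linear_search(target, items):
    """Return True if target is in items, mimicking `target in items` for lists."""
    for item in items:  # visit the items one by one, in order
        # identity is checked first, then equality
        if item is target or item == target:
            return True  # stop at the first match
    return False  # all items visited without a match


numbers = [[], 10, 20.0, "Thirty", [40, 50]]
print(linear_search(10, numbers))  # True
print(linear_search(20, numbers))  # True, because 20 == 20.0
print(linear_search(30, numbers))  # False

nan = float("nan")
print(nan == nan)                 # False
print(linear_search(nan, [nan]))  # True, the identity check matches
```

The loop also shows why the running time grows with the list's length: for a missing object, every item is compared once.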

## Indexing

The syntax for accessing the elements of the list is the same as for accessing the characters of a string.

In [11]:
numbers[1]

10

Again, as with strings, the last valid index is one less than the length of the list (here, `4`). Python raises an `IndexError` if we look up an invalid index.

In [12]:
numbers[5]

IndexError: list index out of range

Negative indices count from the end of the list, and brackets can be chained to access the elements of nested lists.

In [15]:
numbers[-1][1]

50
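The following sketch makes both rules concrete: it checks that `numbers[-k]` refers to the same element as `numbers[len(numbers) - k]`, and it splits the chained lookup from above into two steps. The helper name `inner` is hypothetical and only used for illustration.

```python
numbers = [[], 10, 20.0, "Thirty", [40, 50]]

# A negative index -k refers to the same element as len(numbers) - k
for k in range(1, len(numbers) + 1):
    assert numbers[-k] is numbers[len(numbers) - k]

# Chained brackets: first select the nested list, then index into it
inner = numbers[-1]     # [40, 50]
print(inner[1])         # 50
print(numbers[-1][1])   # 50, the same lookup in one expression
```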

## Slicing

List slices work analogously to string slices. Slicing returns a *new* list object.

In [16]:
numbers[1:4]

[10, 20.0, 'Thirty']

In [17]:
numbers[::2]

[[], 20.0, [40, 50]]

An important pattern is taking a "full" slice, which essentially copies all elements of a list into a *new* list object.

In [18]:
numbers_copy = numbers[:]

In [19]:
numbers_copy

[[], 10, 20.0, 'Thirty', [40, 50]]

In [20]:
id(numbers) == id(numbers_copy)

False

As [PythonTutor](http://pythontutor.com/visualize.html#code=simple_list%20%3D%20%5B40,%2050%5D%0Anumbers%20%3D%20%5B10,%2020.0,%20%22Thirty%22,%20simple_list%5D%0Anumbers_copy%20%3D%20numbers%5B%3A%5D&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false) shows, only the pointers to the elements are copied, a concept called **shallow** copy. This becomes important when an element is itself a mutable object: then the two lists point to the same object, which can be changed through either of them (as we will see in the next section). Whereas shallow copies are the default, the [copy](https://docs.python.org/3/library/copy.html) module also provides a [deepcopy()](https://docs.python.org/3/library/copy.html#copy.deepcopy) function that recursively copies the nested objects as well, so that the new list shares no mutable objects with the original. This concept is called **deep** copy.

We can also verify this with the [id()](https://docs.python.org/3/library/functions.html#id) function.

In [21]:
id(numbers[-1]) == id(numbers_copy[-1])

True
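The difference between the two kinds of copies can be sketched with the [copy](https://docs.python.org/3/library/copy.html) module mentioned above. The names `shallow` and `deep` are our own. Note that for immutable elements like the integers and strings, `deepcopy()` may simply reuse the original objects, which is harmless because they cannot change.

```python
import copy

simple_list = [40, 50]
numbers = [10, 20.0, "Thirty", simple_list]

shallow = numbers[:]           # new outer list, same inner objects
deep = copy.deepcopy(numbers)  # new outer list and a new nested list

simple_list.append(60)         # mutate the nested list in place

print(shallow[-1])  # [40, 50, 60] -> sees the change
print(deep[-1])     # [40, 50]     -> independent copy
print(shallow[-1] is numbers[-1], deep[-1] is numbers[-1])  # True False
```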

## Mutability

A major difference from strings is that list objects are mutable. That means we can assign to indices or slices and also remove items, thereby changing parts of the list object in memory.

In [22]:
numbers[0] = 0

In [23]:
numbers

[0, 10, 20.0, 'Thirty', [40, 50]]

When we assign to a slice, we can actually change the size of the list.

In [24]:
numbers[:4] = [100, 100, 100]  # assign three elements where there were four before

In [25]:
numbers

[100, 100, 100, [40, 50]]

In [26]:
len(numbers)

4

Note that the list object's memory location does not change.

In [27]:
id(numbers)

140250718758728
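Slice assignment can shrink a list, grow it, or even insert elements without replacing any. The following sketch uses a separate, hypothetical list `items` so that `numbers` is left alone; the identity check at the end mirrors the `id()` comparison above.

```python
items = ["a", "b", "c", "d"]
original_id = id(items)

items[1:3] = ["X"]       # two elements replaced by one: the list shrinks
print(items)             # ['a', 'X', 'd']

items[1:1] = ["y", "z"]  # an empty slice: inserts without replacing anything
print(items)             # ['a', 'y', 'z', 'X', 'd']

items[3:] = []           # assigning an empty list removes the tail
print(items)             # ['a', 'y', 'z']

print(id(items) == original_id)  # True: still the same list object
```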

Let's change the nested list via `numbers_copy`.

In [28]:
numbers_copy

[[], 10, 20.0, 'Thirty', [40, 50]]

In [29]:
numbers_copy[-1][:] = [1, 2, 3]

In [30]:
numbers_copy

[[], 10, 20.0, 'Thirty', [1, 2, 3]]

This has a surprising side effect on `numbers`.

In [31]:
numbers

[100, 100, 100, [1, 2, 3]]

This is a direct consequence of the shallow copy concept from before. [PythonTutor](http://pythontutor.com/visualize.html#code=simple_list%20%3D%20%5B40,%2050%5D%0Anumbers%20%3D%20%5B0,%2010,%2020.0,%20%22Thirty%22,%20simple_list%5D%0Anumbers_copy%20%3D%20numbers%5B%3A%5D%0Anumbers%5B%3A4%5D%20%3D%20%5B100,%20100,%20100%5D%0Anumbers%5B-1%5D%5B%3A%5D%20%3D%20%5B1,%202,%203%5D&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false) shows that the two lists share a pointer to the same nested list. Consequently, with lists we always need to ask ourselves if we are dealing with shallow copies and if these copies contain *mutable* objects.

Lastly, we can use the `del` statement to remove an element. `del` is a statement rather than a list method, but it has a similar effect.

In [32]:
del numbers[-1]

In [33]:
numbers

[100, 100, 100]

This also works for slices.

In [34]:
del numbers[:2]

In [35]:
numbers

[100]

## List Operations

As with strings, the `+` and `*` operators are overloaded: `+` concatenates lists and `*` repeats a list a given number of times. Both always return *new* list objects. However, these new lists are shallow copies, so if any of the lists involved contains *mutable* objects, the above caveat still applies.

In [36]:
first_list = [10, 20, 30]
second_list = [40, 50, 60]

In [37]:
first_list + second_list

[10, 20, 30, 40, 50, 60]

In [38]:
2 * first_list

[10, 20, 30, 10, 20, 30]

In [39]:
second_list * 3

[40, 50, 60, 40, 50, 60, 40, 50, 60]
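The caveat about mutable elements shows up most often when `*` is used to build a nested list. In the following sketch (the names `aliased_grid` and `fresh_grid` are our own), repetition copies the reference to one inner list three times; a list comprehension, covered in a later section, creates a fresh inner list per row instead.

```python
# `*` repeats the references: all three rows are one and the same inner list
aliased_grid = [[0, 0]] * 3
aliased_grid[0][0] = 1
print(aliased_grid)  # [[1, 0], [1, 0], [1, 0]]

# A list comprehension evaluates [0, 0] anew for every row
fresh_grid = [[0, 0] for _ in range(3)]
fresh_grid[0][0] = 1
print(fresh_grid)  # [[1, 0], [0, 0], [0, 0]]
```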

Depending on the context, the `*` operator does not mean **multiplication** but **unpacking**. The next cell creates a *flat* list, not a nested one.

In [40]:
[30, *second_list, 70]

[30, 40, 50, 60, 70]

## List Methods

Lists are a very important **data structure** when it comes to implementing algorithms. Many of the list-based data structures taught in introductory algorithms courses are already built into Python. See the [Python documentation](https://docs.python.org/3/tutorial/datastructures.html#more-on-lists) for a full overview. Knowing the methods of the built-in types therefore speeds up the development process, since in languages like C such behavior must first be implemented by hand.

In contrast to string methods, most list methods change (= "mutate") a list in place and do *not* create and return a new list object. Instead, they return `None` (pop(), introduced below, is an exception as it returns the removed element). So we should not assign the return values of such method invocations to the variable holding the list!

In [41]:
letters = ["c", "b", "a", "x", "p"]

To add an object to the end of the list, we can use the append() method.

In [42]:
letters.append("e")

In [43]:
letters

['c', 'b', 'a', 'x', 'p', 'e']

We can also extend() a list with the elements of another iterable, such as a list.

In [44]:
letters.extend(["k", "o"])

In [45]:
letters

['c', 'b', 'a', 'x', 'p', 'e', 'k', 'o']

Lists can be sorted *in place* with the [sort()](https://docs.python.org/3/library/stdtypes.html#list.sort) method.

In [46]:
letters.sort()

In [47]:
letters

['a', 'b', 'c', 'e', 'k', 'o', 'p', 'x']

In [48]:
letters.sort(reverse=True)

In [49]:
letters

['x', 'p', 'o', 'k', 'e', 'c', 'b', 'a']
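The warning about return values applies directly to sort(). The following sketch shows that the method itself returns `None` and contrasts it with the built-in [sorted()](https://docs.python.org/3/library/functions.html#sorted) function, which leaves its argument untouched and returns a new list. The variable names are our own.

```python
letters = ["c", "b", "a"]

result = letters.sort()  # sorts the list in place ...
print(result)            # None ... and returns nothing useful
print(letters)           # ['a', 'b', 'c']

# Pitfall: `letters = letters.sort()` would overwrite the list with None

original = ["c", "b", "a"]
sorted_copy = sorted(original)  # the built-in returns a new list
print(original)                 # ['c', 'b', 'a']
print(sorted_copy)              # ['a', 'b', 'c']
```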

pop() removes the last element from the list and returns it.

In [50]:
letters.pop()

'a'

In [51]:
letters

['x', 'p', 'o', 'k', 'e', 'c', 'b']

It takes an optional *index* argument.

In [52]:
letters.pop(1)

'p'

In [53]:
letters

['x', 'o', 'k', 'e', 'c', 'b']
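With append() and pop(), a list already behaves like a **stack**, one of the list-based data structures mentioned at the start of this section (the linked Python tutorial describes this usage as well). The following is a minimal sketch; the name `stack` is our own.

```python
stack = []

# push: append() adds an item to the end, i.e., the "top" of the stack
stack.append("first")
stack.append("second")
stack.append("third")

# pop: pop() without an index removes from the end -> last in, first out
print(stack.pop())  # third
print(stack.pop())  # second
print(stack)        # ['first']
```

For a queue (first in, first out), the same tutorial recommends `collections.deque` instead, as removing items from the front of a long list with `pop(0)` is slow.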

Instead of removing by index, we can remove() by value. This deletes the *first* occurrence of the value.

In [54]:
letters.remove("e")

In [55]:
letters

['x', 'o', 'k', 'c', 'b']

remove() raises an exception if the value is not found.

In [56]:
letters.remove("e")

ValueError: list.remove(x): x not in list

## Lists as Function Arguments

As lists are mutable, the caller of a function sees changes made to a list passed to that function as an argument. This is often a surprising side effect and should be avoided.

In [57]:
letters = ["a", "b", "c"]

In [58]:
def add_xyz(arg):
    arg.extend(["x", "y", "z"])
    return arg

While the function is being executed, two variables, namely `letters` (global) and `arg` (local to the function) point to the same list in memory. We say that the list object is **aliased**. The result is that both the list object passed in and the list object returned from the function are the same. So `letters` and `letters_with_xyz` are aliases as well. Depending on the context, this might be confusing and unintended.

In [59]:
letters_with_xyz = add_xyz(letters)

In [60]:
letters

['a', 'b', 'c', 'x', 'y', 'z']

In [61]:
letters_with_xyz

['a', 'b', 'c', 'x', 'y', 'z']
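The aliasing can be confirmed with the `is` operator, which compares object identity. The following sketch repeats the first version of `add_xyz` so that it runs on its own.

```python
def add_xyz(arg):
    arg.extend(["x", "y", "z"])  # mutates the list passed in
    return arg                   # returns that very same list


letters = ["a", "b", "c"]
letters_with_xyz = add_xyz(letters)

# Both names refer to one and the same list object
print(letters_with_xyz is letters)          # True
print(id(letters_with_xyz) == id(letters))  # True, the equivalent check via id()
```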

A better way is to create a copy of the entire list within the function that is then modified and returned. The downside of this approach is that the copy needs additional memory. This is a typical trade-off with big data: we either save memory and must understand exactly which variable aliases another one, or we spend memory on copies. The copying approach is easier to grasp and in accordance with the [functional programming](https://en.wikipedia.org/wiki/Functional_programming) paradigm, which is currently going through a "renaissance".

In [62]:
letters = ["a", "b", "c"]

In [63]:
def add_xyz(arg):
    new_arg = arg[:]  # full list slice to obtain a copy
    new_arg.extend(["x", "y", "z"])
    return new_arg

In [64]:
letters_with_xyz = add_xyz(letters)

In [65]:
letters

['a', 'b', 'c']

In [66]:
letters_with_xyz

['a', 'b', 'c', 'x', 'y', 'z']
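As a variant of the same idea, the `+` operator from the List Operations section already returns a new list, so the explicit copy can be folded into the return statement. This is one possible sketch, not the only idiomatic choice; the result is still a shallow copy, so mutable elements would remain shared.

```python
def add_xyz(arg):
    # `+` builds a new list, so the caller's list is left unchanged
    return arg + ["x", "y", "z"]


letters = ["a", "b", "c"]
letters_with_xyz = add_xyz(letters)
print(letters)                      # ['a', 'b', 'c']
print(letters_with_xyz)             # ['a', 'b', 'c', 'x', 'y', 'z']
print(letters_with_xyz is letters)  # False
```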

## Map, Filter, & Reduce

Whenever we store data in lists, what we want to do with the data can very likely be classified into one of the three broader categories of **map**, **filter**, or **reduce** operations. This paradigm has attracted a lot of attention in recent years as it lends itself to **parallel processing**, which is important when dealing with large amounts of data (see this [link](https://en.wikipedia.org/wiki/MapReduce) for some more background).

Let's look at a simple example.

In [67]:
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9]

### Mapping

Mapping refers to the idea of applying the same transformation to every element in a list.

In the example, let's square each element and add $1$ to it.

In [68]:
def transform(element):
    return (element ** 2) + 1

In [69]:
transformed_numbers = []

for old in numbers:
    new = transform(old)
    transformed_numbers.append(new)

In [70]:
transformed_numbers

[2, 5, 10, 17, 26, 37, 50, 65, 82]

As this type of operation is very common, Python provides a more convenient way for it with the [map()](https://docs.python.org/3/library/functions.html#map) function. This function, however, does not return a **materialized list** object (i.e., one whose elements are all in memory simultaneously) but a so-called **iterator** object that computes the elements "on the fly", one by one, as they are requested. It can be converted into a real list object with the [list()](https://docs.python.org/3/library/functions.html#func-list) function. So, instead of creating an empty list first and filling it in a `for` loop as above, we could use the following one-liner ...

In [71]:
map(transform, numbers)

<map at 0x7f8ea8507080>

... and convert that into an ordinary list.

In [72]:
list(map(transform, numbers))

[2, 5, 10, 17, 26, 37, 50, 65, 82]
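The "on the fly" behavior of the iterator returned by map() can be observed directly. In the following sketch (the name `lazy` is our own), the built-in `next()` requests a single element, list() collects the rest, and a second conversion yields an empty list because an iterator can be consumed only once.

```python
def transform(element):
    return (element ** 2) + 1


numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9]
lazy = map(transform, numbers)

print(next(lazy))  # 2 -> only the first element is computed
print(list(lazy))  # [5, 10, 17, 26, 37, 50, 65, 82] -> the remaining elements
print(list(lazy))  # [] -> the iterator is exhausted
```

To traverse the transformed numbers twice, we either call map() again or materialize the result once with list().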

### Filtering

Filtering refers to the idea of creating a subset of a list using a **boolean filter** function (= a function that returns `True` or `False` depending on whether an element should be kept or filtered out).

In the example, let's only keep even numbers.

In [73]:
def is_even(element):
    # the comparison itself already evaluates to True or False
    return element % 2 == 0

In [74]:
filtered_numbers = []

for number in transformed_numbers:
    if is_even(number):
        filtered_numbers.append(number)

In [75]:
filtered_numbers

[2, 10, 26, 50, 82]

Again, Python provides a [filter()](https://docs.python.org/3/library/functions.html#filter) function for convenience. As before, we need to explicitly use the [list()](https://docs.python.org/3/library/functions.html#func-list) function for conversion.

In [76]:
filter(is_even, transformed_numbers)

<filter at 0x7f8ea85077f0>

In [77]:
list(filter(is_even, transformed_numbers))

[2, 10, 26, 50, 82]

We can also create a chain of functions based on the original `numbers` list.

In [78]:
filter(is_even, map(transform, numbers))

<filter at 0x7f8ea85075c0>

In [79]:
list(filter(is_even, map(transform, numbers)))

[2, 10, 26, 50, 82]

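Using the [map()](https://docs.python.org/3/library/functions.html#map) and [filter()](https://docs.python.org/3/library/functions.html#filter) functions, we can quickly change the order of operations, i.e., filter first and then transform the remaining elements. Note that this gives a different result, as the filter now applies to the original numbers rather than the transformed ones. Additionally, the looping in these functions is implemented in C, which avoids some of the overhead of a `for` loop written in Python; however, as `transform` and `is_even` are still Python functions called once per element, the speedup is usually modest.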

In [80]:
list(map(transform, filter(is_even, numbers)))

[5, 17, 37, 65]

### Reducing

Lastly, reduce operations go along a list of objects and "summarize" them into a single object.

A simple example would be the ordinary [sum()](https://docs.python.org/3/library/functions.html#sum) function that can be applied to the objects returned by [map()](https://docs.python.org/3/library/functions.html#map) and [filter()](https://docs.python.org/3/library/functions.html#filter) ...

In [81]:
sum(filter(is_even, map(transform, numbers)))

170

... making all the `for` loops needed to create `filtered_numbers` obsolete. The result is the same, of course.

In [82]:
sum(filtered_numbers)

170

Other simple examples are the [min()](https://docs.python.org/3/library/functions.html#min) or [max()](https://docs.python.org/3/library/functions.html#max) built-ins.

The more general way of reducing a list of objects is to apply a function of two arguments on a "rolling" horizon where the first argument is a reduction of the elements covered so far and the second is the next item in the list.
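Written as a formula, with $f$ the function of two arguments and $a_1, \ldots, a_n$ the list items, this rolling computation nests $f$ from the left:

$$
\mathrm{reduce}(f, [a_1, a_2, a_3, \ldots, a_n]) = f(\cdots f(f(a_1, a_2), a_3) \cdots, a_n)
$$

For the product of the `filtered_numbers` $[2, 10, 26, 50, 82]$ computed below, this amounts to

$$
\big(((2 \cdot 10) \cdot 26) \cdot 50\big) \cdot 82 = 2132000
$$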

For example, let's compute the product of a list of integers (in the spirit of the `factorial` function).

In [83]:
def multiply(product_so_far, next_number):
    print(product_so_far, end=" ")  # line added only for didactic purposes
    return product_so_far * next_number

In [84]:
product = filtered_numbers[0]  # use first list item as the initial value

for number in filtered_numbers[1:]:  # iterate over the remaining items
    product = multiply(product, number)

2 20 520 26000 

In [85]:
product

2132000

The [reduce()](https://docs.python.org/3/library/functools.html#functools.reduce) function in the [functools](https://docs.python.org/3/library/functools.html) module of the standard library is more convenient, replacing the `for` loop again.

In [86]:
from functools import reduce

In [87]:
reduce(multiply, filtered_numbers)

2 20 520 26000 

2132000
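[reduce()](https://docs.python.org/3/library/functools.html#functools.reduce) also accepts an optional third argument, an initial value that plays the role of `product` before the first item. The following sketch uses a version of `multiply` without the `print()` call; it shows that the initial value is returned for an empty list, whereas reducing an empty list without one raises a `TypeError`.

```python
from functools import reduce


def multiply(product_so_far, next_number):
    return product_so_far * next_number


print(reduce(multiply, [2, 10, 26, 50, 82]))     # 2132000
print(reduce(multiply, [2, 10, 26, 50, 82], 1))  # 2132000, starting from 1
print(reduce(multiply, [], 1))                   # 1, nothing to multiply

try:
    reduce(multiply, [])  # no items and no initial value
except TypeError as error:
    print("TypeError:", error)
```

For a product, $1$ is a natural initial value because multiplying by it changes nothing; for a sum, it would be $0$.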

### Lambda Functions

As we have seen, [map()](https://docs.python.org/3/library/functions.html#map), [filter()](https://docs.python.org/3/library/functions.html#filter), and [reduce()](https://docs.python.org/3/library/functools.html#functools.reduce) always take a function object as their first argument, and we defined `transform`, `is_even`, and `multiply` to use them for exactly that. Often, such functions are **used only once** and then **never again** in a program. However, the **purpose of functions** is to **re-use** them. In these cases, it makes more sense to define them "anonymously" (i.e., without assigning them to a variable) at the position where the first argument goes.

Python provides the `lambda` syntax (see the [documentation](https://docs.python.org/3/reference/expressions.html#lambda)) to **create function objects without a name** pointing to them. The body of a lambda must consist of exactly one expression, whose value is returned implicitly (without an explicit `return` statement).

The above `multiply` example could be re-written like so ...

In [88]:
lambda product_so_far, next_number: product_so_far * next_number

<function __main__.<lambda>(product_so_far, next_number)>

... or even shorter as below. The arguments are separated by commas and listed before a colon, after which follows the expression to be returned.

In [89]:
lambda x, y: x * y

<function __main__.<lambda>(x, y)>
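The function object created by a lambda expression behaves like any other function. The following sketch calls one right away without ever naming it and compares it with a version of `multiply` that omits the `print()` call.

```python
def multiply(product_so_far, next_number):
    return product_so_far * next_number


# A lambda expression evaluates to an ordinary function object ...
print(type(lambda x, y: x * y))  # <class 'function'>

# ... that can be called right away without ever being given a name
print((lambda x, y: x * y)(6, 7))  # 42

# For the same arguments, it returns the same value as the `def` version
print((lambda x, y: x * y)(6, 7) == multiply(6, 7))  # True
```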

With this, we can rewrite the entire example of this section in just a few lines of code, thereby saving three `for` loops and three `def` statements. The resulting code is concise, easy to read, and quick to modify. Moreover, it is based on iterators and never materializes the intermediate results: if the input itself is produced lazily (e.g., read from a file one line at a time), it can handle amounts of data that exceed the memory of the machine it is running on. Such programming "tricks" become more important as datasets increase in size.

In [90]:
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9]

In [91]:
transformed = map(lambda x: (x ** 2) + 1, numbers)
filtered = filter(lambda x: x % 2 == 0, transformed)
multiplied = reduce(lambda x, y: x * y, filtered)

In [92]:
multiplied

2132000

### List Comprehensions

For the [map()](https://docs.python.org/3/library/functions.html#map) and [filter()](https://docs.python.org/3/library/functions.html#filter) functions, Python provides even more syntactic sugar that is appealing to people with a background in mathematics, as it resembles set-builder notation.

Consider the same example again with only one `for` loop that filters and transforms the numbers at the same time. After all, we are not really interested in the intermediate steps in many cases.

In [93]:
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9]

In [94]:
transformed_and_filtered = []

for number in numbers:
    if number % 2 == 0:
        new = (number ** 2) + 1
        transformed_and_filtered.append(new)

In [95]:
transformed_and_filtered

[5, 17, 37, 65]

Python provides a special syntax to derive new lists out of existing ones, referred to as **list comprehensions**: The innermost expression, i.e., the one that would be appended to the result list, is written first within brackets, followed by `for` and `if` clauses (which can also be nested). List comprehensions are typically faster than the equivalent `for` loop approach. Like that approach, they materialize all elements upon execution.

In [96]:
[(n ** 2) + 1 for n in numbers if n % 2 == 0]

[5, 17, 37, 65]
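To illustrate the nesting of clauses mentioned above, the following sketch uses two `for` clauses and one `if` clause and checks the result against the equivalent nested `for` loops. The names `pairs` and `pairs_loop` are our own. The clauses are read from left to right, so the leftmost `for` is the outer loop.

```python
# Two `for` clauses: the left one is the outer loop, the right one the inner loop
pairs = [(i, j) for i in range(1, 4) for j in range(1, 4) if i < j]
print(pairs)  # [(1, 2), (1, 3), (2, 3)]

# The equivalent nested `for` loops
pairs_loop = []
for i in range(1, 4):
    for j in range(1, 4):
        if i < j:
            pairs_loop.append((i, j))

print(pairs == pairs_loop)  # True
```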

A list comprehension can directly be used as an argument to a reduce operation. For example, let's find the sum.

In [97]:
sum([(n ** 2) + 1 for n in numbers if n % 2 == 0])

124

#### Generator Expressions

The most Pythonic way, however, would be to forego materialized lists altogether and use an iterator-like approach. This is called a **generator expression** and works exactly like a list comprehension except that the brackets are replaced by parentheses.

In [98]:
((n ** 2) + 1 for n in numbers if n % 2 == 0)

<generator object <genexpr> at 0x7f8ea850fe60>

When a generator expression is the sole argument to a function call, its own parentheses can be left out.

In [99]:
sum((n ** 2) + 1 for n in numbers if n % 2 == 0)

124

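A common best practice is to use a generator expression wherever the elements are needed only once (e.g., to be summed up), as it avoids materializing a list. If a real list is needed after all, a generator expression can easily be converted into one with the [list()](https://docs.python.org/3/library/functions.html#func-list) function.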

In [100]:
list((n ** 2) + 1 for n in numbers if n % 2 == 0)

[5, 17, 37, 65]
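The memory advantage can be made visible with `sys.getsizeof()`, which reports the size of an object itself (for a list, this includes the array of pointers but not the elements they point to). The following sketch assumes an arbitrary size of 100,000 numbers; the exact byte counts vary between Python versions and platforms, but the list grows with its length while the generator does not.

```python
import sys

n = 100_000  # arbitrary example size
as_list = [(k ** 2) + 1 for k in range(n)]       # all n results in memory at once
as_generator = ((k ** 2) + 1 for k in range(n))  # only the generator's state

print(sys.getsizeof(as_list))       # hundreds of kilobytes
print(sys.getsizeof(as_generator))  # a few hundred bytes at most

# Both yield the same values; the generator produces them one by one
print(sum(as_generator) == sum(as_list))  # True
```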