# Collections (lists, tuples, sets and dictionaries)

## $ \S 1 $ Lists

### $ 1.1 $ The `list` type

A __list__ (that is, an object of type `list`) consists of zero, one or several
objects ordered in sequence. In Python, a list can be __heterogeneous__, meaning
that its elements can be of any type whatsoever, and these types need not 
coincide. For example, one can create a list which contains integers, floats and
strings; or a list whose elements can be either complex numbers or functions.  In
particular, lists in Python have the important property of __closure__: one is
allowed to make lists of lists, or lists of lists of lists, etc. Finally,
in contrast to the arrays of some other programming languages, lists are
__dynamic__ (not _static_), meaning that their lengths can change during
execution.

A list is represented using _brackets_ in the form `[<elements>]`, with the
elements separated by commas. 

__Example:__

In [3]:
# Here is a list of strings (where each string consists of an emoji):
fruits = ["🍑", "🥝", "🍉", "🥥", "🍋", "🍐"]

# Here is a list consisting of elements of several distinct types:
numbers = [0, 'eight', -53, 12.34, (3 + 4j)]

# This list has a single element:
mages = ['Delfador']

print(fruits)
print(numbers)
print(mages)

['🍑', '🥝', '🍉', '🥥', '🍋', '🍐']
[0, 'eight', -53, 12.34, (3+4j)]
['Delfador']


The __empty list__ is the unique list which has no elements. There are at least
two ways to instantiate it:

In [5]:
empty1 = []         # first way
empty2 = list()     # second way

# They are equal (although they are two distinct list objects):
print(empty1 == empty2)

True


Just as for strings, the function `len` can be
used to count the number of items in a list.


In [6]:
print(len(mages))
print(len(fruits))

1
6


In [7]:
new_list = fruits + mages   # We can concatenate two lists using '+'.
print(new_list)

['🍑', '🥝', '🍉', '🥥', '🍋', '🍐', 'Delfador']


Lists can be __concatenated__ with the `+` operator,
__repeated__ by "multiplication" by a positive integer using `*`, and
__sliced__ with the `:` operator. Note that none of these operations _modifies_
the original list; instead, they create a _new_ list. Recall that all of
this is also true for strings.
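
As a quick check that these operations leave the original list untouched, here is a small sketch; the list `letters` is a made-up example, not one of the lists used elsewhere in this notebook:

```python
letters = ["a", "b", "c"]   # hypothetical example list

joined = letters + ["d"]    # concatenation creates a new list
doubled = letters * 2       # repetition creates a new list
head = letters[:2]          # slicing creates a new list

print(joined)     # ['a', 'b', 'c', 'd']
print(doubled)    # ['a', 'b', 'c', 'a', 'b', 'c']
print(head)       # ['a', 'b']
print(letters)    # ['a', 'b', 'c']  (the original is unchanged)
```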


__Exercises:__ Let _movies_ be the list in the code cell below. Determine the output of the following statements:

(a) `movies * 2`

(b) `movies + ["Paths of Glory", "Modern Times"]`

(c) `["Star Wars", "The Third Man"] + movies`

(d) `movies[:2]`

(e) `movies[::-1]`

(f) `movies + []`

(g) `movies + "error"`

In [None]:
movies = ["Gone with the Wind",
         "Interstellar",
         "E.T.",
         "It's a Wonderful Life",
         "Rain Man",
         "Rambo"]

### $ 1.2 $ Modifying lists

In contrast to strings, lists are __mutable__ objects, meaning that their
individual elements can be modified by assignments.

__Exercise:__ Let _movies_ be the list in the next code cell. What is the value
of _movies_ after each of the statements below is run through the interpreter
in sequence?

(a) `movies[1] = "Forrest Gump"`

(b) `movies[2:4] = ["Modern Times", "Paths of Glory"]`

(c) `movies[-1] = "Bicycle Thieves"`

(d) `movies += ["Das Leben der Anderen"]`


In [9]:
movies = ["Gone with the Wind",
         "Interstellar",
         "E.T.",
         "It's a Wonderful Life",
         "Rain Man",
         "Rambo"]

⚠️ In order to _modify_ the element at index $ k $ of a list, the list
must already have items at every index from $ 0 $ to $ k $; that is, its
length must be at least $ k + 1 $. Trying to read or assign to the element at
index $ k $ of a list of length $ \le k $ raises an `IndexError`.

__Example:__

In [10]:
# A long list can be split over several lines for better readability:
planets = [
    "Earth",
    "Mars",
    "Jupiter",
    "Saturn",
    "Neptune"
]
planets[5] = "Kepler-452b"

IndexError: list assignment index out of range
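
One possible way to deal with this situation is sketched below. It assumes the `try`/`except` construct, which is not covered in this section, and uses the `append` method (discussed in the next subsection) to extend the list instead of assigning past its end:

```python
planets = ["Earth", "Mars", "Jupiter", "Saturn", "Neptune"]

try:
    planets[5] = "Kepler-452b"     # the valid indices are 0, 1, 2, 3, 4
except IndexError as error:
    print("IndexError:", error)    # IndexError: list assignment index out of range

planets.append("Kepler-452b")      # append grows the list by one element
print(len(planets))                # 6
print(planets[5])                  # Kepler-452b
```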

### $ 1.3 $ Some list methods

Lists support several useful methods (recall that a __method__ is a function
associated with a specific class or type). Here are examples of how some of them
are used.

__Examples:__ Perhaps the most frequently used list method is `append`, which can be used
to add an element to the end of a list:

In [24]:
fruits = ["🍑", "🥝", "🍉", "🥥", "🍋", "🍐"]

fruits.append("🍎")     # Append an apple to the end of the list
print(fruits)

['🍑', '🥝', '🍉', '🥥', '🍋', '🍐', '🍎']


More generally, we can use `insert` to insert an element at a specified position:

In [25]:
fruits.insert(1, "🫐")     # Insert blueberries as the fruit at index 1
print(fruits)

['🍑', '🫐', '🥝', '🍉', '🥥', '🍋', '🍐', '🍎']


The method `index` returns the index of the first occurrence of an element in a
list. If there is no such element, it raises a `ValueError`.

In [26]:
print(fruits.index('🥥'))

4


In [27]:
fruits.index('🏫')

ValueError: '🏫' is not in list

With `remove` we can remove (in place) the _first occurrence_ of an element of a
list. Again, trying to remove an element which is not currently in the list
raises a `ValueError`.

In [28]:
fruits.remove('🍋')
print(fruits)


['🍑', '🫐', '🥝', '🍉', '🥥', '🍐', '🍎']


Perhaps the most useful list method is `sort`, which efficiently sorts the given
list in place. Similarly, `reverse` does what its name suggests:

In [29]:
fruits.sort()            # Sort the list
print(fruits)

fruits.reverse()         # Reverse the order of the elements in the list
print(fruits)

['🍉', '🍎', '🍐', '🍑', '🥝', '🥥', '🫐']
['🫐', '🥥', '🥝', '🍑', '🍐', '🍎', '🍉']
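
Because `sort` works in place, it returns `None` rather than the sorted list. The sketch below illustrates this with a made-up list `words`, and contrasts it with the built-in function `sorted`, which is not discussed above and which returns a new list instead:

```python
words = ["pear", "apple", "fig"]    # hypothetical example list

result = words.sort()               # sorts the list in place...
print(result)                       # None  (...and returns nothing)
print(words)                        # ['apple', 'fig', 'pear']

words = ["pear", "apple", "fig"]
ordered = sorted(words)             # built-in function: returns a new sorted list
print(ordered)                      # ['apple', 'fig', 'pear']
print(words)                        # ['pear', 'apple', 'fig']  (unchanged)
```

In particular, a statement such as `words = words.sort()` discards the list and leaves `words` bound to `None`.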


The last method that we will consider is `pop`, which is essentially the inverse
of `append`: when called without any arguments, it removes the last item of the
list and returns it as output.

In [30]:
print(fruits)

a = fruits.pop()        # Use 'pop' without any arguments to remove the
print(fruits)           # last item and return it as output

b = fruits.pop(2)       # Remove the element having index 2
print(fruits)           # and return it as output

print(a, b)

['🫐', '🥥', '🥝', '🍑', '🍐', '🍎', '🍉']
['🫐', '🥥', '🥝', '🍑', '🍐', '🍎']
['🫐', '🥥', '🍑', '🍐', '🍎']
🍉 🥝


__Exercise:__ Let _planets_ be the list provided in the code cell below,
representing the planets of our solar system. Describe the list and the
output after each of the following statements is run in sequence through the
interpreter.

(a) `planets.insert(1, "Vulcan")`

(b) `planets = planets + ["Pluto"]`

(c) `planets.remove("Vulcan")`

(d) `planets.sort(reverse=True)`

(e) `planets.index("Mars")`

(f) `planets.append("Planet X")`

(g) `planets.pop(2)`

(h) `planets.reverse()`

In [10]:
planets = ["Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune"]


__Exercise:__ What is `[] * 3`? What is `[[]] * 3`? What is `[[]] * (-1)`?

## $ \S 2 $ Tuples

### $ 2.1 $ The `tuple` type

Another sequential data type is `tuple`, the type of __tuples__. Like a list, a
tuple consists of an ordered sequence of objects of possibly distinct types,
separated by commas. Also as for lists, the types of the elements held in
a tuple are completely arbitrary and tuple formation enjoys the closure
property, allowing us to create tuples of tuples, etc.

Tuples are enclosed by _parentheses_ `()` instead of brackets.  The raison
d'être of tuples is that they are __immutable__, so that, like strings but
unlike lists, their individual elements _cannot_ be modified. Trying to do
so will raise a `TypeError`.

__Example:__

In [25]:
t = (8, 'Anna', 23.49, [1, 2, 3])
print(t, type(t))

(8, 'Anna', 23.49, [1, 2, 3]) <class 'tuple'>


📝 Actually, the enclosing parentheses are not strictly required to define a
tuple. Rather, it is the presence of commas `,` which makes a sequence of values
a tuple. However, it is generally a good idea to include them as delimiters for
clarity, especially in more complex expressions.


In [35]:
numbers = 1, 2, 3  # <--- This is a tuple!
print(numbers, type(numbers))

single = 1,  # <--- This is also a tuple! The comma is necessary to distinguish it from the number 1
print(single, type(single))

(1, 2, 3) <class 'tuple'>
(1,) <class 'tuple'>


In [37]:
empty_tuple1 = ()       # This is the empty tuple
empty_tuple2 = tuple()  # Another way to instantiate the empty tuple

print(empty_tuple1, type(empty_tuple1))
print(empty_tuple1 == empty_tuple2)

() <class 'tuple'>
True


### $ 2.2 $ Operations on tuples

As for the other sequential types that we have considered (strings and lists), tuples can be concatenated with `+`, their lengths can be computed using `len`, and we can access their elements and slice them using `[]` and `[:]`.

__Exercise:__ The tuple in the code cell below records some data about a famous
scientist. Describe the output and the effect of the following statements:

(a) `record[0]`

(b) `record[2]`

(c) `record[-1]`

(d) `record[:]`

(e) `len(record)`

(f) `record + record`

(g) `record *= 2`

(h) `record[4] = 'United States'`

(i) `print(record[0], record[1])`

In [28]:
record = ('Albert', 'Einstein', 'physicist', 26, 'Germany')

Albert Einstein


To convert a tuple to a list, use `list` as a function. Similarly,
to convert a list to a tuple, apply `tuple` to it.

__Example:__

In [32]:
scientist = ('Marie', 'Curie', 'chemist', 32, 'Poland')
data = list(scientist)
print(data, type(data))

['Marie', 'Curie', 'chemist', 32, 'Poland'] <class 'list'>
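
The opposite conversion works in the same way. In the following sketch (the variable names are ours), we start from a list, modify it, and then convert it to a tuple:

```python
data = ['Marie', 'Curie', 'chemist', 32, 'Poland']
data[2] = 'physicist'        # allowed, since lists are mutable

frozen = tuple(data)         # convert the list to a tuple
print(frozen, type(frozen))
# ('Marie', 'Curie', 'physicist', 32, 'Poland') <class 'tuple'>
```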


### $ 2.3 $ Some warnings

⚠️ To define a tuple consisting of a single item, a comma must still be used, so that the tuple can be disambiguated from an expression surrounded by parentheses:

In [39]:
language = ('Sindarin', )         # To define a single-element tuple, we must include a comma!
print(language, type(language))

lang = ('Sindarin')               # This is not a tuple, but rather a string;
print(lang, type(lang))           # the parentheses play no role in this case.


('Sindarin',) <class 'tuple'>
Sindarin <class 'str'>


<div class="alert alert-warning"> Even if $ x $ and $ y $ are two tuples or
lists of the same length and whose items are of the same numerical type,
<code>x + y</code> is <i>not</i> obtained by summing their respective elements;
it is instead the <i>concatenation</i> of $ x $ and $ y $.  Similarly, if $ a $
is a scalar, then <code>a * x</code> is not the result obtained by multiplying
each item of $ x $ by $ a $, even if $ a $ is an integer.  </div>
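
To make the warning concrete, the sketch below compares `+` and `*` with one possible way of obtaining the element-wise results. It relies on `zip` and on list comprehensions, neither of which has been introduced in this section, so it should be read only as an illustration:

```python
x = [1, 2, 3]
y = [10, 20, 30]

print(x + y)      # [1, 2, 3, 10, 20, 30]  (concatenation)
print(2 * x)      # [1, 2, 3, 1, 2, 3]     (repetition)

# Element-wise operations must be spelled out explicitly:
elementwise_sum = [xi + yi for xi, yi in zip(x, y)]
scaled = [2 * xi for xi in x]
print(elementwise_sum)   # [11, 22, 33]
print(scaled)            # [2, 4, 6]
```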

⚠️ Neither lists nor tuples are adequate data structures to represent _vectors_ in
the sense of linear algebra. The most adequate type for this task is an
`ndarray` (short for _$n$-dimensional array_), provided by the
[__NumPy__](https://scipy.github.io/old-wiki/pages/Numpy_Example_List.html)
module, which we will consider later.

## $ \S 3 $ Other common iterable data types

Although we will not discuss them in detail, Python also provides a few
other iterable data types besides strings, lists and tuples. The two most
important and useful ones are __sets__ (type: `set`), which behave very much
like sets in mathematics, and __dictionaries__ (type: `dict`), which consist of
key-value pairs and are also called _hash tables_ or _associative arrays_ in
other programming languages.  Both sets and dictionaries are _mutable_, that is,
their contents can be modified.

### $ 3.1 $ Sets

To create a set, we may list its elements separated by commas
and within braces `{ }`.

__Example:__

In [45]:
one_two_three = {1, 2, 3}
print(one_two_three, type(one_two_three))

# In sets, repetitions do not matter. Hence, the following set equals one_two_three:
set_2 = {1, 2, 2, 3, 3, 3, 3, 3}
print(one_two_three == set_2)

{1, 2, 3} <class 'set'>
True


In [47]:
# Similarly, in sets the order in which elements are listed is irrelevant:
set_3 = {3, 2, 1}
print(one_two_three == set_3)

True


Here are the main methods associated with sets, with `a` and `b` denoting arbitrary sets and `set` standing for any set object:

| Method syntax             | Equivalent syntax | Description                                                        |
|-------------------------:|:-----------------:|:------------------------------------------------------------------|
| `set.add(elem)`           |        N/A        | adds an element to the set.                                        |
| `set.remove(elem)`        |        N/A        | removes an element from the set; raises `KeyError` if it is absent. |
| `set.discard(elem)`       |        N/A        | removes an element from the set if it is a member.                 |
| `set.pop()`               |        N/A        | removes and returns an arbitrary element from the set.             |
| `a.isdisjoint(b)`         |        N/A        | returns `True` if `a` has no elements in common with `b`.          |
| `a.issubset(b)`           |      `a <= b`     | returns `True` if all elements of `a` are in `b`.                  |
| `a.issuperset(b)`         |      `a >= b`     | returns `True` if all elements of `b` are in `a`.                  |
| `a.union(b)`              |      `a \| b`     | returns a new set with elements that are in either `a` or `b`.     |
| `a.intersection(b)`       |      `a & b`      | returns a new set with elements common to `a` and `b`.             |
| `a.difference(b)`         |      `a - b`      | returns a new set with elements in `a` that are not in `b`.        |
| `a.symmetric_difference(b)` |    `a ^ b`      | returns a new set with elements in either `a` or `b` but not both. |
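
Here is a brief sketch of some of the methods in the first half of the table, using two made-up sets `primes` and `odds`. It assumes the `try`/`except` construct, which is not covered in this section:

```python
primes = {2, 3, 5}                # hypothetical example sets
odds = {1, 3, 5, 7}

primes.add(7)                     # primes is now {2, 3, 5, 7}
primes.discard(11)                # 11 is not a member: nothing happens
try:
    primes.remove(11)             # remove raises KeyError for a missing element
except KeyError:
    print("11 is not in the set")

print({3, 5} <= odds)             # True: {3, 5} is a subset of odds
print(primes.isdisjoint({4, 6}))  # True: no elements in common
print(sorted(primes))             # [2, 3, 5, 7]
```

We print `sorted(primes)` rather than `primes` itself because the order in which the elements of a set are displayed is not guaranteed.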


__Exercise:__ Let $ A = \{1, 2, 3, 4, 5\} $ and $ B = \{-3, -1, 1, 3, 5, 7\} $.
Using Python, compute:
* The union $ A\cup B $;
* The intersection $ A \cap B $;
* The differences $ A \smallsetminus B $ and $ B \smallsetminus A $;
* The symmetric difference $ A \,{\Delta}\, B = \big(A \smallsetminus B\big) \cup \big(B \smallsetminus A\big) $;
* The number of elements in each set (using the `len` function).


### $ 3.2 $ Dictionaries

Dictionaries allow one to create an iterable object whose values need not be
referenced by the integers $ 0,\,1,\,2,\, \dots $, as is the case for lists and
tuples. Instead, one can use __keys__ of any immutable type as indices. This
allows for more flexible and intuitive manipulation of data. Note however that
dictionaries and sets use more memory than either lists or tuples.

__Example:__ To create a dictionary, we list key-value pairs in the form `<key>:
<value>` inside braces and separated by commas. The values can be of any type,
while the keys can be of any _immutable_ type. Here is an example:

In [52]:
info = {"name": "Bilbo Baggins",
        "age": 23,
        "race": "Hobbit",
        "height": 110.3,
        "email": "bilbo@hobbitmail.com",
        "friends": ["Frodo", "Pippin"]}

print(info)
print(type(info))

{'name': 'Bilbo Baggins', 'age': 23, 'race': 'Hobbit', 'height': 110.3, 'email': 'bilbo@hobbitmail.com', 'friends': ['Frodo', 'Pippin']}
<class 'dict'>


We can access the values stored in a dictionary by referring to the
corresponding key inside brackets:

In [53]:
print(info["name"])
print(info["friends"])

Bilbo Baggins
['Frodo', 'Pippin']
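
Since dictionaries are mutable, the same bracket notation can be used to change the value of an existing key or to add a new key-value pair. The sketch below uses a shortened version of the dictionary above; the key `"home"` and its value are made up for illustration:

```python
info = {"name": "Bilbo Baggins", "age": 23}

info["age"] = 24                 # modify the value of an existing key
info["home"] = "Bag End"         # assigning to a new key adds a pair
print(info)    # {'name': 'Bilbo Baggins', 'age': 24, 'home': 'Bag End'}

print("email" in info)           # False: membership tests look at the keys
print(info.get("email"))         # None: get does not fail on a missing key
print(len(info))                 # 3
```

In contrast, `info["email"]` would raise a `KeyError` here, because no such key is present.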


## ⚡ $ \S 4 $ Mutable and immutable objects

If lists and tuples are so similar, it may not be clear why Python provides
both data types. In fact, technically we could always get by using only one of
them. However, having both types available has a few advantages.

In some cases, an object in the real world may more adequately be conceived of
as having an identity which is completely determined by its parts. For
example, we think of a rational number such as $ 2 / 3 $ as a pair of integers
(its numerator and denominator); if we change the denominator to $ 5 $, the
result is a different fraction.

In other cases, however, it is better to think about an object's identity as 
being something distinct from the mere totality of its pieces. For instance, it
is more adequate to think of someone's bank account as being the same object
from one day to the next, even though the client's address, balance or even
her legal name may have changed in the meantime.

To what extent does an object retain its identity after it is modified? This is
a difficult philosophical question which has _a priori_ nothing to do with
programming, even though it greatly affects how we may choose to represent a
given object in Python.

### $ 4.1 $ Definitions and examples

An object in Python is called __mutable__ if its state or contents can be
changed after it is created; otherwise, it is called __immutable__.
* Strings, integers, floats, and tuples are all _immutable_.
* Examples of _mutable_ objects include lists, dictionaries, and sets.

🚫 Since a tuple is _immutable_, an attempt to modify one or more of its
elements by an assignment results in a `TypeError`:

In [None]:
coordinates = (1.2, 5.6)
coordinates[0] = 3.4

TypeError: 'tuple' object does not support item assignment

In [None]:
# A list is a mutable object:
a_list = [1, 2, 3]

# Hence, modifying an element of the list is allowed:
a_list[0] = 47
print(a_list)    # Output: [47, 2, 3]

[47, 2, 3]


### $ 4.2 $ Understanding assignments of mutable and immutable objects

When binding a variable to an object, at first sight the behavior of the
assignment may seem different based on whether the object is mutable or
immutable. In order to dispel any confusion, it is helpful to think of the
assignment in terms of __pointers__ or __references__ to objects. When an object
is assigned to a variable, this variable does not become _identified_ with the
object; rather, it is merely a _pointer_ to the memory location storing the
object.  Therefore, if we reassign another object to the identifier `x`, this
variable will now refer to a new memory location holding the new object:

In [None]:
x = 12         # x points to the memory location holding the integer 12.
y = x          # y points to the same memory location as x
print(x, y)
# Despite the use of the equals sign, an assignment is not a statement of
# equality: x and y are two distinct identifiers that currently _refer_
# to the same object.

x = 34         # x now points to a _new_ memory location, holding the integer 34.
print(x, y)    # However, y still points to the memory location holding 12.

12 12
34 12


In this case, the assignment statements create two references to the immutable
object `12`, and the reassignment of `x` changes its reference point to a new
object, `34`. The same description also holds when the objects involved are
mutable:

In [None]:
x = [1, 2]     # x points to the location holding a list containing 1 and 2.
y = x          # y refers to the same memory location (list object) as x.
print(x, y)

x = [3, 4]     # x now points to a _new_ memory location holding another list.
print(x, y)    # However, y still points to the original memory location.

[1, 2] [1, 2]
[3, 4] [1, 2]


Finally, consider the following example:

In [None]:
x = [1, 2]
y = x
print(x, y)

x[0] = 34      # The list object in that location is _modified_ to [34, 2].
print(x, y)    # Both x and y still point to the same object.

[1, 2] [1, 2]
[34, 2] [34, 2]


Using our mental model, the last result should not come as a surprise.
Again, `x` should not be thought of as _coinciding_ with the object to
which it was assigned (the list `[1, 2]`); rather, it is only a _reference_
to it. If we modify the object itself through `x`, as in the statement
`x[0] = 34`, its location in memory does not change. Hence, when we access that
object once again through the reference `y`, the modification will be shown, as
expected.
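
This mental model can be checked directly with the `is` operator, which tests whether two references point to the very same object, and with the built-in function `id`, which returns an integer identifying an object. Neither has been introduced above, so the following is only a small sketch:

```python
x = [1, 2]
y = x
print(x is y)            # True: x and y refer to the same object
print(id(x) == id(y))    # True

x[0] = 34                # modify the object through x
print(y)                 # [34, 2]

z = [34, 2]              # a different list with the same contents
print(x == z)            # True: equal contents
print(x is z)            # False: two distinct objects
```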

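📝 To create an independent copy of a list, we can use a complete slice of it.
(For strings this is unnecessary: since a string is immutable, sharing it
between several variables can never cause side effects.)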

__Example:__

In [None]:
x = [0, 1, 2]
y = x[:]         # y points to an independent copy, stored at another memory address.

x.pop()
print(x)
print(y)        # Note that y has not been affected by the modification of x.

[0, 1]
[0, 1, 2]
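
One caveat worth illustrating: a complete slice copies only the outer list, so any mutable elements (such as inner lists) are still shared between the original and the copy. The sketch below uses a made-up nested list `grid`, together with `copy.deepcopy` from the standard library, which copies the nested objects as well:

```python
import copy

grid = [[1, 2], [3, 4]]
sliced = grid[:]               # new outer list, but the same inner lists
deep = copy.deepcopy(grid)     # new outer list and new inner lists

grid.append([5, 6])            # affects only the outer list of grid
grid[0][0] = 99                # modifies an inner list shared with sliced

print(grid)      # [[99, 2], [3, 4], [5, 6]]
print(sliced)    # [[99, 2], [3, 4]]
print(deep)      # [[1, 2], [3, 4]]
```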


The immutability of a tuple means that once a tuple is created, the tuple
itself cannot be altered. This includes adding/removing elements and changing
the identity of any element within the tuple. However, if a tuple contains
mutable objects (like lists), the _state_ of those mutable objects can be changed,
even though the tuple itself cannot be modified directly. Consider the following example:

In [None]:
my_tuple = (1, 2, [30, 40])
my_tuple[2].append(50)  # Appends 50 to the list [30, 40]
# This is allowed because we're modifying the mutable list, not the tuple itself.
print(my_tuple)

(1, 2, [30, 40, 50])


In [None]:
# Attempting to change (the identity of) an element directly
# through assignment, however, results in an error:
my_tuple[2] = [30, 40, 50]

TypeError: 'tuple' object does not support item assignment

### $ 4.3 $ Differences between mutable and immutable objects
 
Immutable objects cannot be modified directly, while mutable objects can; this
is the primary difference between the two categories. However, this also has
several implications:

* _Object sharing:_ Mutable objects can have unintended side effects when shared
  across different parts of the code, as changes made to the object through one
  reference affect all references. This type of bug can be particularly
  dangerous and difficult to spot.
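* _Usage as dictionary keys:_ Only _hashable_ objects can be used as
  keys in dictionaries; among the built-in types, these are the immutable ones
  (a tuple being hashable only if all of its elements are). This way we can
  guarantee that the key's hash value will not change during the lifetime of
  the dictionary.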
* _Performance:_ Operations on immutable objects are faster in some cases.
  However, in general the difference is not significant.
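
The point about dictionary keys can be illustrated by the following sketch, in which the dictionary `labels` is a made-up example. It assumes the `try`/`except` construct, which is not covered in these notes:

```python
# Tuples of integers are immutable, so they can serve as keys:
labels = {(0, 0): "origin", (1, 0): "east"}
print(labels[(0, 0)])              # origin

# A list is mutable, so using one as a key is rejected:
try:
    labels[[0, 1]] = "north"
except TypeError as error:
    print("TypeError:", error)

print(len(labels))                 # 2: the failed assignment added nothing
```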

Understanding the difference between mutable and immutable objects is important
because it directly influences how we design data structures and algorithms.