# Lecture 3: mutable types: lists, sets, dictionaries

In this lecture we'll cover the last of Python's important built-in data types. Unlike the types that we've seen before, some of these are *mutable*, meaning they can be changed.

Mutable objects can be very handy, but they also introduce some subtle issues that we'll have a look at.

## Mutable types

All types we have considered so far are **immutable**. That means that once a value of the type is constructed, it can never change. To make sure this sinks in properly, consider the following code:

In [None]:
a = 3 # assign 3 to a
a = 5 # assign 5 to a
a

This works fine, and it looks like we're changing a value from 3 to 5. However, what actually happens is the following sequence:

1. Construct an `int` with value 3.
2. Let the variable `a` refer to that value.
3. Construct a *second* `int`, this time with value 5.
4. Redirect the variable `a` to refer to that second value. The first value is now unused and is automatically discarded ("garbage collection").

So the first value is simply discarded; it is *not* modified.
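A small sketch that makes this visible: if we keep a second reference to the first value (the name `first` is only for illustration), we can see that it is still 3 after `a` has been redirected.

```python
a = 3        # construct an int with value 3 and let a refer to it
first = a    # a second reference to that same int
a = 5        # redirect a to a newly constructed int
print(first, a)  # first still refers to 3; only a was redirected
```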

Similarly, strings and tuples are immutable.

In [None]:
# Exercise: assign a string to a variable s and try to change
#           the second letter to be equal to the first.

s = "string"
s[1] = s[0]

In practice, it would sometimes be very handy to be able to change an entry of a tuple. For that reason, Python has a second built-in type that looks a lot like a tuple, except that it can actually be modified: `list`.

### Mutable tuples: `list`

Lists behave like tuples in almost every respect, except that they can be modified.

A list is constructed with square brackets instead of round brackets, or as usual, one can use the type constructor `list`.

In [None]:
a = (1,2,3) # a tuple
a[2] = 4    # not allowed

In [None]:
b = [1,2,3] # a list
b[2] = 4    # this should be allowed since b is a list
b

In [None]:
list(range(5)) # a list made using the type constructor
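The type constructor accepts other iterables as well, not just a `range`. A brief sketch with a string and a tuple as input (the variable names are only for illustration):

```python
letters = list("abc")      # a string is iterated character by character
numbers = list((1, 2, 3))  # a tuple becomes a list with the same items
print(letters)  # ['a', 'b', 'c']
print(numbers)  # [1, 2, 3]
```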

You can put a list on the left-hand side of an assignment, just as you can a tuple; this unpacks the value on the right-hand side into the listed variables.

In [None]:
[a,b] = [3,"boo"]
print("a is now", a,"and b is", b)

It doesn't actually matter if you use round or square brackets on either side: the unpacking works anyway.

In [None]:
(a,b) = [3, "boo"]
print("a is now", a,"and b is", b)

Since lists are so much like tuples, but more powerful, why are tuples useful in the first place, and when should you use which?

The most important difference is that tuples can be stored in sets and used as keys in dictionaries, whereas lists cannot. Sets and dictionaries are introduced below.

Also, tuples are sometimes a bit faster, and may require less memory than lists. They also make code easier to read, since you don't have to worry that they might get modified somewhere down the line. So the rule of thumb is:

**Use a tuple when you know you don't need to modify it, or when you need to store it in a set or use it as a dictionary key.**

Often, it really doesn't matter all that much so don't get hung up about it.

In [None]:
# Exercise: is the following a tuple or a list?
# Check your answer using the function 'type'.
# Make sure you understand what's going on here!

a = ([4,5],6)
type(a)

You can append a new item onto an existing list using `append`:

In [None]:
b = ["elephant"]
b.append("trunk") # extends the existing list
print(b)

`append` actually changes the list; this is different from concatenation with `+`, which makes a new list:

In [None]:
a = ["a", "list"]
b = ["another", "one"]
together = a+b
a

## The bitter pill: references to mutable values

Now that we've added a mutable type to our arsenal, there are some rules of Python that we need to know about that are not immediately obvious, namely the difference between *copies* of a value (meaning that it appears multiple times in the computer memory), and *references* to a value (all references refer to the same value in computer memory).

I have to warn you that this is a confusing subject and that *many* data scientists who use Python do not know how this works very well. I still want you to learn about this, because it is crucial for a good understanding of Python. It is the most in-depth aspect of the language I will ask you to understand.

In the first lecture, I was careful to write that a variable "refers to" a value. It's not itself that value. Recall the following example:

In [None]:
a = 3 # assign 3 to a
a = 5 # assign 5 to a
a

It looks like an integer value is "modified" from 3 into 5, right?

What actually happens is that *two distinct* integer values get constructed, and `a` refers first to one, then to the other.

The distinction does not matter all that much as long as you deal with immutable types. But once you have mutable objects, it's important to realise that, for efficiency reasons, **assignment just creates a *reference*, it does not copy the value**. The following experiment shows how this works:

In [None]:
# construct a list and then let a refer to it
a = ["hello", "innocent", "world", "of", "immutable", "values"] 

# copy the reference: b now refers to *the same* list
b = a

# change b to see what happens to a
b[0] = "goodbye"

a

So, an assignment makes a variable, or an entry in a list, refer to a value *that already exists*. It does not create or copy that value. The same happens when we call a function with a list: the function argument is a *reference* to the list, **not a copy**:

In [None]:
def f(someList):
    someList[0] = "changed"

a = ["unchanged", "list"]
f(a)
a
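If the caller's list has to stay intact, the function can build a new list instead of assigning into the one it received. A minimal sketch (the name `with_first_replaced` is made up for this example) that relies on slicing and `+`, both of which construct new lists:

```python
def with_first_replaced(some_list, value):
    # [value] and some_list[1:] are both new lists, and + makes a third,
    # so the list passed in by the caller is never modified
    return [value] + some_list[1:]

a = ["unchanged", "list"]
b = with_first_replaced(a, "changed")
print(a)  # ['unchanged', 'list']
print(b)  # ['changed', 'list']
```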

These rules are subtle and can lead to confusion!

In [None]:
# Exercise:
# Before you run this code: think: what will happen? Why?

def f(someList):
    someList = ["I", "don't", "know"]

def g(someList):
    someList[:] = ["I", "don't", "know"]

    
a = ["unchanged", "list"]

f(a)
print(a)

g(a)
print(a)


### Tl;dr

The moral of the story is: once you work with mutable types, you have to start to distinguish between *constructing* values (possibly by copying a value that already existed), and *referring* to values that already existed.

How can you tell the difference?

Here are some common ways to *construct* new values:
- By using literal expressions: `'bla'` for strings, `3` for integers, `3.5e-4` for floats, `None` for NoneType, `True` for booleans, `("hello",)` for tuples, and `["hello"]` for lists.
- Many functions construct values, in particular the type constructor functions `tuple`, `list` and `range`, as well as `str`, `int`, and so on. Operators such as the list concatenation operator `+` construct new values too.

Here are ways to get *references* to values:
- By assignment `=`
- By calling a function. The function arguments are references.

**Python rule: values disappear from memory when there are no more references to them.**
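To see both kinds of operation side by side, here is a short sketch (the names `original`, `alias` and `duplicate` are only for illustration). Assignment gives a second reference to the same list, while the `list` constructor builds a new one:

```python
original = [1, 2, 3]
alias = original            # assignment: a second reference to the same list
duplicate = list(original)  # type constructor: a new list with the same items

original.append(4)          # modify the list through one of its references
print(alias)      # [1, 2, 3, 4]
print(duplicate)  # [1, 2, 3]
```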

### Testing whether two references are the same

You can check if two expressions refer to the same value using the `is` operator. While `==` checks whether one value is equal to another, `is` checks whether the two expressions refer to **the same data in memory**.

In [None]:
a="the quick brown fox jumps over the lazy dog"
b="the quick brown fox jumps over the lazy dog"
(a == b, a is b)

In [None]:
b = a
(a == b, a is b)

In [None]:
a = 3
b = 3.0
(a==b, a is b)

Not crucially important but potentially confusing: for small, immutable objects, Python sometimes makes sure that only one instance is ever created. So don't be surprised if your code *looks* like you create two values, but only one actually gets made. This is only detectable if you test for identity with `is`:

In [None]:
a = False
b = False
a is b

In [None]:
a = "hello"
b = str("hello")
a is b

In [None]:
a = 42
b = 42
a is b

**If two expressions refer to the same value according to `is`, they are also the same according to `==`, but not always the other way around.**
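With mutable values the difference is easy to observe, because every list literal constructs a fresh list. A small sketch (variable names chosen for illustration):

```python
x = [1, 2]
y = [1, 2]   # a second list literal constructs a second list
z = x        # assignment only copies the reference

print(x == y, x is y)  # True False
print(x == z, x is z)  # True True
```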

**Exercise:** Suppose the variable `a` contains a very long string that takes up 2 gigabytes of the computer's memory. We now do

```
b=a
```

- How much additional memory is consumed after executing this statement? Why?

Now we decide that `b` needs to be the string in reverse, so we use:

```
b=a[::-1]
```

- How much memory do you think this consumes?

## Sets

There are two more kinds of compound values in Python: sets and dictionaries. Both are iterable. 

Let's look at sets first. A set is an **unordered** collection of distinct values. It allows for very quick insertion, deletion and lookup of values. In contrast, lists allow quick insertion and deletion at the end, and quick lookup by index, but insertion and deletion elsewhere, and lookup by value, are slow.

One example of how sets might be used is to determine the number of distinct words in a text; or a set might contain numbers or strings with a particular property, so that you can quickly check whether some number has that property or not, by checking if it's in the set.

Sets can be constructed using curly brackets:

In [None]:
a = {(3,4), "hello", (3,4), True}
a

Sets can also be constructed from iterable values using the type constructor:

In [None]:
set([(3,4),"hello",(3,4),True])

Note that the ordering of the items is lost and that only one among equal entries is retained (equal according to `==`, not `is`).

You can see if an item is in the set using `in`:

In [None]:
("True" in a, (3,4) in a)

In [None]:
3 in a

Note that we saw before that `in` also worked for tuples and strings. However, testing set membership using `in` is *especially* fast, so consider using sets if you have to use `in` a lot!
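As a sketch of the "numbers with a particular property" use case mentioned earlier, the following keeps the primes below 20 in a set and tests a few numbers against it. The set `small_primes` and the test values are example data, not part of the lecture material:

```python
small_primes = {2, 3, 5, 7, 11, 13, 17, 19}  # the primes below 20

for n in (4, 7, 15, 19):
    if n in small_primes:       # fast membership test
        print(n, "is prime")
    else:
        print(n, "is not prime")
```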

Sets are iterable, so you can use them in a `for` directly:

In [None]:
for item in a:
    print("Set item:", item)

Note that you are given no guarantees about the order in which they appear. It may be different after Python is updated to a new version, for example. Never rely on the order :)
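If you do need a predictable order, one option is to sort the items first. A minimal sketch using the built-in function `sorted`, which works here because all items are strings and can be compared with each other (the set `fruits` is example data):

```python
fruits = {"pear", "apple", "fig"}
ordered = sorted(fruits)  # sorted returns a new list; the set itself is unchanged
print(ordered)  # ['apple', 'fig', 'pear']
```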

Sets are also *mutable*. You can add items to sets and remove items using `add` and `remove`, as follows:

In [None]:
a.add("world")
a

In [None]:
a.remove("world")
a

The size of the set can be obtained with `len`:

In [None]:
len(a)

You can also use the following common set operations:

In [None]:
a = { 1, 3, 5 }
b = { 2, 3, 7 }

# set union
a | b

In [None]:
# set intersection
a & b

In [None]:
# set difference
a - b
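A few related operations, shown here as a sketch on the same two sets: the symmetric difference `^`, the subset test `<=`, and the method `union`, which does the same as `|`. The results are compared against the expected sets rather than printed directly, since the printed order of a set is not guaranteed.

```python
a = {1, 3, 5}
b = {2, 3, 7}

# symmetric difference: items that are in exactly one of the two sets
print(a ^ b == {1, 2, 5, 7})   # True

# subset test: is every item of {3, 5} also in a?
print({3, 5} <= a)             # True

# the method union gives the same result as the operator |
print(a.union(b) == a | b)     # True
```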

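**Only *hashable* items can be stored in a set. The immutable built-in values we have seen (numbers, strings, and tuples of such values) are hashable; mutable values are not. So in particular, you cannot put a list or another set inside a set.**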

In [None]:
a.add(b)  # raises a TypeError: a set cannot contain another set
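The restriction also applies to what a value contains: a tuple is only accepted if all of its items could themselves be stored in a set. The sketch below uses `try`/`except` only so that the code runs to the end despite the error; the variable names are illustrative.

```python
s = set()
s.add((4, 5))            # fine: a tuple of ints

try:
    s.add(([4, 5], 6))   # a tuple that contains a list
except TypeError as err:
    print("cannot add:", err)

print(s)  # {(4, 5)}
```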

In [None]:
# Exercise: create a set that contains only the string "abracadabra"

In [None]:
# Exercise: use a set to count the number of distinct characters in "abracadabra".


## Immutable sets: `frozenset`

Python offers an immutable set type, called `frozenset`. Roughly, `set` is to `frozenset` as `list` is to `tuple`.

While a list and a tuple of the same values do not count as "equal", a frozenset and a set of the same values do:

In [None]:
print("list equals tuple    :", [3,"boo"] == (3, "boo"))
print("set  equals frozenset:", {3,"boo"} == frozenset({"boo", 3}))

But you cannot add new values to it:

In [None]:
frozenset({3,"boo"}).add("hello")

Frozensets are useful because, being immutable, they can be stored in a set. Sometimes, it's useful to have a set of sets.

In [None]:
a = {"hello", 0}
b = {"world", 1}
a.add(frozenset(b))
a

Now, `a` contains a frozen version of the set `b`, which we can verify by looking it up:

In [None]:
b in a

Note that we looked up the *unfrozen* version - which worked because `b==frozenset(b)`. The same does not work with tuples and lists:

In [None]:
a = ["hello", 0]
b = ["world", 1]
a.append(tuple(b))
a


In [None]:
b in a
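To make the lookup succeed, compare like with like and convert `b` to a tuple first. A short sketch that repeats the setup so it can be run on its own:

```python
a = ["hello", 0]
b = ["world", 1]
a.append(tuple(b))

print(b in a)         # False: a list never equals a tuple
print(tuple(b) in a)  # True: the tuple version is what was stored
```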

## Dictionaries

Dictionaries are like sets, but every element, now called a *key*, is associated with a second item, called the *value*. So a dictionary is a *mapping* from keys to values. It is constructed like a set, where colons identify `key:value` pairs:

In [None]:
a = { "hello": 1, "world": 2 }
a

In [None]:
type(a)

You can view a dictionary as a generalisation of a list: in a list, the values are always indexed by an integer, but in a dictionary, the values can be indexed by *any* kind of key that could also be stored in a set.

Dictionaries can be used as an easy way to implement discrete functions, such as probability mass functions (keys: the outcomes, values: their probability mass), or to count the frequencies of the elements of a list (keys: the list elements, values: their frequencies).

In practice, dictionaries are used more often than sets.

If you use the dictionary as an iterable, it will iterate over the keys.

In [None]:
tuple(a) # works because a is iterable

You can also request iterables for either the keys or the values explicitly:

In [None]:
print("keys  :", tuple(a.keys()))
print("values:", tuple(a.values()))
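Keys and values can also be visited together with the method `items`, which yields `(key, value)` tuples that can be unpacked in the `for` statement. A minimal sketch, repeating the dictionary from above so that it runs on its own:

```python
a = {"hello": 1, "world": 2}

for key, value in a.items():  # each item is a (key, value) tuple
    print(key, "->", value)
```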

As with sets, you can still check if a key is in the dictionary:

In [None]:
("hello" in a, True in a)

But now you can also look up the value associated with a specific key.

In [None]:
a["world"]
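Keys do not have to be strings. As a sketch, the dictionary below uses tuples as keys; the name `board` and its contents are hypothetical example data:

```python
# (row, column) positions mapped to the piece standing there
board = {(0, 0): "rook", (0, 1): "knight"}

print(board[(0, 1)])    # knight
print((1, 1) in board)  # False
```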

Like sets, dictionaries are mutable, so you can change the value associated with a key, add new keys, and delete existing keys with `del`:

In [None]:
a["new key"] = 5
del a["world"]
a

For some reason, there is no immutable dictionary in Python.
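To tie the pieces together, here is one possible way to do the frequency counting mentioned earlier, using only `in`, lookup and assignment. The list `words` is example data and the approach is a sketch, not the only way to do it:

```python
words = ["a", "b", "a", "c", "a", "b"]  # example data

counts = {}
for w in words:
    if w in counts:
        counts[w] = counts[w] + 1  # key seen before: increase its count
    else:
        counts[w] = 1              # first occurrence: add a new key

print(counts)  # {'a': 3, 'b': 2, 'c': 1}
```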

*Next lecture, we'll wrap up the basic language and start looking into actually using Python for real problems :)*