# Lecture 4, Part 2 – Collections

## CSS Summer Bootcamp, Week 1 🥾

#### Suraj Rampure

### Collections

In Python, there are four built-in types of _collections_, or _containers_, which are types that store other types:
- `list`.
- `dict`.
- `tuple`.
- `set`.

We've looked at the first two extensively; let's take a look at the other two.

### Tuples

Tuples, like lists, are used to store multiple values, and are ordered (meaning we access elements by their index). However, unlike lists, **tuples are immutable**.

To create a tuple, use (parentheses) rather than [square brackets].

In [None]:
ucsd = ('UC San Diego', 'La Jolla, CA', 1960, 40_000)
ucsd

In [None]:
type(ucsd)

In [None]:
ucsd[0]

In [None]:
# IndexError: `ucsd` has 4 elements, so the valid indices are 0 through 3
ucsd[4]

In [None]:
# This would work if `ucsd` were a list!
ucsd[2] = 1998

### Why tuples?

While tuples may _seem_ like "worse" lists, there are times when we need immutable sequences.

For one: the keys in a dictionary must be immutable. As such, **tuples can be keys, but lists cannot**.

In [None]:
coords = {}
coords[(-1, 4)] = 'UC San Diego'
coords[(2, 5)] = 'UC Irvine'
coords

In [None]:
# TypeError: lists are mutable, so they can't be used as keys
coords[[3, -1]] = 'UC Berkeley'
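
More precisely, Python requires dictionary keys to be _hashable_, and the built-in `hash` function is what fails for a list. The sketch below wraps that check in a small helper; the name `is_hashable` is made up for illustration and is not a built-in.

```python
def is_hashable(obj):
    """Return True if `obj` can be hashed (and so used as a dictionary key)."""
    try:
        hash(obj)
        return True
    except TypeError:
        return False

print(is_hashable((3, -1)))   # a tuple of ints
print(is_hashable([3, -1]))   # a list
```

The first call prints `True` and the second prints `False`, which matches what we saw when using each as a key.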

In addition, when a function returns multiple outputs, the **type** of the output is `tuple`.

In [None]:
def f(x, y):
    return x + y, y ** 2

In [None]:
out = f(3, 5)
out

In [None]:
type(out)

In [None]:
out[0]
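
Since the output is a tuple, one common pattern is to "unpack" it into separate variables in a single assignment. Here's a minimal sketch using the same `f`; the names `total` and `square` are just illustrative choices.

```python
def f(x, y):
    return x + y, y ** 2

# One variable per element of the returned tuple
total, square = f(3, 5)
print(total)   # 8
print(square)  # 25
```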

### Sets

Sets are **unordered** collections of immutable items. Each item can appear at most once: sets **do not allow duplicates**.

Create an empty set with `set()` (note that `{}` on its own creates an empty dictionary, not a set), or initialize one with elements using {curly brackets}. Add elements with the `add` method.

In [None]:
bag = {'fries', 'coke'}

In [None]:
bag.add('big mac')

In [None]:
bag.add(('mcchicken', 'apple pie'))

In [None]:
bag.add(None)

In [None]:
# Note that the order they appear in is not the order we added them in!
bag

In [None]:
bag.add('big mac')

In [None]:
# There aren't two 'big mac's now – sets don't have duplicates
bag

In [None]:
bag.remove('big mac')

In [None]:
bag

In [None]:
# Can't put a list in a set, because lists are mutable
bag.add(['mcchicken', 'apple pie'])

**Question:** Can we add a set inside of another set?

### Order really doesn't matter!

In [None]:
{3, 4} == {4, 3}

In [None]:
[3, 4] == [4, 3]

### Why sets?

Since sets don't have order, we cannot access elements by their position.

In [None]:
bag

In [None]:
# TypeError: sets don't support indexing
bag[0]

The primary use case of sets is to **check for membership** – i.e. to keep track of whether items are in it or not. For this use case, sets are **much faster** than lists. Let's see this in action.

### Example: dictionary (of words) 📖

In [None]:
# Using `with` closes the file automatically once we're done reading
with open('data/words.txt', 'r') as f:
    words_list = f.read().split('\n')
words_set = set(words_list)

Both `words_list` and `words_set` contain the same words. However, it's **much** quicker to check if a given word (like `'dog'`) is in `words_set` than if it's in `words_list`.

In [None]:
%%timeit
'dog' in words_list

In [None]:
%%timeit
'dog' in words_set

In the `%%timeit` output, µs refers to "microseconds" (one millionth of a second), while ns refers to "nanoseconds" (one billionth of a second). The set approach is ~10,000x faster!
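
`%%timeit` only works inside a notebook. Outside of one, a similar comparison can be sketched with the standard library's `timeit` module. The data below is a synthetic stand-in (an assumption, not the course's `words.txt`), and the exact timings will vary from machine to machine.

```python
import timeit

# Synthetic stand-in for the word list (an assumption; not the course's data file)
items_list = [str(i) for i in range(100_000)]
items_set = set(items_list)

target = '99999'  # the last element, which is the worst case for the list

# Time 100 membership checks against each collection
list_time = timeit.timeit(lambda: target in items_list, number=100)
set_time = timeit.timeit(lambda: target in items_set, number=100)

print(f'list: {list_time:.6f} seconds for 100 lookups')
print(f'set:  {set_time:.6f} seconds for 100 lookups')
```

The list has to be scanned element by element, while the set can jump more or less directly to where the item would be stored, which is why the gap grows as the collection gets larger.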

### Sequences vs. iterables

Earlier, we provided the following "pattern" for `for`-loops:

```py
for <elem> in <sequence>:
    <for body>
```

However, **sequences must be ordered**. Sets, as such, are not sequences. You can still iterate over them, though!

In [None]:
bag

In [None]:
for item in bag:
    print('in my bag, I have', item)

In Python, an **iterable** is any object that can produce its elements one at a time. Lists, dictionaries, sets, tuples, and strings are all iterables.
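
To see what "one at a time" means, here's a small sketch using the built-in functions `iter` and `next`, which is roughly what a `for`-loop does behind the scenes. The tuple `letters` is made up for illustration.

```python
letters = ('a', 'b', 'c')

it = iter(letters)     # ask the iterable for an iterator
print(next(it))        # 'a'
print(next(it))        # 'b'
print(next(it))        # 'c'

# Once the elements run out, next() raises StopIteration;
# a for-loop catches this for us and simply stops.
try:
    next(it)
except StopIteration:
    print('no more elements')
```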

### Set operations

In mathematics, sets have several operations. Python sets support these as well.

In [None]:
A = {2, 3, 8, 9}
B = {2, 4, 5, 9, 10}

The intersection of two sets contains all elements that are in both sets.

In [None]:
A.intersection(B)

In [None]:
# Order doesn't matter: same result as A.intersection(B)
B.intersection(A)

The union of two sets contains all elements that are in either set.

In [None]:
A.union(B)

In [None]:
# Order doesn't matter: same result as A.union(B)
B.union(A)

The **difference** of two sets contains all elements in one set that are not in the other.

In [None]:
A

In [None]:
B

In [None]:
A.difference(B)

In [None]:
# Order matters here: not the same as A.difference(B)!
B.difference(A)
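
Python sets also support operator shorthands for these three methods. The sketch below checks that `&`, `|`, and `-` agree with `intersection`, `union`, and `difference` on the same `A` and `B`.

```python
A = {2, 3, 8, 9}
B = {2, 4, 5, 9, 10}

print(A & B == A.intersection(B))  # True
print(A | B == A.union(B))         # True
print(A - B == A.difference(B))    # True
```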

<h3><span style='color:purple'>Activity</span></h3>

Complete the implementation of the function `either_not_both`, which takes in two sets and returns a new set containing all of the elements in either set, **without the elements that are in both sets**.

In [None]:
def either_not_both(A, B):
    ...

## List comprehension

### List comprehension

Back to lists for a minute. Consider the following list.

In [None]:
ages = [23, 35, 18, 19, 5, 42, 49, 13]

Suppose we want to create a new list containing the elements in `ages` all increased by 5. Using what we know now, we'd use a `for`-loop:

In [None]:
new_ages = []
for age in ages:
    new_ages.append(age + 5)
new_ages

Is there a more convenient way of creating the same list?

### List comprehension

List comprehension is a technique that allows us to **create a list in a single line, using another iterable (list, set, tuple, dict, string).**

The general pattern is as follows:

```py
[<expression> for <item> in <iterable>]
```

Let's try it out.

In [None]:
ages

In [None]:
[age + 5 for age in ages]

In [None]:
[max(t[0], t[1]) for t in {(1, 2), (3, -1), (9, 2), (3, 8)}]

### List comprehension with conditions

List comprehensions can also follow this pattern:

```py
[<expression> for <item> in <iterable> if <condition>]
```

In [None]:
ages

In [None]:
[age for age in ages if age % 2 == 0]

In [None]:
[x ** 2 for x in range(1, 11) if x % 2 == 1]

List comprehensions can get even more sophisticated!

In [None]:
[i * 2 if i % 2 == 1 else -i for i in range(1, 11)]
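
Note that in this last example, the `if`/`else` comes _before_ the `for`: it chooses which value to put in the list, rather than filtering elements out. As a sketch, here is the same computation written as a regular `for`-loop (the name `result` is just for illustration).

```python
result = []
for i in range(1, 11):
    if i % 2 == 1:
        result.append(i * 2)   # odd: double it
    else:
        result.append(-i)      # even: negate it

print(result)
# Same list as the comprehension version
print(result == [i * 2 if i % 2 == 1 else -i for i in range(1, 11)])
```

Every value of `i` produces an element here, so the result has 10 elements. With an `if` at the end of a comprehension, some elements would be skipped entirely.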

<h3><span style='color:purple'>Activity</span></h3>

Write a list comprehension that generates the following list.

```py
[3, 6, '999', 12, 15, '181818', 21, 24, '272727', ..., 93, 96, '999999']
```

## Mutability and immutability

### Motivation: making copies

Consider the following dictionary.

In [None]:
tracks = {
    'Drake': ['Best I Ever Had', "Marvin's Room", 'Controlla'],
    'Lady Gaga': ['Just Dance', 'Paparazzi'],
    'DaBaby': ['Rockstar', 'Suge'],
    'Olivia Rodrigo': ["Driver's License"]
}

What if we'd like to create a **copy** of `tracks`, and keep `tracks` unchanged?

### A naive approach

To create a copy of `tracks`, we might create a new variable, `tracks_copy`, and assign it to `tracks`.

In [None]:
tracks

In [None]:
tracks_copy = tracks

In [None]:
tracks_copy

Now, we _should_ be able to make changes to `tracks_copy` without changing `tracks`... right?

In [None]:
tracks_copy['The Weeknd'] = ['The Hills']

In [None]:
tracks_copy

But wait...

In [None]:
tracks

`tracks` was changed too! 😱

### Mutability

Below, `food` and `drink` are two names for the same dictionary. **Because dictionaries are mutable**, we can change that one dictionary through either name, and the change is visible through both.

In [None]:
food = {
    'Taco Stand': 'Carne Asada Fries',
    'CAVA': 'Chicken Bowl',
    'Padadak': 'Soy Garlic Wings'
}

drink = food

In [None]:
food['Taco Stand'] = 'California Burrito'

In [None]:
drink['McDonalds'] = 'Diet Coke'

In [None]:
food

In [None]:
drink

Only one dictionary was created above, not two.

Below, we ask Python for the "addresses" of `food` and `drink` in memory.

In [None]:
id(food)

In [None]:
id(drink)

We get the same address both times, meaning that `food` and `drink` are really just different labels for the same dictionary.
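
If we actually want a separate object, one option is to ask for a copy explicitly. The sketch below contrasts plain assignment with the dictionary `copy` method and the standard library's `copy.deepcopy`; the dictionary `menu` and the other variable names are made up for illustration.

```python
import copy

menu = {'fries': ['small', 'large'], 'coke': ['medium']}

alias = menu                  # same dictionary, second name
shallow = menu.copy()         # new dictionary, but the lists inside are shared
deep = copy.deepcopy(menu)    # new dictionary and new lists

print(alias is menu)                      # True
print(shallow is menu)                    # False
print(shallow['fries'] is menu['fries'])  # True
print(deep['fries'] is menu['fries'])     # False

# Adding a key to the copy leaves the original alone
shallow['big mac'] = ['regular']
print('big mac' in menu)                  # False
```

A shallow copy is enough when we only add or replace keys. If we also plan to modify the lists stored as values, a deep copy is the safer choice.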

<h3><span style='color:purple'>Activity</span></h3>

What values does `dog` contain after running the following lines of code?

```py
cat = [2, 3]
dog = cat
cat[0] = [5]
```

**Try and answer WITHOUT running any code.**

<h3><span style='color:purple'>Activity (followup)</span></h3>

What values does `dog` contain after running the following lines of code?

```py
cat = [2, 3]
dog = cat
cat = [5, 6]
```

**Try and answer WITHOUT running any code.**

### Mutability and function arguments

When a mutable input is passed into a function, the function has the ability to change the input's values **in the notebook, outside of the function's scope!**

In the example below, what happens to `nums` after `appender(nums)` is called?

In [None]:
nums = [8, 4, 5]

In [None]:
def appender(values):
    values.append(99)

In [None]:
appender(nums)

In [None]:
nums

Contrast the above behavior with the behavior of `adder`, specifically when `adder(x)` is called.

In [None]:
x = 5

In [None]:
def adder(n):
    n = n + 1000
    return n

In [None]:
adder(x)

In [None]:
x
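
The difference is that `values.append(99)` modifies the list that was passed in, while `n = n + 1000` just makes the local name `n` point to a new number. If we don't want a function to modify its input, one option is to build and return a new list instead. Here's a minimal sketch; the function name `appended` is hypothetical.

```python
def appended(values, new_value):
    """Return a new list with `new_value` at the end; leave `values` untouched."""
    return values + [new_value]

nums = [8, 4, 5]
result = appended(nums, 99)

print(nums)    # [8, 4, 5]
print(result)  # [8, 4, 5, 99]
```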

### Terminology summary

The table below summarizes the collections we've seen so far (plus strings, which are not quite "collections" but are similar) and their properties.

| Type | Iterable? | Sequence? | Mutable? |
| --- | --- | --- | --- |
| `str` | Yes | Yes | No |
| `list` | Yes | Yes | Yes |
| `dict` | Yes | No | Yes |
| `tuple` | Yes | Yes | No |
| `set` | Yes | No | Yes |
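
One way to double-check the table is with the abstract base classes in the standard library's `collections.abc` module. This is only a sketch: the helper `describe` and the tuple `MUTABLE_TYPES` are made-up names, and "mutable" is approximated here as "is a mutable sequence, mapping, or set".

```python
from collections.abc import (
    Iterable, Sequence, MutableSequence, MutableMapping, MutableSet
)

# Approximation of "mutable" for the built-in collections in the table
MUTABLE_TYPES = (MutableSequence, MutableMapping, MutableSet)

def describe(obj):
    """Return (iterable?, sequence?, mutable?) for `obj`."""
    return (
        isinstance(obj, Iterable),
        isinstance(obj, Sequence),
        isinstance(obj, MUTABLE_TYPES),
    )

for example in ['abc', [1, 2], {'a': 1}, (1, 2), {1, 2}]:
    print(type(example).__name__, describe(example))
```

Each printed row should line up with the corresponding row of the table.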