**Introduction**

The most basic definition of a set is an **unordered** collection of **distinct** objects.

In Python, the elements of a set are **unordered** (iteration order is arbitrary and should not be relied on) and always **distinct** (no two elements compare `==`). Additionally, they must be **hashable**. Sets are implemented as hash tables, so they behave much like dictionaries that store only keys.

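For pretty much every operation we'll want (unions, intersections, differences, symmetric differences, checking whether two sets are disjoint, checking whether one set is a subset/superset of another) there is a method, and for most of them an operator as well. The union, intersection and difference variants return a **new** set rather than mutating the original; the subset, superset and disjointness checks return a boolean.

The **key difference** between the two approaches is that the methods accept any iterable, whereas the operators require both operands to be sets (or frozen sets).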
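A minimal sketch of that difference between the method form and the operator form. The variable names are only illustrative:

```python
s = {1, 2, 3}

# The method accepts any iterables of hashable elements, and several at once.
via_method = s.union([3, 4], (5,), range(6, 8))

# The operator refuses a list on the right-hand side.
try:
    s | [3, 4]
    operator_error = None
except TypeError as exc:
    operator_error = exc

print(via_method == {1, 2, 3, 4, 5, 6, 7})  # True
print(type(operator_error).__name__)        # TypeError
print(s == {1, 2, 3})                       # True: the original set is untouched
```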

Here is the full set of operations:

- cardinality: `len(s)`
- membership testing: `in`, `not in`
- unions: `s1 | s2`, `s1.union(s2)`
- intersections: `s1 & s2`, `s1.intersection(s2)`
- differences: `s1 - s2`, `s1.difference(s2)`
- symmetric difference: `s1 ^ s2`, `s1.symmetric_difference(s2)`
- subsets: `s1 <= s2`, `s1.issubset(s2)`
- proper subsets: `s1 < s2`
- supersets: `s1 >= s2`, `s1.issuperset(s2)`
- proper supersets: `s1 > s2`
- disjointness: `s1.isdisjoint(s2)`
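As a quick check of the list above, here is a small sketch that runs several of the operators on two overlapping sets. The `results` dictionary is just a convenient way to print them together:

```python
s1 = {1, 2, 3}
s2 = {2, 3, 4}

results = {
    "union": s1 | s2,
    "intersection": s1 & s2,
    "difference": s1 - s2,
    "symmetric_difference": s1 ^ s2,
    "subset": {2, 3} <= s1,          # every element of {2, 3} is in s1
    "proper_subset": s1 < s1,        # a set is never a proper subset of itself
    "disjoint": s1.isdisjoint({10, 20}),
}

for name, value in results.items():
    print(f"{name}: {value}")
```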

Elements of a set must be **unique** and **hashable**. A set itself, however, is a mutable collection.

Therefore, a set is **not hashable**, which means it can't be used as a dictionary key, nor can you have a set containing another set.

**Frozen Sets**

Elements of a frozen set must also be **unique** and **hashable**. Unlike sets, frozen sets are NOT a mutable collection.

Therefore, a frozen set *is* **hashable**, which means it *can* be used as a dictionary key and you *can* have a set/frozenset containing a frozen set.

These take the form: `frozenset(iterable)` where the `iterable` can be a set. These are the immutable equivalent of sets: frozen sets are to sets what tuples are to lists.
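To make the hashability contrast concrete, here is a short sketch. The names `mutable`, `frozen`, `lookup` and `nested` are made up for illustration:

```python
mutable = {1, 2}
frozen = frozenset(mutable)

# A regular set cannot be a dictionary key.
try:
    lookup = {mutable: "value"}
except TypeError as exc:
    set_key_error = exc
    print(f"set as key failed: {exc}")

# A frozen set can be a key, and can live inside another set.
lookup = {frozen: "value"}
nested = {frozen, frozenset({3, 4})}

print(lookup[frozenset({2, 1})])  # value (equal frozen sets hash the same)
print(len(nested))                # 2
```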

**Membership Testing**

This is one of the strong points of a set. Testing membership of an element in a set is **extremely** efficient: it is a hash table lookup, which takes constant time on average, whereas a list or tuple has to be scanned element by element.

Therefore, `if a in {10, 20, 30}:` scales much better than `if a in [10, 20, 30]:` or `if a in (10, 20, 30):` as the collection grows.

**But**, there's a higher storage cost since it's a hash table.
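The storage trade-off can be sketched with `sys.getsizeof`. This assumes CPython and measures only the container itself, not the integers it refers to, so treat the numbers as indicative rather than exact:

```python
import sys

n = 1_000
as_list = list(range(n))
as_set = set(as_list)

# Size of the container only; the int objects are shared by both.
list_bytes = sys.getsizeof(as_list)
set_bytes = sys.getsizeof(as_set)

print(f"list container: {list_bytes} bytes")
print(f"set container:  {set_bytes} bytes")

# Both answer the membership question; the set does it with a hash lookup.
print(999 in as_list, 999 in as_set)  # True True
```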

# 01 - Creating Sets

We can use literals, constructors, comprehensions or unpacking. The one exception is the empty set: the literal `{}` creates an empty dictionary, so we have to use `set()` instead.

In [6]:
literal = {'a', 10, 3.14159}
constructor = set(('a', 'b', 'c'))
comprehension = {c for c in 'python'}  # since 'python' is already an iterable, better to use the constructor for this.
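The empty-set caveat is easy to check directly. The sketch below also shows that unpacking an empty iterable happens to produce an empty set, which is a curiosity rather than a recommendation:

```python
empty_literal = {}       # this is a dict, not a set
empty_set = set()        # the usual way to get an empty set
unpacked_empty = {*()}   # unpacking an empty tuple also yields an empty set

print(type(empty_literal).__name__)   # dict
print(type(empty_set).__name__)       # set
print(type(unpacked_empty).__name__)  # set
```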

Remember, we **cannot** make sets containing mutable elements: 

In [12]:
set([['a', 'b'], ['c', 'd']])

TypeError: unhashable type: 'list'

When it comes to unpacking, we use `*` for iterables and `**` for dictionaries.

Since sets are iterables, we can unpack them with `*my_set`.

In [9]:
s1 = {'a', 'b', 'c'}
s2 = {'d', 'e', 'f'}

{*s1, *s2}

{'a', 'b', 'c', 'd', 'e', 'f'}

Recall that the default iteration on a dictionary will be on its keys, so passing a dictionary to a `set()` constructor will make a set of the keys:

In [13]:
d = {'a': 1, 'b': 2}
set(d)

{'a', 'b'}

Also, note that `*` on a dict will unpack **only the keys**.

In [11]:
my_dict = {'a': 1, 'b': 2, 'c': 3}

my_set = {*my_dict}
my_set

{'a', 'b', 'c'}

# 02 - Common Set Operations

**Adding Elements**

This is straightforward:

In [14]:
s = set()
s.add('python')
s

{'python'}

**Removing Elements**

The four ways are `s.remove(element)`, `s.discard(element)`, `s.pop()` and `s.clear()`. All are mutating operations.

Using `.remove(element)` will throw a `KeyError` exception if the element does not exist, just like with dictionaries. 

In [18]:
s = {'a', 'b', 'c'}
s.remove('b')
print(s)
s.remove('z')

{'c', 'a'}


KeyError: 'z'

Using `.discard(element)` won't throw an exception:

In [21]:
s = {'a', 'b', 'c'}
s.discard('b')
print(s)
s.discard('z')

{'c', 'a'}


`s.pop()` takes no arguments; it will remove and return an **arbitrary** element, and raises a `KeyError` if the set is empty.

In [23]:
s = {'a', 'b', 'c'}
s.pop()

'c'
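A small sketch of draining a set with `pop()` and then hitting the empty-set case. Which element comes out first is arbitrary, so the example only looks at the collected result:

```python
s = {'a', 'b', 'c'}
removed = []

# Keep popping until the set is empty; the order is not guaranteed.
while s:
    removed.append(s.pop())

# One more pop on the now-empty set raises KeyError.
try:
    s.pop()
except KeyError as exc:
    error = exc

print(sorted(removed))        # ['a', 'b', 'c']
print(type(error).__name__)   # KeyError
```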

`s.clear()` removes **all** elements:

In [24]:
s = {'a', 'b', 'c'}
s.clear()
s

set()

# 03 - Set Operations

Everything in this subsection is just demonstrating the usage of the set operators and set methods on simple examples. For example:

**Method**

In [28]:
s1 = {1, 2, 3}
s2 = {2, 3, 4}
s3 = {3, 4, 5}

print(s1.intersection(s2, s3))

{3}


In [34]:
print(s1.intersection((i for i in range(2, 5)), range(3,6)))  # can pass in generators as they're iterables.

{3}


Remember that the iterable has to contain hashable objects:

In [35]:
{1, 2}.intersection([(1, 2), (3, 4)])

set()

In [36]:
{1, 2}.intersection([[1, 2], [3, 4]])

TypeError: unhashable type: 'list'

**Set Operator**

In [27]:
print(s1 & s2 & s3)

{3}


# 04 - Update Operations

Recall that with lists `l1 += l2` mutates `l1` whereas `l1 + l2` creates a new object. A similar notation exists for sets:

- `|=` or `s1.update(s2)`
- `&=` or `s1.intersection_update(s2)` (whereas `s1.intersection(s2)` creates a new set) 
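- `-=` or `s1.difference_update(s2)` (whereas `s1.difference(s2)` creates a new set)
- `^=` or `s1.symmetric_difference_update(s2)` (whereas `s1.symmetric_difference(s2)` creates a new set)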
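To see the mutate-versus-new-object distinction, one option is to keep a second name pointing at the original set and check identity afterwards. The `alias` names are illustrative only:

```python
s1 = {1, 2, 3}
alias = s1                 # second name for the same set object

s1 = s1 | {4}              # builds a new set and rebinds s1
print(s1 is alias)         # False
print(alias == {1, 2, 3})  # True: the original object is untouched

s2 = {1, 2, 3}
alias2 = s2

s2 |= {4}                       # mutates the existing set in place
print(s2 is alias2)             # True
print(alias2 == {1, 2, 3, 4})   # True

# The update methods accept any iterable, just like their non-mutating cousins.
s2.intersection_update([2, 3, 4, 5])
print(alias2 == {2, 3, 4})      # True
```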
  

One thing to remember is that `difference_update` with several arguments is **not** equivalent to `-=` with a chained right-hand side:

With the operator, the right-hand side is evaluated first (left to right) and only the result is removed from `s1`: `s1 = s1 - ((s2 - s3) - s4)`

In [48]:
s1 = {1, 2, 3, 4}
s2 = {2, 3}
s3 = {3, 4}
s4 = {4, 5}

s1 -= s2 - s3 - s4
s1

{1, 3, 4}

With the method, every argument is removed from `s1` in turn, which can be surprising next to the operator form: `s1 = ((s1 - s2) - s3) - s4`

In [49]:
s1 = {1, 2, 3, 4}
s2 = {2, 3}
s3 = {3, 4}
s4 = {4, 5}

s1.difference_update(s2, s3, s4)
s1

{1}
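Another way to read the multi-argument method is "remove the union of all the arguments". The sketch below checks that three spellings agree; `others`, `via_method`, `via_union` and `via_chain` are illustrative names:

```python
start = {1, 2, 3, 4}
others = [{2, 3}, {3, 4}, {4, 5}]

# 1. The method with several arguments.
via_method = set(start)
via_method.difference_update(*others)

# 2. The in-place operator with the union of the arguments on the right.
via_union = set(start)
via_union -= set().union(*others)

# 3. Chained subtraction starting from the left-hand set.
via_chain = start - others[0] - others[1] - others[2]

print(via_method == via_union == via_chain == {1})  # True
```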

#### Example

Here's a practical example which demonstrates the use of `difference_update`.

Suppose we have a program that fetches data from some API, database, whatever - and it retrieves a paged list of city names. We want our program to keep fetching data from the source until the source is exhausted, and filter out any cities we are not interested in from our final result.

To simulate the data source, let's do this:

In [20]:
def gen_read_data():
    yield ['Paris', 'Beijing', 'New York', 'London', 'Madrid', 'Mumbai']
    yield ['Hyderabad', 'New York', 'Milan', 'Phoenix', 'Berlin', 'Cairo']
    yield ['Stockholm', 'Cairo', 'Paris', 'Barcelona', 'San Francisco']

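Now, we take our data set and mutate it in place by removing any cities that we are not interested in. For example, given `cities = {'London', 'Paris', 'Madrid'}`, calling `filter_incoming('London', 'Paris', data_set=cities)` leaves `cities` as `{'Madrid'}` (the function itself returns `None`).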

In [26]:
def filter_incoming(*cities, data_set):
    data_set.difference_update(cities)

In [27]:
result = set()
data = gen_read_data()
for page in data:
    result.update(page)
    filter_incoming('Paris', 'London', data_set=result)
print(result)

{'Hyderabad', 'New York', 'Phoenix', 'San Francisco', 'Barcelona', 'Mumbai', 'Stockholm', 'Cairo', 'Madrid', 'Milan', 'Beijing', 'Berlin'}


# 05 - Copying Sets

We know everything in this section.

For shallow copies, do: `s2 = s1.copy()`, `s2 = set(s1)`, `s2 = {*s1}`

For deep copies, do: `from copy import deepcopy` and `s2 = deepcopy(s1)`.
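The difference only shows up when an element is itself mutable. As a sketch, assume a hypothetical `Tag` class whose instances are hashable by identity (the default for plain classes) but can still be changed:

```python
from copy import deepcopy


class Tag:
    """Hypothetical element type: hashable by identity, yet mutable."""

    def __init__(self, label):
        self.label = label


original_tag = Tag("draft")
s1 = {original_tag}

shallow = s1.copy()    # new set, same element objects
deep = deepcopy(s1)    # new set, copied element objects

shallow_tag = next(iter(shallow))
deep_tag = next(iter(deep))

original_tag.label = "final"

print(shallow_tag is original_tag, shallow_tag.label)  # True final
print(deep_tag is original_tag, deep_tag.label)        # False draft
```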

# 06 - Frozen Sets

#### Lecture

The similarities between tuples vs lists follow through to frozen sets vs sets. 

Because every element of a frozen set must be hashable, a frozen set is itself always **hashable**. So, it can be a key of a dictionary or an element of another set, for example.

Remember that, for tuples, making a **shallow** copy with `tuple(t1)` gives back the very same object (same ID) in CPython. This is safe because tuples are immutable. The same applies to **frozen** sets.

In [4]:
t1 = (1, 2, 3)
t2 = tuple(t1)
print(f"{t1 is t2 = }")

t1 is t2 = True


In [9]:
fs1 = frozenset({1, 2, 3})
fs2 = frozenset(fs1)
print(f"{fs1 is fs2 = }")

fs1 is fs2 = True


But, remember, this doesn't apply to regular sets which are mutable:

In [10]:
s1 = set({1, 2, 3})
s2 = set(s1)
print(f"{s1 is s2 = }")

s1 is s2 = False


**Set Operations**

We can perform **non-mutating** set operations (`& | - ^`) that mix sets and frozen sets. What type would the resulting set be?

Resulting type: type of the **first** operand.

So, if `s3 = s1 & s2`, then `type(s3) == type(s1)`  

In [14]:
s1 = {1, 2, 3}
fs1 = frozenset({3, 4, 5})

intersection_s_fs = s1 & fs1
intersection_fs_s = fs1 & s1

print(f"{intersection_s_fs = }, {type(intersection_s_fs) = }")
print(f"{intersection_fs_s = }, {type(intersection_fs_s) = }")

intersection_s_fs = {3}, type(intersection_s_fs) = <class 'set'>
intersection_fs_s = frozenset({3}), type(intersection_fs_s) = <class 'frozenset'>


**Identity and Equality**

Sets and frozensets containing the same elements will be `==` but not `is`. 

In [18]:
s1 = set(list((1, 2, 3)))
fs1 = frozenset(tuple((1, 2, 3)))

print(f"{s1 is fs1 = }")
print(f"{s1 == fs1 = }")

s1 is fs1 = False
s1 == fs1 = True


#### Coding

##### Example: Memoisation

Let's take a simple example of using the memoisation decorator in the standard library:

In [24]:
from functools import lru_cache

@lru_cache()
def my_func(*, a, b):
    print(f"calculating {a} + {b}")
    return a + b

print(my_func(a=1, b=2))
print(my_func(a=1, b=2))
print(my_func(a=1, b=2))

calculating 1 + 2
3
3
3


As we can see, we only called the function / calculated it once - the other two times, we pulled it from the cache.

We can imagine that the decorator keeps a dictionary that conceptually looks something like this, with each key recording the keyword arguments in the order they were passed:
```python
{
    (('a', 1), ('b', 2)): 3
}
```

But if we pass the same keyword arguments in a different *order* (`a` and `b` both keep their original values), the call is treated as a new one and the value is recalculated. Our dictionary now looks like this:
```python
{
    (('a', 1), ('b', 2)): 3,
    (('b', 2), ('a', 1)): 3
}
```

In [47]:
print(my_func(b=2, a=1))

calculating 1 + 2
3


This is a problem because the entire point of keyword arguments is that they're independent of position. So `a=1, b=2` should be identical to `b=2, a=1`.

Before we fix this limitation with our implementation, let's try running the code below:

In [27]:
print(my_func(a=[1, 2, 3], b=[4, 5, 6]))

TypeError: unhashable type: 'list'

This is a perfectly valid operation that would work if the function weren't decorated. Once decorated, the arguments of the function become part of a dictionary key, but lists **cannot** be keys because they are **unhashable**. It's always useful to reiterate that point.

In [46]:
d = {'a': 1, 'b': 2}

print(set(d.items()))


{('a', 1), ('b', 2)}


Our solution is to write our own memoiser that takes in the keyword arguments and converts them into a frozen set. 

- if `a=1, b=2` ---> `{('a', 1), ('b', 2)}` (this is a frozen set but I'm not using the constructor `frozenset()` so that I can reduce bracket fatigue).
- if `b=2, a=1` ---> `{('b', 2), ('a', 1)}`

Both are identical because order is not important in sets/frozen sets.

We'll let this decorator take positional arguments too, though it's irrelevant to this example. We don't need to convert `args` into a frozen set: it is already a tuple (hashable as long as its elements are), and for positional arguments the order *does* matter.

In [51]:
def memoiser(fn):
    cache = {}

    def inner(*args, **kwargs):
        key = (args, frozenset(kwargs.items()))
        if key not in cache:
            result = fn(*args, **kwargs)
            cache[key] = result
        return cache[key]

    return inner

@memoiser
def my_func(*, a, b):
    print(f"Calculating {a} + {b}...")
    return a + b

In [52]:
my_func(a=1, b=2)

Calculating 1 + 2...


3

In [53]:
my_func(b=2, a=1)

3

As you can see, the second function call did not print `Calculating...` because we read from the cache.

Also note, constructors like `set()` and `frozenset()` take a **single** iterable whose elements will be elements of the set/frozenset.
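A short sketch of what "a single iterable" means in practice. The variable names are illustrative:

```python
letters = frozenset('abc')    # the string is iterated: three one-character elements
word = frozenset(['abc'])     # the list is iterated: one three-character element

print(letters == {'a', 'b', 'c'})  # True
print(word == {'abc'})             # True

# Passing the elements as separate arguments is an error.
try:
    frozenset('a', 'b')
except TypeError as exc:
    constructor_error = exc
    print(type(exc).__name__)      # TypeError
```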

# 07 - Dictionary Views

#### Reminder from Section 3

**Dictionary Views are Dynamic**

**Views are more than just iterables**, which can be unintuitive. If we store *any* of these views in a variable and then modify the dictionary, the variable will reflect the modification. That is to say, a view does not hold a snapshot of the data; it reads from the dictionary every time it is used.

In [8]:
d = {'a': 1, 'b': 2}

my_items = d.items()
print(f"{my_items = }, {id(my_items) = }")

d['a'] = 100
d['b'] = 200
d['c'] = 300

print(f"{my_items = }, {id(my_items) = }")

my_items = dict_items([('a', 1), ('b', 2)]), id(my_items) = 2264696224160
my_items = dict_items([('a', 100), ('b', 200), ('c', 300)]), id(my_items) = 2264696224160


Notice how the IDs are the same. 

What's going on under the hood is that a view holds no data of its own, only a reference to the original dictionary.

Each time we print or iterate over the view, Python goes back to the original dictionary, creates a new **iterator** and consumes it. Since the iterator is new, the elements it yields reflect the most recent state of the dictionary.

#### Lecture

A very long time ago, a dictionary's `d.keys()` and `d.values()` methods were highly inflexible. Calling them produced lists, which duplicated memory, had no set-like functionality, and required a linear scan (O(n)) to search.

Now (PEP 3106), we have dictionary views, which behave like iterables, implement set-like behaviour and do not "own" any data (no memory duplication).

**Important**: The order of keys, values and items is guaranteed to be the **same**. So, for example, the fourth item in the `keys` view corresponds to the fourth item in the `values` view, which together correspond to the fourth item in the `items` view:

In [55]:
d = {'a':1, 'b':2, 'c':3}
print(list(d.keys()))
print(list(d.values()))
print(list(d.items()))

['a', 'b', 'c']
[1, 2, 3]
[('a', 1), ('b', 2), ('c', 3)]


**Iterating through the keys directly vs. through the `keys()` view**

As you know, there are two ways we can iterate through the keys of a dictionary:

In [68]:
d = dict(zip('abc',[1, 2, 3]))
for k in d:
    print(k)

a
b
c


In [69]:
d = dict(zip('abc',[1, 2, 3]))

for k in d.keys():
    print(k)

a
b
c


The difference is that in the first example, we get an iterator straight from the dictionary, while in the second, we first create a `view` object and iterate over that. But there's no meaningful difference in speed, so use whichever approach you want.

But if you want set-like functionality, you need to use the `view` object.

**Set Behaviour**

The `keys()` view **always** behaves like a **frozen** set; frozen because we cannot modify the `keys()` view directly - we must modify the dictionary that contains them.

The `items()` view **may** behave like a **frozen** set; this is true if all values are hashable (uniqueness is guaranteed since the keys are guaranteed unique).

The `values()` view will **never** behave like a set; values are not guaranteed to be unique or hashable.
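The three cases above can be sketched as follows. The dictionaries are made up for illustration, and the error cases are wrapped in `try`/`except` so the snippet runs to the end:

```python
d1 = {'a': 1, 'b': 2, 'c': 3}
d2 = {'b': 2, 'c': 30, 'd': 4}

# keys() always supports set operations; the result is a plain set.
common_keys = d1.keys() & d2.keys()
print(common_keys == {'b', 'c'}, type(common_keys).__name__)  # True set

# items() supports them too when all the values are hashable.
common_items = d1.items() & d2.items()
print(common_items == {('b', 2)})  # True

# With an unhashable value, building a set from the items fails.
unhashable = {'a': [1, 2]}
try:
    unhashable.items() | {('b', 2)}
except TypeError as exc:
    items_error = exc

# values() has no set operators at all.
try:
    d1.values() & d2.values()
except TypeError as exc:
    values_error = exc

print(type(items_error).__name__, type(values_error).__name__)  # TypeError TypeError
```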

**Modifying a dictionary during iteration**

Python will **not allow** changing the **size** of a dictionary while you iterate through it or any of its views.

In the example below, the first iteration actually runs perfectly fine. But when the loop asks the iterator for the next value, the iterator notices that the dictionary's size has changed since iteration began, and it raises a `RuntimeError`.

In [58]:
d = {'a':1, 'b':2, 'c':3}

for val in d.values():
    del d['c']

RuntimeError: dictionary changed size during iteration

What if we actually do want to do something with a key and then throw it away? Well, a better data structure might be a queue or a deque. But, if we have to use a dictionary, then here's how we could do it:

We'll need to iterate through a static list of the keys and, during each iteration, delete the key from the dictionary. Thus, we've avoided modifying the size of a dictionary while iterating through **its own** view:

In [79]:
d = {'a':1, 'b':2, 'c':3}

keys = list(d.keys())  # keys view went from dynamic to static when we converted to list

for k in keys:
    print(k, d[k]**2)  # some random operation before throwing key away
    del d[k]

print(f"final dictionary: {d}")

a 1
b 4
c 9
final dictionary: {}


A better way to do this is with `d.pop()` which will remove the key but give us a handle to the value before doing so: 

In [81]:
d = {'a':1, 'b':2, 'c':3}

for k in list(d.keys()):
    v = d.pop(k)
    print(k, v**2)

print(f"final dictionary: {d}")

a 1
b 4
c 9
final dictionary: {}


There are plenty of other ways that do not require iterating through any of the views:

In [84]:
d = {'a':1, 'b':2, 'c':3}

while len(d):
    key, val = d.popitem()
    print(key, val**2)

print(f"final dictionary: {d}")

c 9
b 4
a 1
final dictionary: {}


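Python *may* **allow** modifying the **keys** of a dictionary during iteration if the size ends up unchanged (for example, deleting one key and adding another), but the results are unpredictable, so **DON'T DO IT**!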