**Introduction**

The most basic definition of a set is an **unordered** collection of **distinct** objects.

In Python, the elements of a set are **unordered** and always **distinct** (no two elements compare equal with `==`); additionally, they must be **hashable**. Sets are implemented as hash tables, so they behave much like dictionaries that store only keys.

For pretty much every operation we'll want (unions, intersections, symmetric differences, checking whether two sets are disjoint, checking whether one set is a subset/superset of another) there is a method, and most also have an operator form. Both forms return a **new** object (a set, or a boolean for the comparisons) rather than mutating the original.

The **key difference** between the two forms is that the methods accept any iterable as an argument, whereas the operators require both operands to be sets.
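A minimal sketch of that difference between operators and methods, using a made-up set and list. The method form takes the list as-is, while the operator form raises a `TypeError`:

```python
s1 = {1, 2, 3}
other = [3, 4, 5]  # a plain list, not a set

# The method accepts any iterable and returns a new set.
merged = s1.union(other)
print(merged)

# The operator requires both operands to be sets (or frozensets).
operator_error = None
try:
    s1 | other
except TypeError as exc:
    operator_error = exc
    print(f"TypeError: {exc}")

# s1 itself is unchanged by either call.
print(s1)
```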

Here is the full set of operations:

- cardinality: `len(s)`
- membership testing: `in`, `not in`
- unions: `s1 | s2`, `s1.union(s2)`
- intersections: `s1 & s2`, `s1.intersection(s2)`
- differences: `s1 - s2`, `s1.difference(s2)`
- symmetric difference: `s1 ^ s2`, `s1.symmetric_difference(s2)`
- subsets: `s1 <= s2`, `s1.issubset(s2)`
- proper subsets: `s1 < s2`
- supersets: `s1 >= s2`, `s1.issuperset(s2)`
- proper supersets: `s1 > s2`
- disjointness: `s1.isdisjoint(s2)`
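
As a quick illustration of the comparison operations in the list above (the sets `small`, `big` and `apart` are made up for this sketch):

```python
small = {1, 2}
big = {1, 2, 3}
apart = {7, 8}

print(len(big))                  # 3
print(small <= big)              # True: every element of small is in big
print(small < big)               # True: a subset, and the two sets differ
print(big <= big, big < big)     # True False: a set is a subset of itself, but not a proper one
print(big >= small)              # True
print(small.isdisjoint(apart))   # True: no elements in common
print(small.isdisjoint(big))     # False
```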

Elements of a set must be **unique** and **hashable**. Sets themselves are a mutable collection.

Therefore, a set is **not hashable**, so it can't be used as a dictionary key, nor can a set contain another set.
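
A small sketch showing both failures. The exact wording of the error message varies between Python versions, so only the exception type is relied on here:

```python
inner = {1, 2}

# A set cannot be an element of another set...
nested_error = None
try:
    outer = {inner}
except TypeError as exc:
    nested_error = str(exc)
    print(f"TypeError: {exc}")

# ...and it cannot be a dictionary key either.
key_error = None
try:
    lookup = {inner: 'value'}
except TypeError as exc:
    key_error = str(exc)
    print(f"TypeError: {exc}")
```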

**Frozen Sets**

Elements of a frozen set must also be **unique** and **hashable**. Frozen sets are NOT a mutable collection.

Therefore, a frozen set *is* **hashable**, so it *can* be used as a dictionary key and you *can* have a set/frozenset containing a frozen set.

These take the form `frozenset(iterable)`, where the `iterable` can be a set. They are the immutable equivalent of sets: frozen sets are to sets what tuples are to lists.
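
A brief sketch of what hashability buys us, with illustrative values only:

```python
fs = frozenset({1, 2})

# A frozen set can be nested inside a set and used as a dictionary key.
nested = {fs, frozenset({3})}
lookup = {fs: 'pair'}

# Lookup works with any equal frozenset, regardless of construction order.
print(lookup[frozenset([2, 1])])  # pair

# Frozen sets compare equal to sets with the same elements.
print(fs == {1, 2})               # True

# There are no mutating methods such as add().
print(hasattr(fs, 'add'))         # False
```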

**Membership Testing**

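This is one of the strong points of a set. Testing membership of an element in a set is **extremely** efficient: it is a hash table lookup that takes roughly constant time on average, whereas a list or tuple has to be scanned element by element.

Therefore, `if a in {10, 20, 30}:` scales much better than `if a in [10, 20, 30]:` or `if a in (10, 20, 30):`, although the gap only becomes noticeable as the collection grows.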

**But**, there's a higher storage cost since it's a hash table.
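
A rough way to see both sides of this trade-off, assuming CPython. `sys.getsizeof` reports only the size of the container itself, and the timings depend on the machine, so the numbers are indicative only:

```python
import sys
import timeit

values = list(range(1000))
as_list = values
as_set = set(values)

# Size of the container object itself (not the integers it refers to).
list_bytes = sys.getsizeof(as_list)
set_bytes = sys.getsizeof(as_set)
print(f"list: {list_bytes} bytes, set: {set_bytes} bytes")

# Worst case for the list: the target is the last element.
target = 999
list_time = timeit.timeit(lambda: target in as_list, number=2000)
set_time = timeit.timeit(lambda: target in as_set, number=2000)
print(f"list lookup: {list_time:.4f}s, set lookup: {set_time:.4f}s")
```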

# 01 - Creating Sets

We can use literals, constructors, comprehensions or unpacking, but we cannot use a literal for an empty set: `{}` creates an empty dictionary, so an empty set has to be written as `set()`.

In [6]:
literal = {'a', 10, 3.14159}
constructor = set(('a', 'b', 'c'))
comprehension = {c for c in 'python'}  # since 'python' is already an iterable, better to use the constructor for this.
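
The empty-literal caveat is easy to check directly:

```python
empty_literal = {}
empty_set = set()

print(type(empty_literal))  # <class 'dict'>
print(type(empty_set))      # <class 'set'>
```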

Remember, we **cannot** make sets containing unhashable elements, such as lists:

In [12]:
set([['a', 'b'], ['c', 'd']])

TypeError: unhashable type: 'list'

When it comes to unpacking, we use `*` for iterables and `**` for dictionaries.

Since sets are iterables, we can unpack them with `*my_set`.

In [9]:
s1 = {'a', 'b', 'c'}
s2 = {'d', 'e', 'f'}

{*s1, *s2}

{'a', 'b', 'c', 'd', 'e', 'f'}

Recall that the default iteration on a dictionary will be on its keys, so passing a dictionary to a `set()` constructor will make a set of the keys:

In [13]:
d = {'a': 1, 'b': 2}
set(d)

{'a', 'b'}

Also, note that `*` on a dict will unpack **only the keys**.

In [11]:
my_dict = {'a': 1, 'b': 2, 'c': 3}

my_set = {*my_dict}
my_set

{'a', 'b', 'c'}

# 02 - Common Set Operations

**Adding Elements**

This is straightforward:

In [14]:
s = set()
s.add('python')
s

{'python'}

**Removing Elements**

The four ways are `s.remove(element)`, `s.discard(element)`, `s.pop()` and `s.clear()`. All are mutating operations.

Using `.remove(element)` will throw a `KeyError` exception if the element does not exist, just like with dictionaries. 

In [18]:
s = {'a', 'b', 'c'}
s.remove('b')
print(s)
s.remove('z')

{'c', 'a'}


KeyError: 'z'

Using `.discard(element)` won't throw an exception:

In [21]:
s = {'a', 'b', 'c'}
s.discard('b')
print(s)
s.discard('z')

{'c', 'a'}


`s.pop()` takes no argument; it removes and returns an **arbitrary** element, and raises a `KeyError` if the set is empty.

In [23]:
s = {'a', 'b', 'c'}
s.pop()

'c'
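
A short sketch of the empty-set case. Which element `pop()` returns is arbitrary, so the example does not assume a particular one:

```python
s = {'a', 'b', 'c'}
removed = s.pop()          # some element of s; which one is not specified
print(removed in {'a', 'b', 'c'}, len(s))  # True 2

# Popping from an empty set raises KeyError.
empty = set()
pop_failed = False
try:
    empty.pop()
except KeyError as exc:
    pop_failed = True
    print(f"KeyError: {exc}")
```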

`s.clear()` removes **all** elements:

In [24]:
s = {'a', 'b', 'c'}
s.clear()
s

set()

# 03 - Set Operations

Everything in this subsection is just demonstrating the usage of the set operators and set methods on simple examples. For example:

**Method**

In [28]:
s1 = {1, 2, 3}
s2 = {2, 3, 4}
s3 = {3, 4, 5}

print(s1.intersection(s2, s3))

{3}


In [34]:
print(s1.intersection((i for i in range(2, 5)), range(3,6)))  # can pass in generators as they're iterables.

{3}


Remember that the iterable has to contain hashable objects:

In [35]:
{1, 2}.intersection([(1, 2), (3, 4)])

set()

In [36]:
{1, 2}.intersection([[1, 2], [3, 4]])

TypeError: unhashable type: 'list'

**Set Operator**

In [27]:
print(s1 & s2 & s3)

{3}


# 04 - Update Operations

Recall that with lists `l1 += l2` mutates `l1` whereas `l1 + l2` creates a new object. A similar notation exists for sets:

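- `|=` or `s1.update(s2)` (whereas `s1.union(s2)` creates a new set)
- `&=` or `s1.intersection_update(s2)` (whereas `s1.intersection(s2)` creates a new set)
- `-=` or `s1.difference_update(s2)` (whereas `s1.difference(s2)` creates a new set)
- `^=` or `s1.symmetric_difference_update(s2)` (whereas `s1.symmetric_difference(s2)` creates a new set)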
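
One way to confirm that the update forms mutate in place is to keep a second reference to the same set (the name `alias` is just for illustration):

```python
s1 = {1, 2, 3}
alias = s1                 # second name for the same set object

s1 |= {4}                  # in-place update: same object, new contents
print(alias is s1)         # True
print(4 in alias)          # True

s1 = s1 | {5}              # plain operator: builds a new set and rebinds s1
print(alias is s1)         # False
print(5 in alias)          # False
```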
  

One thing to remember is that `difference_update` with several arguments does not behave like the chained operator form.

The operator approach makes sense - the right-hand side is evaluated first and the result is then subtracted from the left: `s1 = s1 - ((s2 - s3) - s4)`

In [48]:
s1 = {1, 2, 3, 4}
s2 = {2, 3}
s3 = {3, 4}
s4 = {4, 5}

s1 -= s2 - s3 - s4
s1

{1, 3, 4}

The method approach, in contrast, subtracts each argument from `s1` in turn, which is easy to get wrong: `s1 = ((s1 - s2) - s3) - s4`

In [49]:
s1 = {1, 2, 3, 4}
s2 = {2, 3}
s3 = {3, 4}
s4 = {4, 5}

s1.difference_update(s2, s3, s4)
s1

{1}
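
One way to reason about the multi-argument method is that it removes anything found in *any* of the arguments, i.e. it subtracts their union. A small check of that equivalence on the same values:

```python
s1 = {1, 2, 3, 4}
s2 = {2, 3}
s3 = {3, 4}
s4 = {4, 5}

# Non-mutating equivalents of the method call.
chained = ((s1 - s2) - s3) - s4
via_union = s1 - (s2 | s3 | s4)

s1.difference_update(s2, s3, s4)
print(s1 == chained == via_union)  # True
```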

#### Example

Here's a practical example which demonstrates the use of `difference_update`.

Suppose we have a program that fetches data from some API, database, whatever - and it retrieves a paged list of city names. We want our program to keep fetching data from the source until the source is exhausted, and filter out any cities we are not interested in from our final result.

To simulate the data source, let's do this:

In [20]:
def gen_read_data():
    yield ['Paris', 'Beijing', 'New York', 'London', 'Madrid', 'Mumbai']
    yield ['Hyderabad', 'New York', 'Milan', 'Phoenix', 'Berlin', 'Cairo']
    yield ['Stockholm', 'Cairo', 'Paris', 'Barcelona', 'San Francisco']

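Now, we take our data and mutate it by removing any cities that we are not interested in. For example, calling `filter_incoming('London', 'Paris', data_set=s)` with `s = {'London', 'Paris', 'Madrid'}` leaves `s` as `{'Madrid'}`; the function mutates `data_set` in place and returns `None`.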

In [26]:
def filter_incoming(*cities, data_set):
    data_set.difference_update(cities)

In [27]:
result = set()
data = gen_read_data()
for page in data:
    result.update(page)
    filter_incoming('Paris', 'London', data_set=result)
print(result)

{'Hyderabad', 'New York', 'Phoenix', 'San Francisco', 'Barcelona', 'Mumbai', 'Stockholm', 'Cairo', 'Madrid', 'Milan', 'Beijing', 'Berlin'}


# 05 - Copying Sets

We know everything in this section.

For shallow copies, do: `s2 = s1.copy()`, `s2 = set(s1)`, `s2 = {*s1}`

For deep copies, do: `from copy import deepcopy` and `s2 = deepcopy(s1)`.
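
The distinction only matters when the elements are hashable but still carry mutable state. A minimal sketch, assuming a hypothetical `Tag` class that relies on Python's default identity-based hashing:

```python
from copy import deepcopy


class Tag:
    """Hypothetical element type: hashable by identity, with mutable state."""

    def __init__(self, name):
        self.name = name


original = {Tag('a')}
shallow = original.copy()   # new set, same Tag object inside
deep = deepcopy(original)   # new set, new Tag object inside

# Mutate the element held by the original set.
next(iter(original)).name = 'changed'

print(next(iter(shallow)).name)  # changed
print(next(iter(deep)).name)     # a
```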

# 06 - Frozen Sets

# 07 - Dictionary Views