# Section 10: Data Structures Part III - Sets & the Collections Module
## The "Set" Data Structure
**The Python "set" is an *unordered* collection of *unique*, *immutable* objects.** (Strictly, the members must be *hashable*; Python's built-in immutable objects, such as numbers, strings, and tuples of hashable items, qualify.) Similar to a dictionary's keys, checking whether an object is a member of a given set is an $\mathcal{O}(1)$ operation on average: the time required to check for membership does not depend on the size of the set. Furthermore, this data structure supports set-algebra operations for comparing sets.

### Creating a set
You can initialize a set using the syntax: `{item1, item2, ...}`. Please note that this is distinct from the dictionary-initialization syntax, which uses a colon to indicate key-value pairs:

```python
# initializing a set containing various immutable objects
>>> {1, 3.4, "apple", False, (1, 2, 3)}
{False, 1, (1, 2, 3), 3.4, 'apple'}
```
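To make the immutability requirement concrete, here is a minimal sketch (the variable names `valid` and `message` are illustrative only) showing that a tuple is accepted as a set member while a list is rejected:

```python
# a tuple of immutable items is hashable, so it can be a member of a set
valid = {(1, 2, 3), "apple"}

# a list is mutable and therefore unhashable; placing it in a set raises TypeError
message = None
try:
    invalid = {[1, 2, 3], "apple"}
except TypeError as err:
    message = str(err)

# the exact wording varies between Python versions, but it mentions "unhashable"
print(message)
```

The error is raised when the set is built, not when the list is later modified, so the mistake surfaces immediately.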
A set can also be constructed using a set comprehension, which follows the generator-comprehension syntax but is enclosed in curly braces:
```python
# initialization via set-comprehension
>>> {i**2 for i in range(5) if i != 3}
{0, 1, 4, 16}
```

And, like the `list`, `tuple`, and `dict` constructors, `set` constructs a set from an iterable. **You must use `set()` if you want to create an empty set**; using `{}` creates an empty *dictionary*:
```python
# using `set` to consume an iterable to construct a set
>>> set(range(4))
{0, 1, 2, 3}

# creating an empty set
>>> set()  # specifying `{}` would create an empty *dictionary*
set()
```

Redundant items are "ignored" when constructing or adding to a set. Thus **constructing a set is a great way to extract the unique items from a collection**: 
```python
# filter repeat-items from a collection by feeding it into a set
>>> x = [1, 2, 1, 2, 1, "moo", "moo"]
>>> set(x)
{1, 2, 'moo'}
```
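One caveat is that a set does not remember the order in which items were first seen. If that order matters, one common alternative (a sketch, assuming Python 3.7 or later, where dictionaries preserve insertion order) is to pass the collection through `dict.fromkeys`:

```python
x = [3, 1, 3, 2, 1, "moo", "moo"]

# a set drops the duplicates but makes no promise about ordering
unique_unordered = set(x)

# dictionary keys are unique and remember insertion order (Python 3.7+),
# so this keeps the first occurrence of each item, in its original position
unique_ordered = list(dict.fromkeys(x))
print(unique_ordered)  # [3, 1, 2, 'moo']
```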

### Set operations
Sets support membership-checking, which is an $\mathcal{O}(1)$ operation, along with iteration. Note that sets are unordered; the order of iteration is arbitrary and should not be relied upon:
```python
# checking membership in a set
>>> 2 in {1, 2, 3}
True

# iterating over a set (the order of iteration is arbitrary)
>>> [i for i in {"a", "b", "c"}]
['b', 'c', 'a']
```
Python also provides the set-theoretic operations of union, intersection, difference, and symmetric difference, along with the relations of set equality and set inclusion. These can be invoked using operator symbols or by calling the corresponding methods on the set.

For an exhaustive list of the methods available to the set, please [refer to the official Python documentation](https://docs.python.org/3/library/stdtypes.html#set-types-set-frozenset).

```python
>>> x = {"a", "b", "c", "d"}
>>> y = {"a", "b", "e"}

# union: items in x or y, or both
>>> x | y  # or x.union(y)
{'a', 'b', 'c', 'd', 'e'}

# intersection: items in both x and y
>>> x & y  # or x.intersection(y)
{'a', 'b'}

# difference: items in x but not in y
>>> x - y  # or x.difference(y)
{'c', 'd'}

# symmetric difference: in x or y, but not in both
>>> x ^ y  # or x.symmetric_difference(y)
{'c', 'd', 'e'}

# check if set_1 is a superset of set_2
>>> {1, 2, 3, 4} >= {1, 2}
True

# check if set_1 and set_2 are equivalent sets
>>> {1, 2, 3, 4} == {1, 2}
False
```
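The operator and method forms are not perfectly interchangeable. The following sketch reuses `x` and `y` from above (all other names are illustrative) to show the subset checks, the `isdisjoint` method, and the fact that the methods accept any iterable while the operators require both operands to be sets:

```python
x = {"a", "b", "c", "d"}
y = {"a", "b", "e"}

# subset checks mirror the superset check shown above
is_subset = {"a", "b"} <= x   # True: every member is also in x
is_proper_subset = x < x      # False: a set is not a *proper* subset of itself

# two sets are disjoint if they have no members in common
no_overlap = x.isdisjoint({"y", "z"})  # True

# the method form accepts any iterable, such as a list
via_method = x.union(["e", "f"])

# the operator form requires both operands to be sets
operator_needs_sets = False
try:
    x | ["e", "f"]
except TypeError:
    operator_needs_sets = True
```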

A set is a *mutable* object - it can be updated after it is created:
```python
# sets are mutable
>>> x.add("dog")
>>> x.remove("a")
>>> x
{'b', 'c', 'd', 'dog'}
```
Because it is mutable, a set cannot be used as a dictionary-key, nor can a set be a member of another set. Python provides an immutable version of the set, `frozenset`, which has all of the methods of a set other than those that mutate the set:
```python
# `frozenset` is an immutable version of a Python set
>>> frozenset(x)
frozenset({'b', 'c', 'd', 'dog'})
```
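As a brief illustration of why this matters, the sketch below uses a `frozenset` as a dictionary key and as a member of another set. The `distances` lookup table is a hypothetical example, not something defined elsewhere in this section:

```python
# a plain set is unhashable, so it cannot serve as a dictionary key
set_key_rejected = False
try:
    {{1, 2}: "pair"}
except TypeError:
    set_key_rejected = True

# a frozenset is hashable, so it can be used as a key;
# here it represents an unordered pair of (hypothetical) locations
distances = {frozenset({"A", "B"}): 5.0}
same_pair = distances[frozenset({"B", "A"})]  # member order is irrelevant

# frozensets can also be members of a set; equal frozensets are stored once
nested = {frozenset({1, 2}), frozenset({2, 1}), frozenset({3})}
print(len(nested))  # 2
```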

<div class="alert alert-block alert-success"> 
**Takeaway**: Python's set is an unordered collection of unique, immutable objects. It is an excellent tool for extracting the unique members from a collection of items. The set provides $\mathcal{O}(1)$ membership-checking along with a suite of set-algebra operations for comparing sets. `frozenset` is an immutable version of the set.
</div>

## The Collections Module
Python provides a number of valuable, optimized data structures in its ["collections" module](https://docs.python.org/3/library/collections.html). We've already encountered its `deque` and `OrderedDict` data structures. It is recommended that the reader take some time to peruse this module. Here, we will briefly show off some of the utility of its data structures.

Refer to the [official documentation](https://docs.python.org/3/library/collections.html) for a complete listing of the functions available to these data structures.

#### Named-Tuple
A named tuple allows you to form a tuple whose members are named. Thus the user can access a member by-name or via index. Otherwise the named tuple behaves just like a typical tuple. This facilitates clean, readable code.
```python
# demonstrate the use of named tuple
>>> from collections import namedtuple

# define a tuple that holds a space-time coordinate
>>> space_time_coord = namedtuple("space_time_coordinate", ['x', 'y', 'z', 't'])

# `r` is a particular space-time coordinate (an instance of our named tuple)
>>> r = space_time_coord(1.5, 2.3, 5.1, 100.2)
>>> r.t  # access the time coordinate "by name"; this is more descriptive than `r[3]`
100.2
```
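To illustrate the claim that a named tuple otherwise behaves like an ordinary tuple, here is a short sketch that re-creates `r` from above. The names `r_later` and `as_dict` are illustrative; `_replace` and `_asdict` are methods that `namedtuple` provides:

```python
from collections import namedtuple

space_time_coord = namedtuple("space_time_coordinate", ["x", "y", "z", "t"])
r = space_time_coord(1.5, 2.3, 5.1, 100.2)

# indexing and unpacking work as they do for any tuple
last_by_index = r[3]
x, y, z, t = r

# like a tuple, a named tuple is immutable: its fields cannot be reassigned
assignment_rejected = False
try:
    r.t = 0.0
except AttributeError:
    assignment_rejected = True

# `_replace` returns a *new* named tuple with the given fields changed
r_later = r._replace(t=200.0)

# `_asdict` converts the named tuple to a field-name -> value mapping
as_dict = r._asdict()
```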

#### Default Dictionary
A default dictionary allows you to specify a function, $f$, that will be called to produce a "default value" for that dictionary. That is, whenever you try to access a key that does not exist in the dictionary, instead of raising `KeyError`, the mapping $key \rightarrow f()$ will be created in the dictionary:

```python
# demonstrate the behavior of the `defaultdict`
>>> from collections import defaultdict

>>> example_default_dict = defaultdict(list)  # will map any missing key to `list()`
>>> example_default_dict  # an empty default dictionary
defaultdict(<class 'list'>, {})

# "apple" is not a key, so the default mapping "apple" -> list() is created
# and this value is returned
>>> example_default_dict["apple"]
[]

# this mapping now exists in the dictionary
>>> example_default_dict
defaultdict(<class 'list'>, {'apple': []})
```

Suppose you want to use a dictionary as a grade book, which maps $name \rightarrow grades$. With a standard dictionary, you have to worry about encountering a student for the first time:
```python
student = "Ryan"
grade = 52  # I failed the test..

# standard dictionary usage
gradebook = {}

# if student isn't in the gradebook, enter that student 
# along with an empty list as the grades
if student not in gradebook:
    gradebook[student] = []

gradebook[student].append(grade)  # append the grade to that student's list of grades
```
The default dictionary's behavior exactly accommodates this initialization process (when providing `list` as the initialization function):

```python
gradebook = defaultdict(list)
gradebook[student].append(grade)
```
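The same pattern extends to other default functions. The sketch below, which uses a made-up list of `records`, builds a grade book with `defaultdict(list)` and a per-student tally with `defaultdict(int)`. It also shows one caveat: indexing with a missing key creates that key, whereas `in` and `get` do not.

```python
from collections import defaultdict

# hypothetical (name, grade) records
records = [("Ryan", 52), ("Megan", 95), ("Ryan", 78)]

# group the grades by student
gradebook = defaultdict(list)
for name, grade in records:
    gradebook[name].append(grade)

# `int()` returns 0, so defaultdict(int) is convenient for tallying
num_tests = defaultdict(int)
for name, _ in records:
    num_tests[name] += 1

# membership checks and `get` do not create new entries...
known_before = "Petar" in gradebook      # False
looked_up = gradebook.get("Petar")       # None

# ...but indexing with a missing key does
empty_grades = gradebook["Petar"]        # creates "Petar" -> []
known_after = "Petar" in gradebook       # True
```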

#### Counter
Python's `Counter` data structure is designed for tallying the unique objects that it encounters. It is essentially a dictionary that maps $obj \rightarrow count$. Suppose you want to study the distribution of words used in a body of text; `Counter` is perfect for this application:
```python
# demonstrate the `Counter` data structure
>>> from collections import Counter

# Note: We will "normalize" our text by making it all lowercase. 
# We will then split the string by its spaces, storing the resulting 
# tokens in a list. For real text, we would also want to remove punctuation
>>> text_1 = "The cat in the hat"
>>> text_1 = text_1.lower().split()
>>> text_1
['the', 'cat', 'in', 'the', 'hat']

>>> word_distr = Counter(text_1)  # tally the unique objects in `text_1`
>>> word_distr
Counter({'the': 2, 'cat': 1, 'in': 1, 'hat': 1})

# feed additional items to the counter by "update"
>>> text_2 = "The apple in the tree"
>>> text_2 = text_2.lower().split()
>>> word_distr.update(text_2)
>>> word_distr
Counter({'the': 4, 'in': 2, 'cat': 1, 'hat': 1, 'apple': 1, 'tree': 1})

# get the top-2 most common words, along with their counts
>>> word_distr.most_common(2)
[('the', 4), ('in', 2)]

# get the count for the word "tree"
>>> word_distr["tree"]
1
```

`Counter` accepts any iterable of hashable objects, which includes Python's built-in immutable objects:
```python
>>> Counter([0, 0, "moo", (None, None), (None, None), (None, None)])
Counter({(None, None): 3, 0: 2, 'moo': 1})
```
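Counters can also be combined with arithmetic-style operators, and looking up an item that was never counted returns `0` rather than raising `KeyError`. A minimal sketch, using the two example sentences from above (the variable names are illustrative):

```python
from collections import Counter

c1 = Counter("the cat in the hat".split())
c2 = Counter("the apple in the tree".split())

# `+` adds the counts from both counters
combined = c1 + c2

# `&` keeps the minimum count of each item found in both counters
shared = c1 & c2

# `-` subtracts counts and discards items whose result is zero or negative
only_first = c1 - c2

# an item that was never counted has a count of 0; no entry is created
missing = c1["dog"]
```

Here `combined["the"]` is `4`, `shared` contains only `'the'` and `'in'`, and `only_first` contains only `'cat'` and `'hat'`.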
Refer to the [official documentation](https://docs.python.org/3/library/collections.html#counter-objects) for a complete listing of the methods that `Counter` provides.