In [None]:
%load_ext tutormagic

# Sets as Linked Lists

To develop our set data abstraction, we'll need to choose a representation. To start, we'll represent a set as a linked list using the `Link` class we've developed.

Note that sets aren't just ordinary linked lists. Sets are linked lists without any repeated elements.

## Sets as Unordered Sequences

**Proposal 1**: A set is represented by a linked list that contains no duplicate items.

In [None]:
def empty(s):
    return s is Link.empty

To find out whether a set is empty, we check whether `s` is `Link.empty`, the value that represents the empty linked list.

We'll also need to define a function that checks whether a value is contained within a set.

In [None]:
def contains(s, v):
    """ Return whether set s contains value v
    >>> s = Link(1, Link(3, Link(2)))
    >>> contains(s, 2)
    True
    """

How would we define such a function?

## Demo

Below we have the `Link` class:

In [2]:
# Linked lists
class Link:
    empty = ()
    
    def __init__(self, first, rest = empty):
        assert rest is Link.empty or isinstance(rest, Link)
        self.first = first
        self.rest = rest
        
    def __repr__(self):
        if self.rest:
            rest_str = ', ' + repr(self.rest)
        else:
            rest_str = ''
        return 'Link({0}{1})'.format(self.first, rest_str)

We also have the `filter_link` function, which returns the elements `e` of `s` for which `f(e)` returns `True`.

In [3]:
def filter_link(f, s):
    """ Return elements e of s for which f(e) is true"""
    if s is Link.empty:
        return s
    else:
        filtered = filter_link(f, s.rest)
        if f(s.first):
            return Link(s.first, filtered)
        else:
            return filtered

And we have `extend_link`, which takes two linked lists `s` and `t` and gives us a linked list containing the elements of `s` followed by those of `t`.

In [4]:
def extend_link(s, t):
    if empty(s):
        return t
    else:
        return Link(s.first, extend_link(s.rest, t))

And below is the definition of sets as unordered sequences.

In [5]:
def empty(s):
    return s is Link.empty

We represent `s` as a linked list, but we don't know anything about the order of its elements. Thus `contains`, which takes a set `s` and a value `v` and tells us whether `v` is in `s`, may have to check every element of `s`.

In [6]:
def contains(s, v):
    """ Return true if set s contains value v as an element.
    
    >>> s = Link(1, Link(3, Link(2)))
    >>> contains(s, 2)
    True
    >>> contains(s, 5)
    False
    """
    # An empty set contains no values, so the answer is False
    if empty(s):
        return False
    # Checks if the 'first' attribute is the value
    elif s.first == v:
        return True
    else:
        return contains(s.rest, v)

Let's check if the function above works!

In [7]:
s = Link(1, Link(3, Link(2)))

In [8]:
contains(s, 2)

True

In [9]:
contains(s, 5)

False

How long do these operations take?

| Function | Time order of growth |
| --- | --- |
| `empty` | $\Theta(1)$ |
| `contains` | Time depends on whether and where `v` appears in `s` <br> but is $\Theta(n)$ on average, assuming `v` either does not appear in `s` <br> **or** appears at a uniformly distributed random location |

For `contains`, $\Theta(n)$ describes the average amount of time that it takes to find a value in a set `s` of length `n` assuming that `v` is either not there or appears in some random location.
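To make the dependence on position concrete, here is a minimal sketch that counts how many elements are compared against `v` during a search. The helper `count_comparisons` is hypothetical (it is not part of the lecture code) and mirrors the logic of `contains` with a loop instead of recursion.

```python
class Link:
    empty = ()

    def __init__(self, first, rest=empty):
        assert rest is Link.empty or isinstance(rest, Link)
        self.first = first
        self.rest = rest

def count_comparisons(s, v):
    """Hypothetical helper: count the elements compared against v while searching s."""
    count = 0
    while s is not Link.empty:
        count += 1
        if s.first == v:
            break  # found v, so the search stops here
        s = s.rest
    return count

s = Link(1, Link(3, Link(2)))
first_hit = count_comparisons(s, 1)  # v is the first element: 1 comparison
last_hit = count_comparisons(s, 2)   # v is the last element: 3 comparisons
miss = count_comparisons(s, 5)       # v is absent: all 3 elements are compared
```

A hit at the front is cheap, but a miss always walks the whole list, which is why the cost grows linearly with `n`.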

## Sets as Unordered Sequences

What other operations might we perform?

In [None]:
def adjoin(s, v):
    # if s already contains v
    if contains(s, v):
        return s # then just return s without doing anything
    else:
        # Otherwise, create a new set with `v` in it
        return Link(v, s)

The time order of growth for `adjoin` is $\Theta(n)$, where `n` is the size of the set, because `contains` has to search through `s` to see whether `v` is already in `s`.
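As a quick check of both branches of `adjoin`, the sketch below repeats the definitions from this section so that it runs on its own. The `to_list` helper is hypothetical and is only used to inspect the result.

```python
class Link:
    empty = ()

    def __init__(self, first, rest=empty):
        assert rest is Link.empty or isinstance(rest, Link)
        self.first = first
        self.rest = rest

def empty(s):
    return s is Link.empty

def contains(s, v):
    if empty(s):
        return False
    elif s.first == v:
        return True
    else:
        return contains(s.rest, v)

def adjoin(s, v):
    if contains(s, v):
        return s
    else:
        return Link(v, s)

def to_list(s):
    """Hypothetical helper: collect the elements of a linked list in a Python list."""
    result = []
    while s is not Link.empty:
        result.append(s.first)
        s = s.rest
    return result

s = Link(1, Link(3, Link(2)))
same = adjoin(s, 3)    # 3 is already in s, so s itself is returned
bigger = adjoin(s, 5)  # 5 is new, so it is linked in front of s
```

Here `same is s` holds, while `bigger` is a new `Link` whose `rest` is the original `s`, so the original set is left unchanged.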

In [None]:
def intersect(set1, set2):
    # in_set2 is a function that takes a value v and checks whether set2 contains v
    in_set2 = lambda v: contains(set2, v)
    # filter set1 for all the elements that are also in set2
    return filter_link(in_set2, set1)

`filter_link(in_set2, set1)` returns the elements `x` of `set1` for which `in_set2(x)` returns `True`. `intersect` calls `contains` on `set2` once for every element of `set1`, so if both sets have size `n`, its order of growth is $\Theta(n^2)$. If the sizes differ (e.g. `set1` has `m` elements and `set2` has `n` elements), then the order of growth is $\Theta(m \times n)$.
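A small usage sketch of `intersect`, with the section's definitions repeated so that it runs on its own. The two example sets and the `to_list` helper are assumptions made for illustration.

```python
class Link:
    empty = ()

    def __init__(self, first, rest=empty):
        assert rest is Link.empty or isinstance(rest, Link)
        self.first = first
        self.rest = rest

def contains(s, v):
    if s is Link.empty:
        return False
    elif s.first == v:
        return True
    else:
        return contains(s.rest, v)

def filter_link(f, s):
    if s is Link.empty:
        return s
    filtered = filter_link(f, s.rest)
    if f(s.first):
        return Link(s.first, filtered)
    return filtered

def intersect(set1, set2):
    in_set2 = lambda v: contains(set2, v)
    return filter_link(in_set2, set1)

def to_list(s):
    """Hypothetical helper: collect the elements of a linked list in a Python list."""
    result = []
    while s is not Link.empty:
        result.append(s.first)
        s = s.rest
    return result

s = Link(1, Link(3, Link(2)))
t = Link(2, Link(4, Link(3)))
common = intersect(s, t)  # keeps the elements of s that also appear in t
```

The result holds 3 and 2, in the order in which they appear in `s`, since `filter_link` preserves the order of its input.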

In [1]:
def union(set1, set2):
    # find all the elements that are in set1 but not in set2
    not_in_set2 = lambda v: not contains(set2, v)
    set1_not_set2 = filter_link(not_in_set2, set1)
    # add them to whatever's already in set2
    return extend_link(set1_not_set2, set2)

Above, we don't mutate `set1` or `set2` at all. Instead, we return a linked list containing all elements in `set1_not_set2` followed by all elements in `set2`.
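The following sketch checks that `union` leaves its inputs untouched. It repeats the section's definitions so that it runs on its own; the example sets and the `to_list` helper are assumptions made for illustration.

```python
class Link:
    empty = ()

    def __init__(self, first, rest=empty):
        assert rest is Link.empty or isinstance(rest, Link)
        self.first = first
        self.rest = rest

def contains(s, v):
    if s is Link.empty:
        return False
    elif s.first == v:
        return True
    else:
        return contains(s.rest, v)

def filter_link(f, s):
    if s is Link.empty:
        return s
    filtered = filter_link(f, s.rest)
    if f(s.first):
        return Link(s.first, filtered)
    return filtered

def extend_link(s, t):
    if s is Link.empty:
        return t
    return Link(s.first, extend_link(s.rest, t))

def union(set1, set2):
    not_in_set2 = lambda v: not contains(set2, v)
    set1_not_set2 = filter_link(not_in_set2, set1)
    return extend_link(set1_not_set2, set2)

def to_list(s):
    """Hypothetical helper: collect the elements of a linked list in a Python list."""
    result = []
    while s is not Link.empty:
        result.append(s.first)
        s = s.rest
    return result

s = Link(1, Link(3, Link(2)))
t = Link(2, Link(4, Link(3)))
both = union(s, t)  # elements of s missing from t, followed by all of t
```

Only 1 is in `s` but not in `t`, so `both` is a new `Link` holding 1 whose `rest` is `t` itself. The result shares structure with `set2` rather than copying it, and neither input is modified.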

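The `union` operation has an order of growth of $\Theta(n^2)$, which comes from the following:
1. Use of `filter_link`, which calls `contains` on `set2` for every element of `set1`: $\Theta(n^2)$
2. Use of `extend_link`, which walks through `set1_not_set2` once: $\Theta(n)$

Adding the two steps, we might think that the order of growth is $\Theta(n^2 + n)$. Since the $n^2$ term dominates as $n$ grows, we treat this the same as $\Theta(n^2)$.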

| Function | Time order of growth | 
| ---- | --- |
| `adjoin` | $\Theta(n)$ |
| `intersect` | $\Theta(n^2)$ |
| `union` | $\Theta(n^2)$ |