# ADTs

Data structures can be viewed two ways.

1. In terms of their _concrete implementations_: for example, a fixed-size array, dynamic array, or linked list. These structures are defined by their representation in memory and the algorithms that implement their operations.
2. In terms of what _abstract operations_ they support: for example, a stack might support `push`, `pop`, and `peek`.

When we take the latter view, we are considering the data structure as an **abstract data type (ADT)**.

An ADT can have many potential implementations. For example, a stack can be implemented using arrays, linked lists, or even (in a contrived fashion) using dicts or numerous other underlying representations.

The asymptotic (big-O) performance of an ADT's operations depends, of course, on the underlying representation and on how that representation is used to implement the ADT. Consider a stack implemented in two ways using a Python list:

(A) Using the _front_ of the list as the "top" of the stack:

* `push`: using Python list `insert` at index 0, takes O(N) time (we must shift all elements right)
* `pop`: using Python `del` operator at index 0, takes O(N) time (we must shift remaining elements left)
* `peek`: using Python indexing operator at index 0, takes O(1) time

(B) Using the _back_ of the list as the "top" of the stack:

* `push`: using Python list `append`, takes O(1) amortized time
* `pop`: using Python list `pop`, takes O(1) amortized time
* `peek`: using Python indexing operator at index -1, takes O(1) time

Clearly, it's better to use the back as the top.
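
To make the comparison concrete, here is a minimal sketch of both options applied directly to plain Python lists. The variable names are illustrative only; the complexity notes in the comments restate the costs listed above.

```python
# Option A: the front of the list is the top of the stack.
front_stack = []
front_stack.insert(0, "a")   # push: O(N), shifts every element right
front_stack.insert(0, "b")
top_a = front_stack[0]       # peek: O(1)
del front_stack[0]           # pop: O(N), shifts the remaining elements left

# Option B: the back of the list is the top of the stack.
back_stack = []
back_stack.append("a")       # push: amortized O(1)
back_stack.append("b")
top_b = back_stack[-1]       # peek: O(1), note the index is -1
popped = back_stack.pop()    # pop: O(1)

print(top_a, front_stack)          # b ['a']
print(top_b, popped, back_stack)   # b b ['a']
```

Both options behave identically from the outside; only the cost of each operation differs.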

Notice that although the _stack ADT_ operations can be phrased in terms of operations on the _Python list representation_, the two concepts are distinct; the stack is a facade or wrapper around the list. Code that uses a stack (the ADT's "client") should concern itself with the stack abstraction only, and have no access to the underlying list representation. In other words, the only code that should know about an ADT implementation's representation is the implementation itself.

"Protecting" the representation of an abstraction from inappropriate access is called **encapsulation**. Different programming languages have different ways of encapsulating abstractions. In Python, the most common way is to define a class which keeps the representation "hidden" as one or more private fields of self. You can mark a field private by starting the name with one or two underscores:

```python
class MyStack1:
    def __init__(self):
        self._rep = ...
```

or

```python
class MyStack2:
    def __init__(self):
        self.__rep = ...
```

Note that the name "rep" is not special here; it is just a placeholder.
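
Putting the pieces together, the following is a minimal sketch of an encapsulated stack that uses the back of a private list as its top. The class name `ListStack` and the choice to raise `IndexError` on an empty stack are assumptions made for illustration, not part of the specification above.

```python
class ListStack:
    """A stack whose top is the back of a private Python list."""

    def __init__(self):
        self._rep = []  # private representation; clients should not touch it

    def push(self, value):
        self._rep.append(value)

    def pop(self):
        if not self._rep:
            raise IndexError("pop from empty stack")
        return self._rep.pop()

    def peek(self):
        if not self._rep:
            raise IndexError("peek at empty stack")
        return self._rep[-1]


s = ListStack()
s.push(1)
s.push(2)
print(s.peek())  # 2
print(s.pop())   # 2
print(s.pop())   # 1
```

Client code only ever calls `push`, `pop`, and `peek`, so the list could later be swapped for another representation without clients noticing.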

> Aside: Some nitty-gritty Python details:
>
> * Using one leading underscore for private fields is just a convention that has no special meaning to the Python interpreter, but signals to other programmers that client code should not touch these fields.
> * Using two leading underscores instructs Python to "mangle" the name of this field so that it is hard to reference from outside of the class's members (but still possible).
>
> In either case, it is _possible_ for client code to access these fields, but only very bad programmers will access the private implementation of a Python class from outside that class (outside of certain situations like debugging or metaprogramming).
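
The difference between the two conventions can be seen in a short experiment. This is only a sketch: it fills in the `...` placeholders from the classes above with an empty list, which is an assumption made so that the code runs.

```python
class MyStack1:
    def __init__(self):
        self._rep = []   # single underscore: private by convention only


class MyStack2:
    def __init__(self):
        self.__rep = []  # double underscore: stored as _MyStack2__rep


s1 = MyStack1()
s2 = MyStack2()

print(s1._rep)               # [] (works, but breaks the convention)
print(hasattr(s2, "__rep"))  # False: the unmangled name does not exist
print(s2._MyStack2__rep)     # [] (the mangled name is still reachable)
```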

Failing to encapsulate abstractions, or to respect abstraction boundaries, is **evil** --- **exposing** your **representation** has several bad effects. The primary one is that it _tightly couples_ your client code to your implementation code. If you decide to change something about how your implementation works, such as changing a private variable name, you may accidentally break clients. Conversely, if client code reaches into the implementation and mutates something, it may break some invariant that the implementation was depending on, and this defect can be hard to track down.

Don't do it!

Exercises to test your understanding:

* How would you implement the three stack ADT operations using a singly linked list?
* How would you implement a stack using a dict and a counter variable?

Answers at the end of this notebook.

### Specifying an ADT 

When you specify an ADT, you should **never** have anything about the representation leaking into your specification.

A **specification** is a **collection of procedural abstractions**, not a collection of procedures. In other words, the specification describes _the externally observable behavior_ of the abstraction, not how the code implementing the abstraction works.

Typically, an ADT specifies interfaces (APIs) to

- construct the ADT
- add, manipulate, and access values in the ADT
- destroy the ADT, cleaning up its resources if necessary (note that since Python is a garbage collected language, often it does not require an explicit operation to destroy the ADT)

How should we write down the ADT specification?

One way is the way we've used so far: to write it down in prose. 

For example, we could just say that a stack is a sequence supporting insertion and removal at one end. But this is fairly vague. Given this description, we have very little guidance on how to implement this abstraction (for example, what should the method names be called?), and equally little on how to use it.

If we wanted greater precision, we could enumerate the methods, and informally specify what each method should do. Often this is what we do in documenting APIs for clients. If you read the [Python standard library documentation](https://docs.python.org/3/library/), this is largely the method used to describe its APIs.

Another method is to specify our interfaces using code. Python provides a number of ways to do this.

One way is to (mis?)use inheritance by creating an **abstract base class** which specifies the interface.  Implementations would inherit directly from this. For example, we could define a `Stack` as follows:

```python
class Stack:

    def push(self, value):
        """push should accept a value, and add it to the top of the stack"""
        raise NotImplementedError

    def pop(self):
        """pop should remove the top value from the stack and return it"""
        raise NotImplementedError

    def peek(self):
        """peek should return the top element in the stack"""
        raise NotImplementedError
```

and our implementations could inherit from it:

```python
class MyStack(Stack):

    def push(self, value):
        ...  # real implementation code here

    def pop(self):
        ...  # real implementation code here

    def peek(self):
        ...  # real implementation code here
```

The `Stack` class here serves two purposes.  The first is simply documentation: it is a place for the programmer to state precisely what a stack should do, including what methods it should provide, how many arguments each method takes, etc. The second is to provide the method implementations which raise `NotImplementedError`. The implementor should override all these methods in subclasses; if some method is not overridden in a subclass, then a client which uses that subclass will receive this raised exception when they call the method. `raise NotImplementedError` is an idiom in Python signaling to the client that the implementor of the class forgot to implement a required method.

This technique has a downside, however. Suppose that the programmer forgets to implement `peek` in a Stack implementation:

```python
class BrokenStack(Stack):
    
    def push(self, value):
        ...  # real implementation code here
    
    def pop(self):
        ...  # real implementation code here
```

This error will only be discovered when a client _actually executes_ a line of code that calls peek:

```python
s = BrokenStack()
...
# Much later, after running your analysis for 8 hours overnight,
# the following line raises NotImplementedError, killing your job.
# Oh well, hope you didn't have a deadline!
v = s.peek()
```

**Interfaces** (sometimes called **protocols**) are important enough that many languages have explicit constructs for describing them. For example, Java has an `interface` declaration; C++ and many other languages have interface-like constructs as well. In some languages, we can verify that an interface is completely implemented _before_ we run the line of code that invokes an unimplemented method.

Python has several idiomatic ways to describe interfaces. The most common, as we've already described, is simply English prose in doc comments, possibly accompanied with an abstract base class declaration of the type shown above. But it also has more advanced support for defining interfaces using abstract base classes in the `abc` package, which we'll describe in the next section.

### Abstract Base Classes using abc.ABC

As a preface to this section, it is important to distinguish between the _concept_ of an abstract base class, and the Python `abc.ABC` construct.

We've already described abstract base classes above. `Stack` is an abstract base class, because it is not meant to be instantiated directly (you would never want to write `s = Stack()`). _It exists only to be inherited from_ --- in particular, by implementations of the stack abstraction. Abstract base classes exist in almost all object-oriented programming languages, and they are implemented in a variety of ways.

By contrast, Python `abc.ABC` classes are one mechanism that Python offers to define and enforce interfaces. It is somewhat unfortunate that this mechanism was called `abc.ABC` instead of `interface.Interface`.

With that out of the way, let's get down to nuts and bolts.

Notice that every class has an implicit interface defined by its publicly accessible attributes (methods or data). This also includes dunder methods; while clients rarely call these directly, they control how instances behave with built-in functions like `len` and `iter`.

Normally, a class's implicit interface is simply all there is. When you pass one of its instances to a function, that function uses some subset of that interface, and that is the only checking that is ever performed on it. This style of programming is called "duck typing": if an object quacks like a duck and walks like a duck, it might as well be a duck.  Similarly, if your function needs an object that has `quack` and `walk` methods, you don't care what object gets passed to you; you just call `quack` and `walk` and you're done.
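
As a small illustration of duck typing, the sketch below passes two unrelated classes to the same function. The classes `Duck` and `Robot` and the function `exercise` are hypothetical names invented for this example.

```python
class Duck:
    def quack(self):
        return "Quack!"

    def walk(self):
        return "waddle"


class Robot:
    def quack(self):
        return "BEEP-quack"

    def walk(self):
        return "roll"


def exercise(animal):
    # No type check: any object with quack() and walk() is acceptable.
    return animal.quack(), animal.walk()


print(exercise(Duck()))   # ('Quack!', 'waddle')
print(exercise(Robot()))  # ('BEEP-quack', 'roll')
```

An object lacking one of the two methods would only be rejected, with an `AttributeError`, at the moment the missing method is called.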

However, as the `NotImplementedError` in `peek` example from the previous section shows, sometimes we would like to know whether an object implements all of the methods we expect _before_ we actually run the code which calls those methods. In other words, we would like to check up-front whether the class's implicit interface conforms to some interface that we define elsewhere.

We can use Python's `abc` package to define such interfaces, and to check whether classes implement them completely. We'll also "mix-in" `abc` classes into an implementation class to implement its behavior.
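
To preview what this buys us, here is a sketch that re-declares the earlier `Stack` interface with `abc.ABC` and `abc.abstractmethod`. This re-declaration is an assumption made for illustration, not the notebook's own definition. With it, the `BrokenStack` mistake is reported when the object is created rather than when `peek` is finally called.

```python
import abc


class Stack(abc.ABC):
    @abc.abstractmethod
    def push(self, value):
        """Add value to the top of the stack."""

    @abc.abstractmethod
    def pop(self):
        """Remove and return the top value."""

    @abc.abstractmethod
    def peek(self):
        """Return the top value without removing it."""


class BrokenStack(Stack):
    # peek is deliberately missing
    def push(self, value):
        pass

    def pop(self):
        pass


try:
    BrokenStack()  # fails here, before any method is called
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

The exact wording of the `TypeError` message varies between Python versions, but it names the missing method.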

How do these work? We'll start with a short example:

In [7]:
class Answer:
    def __len__(self): 
        return 42
from collections import abc
isinstance(Answer(), abc.Sized), issubclass(Answer, abc.Sized)

(True, True)

Notice that `Answer` doesn't inherit from anything explicitly. But how, then, is it possible that

```python
issubclass(Answer, abc.Sized)
```

is true, when `Answer` clearly does not inherit from `abc.Sized`?

From [the `abc` package source](https://hg.python.org/cpython/file/3.5/Lib/_collections_abc.py#l300) we can see that `abc.Sized` has the following definition:

```python
class Sized(metaclass=ABCMeta):

    ...  # some irrelevant code omitted

    @classmethod
    def __subclasshook__(cls, C):
        if cls is Sized:
            if any("__len__" in B.__dict__ for B in C.__mro__):
                return True
        return NotImplemented
```

Without going into too much detail about what Python's doing behind the scenes (yet), this definition tells Python the following:

* "When you're evaluating `issubclass(C, Sized)`, return true if the class `C` defines or inherits an implementation of the `__len__` method."
* "When you're evaluating `isinstance(x, Sized)`, return true if the class of `x` defines or inherits an implementation of the `__len__` method."
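
The same hook can be written for an interface of our own. The sketch below mirrors the `Sized` definition for a hypothetical `Quacker` interface; the names `Quacker`, `Duck`, and `Rock` are invented for this example.

```python
import abc


class Quacker(abc.ABC):
    """Hypothetical interface: anything that defines a quack method."""

    @classmethod
    def __subclasshook__(cls, C):
        if cls is Quacker:
            # Look for quack anywhere in the candidate's inheritance chain.
            if any("quack" in B.__dict__ for B in C.__mro__):
                return True
        return NotImplemented  # fall back to the normal subclass check


class Duck:
    def quack(self):
        return "Quack!"


class Rock:
    pass


print(issubclass(Duck, Quacker), isinstance(Duck(), Quacker))  # True True
print(issubclass(Rock, Quacker), isinstance(Rock(), Quacker))  # False False
```

Neither `Duck` nor `Rock` inherits from `Quacker`; the answer comes entirely from the hook.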

We'll take up ABCs in more detail next time (including how they work), but before you rush out to use them, notice that they involve `isinstance` checks. As Fluent Python puts it:

>However, even with ABCs, you should beware that excessive use of isinstance checks may be a code smell—a symptom of bad OO design. It’s usually not OK to have a chain of if/elif/elif with isinstance checks performing different actions depending on the type of an object: you should be using polymorphism for that—i.e., designing your classes so that the interpreter dispatches calls to the proper methods

>On the other hand, it’s usually OK to perform an isinstance check against an ABC if you must enforce an API contract: “Dude, you have to implement this if you want to call me,” as technical reviewer Lennart Regebro put it. That’s particularly useful in systems that have a plug-in architecture. Outside of frameworks, duck typing is often simpler and more flexible than type checks.

>ABCs are meant to encapsulate very general concepts, abstractions, introduced by a framework—things like “a sequence” and “an exact number.” [Readers] most likely don’t need to write any new ABCs, just use existing ones correctly, to get 99.9% of the benefits without serious risk of misdesign.
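
The "enforce an API contract" case from the quotation might look like the following sketch, which checks against the existing `Sized` ABC. The function name `describe_size` and its message format are hypothetical.

```python
from collections.abc import Sized


def describe_size(container):
    # Enforce the contract up front instead of failing somewhere deeper.
    if not isinstance(container, Sized):
        raise TypeError("container must implement __len__")
    return "{} item(s)".format(len(container))


print(describe_size([1, 2, 3]))  # 3 item(s)
print(describe_size("ab"))       # 2 item(s)
```

A single check against one ABC at the boundary is quite different from the if/elif chains of type tests that the quotation warns about.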

### The `SimpleSet` ADT

Here we define a very simple ADT: a set with a few basic operations. We will use the ABC mechanism as a way of **documenting** our specification and getting verification for free.

In [8]:
import abc
class SimpleSetInterface(abc.ABC):
    
    @abc.abstractmethod
    def __len__(self):
        "A SimpleSet has a length"
        
    @abc.abstractmethod
    def __iter__(self):
        "iteration. order is not guaranteed"
    
    @abc.abstractmethod
    def __contains__(self, item)->bool:
        "A test for whether item is in set"
        
    @abc.abstractmethod
    def add(self, item)->None:
        "add item to set"
        
    @abc.abstractmethod
    def rem(self, item)->None:
        "delete item from set"
        
    @abc.abstractmethod
    def union(self, other):
        "union with another set"
        
    @abc.abstractmethod
    def intersection(self, other):
        "intersection with another set"

Notice that we cannot instantiate `SimpleSetInterface` directly.

In [6]:
a = SimpleSetInterface()

TypeError: Can't instantiate abstract class SimpleSetInterface with abstract methods __contains__, __iter__, __len__, add, intersection, rem, union

### Implementation

An ADT is implemented by a class. To write one, we need to

- first choose a representation, the `rep`
- implement the procedural abstractions of the abstract class in terms of this `rep`

The representation ought to make the most frequently used operations fast. But we might not know which operations those are at the outset; in that case, the abstraction allows us to change representations later.

#### Set representation with list

Let's first implement a set simply as a list. We do so below, requiring some gymnastics to make sure that there are no duplicates when we use the "implemented abstract operations".

In [352]:
import reprlib
class SimpleSet1:
    """
    >>> A=SimpleSet1([1,2,3,1])
    >>> B=SimpleSet1([2,3,4,4,5])
    >>> sorted(list(A))
    [1, 2, 3]
    >>> sorted(list(A.union(B)))
    [1, 2, 3, 4, 5]
    >>> sorted(list(A.intersection(B)))
    [2, 3]
    >>> A.rem(1)
    >>> sorted(list(A))
    [2, 3]
    """
    def __init__(self, container=[]):
        if container:
            self._storage = list(container)
        else:
            self._storage = []
        
    def __contains__(self, item):
        if item in self._storage:
            return True
        else:
            return False
        
    def __len__(self):
        counter = 0
        slist=[]
        for ele in self._storage:
            if ele not in slist:
                slist.append(ele)
                counter += 1
        return counter
    
    def __iter__(self):
        slist=[]
        for ele in self._storage:
            if ele not in slist:
                slist.append(ele)
                yield ele
                
    def add(self, item):
        self._storage.append(item)
        
    def rem(self, item): #this is wrong
        index = self._storage.index(item)
        del self._storage[index]
        
    def union(self, other):  # reads other's private _storage (same-class access)
        return SimpleSet1(self._storage + other._storage)
    
    def intersection(self, other):  # also reads other._storage; acceptable, but document it
        intlist = filter(lambda x : x in other._storage, self._storage)
        return SimpleSet1(intlist)
    
    def __repr__(self):
        slist=[]
        for ele in self._storage:
            if ele not in slist:
                slist.append(ele)
        return reprlib.repr(slist).replace('[','{').replace(']','}')
    

In [371]:
SimpleSetInterface.register(SimpleSet1)

__main__.SimpleSet1

In [372]:
C=SimpleSet1([1,2,3,1])

In [374]:
isinstance(C, SimpleSetInterface), issubclass(SimpleSet1, SimpleSetInterface)

(True, True)
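
One caveat is worth spelling out: `register` records a class as a "virtual subclass" without checking that it implements anything. The sketch below uses a hypothetical two-method interface, `SizedAndIterable`, standing in for `SimpleSetInterface`, to contrast registration with real inheritance.

```python
import abc


class SizedAndIterable(abc.ABC):
    """Hypothetical two-method interface, standing in for SimpleSetInterface."""

    @abc.abstractmethod
    def __len__(self):
        ...

    @abc.abstractmethod
    def __iter__(self):
        ...


class Empty:
    """Implements none of the interface."""


SizedAndIterable.register(Empty)  # accepted without any checking
registered_ok = isinstance(Empty(), SizedAndIterable)


class Incomplete(SizedAndIterable):  # real inheritance, __iter__ missing
    def __len__(self):
        return 0


try:
    Incomplete()
    inherit_error = None
except TypeError as exc:
    inherit_error = str(exc)

print(registered_ok)  # True, even though Empty has no methods at all
print(inherit_error)
```

So the `(True, True)` result for a registered class is a declaration by the programmer, not a verification by Python; inheriting from the interface is what triggers the abstract-method check at instantiation.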

In [354]:
C #this is NOT part of the set interface

{1, 2, 3}

In [355]:
from doctest import run_docstring_examples as dtest
dtest(SimpleSet1, globals(), verbose=True)

Finding tests in NoName
Trying:
    A=SimpleSet1([1,2,3,1])
Expecting nothing
ok
Trying:
    B=SimpleSet1([2,3,4,4,5])
Expecting nothing
ok
Trying:
    sorted(list(A))
Expecting:
    [1, 2, 3]
ok
Trying:
    sorted(list(A.union(B)))
Expecting:
    [1, 2, 3, 4, 5]
ok
Trying:
    sorted(list(A.intersection(B)))
Expecting:
    [2, 3]
ok
Trying:
    A.rem(1)
Expecting nothing
ok
Trying:
    sorted(list(A))
Expecting:
    [2, 3]
**********************************************************************
File "__main__", line ?, in NoName
Failed example:
    sorted(list(A))
Expected:
    [2, 3]
Got:
    [1, 2, 3]


The tests tell us there is something wrong in our implementation. Sure enough, `list.index` returns the position of only the first match, so `rem` deletes a single occurrence and any duplicates stay behind. Let's fix it.
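
The failure can be reproduced on a plain list, outside the class. This is a minimal sketch with illustrative variable names; the list comprehension at the end is just one possible way to drop every match, not necessarily the approach used in the notebook.

```python
storage = [1, 2, 3, 1]
index = storage.index(1)  # position of the FIRST occurrence only
del storage[index]
after_first = list(storage)
print(after_first)        # [2, 3, 1]: a 1 is still present

storage = [1, 2, 3, 1]
after_all = [v for v in storage if v != 1]  # keep everything except the 1s
print(after_all)          # [2, 3]
```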

In [368]:
class SimpleSet1:
    """
    A simple set implementation that has some basic functionality.
    Implements SimpleSetInterface.
    
    AbsFun: the list [a,b,...,z] represents the
    smallest set containing all the elements a,b,...,z.
    The list may contain duplicates.
    [] represents the empty set.
    
    >>> A=SimpleSet1([1,2,3,1])
    >>> B=SimpleSet1([2,3,4,4,5])
    >>> sorted(list(A))
    [1, 2, 3]
    >>> sorted(list(A.union(B)))
    [1, 2, 3, 4, 5]
    >>> sorted(list(A.intersection(B)))
    [2, 3]
    >>> A.rem(1)
    >>> sorted(list(A))
    [2, 3]
    """
    def __init__(self, container=[]):
        if container:
            self._storage = list(container)
        else:
            self._storage = []
        
    def __contains__(self, item):
        if item in self._storage:
            return True
        else:
            return False
        
    def __len__(self):
        counter = 0
        slist=[]
        for ele in self._storage:
            if ele not in slist:
                slist.append(ele)
                counter += 1
        return counter
    
    def __iter__(self):
        slist=[]
        for ele in self._storage:
            if ele not in slist:
                slist.append(ele)
                yield ele
                
    def add(self, item):
        self._storage.append(item)
        
    def rem(self, item):
        indices_to_delete=[]
        for i, v in enumerate(self._storage):
            if v==item:
                indices_to_delete.append(i)
        for i in sorted(indices_to_delete, reverse=True):
            del self._storage[i]
        
    def union(self, other):  # reads other's private _storage (same-class access)
        return SimpleSet1(self._storage + other._storage)
    
    def intersection(self, other):  # also reads other._storage; acceptable, but document it
        intlist = filter(lambda x : x in other._storage, self._storage)
        return SimpleSet1(intlist)
    
    def __repr__(self):
        slist=[]
        for ele in self._storage:
            if ele not in slist:
                slist.append(ele)
        return reprlib.repr(slist).replace('[','{').replace(']','}')
    

OK! Now we test again.

In [369]:
from doctest import run_docstring_examples as dtest
dtest(SimpleSet1, globals(), verbose=True)

Finding tests in NoName
Trying:
    A=SimpleSet1([1,2,3,1])
Expecting nothing
ok
Trying:
    B=SimpleSet1([2,3,4,4,5])
Expecting nothing
ok
Trying:
    sorted(list(A))
Expecting:
    [1, 2, 3]
ok
Trying:
    sorted(list(A.union(B)))
Expecting:
    [1, 2, 3, 4, 5]
ok
Trying:
    sorted(list(A.intersection(B)))
Expecting:
    [2, 3]
ok
Trying:
    A.rem(1)
Expecting nothing
ok
Trying:
    sorted(list(A))
Expecting:
    [2, 3]
ok


We passed the tests. Yay!

Notice that we added something strange to the documentation. It reads like this:

```
AbsFun: the list [a,b,...,z] represents the
    smallest set containing all the elements a,b,...,z.
    The list may contain duplicates.
    [] represents the empty set.
```


### The Abstract Function

The **Abstract Function** (often called the _abstraction function_) tells us the meaning of our representation. It maps the concrete representation (here a list) to the abstract value (a set). It helps us, the implementors, reason from the client perspective.

What is the client perspective? The client should NOT be able to distinguish implementations based on their functional behavior. Here we have a list with repeated values giving us a set with unique ones. The client should not know this. But the implementer knows that information is lost in going from the list to the set, and this loss of information is described by the **Abstract Function**.

![](http://www.cs.cornell.edu/courses/cs3110/2011sp/lectures/lec08-absfun-repinv/images/abst-fcn2.gif)

(Diagram from Cornell CS 3110.)

Note that several lists may map to the same set, i.e., this function is many-to-one. Additionally, some values in the domain may not map to any value in the range (not true here; we'll see an example soon).
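
The many-to-one nature of the mapping can be sketched by writing the abstraction function as an actual Python function. The name `abs_fun` is hypothetical, and the use of `frozenset` assumes the elements are hashable, which the list-based implementation itself does not require.

```python
def abs_fun(rep):
    """Hypothetical abstraction function: map a list rep to the set it denotes."""
    return frozenset(rep)


rep_a = [1, 2, 3, 1]  # duplicates, one ordering
rep_b = [3, 2, 1]     # no duplicates, another ordering

print(abs_fun(rep_a) == abs_fun(rep_b))  # True: two reps, one abstract value
print(abs_fun([]) == frozenset())        # True: [] represents the empty set
```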

### Refactoring our Implementation

Something about our implementation does not sit well. It seems unnecessarily loose and brittle; witness the mistake we made in `rem`. There does not seem to be any way except for the *Abstraction Function* to formally reason about what the lists contain. Indeed, perhaps the only way we might have caught the deletion bug formally would have been to impose a post-condition on the deletion: that ALL values in the list matching the requested item were removed.

Now that we have our tests, we can confidently refactor our implementation to one in which the list has no duplicates. Notice that our abstraction function has changed somewhat, as it no longer needs to go through contortions to account for possible duplicates.

In [407]:
class SimpleSet2:
    """
    AbsFun: the list [a,b,...,z] represents the
    set  a,b,...,z.
    [] represents the empty set.
    
    Examples:
    
    >>> A=SimpleSet2([1,2,3,1])
    >>> B=SimpleSet2([2,3,4,4,5])
    >>> sorted(list(A))
    [1, 2, 3]
    >>> sorted(list(A.union(B)))
    [1, 2, 3, 4, 5]
    >>> sorted(list(A.intersection(B)))
    [2, 3]
    >>> A.rem(1)
    >>> sorted(list(A))
    [2, 3]
    >>> C=SimpleSet2()
    >>> C
    {}
    >>> sorted(list(C.union(A)))
    [2, 3]
    >>> sorted(list(C.intersection(A)))
    []
    """
    def __init__(self, container=[]):
        if container:
            self._storage=[]
            for ele in container:
                self.add(ele)
        else:
            self._storage = []
        
    def __contains__(self, item):
        if item in self._storage:
            return True
        else:
            return False
        
    def __len__(self):
        return len(self._storage)
    
    def __iter__(self):
        for ele in self._storage:
            yield ele
            
    def add(self, item):#this one is wrong
        self._storage.append(item)
        
    def rem(self, item): #this is now right
        index = self._storage.index(item)
        del self._storage[index]
        
    def union(self, other):
        return SimpleSet2(self._storage + other._storage)
    
    def intersection(self, other):
        intlist = list(filter(lambda x : x in other._storage, self._storage))
        return SimpleSet2(intlist)
    
    def __repr__(self):
        return reprlib.repr(self._storage).replace('[','{').replace(']','}')
    

In [408]:
dtest(SimpleSet2, globals(), verbose=True)

Finding tests in NoName
Trying:
    A=SimpleSet2([1,2,3,1])
Expecting nothing
ok
Trying:
    B=SimpleSet2([2,3,4,4,5])
Expecting nothing
ok
Trying:
    sorted(list(A))
Expecting:
    [1, 2, 3]
**********************************************************************
File "__main__", line ?, in NoName
Failed example:
    sorted(list(A))
Expected:
    [1, 2, 3]
Got:
    [1, 1, 2, 3]
Trying:
    sorted(list(A.union(B)))
Expecting:
    [1, 2, 3, 4, 5]
**********************************************************************
File "__main__", line ?, in NoName
Failed example:
    sorted(list(A.union(B)))
Expected:
    [1, 2, 3, 4, 5]
Got:
    [1, 1, 2, 2, 3, 3, 4, 4, 5]
Trying:
    sorted(list(A.intersection(B)))
Expecting:
    [2, 3]
ok
Trying:
    A.rem(1)
Expecting nothing
ok
Trying:
    sorted(list(A))
Expecting:
    [2, 3]
**********************************************************************
File "__main__", line ?, in NoName
Failed example:
    sorted(list(A))
Expected:
    [2, 3]
Got:
    

Construction and union failed: both produced duplicates (the intersection test happened to pass). The problem is clearly in `add`, which violates our idea of the representation: that the list must hold unique values.

Indeed, look at our implementation of `__len__`: there is no uniqueness check any more. How do we know that it does not need one? Since the code does not say "no duplicates" anywhere, an implementer needs to go digging to figure this out, and won't be able to reason locally about whether `__len__` is implemented correctly or not.

Thus the constraint on our representation (implementation) needs to be clearly communicated, and further used for testing! Such a constraint is called a **representation invariant**.

### Representation Invariant (RI)

The representation invariant tells us what MUST NOT CHANGE across the methods of the concrete implementation. The fact that the list has no duplicates must be respected by all concrete operations. In other words, it captures whatever we must do and maintain on the underlying data structure to keep our external interface correct.

The abstraction function tells us what information is lost on the way to our users. Its domain consists of all possible lists. Remember we said that some lists might not map, via the AbsFun, to any interface value? Which ones won't? The representation invariant tells us. In other words, the RI tells us which concrete data are valid representations of abstract data.

The nature of the RI can be captured now in this diagram:

![](http://www.cs.cornell.edu/courses/cs3110/2011sp/lectures/lec08-absfun-repinv/images/ri-af.png)

(Diagram from Cornell CS 3110.)

OK, so let's add that to our documentation. We will also define a function `repOK` whose job is to check that all our operations obey this representation invariant.

In [409]:
def repOK(inlist):
    testlist=[]
    for item in inlist:
        if item not in testlist:
            testlist.append(item)
    assert len(testlist)==len(inlist), "there are duplicates {} {}".format(len(testlist), len(inlist))
    return inlist
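
The `repOK` above compares each item against a growing list, which costs quadratic time. If the elements are known to be hashable, a cheaper check is possible. The following is a sketch under that assumption; the name `rep_ok_hashable` is hypothetical.

```python
def rep_ok_hashable(inlist):
    """Variant of repOK for hashable items only (hypothetical helper)."""
    # A set drops duplicates, so equal lengths mean the list had none.
    assert len(set(inlist)) == len(inlist), "there are duplicates"
    return inlist


print(rep_ok_hashable([1, 2, 3]))  # [1, 2, 3]
```

The original `repOK` remains the safer choice when the set may hold unhashable items such as lists.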

In [420]:
class SimpleSet2:
    """
    AbsFun: the list [a,b,...,z] represents the
    set  a,b,...,z.
    [] represents the empty set.
    
    RepInv: the list contains no duplicates.
    
    Examples:
    
    >>> A=SimpleSet2([1,2,3,1])
    >>> B=SimpleSet2([2,3,4,4,5])
    >>> sorted(list(A))
    [1, 2, 3]
    >>> sorted(list(A.union(B)))
    [1, 2, 3, 4, 5]
    >>> sorted(list(A.intersection(B)))
    [2, 3]
    >>> A.rem(1)
    >>> sorted(list(A))
    [2, 3]
    >>> C=SimpleSet2()
    >>> C
    {}
    >>> sorted(list(C.union(A)))
    [2, 3]
    >>> sorted(list(C.intersection(A)))
    []
    """
    def __init__(self, container=[]):
        if container:
            self._storage=[]
            for ele in container:
                self.add(ele)
        else:
            self._storage = []
        
    def __contains__(self, item):
        if item in self._storage:
            return True
        else:
            return False
        
    def __len__(self):
        return len(self._storage)
    
    def __iter__(self):
        for ele in self._storage:
            yield ele
            
    def add(self, item):#this one is wrong
        self._storage.append(item)
        repOK(self._storage)
        
    def rem(self, item): #this is now right
        index = self._storage.index(item)
        del repOK(self._storage)[index]
        repOK(self._storage)
        
    def union(self, other):
        s = SimpleSet2(repOK(self._storage) + repOK(other._storage))
        repOK(s._storage)
        return s
    
    def intersection(self, other):
        intlist = list(filter(lambda x : x in other._storage, repOK(self._storage)))
        s = SimpleSet2(intlist)
        repOK(s._storage)
        return s
    
    def __repr__(self):
        return reprlib.repr(self._storage).replace('[','{').replace(']','}')
    
    


In [421]:
dtest(SimpleSet2, globals(), verbose=True)

Finding tests in NoName
Trying:
    A=SimpleSet2([1,2,3,1])
Expecting nothing
**********************************************************************
File "__main__", line ?, in NoName
Failed example:
    A=SimpleSet2([1,2,3,1])
Exception raised:
    Traceback (most recent call last):
      File "//anaconda/envs/py35/lib/python3.5/doctest.py", line 1320, in __run
        compileflags, 1), test.globs)
      File "<doctest NoName[0]>", line 1, in <module>
        A=SimpleSet2([1,2,3,1])
      File "<ipython-input-420-431e5b225ea0>", line 34, in __init__
        self.add(ele)
      File "<ipython-input-420-431e5b225ea0>", line 53, in add
        repOK(self._storage)
      File "<ipython-input-409-18c2b03fb366>", line 6, in repOK
        assert len(testlist)==len(inlist), "there are duplicates {} {}".format(len(testlist), len(inlist))
    AssertionError: there are duplicates 3 4
Trying:
    B=SimpleSet2([2,3,4,4,5])
Expecting nothing
*********************************************************

Aha, by checking the RepInv we fail immediately. Let's fix this:

In [424]:
class SimpleSet2:
    """
    AbsFun: the list [a,b,...,z] represents the
    set  a,b,...,z.
    [] represents the empty set.
    
    RepInv: the list contains no duplicates.
    
    Examples:
    
    >>> A=SimpleSet2([1,2,3,1])
    >>> B=SimpleSet2([2,3,4,4,5])
    >>> sorted(list(A))
    [1, 2, 3]
    >>> sorted(list(A.union(B)))
    [1, 2, 3, 4, 5]
    >>> sorted(list(A.intersection(B)))
    [2, 3]
    >>> A.rem(1)
    >>> sorted(list(A))
    [2, 3]
    >>> C=SimpleSet2()
    >>> C
    {}
    >>> sorted(list(C.union(A)))
    [2, 3]
    >>> sorted(list(C.intersection(A)))
    []
    """
    def __init__(self, container=[]):
        if container:
            self._storage=[]
            for ele in container:
                self.add(ele)
        else:
            self._storage = []
        
    def __contains__(self, item):
        if item in self._storage:
            return True
        else:
            return False
        
    def __len__(self):
        return len(self._storage)
    
    def __iter__(self):
        for ele in self._storage:
            yield ele
            
    def add(self, item):
        if item not in repOK(self._storage):
            self._storage.append(item)
        repOK(self._storage)
        
    def rem(self, item): #this is now right
        index = self._storage.index(item)
        del repOK(self._storage)[index]
        repOK(self._storage)
        
    def union(self, other):
        s = SimpleSet2(repOK(self._storage) + repOK(other._storage))
        repOK(s._storage)
        return s
    
    def intersection(self, other):
        intlist = list(filter(lambda x : x in other._storage, repOK(self._storage)))
        s = SimpleSet2(intlist)
        repOK(s._storage)
        return s
    
    def __repr__(self):
        return reprlib.repr(self._storage).replace('[','{').replace(']','}')
    
    


In [425]:
dtest(SimpleSet2, globals(), verbose=True)

Finding tests in NoName
Trying:
    A=SimpleSet2([1,2,3,1])
Expecting nothing
ok
Trying:
    B=SimpleSet2([2,3,4,4,5])
Expecting nothing
ok
Trying:
    sorted(list(A))
Expecting:
    [1, 2, 3]
ok
Trying:
    sorted(list(A.union(B)))
Expecting:
    [1, 2, 3, 4, 5]
ok
Trying:
    sorted(list(A.intersection(B)))
Expecting:
    [2, 3]
ok
Trying:
    A.rem(1)
Expecting nothing
ok
Trying:
    sorted(list(A))
Expecting:
    [2, 3]
ok
Trying:
    C=SimpleSet2()
Expecting nothing
ok
Trying:
    C
Expecting:
    {}
ok
Trying:
    sorted(list(C.union(A)))
Expecting:
    [2, 3]
ok
Trying:
    sorted(list(C.intersection(A)))
Expecting:
    []
ok


Notice that having a pass through `repOK` in conjunction with testing saves the day. Clearly, all these `repOK` calls may slow production code down too much. Python normally runs with assertions enabled (`__debug__` is `True`); turning on optimization (`-O`) makes each `assert` a no-op. The uniqueness computation inside `repOK` still runs, though, and that still costs time.

Thus, whether to keep the repinv checks in or not is a decision you must make. It might be worth at least keeping them in comments. (Notice also that I did not test the non-destructive methods; to be complete, you might want to `repOK` them as well.)
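
One way to avoid paying for the uniqueness computation under `-O` is to put the whole check behind `__debug__`. The following is a minimal sketch, not the `repOK` used above: the name `rep_ok_guarded` is hypothetical, and it assumes (as the calls above suggest) that the checker hands back the list it was given.

```python
def rep_ok_guarded(inlist):
    """Check the no-duplicates invariant and return the list unchanged.

    Hypothetical variant of repOK. The whole check sits under
    `if __debug__:`, so under `python -O` neither the assert nor the
    duplicate scan is executed.
    """
    if __debug__:
        # Collect distinct elements using == only, so unhashable items
        # work too (at O(n^2) cost, which is why we want to skip it).
        testlist = []
        for ele in inlist:
            if ele not in testlist:
                testlist.append(ele)
        assert len(testlist) == len(inlist), \
            "there are duplicates {} {}".format(len(testlist), len(inlist))
    return inlist


storage = [1, 2, 3]
same = rep_ok_guarded(storage)   # passes, and hands back the same list object
```

With this shape the commented-out `#repOK(...)` lines could stay live in the class, at the price of one cheap function call per check when optimization is on.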

Notice also that it's hard to set any representation invariant on our initial write of the code, and that a refactoring with a clear representation invariant gets us quite far. The general process of refactoring involves keeping the code DRY (Don't Repeat Yourself) and, more generally, breaking larger functions into smaller, testable ones.

In [426]:
class SimpleSet2:
    """
    AbsFun: the list [a,b,...,z] represents the
    set  a,b,...,z.
    [] represents the empty set.
    
    RepInv: the list contains no duplicates.
    
    Examples:
    
    >>> A=SimpleSet2([1,2,3,1])
    >>> B=SimpleSet2([2,3,4,4,5])
    >>> sorted(list(A))
    [1, 2, 3]
    >>> sorted(list(A.union(B)))
    [1, 2, 3, 4, 5]
    >>> sorted(list(A.intersection(B)))
    [2, 3]
    >>> A.rem(1)
    >>> sorted(list(A))
    [2, 3]
    >>> C=SimpleSet2()
    >>> C
    {}
    >>> sorted(list(C.union(A)))
    [2, 3]
    >>> sorted(list(C.intersection(A)))
    []
    """
    def __init__(self, container=[]):
        if container:
            self._storage=[]
            for ele in container:
                self.add(ele)#makes sure repinv is respected
        else:
            self._storage = []
        
    def __contains__(self, item):
        if item in self._storage:
            return True
        else:
            return False
        
    def __len__(self):
        return len(self._storage)
    
    def __iter__(self):
        for ele in self._storage:
            yield ele
            
    def add(self, item):
        #repOK(self._storage)
        if item not in self._storage:
            self._storage.append(item)
        #repOK(self._storage)
        
    def rem(self, item): #this is now right
        #repOK(self._storage)
        self._storage.remove(item) #list.remove returns None, so there is nothing to keep
        #repOK(self._storage)
        
    def union(self, other):
        #repOK(self._storage)
        #repOK(other._storage)
        s = SimpleSet2(self._storage + other._storage)
        #repOK(s._storage)
        return s
    
    def intersection(self, other):
        #repOK(self._storage)
        intlist = list(filter(lambda x : x in other._storage, self._storage))
        s = SimpleSet2(intlist)
        #repOK(s._storage)
        return s
    
    def __repr__(self):
        return reprlib.repr(self._storage).replace('[','{').replace(']','}')
    
    


In [427]:
dtest(SimpleSet2, globals(), verbose=True)

Finding tests in NoName
Trying:
    A=SimpleSet2([1,2,3,1])
Expecting nothing
ok
Trying:
    B=SimpleSet2([2,3,4,4,5])
Expecting nothing
ok
Trying:
    sorted(list(A))
Expecting:
    [1, 2, 3]
ok
Trying:
    sorted(list(A.union(B)))
Expecting:
    [1, 2, 3, 4, 5]
ok
Trying:
    sorted(list(A.intersection(B)))
Expecting:
    [2, 3]
ok
Trying:
    A.rem(1)
Expecting nothing
ok
Trying:
    sorted(list(A))
Expecting:
    [2, 3]
ok
Trying:
    C=SimpleSet2()
Expecting nothing
ok
Trying:
    C
Expecting:
    {}
ok
Trying:
    sorted(list(C.union(A)))
Expecting:
    [2, 3]
ok
Trying:
    sorted(list(C.intersection(A)))
Expecting:
    []
ok


## Modularity

The idea behind refactoring is to make sure we have no repeated code, that everything is readable and testable, and, most importantly, that you have made life easier for future you. The usual direction this modularity goes in is to have many small classes and functions with loose coupling between them.

Here are the pros and cons to this:

- Small classes/modules mean interfaces with only a few abstract procedures.
- This means simple specs for interfaces.
- This also means invariants are local. What does that mean? For the "many functions" part of modularity, we can judge and test what a function does independently of the other functions. Now, with AbsFun and RepInv, we can do the same for an ADT.
- Notice that this makes writing pre- and post-conditions harder. If you remember our binary search implementation, we could have modularized it further, but communicating the pre- and post-conditions would have become harder with greater modularity. Still, remember how complex our binary search spec was?
- Correctness is now easier to reason about and test on a per-function and per-method basis.
- But we are less performant because we have many additional function calls. Also, since everything is not in one place, it's harder to play optimization tricks.


**You must exercise your own judgement** as to where you want to make this tradeoff between loose coupling/modularity and tight coupling/performance. As scientists, we are often exposed to the latter: a performant array, for example, precludes all sorts of nice streaming algorithms, duplicates memory, etc., and makes things more monolithic. But where it is an advantage to ease of programming, it behooves us to choose narrow and nice interfaces. We'll see an example of this next time.


## Back to ABCs

Above we "registered" `SimpleSet1` with the `SimpleSetInterface` ADT, and found that even without ANY explicit inheritance, `SimpleSet1` was reported to be a subclass of `SimpleSetInterface`. This is very useful in Python, since Python supports multiple inheritance and we can thus "mix in" different protocols into what we do.

On such registration, Python will just believe us without checking that we implement all the abstract methods. If we don't, we will get runtime exceptions.
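
To see what "Python will just believe us" looks like, here is a small sketch. `ShapeInterface` and `Blob` are hypothetical names invented for this illustration; they are not part of the notebook above.

```python
from abc import ABC, abstractmethod


class ShapeInterface(ABC):
    """Hypothetical interface with a single abstract method."""

    @abstractmethod
    def area(self):
        ...


class Blob:
    """Hypothetical class that does NOT implement area()."""


# Registration is a promise, not a check: nothing verifies that
# Blob actually provides area().
ShapeInterface.register(Blob)

print(issubclass(Blob, ShapeInterface))   # True

# Instantiation succeeds too: the abstract-method check only applies
# to classes that really inherit from the ABC.
b = Blob()

try:
    b.area()
except AttributeError as err:
    # The broken promise only surfaces here, at call time.
    print("runtime failure:", err)
```

Had `Blob` inherited from `ShapeInterface` instead, `Blob()` itself would have raised a `TypeError`, which is the earlier and friendlier failure.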

Do you remember the `Answer` class from above, which was a subclass of `Sized`? That wasn't even registered!

This is because `collections.abc.Sized` implements a class method, `__subclasshook__`, which had this implementation:

```python
    @classmethod
    def __subclasshook__(cls, C):
        if cls is Sized:
            if any("__len__" in B.__dict__ for B in C.__mro__):
                return True
        return NotImplemented
```

This says: when you call `issubclass`, check whether the class in question, `C`, or any class in its MRO (its parents) has a `__len__` in its `__dict__`; if one does, `C` is a subclass. This is precisely what happens here.
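
The same mechanism can be tried out directly. In the sketch below, `Answer42` is a hypothetical stand-in for the `Answer` class, and `Quacker`, `Duck`, and `Rock` are names made up to show a hand-written `__subclasshook__` following the same pattern as the one for `Sized`.

```python
from abc import ABC, abstractmethod
from collections.abc import Sized


class Answer42:
    """Hypothetical stand-in for Answer: defines __len__ and nothing else."""

    def __len__(self):
        return 42


# No inheritance and no registration, yet the hook recognizes it.
print(issubclass(Answer42, Sized))   # True


class Quacker(ABC):
    """Hypothetical ABC that accepts any class defining quack()."""

    @abstractmethod
    def quack(self):
        ...

    @classmethod
    def __subclasshook__(cls, C):
        if cls is Quacker:
            # Look for "quack" in C itself or in any of its ancestors.
            if any("quack" in B.__dict__ for B in C.__mro__):
                return True
        # Fall back to the normal registration/inheritance checks.
        return NotImplemented


class Duck:
    def quack(self):
        return "Quack!"


class Rock:
    pass


print(issubclass(Duck, Quacker))   # True
print(issubclass(Rock, Quacker))   # False
```

Returning `NotImplemented` rather than `False` matters: it lets explicitly registered classes and real subclasses still pass the check.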

Thus, in fact, the ad-hoc protocols we have been talking about in Python actually do have a formal counterpart, even though no inheritance is actually going on. What is the use of this?
        
Unless you are a framework creator, don't bother. Indeed, even `SimpleSetInterface` did not have to be an ABC; it could have been simply documentation. But the ideas of interface and implementation separation are super important, so whatever method you want to use to document your interface is better than none, even if it is declaring an ABC.

## Answers to Exercises

* How would you implement the three stack ADT operations using a singly linked list?

> Use the front of the linked list as the top of the stack; then:
>
> * `push`: allocate a new node and insert it at the front; O(1)
> * `pop`: remove the first node and return its value; O(1)
> * `peek`: return the first node's value; O(1)

* How would you implement a stack using a dict and a counter variable?

> Use the dict as a mapping from counts to values. Initialize the counter to 0. Then:
>
> * `push`: store the value in the dict using the current counter as the key, then increment the counter; O(1)
> * `pop`: decrement the counter, look up the value for the current counter, then remove that value from the dict and return it; O(1)
> * `peek`: look up the value for counter - 1 and return it; O(1)
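
Both answers can be sketched in a few lines of Python. The class names `LinkedStack`, `DictStack`, and `_Node` are hypothetical, and raising `IndexError` on an empty stack is an assumption of this sketch rather than something the exercises specify.

```python
class _Node:
    """One cell of a singly linked list."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


class LinkedStack:
    """Stack whose top is the front of a singly linked list."""

    def __init__(self):
        self._head = None

    def push(self, value):
        # The new node points at the old front; O(1).
        self._head = _Node(value, self._head)

    def pop(self):
        if self._head is None:
            raise IndexError("pop from empty stack")
        node = self._head
        self._head = node.next   # unlink the first node; O(1)
        return node.value

    def peek(self):
        if self._head is None:
            raise IndexError("peek at empty stack")
        return self._head.value


class DictStack:
    """Stack stored as a dict mapping counts to values."""

    def __init__(self):
        self._items = {}
        self._count = 0

    def push(self, value):
        # Store under the current counter, then increment it; O(1).
        self._items[self._count] = value
        self._count += 1

    def pop(self):
        if self._count == 0:
            raise IndexError("pop from empty stack")
        self._count -= 1
        # dict.pop removes the key and returns its value; O(1).
        return self._items.pop(self._count)

    def peek(self):
        if self._count == 0:
            raise IndexError("peek at empty stack")
        return self._items[self._count - 1]
```

Client code sees the same three operations either way, which is the point of treating the stack as an ADT: swapping `LinkedStack` for `DictStack` should not require changing any caller.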