Programming is hard.  The key to taming its challenges is to divide a complex program into smaller pieces, where each piece is easier to understand.  Different pieces might be divvied up among the different coders on a team, but careful division of a large code base makes sense even for a lone programmer.  If you don't believe us, try looking back at your code from one of the early labs and see how well you remember it!  Careful program design makes it easy for an informed coder to understand and modify a program.  When the program has a well-defined top-level structure, often we can tell that we only need to make changes in one piece.  Then we only need to understand that piece.

We've already seen decomposing a program into functions.  Another essential tool of decomposition is *encapsulation*, associated with the idea of *abstract data types (ADTs)*.  An ADT implementation connects a data structure with a set of useful operations on that structure.  The idea is that the details of the data structure are *hidden* from code using the ADT.  That way, the ADT implementation can be modified *independently* from all the other bits of code that depend on it.  You can imagine that one programmer is responsible for the ADT, and another programmer is responsible for coding some algorithm that uses the ADT.  These two programmers can now go about their business simultaneously, potentially finishing the project twice as fast as if they hadn't chosen this decomposition!  The story is even better when the ADT implementation and algorithm are both evolving over time, and each can be improved without affecting the other.  Again, this advantage is significant even with a lone programmer.

OK, enough generalities.  Let's look at some examples.

# Graph Search, Parameterized on Path Data Structures

You've practiced graph pathfinding in Lab 2 and backtracking search in Lab 4.  As a running example throughout this lecture, we'll use a few naive recursive-backtracking approaches to pathfinding.  We'll work with graphs like this one, represented simply as a dictionary mapping nodes to lists of nodes reachable in one step.

In [1]:
graph = {'A': ['B', 'C'],
         'B': ['C', 'D'],
         'C': ['D'],
         'D': ['C'],
         'E': ['F'],
         'F': ['C']}

Our recursive algorithm will maintain a partial path as it goes, and we can use this class to encapsulate the operations we need on paths.

In [2]:
class Path:
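    def __init__(self, path=None):
        # Avoid a mutable default argument, which would be shared by every Path().
        self.path = [] if path is None else path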

    def add(self, node):
        return Path(self.path + [node])

    def visited(self, node):
        return node in self.path

    def len(self):
        return len(self.path)

    def as_list(self):
        return self.path

Now here's the main algorithm.  Notice that this version is not guaranteed to find the *shortest* path!

In [3]:
# DFS traversal
def find_path(graph, start, end, path):
    path = path.add(start)
    if start == end:
        return path
    if start not in graph:
        return None
    for node in graph[start]:
        if not path.visited(node):
            newpath = find_path(graph, node, end, path)
            if newpath: return newpath
    return None

def find_path_main(graph, start, end):
    p = find_path(graph, start, end, Path())
    if p:
        return p.as_list()
    else:
        return p

In [4]:
find_path_main(graph, 'A', 'D')

['A', 'B', 'C', 'D']

Here are two other algorithms: one finds *all* paths connecting two nodes, and the other finds the *shortest* path connecting two nodes.  The latter in particular is significantly less efficient than what you probably had to come up with in Lab 2, but it still works well as an example here.

In [5]:
def find_all_paths(graph, start, end, path):
    path = path.add(start)
    if start == end:
        return [path]
    if start not in graph:
        return []
    paths = []
    for node in graph[start]:
        if not path.visited(node):
            newpaths = find_all_paths(graph, node, end, path)
            for newpath in newpaths:
                paths.append(newpath)
    return paths

def find_all_paths_main(graph, start, end):
    return [p.as_list()
            for p in find_all_paths(graph, start, end, Path())]

# DFS, but keep updating the shortest path
def find_shortest_path(graph, start, end, path):
    path = path.add(start)
    if start == end:
        return path
    if start not in graph:
        return None
    shortest = None
    for node in graph[start]:
        if not path.visited(node):
            newpath = find_shortest_path(graph, node, end, path)
            if newpath:
                if not shortest or newpath.len() < shortest.len():
                    shortest = newpath
    return shortest

def find_shortest_path_main(graph, start, end):
    p = find_shortest_path(graph, start, end, Path())
    if p:
        return p.as_list()
    else:
        return p

In [6]:
find_all_paths_main(graph, 'A', 'D')

[['A', 'B', 'C', 'D'], ['A', 'B', 'D'], ['A', 'C', 'D']]

In [7]:
find_shortest_path_main(graph, 'A', 'D')

['A', 'B', 'D']
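
For comparison, here is a minimal sketch of the kind of breadth-first search that finds a shortest path without enumerating every path.  It assumes the same dictionary format as above; the name `bfs_shortest_path` and the `parent` bookkeeping are our own illustration, not code from the labs.

```python
from collections import deque

def bfs_shortest_path(graph, start, end):
    """Return a shortest path from start to end as a list, or None."""
    # parent maps each discovered node to the node we reached it from.
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == end:
            # Walk the parent links backward to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in graph.get(node, []):
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None

graph = {'A': ['B', 'C'],
         'B': ['C', 'D'],
         'C': ['D'],
         'D': ['C'],
         'E': ['F'],
         'F': ['C']}

print(bfs_shortest_path(graph, 'A', 'D'))   # ['A', 'B', 'D']
```

Each node enters the queue at most once, so the work is bounded by the size of the graph rather than by the number of distinct paths.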

We can engineer instances of pathfinding problems where our current implementation runs much more slowly than it needs to.  For instance, consider a graph like this one.

![a degenerate linear graph](linear.png)

The nodes are consecutive integers, each connected to the number one higher than itself.  Our naive path data structure will run each `visited` check by a complete traversal of the path so far, which takes time proportional to the path length.  With many such checks per search, the time adds up.  Here's some code to demonstrate.

In [8]:
# To see how inefficient this pathfinding is, let's use this trivial straight-line graph.
def straight_line(n):
    return {i: [i+1] for i in range(n)}

# Python's limit on recursion means we'll need to run several searches to get the times high enough to compare.
def perf_test(n, path):
    x = [find_path(straight_line(n), 0, n, path)
         for i in range(1000)]

In [9]:
perf_test(1000, Path())

Note how the performance tester is parameterized on an object that provides path operations.  Why would we bother to do that?  Because next we'll see how easy it is to drop in alternative path implementations!
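
The cells above don't print any timings.  One way to put a number on a search, sketched here with `time.perf_counter` from the standard library: the helper `time_search` is a hypothetical name of ours, it builds the graph once instead of once per search, and it uses a small `n` so that it stays under Python's default recursion limit.

```python
import time

class Path:
    def __init__(self, path=None):
        self.path = [] if path is None else path

    def add(self, node):
        return Path(self.path + [node])

    def visited(self, node):
        return node in self.path

    def as_list(self):
        return self.path

def find_path(graph, start, end, path):
    path = path.add(start)
    if start == end:
        return path
    if start not in graph:
        return None
    for node in graph[start]:
        if not path.visited(node):
            newpath = find_path(graph, node, end, path)
            if newpath is not None:
                return newpath
    return None

def straight_line(n):
    return {i: [i + 1] for i in range(n)}

def time_search(n, path, repeats=100):
    """Hypothetical helper: total seconds spent on `repeats` searches."""
    graph = straight_line(n)
    begin = time.perf_counter()
    for _ in range(repeats):
        result = find_path(graph, 0, n, path)
    elapsed = time.perf_counter() - begin
    return elapsed, result

elapsed, result = time_search(200, Path())
print(f"{elapsed:.3f} s for 100 searches on a 200-edge line")
```

Because the starting path is a parameter, the same helper can time any other path implementation unchanged.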

## Linked-List Paths

Remember that an abstract data type, or ADT, provides a set of *methods* that can be called to interact with *private* data of an object.  Code outside that object isn't supposed to muck with the object's fields directly.

We can specify a path ADT as follows, informally.

__Values:__
- Sequences of nodes

__Operations:__
- Create a new empty path
- Create a new path by adding a node to the end of another one
- Check if a node belongs to a path
- Get the length of a path
- Convert a path to a normal Python list

Note how vague we are about what a "sequence of nodes" really is.  This kind of informal specification is meant to explain to a *client* of a path object how to use that object: what to expect different method calls to return in different situations.  It is important that the client can call the object's methods usefully without knowing what is going on inside!
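
If you like, the informal specification can also be written down in code.  Here is a sketch using `typing.Protocol` (Python 3.8 or later); the names `PathLike`, `TuplePath`, and `describe` are hypothetical, and "create a new empty path" is left to each class's constructor rather than listed in the protocol.

```python
from typing import Any, List, Protocol

class PathLike(Protocol):
    """Hypothetical name: the operations a client may rely on."""
    def add(self, node: Any) -> "PathLike": ...
    def visited(self, node: Any) -> bool: ...
    def len(self) -> int: ...
    def as_list(self) -> List[Any]: ...

class TuplePath:
    """One possible implementation, storing the nodes in a tuple."""
    def __init__(self, nodes=()):
        self._nodes = tuple(nodes)

    def add(self, node):
        return TuplePath(self._nodes + (node,))

    def visited(self, node):
        return node in self._nodes

    def len(self):
        return len(self._nodes)

    def as_list(self):
        return list(self._nodes)

def describe(path: PathLike) -> str:
    # Client code: uses only the four operations, never the fields.
    return f"{path.len()} nodes: {path.as_list()}"

p = TuplePath().add('A').add('B')
print(describe(p))   # 2 nodes: ['A', 'B']
```

The client function `describe` never learns that a tuple is involved, which is exactly the point of the specification.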

Let's demonstrate the flexibility of an ADT implementer by creating an alternative implementation based on linked lists, like we saw in Lecture 5.

In [32]:
class EmptyLinkedPath:
    def add(self, node):
        return NonemptyLinkedPath(node, self)

    def visited(self, node):        
        return False

    def len(self):
        return 0

    def as_list(self):
        return []

class NonemptyLinkedPath:
    def __init__(self, head, tail):
        self.head = head
        self.tail = tail
    
    def add(self, node):
        return NonemptyLinkedPath(node, self)
    
    def visited(self, node):
        return node == self.head or self.tail.visited(node)
    
    def len(self):
        return 1 + self.tail.len()

    def as_list(self):
        return self.tail.as_list() + [self.head]

Now, like magic, we can reuse all three graph algorithms from before, but with our alternative path data type and associated code.  Note that, though we start each call off with an empty path, instances of the nonempty-path class will usually be created internally.

In [33]:
find_path(graph, 'A', 'D', EmptyLinkedPath()).as_list()

['A', 'B', 'C', 'D']

In [34]:
[p.as_list() for p in find_all_paths(graph, 'A', 'D', EmptyLinkedPath())]

[['A', 'B', 'C', 'D'], ['A', 'B', 'D'], ['A', 'C', 'D']]

In [35]:
find_shortest_path(graph, 'A', 'D', EmptyLinkedPath()).as_list()

['A', 'B', 'D']

Have we solved our performance problem on the degenerate linear graph?

In [259]:
perf_test(200, EmptyLinkedPath())

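Apparently not!  This version is even slower: each `visited` check still walks the whole path, and now it does so through recursive Python method calls rather than a built-in list scan.  Interestingly, though, it can be much more space-efficient for large graphs, thanks to sharing in linked lists.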
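
To see what "sharing" means here, consider this small sketch.  It repeats just the `add` and `as_list` methods of the two linked-path classes; the variable names are ours.

```python
class EmptyLinkedPath:
    def add(self, node):
        return NonemptyLinkedPath(node, self)

    def as_list(self):
        return []

class NonemptyLinkedPath:
    def __init__(self, head, tail):
        self.head = head
        self.tail = tail

    def add(self, node):
        return NonemptyLinkedPath(node, self)

    def as_list(self):
        return self.tail.as_list() + [self.head]

prefix = EmptyLinkedPath().add('A').add('B')
left = prefix.add('C')
right = prefix.add('D')

# Both extensions point at the very same prefix object; nothing was copied.
print(left.tail is prefix and right.tail is prefix)   # True
print(left.as_list(), right.as_list())                # ['A', 'B', 'C'] ['A', 'B', 'D']
```

With the list-based `Path`, each `add` instead copies every node of the path so far into a fresh list.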

It's worth emphasizing the key benefit of ADTs that just made an appearance: **code can be written independently of ADT implementation details, allowing us to drop in different ADT implementations later**.

## A Customized Path Data Structure

We could come up with some alternative path representations that work well for many different shapes of graphs.  Instead, let's try to do even better by designing a path class *customized* to graphs like our linear examples.

To start with, how about paths that start at 0 and progress upward through consecutive integers?

In [10]:
class LessThanPath:
    def __init__(self, bound=0):
        self.bound = bound
        # ...meaning "all consecutive nonnegative integers less than bound."

    def add(self, node):
        if self.bound == node:
            # Good, the client code followed the pattern we optimize for.
            # The next path member comes right after the last.
            return LessThanPath(node+1)
        else:
            # Doesn't fit the pattern.  Fall back to an unoptimized path.
            p = Path()
            for i in range(0, self.bound):
                p = p.add(i)
            return p.add(node)
            # Notice that here we returned an instance of a different class,
            # which is totally legit!

    def visited(self, node):
        # Nice: this check now runs in time independent of path length!
        return 0 <= node and node < self.bound

    def len(self):
        # Ditto for computing the length.
        return self.bound

    def as_list(self):
        # This method still takes time proportional to the path length.
        # You can't win 'em all!
        return list(range(self.bound))

In [11]:
perf_test(1000, LessThanPath())

Even for the relatively small graph size, we now get noticeably faster execution.

We could be a bit more accommodating when it comes to the graph shapes that we optimize for.  Why not support any path that is a range of consecutive integers?

In [12]:
class RangePath:
    def __init__(self, lower=0, upper=0):
        self.lower = lower
        self.upper = upper
        # ...meaning "all integers in [lower, upper)."
        # For the default values of 0 and 0, that's an empty interval!

    def add(self, node):
        if self.lower == self.upper:
            # Empty range so far.  Switch it now to a single-element range.
            return RangePath(node, node+1)
        elif node == self.upper:
            # The range is being expanded by one position higher.
            return RangePath(self.lower, node+1)
        else:
            # Doesn't fit the pattern.  Fall back to an unoptimized path.
            p = Path()
            for i in range(self.lower, self.upper):
                p = p.add(i)
            return p.add(node)

    def visited(self, node):
        return self.lower <= node and node < self.upper

    def len(self):
        return self.upper - self.lower

    def as_list(self):
        return list(range(self.lower, self.upper))

In [13]:
perf_test(1000, RangePath())

Let's verify that the new implementation also works with a straight-line graph that doesn't begin at zero.

In [14]:
def straight_line_from(fr, n):
    return {i: [i+1] for i in range(fr, fr+n)}

def perf_test_from(fr, n, path):
    # Search from the first node of the line (fr) to its last (fr+n).
    x = [find_path(straight_line_from(fr, n), fr, fr+n, path)
         for i in range(1000)]

In [15]:
perf_test_from(100, 2000, RangePath())

We have seen another key lesson about ADTs: there is not just one “best” implementation of each ADT.  Instead, we often **build custom ADT implementations, based on the context of a particular problem**, reusing generic algorithms to work “for free” on our new ADT implementations.
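
For contrast with the customized classes, here is a sketch of the more general route we set aside earlier: a path that works for any hashable nodes and still answers `visited` quickly.  The class `SetBackedPath` is hypothetical and is only one of several reasonable designs.

```python
class SetBackedPath:
    """Hypothetical general-purpose path: a tuple for order, a frozenset for lookup."""
    def __init__(self, nodes=(), members=frozenset()):
        self._nodes = tuple(nodes)
        self._members = frozenset(members)

    def add(self, node):
        # Still copies, like Path.add, so this step stays proportional
        # to the path length.
        return SetBackedPath(self._nodes + (node,), self._members | {node})

    def visited(self, node):
        # Hash lookup: expected constant time, for any hashable node.
        return node in self._members

    def len(self):
        return len(self._nodes)

    def as_list(self):
        return list(self._nodes)

p = SetBackedPath().add('A').add('B')
print(p.visited('A'), p.visited('Z'))   # True False
print(p.len(), p.as_list())             # 2 ['A', 'B']
```

The trade-off is extra memory for the set and no speedup for `add`, whereas `RangePath` makes every operation except `as_list` independent of path length on the graphs it targets.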

## Exhaustive Testing for Correct Encapsulation

We've already been relying implicitly on a concept of correctness for ADTs.  Usually each ADT has a *canonical implementation*, and we say that another implementation is correct if it is *indistinguishable from* the canonical (or reference) implementation.

What does it mean to be indistinguishable?  Roughly, it means that no sequence of method calls can tell the difference.  All corresponding method calls return the same values with both implementations.

One small catch is that method calls returning instances of the two classes we are comparing probably *won't* return equal values.  We *want* them to be different, because we bothered to write two different classes!  As long as some methods return values in base types like integers or Booleans, we can still test result equality on those methods, which is enough to give a useful notion of indistinguishability.

For a fully formal definition of ADT correctness, we'll defer to 6.031 and other follow-on classes.  Here, though, let's use an approximation: for all possible method call sequences within some size bounds, the two classes should return the same answers.

To that end, we'll define a set of classes that stand for path methods.  An object of one of these classes can be asked to call that method on a path object, and we can also ask whether the method returns a new path object that should replace the old one ("mutates"), as opposed to returning a plain answer.

In [267]:
class PathLen:
    def call(self, path):
        return path.len()
    def mutates(self):
        return False

class PathAsList:
    def call(self, path):
        return path.as_list()
    def mutates(self):
        return False

class PathAdd:
    def __init__(self, node):
        self.node = node
    
    def call(self, path):
        return path.add(self.node)
    def mutates(self):
        return True

class PathVisited:
    def __init__(self, node):
        self.node = node
    
    def call(self, path):
        return path.visited(self.node)
    def mutates(self):
        return False

Here's an example of "the long way" of checking that two kinds of paths successfully recognize 42 as occupying a one-element path.

In [268]:
p1 = Path()
p2 = RangePath()
m1 = PathAdd(42)
p1 = m1.call(p1)
p2 = m1.call(p2)
m2 = PathVisited(42)
m2.call(p1), m2.call(p2)

(True, True)

The idea is that the two path implementations should agree similarly on any sequence of method calls!  We bothered to define these funny classes standing for methods because now we can generate call sequences as data values.  Infinitely many call sequences are possible, so we'll generate only up to finite bounds.

In [269]:
# Generate all method calls whose arguments are nonnegative integers smaller than the bound.
def all_calls(bound):
    return [PathLen(), PathAsList()] \
        + [PathAdd(i) for i in range(bound)] \
        + [PathVisited(i) for i in range(bound)]

# Generate all bounded-call sequences of a given length.
def all_call_seqs(bound, length):
    if length == 0:
        return [[]]
    else:
        return [[call] + calls
                for call in all_calls(bound)
                for calls in all_call_seqs(bound, length-1)]

# Given two path implementations and a list of tests (each a call sequence),
# verify that the two give all the same answers.
def agree_on(path1, path2, tests):
    for calls in tests:
        p1 = path1
        p2 = path2
        # Note that these local variables start out referring to the original paths;
        # we rebind them to new path objects as we loop through the current test.

        for call in calls:
            # Get the return values of the current call on both versions.
            v1 = call.call(p1)
            v2 = call.call(p2)
            
            if not call.mutates():
                # Easy case: we are calling a method that returns a base type.
                # Make sure the answer is the same on both sides.
                assert v1 == v2
            else:
                # This must be a method that returns a new class instance.
                # Replace our path variables accordingly.
                p1 = v1
                p2 = v2

In [270]:
agree_on(Path(), EmptyLinkedPath(), all_call_seqs(3, 6))

In [271]:
agree_on(Path(), LessThanPath(), all_call_seqs(3, 6))

In [272]:
agree_on(Path(), RangePath(), all_call_seqs(3, 6))

Great: our three alternative implementations are indistinguishable from the original, within these bounds on test cases.  The larger the bounds we choose in comparing two implementations, the greater our confidence that the two are truly equivalent.  Generally, testing with finitely many cases isn't enough to establish equivalence for sure, but (1) this kind of exhaustive testing can still be useful to find bugs, and (2) imagining testing with very complete sets of inputs gives a good intuition for what ADT equivalence/correctness really is.
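
To get a feel for how quickly the bounded test space grows, here is a sketch that counts the sequences without building them all in memory.  It uses `itertools.product` and plain strings as stand-ins for the call objects; `call_names` and `iter_call_seqs` are hypothetical names.

```python
from itertools import product

def call_names(bound):
    # Stand-ins for the call objects: one label per possible method call.
    return (["len", "as_list"]
            + [f"add({i})" for i in range(bound)]
            + [f"visited({i})" for i in range(bound)])

def iter_call_seqs(bound, length):
    # Lazily yields tuples; nothing is stored beyond the current sequence.
    return product(call_names(bound), repeat=length)

print(len(call_names(3)))                      # 8
print(sum(1 for _ in iter_call_seqs(3, 6)))    # 262144
```

With 8 possible calls and sequences of length 6, there are 8**6 cases, and every extra step multiplies that count by 8.  That is why we can only ever test up to modest bounds.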

# Mutable Finite Sets

Finite sets are another classic ADT, embodied in Python's sets, though sometimes we have reasons to prefer other implementations.  We can define the finite set ADT like so:

__Values:__
- Finite-size sets of values

__Operations:__
- Create a new empty set
- Add an element to a set
- Check if a value belongs to a set
- Get the size of a set
- Render the set as a sorted list

We'll consider this one as a **mutable** ADT, where methods modify the underlying set, rather than returning new sets.

Here is an example of a simple algorithm parameterized on a finite-set implementation: computing how many distinct elements appear in a list.  It's easy to do by adding all the elements to a finite set and then returning the set's size.

In [136]:
def distinct(ls, set):
    for v in ls:
        set.add(v)
    return set.size()

To give us an implementation to test with, we wrap native Python sets in an appropriate class.

In [186]:
class NativeSet:
    def __init__(self):
        self.set = set()

    def add(self, v):
        self.set.add(v)

    def mem(self, v):
        return v in self.set

    def size(self):
        return len(self.set)
    
    def as_list(self):
        return sorted(self.set)

In [187]:
distinct([1, 2, 3, 4, 2, 6, 7, 8, 8, 10], NativeSet())

8
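
Because this ADT is mutable, `distinct` changes the set object it is handed.  A short sketch of the consequence, repeating only the parts of `NativeSet` that `distinct` needs (the second parameter is renamed `s` here to avoid shadowing the built-in `set`):

```python
class NativeSet:
    def __init__(self):
        self.set = set()

    def add(self, v):
        self.set.add(v)

    def size(self):
        return len(self.set)

def distinct(ls, s):
    for v in ls:
        s.add(v)
    return s.size()

shared = NativeSet()
first = distinct([1, 2, 2], shared)      # 2
second = distinct([3, 4], shared)        # 4, not 2: 1 and 2 are still inside
fresh = distinct([3, 4], NativeSet())    # 2
print(first, second, fresh)              # 2 4 2
```

So a caller who wants an independent count should pass in a newly created set each time.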

## Unsorted-Linked-List Sets

Let's also implement sets (fairly slowly) with unsorted linked lists, which we represent using a `ListNode` class, rather than the pattern from before where we create separate classes for empty and nonempty lists.

In [273]:
class ListNode:
    def __init__(self, head, tail):
        self.head = head
        self.tail = tail

class ListSet:
    def __init__(self):
        self.list = None
    
    def add(self, v):
        if not self.mem(v):
            self.list = ListNode(v, self.list)
    
    def mem(self, v):
        ls = self.list
        while ls != None:
            if ls.head == v:
                return True
            else:
                ls = ls.tail
        return False

    def size(self):
        ls = self.list
        n = 0
        while ls != None:
            n += 1
            ls = ls.tail
        return n

    def as_list(self):
        ls = self.list
        out = []
        while ls != None:
            out.append(ls.head)
            ls = ls.tail
        return sorted(out)

In [274]:
distinct([1, 2, 3, 4, 2, 6, 7, 8, 8, 10], ListSet())

8

As expected: we get the same answer as with native sets.

## Binary-Search-Tree Sets

We can also use binary search trees to implement finite sets.

In [275]:
class TreeNode:
    def __init__(self, left, value, right):
        self.left = left
        self.value = value
        self.right = right

class TreeSet:
    def __init__(self):
        self.tree = None
    
    def mem(self, v):
        t = self.tree
        while t != None:
            if v == t.value:
                return True
            elif v < t.value:
                t = t.left
            else:
                t = t.right
        return False
    
    def size(self):
        def size_helper(t):
            if t == None:
                return 0
            else:
                return 1 + size_helper(t.left) + size_helper(t.right)
        return size_helper(self.tree)
    
    def add(self, v):
        t = self.tree
        prev = None # What's this variable about?
                    # Watch it get updated below, and then see how it's finally used at the end.
                    # It records where we store a reference to the new node we allocate.
        while t != None:
            if v == t.value:
                return
            elif v < t.value:
                prev = (t, 'left')
                t = t.left
            else:
                prev = (t, 'right')
                t = t.right
        new = TreeNode(None, v, None)
        if prev == None:
            self.tree = new
        elif prev[1] == 'left':
            prev[0].left = new
        else:
            prev[0].right = new
    
    def as_list(self):
        t = self.tree
        out = []
        
        def as_list_helper(t):
            if t != None:
                as_list_helper(t.left)
                out.append(t.value)
                as_list_helper(t.right)
        
        as_list_helper(t)
        return out

In [276]:
distinct([1, 2, 3, 4, 2, 6, 7, 8, 8, 10], TreeSet())

8
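
Notice that `TreeSet.as_list` never calls `sorted`, unlike the other two implementations.  The sketch below shows why that is safe: an in-order traversal of a binary search tree visits values in increasing order.  The recursive `insert` helper is a hypothetical stand-in that follows the same ordering rule as `TreeSet.add`.

```python
class TreeNode:
    def __init__(self, left, value, right):
        self.left = left
        self.value = value
        self.right = right

def insert(root, v):
    """Hypothetical recursive insert; duplicates are ignored."""
    if root is None:
        return TreeNode(None, v, None)
    if v < root.value:
        root.left = insert(root.left, v)
    elif v > root.value:
        root.right = insert(root.right, v)
    return root

def in_order(root, out):
    # Everything in the left subtree is smaller than root.value,
    # and everything in the right subtree is larger.
    if root is not None:
        in_order(root.left, out)
        out.append(root.value)
        in_order(root.right, out)
    return out

values = [6, 2, 8, 2, 10, 1, 4]
root = None
for v in values:
    root = insert(root, v)

print(in_order(root, []))    # [1, 2, 4, 6, 8, 10]
```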

## Exhaustive Testing

We can adapt our approach of comparing answers on all call sequences within bounds.

In [203]:
class SetSize:
    def call(self, set):
        return set.size()
    
class SetAsList:
    def call(self, set):
        return set.as_list()

class SetAdd:
    def __init__(self, v):
        self.v = v
    
    def call(self, set):
        return set.add(self.v)

class SetMem:
    def __init__(self, v):
        self.v = v
    
    def call(self, set):
        return set.mem(self.v)

def all_calls(bound):
    return [SetSize(), SetAsList()] \
        + [SetAdd(i) for i in range(bound)] \
        + [SetMem(i) for i in range(bound)]

def all_call_seqs(bound, length):
    if length == 0:
        return [[]]
    else:
        return [[call] + calls
                for call in all_calls(bound)
                for calls in all_call_seqs(bound, length-1)]

def agree_on(set1, set2, tests):
    for calls in tests:
        s1 = set1()
        s2 = set2()
        # Note that here we allocate new sets at the start of a test!
        # Thus, the arguments set1 and set2 are classes rather than objects.

        for call in calls:
            # Testing result equality is simpler, since with this mutable style,
            # results should always be literally equal.
            # (Methods run only for their side effects will return None, and of course
            # None will be equal on both sides.)
            assert call.call(s1) == call.call(s2)

In [193]:
agree_on(NativeSet, ListSet, all_call_seqs(3, 6))

In [194]:
agree_on(NativeSet, TreeSet, all_call_seqs(3, 6))

Note how we pass the `ListSet` and `TreeSet` classes above, rather than objects that belong to them.  It's a neat Python feature that we can pass classes around as first-class data, instantiating them later just by calling them like functions!
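
Here is a tiny sketch of that feature in isolation.  The classes `Bag` and `BoundedBag` and the function `make_two` are hypothetical; `functools.partial` is a standard-library way to get a zero-argument factory from a class whose constructor needs arguments.

```python
from functools import partial

class Bag:
    """Hypothetical class whose constructor takes no arguments."""
    def __init__(self):
        self.items = []

class BoundedBag:
    """Hypothetical class whose constructor needs an argument."""
    def __init__(self, limit):
        self.limit = limit
        self.items = []

def make_two(factory):
    # Any zero-argument callable works here: a class, a function, a partial.
    return factory(), factory()

a, b = make_two(Bag)
print(a is b)                         # False: two separate instances

c, d = make_two(partial(BoundedBag, 10))
print(type(c).__name__, c.limit)      # BoundedBag 10
```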

# Graphs as an Abstract Data Type

We can parameterize our graph pathfinding algorithms further, so that they are generic not just in path implementations but also in graph implementations.  Graphs are naturally viewed as an ADT like so:

__Values:__
- Sets of nodes and edges

__Operations:__
- Create a new empty graph
- Add a node to a graph
- Add an edge to a graph
- Check if a node belongs to a graph
- Compute the neighbors (targets of outgoing edges) of a node in a graph

We can easily recast our pathfinding code to work over arbitrary graph implementations.

In [279]:
def find_path(graph, start, end, path):
    path = path.add(start)
    if start == end:
        return path
    if not graph.hasNode(start):
        return None
    for node in graph.neighbors(start):
        if not path.visited(node):
            newpath = find_path(graph, node, end, path)
            if newpath: return newpath
    return None

Here is one of the simplest possible implementations of graphs.  Note that it raises `KeyError` when asked to add an edge from, or list the neighbors of, a node that was never added.  Spoiler alert: we'll face some new related challenges later in exhaustive testing!

In [280]:
class BasicGraph:
    def __init__(self):
        self.nodes = []
        self.edges = []
    
    def addNode(self, n):
        if n not in self.nodes:
            self.nodes.append(n)
    
    def addEdge(self, n1, n2):
        if n1 not in self.nodes:
            raise KeyError
        if (n1, n2) not in self.edges:
            self.edges.append((n1, n2))
    
    def hasNode(self, n):
        return n in self.nodes
    
    def neighbors(self, n):
        if n not in self.nodes:
            raise KeyError
        return sorted([n2
                       for n1, n2 in self.edges
                       if n1 == n])
    
    def makeEmpty(self):
        return BasicGraph()

For testing, it will be handy to have an operation to convert our original dictionary-based graph format into whatever format a graph class uses.

In [281]:
def create_graph(nodes, graph):
    for node, neighbors in nodes.items():
        graph.addNode(node)
        for neighbor in neighbors:
            graph.addEdge(node, neighbor)
    return graph

In [282]:
find_path(create_graph(graph, BasicGraph()), 'A', 'D', Path()).as_list()

['A', 'B', 'C', 'D']

Note how now, to invoke pathfinding, we choose both a graph implementation and a path implementation.  Any pair of correct implementations should lead to the same answer, though performance may vary drastically based on what we select.

## A Graph Data Type from Any Set Data Type

By now, we've seen several examples of algorithms parameterized on ADT implementations.  Another very effective technique is to **parameterize ADT implementations on other ADT implementations**.  This pattern allows us to decompose a program into a hierarchy of pieces, where each piece is assembled from some more primitive pieces, and then that new piece in turn becomes an ingredient for assembling still higher-level pieces.

A simple example in this context is a generic implementation of graphs parameterized over some implementation of finite sets.  We can use the finite sets to store the neighbor set of each node.  (Fancier parameterizations over other ADTs like dictionaries would be even more effective, and we leave those as an exercise for the reader.)

Some diagrams might help explain the general idea.  A monolithic ADT implementation works like this:

![dataflow through an ADT](adt.png)

In contrast, a parameterized ADT implementation looks like this:

![dataflow through a parameterized ADT](adt_parameterized.png)

Here's a concrete example, a graph class that maintains a dictionary mapping nodes to neighbor sets, using our chosen set class.

In [283]:
class GraphUsingSet:
    def __init__(self, setClass):
        self.setClass = setClass
        self.nodes = {}
    
    def addNode(self, n):
        if n not in self.nodes:
            self.nodes[n] = self.setClass()
    
    def addEdge(self, n1, n2):
        if n1 not in self.nodes:
            raise KeyError
        self.nodes[n1].add(n2)
    
    def hasNode(self, n):
        return n in self.nodes
    
    def neighbors(self, n):
        if n not in self.nodes:
            raise KeyError
        return self.nodes[n].as_list()

    def makeEmpty(self):
        return GraphUsingSet(self.setClass)

In [284]:
find_path(create_graph(graph, GraphUsingSet(NativeSet)), 'A', 'D', Path()).as_list()

['A', 'B', 'C', 'D']

In [285]:
find_path(create_graph(graph, GraphUsingSet(TreeSet)), 'A', 'D', EmptyLinkedPath()).as_list()

['A', 'B', 'C', 'D']

Those are just two examples of the three choices we get to make in invoking the generic algorithm: graph, set, and path.
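
To make the independence of those choices concrete, here is a self-contained sketch that tries every combination of two set classes and two path classes and collects the answers.  `SortedListSet` and `TuplePath` are hypothetical extra implementations written for this example, and the other classes are trimmed to the methods the search needs.

```python
from itertools import product

class NativeSet:
    def __init__(self):
        self.items = set()
    def add(self, v):
        self.items.add(v)
    def as_list(self):
        return sorted(self.items)

class SortedListSet:
    """Hypothetical second set implementation, kept as a sorted list."""
    def __init__(self):
        self.items = []
    def add(self, v):
        if v not in self.items:
            self.items.append(v)
            self.items.sort()
    def as_list(self):
        return list(self.items)

class GraphUsingSet:
    def __init__(self, set_class):
        self.set_class = set_class
        self.nodes = {}
    def addNode(self, n):
        if n not in self.nodes:
            self.nodes[n] = self.set_class()
    def addEdge(self, n1, n2):
        self.nodes[n1].add(n2)   # raises KeyError for an unknown source node
    def hasNode(self, n):
        return n in self.nodes
    def neighbors(self, n):
        return self.nodes[n].as_list()

class Path:
    def __init__(self, nodes=()):
        self.nodes = list(nodes)
    def add(self, node):
        return Path(self.nodes + [node])
    def visited(self, node):
        return node in self.nodes
    def as_list(self):
        return self.nodes

class TuplePath:
    """Hypothetical second path implementation, backed by a tuple."""
    def __init__(self, nodes=()):
        self.nodes = tuple(nodes)
    def add(self, node):
        return TuplePath(self.nodes + (node,))
    def visited(self, node):
        return node in self.nodes
    def as_list(self):
        return list(self.nodes)

def find_path(graph, start, end, path):
    path = path.add(start)
    if start == end:
        return path
    if not graph.hasNode(start):
        return None
    for node in graph.neighbors(start):
        if not path.visited(node):
            newpath = find_path(graph, node, end, path)
            if newpath is not None:
                return newpath
    return None

def create_graph(nodes, graph):
    for node, neighbors in nodes.items():
        graph.addNode(node)
        for neighbor in neighbors:
            graph.addEdge(node, neighbor)
    return graph

edges = {'A': ['B', 'C'], 'B': ['C', 'D'], 'C': ['D'],
         'D': ['C'], 'E': ['F'], 'F': ['C']}

answers = set()
for set_class, path_class in product([NativeSet, SortedListSet], [Path, TuplePath]):
    g = create_graph(edges, GraphUsingSet(set_class))
    answers.add(tuple(find_path(g, 'A', 'D', path_class()).as_list()))

print(answers)   # {('A', 'B', 'C', 'D')}
```

All four combinations land on the same path, because both set classes report neighbors in sorted order and both path classes satisfy the same specification.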

## A Very Specific Graph Implementation

Actually, remembering our performance test before with the degenerate linear graph, we can take a shortcut to the ultimate optimized implementation!

In [325]:
class LinearGraph:
    def __init__(self, n):
        self.n = n
    
    def addNode(self, n):
        raise ValueError("Hey, I'm trying to cheat here!")
    
    def addEdge(self, n1, n2):
        raise ValueError("Hey, I'm trying to cheat here!")
    
    def hasNode(self, n):
        return 0 <= n and n < self.n
    
    def neighbors(self, n):
        if not self.hasNode(n):
            raise KeyError
        return [n+1]

    def makeEmpty(self):
        return LinearGraph(0)

Here's the best `BasicGraph` was able to offer us before, with our optimized path implementation.  It takes long enough to run that we notice the delay.

In [334]:
find_path(create_graph(straight_line(2000), BasicGraph()), 0, 2000, RangePath())

<__main__.RangePath at 0x7fc472c90828>

Now check out this call on our customized graph.  Execution is instant, because we don't need to spend time initializing graph data structures -- all the structure is encoded in the definition of `LinearGraph`.

In [336]:
find_path(LinearGraph(2000), 0, 2000, RangePath())

<__main__.RangePath at 0x7fc490067908>
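
All of these searches recurse once per node on the path, so Python's recursion limit caps how long a line we can try.  As a sketch of one way around that, here is an iterative variant with an explicit stack that uses only the same graph and path operations.  The name `find_path_iter` is hypothetical, and the classes are trimmed to the methods the search needs.

```python
class Path:
    def __init__(self, path=()):
        self.path = list(path)
    def add(self, node):
        return Path(self.path + [node])
    def visited(self, node):
        return node in self.path
    def len(self):
        return len(self.path)

class LinearGraph:
    def __init__(self, n):
        self.n = n
    def hasNode(self, n):
        return 0 <= n < self.n
    def neighbors(self, n):
        if not self.hasNode(n):
            raise KeyError
        return [n + 1]

def find_path_iter(graph, start, end, path):
    """Hypothetical iterative counterpart of find_path."""
    # Each stack entry pairs a node to visit with the path leading up to it.
    stack = [(start, path)]
    while stack:
        node, prefix = stack.pop()
        current = prefix.add(node)
        if node == end:
            return current
        if not graph.hasNode(node):
            continue
        # Push in reverse so the first neighbor is explored first,
        # matching the order of the recursive version.
        for neighbor in reversed(graph.neighbors(node)):
            if not current.visited(neighbor):
                stack.append((neighbor, current))
    return None

n = 5000   # well past the default recursion limit of 1000
found = find_path_iter(LinearGraph(n), 0, n, Path())
print(found.len() == n + 1)   # True
```

Since paths are immutable, each stack entry can safely hold on to the path that led to it, just as each recursive call did.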

## Exhaustive Testing

Here's our last take on exhaustive testing, this time for graphs.

In [339]:
class GraphAddNode:
    def __init__(self, node):
        self.node = node
    
    def call(self, graph):
        return graph.addNode(self.node)

class GraphAddEdge:
    def __init__(self, node1, node2):
        self.node1 = node1
        self.node2 = node2
    
    def call(self, graph):
        return graph.addEdge(self.node1, self.node2)

class GraphHasNode:
    def __init__(self, node):
        self.node = node
    
    def call(self, graph):
        return graph.hasNode(self.node)

class GraphNeighbors:
    def __init__(self, node):
        self.node = node
    
    def call(self, graph):
        return graph.neighbors(self.node)

def all_calls(bound):
    return [GraphAddNode(i) for i in range(bound)] \
        + [GraphAddEdge(i, j) for i in range(bound) for j in range(bound)] \
        + [GraphHasNode(i) for i in range(bound)] \
        + [GraphNeighbors(i) for i in range(bound)]

def all_call_seqs(bound, length):
    if length == 0:
        return [[]]
    else:
        return [[call] + calls
                for call in all_calls(bound)
                for calls in all_call_seqs(bound, length-1)]

def agree_on(graph1, graph2, tests):
    for calls in tests:
        # We forced graphs to support methods to create empty instances of the same class.
        # Why not pass in a class name like before?  It would get messy because we need to pass
        # set implementations to some of our graph classes.
        g1 = graph1.makeEmpty()
        g2 = graph2.makeEmpty()

        for call in calls:
            # The new wrinkle this time: some method calls are meant to raise exceptions,
            # and we need to check that the same exceptions are raised on both sides.
            try:
                v1 = call.call(g1)
            except Exception as e:
                v1 = e
            
            try:
                v2 = call.call(g2)
            except Exception as e:
                v2 = e

            assert v1 == v2 or (isinstance(v1, Exception) and type(v1) == type(v2))
            # Why compare the exceptions' types instead of their values?
            # We actually get different KeyError objects created on different invocations.

In [340]:
agree_on(BasicGraph(), GraphUsingSet(ListSet), all_call_seqs(3, 4))

In [341]:
agree_on(BasicGraph(), GraphUsingSet(TreeSet), all_call_seqs(3, 4))
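
The comment about comparing exception types is easy to check directly.  In this sketch, `outcome` is a hypothetical helper that turns "returned a value" or "raised an exception" into a tuple that can be compared with `==`.

```python
def outcome(thunk):
    """Hypothetical helper: summarize a call as a comparable tuple."""
    try:
        return ('returned', thunk())
    except Exception as e:
        return ('raised', type(e))

lookup = {'a': 1}

# Two separately created exceptions are not equal, even with equal arguments.
print(KeyError('b') == KeyError('b'))                             # False

# Comparing their types gives the answer we want.
print(outcome(lambda: lookup['b']) == outcome(lambda: {}['b']))   # True
print(outcome(lambda: lookup['a']))                               # ('returned', 1)
```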

# Summary

*Encapsulation*, or information hiding, is one of the big ideas in effective program design.  One common embodiment of the idea is *abstract data types*, which package together operations on some hidden representation type.  We've highlighted a few key points about the versatility of abstract data types:

- **Code can be written generically in some ADT, so that the code works when any implementation of the ADT is plugged in.**  This way, ADT implementations and their client programs can be coded, understood, and maintained separately.
- **A correct implementation of an ADT appears to behave identically to the simple reference implementation on any sequence of method calls.**  That way, the former really is a drop-in replacement for the latter, and programmers may pretend that they are using the simpler reference implementation when they are actually using a more clever implementation.
- There is not just one “best” implementation of any ADT: **we often want to implement one ADT differently for different problem contexts, appealing to the same generic algorithms that work “for free” given any implementation of the ADT.**  These custom implementations often improve performance by taking advantage of restrictions on the data that appear in the problem at hand.
- **The same principles apply in roughly the same way to ADTs that are mutable (operations modify the data structure) or immutable (operations return new versions of the data structure).**
- **It is possible and useful to parameterize one ADT implementation over another.**  This design technique supports *hierarchical* program decomposition, where a complex design is broken into a tree of pieces that are successively combined into more complex pieces.  Different members of a coding team can take responsibility for different pieces and work independently of improvements to the others, and even lone programmers can benefit from the chance to focus in on one part of a program, knowing that details of other parts don't directly influence its workings.