# 6.0002 Lecture 3: Graph-theoretic Models

**Speaker:** Prof. Eric Grimson

## Computational Models
- programs that help us understand the world and solve practical problems
- saw how we could map the informal problem of choosing what to eat into an optimization problem, and how we could design a program to solve it
- now want to look at class of models called graphs

## What's a graph?
- set of nodes (vertices)
    - might have properties associated with them
- set of edges (arcs), each consisting of a pair of nodes
    - undirected (graph)
    - directed (digraph)
        - source (parent) and destination (child) nodes
    - unweighted or weighted
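The definitions above can be encoded directly with built-in Python types. This is a minimal sketch under stated assumptions, separate from the `Node`/`Edge` classes used later: nodes are plain strings, a directed edge is an ordered `(source, destination)` tuple, and an undirected edge is a `frozenset` of its two endpoints so that order does not matter. Weights sit in a dict keyed by edge.

```python
# Illustrative plain-Python encoding of the definitions above (not the lecture's classes).
nodes = {'A', 'B', 'C'}

# Directed edges: order matters, so ('A', 'B') and ('B', 'A') are different edges.
directed_edges = {('A', 'B'), ('B', 'C')}

# Undirected edges: a frozenset ignores order, so {A, B} == {B, A}.
undirected_edges = {frozenset(('A', 'B')), frozenset(('B', 'C'))}

# Weighted (directed) edges: map each edge to its weight.
weights = {('A', 'B'): 3.0, ('B', 'C'): 1.5}

def has_directed_edge(src, dest):
    return (src, dest) in directed_edges

def has_undirected_edge(u, v):
    return frozenset((u, v)) in undirected_edges

print(has_directed_edge('B', 'A'))    # False: only A->B exists
print(has_undirected_edge('B', 'A'))  # True: direction is irrelevant
```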

## Why graphs?
- to capture useful relationships among entities
    - rail links between Paris and London
    - how the atoms in a molecule are related to one another
    - ancestral relationships

## Trees: an important special case
- a special kind of **directed** graph in which any pair of nodes is connected by at most a **single path**; in particular, there is exactly one path from the root to each node
    - recall the search trees we used to solve knapsack problem
        - looks like a graph starting at a single node, then branching out to more and more nodes downwards
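Whether a directed graph is a rooted tree can be checked mechanically. The sketch below uses a hypothetical helper, `is_rooted_tree`, and assumes the graph is a dict mapping every node to a list of its children. It tests two conditions. First, the root has no parent and every other node has exactly one. Second, every node is reachable from the root. Together these rule out cycles and guarantee a single path from the root to each node.

```python
def is_rooted_tree(children, root):
    """children: dict mapping every node to a list of its children."""
    # Count the parents (in-degree) of every node.
    parents = {node: 0 for node in children}
    for kids in children.values():
        for kid in kids:
            parents[kid] += 1
    # The root has no parent; every other node has exactly one.
    if parents[root] != 0:
        return False
    if any(count != 1 for node, count in parents.items() if node != root):
        return False
    # Every node must be reachable from the root.
    seen, stack = {root}, [root]
    while stack:
        for kid in children[stack.pop()]:
            if kid not in seen:
                seen.add(kid)
                stack.append(kid)
    return len(seen) == len(children)

tree = {'r': ['a', 'b'], 'a': ['c'], 'b': [], 'c': []}
dag = {'r': ['a', 'b'], 'a': ['c'], 'b': ['c'], 'c': []}
print(is_rooted_tree(tree, 'r'))  # True
print(is_rooted_tree(dag, 'r'))   # False: 'c' has two parents, so two paths r->...->c
```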

## Why graphs are so useful
- world is full of networks based on relationships
    - computer networks
    - transportation networks
    - financial networks
    - sewer or water networks
    - political networks
    - criminal networks
    - social networks
- we will see that not only do graphs capture relationships in connected networks of elements, they also support **inference** on those structures
    - finding sequences of links between elements: is there a path from A to B?
    - finding the least expensive path between elements 
        - (a.k.a. shortest path problem)
    - partitioning the graph into sets of connected elements 
        - (a.k.a. graph partition problem)
    - finding the most efficient way to separate sets of connected elements 
        - (a.k.a. the min-cut / max-flow problem)
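The simplest case of splitting a graph into sets of connected elements is finding the connected components of an undirected graph. The general graph partition problem, such as balanced cuts, is harder than this. The sketch below is illustrative only. It assumes the graph is a dict mapping each node to its neighbours, with every edge listed in both directions. The name `connected_components` and the sample data are made up for the example.

```python
def connected_components(adj):
    """adj: dict mapping each node to an iterable of its neighbours (undirected)."""
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        # Flood-fill outward from an unvisited node to collect its whole component.
        component, stack = set(), [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        components.append(component)
    return components

# Hypothetical "who knows whom" data.
friends = {'Ann': ['Bob'], 'Bob': ['Ann', 'Cy'], 'Cy': ['Bob'],
           'Dee': ['Eve'], 'Eve': ['Dee'], 'Fay': []}
print(connected_components(friends))  # three groups: Ann/Bob/Cy, Dee/Eve, Fay
```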

## First reported use of graph theory
- bridges of Königsberg (1735)
- possible to take a walk that traverses each of the 7 bridges exactly once?

## Leonhard Euler's model
- make each island a node
- each bridge an undirected edge
- this model abstracts away irrelevant details
    - size of islands
    - lengths of bridges
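- is there a path that contains each edge exactly once?
    - no! Each time the walk passes through a land mass, it uses one bridge to enter and another to leave. So every node other than the start and end of the walk must touch an even number of bridges, but in Königsberg all four land masses touch an odd number.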
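Euler's answer rests on node degrees. A connected multigraph has a walk that uses every edge exactly once if and only if it has 0 or 2 nodes of odd degree. The sketch below counts degrees for the seven bridges. The land-mass labels A to D are chosen here for illustration: A is the central island, B the north bank, C the south bank and D the eastern island. The connectivity requirement is assumed rather than checked.

```python
from collections import Counter

# The seven bridges as undirected edges between four land masses (labels are illustrative).
bridges = [('A', 'B'), ('A', 'B'), ('A', 'C'), ('A', 'C'),
           ('A', 'D'), ('B', 'D'), ('C', 'D')]

def degrees(edges):
    """Number of edge endpoints at each node (parallel edges each count)."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def has_euler_walk(edges):
    """Assumes the multigraph is connected (Königsberg is).
    Such a walk exists iff 0 or 2 nodes have odd degree."""
    odd = [n for n, d in degrees(edges).items() if d % 2 == 1]
    return len(odd) in (0, 2)

print(dict(degrees(bridges)))   # {'A': 5, 'B': 3, 'C': 3, 'D': 3}
print(has_euler_walk(bridges))  # False: four odd-degree land masses
```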

## Implementing and using graphs
- building graphs
    - nodes
    - edges
    - stitching together to make graphs
- using graphs
    - searching for paths between nodes
    - searching for optimal paths between nodes

## Class Node

```python
class Node(object):
    def __init__(self, name):
        """Assumes name is a string"""
        self.name = name
    def getName(self):
        return self.name
    def __str__(self):
        return self.name
```

## Class Edge

```python
class Edge(object):
    def __init__(self, src, dest):
        """Assumes src and dest are nodes"""
        self.src = src
        self.dest = dest
    def getSource(self):
        return self.src
    def getDestination(self):
        return self.dest
    def __str__(self):
        return self.src.getName() + '->'\
                + self.dest.getName()
```

## Common representations of Digraphs
- digraph is a directed graph
    - edges pass in one direction only
- adjacency matrix
    - rows: source nodes
    - columns: destination nodes
    - Cell[s, d] = 1 if there is an edge from s to d, 0 otherwise
    - note that for a digraph, the matrix is **not** necessarily symmetric (for an undirected graph it always is)
- adjacency list
    - associate with each node a list of destination nodes
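To compare the two representations, the sketch below builds both for the same small digraph. The graph is a subset of the city example used later in these notes. It uses plain strings and lists rather than the lecture's classes. The matrix is derived from the adjacency list, with rows for sources and columns for destinations.

```python
# Two representations of one small digraph (a subset of the lecture's city graph).
names = ['Boston', 'Providence', 'New York']
adj_list = {
    'Boston': ['Providence', 'New York'],
    'Providence': ['Boston', 'New York'],
    'New York': [],
}

# Build the adjacency matrix from the adjacency list:
# row = source, column = destination, 1 if the edge exists.
index = {name: i for i, name in enumerate(names)}
matrix = [[0] * len(names) for _ in names]
for src, dests in adj_list.items():
    for dest in dests:
        matrix[index[src]][index[dest]] = 1

for name, row in zip(names, matrix):
    print(f'{name:>10}', row)

# Not symmetric: Boston->New York exists, New York->Boston does not.
print(matrix[index['Boston']][index['New York']], matrix[index['New York']][index['Boston']])  # 1 0
```

The matrix always uses space proportional to the square of the number of nodes but answers "is there an edge s->d?" in constant time. The list uses space proportional to the number of edges and makes it cheap to iterate over a node's children. The `Digraph` class below uses the adjacency-list form.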

## Class Digraph

```python
class Digraph(object):
    """edges is a dict mapping each node to a list of its children"""
    def __init__(self):
        self.edges = {}
    def addNode(self, node):
        if node in self.edges:
            raise ValueError('Duplicate node')
        else:
            self.edges[node] = []
    def addEdge(self, edge):
        src = edge.getSource()
        dest = edge.getDestination()
        if not (src in self.edges and dest in self.edges):
            raise ValueError('Node not in graph')
        self.edges[src].append(dest)
    def childrenOf(self, node):
        return self.edges[node]
    def hasNode(self, node):
        return node in self.edges
    def getNode(self, name):
        for n in self.edges:
            if n.getName() == name:
                return n
        raise NameError(name)  # reached only if no node has this name
    def __str__(self):
        result = ''
        for src in self.edges:
            for dest in self.edges[src]:
                result = result + src.getName() + '->'\
                        + dest.getName() + '\n'
        return result[:-1]  # omit final newline
```

## class Graph

```python
# Graph is a subclass of Digraph
class Graph(Digraph):
    """Undirected graph: each edge is stored in both directions."""
    def addEdge(self, edge):
        Digraph.addEdge(self, edge)
        rev = Edge(edge.getDestination(), edge.getSource())
        Digraph.addEdge(self, rev)
```

- Graph does not have directionality associated with an edge
    - edges allow passages in either direction
- why is Graph a subclass of Digraph?
- Remember the substitution rule?
    - if client code works correctly using an instance of the supertype, it should also work correctly when an instance of the subtype is substituted for the instance of the supertype
- any program that works with a Digraph will also work with a Graph (but not *vice versa*)

## A classic Graph optimization problem
- shortest path from n1 to n2
    - shortest sequence of edges such that
        - source node of first edge is n1
        - destination of last edge is n2
        - for any two edges e1 and e2 in the sequence, if e2 immediately follows e1, the source of e2 is the destination of e1
- shortest weighted path
    - minimize the sum of the weights of edges in the path
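The conditions above translate directly into a check on a list of edges. This is a sketch with hypothetical helper names, `is_path` and `path_weight`. It represents edges as `(source, destination)` tuples and weights as a dict, and the weights are made-up numbers. Following the definition, an empty sequence is not treated as a path.

```python
def is_path(edges, n1, n2):
    """edges: list of (source, destination) pairs, in order."""
    if not edges:
        return False
    # First edge must start at n1, last edge must end at n2.
    if edges[0][0] != n1 or edges[-1][1] != n2:
        return False
    # Each edge must start where the previous one ended.
    return all(prev[1] == nxt[0] for prev, nxt in zip(edges, edges[1:]))

def path_weight(edges, weight):
    """Sum of edge weights; weight maps (source, destination) -> number."""
    return sum(weight[e] for e in edges)

route = [('Boston', 'New York'), ('New York', 'Chicago')]
w = {('Boston', 'New York'): 2, ('New York', 'Chicago'): 5}  # hypothetical weights
print(is_path(route, 'Boston', 'Chicago'))  # True
print(path_weight(route, w))                # 7
```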

## Some shortest path problems
- finding a route from one city to another
- designing communication networks
- finding a path for a molecule through a chemical labyrinth

## An example
- adjacency list:
    - Boston: Providence, New York
    - Providence: Boston, New York
    - New York: Chicago
    - Chicago: Denver, Phoenix
    - Denver: Phoenix, New York
    - Los Angeles: Boston
    - Phoenix:

## Build the graph

```python
def buildCityGraph(graphType):
    g = graphType()
    for name in ('Boston', 'Providence', 'New York', 'Chicago',
                 'Denver', 'Phoenix', 'Los Angeles'):  # create 7 nodes
        g.addNode(Node(name))
    g.addEdge(Edge(g.getNode('Boston'), g.getNode('Providence')))
    g.addEdge(Edge(g.getNode('Boston'), g.getNode('New York')))
    g.addEdge(Edge(g.getNode('Providence'), g.getNode('Boston')))
    g.addEdge(Edge(g.getNode('Providence'), g.getNode('New York')))
    g.addEdge(Edge(g.getNode('New York'), g.getNode('Chicago')))
    g.addEdge(Edge(g.getNode('Chicago'), g.getNode('Denver')))
    g.addEdge(Edge(g.getNode('Chicago'), g.getNode('Phoenix')))
    g.addEdge(Edge(g.getNode('Denver'), g.getNode('Phoenix')))
    g.addEdge(Edge(g.getNode('Denver'), g.getNode('New York')))
    g.addEdge(Edge(g.getNode('Los Angeles'), g.getNode('Boston')))
    return g
```

## Finding the shortest path
- algorithm 1: **depth-first search** (DFS)
- similar to left-first depth-first method of enumerating a search tree (lecture 2)
- main difference is that a graph might have cycles, so we must keep track of which nodes we have visited to avoid getting stuck in an infinite loop
- note that we are using **divide-and-conquer**: 
    - if we can find a path from a source to an intermediate node, and a path from the intermediate node to the destination, the combination is a path from source to destination

## Depth-First Search
- start at an initial node
- consider all the edges that leave that node, in some order
- follow the first edge, and check to see if at goal node
- if not, repeat the process from new node
- continue until either find goal node, or run out of options
    - when run out of options, backtrack to the previous node and try the next edge, repeating this process
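The steps above can also be sketched without recursion by keeping an explicit stack of partial paths. This illustrative variant is not the lecture's `DFS`. It assumes the graph is a plain dict mapping each city to a list of its children, copied from the adjacency list in the example earlier in these notes. It also stops at the first path it finds rather than searching for the shortest. Popping from the end of a Python list gives last-in, first-out order, which makes the search go deep before it goes wide. When a path cannot be extended, the next pop effectively backtracks.

```python
def dfs_first_path(children, start, goal):
    """children: dict mapping each node to a list of its children.
    Returns the first path found (not necessarily the shortest), or None."""
    stack = [[start]]               # each entry is a partial path
    while stack:
        path = stack.pop()          # most recently added path: go deeper first
        node = path[-1]
        if node == goal:
            return path
        # Push children in reverse so the first-listed child is explored first.
        for child in reversed(children.get(node, [])):
            if child not in path:   # avoid cycles along the current path
                stack.append(path + [child])
    return None                     # options exhausted: no path

cities = {
    'Boston': ['Providence', 'New York'],
    'Providence': ['Boston', 'New York'],
    'New York': ['Chicago'],
    'Chicago': ['Denver', 'Phoenix'],
    'Denver': ['Phoenix', 'New York'],
    'Los Angeles': ['Boston'],
    'Phoenix': [],
}
print('->'.join(dfs_first_path(cities, 'Boston', 'Phoenix')))
# Boston->Providence->New York->Chicago->Denver->Phoenix
print(dfs_first_path(cities, 'Chicago', 'Boston'))  # None
```

The first path found from Boston to Phoenix is a valid path but not the shortest one. This is why the lecture's `DFS` keeps exploring after a success and remembers the best path so far.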

## Depth-First Search (DFS)

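```python
def DFS(graph, start, end, path, shortest, toPrint=False):
    """Assumes graph is a Digraph; start and end are nodes;
       path is the path so far; shortest is the best complete path found so far (or None).
       Returns a shortest path from start to end in graph, or None."""
    path = path + [start]
    if toPrint:
        print('Current DFS path:', printPath(path))
    if start == end:
        return path
    for node in graph.childrenOf(start):  # after each recursive call returns, try the next child
        if node not in path:  # avoid cycles
            # only extend paths that could still beat the best one found so far
            if shortest is None or len(path) < len(shortest):
                newPath = DFS(graph, node, end, path, shortest, toPrint)
                if newPath is not None:
                    shortest = newPath
        elif toPrint:
            print('Already visited', node)
    return shortest
```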

```python
def shortestPath(graph, start, end, toPrint=False):
    return DFS(graph, start, end, [], None, toPrint)
```

- DFS called from a wrapper function: shortestPath
- gets recursion started properly
- provides appropriate abstraction

```python
# helper function that formats a path as a string (it does not print)
def printPath(path):
    """Assumes path is a list of nodes; returns e.g. 'Boston->New York'"""
    return '->'.join(str(node) for node in path)
```

## Test DFS

```python
def testSP(source, destination):
    g = buildCityGraph(Digraph)
    sp = shortestPath(g, g.getNode(source), g.getNode(destination), toPrint=True)
    if sp is not None:
        print('Shortest path from', source, 'to', destination, 'is', printPath(sp))
    else:
        print('There is no path from', source, 'to', destination)
```

```python
testSP('Boston', 'Chicago')
```

```
Current DFS path: Boston
Current DFS path: Boston->Providence
Already visited Boston
Current DFS path: Boston->Providence->New York
Current DFS path: Boston->Providence->New York->Chicago
Current DFS path: Boston->New York
Current DFS path: Boston->New York->Chicago
Shortest path from Boston to Chicago is Boston->New York->Chicago
```


```python
testSP('Chicago', 'Boston')
```

```
Current DFS path: Chicago
Current DFS path: Chicago->Denver
Current DFS path: Chicago->Denver->Phoenix
Current DFS path: Chicago->Denver->New York
Already visited Chicago
Current DFS path: Chicago->Phoenix
There is no path from Chicago to Boston
```


```python
testSP('Boston', 'Phoenix')
```

```
Current DFS path: Boston
Current DFS path: Boston->Providence
Already visited Boston
Current DFS path: Boston->Providence->New York
Current DFS path: Boston->Providence->New York->Chicago
Current DFS path: Boston->Providence->New York->Chicago->Denver
Current DFS path: Boston->Providence->New York->Chicago->Denver->Phoenix
Already visited New York
Current DFS path: Boston->Providence->New York->Chicago->Phoenix
Current DFS path: Boston->New York
Current DFS path: Boston->New York->Chicago
Current DFS path: Boston->New York->Chicago->Denver
Current DFS path: Boston->New York->Chicago->Denver->Phoenix
Already visited New York
Current DFS path: Boston->New York->Chicago->Phoenix
Shortest path from Boston to Phoenix is Boston->New York->Chicago->Phoenix
```


## Breadth-First Search
- start at an initial node
- consider all the edges that leave the node, in some order
- follow the first edge, and check to see if at goal node
- if not, try the next edge from the current node
- continue until either find goal node, or run out of options
    - when run out of edge options, move to next node at same distance from start, and repeat
    - when run out of options, move to next level in the graph (all nodes one step further from start), and repeat

## Algorithm 2: Breadth-First Search (BFS)

```python
printQueue = False  # module-level flag: set to True to print the whole queue at each step

# explore all paths with n hops before exploring any path with more than n hops
def BFS(graph, start, end, toPrint=False):
    """Assumes graph is a Digraph; start and end are nodes.
       Returns a path with the fewest edges from start to end, or None."""
    initPath = [start]
    pathQueue = [initPath]
    while len(pathQueue) != 0:
        if printQueue:
            print('Queue:', len(pathQueue))
            for p in pathQueue:
                print(printPath(p))
        # Get and remove oldest element in pathQueue
        tmpPath = pathQueue.pop(0)
        if toPrint:
            print('Current BFS path:', printPath(tmpPath))
            print()
        lastNode = tmpPath[-1]
        if lastNode == end:
            return tmpPath
        for nextNode in graph.childrenOf(lastNode):
            if nextNode not in tmpPath:  # avoid cycles along this path
                newPath = tmpPath + [nextNode]
                pathQueue.append(newPath)
    return None
```
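The lecture's `BFS` removes the oldest path with `pathQueue.pop(0)`, which shifts every remaining list element. It also only avoids nodes already on the current path. A common variant uses `collections.deque` for constant-time removal from the front. It also keeps a single `visited` set so each node is enqueued at most once. The sketch below assumes a plain dict-of-lists graph (the adjacency list from the city example) rather than the `Digraph` class, and the name `bfs_shortest` is hypothetical. BFS first reaches every node along a path with the fewest hops, so discarding later paths to an already-visited node does not lengthen the result. When there are ties, it may return a different shortest path.

```python
from collections import deque

def bfs_shortest(children, start, goal):
    """children: dict mapping each node to a list of its children.
    Returns a path with the fewest edges from start to goal, or None."""
    queue = deque([[start]])        # FIFO queue of partial paths
    visited = {start}               # nodes already placed on some queued path
    while queue:
        path = queue.popleft()      # oldest path first, O(1) with a deque
        node = path[-1]
        if node == goal:
            return path
        for child in children.get(node, []):
            if child not in visited:
                visited.add(child)
                queue.append(path + [child])
    return None

cities = {
    'Boston': ['Providence', 'New York'],
    'Providence': ['Boston', 'New York'],
    'New York': ['Chicago'],
    'Chicago': ['Denver', 'Phoenix'],
    'Denver': ['Phoenix', 'New York'],
    'Los Angeles': ['Boston'],
    'Phoenix': [],
}
print('->'.join(bfs_shortest(cities, 'Boston', 'Phoenix')))  # Boston->New York->Chicago->Phoenix
```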

## Test BFS

```python
def shortestPath(graph, start, end, toPrint = False):
    """Assumes graph is a Digraph; start and end are nodes
       Returns a shortest path from start to end in graph"""
    return BFS(graph, start, end, toPrint)
```

```python
printQueue = True

testSP('Boston', 'Phoenix')
```

```
Queue: 1
Boston
Current BFS path: Boston

Queue: 2
Boston->Providence
Boston->New York
Current BFS path: Boston->Providence

Queue: 2
Boston->New York
Boston->Providence->New York
Current BFS path: Boston->New York

Queue: 2
Boston->Providence->New York
Boston->New York->Chicago
Current BFS path: Boston->Providence->New York

Queue: 2
Boston->New York->Chicago
Boston->Providence->New York->Chicago
Current BFS path: Boston->New York->Chicago

Queue: 3
Boston->Providence->New York->Chicago
Boston->New York->Chicago->Denver
Boston->New York->Chicago->Phoenix
Current BFS path: Boston->Providence->New York->Chicago

Queue: 4
Boston->New York->Chicago->Denver
Boston->New York->Chicago->Phoenix
Boston->Providence->New York->Chicago->Denver
Boston->Providence->New York->Chicago->Phoenix
Current BFS path: Boston->New York->Chicago->Denver

Queue: 4
Boston->New York->Chicago->Phoenix
Boston->Providence->New York->Chicago->Denver
Boston->Providence->New York->Chicago->Phoenix
Boston->New York->Ch
```

## What about a weighted shortest path?
- want to minimize the sum of the weights of the edges
- DFS can be easily modified to do this
- BFS cannot, since shortest weighted path may have more than the minimum number of hops
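One way the DFS modification could look is sketched below. It assumes a dict-of-lists graph with non-negative weights, and the name `weighted_dfs` and its representation are illustrative, not the lecture's code. Instead of path length, it tracks accumulated cost. It prunes any partial path that already costs at least as much as the best complete path. In the small example, the one-hop path A->D has weight 10, while the three-hop path A->B->C->D has weight 3. A search that minimizes hops would therefore stop at the wrong answer.

```python
import math

def weighted_dfs(graph, start, end, path=None, cost=0, best=(None, math.inf)):
    """graph: dict mapping node -> list of (child, weight) pairs, weights >= 0.
    Returns (path, cost) for a cheapest path from start to end,
    or (None, math.inf) if no path exists."""
    path = ([] if path is None else path) + [start]
    if start == end:
        # A complete path: keep it only if it beats the best so far.
        return (path, cost) if cost < best[1] else best
    for child, w in graph.get(start, []):
        # Skip nodes already on this path (cycles), and prune any extension
        # that already costs at least as much as the best complete path.
        if child not in path and cost + w < best[1]:
            best = weighted_dfs(graph, child, end, path, cost + w, best)
    return best

g = {'A': [('D', 10), ('B', 1)], 'B': [('C', 1)], 'C': [('D', 1)], 'D': []}
print(weighted_dfs(g, 'A', 'D'))  # (['A', 'B', 'C', 'D'], 3)
```

The pruning step relies on weights being non-negative. With negative weights, a path that is expensive so far could still become the cheapest one.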

## Recap
- graphs are cool
    - best way to create a model of many things
        - capture relationships among objects
    - many important problems can be posed as graph optimization problems we already know how to solve
- depth-first and breadth-first search are important algorithms
    - can be used to solve many problems