Graphs
======

Introduction
------------

Graphs are another very useful data structure in computer science.  They're closely related to trees: every tree is a graph, but not every graph is a tree.

A graph is a collection of nodes and edges linking those nodes.  If the edges have a direction (as in node A connects to node B, which might represent that you can fly from Auckland to Wellington) then the graph is directed.  Unlike with trees, cycles are possible in graphs.  (An example of a cycle could be that you could fly from Auckland to Wellington to Christchurch and then *back to* Auckland.)
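As a minimal sketch of the flight example above, a directed graph can be written as a plain dictionary that maps each city to the cities it has flights to.  The name `flights` and the routes are illustrative assumptions taken from the example, not a real timetable.

```python
# A directed graph as a dictionary: each city maps to the cities you can fly to.
# The routes mirror the example in the text and are purely illustrative.
flights = {
    "Auckland": ["Wellington"],
    "Wellington": ["Christchurch"],
    "Christchurch": ["Auckland"],
}

# Follow the first outgoing flight from each city until we are back at the start.
start = "Auckland"
route = [start]
current = flights[start][0]
while current != start:
    route.append(current)
    current = flights[current][0]
route.append(start)

print(" -> ".join(route))
# Auckland -> Wellington -> Christchurch -> Auckland

# The edges are directed: there is no flight from Wellington back to Auckland.
print("Auckland" in flights["Wellington"])
# False
```

Because the route ends where it began, it is a cycle, which is something a tree can never contain.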

*What experiences have you had using graphs?*

Nodes and Edges
---------------

We could use exactly the same class to represent nodes as the one we used for trees.  Previously, however, our `Node` class could hold only two edges.  In general, for both trees and graphs, a node can have many connections to other nodes.  We can store these connections as a list on each node.  This method of storing a graph is called an [adjacency list][adj-list].

  [adj-list]: https://en.wikipedia.org/wiki/Adjacency_list

In [2]:
class Node:
    def __init__(self, data):
        self.connections = list()
        self.data = data
        
    def addConnection(self, connection):
        self.connections.append(connection)

Nodes can have attributes.  In our earlier work with trees, each node had a "data" component that we used to store its name ("labrador", or "dog", for example).  A node may also have several attributes.  For example, if a node represented a city, then it could have the name of the city and its latitude and longitude as attributes.

Further, it's possible that edges can have attributes too.  For example, if we're able to fly from Auckland to Wellington, an edge may be used to represent that flight and one possible attribute could be the price.


In [3]:
class Edge:
    def __init__(self, destination, data=None):
        self.destination = destination
        self.data = data
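The following sketch shows one possible way to attach several attributes to a node and a price to an edge, by passing dictionaries as `data`.  The two classes are repeated so the snippet runs on its own; the coordinates are approximate and the price is a made-up value for illustration.

```python
class Node:
    def __init__(self, data):
        self.connections = list()
        self.data = data

    def addConnection(self, connection):
        self.connections.append(connection)

class Edge:
    def __init__(self, destination, data=None):
        self.destination = destination
        self.data = data

# Node attributes stored in a dictionary (coordinates are approximate)
auckland = Node({"name": "Auckland", "latitude": -36.85, "longitude": 174.76})
wellington = Node({"name": "Wellington", "latitude": -41.29, "longitude": 174.78})

# A one-way flight from Auckland to Wellington with a hypothetical price
auckland.addConnection(Edge(wellington, {"price": 99.0}))

for flight in auckland.connections:
    print(auckland.data["name"], "->", flight.destination.data["name"],
          "costs", flight.data["price"])
```

Using a dictionary for `data` is only one option; a dedicated `City` class with named fields would work equally well.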


We will load an example graph by reading some data in CSV format (Python contains a CSV reader in the module [`csv`][csv]).  There is a file called `romania.txt` that contains rows such as
<pre>
  Arad,Zerind,75
</pre>
meaning that there is a road from Arad to Zerind that is 75 kilometers long.  So here we can see that our nodes (cities) will have names as attributes, and the edges (roads) will have distances as attributes.  You could imagine that because of a one-way tunnel, for example, it may not be possible to use a road in both directions.  To represent a two-way road, we could store an edge from, say, Arad to Zerind, and also from Zerind to Arad.
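To make the row format concrete, here is a small sketch that parses the sample row from an in-memory string rather than from `romania.txt`, and stores the road in both directions.  The dictionary `roads` is a hypothetical stand-in for the node and edge classes; note that `csv.reader` yields every field as a string, so the distance is converted with `int()`.

```python
import csv
import io

# An in-memory stand-in for the file, holding the single sample row
sample = io.StringIO("Arad,Zerind,75\n")

# Map each city name to a list of (neighbour, distance) pairs
roads = {}
for name1, name2, length in csv.reader(sample):
    distance = int(length)  # csv.reader returns strings, so convert to a number
    # Store the road in both directions to represent a two-way road
    roads.setdefault(name1, []).append((name2, distance))
    roads.setdefault(name2, []).append((name1, distance))

print(roads)
# {'Arad': [('Zerind', 75)], 'Zerind': [('Arad', 75)]}
```

For a one-way road, only the first of the two `append` calls would be made.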

  [csv]: https://docs.python.org/3.5/library/csv.html

In [4]:
import csv

# Create a dictionary, which maps city names to their Node instances
cities = dict()

def connect(city1, city2, length):
    '''Connects two cities together (assumes a two-way road).'''
    city1.addConnection(Edge(city2, length))
    city2.addConnection(Edge(city1, length))

def getCity(name):
    '''Get city if it already exists, otherwise create the city, add it to the dictionary of cities,
    and return the new city.'''
    if name in cities:
        return cities[name]
    else:
        city = Node(name)
        cities[name] = city
        return city

def read(filename):
    '''Read a csv file with (city1, city2, road length) rows.'''
    with open(filename, 'r') as csvfile:
        csvreader = csv.reader(csvfile)
        for row in csvreader:
            name1 = row[0]
            name2 = row[1]
            length = int(row[2])  # csv.reader yields strings; store the distance as a number
            city1 = getCity(name1)
            city2 = getCity(name2)
            connect(city1, city2, length)
            
# Read our example graph of Romania's cities and roads
read("romania.txt")

# How many cities did we read?
print("Number of cities: "+str(len(cities)))

Number of cities: 20


We now have a graph of the simplified roads in part of Romania.  The data came from figure 2, page 70, of *Artificial Intelligence: A Modern Approach* (AIMA).

  ![Image of simplified roads of Romania](romania.png)
  
We will re-use the code above, which can be found in [romania.py](romania.py).
  

In [5]:
from collections import deque

def breadthFirstGraphTraversal(start):
    # Initialise a queue containing the start node; these nodes represent the "frontier" of our search
    frontier = deque([start])
    
    # Initialise an explored set; we haven't explored anything so far
    explored = set()
    
    # Go through the frontier queue until it's empty
    while frontier:
        # Get the node at the front of the queue (the left side) and process it
        node = frontier.popleft()
        print("Visiting node "+node.data)
        
        # Add the node to the explored set
        explored.add(node)
        
        # Add the node's neighbours that are neither explored nor already queued
        # to the end of the queue (the right side)
        for road in node.connections:
            destination = road.destination
            if destination not in explored and destination not in frontier:
                frontier.append(destination)

In [6]:
arad = cities["Arad"]
breadthFirstGraphTraversal(arad)

Visiting node Arad
Visiting node Zerind
Visiting node Timisoara
Visiting node Sibiu
Visiting node Oradea
Visiting node Lugoj
Visiting node Fagaras
Visiting node Rimnicu Vilcea
Visiting node Mehadia
Visiting node Bucharest
Visiting node Pitesti
Visiting node Craiova
Visiting node Drobeta
Visiting node Giurgiu
Visiting node Urziceni
Visiting node Hirsova
Visiting node Vaslui
Visiting node Eforie
Visiting node Iasi
Visiting node Neamt


**Exercises**
- Implement `depthFirstGraphTraversal()`.
- How would you modify the traversal functions in order to report the distances from the start city to the city being visited?
- Compare adjacency lists and adjacency matrices.  When would you use one over the other?