# Chapter 13 - Connecting Everything with Graphs

## Graphs

Here's a non-directed graph (like Facebook friends).

<img src="imgs/graphs_Part1.png">

Here's a directed graph (like Twitter follows).

<img src="imgs/graphs_Part2.png">

Hash tables are one of the simplest ways to implement a graph.

In [1]:
# non-directed graph hash table (like Facebook friends)
friends = {
    'Alice'   : ['Bob', 'Diana', 'Fred'],
    'Bob'     : ['Alice', 'Cynthia', 'Diana'],
    'Cynthia' : ['Bob'],
    'Diana'   : ['Alice', 'Bob', 'Fred'],
    'Elise'   : ['Fred'],
    'Fred'    : ['Alice', 'Diana', 'Elise']
}

We can look up a person's friends with $O(1)$ efficiency.

In [2]:
friends['Alice']

['Bob', 'Diana', 'Fred']

In [3]:
# directed graph hash table (like Twitter follows)
followees = {
    'Alice'   : ['Bob', 'Cynthia'],
    'Bob'     : ['Cynthia'],
    'Cynthia' : ['Bob']
}

In [4]:
followees['Cynthia']

['Bob']
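
In a directed graph, the table only answers "who does this person follow?". Answering the reverse question means inverting it. The following is a minimal sketch of that idea; `build_followers` is a hypothetical helper name, not something defined elsewhere in this chapter.

```python
# Directed graph from the example above: who each person follows.
followees = {
    'Alice'   : ['Bob', 'Cynthia'],
    'Bob'     : ['Cynthia'],
    'Cynthia' : ['Bob']
}

def build_followers(followees):
    """Invert a followee table: map each person to the people following them."""
    followers = {person: [] for person in followees}
    for person, followed in followees.items():
        for other in followed:
            # setdefault covers people who are followed but follow no one
            followers.setdefault(other, []).append(person)
    return followers

followers = build_followers(followees)
print(followers['Cynthia'])   # ['Alice', 'Bob']
print(followers['Alice'])     # []
```

In the non-directed `friends` table this inversion would be unnecessary, since every friendship already appears in both people's lists.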

In [5]:
class Person:
    def __init__(self, init_name):
        self.name = init_name
        self.friends = []
    
    def add_friend(self, friend):
        self.friends.append(friend)

In [6]:
alice = Person('Alice')

In [7]:
alice.name

'Alice'

In [8]:
bob = Person('Bob')

In [9]:
bob.name

'Bob'

In [10]:
alice.add_friend(bob)

In [11]:
alice.friends

[<__main__.Person at 0x110ee67b8>]

In [12]:
len(alice.friends)

1

In [13]:
len(bob.friends)

0
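
The two lengths above differ because `add_friend` records only one direction of the relationship. For a non-directed graph, both sides need updating. One way to do that is sketched below, assuming a hypothetical `befriend` helper and using two other names from the friends table so that the `alice` and `bob` objects above are left untouched.

```python
class Person:
    def __init__(self, init_name):
        self.name = init_name
        self.friends = []

    def add_friend(self, friend):
        self.friends.append(friend)

def befriend(a, b):
    """Hypothetical helper: record the friendship in both directions."""
    if b not in a.friends:
        a.add_friend(b)
    if a not in b.friends:
        b.add_friend(a)

cynthia = Person('Cynthia')
diana = Person('Diana')
befriend(cynthia, diana)
befriend(diana, cynthia)      # repeating the call adds nothing new
print(len(cynthia.friends), len(diana.friends))   # 1 1
```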

In [14]:
for f in alice.friends:
    print(f.name)
    if len(f.friends) > 0:
        for sub_f in f.friends:
            print(sub_f.name)

Bob


## Breadth-First Search

<img src="imgs/graphs_Part5.png">

In [15]:
class Person:
    def __init__(self, init_name):
        self.name = init_name
        self.friends = []
        self.visited = False
    
    def add_friend(self, friend):
        self.friends.append(friend)
    
    def display_network(self):
        # we keep track of every node we ever visit, so we can
        # reset their 'visited' attribute back to false after
        # our algorithm is complete
        to_reset = [self]
        
        # create the queue
        # it starts out containing the root vertex
        queue = [self]
        self.visited = True
        
        while len(queue) != 0:
            # the current vertex is whatever is 
            # removed from the queue
            current_vertex = queue.pop(0)
            print(current_vertex.name)
            #for q in queue:
            #    print(" ", q.name, end = '')
            #print()
            
            # we add all adjacent vertices of the current vertex
            # to the queue
            for friend in current_vertex.friends:
                if not friend.visited:
                    to_reset.append(friend)
                    queue.append(friend)
                    friend.visited = True
                    
        for node in to_reset:
            node.visited = False

In [16]:
# create Persons
alice  = Person('Alice')
bob    = Person('Bob')
candy  = Person('Candy')
derek  = Person('Derek')
elaine = Person('Elaine')
fred   = Person('Fred')
gina   = Person('Gina')
helen  = Person('Helen')
irena  = Person('Irena')

In [17]:
# add Alice's friends
alice.add_friend(bob)
alice.add_friend(candy)
alice.add_friend(derek)
alice.add_friend(elaine)

In [18]:
# add Bob's friends
bob.add_friend(alice)
bob.add_friend(fred)
#bob.add_friend(irena)

In [19]:
# add Candy's friends
candy.add_friend(alice)

In [20]:
# add Derek's friends
derek.add_friend(alice)
derek.add_friend(gina)
#derek.add_friend(helen)

In [21]:
# add Elaine's friends
elaine.add_friend(alice)

In [22]:
# add Fred's friends
fred.add_friend(bob)
fred.add_friend(helen)

In [23]:
# add Gina's friends
gina.add_friend(derek)
gina.add_friend(irena)

In [24]:
# add Helen's friends
helen.add_friend(fred)

In [25]:
# add Irena's friends
irena.add_friend(gina)

In [26]:
alice.name

'Alice'

In [27]:
alice.friends

[<__main__.Person at 0x110f0d3c8>,
 <__main__.Person at 0x110ee65c0>,
 <__main__.Person at 0x110f0d5c0>,
 <__main__.Person at 0x110f0d5f8>]

In [28]:
for friend in alice.friends:
    print(friend.name)

Bob
Candy
Derek
Elaine


In [29]:
alice.visited

False

In [30]:
for friend in bob.friends:
    print(friend.name)

Alice
Fred


In [31]:
alice.display_network()

Alice
Bob
Candy
Derek
Elaine
Fred
Gina
Helen
Irena


In [32]:
bob.display_network()

Bob
Alice
Fred
Candy
Derek
Elaine
Helen
Gina
Irena


In [33]:
candy.display_network()

Candy
Alice
Bob
Derek
Elaine
Fred
Gina
Helen
Irena


In [34]:
derek.display_network()

Derek
Alice
Gina
Bob
Candy
Elaine
Irena
Fred
Helen


The efficiency of breadth-first search in our graph can be calculated by breaking down the algorithm's steps into two types:

* We remove the vertex from the queue to designate it as the current vertex.
* For each current vertex, we visit each of its adjacent vertices.

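Each vertex is removed from the queue exactly once, so the first type of step happens $V$ times, where $V$ is the number of vertices. That's $O(V)$ in Big O notation.

For the second type of step, each edge is examined twice, once from each of its two endpoints (Alice checks Bob, and later Bob checks Alice). With $E$ edges that is $2E$ steps, and since Big O drops constants, $O(2E)$ => $O(E)$.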

So, the breadth-first search has an efficiency of $O(V + E)$.
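
The two counts can be checked directly. The sketch below is not the `display_network` method from above: it assumes the same friendships stored in a dict keyed by name, uses a `visited` set instead of an attribute on each node, and swaps the list-based queue for `collections.deque`, whose `popleft()` takes constant time (popping from the front of a list shifts every remaining element).

```python
from collections import deque

# Same friendships as the Person objects above, keyed by name.
network = {
    'Alice':  ['Bob', 'Candy', 'Derek', 'Elaine'],
    'Bob':    ['Alice', 'Fred'],
    'Candy':  ['Alice'],
    'Derek':  ['Alice', 'Gina'],
    'Elaine': ['Alice'],
    'Fred':   ['Bob', 'Helen'],
    'Gina':   ['Derek', 'Irena'],
    'Helen':  ['Fred'],
    'Irena':  ['Gina'],
}

def bfs_with_counts(graph, start):
    """Breadth-first traversal that also counts the two kinds of steps."""
    visited = {start}
    queue = deque([start])
    order = []
    dequeues = 0           # one per vertex removed from the queue
    neighbor_checks = 0    # one per adjacency-list entry examined

    while queue:
        current = queue.popleft()
        dequeues += 1
        order.append(current)
        for friend in graph[current]:
            neighbor_checks += 1
            if friend not in visited:
                visited.add(friend)
                queue.append(friend)
    return order, dequeues, neighbor_checks

order, dequeues, neighbor_checks = bfs_with_counts(network, 'Alice')
print(order)
print(dequeues, neighbor_checks)   # 9 16
```

This network has 9 vertices and 8 friendships, so the counts come out to $V = 9$ and $2E = 16$.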

In [35]:
class Person:
    def __init__(self, init_name):
        self.name = init_name
        self.friends = []
        self.visited = False
    
    def add_friend(self, friend):
        self.friends.append(friend)
    
    def display_network(self):
        # we keep track of every node we ever visit, so we can
        # reset their 'visited' attribute back to false after
        # our algorithm is complete
        to_reset = [self]
        
        # create the queue
        # it starts out containing the root vertex
        queue = [self]
        self.visited = True
        
        while len(queue) != 0:
            # the current vertex is whatever is 
            # removed from the queue
            current_vertex = queue.pop(0)
            print(current_vertex.name)
            #for q in queue:
            #    print(" ", q.name, end = '')
            #print()
            
            # we add all adjacent vertices of the current vertex
            # to the queue
            for friend in current_vertex.friends:
                if not friend.visited:
                    to_reset.append(friend)
                    queue.append(friend)
                    friend.visited = True
                    
        for node in to_reset:
            node.visited = False

    def display_network_depth(self, depth):
        
        # current_vertex = self
        
        if depth > 0:
            for friend in self.friends:
                print(friend.name)
                friend.display_network_depth(depth - 1)
            

In [36]:
# create Persons
alice  = Person('Alice')
bob    = Person('Bob')
candy  = Person('Candy')
derek  = Person('Derek')
elaine = Person('Elaine')
fred   = Person('Fred')
gina   = Person('Gina')
helen  = Person('Helen')
irena  = Person('Irena')

In [37]:
# add Alice's friends
alice.add_friend(bob)
alice.add_friend(candy)
alice.add_friend(derek)
alice.add_friend(elaine)

# add Bob's friends
bob.add_friend(alice)
bob.add_friend(fred)

# add Candy's friends
candy.add_friend(alice)

# add Derek's friends
derek.add_friend(alice)
derek.add_friend(gina)

# add Elaine's friends
elaine.add_friend(alice)

# add Fred's friends
fred.add_friend(bob)
fred.add_friend(helen)

# add Gina's friends
gina.add_friend(derek)
gina.add_friend(irena)

# add Helen's friends
helen.add_friend(fred)

# add Irena's friends
irena.add_friend(gina)

In [38]:
alice.display_network_depth(1)

Bob
Candy
Derek
Elaine


In [39]:
bob.display_network_depth(1)

Alice
Fred


In [40]:
alice.display_network_depth(2)

Bob
Alice
Fred
Candy
Alice
Derek
Alice
Gina
Elaine
Alice


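This depth-limited version repeats names (Alice is printed once for each of her friends) because it does not track visited vertices. Come back to this at a later time. The point now is to do a survey of basic data structures / algorithms.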
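
One possible direction for that later revisit is a breadth-first traversal that records each person's distance from the start and stops expanding at a depth limit. This is only a sketch under stated assumptions: the network is stored as a dict keyed by name, and `within_depth` is a hypothetical function, not a replacement for `display_network_depth`.

```python
from collections import deque

# Same friendships as the Person objects above, keyed by name.
network = {
    'Alice':  ['Bob', 'Candy', 'Derek', 'Elaine'],
    'Bob':    ['Alice', 'Fred'],
    'Candy':  ['Alice'],
    'Derek':  ['Alice', 'Gina'],
    'Elaine': ['Alice'],
    'Fred':   ['Bob', 'Helen'],
    'Gina':   ['Derek', 'Irena'],
    'Helen':  ['Fred'],
    'Irena':  ['Gina'],
}

def within_depth(graph, start, max_depth):
    """Return (name, depth) pairs for everyone within max_depth hops of start."""
    depth_of = {start: 0}
    queue = deque([start])
    found = []
    while queue:
        current = queue.popleft()
        if depth_of[current] == max_depth:
            continue                      # do not expand past the limit
        for friend in graph[current]:
            if friend not in depth_of:    # first sighting is the shortest distance
                depth_of[friend] = depth_of[current] + 1
                found.append((friend, depth_of[friend]))
                queue.append(friend)
    return found

for name, depth in within_depth(network, 'Alice', 2):
    print(depth, name)
```

Because each person is recorded only on first sighting, nobody is listed twice and the starting person is never listed at all.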

## Weighted Graphs

In [41]:
class City:
    def __init__(self, init_name):
        self.name = init_name
        self.routes = {} # hash table instead of array
    
    def add_route(self, city, price):
        self.routes[city] = price

In [42]:
dallas = City('Dallas')

In [43]:
toronto = City('Toronto')

In [44]:
louisville = City('Louisville')

In [45]:
dallas.add_route(toronto, 138)

In [46]:
dallas.add_route(louisville, 342)

In [47]:
toronto.add_route(dallas, 216)

In [48]:
dallas.name

'Dallas'

In [49]:
for route,price in dallas.routes.items():
    print(route, route.name, price)

<__main__.City object at 0x110f14438> Toronto 138
<__main__.City object at 0x110f14fd0> Louisville 342
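
With prices stored on each route, the cost of a multi-leg trip is the sum of the weights along the path. A small sketch of that arithmetic follows; `path_price` is a hypothetical helper introduced here for illustration, and the `City` class and prices are repeated from above so the block runs on its own.

```python
class City:
    def __init__(self, init_name):
        self.name = init_name
        self.routes = {}

    def add_route(self, city, price):
        self.routes[city] = price

def path_price(cities):
    """Hypothetical helper: total price of flying through cities in order.

    Returns None if any leg has no direct route.
    """
    total = 0
    for origin, destination in zip(cities, cities[1:]):
        if destination not in origin.routes:
            return None
        total += origin.routes[destination]
    return total

dallas = City('Dallas')
toronto = City('Toronto')
louisville = City('Louisville')
dallas.add_route(toronto, 138)
dallas.add_route(louisville, 342)
toronto.add_route(dallas, 216)

print(path_price([toronto, dallas, louisville]))   # 558
print(path_price([louisville, dallas]))            # None
```

The second call returns `None` because routes are directed: Dallas has a route to Louisville, but no route was added in the other direction.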


## Dijkstra's Algorithm

Here are the rules for Dijkstra's Algorithm:

1. We make the starting vertex our current vertex.
2. We check all the vertices adjacent to the current vertex and calculate and record the weights from the starting vertex to all known locations.
3. To determine the next current vertex, we find the _cheapest unvisited_ known vertex that can be reached from our starting vertex.
4. Repeat steps 2 and 3 until we have visited every vertex in the graph.
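
The rules above can be sketched compactly. This is a minimal sketch rather than the implementation used later in this chapter: it assumes the graph is a dict of dicts keyed by city name (with the flight prices from the example that follows), and it uses Python's `heapq` module as a priority queue to pick the cheapest known vertex in rule 3.

```python
import heapq

# Flight prices from the example in this section, keyed by city name.
flights = {
    'Atlanta': {'Boston': 100, 'Denver': 160},
    'Boston':  {'Chicago': 120, 'Denver': 180},
    'Chicago': {'El Paso': 80},
    'Denver':  {'Chicago': 40, 'El Paso': 140},
    'El Paso': {'Boston': 100},
}

def cheapest_prices(graph, start):
    """Return {city: cheapest known price from start}."""
    prices = {start: 0}
    visited = set()
    heap = [(0, start)]                        # (price from start, city)

    while heap:
        price, current = heapq.heappop(heap)   # rule 3: cheapest known city
        if current in visited:
            continue                           # stale entry, already visited
        visited.add(current)

        # rule 2: record any cheaper price to each adjacent city
        for neighbor, leg_price in graph[current].items():
            candidate = price + leg_price
            if candidate < prices.get(neighbor, float('inf')):
                prices[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return prices

for city, price in sorted(cheapest_prices(flights, 'Atlanta').items()):
    print(city, price)
```

A city can be pushed onto the heap more than once when a cheaper price is found later. The `visited` check discards those outdated entries when they are eventually popped.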

<img src="imgs/graphs_second_half_Part10.png" style="width: 375px;">

To record the cheapest price of the routes from Atlanta to other cities, we will use a table as follows:

| .       | Boston | Chicago | Denver | El Paso |
|---------|--------|---------|--------|---------|
| Atlanta | \?     | \?      | \?     | \?      |

First we make the starting vertex (Atlanta) the current vertex.

Next we check all adjacent vertices and record the weights from the starting vertex (Atlanta) to all known locations. We can see right away that Atlanta -> Boston is \\$100 and Atlanta -> Denver is \\$160.

<img src="imgs/graphs_second_half_Part11.png" style="width: 375px;">

| .       | Boston | Chicago | Denver | El Paso |
|---------|--------|---------|--------|---------|
| Atlanta | \\$100 | \?      | \\$160 | \?      |

Next, we find the cheapest vertex that can be reached from Atlanta that has not yet been visited. We only know how to get to Boston and Denver from Atlanta at this point, and it's cheaper to get to Boston (\\$100) than it is to Denver (\\$160). So we make Boston our current vertex.

<img src="imgs/graphs_second_half_Part12.png" style="width: 375px;">

We now check both routes from Boston, and record all new data about the cost of the routes _**from Atlanta**_--the starting vertex--to all known locations. Boston -> Chicago is \\$120. Since Atlanta -> Boston is \\$100 and Boston -> Chicago is \\$120, the cheapest (and only) known route from Atlanta -> Chicago is \\$220 (Atlanta -> Boston -> Chicago, \\$100 + \\$120).

| .       | Boston | Chicago | Denver | El Paso |
|---------|--------|---------|--------|---------|
| Atlanta | \\$100 | \\$220  | \\$160 | \?      |

We also look at Boston's other route, which is Denver, and that's \\$180. Now we see a new route from Atlanta to Denver: Atlanta -> Boston -> Denver. This new route to Denver is \\$280, but it's not cheaper than the \\$160 we've already recorded in the table, so we don't update the table.

Now that we've explored all outgoing routes from the current vertex (Boston), we look for the unvisited vertex that is cheapest to reach from Atlanta. Boston is cheapest, but visited. Denver (\\$160) is next cheapest, ahead of Chicago (\\$220), and unvisited. So we make Denver our current vertex.

<img src="imgs/graphs_second_half_Part13.png" style="width: 375px;">

We now inspect the routes that leave Denver. Denver -> Chicago is \\$40. We can update our table since we now have a cheaper path from Atlanta to Chicago. The table shows \\$220 Atlanta -> Chicago, but Atlanta -> Denver -> Chicago is \\$200. So we update the table.

| .       | Boston | Chicago | Denver | El Paso |
|---------|--------|---------|--------|---------|
| Atlanta | \\$100 | \\$200  | \\$160 | \?      |
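
Every comparison made so far follows the same rule: add the price of the new leg to the recorded price of the city it leaves from, and overwrite the table entry only if the sum is lower. A sketch of that single step, with `update_if_cheaper` as a hypothetical helper and the table keyed by city name:

```python
def update_if_cheaper(table, via, destination, leg_price):
    """Hypothetical helper: apply one route check; return True if the table changed."""
    candidate = table[via] + leg_price
    if candidate < table.get(destination, float('inf')):
        table[destination] = candidate
        return True
    return False

# Table state after visiting Atlanta and recording Boston -> Chicago.
table = {'Atlanta': 0, 'Boston': 100, 'Chicago': 220, 'Denver': 160}

print(update_if_cheaper(table, 'Boston', 'Denver', 180))   # False: 280 is not below 160
print(update_if_cheaper(table, 'Denver', 'Chicago', 40))   # True: 200 is below 220
print(table['Chicago'])                                    # 200
```

A city that is not in the table yet is treated as infinitely expensive, so the first route found to it is always recorded.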

There's a new city revealed by a flight out of Denver: El Paso. The cheapest path to El Paso would be \\$300 Atlanta -> Denver -> El Paso. We can add this to the table.

| .       | Boston | Chicago | Denver | El Paso |
|---------|--------|---------|--------|---------|
| Atlanta | \\$100 | \\$200  | \\$160 | \\$300  |

There are now two unvisited vertices: Chicago and El Paso. Since Atlanta -> Chicago (\\$200) is cheaper than Atlanta -> El Paso (\\$300), Chicago becomes the current vertex. 

<img src="imgs/graphs_second_half_Part14.png" style="width: 375px;">

The Chicago vertex has just one outbound flight (to El Paso) and we now have a new route from Atlanta -> El Paso, and this new route (Atlanta -> Denver -> Chicago -> El Paso, \\$280) is cheaper than the \\$300 route we have in our table. So we update the table.

| .       | Boston | Chicago | Denver | El Paso |
|---------|--------|---------|--------|---------|
| Atlanta | \\$100 | \\$200  | \\$160 | \\$280  |

There's only one known city left to make the current vertex: El Paso.

<img src="imgs/graphs_second_half_Part15.png" style="width: 375px;">

El Paso has only one outbound route, and that is a \\$100 flight to Boston. This route doesn't reveal any cheaper routes from Atlanta to anywhere, so we don't need to modify the table.

Since we've visited every vertex and checked it off, we now know the cheapest path from Atlanta to every other city. The algorithm is now complete, and our resulting table reveals the cheapest price from Atlanta to every other city on the map.

| .       | Boston | Chicago | Denver | El Paso |
|---------|--------|---------|--------|---------|
| Atlanta | \\$100 | \\$200  | \\$160 | \\$280  |

In [50]:
class City:
    def __init__(self, init_name):
        self.name = init_name
        # for adjacent nodes, we're now using a hash table
        self.routes = {}
        # for example, if this were Atlanta, its routes would be
        # {boston:100, denver:160}
    
    def add_route(self, city, price_info):
        self.routes[city] = price_info

In [51]:
atlanta = City('Atlanta')
boston  = City('Boston')
chicago = City('Chicago')
denver  = City('Denver')
el_paso = City('El Paso')

In [52]:
atlanta.add_route(boston, 100)
atlanta.add_route(denver, 160)
boston.add_route(chicago, 120)
boston.add_route(denver, 180)
chicago.add_route(el_paso, 80)
denver.add_route(chicago, 40)
denver.add_route(el_paso, 140)
el_paso.add_route(boston, 100)

In [53]:
def dijkstra(starting_city, other_cities):
    # the `routes_from_city` hash table below holds, for each
    # destination, the cheapest known price from the starting city
    # and the city visited immediately before that destination
    routes_from_city = {}
    # the format of this data is:
    # {city: [price, other city which immediately precedes this
    #  city along the path from the original city]}
    
    # in our example this will end up being:
    # {atlanta: [0, atlanta],
    #  boston:  [100, atlanta],
    #  chicago: [200, denver],
    #  denver:  [160, atlanta],
    #  el_paso: [280, chicago]}
    
    # since it costs nothing to get to the starting city
    # from the starting city:
    routes_from_city[starting_city] = [0, starting_city]
    
    # when initializing our data, we give every other city an
    # infinite cost, since the cost and the path to get to each
    # of them is currently unknown:
    for city in other_cities:
        routes_from_city[city] = [float('inf'), None]
    
    # in our example, the routes_from_city starts out as:
    # {atlanta: [0, atlanta],
    #  boston:  [inf, None],
    #  chicago: [inf, None],
    #  denver:  [inf, None],
    #  el_paso: [inf, None]}
    
    # we keep track of visited cities in this array
    visited_cities = []
    
    # we begin visiting the starting city by making it the
    # current_city
    current_city = starting_city
    
    # we launch the heart of the algorithm, which is a loop
    # that visits each city
    while current_city is not None:
        
        # we officially visit the current city
        visited_cities.append(current_city)
        
        # we check each route from the current city
        for city,price_info in current_city.routes.items():
            # if the route from starting_city to the other city
            # is cheaper than currently recorded in 
            # routes_from_city, we update it
            if routes_from_city[city][0] >\
            price_info + routes_from_city[current_city][0]:
                routes_from_city[city] =\
                [price_info + routes_from_city[current_city][0], 
                 current_city]
        
        # determine which city to visit next
        current_city = None
        cheapest_route_from_current_city = float('inf')
        # check all available routes
        for city,price_info in routes_from_city.items():
            # if this route is the cheapest from this city, and
            # it hasn't yet been visited, it should be marked as
            # the city we'll visit next
            if price_info[0] <\
            cheapest_route_from_current_city and\
            city not in visited_cities:
                cheapest_route_from_current_city = price_info[0]
                current_city = city
        
    return routes_from_city
            

In [54]:
routes = dijkstra(atlanta, [boston, chicago, denver, el_paso])
for city,price_info in routes.items():
    print(city.name, price_info[0])

Atlanta 0
Boston 100
Chicago 200
Denver 160
El Paso 280
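
The output above prints only the prices, but each entry also stores the city that immediately precedes the destination, which is enough to recover the full route. The sketch below assumes a copy of the final table keyed by city name (so it runs on its own) and a hypothetical `build_path` helper that walks those links backwards.

```python
# Final table in the same [price, preceding city] format that
# dijkstra() returns, keyed by name here to keep the sketch self-contained.
routes_by_name = {
    'Atlanta': [0,   'Atlanta'],
    'Boston':  [100, 'Atlanta'],
    'Chicago': [200, 'Denver'],
    'Denver':  [160, 'Atlanta'],
    'El Paso': [280, 'Chicago'],
}

def build_path(routes, start, destination):
    """Hypothetical helper: follow preceding-city links back to start."""
    path = [destination]
    while path[-1] != start:
        previous = routes[path[-1]][1]
        if previous is None:
            return None          # destination was never reached from start
        path.append(previous)
    path.reverse()
    return path

print(' -> '.join(build_path(routes_by_name, 'Atlanta', 'El Paso')))
# Atlanta -> Denver -> Chicago -> El Paso
```

The same walk would work on the dictionary returned by `dijkstra`, using `City` objects as keys and reading `.name` when printing.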
