### Graphs

A graph is a data structure that specializes in `relationships`.  
Stored in a `hash` table, a person's connections can be looked up in O(1) steps on average.  

While all trees are graphs, `not` all graphs are trees.  
A graph is a tree when it has `no` cycles, and all nodes are connected. 
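
The two conditions above can be checked in code. This is a minimal sketch, assuming an undirected graph stored as a dict of neighbor lists, with every edge listed in both directions and no self-loops. The name `is_tree` and the sample graphs are made up for illustration.

```python
def is_tree(graph):
    """Return True if an undirected graph (dict of neighbor lists) is connected and has no cycles."""
    if not graph:
        return False

    # Each undirected edge appears twice, once per endpoint
    edge_count = sum(len(neighbors) for neighbors in graph.values()) // 2

    # A connected graph with n vertices has no cycles only if it has exactly n - 1 edges
    if edge_count != len(graph) - 1:
        return False

    # Check that every vertex is reachable from an arbitrary starting vertex
    start = next(iter(graph))
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)

    return len(seen) == len(graph)

chain = {'A': ['B'], 'B': ['A', 'C'], 'C': ['B']}
triangle = {'A': ['B', 'C'], 'B': ['A', 'C'], 'C': ['A', 'B']}

print(is_tree(chain))     # True
print(is_tree(triangle))  # False
```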

For example, in a social network, each person is represented by a `node`.  
Each `line` represents a friendship with another person.  

![](./images/graphs_friendship.png)

In [1]:
friends = {
    'Alice': ['Bob', 'Diana', 'Fred'],
    'Bob': ['Alice', 'Cynthia', 'Diana'],
    'Cynthia': ['Bob'],
    'Diana': ['Alice', 'Bob', 'Fred'],
    'Elise': ['Fred'],
    'Fred': ['Alice', 'Diana', 'Elise'],
}

print(friends['Alice'])

['Bob', 'Diana', 'Fred']


### Directed Graphs / Hash Table

Relationships are `not` always mutual.  
For example, Alice may follow Bob, but Bob `doesn't` have to follow Alice.  

![directed graphs](./images/graphs_directed.png)

In [2]:
followees = {
    'Alice': ['Bob', 'Diana'],
    'Bob': ['Cynthia'],
    'Cynthia': ['Bob'],
}

assert 'Bob' in followees['Alice']
assert 'Alice' not in followees['Bob']
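
Because the edges point one way, the table answers "who does Alice follow?" but not "who follows Bob?".  
A possible way to answer the second question is to invert the table. This is only a sketch; `invert_follows` is a made-up helper name.

```python
from collections import defaultdict

followees = {
    'Alice': ['Bob', 'Diana'],
    'Bob': ['Cynthia'],
    'Cynthia': ['Bob'],
}

def invert_follows(graph):
    """Build a followers table by reversing every edge of a followees table."""
    followers = defaultdict(list)
    for person, followed in graph.items():
        for other in followed:
            followers[other].append(person)
    return dict(followers)

print(invert_follows(followees))
# {'Bob': ['Alice', 'Cynthia'], 'Diana': ['Alice'], 'Cynthia': ['Bob']}
```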

### Undirected Graph / Object Oriented

In graph jargon, each node is called a `vertex`.  
The lines between the vertices are called `edges`.  

We can also `implement` a graph using an object-oriented approach.  
In our social network example, each `vertex` represents a person.  

The `value` might be a string with the person's name.  
The `adjacent_vertices` list contains all the vertices this vertex connects to.  

In [3]:
class Vertex:
    def __init__(self, name):
        self.value = name
        self.adjacent_vertices = []

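    def add_adjacent(self, vertex):
        # Skip the append if the edge already exists, to avoid duplicate entries
        if vertex not in self.adjacent_vertices:
            self.adjacent_vertices.append(vertex)

        # The friendship is mutual (undirected graph), so we also add self to the other vertex
        if self not in vertex.adjacent_vertices:
            vertex.adjacent_vertices.append(self)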

a = Vertex('Alice')
b = Vertex('Bob')
c = Vertex('Cynthia')

a.add_adjacent(b); a.add_adjacent(c)
b.add_adjacent(c)
c.add_adjacent(b)

print('Alice vertices:', [x.value for x in a.adjacent_vertices])
print('Bob vertices:', [x.value for x in b.adjacent_vertices])

Alice vertices: ['Bob', 'Cynthia']
Bob vertices: ['Alice', 'Cynthia']


### Graph Search

One of the `most` common operations is searching for a particular vertex.  
When applied to graphs, the term search usually has a more `specific` connotation.  

Given access to one vertex, we try to find our `way` to another vertex.  
There can be multiple `paths` from one vertex to another.
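
To make the idea of multiple paths concrete, here is a small sketch that lists every path between two people without repeating a vertex.  
The `all_paths` function and the three-person `small_network` are assumptions made for this example only.

```python
def all_paths(graph, start, goal, path=None):
    """Yield every path from start to goal that visits no vertex twice."""
    path = (path or []) + [start]

    if start == goal:
        yield path
        return

    for neighbor in graph[start]:
        if neighbor not in path:  # skip vertices already on this path
            yield from all_paths(graph, neighbor, goal, path)

small_network = {
    'Alice': ['Bob', 'Diana'],
    'Bob': ['Alice', 'Diana'],
    'Diana': ['Alice', 'Bob'],
}

for p in all_paths(small_network, 'Alice', 'Diana'):
    print(p)
# ['Alice', 'Bob', 'Diana']
# ['Alice', 'Diana']
```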

### DFS / Traversal

The `Depth First Search` algorithm is quite similar to binary tree traversal.  
The key to any graph search algorithm is keeping `track` of the vertices we have already visited.  
Otherwise, we can end up in an `infinite` cycle.

![dfs](./images/graphs_dfs.png)


In [4]:
class Vertex:
    def __init__(self, name):
        self.value = name
        self.adjacent_vertices = []

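    def add(self, vertex):
        if vertex not in self.adjacent_vertices: # avoid duplicate edges
            self.adjacent_vertices.append(vertex)

        if self not in vertex.adjacent_vertices: # mutual
            vertex.adjacent_vertices.append(self)

        return self # allows chaining: a.add(b).add(c)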

a = Vertex('Alice')
b = Vertex('Bob')
c = Vertex('Candy')
d = Vertex('Derek')
e = Vertex('Elaine')
f = Vertex('Fred')
g = Vertex('Gina')
h = Vertex('Helen')
i = Vertex('Irena')

a.add(b).add(c).add(d).add(e)
b.add(f); c.add(h); d.add(e).add(g); e.add(d)
f.add(h)
g.add(i)
h.add(c)


def dfs_traverse(vertex, visited=None):
    # Create a fresh list per top-level call; a mutable default would be shared between calls
    if visited is None:
        visited = []

    visited.append(vertex.value)

    for v in vertex.adjacent_vertices:
        if v.value in visited:
            continue

        dfs_traverse(v, visited) # recursion
    return visited

print(dfs_traverse(a))
print(dfs_traverse(f))

['Alice', 'Bob', 'Fred', 'Helen', 'Candy', 'Derek', 'Elaine', 'Gina', 'Irena']
['Fred', 'Bob', 'Alice', 'Candy', 'Helen', 'Derek', 'Elaine', 'Gina', 'Irena']
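
Depth-first traversal can also be written without recursion, using an explicit stack.  
The sketch below assumes the graph is a plain dict of neighbor lists instead of `Vertex` objects, and it tracks visited names in a set so membership checks stay cheap. The names `dfs_iterative` and `demo_graph` are hypothetical.

```python
def dfs_iterative(graph, start):
    """Return vertex names in depth-first order, using a stack instead of recursion."""
    visited = set()
    order = []
    stack = [start]

    while stack:
        name = stack.pop()
        if name in visited:
            continue

        visited.add(name)
        order.append(name)

        # Push neighbors in reverse so the first listed neighbor is explored first
        stack.extend(reversed(graph[name]))

    return order

demo_graph = {
    'Alice': ['Bob', 'Candy'],
    'Bob': ['Alice', 'Fred'],
    'Candy': ['Alice', 'Fred'],
    'Fred': ['Bob', 'Candy'],
}

print(dfs_iterative(demo_graph, 'Alice'))
# ['Alice', 'Bob', 'Fred', 'Candy']
```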


### DFS / Search One

We can actually search for a `particular` vertex. 

In [5]:
def dfs(vertex, search_value, visited=None):

    # Create a fresh list per top-level call; a mutable default would leak state between searches
    if visited is None:
        visited = []

    # Return the original vertex if it happens to be the one we are searching for:
    if vertex.value == search_value:
        return vertex

    visited.append(vertex.value)

    for v in vertex.adjacent_vertices:
        if v.value in visited:
            continue

        # Return the adjacent vertex if it is the one we are searching for:
        if v.value == search_value:
            return v

        search_vertex = dfs(v, search_value, visited) # recursion

        if search_vertex is not None:
            return search_vertex

    return None

print('Helen =', dfs(a, 'Helen'))
print('Derek =', dfs(a, 'Derek'))
print('Unknown =', dfs(a, 'Unknown'))


Helen = <__main__.Vertex object at 0x0000017C45AEAAF0>
Derek = <__main__.Vertex object at 0x0000017C45A9EC40>
Unknown = None


### BFS / Traversal

Unlike DFS, Breadth First Search doesn't use recursion; it uses a `queue`.  
A queue is a `FIFO` data structure (first in, first out).   

We can start at `any` vertex within the graph.  
We add it to the queue, then run a `loop` while the queue is not empty.

![bfs](./images/graphs_bfs.png)

In [6]:
from collections import deque

def bfs_traverse(starting_vertex):
    queue = deque()

    visited = []
    visited.append(starting_vertex.value)

    # Add starting vertex to the queue
    queue.append(starting_vertex)
    
    # While queue is not empty
    while queue:

        # Remove the first vertex and make it the current vertex
        current_vertex = queue.popleft()

        # Iterate over adjacent vertices
        for v in current_vertex.adjacent_vertices:
            if v.value in visited:
                continue
            
            # Add adjacent vertex to visited
            visited.append(v.value)

            # Add adjacent vertex to queue
            queue.append(v)

    return visited
            
print([x for x in bfs_traverse(a)])
print([x for x in bfs_traverse(f)])


['Alice', 'Bob', 'Candy', 'Derek', 'Elaine', 'Fred', 'Helen', 'Gina', 'Irena']
['Fred', 'Bob', 'Helen', 'Alice', 'Candy', 'Derek', 'Elaine', 'Gina', 'Irena']


### DFS vs BFS

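When we want to move far `away` quickly, we use depth-first search.  
When we want to stay `close` to the starting point, we use breadth-first search.  
For example, finding Alice's `direct` friends means staying close, so BFS is the natural fit; the cell below gets the same result with a DFS that stops at depth 1.  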

In [7]:
def dfs_traverse2(vertex, visited=None, level=0):

    if visited is None: 
        visited = []

    if level == 2:
        return visited # Look Here

    if level > 0:
        visited.append(vertex.value)

    for v in vertex.adjacent_vertices:
        if v.value in visited:
            continue
            
        dfs_traverse2(v, visited, level+1) # recursion
    return visited

print('Alice\'s direct friends:', [x for x in dfs_traverse2(a)])
print('Helen\'s direct friends:', [x for x in dfs_traverse2(h)])

Alice's direct friends: ['Bob', 'Candy', 'Derek', 'Elaine']
Helen's direct friends: ['Candy', 'Fred']
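
Since BFS visits vertices in order of distance from the start, a breadth-first variant can record how many hops away each person is.  
This is a sketch under assumptions: the graph is a plain dict of neighbor lists, and `bfs_distances` and `hop_graph` are made-up names.

```python
from collections import deque

def bfs_distances(graph, start):
    """Return a dict mapping each reachable vertex to its number of hops from start."""
    distances = {start: 0}
    queue = deque([start])

    while queue:
        current = queue.popleft()
        for neighbor in graph[current]:
            if neighbor not in distances:  # first time we reach this vertex
                distances[neighbor] = distances[current] + 1
                queue.append(neighbor)

    return distances

hop_graph = {
    'Alice': ['Bob', 'Candy'],
    'Bob': ['Alice', 'Fred'],
    'Candy': ['Alice', 'Helen'],
    'Fred': ['Bob', 'Helen'],
    'Helen': ['Candy', 'Fred'],
}

hops = bfs_distances(hop_graph, 'Alice')
print(hops)
# {'Alice': 0, 'Bob': 1, 'Candy': 1, 'Fred': 2, 'Helen': 2}

print([name for name, d in hops.items() if d == 1])
# ['Bob', 'Candy']
```

Direct friends are the vertices at distance 1, friends of friends are at distance 2, and so on.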


### Weighted Graphs

A weighted graph attaches additional information to the `edges` of the graph.  
For example, here is a basic map of several major cities and the `distance` between them.  
Weighted graphs can `also` be directional: in the code below, the price from Dallas to Toronto differs from the price back.  

![](./images/graphs_weighted.png) ![](./images/graphs_weighted_directonal.png)


In [46]:
class WeightedVertex:
    def __init__(self, value):
        self.value = value
        self.adjacent_vertices = {}

    def add_adjacent(self, vertex, price):
        self.adjacent_vertices[vertex.value] = price

d = WeightedVertex('Dallas')
t = WeightedVertex('Toronto')

d.add_adjacent(t, 138)
t.add_adjacent(d, 216)

print('Dallas =>', d.adjacent_vertices)
print('Toronto =>', t.adjacent_vertices)

Dallas => {'Toronto': 138}
Toronto => {'Dallas': 216}
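
The same weighted, directed edges can also be kept in a nested hash table, in the spirit of the earlier `followees` example.  
This is just an alternative sketch; `prices` and `route_price` are names invented for it.

```python
# Outer key: origin city. Inner dict: destination city -> weight of that edge.
prices = {
    'Dallas': {'Toronto': 138},
    'Toronto': {'Dallas': 216},
}

def route_price(table, origin, destination):
    """Return the weight of the edge origin -> destination, or None if there is no such edge."""
    return table.get(origin, {}).get(destination)

print(route_price(prices, 'Dallas', 'Toronto'))  # 138
print(route_price(prices, 'Toronto', 'Dallas'))  # 216
print(route_price(prices, 'Dallas', 'Boston'))   # None
```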


### Dijkstra's Algorithm / Shortest Path

Weighted graphs also come with `powerful` algorithms that solve different problems.  
For example, we may want to find the `lowest` cost among different flight paths.  

This kind of puzzle is known as the `shortest` path problem.  
One of the most famous algorithms for this was discovered by Edsger `Dijkstra` (1959).  

We are going to find the `cheapest` price from Atlanta to El Paso.  
As a free `bonus`, we will also end up with the cheapest prices from Atlanta to all other cities.

![](./images/graphs_dijkstra.png)

In [120]:
class City: # weighted graph vertex

    def __init__(self, name):
        self.name = name
        self.routes = {}

    def add_route(self, city, price):
        self.routes[city.name] = price

a = City('Atlanta')
b = City('Boston')
c = City('Chicago')
d = City('Denver')
e = City('El Paso')

a.add_route(b, 100); a.add_route(d, 160)
b.add_route(c, 120); b.add_route(d, 180)
c.add_route(e, 80)
d.add_route(c, 40); d.add_route(e, 140)
e.add_route(b, 100)

Cites = {'Atlanta': a, 'Boston': b, 'Chicago': c, 'Denver': d, 'El Paso': e}

for name, obj in Cites.items(): 
    print(name, obj.routes)

Atlanta {'Boston': 100, 'Denver': 160}
Boston {'Chicago': 120, 'Denver': 180}
Chicago {'El Paso': 80}
Denver {'Chicago': 40, 'El Paso': 140}
El Paso {'Boston': 100}


In [118]:
"""
    The CORE algorithm /
    Get the cheapest table, containing all the cheapest prices 
    to get to each city from the STARTING point
"""

def dijkstra_cheapest_prices(starting_city, debug=False):
    
    C = {} # Cheapest prices (table)
    U = [] # Unvisited cities (list)
    V = [] # Visited cities (list)

    current = starting_city
    C[current.name] = 0 # The price to itself is 0

    # Loop as long as we have unvisited cities
    while current:

        if debug: print('\nCurrent:', current.name)
        
        V.append(current.name)

        if current.name in U:
            U.remove(current.name)

        # Loop adjacent cities
        for name, price in current.routes.items():

            if debug: print(' Adjacent:', name, price)
            
            # Queue the city only once, and only if it hasn't been visited
            if name not in V and name not in U:
                U.append(name)

            # Price of getting from STARTING to ADJACENT city 
            # using CURRENT city as the second-to-last stop:
            price_through_current_city = C[current.name] + price

            # If the price is the cheapest one we've found so far:
            if name not in C or price_through_current_city < C[name]:
                C[name] = price_through_current_city
                
            if debug: print('  ', starting_city.name, '=>', name, '/ cheapest =', C[name])

        # Break the loop when there are no more unvisited cities
        if len(U) == 0:    
            break       
        
        # Set next unvisited city, the cheapest one
        current_name = min(U, key=lambda city: C[city])
        current = Cites[current_name]
    
    return C

cheapest = dijkstra_cheapest_prices(a)
print("Cheapest Atlanta =>", cheapest)

cheapest = dijkstra_cheapest_prices(b)
print("Cheapest Boston =>", cheapest)

Cheapest Atlanta => {'Atlanta': 0, 'Boston': 100, 'Denver': 160, 'Chicago': 200, 'El Paso': 280}
Cheapest Boston => {'Boston': 0, 'Chicago': 120, 'Denver': 180, 'El Paso': 200}
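
Picking the next city with `min` over the unvisited list scans the whole list every time.  
A common alternative is a priority queue. The sketch below uses Python's `heapq` on a plain nested dict instead of `City` objects; it is not the notebook's implementation, `dijkstra_heap` and `routes` are made-up names, and it assumes no price is negative.

```python
import heapq

def dijkstra_heap(routes, start):
    """Return the cheapest known price from start to every reachable city."""
    cheapest = {start: 0}
    heap = [(0, start)]  # (price so far, city)
    visited = set()

    while heap:
        price, city = heapq.heappop(heap)
        if city in visited:
            continue  # stale entry: this city was already settled at a lower price
        visited.add(city)

        for neighbor, leg_price in routes.get(city, {}).items():
            candidate = price + leg_price
            if neighbor not in cheapest or candidate < cheapest[neighbor]:
                cheapest[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))

    return cheapest

routes = {
    'Atlanta': {'Boston': 100, 'Denver': 160},
    'Boston': {'Chicago': 120, 'Denver': 180},
    'Chicago': {'El Paso': 80},
    'Denver': {'Chicago': 40, 'El Paso': 140},
    'El Paso': {'Boston': 100},
}

print(dijkstra_heap(routes, 'Atlanta'))
# {'Atlanta': 0, 'Boston': 100, 'Denver': 160, 'Chicago': 200, 'El Paso': 280}
```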


In [132]:
def dijkstra_shortest_path(starting_city, destination):

    C = {} # Cheapest prices (table)
    U = [] # Unvisited cities (list)
    V = [] # Visited cities (list)
    P = {} # Cheapest previous stopover city (table) - Look Here

    current = starting_city
    C[current.name] = 0 # The price to itself is 0

    # Loop as long as we have unvisited cities
    while current:        
        V.append(current.name)

        if current.name in U:
            U.remove(current.name)

        # Loop adjacent cities
        for name, price in current.routes.items():
            
            # Queue the city only once, and only if it hasn't been visited
            if name not in V and name not in U:
                U.append(name)

            # Price of getting from STARTING to ADJACENT city 
            # using CURRENT city as the second-to-last stop:
            price_through_current_city = C[current.name] + price

            # If the price is the cheapest one we've found so far:
            if name not in C or price_through_current_city < C[name]:

                C[name] = price_through_current_city
                P[name] = current.name # Look Here
                
        # Break the loop when there are no more unvisited cities
        if len(U) == 0:    
            break       
        
        # Set next unvisited city, the cheapest one
        current_name = min(U, key=lambda city: C[city])
        current = Cites[current_name]
       
    # We have completed the core algorithm.
    # At this point, the cheapest table contains all the cheapest prices 
    # to get to each city from the STARTING point

    # We build the shortest path using an array:
    shortest_path = []

    # Work backwards from final destination
    current_name = destination.name

    # Loop until we reach the starting city:
    while current_name != starting_city.name:

        # Add each current_name to shortest_path
        shortest_path.append(current_name)

        # Follow each city to its previous stopover city
        current_name = P[current_name]

    # Add the starting city to the path
    shortest_path.append(starting_city.name)

    # We reverse the path to see it from beginning to end
    return list(reversed(shortest_path))
 
print(dijkstra_shortest_path(a, b))
print(dijkstra_shortest_path(a, c))
print(dijkstra_shortest_path(a, d))
print(dijkstra_shortest_path(a, e))

print(dijkstra_shortest_path(a, e), dijkstra_cheapest_prices(a)['El Paso'])
print(dijkstra_shortest_path(e, d), dijkstra_cheapest_prices(e)['Denver'])

['Atlanta', 'Boston']
['Atlanta', 'Denver', 'Chicago']
['Atlanta', 'Denver']
['Atlanta', 'Denver', 'Chicago', 'El Paso']
['Atlanta', 'Denver', 'Chicago', 'El Paso'] 280
['El Paso', 'Boston', 'Denver'] 280
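
The backward walk through the previous-stopover table assumes the destination can be reached from the starting city.  
A guarded version might look like the sketch below. `build_path` is a hypothetical helper, `previous` mirrors the table built when starting from Atlanta, and 'Miami' is a made-up city that is not in the graph. The sketch also assumes every chain in the table leads back to the starting city.

```python
def build_path(previous, start, destination):
    """Rebuild the path start -> destination from a previous-stopover table, or return None."""
    # A destination with no recorded previous stopover was never reached
    if destination != start and destination not in previous:
        return None

    path = [destination]
    while path[-1] != start:
        path.append(previous[path[-1]])  # step back to the previous stopover

    path.reverse()  # we collected the cities from end to beginning
    return path

previous = {
    'Boston': 'Atlanta',
    'Denver': 'Atlanta',
    'Chicago': 'Denver',
    'El Paso': 'Chicago',
}

print(build_path(previous, 'Atlanta', 'El Paso'))
# ['Atlanta', 'Denver', 'Chicago', 'El Paso']

print(build_path(previous, 'Atlanta', 'Miami'))
# None
```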
