# DSC 40B: Graph Search Algorithms
Daniel Lee, Udaikaran Singh, and Justin Eldridge

In this notebook, we will get practice with the two graph search algorithms covered in class: Breadth-First Search (BFS) and Depth-First Search (DFS). First, the usual imports:

In [1]:
import collections

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import ipywidgets as widgets

The following module contains helper functions used to load data and make plots. Have a look through this file if you are interested in seeing how these things are done.

In [2]:
import demo

We will be using a "dict-of-dicts" graph representation, implemented in the `dsc40graph` module. Take a look at this file if you are interested in knowing more about how the graph data structure is implemented.

In [3]:
from dsc40graph import UndirectedGraph, DirectedGraph
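The internals of `dsc40graph` are not shown in this notebook. To illustrate the "dict-of-dicts" idea, here is a minimal sketch of an undirected graph stored as a dictionary that maps each node to a dictionary of its neighbors. The class name `ToyUndirectedGraph` and its methods are hypothetical. They only loosely mirror the calls used below (`.nodes`, `.neighbors(u, sort=...)`), and the real module may work differently.

```python
class ToyUndirectedGraph:
    """A hypothetical, minimal dict-of-dicts undirected graph (not dsc40graph)."""

    def __init__(self):
        # maps each node to a dict whose keys are that node's neighbors;
        # the inner values could hold edge attributes (unused here, so None)
        self._adj = {}

    def add_node(self, u):
        self._adj.setdefault(u, {})

    def add_edge(self, u, v):
        self.add_node(u)
        self.add_node(v)
        # store the edge in both directions since the graph is undirected
        self._adj[u][v] = None
        self._adj[v][u] = None

    @property
    def nodes(self):
        return self._adj.keys()

    def neighbors(self, u, sort=False):
        nbrs = set(self._adj[u])
        return sorted(nbrs) if sort else nbrs


g = ToyUndirectedGraph()
g.add_edge('La Jolla', 'Chicago')
g.add_edge('La Jolla', 'San Jose')
print(g.neighbors('La Jolla', sort=True))  # ['Chicago', 'San Jose']
```

Storing each undirected edge in both inner dictionaries makes neighbor lookups and edge-membership checks (`v in adj[u]`) fast, at the cost of storing every edge twice.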

## Flight Graph

Consider the graph of domestic flights. Each node in the graph is an airport, and there is an edge between two airports if there is a direct flight between them. In this section, we will generate such a graph and use BFS and DFS to analyze it.

To visualize the flight graph, we will use a Python package called [`folium`](https://python-visualization.github.io/folium/). Folium provides a way of quickly and easily drawing interactive maps. All of the maps below are drawn using `folium`; take a look at `demo.py` if you're interested in seeing the code that generated them!

In [4]:
import folium

We will use a randomly generated map of flights. The following function loads this graph.

In [5]:
flight_graph = demo.load_flight_graph()

This is an undirected graph:

In [6]:
flight_graph

<dsc40graph.UndirectedGraph at 0x7f3c4d9ee0f0>

We can list all of the airports (nodes) in the graph:

In [7]:
flight_graph.nodes

dict_keys(['San Jose', 'La Jolla', 'Los Angeles', 'Boston', 'Phoenix', 'San Antonio', 'Houston', 'Dallas', 'New Orleans', 'Orlando', 'Miami', 'Jacksonville', 'Atlanta', 'Nashville', 'Charlotte', 'Chicago', 'Kansas City', 'Toronto', 'Philadelphia', 'New York', 'Denver', 'Portland', 'Seattle', 'Vancouver', 'Washington DC', 'Oklahoma City', 'Milwaukee', 'Cleveland', 'Louisville', 'St. Louis', 'Salt Lake City', 'Las Vegas', 'Boise', 'El Paso', 'Durant', 'Minneapolis'])

Apparently there is an airport in La Jolla:

In [8]:
'La Jolla' in flight_graph.nodes

True

It must be the Torrey Pines Gliderport...

We can list all of the edges, too:

In [9]:
flight_graph.edges

EdgeView[('San Jose', 'New Orleans'), ('San Jose', 'Durant'), ('San Jose', 'La Jolla'), ('San Jose', 'Denver'), ('La Jolla', 'Chicago'), ('Los Angeles', 'Houston'), ('Los Angeles', 'Philadelphia'), ('Los Angeles', 'El Paso'), ('Boston', 'Chicago'), ('Boston', 'Philadelphia'), ('Boston', 'Milwaukee'), ('Boston', 'Boise'), ('Phoenix', 'Las Vegas'), ('San Antonio', 'Louisville'), ('Houston', 'Milwaukee'), ('Dallas', 'Miami'), ('Dallas', 'Cleveland'), ('Orlando', 'Kansas City'), ('Orlando', 'Louisville'), ('Miami', 'Cleveland'), ('Jacksonville', 'Minneapolis'), ('Jacksonville', 'Louisville'), ('Atlanta', 'Toronto'), ('Nashville', 'Philadelphia'), ('Nashville', 'Toronto'), ('Charlotte', 'Washington DC'), ('Chicago', 'Seattle'), ('Kansas City', 'Salt Lake City'), ('Toronto', 'Seattle'), ('Denver', 'Portland'), ('Portland', 'Minneapolis'), ('Portland', 'Boise'), ('Seattle', 'Louisville'), ('Seattle', 'El Paso'), ('Oklahoma City', 'Minneapolis'), ('Milwaukee', 'Louisville'), ('St. Louis', 'Las

We can also find all of the neighbors of a node. For instance, the neighbors of La Jolla are:

In [10]:
flight_graph.neighbors('La Jolla')

{'Chicago', 'San Jose'}

This means that there are direct flights between La Jolla and both Chicago and San Jose.

We can use `folium` to plot the graph of flights:

In [11]:
demo.plot_flight_map()

We will use BFS and DFS to analyze the graph above.

### BFS

Below is the implementation of BFS from the course notes, with one minor addition: we pass `sort=True` to the `.neighbors()` method so that neighbors are returned in ascending order. The function returns two things: a dictionary mapping each node to its predecessor in the search, and a dictionary mapping each node to its distance from the source node.

In [12]:
from collections import deque

def bfs_shortest_paths(graph, source):
    """Start a BFS at `source`."""
    status = {node: 'undiscovered' for node in graph.nodes}
    distance = {node: float('inf') for node in graph.nodes}
    predecessor = {node: None for node in graph.nodes}

    status[source] = 'pending'
    distance[source] = 0
    pending = deque([source]) # use as a queue

    # while there are still pending nodes
    while pending: 
        u = pending.popleft() # pop from left (front of queue)
        for v in graph.neighbors(u, sort=True):
            # explore edge (u,v)
            if status[v] == 'undiscovered':
                status[v] = 'pending'
                distance[v] = distance[u] + 1
                predecessor[v] = u
                pending.append(v) # append to right (back of queue)
        status[u] = 'visited'

    return predecessor, distance
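Sorting the neighbors matters because shortest paths need not be unique. The sketch below uses a hypothetical helper, `bfs`, that runs BFS on a plain adjacency dictionary and tracks discovered nodes by membership in `distance` rather than with status labels. On a small "diamond" graph with two shortest paths from 1 to 4, reversing the neighbor order changes the predecessor that BFS records, but not the distances.

```python
from collections import deque

def bfs(adj, source, reverse=False):
    """BFS over a plain adjacency dict; `reverse` flips the neighbor order."""
    distance = {source: 0}
    predecessor = {source: None}
    pending = deque([source])
    while pending:
        u = pending.popleft()
        for v in sorted(adj[u], reverse=reverse):
            if v not in distance:  # v is undiscovered
                distance[v] = distance[u] + 1
                predecessor[v] = u
                pending.append(v)
    return predecessor, distance

# a "diamond": two different shortest paths from 1 to 4 (via 2 or via 3)
adj = {1: {2, 3}, 2: {1, 4}, 3: {1, 4}, 4: {2, 3}}

pred_asc, dist_asc = bfs(adj, 1)
pred_desc, dist_desc = bfs(adj, 1, reverse=True)
print(pred_asc[4], pred_desc[4])  # 2 3
print(dist_asc == dist_desc)      # True
```

Fixing the neighbor order makes the search deterministic, which is what lets the practice problems below have a single correct answer.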

As we saw in class, BFS finds a shortest path from the source to every node reachable from it in an unweighted graph. In the context of the flight graph, a BFS started at La Jolla finds, for every airport reachable from La Jolla, a route with the fewest connecting flights. These routes are encoded in the `predecessor` dictionary.

In [13]:
bfs_predecessor, _ = bfs_shortest_paths(flight_graph, 'La Jolla')
bfs_predecessor

{'San Jose': 'La Jolla',
 'La Jolla': None,
 'Los Angeles': 'Philadelphia',
 'Boston': 'Chicago',
 'Phoenix': None,
 'San Antonio': 'Louisville',
 'Houston': 'Milwaukee',
 'Dallas': None,
 'New Orleans': 'San Jose',
 'Orlando': 'Louisville',
 'Miami': None,
 'Jacksonville': 'Louisville',
 'Atlanta': 'Toronto',
 'Nashville': 'Philadelphia',
 'Charlotte': None,
 'Chicago': 'La Jolla',
 'Kansas City': 'Orlando',
 'Toronto': 'Seattle',
 'Philadelphia': 'Boston',
 'New York': None,
 'Denver': 'San Jose',
 'Portland': 'Denver',
 'Seattle': 'Chicago',
 'Vancouver': None,
 'Washington DC': None,
 'Oklahoma City': 'Minneapolis',
 'Milwaukee': 'Boston',
 'Cleveland': None,
 'Louisville': 'Seattle',
 'St. Louis': None,
 'Salt Lake City': 'Kansas City',
 'Las Vegas': None,
 'Boise': 'Boston',
 'El Paso': 'Seattle',
 'Durant': 'San Jose',
 'Minneapolis': 'Portland'}

First, note that some cities have no predecessor (Cleveland, for example). Apart from the source node, La Jolla, this means that the BFS did not reach these nodes. But take a city which does have a predecessor, such as Orlando. Orlando's predecessor is Louisville, Louisville's predecessor is Seattle, Seattle's predecessor is Chicago, and Chicago's predecessor is La Jolla. Therefore, the shortest path from La Jolla to Orlando is: La Jolla $\to$ Chicago $\to$ Seattle $\to$ Louisville $\to$ Orlando.

We can define a helper function to extract a path like this from a dictionary of predecessors:

In [14]:
def extract_path(predecessor, destination, path=None):
    """Extract a path to `destination` from the predecessor dictionary."""
    if path is None:
        path = []
    
    parent = predecessor[destination]
    if parent is not None:
        extract_path(predecessor, parent, path=path)
    
    path.append(destination)
    return path

In [15]:
extract_path(bfs_predecessor, 'Orlando')

['La Jolla', 'Chicago', 'Seattle', 'Louisville', 'Orlando']
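The recursive `extract_path` above could equally be written as a loop that follows predecessor links back from the destination and then reverses the list. The function name `extract_path_iterative` is hypothetical, and the predecessor dictionary here is a hand-copied subset of the BFS output above.

```python
def extract_path_iterative(predecessor, destination):
    """Walk predecessor links back from `destination`, then reverse."""
    path = []
    node = destination
    while node is not None:
        path.append(node)
        node = predecessor[node]
    path.reverse()  # we collected destination -> source, so flip it
    return path

# a subset of the BFS predecessor dictionary shown above
predecessor = {
    'La Jolla': None,
    'Chicago': 'La Jolla',
    'Seattle': 'Chicago',
    'Louisville': 'Seattle',
    'Orlando': 'Louisville',
}
print(extract_path_iterative(predecessor, 'Orlando'))
# ['La Jolla', 'Chicago', 'Seattle', 'Louisville', 'Orlando']
```

A loop avoids Python's recursion limit (1000 frames by default), which a very long path could hit. Like the recursive version, it returns just `[destination]` when the destination was never reached.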

### DFS

Below is the implementation of DFS as it appears in the course notes, with one minor addition: we pass `sort=True` to the `.neighbors()` method so that neighbors are returned in ascending order. It returns two things: a dictionary mapping each node to its predecessor in the search, and a `Times` object whose `start` and `finish` dictionaries contain the start and finish times of every node reached during the search.

In [16]:
from dataclasses import dataclass

@dataclass
class Times:
    clock: int
    start: dict
    finish: dict

def full_dfs_times(graph):
    status = {node: 'undiscovered' for node in graph.nodes}
    predecessor = {node: None for node in graph.nodes}
    times = Times(clock=0, start={}, finish={})
    for u in graph.nodes:
        if status[u] == 'undiscovered':
            dfs_times(graph, u, status, predecessor, times)
    return predecessor, times

def dfs_times(graph, u, status=None, predecessor=None, times=None):
    if predecessor is None:
        predecessor = {node: None for node in graph.nodes}
        
    if status is None:
        status = {node: 'undiscovered' for node in graph.nodes}
       
    if times is None:
        times = Times(clock=0, start={}, finish={})
    
    times.clock += 1
    times.start[u] = times.clock
    status[u] = 'pending'
    for v in graph.neighbors(u, sort=True): # explore edge (u, v)
        if status[v] == 'undiscovered':
            predecessor[v] = u
            dfs_times(graph, v, status, predecessor, times)
    status[u] = 'visited'
    times.clock += 1
    times.finish[u] = times.clock
    
    return predecessor, times
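One property of these start and finish times is worth checking: for any two nodes, their [start, finish] intervals are either disjoint or one is nested inside the other (the "parenthesis property" of DFS). The sketch below is a self-contained, hypothetical re-implementation of the timing logic on a plain adjacency dictionary (`dfs_intervals`), together with a checker (`parenthesis_holds`). It uses its own small example graph rather than the notebook's.

```python
def dfs_intervals(adj, source):
    """Recursive DFS on an adjacency dict, recording start/finish times."""
    start, finish = {}, {}
    clock = 0

    def visit(u):
        nonlocal clock
        clock += 1
        start[u] = clock
        for v in sorted(adj[u]):
            if v not in start:  # v is undiscovered
                visit(v)
        clock += 1
        finish[u] = clock

    visit(source)
    return start, finish


def parenthesis_holds(start, finish):
    """True if every pair of [start, finish] intervals is nested or disjoint."""
    nodes = list(start)
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            a = (start[u], finish[u])
            b = (start[v], finish[v])
            disjoint = a[1] < b[0] or b[1] < a[0]
            nested = (a[0] < b[0] and b[1] < a[1]) or (b[0] < a[0] and a[1] < b[1])
            if not (disjoint or nested):
                return False
    return True


adj = {'a': {'b', 'c'}, 'b': {'d'}, 'c': {'d'}, 'd': set()}
start, finish = dfs_intervals(adj, 'a')
print(start)   # {'a': 1, 'b': 2, 'd': 3, 'c': 6}
print(finish)  # {'d': 4, 'b': 5, 'c': 7, 'a': 8}
print(parenthesis_holds(start, finish))  # True
```

Here `d` is discovered from `b`, so its interval [3, 4] sits inside `b`'s interval [2, 5], while `b` and `c` are siblings in the DFS tree, so their intervals [2, 5] and [6, 7] are disjoint.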

For instance, we can compute the predecessors found by a DFS started at La Jolla:

In [17]:
dfs_predecessor, _ = dfs_times(flight_graph, 'La Jolla')

In general, the DFS predecessors and the BFS predecessors are different:

In [18]:
dfs_predecessor['Louisville']

'Jacksonville'

In [19]:
bfs_predecessor['Louisville']

'Seattle'

But in both BFS and DFS (the non-full versions), a node is visited if and only if it is reachable from the source. So if we list the nodes not visited by DFS (i.e., those with a predecessor of `None`, other than the source) and the nodes not visited by BFS, the two lists will be the same:

In [20]:
# not visited by DFS (and the source node, La Jolla)
[u for u, parent in dfs_predecessor.items() if parent is None]

['La Jolla',
 'Phoenix',
 'Dallas',
 'Miami',
 'Charlotte',
 'New York',
 'Vancouver',
 'Washington DC',
 'Cleveland',
 'St. Louis',
 'Las Vegas']

In [21]:
# not visited by BFS (and the source node, La Jolla)
[u for u, parent in bfs_predecessor.items() if parent is None]

['La Jolla',
 'Phoenix',
 'Dallas',
 'Miami',
 'Charlotte',
 'New York',
 'Vancouver',
 'Washington DC',
 'Cleveland',
 'St. Louis',
 'Las Vegas']

### Shortest Paths

We can use BFS to find a shortest path from the source to every node reachable from it. Suppose, for instance, that we wish to fly from La Jolla to Seattle. BFS gives us a route with the fewest flights:

In [22]:
bfs_to_seattle = extract_path(bfs_predecessor, 'Seattle')
bfs_to_seattle

['La Jolla', 'Chicago', 'Seattle']

In [23]:
demo.plot_route(bfs_to_seattle)

Depth-first search does *not* find the shortest path in general. In this case, it's not even close:

In [24]:
dfs_to_seattle = extract_path(dfs_predecessor, 'Seattle')
dfs_to_seattle

['La Jolla',
 'Chicago',
 'Boston',
 'Boise',
 'Portland',
 'Minneapolis',
 'Jacksonville',
 'Louisville',
 'Milwaukee',
 'Houston',
 'Los Angeles',
 'El Paso',
 'Seattle']

In [25]:
demo.plot_route(dfs_to_seattle)
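A three-node example shows the same effect in miniature. On a triangle with edges 1-2, 1-3, and 2-3, a DFS from node 1 (neighbors in ascending order) reaches node 3 by way of node 2, while BFS reaches it directly. The helpers `bfs_pred`, `dfs_pred`, and `path_to` are hypothetical stand-ins for `bfs_shortest_paths`, `dfs_times`, and `extract_path`, written against a plain adjacency dictionary so that the block runs on its own.

```python
from collections import deque

def bfs_pred(adj, source):
    """BFS predecessors; a node counts as discovered once it is in `pred`."""
    pred = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in sorted(adj[u]):
            if v not in pred:
                pred[v] = u
                queue.append(v)
    return pred

def dfs_pred(adj, source, pred=None):
    """Recursive DFS predecessors, exploring neighbors in ascending order."""
    if pred is None:
        pred = {source: None}
    for v in sorted(adj[source]):
        if v not in pred:
            pred[v] = source
            dfs_pred(adj, v, pred)
    return pred

def path_to(pred, node):
    """Follow predecessor links back to the source and reverse."""
    path = []
    while node is not None:
        path.append(node)
        node = pred[node]
    return path[::-1]

# a triangle: 1-2, 1-3, 2-3
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
print(path_to(bfs_pred(adj, 1), 3))  # [1, 3]
print(path_to(dfs_pred(adj, 1), 3))  # [1, 2, 3]
```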

We instead use DFS in other applications, such as finding a topological sort of the nodes of a directed acyclic graph.
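As a sketch of that application: in a directed acyclic graph, listing the nodes in decreasing order of DFS finish time gives a topological sort, so every edge `u -> v` has `u` before `v`. The function `topological_sort` and the example dependency graph (an edge means "put on the first item before the second") are hypothetical, and the code assumes the input has no cycles; it does no cycle detection.

```python
def topological_sort(adj):
    """Order the nodes of a DAG so that every edge u -> v has u before v.

    Runs a full DFS; a node is appended once it finishes, so reversing the
    finish order puts each node ahead of everything reachable from it.
    Assumes `adj` has no cycles (no cycle detection is done here).
    """
    visited = set()
    finished = []

    def visit(u):
        visited.add(u)
        for v in sorted(adj[u]):
            if v not in visited:
                visit(v)
        finished.append(u)  # u finishes after all of its descendants

    for u in sorted(adj):
        if u not in visited:
            visit(u)
    return finished[::-1]  # decreasing finish time


# hypothetical dependencies: an edge u -> v means "put on u before v"
adj = {
    'socks': ['shoes'],
    'pants': ['shoes', 'belt'],
    'shirt': ['belt'],
    'shoes': [],
    'belt': [],
}
print(topological_sort(adj))  # ['socks', 'shirt', 'pants', 'shoes', 'belt']
```

Any ordering that respects the edges is a valid topological sort; this one is fixed by visiting nodes and neighbors in sorted order, just as the notebook's DFS does.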

### Create Your Own Routes

In [26]:
origin = widgets.Dropdown(
    options=sorted(flight_graph.nodes),
    value='La Jolla',
    description='First:',
)
destination = widgets.Dropdown(
    options=sorted(flight_graph.nodes),
    value='Toronto',
    description='Second:',
)
traversalMethods = widgets.Dropdown(
    options=["BFS", "DFS"],
    value='DFS',
    description='Traversal:',
)

display(origin)
display(destination)
display(traversalMethods)

Dropdown(description='First:', index=13, options=('Atlanta', 'Boise', 'Boston', 'Charlotte', 'Chicago', 'Cleve…

Dropdown(description='Second:', index=33, options=('Atlanta', 'Boise', 'Boston', 'Charlotte', 'Chicago', 'Clev…

Dropdown(description='Traversal:', index=1, options=('BFS', 'DFS'), value='DFS')

Choose a pair of cities and a traversal algorithm, then run the cell below to see the resulting route.

In [27]:
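if traversalMethods.value == "DFS":
    predecessor, _ = dfs_times(flight_graph, origin.value)
else:
    predecessor, _ = bfs_shortest_paths(flight_graph, origin.value)
route = extract_path(predecessor, destination.value)

# extract_path returns only [destination] if the search never reached it
if route[0] != origin.value:
    print(f'{destination.value} is not reachable from {origin.value}.')
print(route)
demo.plot_route(route)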

['La Jolla', 'Chicago', 'Boston', 'Boise', 'Portland', 'Minneapolis', 'Jacksonville', 'Louisville', 'Milwaukee', 'Houston', 'Los Angeles', 'El Paso', 'Seattle', 'Toronto']


## Practice Problems

## Problem 1: Undirected Graph

![Undirected Graph](./undirected.png)

Find a shortest path between nodes 3 and 8. Use the convention that neighbors are produced in ascending order by label.

In [28]:
graph = UndirectedGraph()
edges = [
    (4,7), (7,8), (4,8), (4,5), (5,9), (6,9), (5,6), (1,5), (1,2), (2,5), (2,3), (3,6)
]
for edge in edges:
    graph.add_edge(*edge)
    
predecessor, distance = bfs_shortest_paths(graph, 3)
correct_answer = extract_path(predecessor, 8)

In [29]:
# place your answer here:
your_answer = [3, ..., 8]

In [30]:
# run this to check your answer:
if correct_answer == your_answer:
    print('Correct!')
else:
    print('Sorry, incorrect...')

Sorry, incorrect...


For more practice with BFS, try creating your own graph in the cell above and use the provided code to check your answer.

## Problem 2: Start and Finish Times

![Example Graph 2](https://upload.wikimedia.org/wikipedia/commons/5/51/Directed_graph.svg)

Find the start and finish times resulting from a DFS on the graph above, starting at node 1.

In [31]:
graph = DirectedGraph()
graph.add_edge(1, 2)
graph.add_edge(1, 3)
graph.add_edge(3, 2)
graph.add_edge(4, 3)
graph.add_edge(3, 4)

In [32]:
predecessor, times = dfs_times(graph, 1)
# uncomment to check your answer
# print('start:', times.start)
# print('finish:', times.finish)

## Problem 3: Number of Executions

Consider again the directed graph from Problem 2. Suppose we modify BFS by placing a `print` statement within the `for` loop, as follows:

In [33]:
def noisy_bfs(graph, source):
    """Start a BFS at `source`."""
    status = {node: 'undiscovered' for node in graph.nodes}

    status[source] = 'pending'
    pending = deque([source]) # use as a queue

    # while there are still pending nodes
    while pending: 
        u = pending.popleft() # pop from left (front of queue)
        for v in graph.neighbors(u, sort=True):
            # explore edge (u,v)
            print(u, v)
            if status[v] == 'undiscovered':
                status[v] = 'pending'
                pending.append(v) # append to right (back of queue)
        status[u] = 'visited'

How many lines are printed when we run `noisy_bfs` on this graph, starting at node 3? What are they?

In [34]:
# uncomment to check your answer
# noisy_bfs(graph, 3)

## Problem 4: Modified BFS

Modify BFS or DFS so that it counts the number of nodes *reachable* from a given source node in `flight_graph`.
To check your code: 26 airports are reachable from La Jolla (counting La Jolla itself).

### Further Reading

BFS and DFS ignore edge weights. In the flight graph, BFS returns the route with the fewest edges (that is, the fewest flights), which ignores the actual distance traveled between airports (for example, look at the BFS route from La Jolla to Toronto).

If you are wondering how to find the route with the shortest total distance traveled, look into shortest-path algorithms for weighted graphs. Here are some suggestions for further reading:

- Dijkstra's algorithm: https://medium.com/basecs/finding-the-shortest-path-with-a-little-help-from-dijkstra-613149fbdc8e
- A*: https://medium.com/@nicholas.w.swift/easy-a-star-pathfinding-7e6689c7f7b2
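To give a flavor of the weighted setting, here is a minimal sketch of Dijkstra's algorithm that uses Python's `heapq` module as a priority queue. It assumes non-negative edge weights and an adjacency dictionary mapping each node to `{neighbor: weight}`. The function name and the example weights are hypothetical and are not real flight distances. In the example, BFS would take the single edge A-C, while Dijkstra's algorithm prefers the two-edge route A-B-C because its total weight is smaller.

```python
import heapq

def dijkstra(adj, source):
    """Shortest-path distances from `source`, assuming non-negative weights.

    `adj` maps each node to a dict {neighbor: weight}.
    """
    distance = {source: 0}
    predecessor = {source: None}
    heap = [(0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale entry: u was already finalized with a smaller d
        done.add(u)
        for v, w in adj[u].items():
            new_d = d + w
            if v not in distance or new_d < distance[v]:
                distance[v] = new_d
                predecessor[v] = u
                heapq.heappush(heap, (new_d, v))
    return distance, predecessor

# hypothetical weights: the direct edge A-C is long, A-B-C is shorter overall
adj = {
    'A': {'B': 1, 'C': 10},
    'B': {'A': 1, 'C': 2},
    'C': {'A': 10, 'B': 2},
}
distance, predecessor = dijkstra(adj, 'A')
print(distance['C'], predecessor['C'])  # 3 B
```

Instead of decreasing a key in place, the sketch pushes a new heap entry whenever it finds a shorter distance and skips stale entries when they are popped, a common way to implement Dijkstra's algorithm with `heapq`.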