# Traversal of graphs

Many graph algorithms involve traversing the graph. This can be done recursively, but unlike with trees, there is not necessarily a natural base case: trees have leaves, but graphs need not have any. Graphs can also contain cycles, so a naive recursion could follow a cycle forever. Instead, we need to keep track of which nodes we have already seen during the traversal, so that we only process nodes we haven't processed yet. A depth-first traversal that keeps track of the seen nodes in a list could look like this:

In [1]:
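def depth_first_traversal(graph, v, f, seen = None):
    # Create a fresh list of seen vertices on the top-level call.
    if seen is None:
        seen = []
    seen.append(v)
    for w in graph[v]:
        # Only recurse into neighbours we have not seen yet.
        if w not in seen:
            depth_first_traversal(graph, w, f, seen)
    # Call f on v after everything reachable from v has been processed.
    f(v)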

Here, `seen = None` gives `seen` a default value; when it is `None`, we create a new empty list. Don't use the empty list itself as the default, `seen = []`. Python evaluates a default value once, when the function is defined, so every call that omits `seen` would share *the same* list. A second traversal would then start with a list that already contains the vertices from the first one, when we expected an empty list.
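The shared-default pitfall can be seen in isolation with a small sketch. The functions `visit_shared` and `visit_fresh` are hypothetical names made up for this illustration; they are not part of the traversal code.

```python
def visit_shared(v, seen=[]):
    # The default list is created once, when the function is defined,
    # and reused by every call that does not pass `seen`.
    seen.append(v)
    return seen

def visit_fresh(v, seen=None):
    # A new list is created on every call that does not pass `seen`.
    if seen is None:
        seen = []
    seen.append(v)
    return seen

first = visit_shared(0)
second = visit_shared(1)
print(second)            # [0, 1]: the 0 from the first call is still there
print(first is second)   # True: both calls returned the same list object

print(visit_fresh(0), visit_fresh(1))   # [0] [1]
```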

In [2]:
def make_list_graph(n_vertices):
    return [[] for _ in range(n_vertices)]
    
def add_list_edge(graph, source, target):
    if target not in graph[source]:
        graph[source].append(target)

In [3]:
g = make_list_graph(6)

add_list_edge(g, 0, 1)
add_list_edge(g, 0, 5)
add_list_edge(g, 1, 2)
add_list_edge(g, 1, 3)
add_list_edge(g, 2, 4)
add_list_edge(g, 3, 5)
add_list_edge(g, 5, 1)

print(g)

[[1, 5], [2, 3], [4], [5], [], [1]]


In [4]:
depth_first_traversal(g, 0, print)

4
2
5
3
1
0


Testing for membership in a list, however, takes time linear in the length of the list, so we want to avoid it. One approach is to use Python's built-in `set` data structure instead.

In [5]:
def depth_first_traversal_set(graph, v, f, seen = None):
    if seen is None: seen = set()
    seen.add(v)
    for w in graph[v]:
        if w not in seen:
            depth_first_traversal_set(graph, w, f, seen)
    f(v)
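The difference in the cost of a membership test can be measured directly, outside any traversal. The following is a rough sketch using the standard `timeit` module; the container size and the number of repetitions are arbitrary choices for illustration, and the actual timings will vary between machines.

```python
import timeit

n = 10_000
as_list = list(range(n))
as_set = set(as_list)

# A value that is not present forces the list to be scanned to the end.
missing = -1

list_time = timeit.timeit(lambda: missing in as_list, number=200)
set_time = timeit.timeit(lambda: missing in as_set, number=200)

print(f"list: {list_time:.6f} s for 200 tests")
print(f"set:  {set_time:.6f} s for 200 tests")
```

Both containers give the same answer to the membership test; only the time it takes to get the answer differs.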

We can also represent the set explicitly as a boolean (bit) vector. Since the vertices are the integers from 0 to `len(graph) - 1`, we can use them as indices and set `seen[v]` to `True` or `False` depending on whether we have seen vertex `v`.

In [6]:
def depth_first_traversal_bv(graph, v, f, seen = None):
    if seen is None:
        seen = [False] * len(graph)
        
    seen[v] = True
    for w in graph[v]:
        if not seen[w]:
            depth_first_traversal_bv(graph, w, f, seen)
    f(v)

Except for the different data structure, the flow of the traversal is the same as the previous two implementations.

In [7]:
print("bit vector")
depth_first_traversal_bv(g, 0, print)
print("\nset")
depth_first_traversal_set(g, 0, print)

bit vector
4
2
5
3
1
0

set
4
2
5
3
1
0


For the small graph we have traversed so far, there isn't much difference between the implementations. It doesn't matter if the membership test takes linear time when we have at most six nodes.

In [8]:
def do_nothing(v):
    pass

%timeit depth_first_traversal(g, 0, do_nothing)
%timeit depth_first_traversal_bv(g, 0, do_nothing)
%timeit depth_first_traversal_set(g, 0, do_nothing)

2.35 µs ± 24.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
2.02 µs ± 28.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
2.18 µs ± 38.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


For a larger graph, however, we see that the linear time membership test makes the list implementation substantially slower than the other two. The bit-vector implementation is somewhat faster than the set implementation.

In [11]:
import numpy as np

def make_random_graph(n, k):
    graph = make_list_graph(n)
    for i in range(n):
        for j in np.random.choice(n, size = k):
            add_list_edge(graph, i, j)
    return graph        

rg = make_random_graph(1000, 50)

%timeit depth_first_traversal(rg, 0, do_nothing)
%timeit depth_first_traversal_set(rg, 0, do_nothing)
%timeit depth_first_traversal_bv(rg, 0, do_nothing)

419 ms ± 19.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
3.71 ms ± 78.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
2.22 ms ± 39.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


## Exercises
