# CS446/519 Class Session 5 - Exercise
## Exploring the running time for testing if there is an edge between a pair of vertices

In this exercise, we'll measure how the running time for testing whether there is an edge between a pair of vertices, averaged over all vertex pairs in the graph, scales with vertex degree. We'll do it for an undirected graph (Barabasi-Albert model) whose average vertex degree is fixed for a given graph but varied from one run to the next. The number of vertices of the graph will be fixed at 1000.

We'll need the "bintrees" package in order to get an implementation of a binary search tree (AVLTree is the class that we will use).

In [1]:
import numpy as np
import igraph
import timeit
import itertools
import bintrees 

The next four functions show how to test whether a given pair of vertices has an edge between them:

In [2]:
def find_matrix(gmat, i, j):
    return (gmat[i,j] == 1)

In [3]:
def find_adj_list(adj_list, i, j):
    return j in adj_list[i]

In [4]:
def find_edge_list(edge_list, i, j):
    inds1 = np.where(edge_list[:,0] == i)[0]
    elems1 = edge_list[inds1, 1].tolist()
    inds2 = np.where(edge_list[:,1] == i)[0]
    elems2 = edge_list[inds2, 0].tolist()
    return j in (elems1 + elems2)

In [5]:
def find_bst_forest(bst_forest, i, j):
    return j in bst_forest[i]
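
To make the differences concrete, here is a minimal sketch of the first three lookups on a tiny hand-built graph. It uses plain Python lists instead of numpy arrays and igraph, and the graph, the `has_edge_*` names, and the test pairs are all made up for illustration:

```python
# Hypothetical 4-vertex undirected graph with edges 0-1, 0-2, 2-3.
edges = [(0, 1), (0, 2), (2, 3)]
n = 4

# Adjacency matrix as nested lists: a query is a single indexed lookup.
matrix = [[0] * n for _ in range(n)]
for u, v in edges:
    matrix[u][v] = matrix[v][u] = 1

# Adjacency list: a query scans the neighbors of vertex i.
adj_list = [[] for _ in range(n)]
for u, v in edges:
    adj_list[u].append(v)
    adj_list[v].append(u)

def has_edge_matrix(i, j):
    return matrix[i][j] == 1

def has_edge_adj_list(i, j):
    return j in adj_list[i]

def has_edge_edge_list(i, j):
    # Scan every edge, checking both orientations of the pair.
    return any((u, v) in ((i, j), (j, i)) for u, v in edges)

for i, j in [(0, 2), (1, 3), (3, 2)]:
    answers = (has_edge_matrix(i, j), has_edge_adj_list(i, j), has_edge_edge_list(i, j))
    print((i, j), answers)
```

All three representations should agree on every pair; they differ only in how much work each query takes.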

This function takes a graph in adjacency list format and makes an "adjacency forest" data structure.

In [8]:
def get_bst_forest(theadjlist):
    # build one AVL tree per vertex, keyed by the indices of its neighbors
    theforest = []
    for neighbors in theadjlist:
        itree = bintrees.AVLTree()
        for j in neighbors:
            # the value 1 is a placeholder; only the key is used for lookups
            itree.insert(j, 1)
        theforest.append(itree)
    return theforest
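
If you want to experiment with the same idea without installing bintrees, a rough stand-in is a sorted neighbor list searched with the standard-library `bisect` module. This is a different data structure from an AVL tree (it does not support cheap insertion), but for a fixed graph it gives the same logarithmic lookup cost. The function names below are hypothetical:

```python
from bisect import bisect_left

def build_sorted_adjacency(adj_list):
    # One sorted neighbor list per vertex (stand-in for one tree per vertex).
    return [sorted(neighbors) for neighbors in adj_list]

def find_sorted_adjacency(sorted_adj, i, j):
    # Binary search for j among the neighbors of i.
    neighbors = sorted_adj[i]
    pos = bisect_left(neighbors, j)
    return pos < len(neighbors) and neighbors[pos] == j

# Small made-up graph with edges 0-1, 0-2, 2-3.
adj_list = [[2, 1], [0], [3, 0], [2]]
sorted_adj = build_sorted_adjacency(adj_list)
print(find_sorted_adjacency(sorted_adj, 0, 2))  # True
print(find_sorted_adjacency(sorted_adj, 1, 3))  # False
```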

Here is the code to run the simulation (generate the graph and obtain timing statistics). To keep the running time reasonable, I compare only the "adjacency list" and "adjacency forest" (aka "adjacency trees") graph data structures. The parameter "n" is the number of vertices (fixed at 1000) and the parameter "k" controls the average vertex degree (which we will vary in this exercise). Note that igraph.Graph.Barabasi takes "k" as the number of edges attached for each new vertex, so the realized average degree is roughly 2k. For speed, I have turned off replication (by setting nrep=1 and nsubrep=1), but you can try larger values of nrep to see if the results hold up (I expect they will):

In [9]:
def do_sim(n, k):

    retlist = []
    
    nrep = 1
    nsubrep = 1
    
    for _ in itertools.repeat(None, nrep):
      
        # make the random undirected graph
        g = igraph.Graph.Barabasi(n, k)
        
        # get the graph in two different representations
        g_adj_list = g.get_adjlist()
        
        g_bst_forest = get_bst_forest(g_adj_list)
        
        start_time = timeit.default_timer()
        
        # inner loop only needs to go from i+1 to n, since the graph is undirected        
        for _ in itertools.repeat(None, nsubrep):
            for i in range(0, n):
                for j in range(i+1, n):
                    find_adj_list(g_adj_list, i, j)     
        
        adjlist_elapsed = timeit.default_timer() - start_time
            
        start_time = timeit.default_timer()
        
        # inner loop only needs to go from i+1 to n, since the graph is undirected
        for _ in itertools.repeat(None, nsubrep):
            for i in range(0, n):
                for j in range(i+1, n):
                    j in g_bst_forest[i]
                    
        forest_elapsed = timeit.default_timer() - start_time
        
        retlist.append([adjlist_elapsed, forest_elapsed])

    # get the results in microseconds per edge test: divide by the number of
    # vertex pairs and by the number of sub-replicates
    return 1000000*np.mean(np.array(retlist), axis=0)/(nsubrep*n*(n-1)/2)

Compare the results for different values of k. Each output array holds the mean time per edge test in microseconds, in the order [adjacency list, adjacency forest]. Up to k=20, the adjacency list is faster. At k=50, the "adjacency forest" method (aka "adjacency tree" method) is a bit faster than the adjacency list method. By k=100, the "adjacency forest" method is about twice as fast as the "adjacency list" method.
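
One way to see why the crossover happens is to count worst-case key comparisons rather than wall-clock time. The sketch below assumes a vertex with exactly d neighbors, an unsorted neighbor list, and a perfectly balanced search tree (a real AVL tree can be somewhat taller); the helper names are made up for illustration:

```python
def worst_case_list_comparisons(d):
    # Linear scan: the target may be last or absent, so all d neighbors are checked.
    return d

def worst_case_tree_comparisons(d):
    # A perfectly balanced binary search tree with d keys has
    # floor(log2(d)) + 1 levels, which equals d.bit_length() for d >= 1.
    return d.bit_length()

for d in (5, 10, 20, 50, 100):
    print(d, worst_case_list_comparisons(d), worst_case_tree_comparisons(d))
```

Each tree comparison costs more than a list comparison in practice, which is consistent with the list winning at small k and losing once the degree is large enough.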

In [10]:
do_sim(1000,5)

array([ 0.3204156 ,  1.16994051])

In [11]:
do_sim(1000,10)

array([ 0.51273477,  1.26613626])

In [12]:
do_sim(1000,20)

array([ 0.873007  ,  1.38149768])

In [13]:
do_sim(1000,50)

array([ 1.83869825,  1.50265984])

In [14]:
do_sim(1000,100)

array([ 3.19050297,  1.56077285])
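
To summarize the trend, this short sketch computes the ratio of the two timings for each k, using the numbers copied from the outputs above (a single unreplicated run, so treat the ratios as indicative only):

```python
# Timings copied from the outputs above: k -> (adjacency list, adjacency forest),
# in microseconds per edge test.
reported = {
    5: (0.3204156, 1.16994051),
    10: (0.51273477, 1.26613626),
    20: (0.873007, 1.38149768),
    50: (1.83869825, 1.50265984),
    100: (3.19050297, 1.56077285),
}

for k, (adjlist_us, forest_us) in reported.items():
    ratio = adjlist_us / forest_us
    print(f"k={k:3d}  list/forest = {ratio:.2f}")

# How much each structure slowed down between k=5 and k=100.
list_growth = reported[100][0] / reported[5][0]
forest_growth = reported[100][1] / reported[5][1]
print(f"list grew {list_growth:.1f}x, forest grew {forest_growth:.1f}x")
```

Over a 20-fold increase in k, the adjacency list time grows roughly tenfold while the adjacency forest time grows by about a third.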