# Parallel processing demo

This notebook gives a brief demo of running experiments in parallel and verifies that the results agree with those from running in series.


In [1]:
%load_ext autoreload
%autoreload 2

import igraph
from moo.data_generation import ExpConfig, DataGenerator
from moo.contestant import get_best_community_solutions, draw_best_community_solutions
from moo.communities import run_parallel_communities
import moo.contestant as contestant
import matplotlib.pyplot as plt

from joblib import Parallel, delayed


import pandas as pd

import time


We first set up the experiment (i.e. the configuration of the set of graphs), and the algorithms we want to run on each graph.   

Note that ComDetMultiLevel does not appear to be deterministic. This has not been investigated further, since it uses the igraph method as a black box.

In [2]:
expconfig = ExpConfig(
    L=100, U=500,
    NumEdges=1000, ML=0.4, MU=0.4,
    BC=0.1, NumGraphs=30,
    shuffle=True, 
    seed=1234  
    )


algos = [
    contestant.ComDetMultiLevel(), # Multi-Level approach
    contestant.ComDetEdgeBetweenness(), # EdgeBetweenness approach
    contestant.ComDetWalkTrap(), # WalkTrap approach
    contestant.ComDetFastGreedy(), # FastGreedy approach
]



We then create the data generator iterators. We make two: one for the parallel run and one for the series run, so that we can verify that the answers are the same:



In [3]:
expgen = DataGenerator(expconfig=expconfig) # Pass defined parameters

datagenSeries = expgen.generate_data() 
datagenParallel = expgen.generate_data() 
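
Two separate iterators are needed because a Python generator can only be consumed once. The sketch below illustrates this with a hypothetical `generate_data` function standing in for `DataGenerator.generate_data()`. It assumes each call restarts from the configured seed, which this notebook does not confirm.

```python
import random


def generate_data(seed, num_items):
    """Hypothetical stand-in for DataGenerator.generate_data(): yields seeded random values."""
    rng = random.Random(seed)  # private RNG, so every call restarts from the same seed
    for _ in range(num_items):
        yield rng.randint(0, 1000)


gen_series = generate_data(seed=1234, num_items=5)
gen_parallel = generate_data(seed=1234, num_items=5)

series_items = list(gen_series)
parallel_items = list(gen_parallel)

# Both iterators produce the same sequence because they share a seed.
same_sequence = series_items == parallel_items

# A generator is exhausted after one pass, so a second pass yields nothing.
leftover = list(gen_series)

print("Same sequence:", same_sequence)
print("Items left after first pass:", len(leftover))
```

If the real generator did not restart from the seed on each call, the two runs would see different graphs and the comparison would not be meaningful.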


We can then run the jobs in parallel:


In [4]:
start = time.time()
parallelResults = run_parallel_communities(datagenParallel, algos, n_jobs = 7)

parallelTime = time.time()-start
print("Parallel time taken", parallelTime)


Parallel time taken 52.734846115112305
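
For intuition, the fan-out pattern behind a parallel run like this can be sketched as one task per (graph, algorithm) pair. This is not the implementation of `run_parallel_communities`. The names `detect` and `run_parallel` are hypothetical, and the sketch uses the standard library's `concurrent.futures` with threads rather than joblib, so it shows task layout and result ordering only, not a real speedup for CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def detect(algo_name, graph_idx, graph):
    """Hypothetical stand-in for algo.detect_communities(graph=graph).get_results()."""
    return {"algo": algo_name, "graph_idx": graph_idx, "score": len(graph)}


def run_parallel(graphs, algo_names, n_jobs):
    """Hypothetical helper: run every algorithm on every graph using a worker pool."""
    # One independent task per (graph, algorithm) pair.
    tasks = [(name, idx + 1, g) for idx, g in enumerate(graphs) for name in algo_names]
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        # map() returns results in submission order, whichever task finishes first.
        return list(pool.map(lambda task: detect(*task), tasks))


graphs = [[0, 1], [0, 1, 2], [0]]  # toy "graphs" (lists of node ids)
algo_names = ["multilevel", "walktrap"]

parallel = run_parallel(graphs, algo_names, n_jobs=2)
serial = [detect(name, idx + 1, g) for idx, g in enumerate(graphs) for name in algo_names]

print("Results match:", parallel == serial)
```

Because each task depends only on its own graph and algorithm, the tasks can be distributed freely. For CPU-bound community detection, process-based workers (such as joblib's default backend) are the usual choice.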


And run the same thing in series:

In [6]:

start = time.time()

results = [] # Holds results of contestants
for g_idx, graph in enumerate(datagenSeries):
#    print(f'Processing Graph {g_idx+1}')
    for algo in algos:
#        print(f'\tUsing algorithm {algo.name_}')
        result = algo.detect_communities(graph=graph).get_results()
        # Result is a list of dictionaries, each dictionary stores the metrics of one iteration (see code for details)
        for r in result: # Appending graph index to results, for debugging purposes
            r['graph_idx'] = g_idx + 1
        results.extend(result)

seriesTime = time.time()-start
print("Series time taken", seriesTime)



Series time taken 160.54513263702393


In [None]:
print("Speedup:", seriesTime/parallelTime)
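
As a worked example, the timings printed above give a speedup of roughly 3x on 7 workers. The snippet below repeats that arithmetic and also computes parallel efficiency (speedup divided by the number of workers). The efficiency figure is an extra illustration, not something the notebook itself reports.

```python
series_time = 160.54513263702393    # "Series time taken" output above
parallel_time = 52.734846115112305  # "Parallel time taken" output above
n_jobs = 7                          # workers used in the parallel run

speedup = series_time / parallel_time
efficiency = speedup / n_jobs  # fraction of the ideal linear speedup

print(f"Speedup: {speedup:.2f}x, parallel efficiency: {efficiency:.0%}")
```

A speedup below the worker count is common. It could stem from worker start-up cost, data transfer between processes, or uneven task durations, though none of these has been measured here.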

## Compare results

In this section of the notebook, we verify that both approaches give identical results.
