In [None]:
import graphAlgorithms as ga

This example notebook shows how to use different community detection algorithms.

# Load & Preprocess Network

The first step of the pipeline is loading the chosen data set.
You can store your networks in any common format; however, the xx package requires the networks to be provided as NetworkX Graph objects (refer to its documentation for detailed instructions). Moreover, the networks should be weighted: if you have an unweighted network, assign all edges the same weight. The package assumes "weight" as the default edge-weight attribute name, but this can be changed when needed.

An example of how to pre-process a network stored as an edgelist is provided below. More loading and storing examples are provided in the "import and export of networks" Jupyter notebook.
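The cells below do this with pandas and NetworkX. To make the two steps explicit (parse a tab-separated edge list with a `V1`/`V2` header, then give every edge the same weight), here is a minimal dependency-free sketch. The function `read_unweighted_edgelist` and its adjacency-dict output are illustrative assumptions, not part of the xx package; for the package itself you would still build a NetworkX graph.

```python
import csv
import io


def read_unweighted_edgelist(lines, source_col="V1", target_col="V2", weight=1):
    """Parse a tab-separated edge list with a header row into an undirected,
    weighted adjacency dict {node: {neighbour: weight}}."""
    adjacency = {}
    for row in csv.DictReader(lines, delimiter="\t"):
        u, v = row[source_col], row[target_col]
        # the input carries no weights, so every edge gets the same value
        adjacency.setdefault(u, {})[v] = weight
        adjacency.setdefault(v, {})[u] = weight
    return adjacency


# hypothetical file content using the same "V1"/"V2" header as the example data
raw = "V1\tV2\nA\tB\nB\tC\n"
adj = read_unweighted_edgelist(io.StringIO(raw))
print(adj)  # {'A': {'B': 1}, 'B': {'A': 1, 'C': 1}, 'C': {'B': 1}}
```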

In [None]:
# location where the raw data files are stored; the default path assumes you run from the installation folder
# CHANGE this to the location of your networks if needed

graph_location = "../networks/edgelists/"

#location where output should be saved
#Please set location
location = ""

In [None]:
import glob
import pandas as pd
import networkx as nx
import numpy as np
from collections import Counter
import matplotlib.pyplot as plt

In [None]:

import os

labels = []
networks_graphs = []
print("load networks")
# get all files in the specified folder that end in .edgelist
# CHANGE the ending if your files end differently
# sorted() makes the file order, and therefore the selection below, reproducible
for cnt, path in enumerate(sorted(glob.glob(graph_location + "*.edgelist"))):
    # this example only loads the sixth file (index 5); remove this check to load every network
    if cnt != 5:
        continue

    # you can specify that only part of the file name is used as network name for later identification
    name = os.path.basename(path).replace(".rds.edgelist", "")

    # read the tab-separated edgelist file as a dataframe
    fh = pd.read_csv(path, sep="\t")
    # convert it into a NetworkX graph G and specify the column names of the node pairs
    G = nx.from_pandas_edgelist(fh, "V1", "V2")

    # if you have an unweighted network, assign all edges the same edge weight (here 1)
    nx.set_edge_attributes(G, 1, "weight")

    # save the graph objects to a list (only suitable if small networks are processed);
    # this list is the main object used in the examples below and contains all loaded networks
    networks_graphs.append(G)
    labels.append(name)

    print("loaded", name)

Get the union of all nodes across the loaded networks.

In [None]:
# dict.fromkeys keeps the first-seen order and avoids the slow "node not in list" check
nodes = list(dict.fromkeys(node for net in networks_graphs for node in net.nodes()))

The community detection algorithms take NetworkX graph objects directly as input.

## Some unweighted community detection algorithms

All community detection algorithms return a dictionary, where keys are node IDs and values are lists of the communities each node belongs to. For overlapping communities, a list may contain more than one community. Three different algorithms are shown below.
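If you need the opposite view (community to member nodes), for example to compute per-community statistics yourself, the documented dictionary format can be inverted with a few lines of Python. This is a minimal sketch: `invert_membership` is a hypothetical helper, and the membership dictionary is made-up input in the format described above.

```python
from collections import defaultdict


def invert_membership(node_to_communities):
    """Turn {node: [community, ...]} into {community: set_of_nodes}."""
    community_to_nodes = defaultdict(set)
    for node, communities in node_to_communities.items():
        for community in communities:
            community_to_nodes[community].add(node)
    return dict(community_to_nodes)


# made-up membership in the documented format; node "c" is in two communities
membership = {"a": [0], "b": [0], "c": [0, 1], "d": [1]}
inverted = invert_membership(membership)
print({c: sorted(nodes) for c, nodes in inverted.items()})  # {0: ['a', 'b', 'c'], 1: ['c', 'd']}
```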

In [None]:
asy = ga.communities.async_fluid(networks_graphs[0], k=10, return_object=False)

In [None]:
label = ga.communities.label_propagation(networks_graphs[0], return_object=False)

In [None]:
walktrap = ga.communities.walktrap(networks_graphs[0], return_object=False)

## Some weighted community detection algorithms

Similarly, three different community detection algorithms are shown for weighted networks.

In [None]:
mod =  ga.communities.greedy_modularity(networks_graphs[0], weights="weight", return_object=False)

In [None]:
leiden = ga.communities.leiden(networks_graphs[0], weights="weight", return_object=False)

In [None]:
louvain = ga.communities.louvain(networks_graphs[0], weights="weight", return_object=False)

## Ensemble community detection

The ensemble algorithm is based on an adapted implementation of the method developed by Tandon et al. (https://arxiv.org/pdf/1902.04014.pdf).
Here, we create an ensemble of the label propagation, Leiden, and Louvain algorithms.

Reference: Tandon Aditya, Albeshri Aiiad, Thayananthan Vijey, Alhalabi Wadee, Fortunato Santo; "Fast consensus clustering in complex networks"; Phys. Rev. E, 99 (2019), Article 042301
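In Tandon et al.'s fast consensus method, the partitions produced by the input algorithms are combined into a weighted consensus graph: each edge is weighted by the fraction of partitions in which its endpoints share a community, and edges below a threshold are removed before the algorithms are run again. The pure-Python sketch below shows only that weighting-and-thresholding step for non-overlapping partitions; it omits the paper's other steps (such as iterating until convergence) and is not the package's `fast_consensus` implementation. The `thresh` value mirrors the parameter used below, which presumably plays a similar role.

```python
def consensus_weights(edges, partitions):
    """For each edge (u, v), the fraction of partitions that put u and v in the
    same community. Each partition maps node -> community label (no overlaps)."""
    return {
        (u, v): sum(1 for p in partitions if p[u] == p[v]) / len(partitions)
        for u, v in edges
    }


def threshold_edges(weights, thresh):
    """Keep the edges whose co-occurrence fraction is at least `thresh`."""
    return {edge: w for edge, w in weights.items() if w >= thresh}


# toy graph: two triangles joined by the edge (2, 3)
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
# three hypothetical partitions; the third one moves node 2 to community 1
partitions = [
    {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1},
    {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1},
    {0: 0, 1: 0, 2: 1, 3: 1, 4: 1, 5: 1},
]
weights = consensus_weights(edges, partitions)
kept = threshold_edges(weights, thresh=0.5)
print(sorted(kept))  # the bridge (2, 3) has weight 1/3 and is dropped
```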

In [None]:
community_list=[label, leiden, louvain]

Only algorithms that do not require connected graphs can be passed in `algorithms`.
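A plausible reason for this restriction (our reading of the consensus procedure, not a statement from the package documentation) is that removing low-weight edges from the consensus graph can split it into several connected components, so the algorithms may be re-run on a disconnected graph. The breadth-first search below (`connected_components` is a hypothetical helper) shows how dropping a single bridging edge turns one component into two.

```python
from collections import deque


def connected_components(nodes, edges):
    """Return the connected components of an undirected graph as a list of sets."""
    adjacency = {n: set() for n in nodes}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        # breadth-first search from an unvisited node collects one component
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


# toy graph: two triangles joined by the single edge (2, 3), listed last
nodes = range(6)
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
print(len(connected_components(nodes, edges)))       # 1
# suppose thresholding removed the bridge (2, 3)
print(len(connected_components(nodes, edges[:-1])))  # 2
```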

In [None]:
# one parameter dictionary per algorithm, in the same order as `algorithms`
algorithms = [ga.communities.label_propagation, ga.communities.leiden, ga.communities.louvain]
parameters = [
    {"return_object": False},
    {"return_object": False, "weights": "weight"},
    {"return_object": False, "weights": "weight"},
]

In [None]:
consensuscommunities, cons1, consgraph  = ga.communities.fast_consensus(networks_graphs[0], communities = community_list, algorithms=algorithms, parameters=parameters, thresh=0.5, max_iter=10)

## Overlapping Communities

Example of an overlapping community detection algorithm. This means that a node can be assigned to more than one community.
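With the dictionary format described earlier, overlapping nodes are simply those whose community list has more than one distinct entry. A minimal sketch, with a hypothetical helper `overlapping_nodes` and made-up membership data:

```python
def overlapping_nodes(node_to_communities):
    """Return (sorted) the nodes that belong to more than one community."""
    return sorted(
        node
        for node, communities in node_to_communities.items()
        if len(set(communities)) > 1
    )


# made-up membership: "b" and "c" each belong to two communities
membership = {"a": [0], "b": [0, 2], "c": [0, 1], "d": [1]}
print(overlapping_nodes(membership))  # ['b', 'c']
```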

In [None]:
angel = ga.communities.angel(networks_graphs[0], return_object=False)

## Some community evaluation metrics

Once the communities have been detected, we can study them through evaluation metrics and assess which algorithm(s) give the "best" partitioning of your networks (given the evaluation metrics and your requirements).

In [None]:
community_list=[asy, label,  walktrap,  mod, leiden, louvain, consensuscommunities, angel]
algo_labels=["async_fluids","labelpropagation",  "walktrap",  "modularity", "leiden", "louvain", "consensuscommunities", "angel"]

The first, simple consideration is the number of communities detected by each algorithm.
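Because every node maps to a list of community labels, the number of communities is the number of distinct labels across all lists. The sketch below (hypothetical helper, made-up data) illustrates the idea; it is not necessarily how `get_number_of_communities` is implemented.

```python
def number_of_communities(node_to_communities):
    """Count the distinct community labels in a {node: [community, ...]} dict."""
    labels = set()
    for communities in node_to_communities.values():
        labels.update(communities)
    return len(labels)


# made-up membership with three communities (0, 1, 2); "c" overlaps 0 and 1
membership = {"a": [0], "b": [0], "c": [0, 1], "d": [1], "e": [2]}
print(number_of_communities(membership))  # 3
```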

In [None]:
for algo_label, comms in zip(algo_labels, community_list):
    print("algo ", algo_label, "detects ", ga.communities.get_number_of_communities(comms), "communities")

Next, we can derive the community size distribution as follows.
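A minimal sketch of what a size distribution means for this format: first count the members of each community (an overlapping node counts towards every community it belongs to), then count how many communities have each size. The helpers and data are hypothetical, and the package's `in_detail=False` output may be structured differently.

```python
from collections import Counter


def community_sizes(node_to_communities):
    """Number of member nodes per community label."""
    sizes = Counter()
    for communities in node_to_communities.values():
        # set() so a node listed twice in the same community is counted once
        sizes.update(set(communities))
    return dict(sizes)


def size_distribution(sizes):
    """Map each community size to the number of communities of that size."""
    return dict(sorted(Counter(sizes.values()).items()))


membership = {"a": [0], "b": [0], "c": [0, 1], "d": [1], "e": [2]}
sizes = community_sizes(membership)
print(sizes)                     # {0: 3, 1: 2, 2: 1}
print(size_distribution(sizes))  # {1: 1, 2: 1, 3: 1}
```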

In [None]:
for algo_label, comms in zip(algo_labels, community_list):
    print("algorithm ", algo_label, "has a community size distribution of:")
    print(ga.communities.get_number_of_nodes_community(comms, in_detail=False))

The average internal degree of each community is returned in the `sc` object; the loop below prints the resulting distribution (`dist`).
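For an undirected, unweighted graph, the average internal degree of a community C is commonly defined as `2 * m_C / n_C`, where `m_C` is the number of edges with both endpoints in C and `n_C` the number of nodes in C (each internal edge adds 1 to the degree of both endpoints). The sketch below assumes that textbook definition; the package's implementation may differ, for example in how weights are handled. The toy graph (two triangles joined by the edge (2, 3)) and the helper name are hypothetical.

```python
def mean_internal_degree(edges, community):
    """Average number of neighbours each member has inside the community."""
    m_c = sum(1 for u, v in edges if u in community and v in community)
    # every internal edge contributes 2 to the summed internal degree
    return 2 * m_c / len(community)


edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
print(mean_internal_degree(edges, {0, 1, 2}))            # 2.0
print(round(mean_internal_degree(edges, {1, 2, 3}), 3))  # 1.333
```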

In [None]:
for algo_label, comms in zip(algo_labels, community_list):
    print("algo ", algo_label, "has a community internal degree distribution of:")
    sc, dist = ga.communities.average_internal_degree(comms, networks_graphs[0])
    print(dist)

The internal edge density of each community is returned in the `sc` object.
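Internal edge density is commonly defined as `m_C / (n_C * (n_C - 1) / 2)`, the fraction of node pairs inside C that are actually connected, so a clique scores 1. A sketch under that assumption, on the same hypothetical toy graph; `edge_density_inside` is an illustrative name, not the package's function.

```python
def edge_density_inside(edges, community):
    """Fraction of possible undirected edges within the community that exist."""
    n_c = len(community)
    if n_c < 2:
        return 0.0  # no node pairs, density undefined; report 0
    m_c = sum(1 for u, v in edges if u in community and v in community)
    return m_c / (n_c * (n_c - 1) / 2)


edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
print(edge_density_inside(edges, {0, 1, 2}))            # 1.0
print(round(edge_density_inside(edges, {1, 2, 3}), 3))  # 0.667
```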

In [None]:
for algo_label, comms in zip(algo_labels, community_list):
    print("algo ", algo_label, "has a community internal edge density distribution of:")
    sc, dist = ga.communities.internal_edge_density(comms, networks_graphs[0])
    print(dist)

The modularity of each partition is returned by `community_modularity` as a single value per algorithm.
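Modularity compares the edges inside each community with what a random graph with the same degrees would give: $Q = \sum_c \left[ \frac{L_c}{m} - \left( \frac{d_c}{2m} \right)^2 \right]$, where $m$ is the number of edges, $L_c$ the number of edges inside community $c$, and $d_c$ the sum of the degrees of its nodes. The sketch below implements this Newman-Girvan formula for an unweighted graph and a non-overlapping partition; it is not the package's `community_modularity`, which may handle weights or use a different variant.

```python
def newman_modularity(edges, communities):
    """Modularity of a non-overlapping partition (list of node sets) of an
    undirected, unweighted graph given as an edge list."""
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    q = 0.0
    for community in communities:
        l_c = sum(1 for u, v in edges if u in community and v in community)
        d_c = sum(degree.get(node, 0) for node in community)
        q += l_c / m - (d_c / (2 * m)) ** 2
    return q


# hypothetical toy graph: two triangles joined by the edge (2, 3)
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
print(round(newman_modularity(edges, [{0, 1, 2}, {3, 4, 5}]), 4))  # 0.3571
```

Putting all nodes into a single community always gives Q = 0, which is a useful sanity check.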

In [None]:
for algo_label, comms in zip(algo_labels, community_list):
    print("algorithm ", algo_label, "has a modularity of ", ga.communities.community_modularity(comms, networks_graphs[0]))

The fraction of weak members of each community is returned in the `sc` object.
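The notebook does not define "weak member". One common criterion, used for example by the Flake out-degree fraction (Flake-ODF), counts a node as weak when it has fewer neighbours inside its community than outside. The sketch below assumes that criterion and an unweighted graph; the package's definition may differ, and `weak_member_fraction` is a hypothetical name.

```python
def weak_member_fraction(edges, community):
    """Fraction of members with fewer neighbours inside than outside the community."""
    inside = {n: 0 for n in community}
    outside = {n: 0 for n in community}
    for u, v in edges:
        # look at the edge from both endpoints' perspectives
        for a, b in ((u, v), (v, u)):
            if a in community:
                if b in community:
                    inside[a] += 1
                else:
                    outside[a] += 1
    weak = sum(1 for n in community if inside[n] < outside[n])
    return weak / len(community)


# hypothetical toy graph: two triangles joined by the edge (2, 3)
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
# node 3 has one neighbour inside {0, 1, 2, 3} and two outside, so it is weak
print(weak_member_fraction(edges, {0, 1, 2, 3}))  # 0.25
print(weak_member_fraction(edges, {4, 5}))        # 0.0
```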

In [None]:
for algo_label, comms in zip(algo_labels, community_list):
    print("algo ", algo_label, "has a community fraction of weak members distribution of:")
    sc, dist = ga.communities.fraction_of_weak_members(comms, networks_graphs[0])
    print(dist)

The cut ratio of each community is returned in the `sc` object.
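Cut ratio is commonly defined as the number of edges leaving community C divided by the number of possible such edges, `n_C * (n - n_C)`, where `n` is the number of nodes in the graph; lower values indicate a better separated community. A sketch under that assumption (hypothetical helper and toy graph):

```python
def cut_ratio_of(edges, community, n_nodes):
    """Existing boundary edges divided by all possible community-to-rest edges."""
    n_c = len(community)
    if n_c == 0 or n_c == n_nodes:
        return 0.0  # no possible outgoing edges
    # an edge is on the boundary when exactly one endpoint is in the community
    cut = sum(1 for u, v in edges if (u in community) != (v in community))
    return cut / (n_c * (n_nodes - n_c))


edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
print(round(cut_ratio_of(edges, {0, 1, 2}, n_nodes=6), 4))  # 0.1111
```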

In [None]:
for algo_label, comms in zip(algo_labels, community_list):
    print("algo ", algo_label, "has a community cut ratio distribution of:")
    sc, dist = ga.communities.cut_ratio(comms, networks_graphs[0])
    print(dist)

The density of each community relative to the density of the whole graph is returned in the `sc` object.
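One natural reading of this metric, assumed here, is the ratio between the internal edge density of a community and the edge density of the whole graph, so values above 1 mean the community is denser than the network overall. A sketch under that assumption, on the hypothetical toy graph (7 edges over 6 nodes); the helper names are illustrative.

```python
def pair_density(n_nodes, n_edges):
    """Fraction of the n_nodes * (n_nodes - 1) / 2 possible undirected edges present."""
    if n_nodes < 2:
        return 0.0
    return n_edges / (n_nodes * (n_nodes - 1) / 2)


def density_ratio_to_graph(edges, community, n_nodes):
    """Internal density of the community divided by the density of the whole graph."""
    m_c = sum(1 for u, v in edges if u in community and v in community)
    return pair_density(len(community), m_c) / pair_density(n_nodes, len(edges))


edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
print(round(density_ratio_to_graph(edges, {0, 1, 2}, n_nodes=6), 4))  # 2.1429
```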

In [None]:
for algo_label, comms in zip(algo_labels, community_list):
    print("algo ", algo_label, " has a community density w.r.t. the graph density distribution of:")
    sc, dist = ga.communities.community_density_to_graph(comms, networks_graphs[0])
    print(dist)

The hub dominance of each community, computed with respect to the whole graph, is returned in the `sc` object.
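Hub dominance is commonly defined as the highest internal degree in a community divided by the largest possible internal degree, `n_C - 1`; a value of 1 means some member is connected to every other member. The sketch below assumes that definition and an unweighted graph; `hub_dominance_of` is a hypothetical helper, not the package's `hub_dominace`.

```python
def hub_dominance_of(edges, community):
    """Max internal degree divided by (community size - 1)."""
    n_c = len(community)
    if n_c < 2:
        return 0.0  # a single node has no possible internal neighbours
    internal_degree = {n: 0 for n in community}
    for u, v in edges:
        if u in community and v in community:
            internal_degree[u] += 1
            internal_degree[v] += 1
    return max(internal_degree.values()) / (n_c - 1)


edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
print(hub_dominance_of(edges, {0, 1, 2}))               # 1.0
print(round(hub_dominance_of(edges, {1, 2, 3, 4}), 4))  # 0.6667
```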

In [None]:
for algo_label, comms in zip(algo_labels, community_list):
    print("algo ", algo_label, " has a community hub dominance distribution of:")
    sc, dist = ga.communities.hub_dominace(comms, networks_graphs[0])
    print(dist)

Further evaluation metrics are implemented but not shown in this example; see the documentation for more information.

You can, for example, select the community detection algorithm that performs best on average across several evaluation metrics, or select metrics that focus on the community types your experiments need.
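One simple way to combine several metrics is to rank the algorithms on each metric and pick the one with the best average rank. In the sketch below, the helper, the metric directions, and the scores are all hypothetical placeholders (not results from this notebook); you would replace them with the values printed by the cells above.

```python
def average_rank(scores, higher_is_better):
    """scores: {metric: {algorithm: value}}; higher_is_better: {metric: bool}.
    Returns {algorithm: mean rank}, where rank 1 is the best value on a metric.
    Ties keep the input order because sorted() is stable."""
    ranks = {}
    for metric, values in scores.items():
        ordered = sorted(values, key=values.get, reverse=higher_is_better[metric])
        for rank, algo in enumerate(ordered, start=1):
            ranks.setdefault(algo, []).append(rank)
    return {algo: sum(r) / len(r) for algo, r in ranks.items()}


# hypothetical metric values, for illustration only
scores = {
    "modularity": {"leiden": 0.41, "louvain": 0.40, "label_propagation": 0.35},
    "cut_ratio": {"leiden": 0.02, "louvain": 0.03, "label_propagation": 0.01},
}
# higher modularity is better; a lower cut ratio means better separated communities
higher_is_better = {"modularity": True, "cut_ratio": False}
avg = average_rank(scores, higher_is_better)
best = min(avg, key=avg.get)
print(avg)   # {'leiden': 1.5, 'louvain': 2.5, 'label_propagation': 2.0}
print(best)  # leiden
```

Average ranks ignore how large the differences between algorithms are; if that matters for your experiments, normalising the metric values before averaging is an alternative.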