Description of the project, datasets and experimental results can be found at the following website: http://fragkiskos.me/projects/communities_sensitivity/
- Linux (and other Unix-like systems)
- Python 2.7.9 (Anaconda 2.2.0)
- python-igraph 0.7.1.post4 (containing all community detection algorithms except Spectral clustering and Metis)
- scipy 0.15.1
- networkx 1.9.1
- conda 3.10.0
- metis 0.1a0
- scikit-learn 0.15.2
- matplotlib 1.4.3
- numpy 1.9.2
-
NetworksFromSNAP.py
: script containing all functions needed to read the datasets (i.e., graphs), put them in iGraph/NetworkX or Metis format, extract the greatest connected component of each graph (GCC) and perturb the GCC using one of the perturbation strategies. It also contains the function METIS_Clustering that allows to use Metis algorithm to detect communities. -
Add_Noise.py
: sciprt containing the functions used for graph perturbation: function configuration_model_perturbation (which is also the null model normally used in the definition of the modularity), function uniform_perturbation (Noise based on uniform perturbation (Erdos-Renyi G(n,e/n) model)), function preferential_perturbation (Noise based on preferential perturbation to high degree nodes (Chung-Lu model)). -
NCP_plot_for_paper.py
: script allowing to produce the figure "NCP plot of a real-world network (CA-HEP-TH) using Infomap algorithm." -
Plot_and_save_NCP_charts.py
: function allowing to produce the structural sensitivity plots for the community detection algorithms. It also produces plots of the distribution (CDF) of clusters' sizes as function of noise. -
normalized_laplacian_spectrum.py
: function that return the eigenvalues of the Laplacian of the graph. -
NCP_plot.py
: function computing the NCP plot of a graph. The NCP plot measures the quality of the best possible community in a large network, as a function of the community size. See Lekovec et al. 2008 : "Community Structure in Large Networks Natural Cluster Sizes and the Absence of Large Well-Defined Clusters". -
variation_of_information_score.py
: function used to compute the VI score for Spectral and Metis algorithm (their output format, i.e. the community structure, differs from the ones of algorithms using Igraph format graphs). -
community_quality_NetworkX.py
: script containing functions modularity and conductance scores. -
community_utils_NetworkX.py
: utils used by community_quality_NetworkX.py. -
LAUNCHER_ALL_ALGOS_V3.py
: produces the results using theCA-HEP-TH
network as example
How to set the paprameters if the LAUNCHER_ALL_ALGOS_V3.py
script:
-
Which community detection algorithm? The choice is done by setting the boolean variables to
True
orFalse
. Lines 30, 32, 34, 36, 39, 41, 43, 45 for Blondel Multilevel, FastGreedyMM, Infomap, LeadingEigenvectors, Walktrap, LabelPropag, Spectral clustering and Metis algos. -
Line 50 : it is important to set the right number of active algorithms in
variable nb_active_algos
(i.e., number of previous boolean variables set toTrue
). -
Line 54 : chose the network by setting the variable network_name with the appropriate index number from the list
network_name
('Email-Enron'=network_name[0] , 'CA-GrQc'=network_name[1] , 'CA-AstroPhys'=network_name[2] , 'Wiki-Vote'=network_name[3] , 'AS-Caida'=network_name[4], 'CA-HepTh'=network_name[5], 'P2P-GNutella'=network_name[6]). -
Line 62 : chose the perturbation model (ERP=noise_model[0], CLP=noise_model[1] or CONFIG=noise_model[2]).
-
Line 64 : chose the perturabation type (
Add
orDelete
edges. Note that for CONFIG model this parameter is not important). -
Line 66 to line 75 : noise levels. The user has only to set the intensity of the perturbation at lines 69 (for
Delete
type) and 72 (forAdd
type). For example, for the CA-HEP-TH network, we have set the following parameters for the different perturbation models:- ERP+Add =>
epsilon_values = [i * 0.285 for i in epsilon_valuesI]
(line 72) - ERP+Delete =>
epsilon_values = [i * 475 for i in epsilon_valuesI]
(line 69) - CLP+Add =>
epsilon_values = [i * 0.285 for i in epsilon_valuesI]
(line 72) - CLP+Delete =>
epsilon_values = [i * 91.6 for i in epsilon_valuesI]
(line 69) - Configuration model => nothing to do (line 75)
- ERP+Add =>
-
Set directory paths in lines 79 to 87 (paths and directories from where we read the input datasets (graphs in .txt format), do conversions in format readable by iGraph/Metis and write results):
-
Line 79 :
source_path = ***put your own path***
-
Line 81 :
convert_path = ***put your own path*** + network_name + '_iGraphFormat.txt'
-
Line 82 :
convert_path_metis = ***put your own path*** + network_name + '_MetisFormat.txt'
-
Line 84 :
convert_path_P = ***put your own path*** + network_name + '_iGraphFormat_Perturbed.txt'
-
Line 85 :
convert_path_metis_P = ***put your own path*** + network_name + '_MetisFormat_Perturbed.txt'
-
Line 87 :
path_results = ***put your own path***
-
Line 95 to line 113 : we declare variables that will store results.
-
Line 104 : boolean variable
ifSave
that allows to save plots on the disk (always set toTrue
) -
Community detection
-
Line 123 to 272 : do clustering of original (unperturbed) graphs and compute metrics.
-
Line 290 to 805 : do clustering of perturbed graphs and compute metrics.
-
Plot results
-
Line 814 to 968 : produces functional sensitivity plots for several community detection algorithms (NMI, VI, ARI, Modularity, and Number of clusters versus noise levels).
-
Line 979 to 1023 : produces structural sensitivity plots for the community detection algorithms (as well as CDF of clusters' sizes as function of noise).