Reproducibility of the results from "A Simple Embedding for Classifying Networks with a few Graphlets"
run stats.m
run run_pca.m
To change the normalisation, modify the global variable type_norm (line 43)
run run_lda.m
To change the normalisation, modify the global variable type_norm (line 43)
run run_pca.m
To change which k-node graphlets are used, modify motif3/4/5 (lines 16-18)
run run_graph2vec.m
To change the size of the embeddings, modify the global variable t_emb (line 21)
Precomputed embedding sizes are 2, 4, 8, 16, 32, 64, 128, 256, 512.
To change the depth of the WL kernel, modify the global variable t_wl (line 21)
Precomputed WL depths are 1, 2, and 3.
Go into the GCNs folder and create the dataset: python3 create_graphtest.py
Train and run the neural network: python3 run_GCN.py -pooling max/mean/sum -hidd 4/8/12 -nbLay 1/2/3
You may also change the variable ``nbItes`` (line 15), so that ``nbItes = 10``
Obtain scores: python3 analysis.py GCN_poolmax/mean/sum_nbLay1/2/3_hidd4/8/12
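For example, one concrete run (assuming the result name passed to analysis.py is built from the chosen options, as in the pattern above):

    python3 run_GCN.py -pooling mean -hidd 12 -nbLay 2
    python3 analysis.py GCN_poolmean_nbLay2_hidd12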
Train and run the neural network: python3 run_RGCN.py -pooling max/mean/sum -hidd 4/8/12 -nbLay 1/2/3
You may also change the variable ``nbItes`` (line 15), so that ``nbItes = 10``
Obtain scores: python3 analysis.py RGCN_poolmax/mean/sum_nbLay1/2/3_hidd4/8/12
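For example (same naming assumption as above):

    python3 run_RGCN.py -pooling mean -hidd 8 -nbLay 2
    python3 analysis.py RGCN_poolmean_nbLay2_hidd8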
run run_pca.m
run run_gl2vec.m
run run_graph2vec.m
Train and run the neural network: python3 run_GCN.py
Set ``nbItes = 50`` (line 15)
Obtain scores: python3 analysis.py GCN_poolmean_nbLay2_hidd12
Train and run the neural network: python3 run_RGCN.py
Set ``nbItes = 50`` (line 15)
Obtain scores: python3 analysis.py RGCN_poolmean_nbLay2_hidd8
run run_pca_wiki.m
run run_Wiki_figures.m
run run_GammaAnalysis.m
run run_RFAnalysis.m
run run_feature_select.m
run run_RFAnalysis.m
run afty_threshold.m
run afty_knn.m
Matlab files, scripts and functions used by the main run_* scripts
Networks used in the tests, in the following format:
# one or several comment lines
# that give information about
# the network
!n:number_of_nodes
!m:number_of_edges
v_src1 v_tgt1
v_src2 v_tgt2
...
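For illustration, a hypothetical file describing a small 3-node, 3-edge network (the comment text and node ids are made up):

    # Toy example network
    # used only to illustrate the format
    !n:3
    !m:3
    1 2
    2 3
    3 1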
Wikipedia Networks, in the following format:
# Article : Name_of_the_Wikipedia_Article
# FROM date_of_the_first_version_of_interest TO date_of_the_last_version_of_interest
number_of_nodes number_of_edges
v_src1 v_tgt1
v_src2 v_tgt2
...
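A hypothetical example (article name, dates and edges are made up):

    # Article : Example_Article
    # FROM 2005-01-01 TO 2010-12-31
    4 3
    1 2
    1 3
    2 4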
Each .mat file contains a struct Pbm with information about the network:
-> Pbm.entete : textual information (website, preprocessing, etc.)
-> Pbm.nb_nodes/nb_edges : number of nodes/edges (a bidirected edge counts as two edges)
-> Pbm.motif3 : a 13x2 matrix. Pbm.motif3(k,1) : id of the kth 3-node motif
   Pbm.motif3(k,2) : number of occurrences of motif k in the network
-> Pbm.motif4 : same for 4-node motifs
-> Pbm.motif5 : same for 5-node motifs (does not exist for all networks)
-> Pbm.edges : a Pbm.nb_edges x 2 matrix where (Pbm.edges(i,1), Pbm.edges(i,2)) = (v_srci, v_tgti)
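To inspect these structures outside Matlab, a minimal Python sketch (the file name is a placeholder; field access assumes scipy's handling of Matlab structs with the options shown):

    # Sketch: read a Pbm struct from Python; the .mat file name is hypothetical.
    from scipy.io import loadmat

    data = loadmat("some_network.mat", squeeze_me=True, struct_as_record=False)
    pbm = data["Pbm"]

    print(pbm.entete)                  # textual information about the network
    print(pbm.nb_nodes, pbm.nb_edges)  # numbers of nodes and edges
    print(pbm.motif3)                  # 13x2 matrix of (motif id, occurrence count)
    print(pbm.edges[:5])               # first edges as (source, target) pairs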
Outputs of the Java code from https://github.com/kuntu/JGraphlet-JMotif for our benchmarks.
The SRPs of each network in Matlab files (generated using convert2Mat.m).
A Python code using NetworkX and the karate-club package to generate the embeddings of the benchmarks with graph2vec, for WL-kernel depths from 1 to 4 and embedding sizes from 2 to 512.
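The general pattern is sketched below (a sketch only: the toy graphs and parameter values are placeholders, and karateclub expects each graph to have nodes labelled 0..n-1):

    # Sketch: graph2vec embeddings with the karateclub package (toy graphs as placeholders).
    import networkx as nx
    from karateclub import Graph2Vec

    graphs = [nx.barabasi_albert_graph(20, 2, seed=s) for s in range(5)]  # replace with the benchmark networks

    model = Graph2Vec(wl_iterations=3, dimensions=128)  # WL depth and embedding size
    model.fit(graphs)
    embeddings = model.get_embedding()  # one embedding row per input graph
    print(embeddings.shape)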
Outputs of the Python code
The corresponding Matlab files (generated using convert_csv2mat.m).
A Python code to obtain the average Gini importance of each graphlet by training a forest of 100 trees (using scikit-learn).
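The idea is roughly the following (a minimal sketch with random placeholder data; the real script works on the graphlet/SRP features of the benchmark networks):

    # Sketch: average Gini importance per graphlet via a 100-tree random forest.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    # Placeholder data: one row per network, one column per graphlet feature,
    # one class label per network (replace with the real benchmark features).
    X = np.random.rand(20, 13)
    y = np.random.randint(0, 2, size=20)

    clf = RandomForestClassifier(n_estimators=100)  # forest of 100 trees
    clf.fit(X, y)
    print(clf.feature_importances_)  # mean Gini importance of each feature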
Matlab files to run the unsupervised clustering algorithm used in Section V. (See https://pdfs.semanticscholar.org/6235/cf4b551f768fa793ed759de75f2a01475e77.pdf for the algorithm description).