Skip to content

puzzlef/slpa-communities

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

17 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Single-threaded CPU-based Speaker-listener Label Propagation Algorithm (SLPA) for community detection.

I was trying out the Speaker-listener Label Propagation Algorithm (SLPA) which i had implemented yesterday (results out today). This is one of the algorithms given in literature review of LabelRank. Here each node has a fixed memory where it remembers all the popular labels that it listened. The size of this memory is fixed to the total number of iterations to be performed + 1. The algorithm is as follows.

  1. Each vertex is initialized such that it remembers itself as popular.
  2. Each neighbor speaks one of the random labels in its memory.
  3. The vertex (listener) adds the most popular label to its memory.
  4. Repeat from 2 until a fixed number of iterations is performed (labels - 1).
  5. I allow early convergence if at least n% of vertices remember their previous label.
  6. For each vertex, i pick the most popular label in its memory as its community.

Unlike RAK (also called LPA) and COPRA, this is a randomized historical-label based technique. A random number generator (RNG) is used by each neighbor of a vertex to pick a random label to speak from its memory. I originally used C++'s default random engine (which is probably mersenne twister) which seems slow. So i updated it to use a xorshift32 random engine. The listener vertex listens to all its neigbors, and based on edge weight, picks the most popular label. In strict mode, the listener pick the first most popular label as to record in its memory, and in non-strict mode, the listener pick randomly one of the most popular labels to record. It records this in the next index in its memory. The authors of this paper suggest to perform a fixed number of iterations (which is one less than the space we have reserved for storing the labels in the memory of each vertex). However, if the latest popular label (for a vertex) is same as the previous one in its memory, i consider it as a sign of convergence. So, when n% of vertices show this sign of convergence (tolerance), i perform an early exit. By default, a tolerance of 0.05 is used. As we want to find disjoint communities, i use the most popular label in the memory of each vertex as its final community.



The experiment is run with labels (memory space of each vertex) ranging from 4 to 64.



It appears that increasing the number of labels increases the time required for completion (as expected), and also increases the final modularity. However, on graphs soc-Slashdot* modularity seems to decrease with increasing labels.



It is also observed that the strict variant is slightly faster than the non-strict one, but achieves similar modularity. On road networks it seems strict approach obtains lower modularity, and on other types of graphs, strict approach achieves higher modularity.



However, its seems that this method of community detection does not provide a good return on investment (it takes longer but does not achieve great modularity). I do not perform any post-processing of communities as of now (such as splitting disconnected communities).

All outputs are saved in a gist and a small part of the output is listed here. Some charts are also included below, generated from sheets. The input data used for this experiment is available from the SuiteSparse Matrix Collection. This experiment was done with guidance from Prof. Kishore Kothapalli and Prof. Dip Sankar Banerjee.


$ g++ -std=c++17 -O3 main.cxx
$ ./a.out ~/data/web-Stanford.mtx
$ ./a.out ~/data/web-BerkStan.mtx
$ ...

# Loading graph /home/subhajit/data/web-Stanford.mtx ...
# order: 281903 size: 2312497 [directed] {}
# order: 281903 size: 3985272 [directed] {} (symmetricize)
# [-0.000497 modularity] noop
# [00337.151 ms; 0004 iters.; 0.474440932 modularity] slpaSeqStatic       {labels=04, tolerance=5e-02}
# [00304.351 ms; 0004 iters.; 0.479095578 modularity] slpaSeqStaticStrict {labels=04, tolerance=5e-02}
# [00901.554 ms; 0008 iters.; 0.752845287 modularity] slpaSeqStatic       {labels=08, tolerance=5e-02}
# [00866.540 ms; 0008 iters.; 0.679729104 modularity] slpaSeqStaticStrict {labels=08, tolerance=5e-02}
# [02229.548 ms; 0016 iters.; 0.826291323 modularity] slpaSeqStatic       {labels=16, tolerance=5e-02}
# [02193.593 ms; 0016 iters.; 0.789462328 modularity] slpaSeqStaticStrict {labels=16, tolerance=5e-02}
# [03706.551 ms; 0032 iters.; 0.847930729 modularity] slpaSeqStatic       {labels=32, tolerance=5e-02}
# [03655.065 ms; 0032 iters.; 0.817599833 modularity] slpaSeqStaticStrict {labels=32, tolerance=5e-02}
# [07928.522 ms; 0064 iters.; 0.857062638 modularity] slpaSeqStatic       {labels=64, tolerance=5e-02}
# [07843.098 ms; 0064 iters.; 0.826945186 modularity] slpaSeqStaticStrict {labels=64, tolerance=5e-02}
#
# ...


References




ORG DOI

About

Single-threaded CPU-based Speaker-listener Label Propagation Algorithm (SLPA) for community detection.

Resources

License

Stars

Watchers

Forks

Languages