Skip to content

Comparing static vs dynamic approaches of the Louvain algorithm for community detection.

License

Notifications You must be signed in to change notification settings

puzzlef/louvain-communities-dynamic

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

21 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Comparing static vs dynamic approaches of the Louvain algorithm for community detection.

Louvain is an algorithm for detecting communities in graphs. Community detection helps us understand the natural divisions in a network in an unsupervised manner. It is used in e-commerce for customer segmentation and advertising, in communication networks for multicast routing and setting up of mobile networks, and in healthcare for epidemic causality, setting up health programmes, and fraud detection is hospitals. Community detection is an NP-hard problem, but heuristics exist to solve it (such as this). Louvain algorithm is an agglomerative-hierarchical community detection method that greedily optimizes for modularity (iteratively).

Modularity is a score that measures relative density of edges inside vs outside communities. Its value lies between −0.5 (non-modular clustering) and 1.0 (fully modular clustering). Optimizing for modularity theoretically results in the best possible grouping of nodes in a graph.

Given an undirected weighted graph, all vertices are first considered to be their own communities. In the first phase, each vertex greedily decides to move to the community of one of its neighbors which gives greatest increase in modularity. If moving to no neighbor's community leads to an increase in modularity, the vertex chooses to stay with its own community. This is done sequentially for all the vertices. If the total change in modularity is more than a certain threshold, this phase is repeated. Once this local-moving phase is complete, all vertices have formed their first hierarchy of communities. The next phase is called the aggregation phase, where all the vertices belonging to a community are collapsed into a single super-vertex, such that edges between communities are represented as edges between respective super-vertices (edge weights are combined), and edges within each community are represented as self-loops in respective super-vertices (again, edge weights are combined). Together, the local-moving and the aggregation phases constitute a pass. This super-vertex graph is then used as input for the next pass. This process continues until the increase in modularity is below a certain threshold. As a result from each pass, we have a hierarchy of community memberships for each vertex as a dendrogram. We generally consider the top-level hierarchy as the final result of community detection process.

Louvain algorithm is a hierarchical algorithm, and thus has two different tolerance parameters: tolerance and passTolerance. tolerance defines the minimum amount of increase in modularity expected, until the local-moving phase of the algorithm is considered to have converged. We compare the increase in modularity in each iteration of the local-moving phase to see if it is below tolerance. passTolerance defines the minimum amount of increase in modularity expected, until the entire algorithm is considered to have converged. We compare the increase in modularity across all iterations of the local-moving phase in the current pass to see if it is below passTolerance. passTolerance is normally set to 0 (we want to maximize our modularity gain), but the same thing does not apply for tolerance. Adjusting values of tolerance between each pass have been observed to impact the runtime of the algorithm, without significantly affecting the modularity of obtained communities. In this experiment, we compare the performance of three different types of dynamic Louvain with respect to the static version.

Naive dynamic:

  • We start with previous community membership of each vertex (instead of each vertex its own community).

Delta screening:

  • All edge batches are undirected, and sorted by source vertex-id.
  • For edge additions across communities with source vertex i and highest modularity changing edge vertex j*, i's neighbors and j*'s community is marked as affected.
  • For edge deletions within the same community i and j, i's neighbors and j's community is marked as affected.

Frontier-based:

  • All source and destination vertices are marked as affected for insertions and deletions.
  • For edge additions across communities with source vertex i and destination vertex j, i is marked as affected.
  • For edge deletions within the same community i and j, i is marked as affected.
  • Vertices whose communities change in local-moving phase have their neighbors marked as affected.

The input data used for the experiments is available from the SuiteSparse Matrix Collection. These experiments are done with guidance from Prof. Kishore Kothapalli and Prof. Dip Sankar Banerjee.


Comparing various Naive-dynamic approaches

In this experiment (approaches-naive), we compare the performance of two different types of naive dynamic Louvain with respect to the static version. The last approach (louvainSeqDynamicLast) considers the community membership of each vertex after the Louvain algorithm has converged (community membership from the "last" pass) and then performs the Louvain algorithm upon the new (updated) graph. This is similar to naive dynamic approaches with other algorithms. On the other hand, the first approach (louvainSeqDynamicFirst) considers the community membership of each vertex right after the first pass of the Louvain algorithm (this is the first community membership hierarchy) and then performs the Louvain algorithm upon the updated graph. With this approach, we allow the affected vertices to choose their community membership from the first pass itself, which which to my intuition would lead to better communities.

First, we compute the community membership of each vertex using the static Louvain algorithm (louvainSeqLast). We also run the static Louvain algorithm for only one pass (louvainSeqFirst). We then generate batches of insertions (+) and deletions (-) of edges of sizes 500, 1000, 5000, ... 100000. For each batch size, we generate five different batches for the purpose of averaging. Each batch of edges (insertion / deletion) is generated randomly such that the selection of each vertex (as endpoint) is equally probable. We choose the Louvain parameters as resolution = 1.0, tolerance = 1e-2 (for local-moving phase) with tolerance decreasing after every pass by a factor of toleranceDeclineFactor = 10, and a passTolerance = 0.0 (when passes stop). In addition we limit the maximum number of iterations in a single local-moving phase with maxIterations = 500, and limit the maximum number of passes with maxPasses = 500. We run the Louvain algorithm until convergence (or until the maximum limits are exceeded), and measure the time taken for the computation (performed 5 times for averaging), the modularity score, the total number of iterations (in the local-moving phase), and the number of passes. This is repeated for seventeen different graphs.

From the results, we make make the following observations. The performance of dynamic approaches upon a batch of deletions appears to increase with increasing batch size*. This makes sense since, as the graph keeps getting smaller, the computation would complete sooner. Next, the first naive dynamic approach is found to be significantly slower (~0.3x speedup) than the last approach. However, the first approach is still faster than the static approach upto a batch size of 50000. On the other hand, the last approach is faster than the static approach for all batch sizes. A similar behavior is observed with the total number of iterations. The first approach seems to have a slightly higher modularity with respect to the last approach. Since the modularity between the two dynamic approaches are almost the same, the last approach is clearly the best choice.


Comparision with Static approach

First (compare-static, main), we compute the community membership of each vertex using the static Louvain algorithm. We then generate batches of insertions (+) and deletions (-) of edges of sizes 500, 1000, 5000, ... 100000. For each batch size, we generate five different batches for the purpose of averaging. Each batch of edges (insertion / deletion) is generated randomly such that the selection of each vertex (as endpoint) is equally probable. We choose the Louvain parameters as resolution = 1.0, tolerance = 1e-2 (for local-moving phase) with tolerance decreasing after every pass by a factor of toleranceDeclineFactor = 10, and a passTolerance = 0.0 (when passes stop). In addition we limit the maximum number of iterations in a single local-moving phase with maxIterations = 500, and limit the maximum number of passes with maxPasses = 500. We run the Louvain algorithm until convergence (or until the maximum limits are exceeded), and measure the time taken for the computation (performed 5 times for averaging), the modularity score, the total number of iterations (in the local-moving phase), and the number of passes. This is repeated for seventeen different graphs.

From the results, we make make the following observations. The frontier-based dynamic approach converges the fastest, which obtaining communities with only slightly lower modularity than other approaches. We also observe that delta-screening based dynamic Louvain algorithm has the same performance as that of the naive dynamic approach. Therefore, frontier-based dynamic Louvain would be the best choice. All outputs are saved in a gist. Some charts are also included below, generated from sheets.




References




ORG DOI