puzzlef/labelrank-communities-openmp

Comparing approaches for community detection using OpenMP-based LabelRank algorithm.

LabelRank is an algorithm for detecting communities in graphs. Community detection helps us understand the natural divisions in a network in an unsupervised manner. It is used in e-commerce for customer segmentation and advertising, in communication networks for multicast routing and setting up mobile networks, and in healthcare for epidemic causality, setting up health programmes, and fraud detection in hospitals. Community detection is an NP-hard problem, but heuristics (such as this one) exist to solve it approximately. LabelRank is an iterative algorithm based on the propagation of weighted labels on a weighted (directed) network, where the highest-weight label determines the community membership of each vertex.
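The propagation idea described above can be sketched in a few lines of C++. This is a simplified illustration, not the code from this repository: the type names `Labelset`, `Edge`, and `Graph`, and the functions `propagate` and `bestLabel`, are hypothetical, and the sketch leaves out the other LabelRank operators (such as inflation and conditional update).

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical types, for illustration only.
using Labelset = std::map<int, double>;            // label id -> weight
using Edge     = std::pair<int, double>;           // (source vertex, edge weight)
using Graph    = std::vector<std::vector<Edge>>;   // in-edges of each vertex

// One propagation step: each vertex sums its in-neighbors' label weights,
// scaled by edge weight, then normalizes so its weights add up to 1.
std::vector<Labelset> propagate(const Graph& g, const std::vector<Labelset>& ls) {
  assert(ls.size() == g.size());
  std::vector<Labelset> out(g.size());
  for (std::size_t u = 0; u < g.size(); ++u) {
    double total = 0.0;
    for (const Edge& e : g[u]) {
      for (const auto& lw : ls[e.first]) {
        out[u][lw.first] += e.second * lw.second;
        total += e.second * lw.second;
      }
    }
    if (total > 0.0) { for (auto& lw : out[u]) lw.second /= total; }
    else out[u] = ls[u];  // no incoming weight: keep the current labels
  }
  return out;
}

// The community of a vertex is its highest-weight label (-1 if it has none).
int bestLabel(const Labelset& s) {
  int best = -1;
  double bestWeight = -1.0;
  for (const auto& lw : s) {
    if (lw.second > bestWeight) { best = lw.first; bestWeight = lw.second; }
  }
  return best;
}
```

Starting with every vertex holding only its own label at weight 1, repeated calls to `propagate` spread labels along edges, and `bestLabel` reads off the community of each vertex after the last iteration.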

Our implementation of LabelRank differs from the original algorithm in that there is a fixed upper limit on the number of labels per vertex (labelset capacity). Therefore, we do not use the cutoff operator (which removes low-weighted labels), but instead trim off labels that do not fit within the labelset capacity. Labels are sorted by weight so that only the lowest-weighted labels are eliminated. The input data used for each experiment given below is available from the SuiteSparse Matrix Collection. All experiments are done with guidance from Prof. Kishore Kothapalli and Prof. Dip Sankar Banerjee.
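The trimming step described above might look roughly like the following. This is a minimal sketch under assumptions: `WeightedLabel` and `trimLabelset` are hypothetical names, the labelset is modelled as a plain vector, and whether the surviving weights are renormalized afterwards is not stated here, so the sketch leaves them unchanged.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical representation: (label id, weight).
using WeightedLabel = std::pair<int, double>;

// Sort labels by descending weight, then drop whatever does not fit within
// the capacity, so only the lowest-weighted labels are eliminated.
void trimLabelset(std::vector<WeightedLabel>& labels, std::size_t capacity) {
  assert(capacity > 0);
  std::sort(labels.begin(), labels.end(),
            [](const WeightedLabel& a, const WeightedLabel& b) {
              return a.second > b.second;
            });
  if (labels.size() > capacity) {
    labels.erase(labels.begin() + static_cast<std::ptrdiff_t>(capacity), labels.end());
  }
}
```

With a capacity of 4, a vertex holding five labels would lose only its single lowest-weighted label; a vertex holding four or fewer labels is left as it is (apart from being sorted).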

Adjusting OpenMP schedule

In this experiment (adjust-schedule, main), we run the OpenMP-based LabelRank algorithm with each of the schedule kinds (static, dynamic, guided, and auto), while adjusting the chunk size from 1 to 65536 in powers of 2. In all cases, we use a total of 12 threads. We choose the LabelRank parameters as inflation = 1.5 and conditionalUpdate = 0.5. In addition, we use a fixed labelset capacity of 4, and run the algorithm for exactly 10 iterations. We measure the time taken for the computation (averaged over 5 runs), and measure the modularity score (one of the ways to measure the quality of communities). This is repeated for seventeen different graphs.
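For readers unfamiliar with modularity, the score used above as the quality measure can be computed as in the sketch below. It assumes the standard definition for an undirected weighted graph, Q = sum over communities c of (in_c / 2m) - (tot_c / 2m)^2, with every undirected edge stored once in each direction. The names `modularity`, `Edge`, and `Graph` are hypothetical, and the directed variant that this repository may use is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical types, for illustration only.
using Edge  = std::pair<int, double>;              // (neighbor, edge weight)
using Graph = std::vector<std::vector<Edge>>;      // symmetric adjacency list

// Modularity of a vertex-to-community assignment on an undirected graph.
double modularity(const Graph& g, const std::vector<int>& community, int numCommunities) {
  assert(community.size() == g.size());
  std::vector<double> inside(numCommunities, 0.0);  // edge weight within each community
  std::vector<double> total(numCommunities, 0.0);   // total degree of each community
  double twoM = 0.0;                                // sum of all degrees (2m)
  for (std::size_t u = 0; u < g.size(); ++u) {
    for (const Edge& e : g[u]) {
      total[community[u]] += e.second;
      if (community[e.first] == community[u]) inside[community[u]] += e.second;
      twoM += e.second;
    }
  }
  if (twoM == 0.0) return 0.0;  // no edges: define modularity as 0
  double q = 0.0;
  for (int c = 0; c < numCommunities; ++c) {
    const double frac = total[c] / twoM;
    q += inside[c] / twoM - frac * frac;
  }
  return q;
}
```

As a sanity check, two triangles joined by a single unit-weight edge and split into one community per triangle give Q = 5/14, while placing all six vertices in one community gives Q = 0.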

From the results, we observe that a dynamic schedule kind with a chunk size of 1024 appears to perform the best in most cases. Exceptions include the soc-Epinions1 graph, where a chunk size of 256 would be a better choice, and web-Google, where a guided schedule kind with a chunk size of 256 would be better. Note that these parameter choices might differ if a different labelset capacity or different LabelRank parameters are used.
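The schedule kind and chunk size discussed above map directly onto the OpenMP `schedule` clause. The sketch below shows where they would appear on a per-vertex loop; `processVertices` and `chunkSizes` are hypothetical names, and the loop body (recording each vertex's degree) is only a stand-in for the real label update. Compile with OpenMP enabled (for example `-fopenmp` with GCC); without it the pragma is ignored and the loop runs sequentially.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Per-vertex loop with a dynamic schedule; the chunk size defaults to 1024
// and 12 threads are requested, matching the setup described in the text.
std::vector<int> processVertices(const std::vector<std::vector<int>>& adj, int chunk = 1024) {
  assert(chunk > 0);
  const long n = static_cast<long>(adj.size());
  std::vector<int> out(adj.size());
  #pragma omp parallel for schedule(dynamic, chunk) num_threads(12)
  for (long u = 0; u < n; ++u) {
    // Placeholder work: a real implementation would update the labels of u.
    out[u] = static_cast<int>(adj[u].size());
  }
  return out;
}

// Chunk sizes swept in the experiment: 1, 2, 4, ..., 65536.
std::vector<int> chunkSizes() {
  std::vector<int> sizes;
  for (int c = 1; c <= 65536; c *= 2) sizes.push_back(c);
  return sizes;
}
```

Switching to another kind means replacing `dynamic` with `static` or `guided` in the clause. An alternative is `schedule(runtime)` together with the `OMP_SCHEDULE` environment variable (for example `OMP_SCHEDULE="guided,256"`), which lets the schedule be chosen per graph without recompiling.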
