Skip to content

krzysztof-turowski/koala-networkit

Repository files navigation

koala-networkit

This project is a KOALA library fork built on top of the structures provided by the NetworKit library.

Table of Contents

  1. Overview of the library
  2. Installation
  3. Usage
  4. References

Overview of the library

In order to provide a simple, fast, powerful unified interface for a broad class of algorithms we joined the power and scalability of a fast, popular, and modern NetworKit library with an equally unique set of algorithmic tools for classical discrete optimization problems provided by the KOALA library.

NetworKit

NetworKit is an open-source general-purpose network analysis and graph mining library written in C++ (see [1] for details). Graphs are represented in a very compact form while the efficiency of changes to their structure, such as node and edge additions or deletions, is preserved. The memory-saving design is related to the main aim of the library: the analysis of large-scale random graphs and real-world networks.

The library contains algorithms mostly for the structural analysis of massive complex real-world networks, i.e. finding global network properties (such as density of a graph or its diameter), centrality measures (betweenness, closeness, local clustering), density and motifs measures (clustering coefficient, triangle counting, clique detection), and community detection. It also includes implementations of many generative network models such as Erdős–Rényi, Barabási–Albert or the hyperbolic unit-disk model. It has some procedures related to the classical graph problems e.g. Edmonds-Karp maximum flow algorithm, but it is lacking depth in this dimension, for example, there are no algorithms for graph coloring.

The authors of this library emphasize the usage of parallelization, heuristics, and efficient data structures to deal with computationally intensive problems on large data. For design, they aim at a modular architecture with encapsulation of algorithms into software components (classes and modules) for extensibility and code reuse. The code is written very clearly with promoting the compatibility of good coding practices and new standards of C++ in mind. For users, the library provides seamless integration with Python and its libraries e.g. pandas, matplotlib. It also contains interfaces to some external products e.g. graph visualization tool Gephi.

This library comes out as one the fastest in the field (see here for a comparison), it is quite readable and efficient in practice, especially compared to ad-hoc made up solutions. Although not as popular as Stanford Network Analysis Platform (SNAP) library, and not as endowed by Chan Zuckerberg Initiative as igraph library, NetworKit still has certain lively community (over 450 stars and 160 forks on GitHub) keeping with its development.

Koala

KOALA is an open-source library of C++ templates, developed at the Gdansk University of Technology, Department of Algorithms and System Modeling (see [2] for details). Its main part consists of an implementation of a broad set of procedures in the fields of algorithmic graph theory and network problems in discrete optimization. In particular, the library contains algorithms for:

  • graph coloring:
    • vertex coloring: heuristics (greedy, LF, SL, SLF, GIS, also with color interchange), $\Delta$-coloring, exact exponential-time,
    • edge coloring: heuristics (greedy, greedy with interchange), $(\mu + \Delta)$-coloring,
    • list vertex coloring, list edge coloring, interval vertex coloring, interval edge coloring,
  • graph search:
    • graph traversal: DFS, BFS, lexicographic BFS (with pre- and post-order versions),
    • shortest paths: Dijkstra, Bellman-Ford, Johnson, Floyd-Warshall,
    • spanning forest: Kruskal, Prim,
    • computation of connected components and biconnected components, negative cycle detection,
  • flow problems: maximum flow (Fulkerson-Ford, Dinic), minimum cost maximum flow, minimum edge or vertex cut, Gomory-Hu tree,
  • independent set (heuristics, approximations, and exact exponential-time) and factorization,
  • task scheduling on parallel identical processors with classical measures of the schedule cost: $C_{max}$, $L_{max}$, $\sum C_i$, $\sum U_i$, $\sum T_i$,
  • graph recognition for graph families: empty graphs, cliques, paths, caterpillar, trees, forests, cycles, connected, complete $k$-partite, regular, subcubic, block, bipartite, chordal, comparability, interval, split, cographs,
  • graph generation for several graph families (cliques, paths, cycles, fans, wheels, caterpillars, complete $k$-partite, regular trees) and random graphs models (Erdős–Rényi, Barabási–Albert, Watts–Strogatz),
  • operation on graphs: closures (reflexive, symmetric, transitive), line graphs, products of graphs (Cartesian, tensor, lexicographic, strong).
  • dominating set: exact exponential-time

The library was built on a custom, versatile, object-oriented templated graph structure, capable of handling edges of multiple types (undirected, directed, loops) at the same time. The creators wanted to provide a convenient and friendly user experience through interface procedures on various levels of complexity. At the most basic level, they presented generic versions of the algorithms, but using C++ language mechanisms they also allow more advanced programmers e.g. to customize the data structures (arrays, maps, priority queues) used by the algorithms instead of using the STL containers. For example, users can pick the structures which are tailored to guarantee computational complexity or the ones which are tailored to special practical cases.

Moreover, they set up an online graph editor Zgred, written in JavaScript. It allows to create, edit and visualize graphs. Furthermore, it is capable of running several algorithms from the library.

List of algorithms

  1. Reading and writing graphs: graph6, sparse6, digraph6, DIMACS, DIMACS binary formats
  2. Graph recognition:
    1. Perfect graphs
    2. Cographs: Habib-Paul, Bretscher-Corneil-Habib-Paul, Corneil-Stewart-Perl, Dahlhaus (sequential)
  3. Graph traversal: BFS, DFS
  4. Graph width parameters:
    1. Algorithm for treewidth and pathwidth in cographs
  5. Minimum spanning tree algorithms: Kruskal, Prim, Borůvka, Klein-Karger-Tarjan
    1. Hagerup algorithm for minimum spanning tree verification
  6. Flow algorithms
    1. Maximum flow: King-Rao-Tarjan
  7. Vertex coloring
    1. Greedy heuristics: RandomSequential, LargestFirst, SmallestLast, SaturatedLargestFirst, GreedyIndependentSet
    2. Exact exponential-time algorithms: Brown, Christofides, Brélaz, Korman
    3. Grötschel-Lovász-Schrijver algorithm for perfect graphs
    4. Algorithm for cographs
  8. Maximum independent set
    1. Algorithm for cographs
  9. Maximum clique
    1. Algorithm for cographs
  10. Minimum dominating set: Grandoni, Fomin-Grandoni-Kratsch, van Rooij-Bodlaender, Fomin-Kratsch-Woeginger, Schiermeyer
  11. Minimum set cover: exact branch and reduce

For further planned changes, see the Issues section.

List of datasets

To assess the speed of the algorithms we use primarily the publicly available Stanford Large Network Dataset, a set of real-world networks (social, information, biological, etc.) and datasets [3].

Installation

cmake -B build
cmake --build build --parallel 4

Note: it may take a while to download and compile dependencies (e.g. googletest, networkit, and boost).

Additionally, users need to install beforehand the following packages (or their equivalents): g++/clang, cpplint, gfortran, libblas-dev, liblapack-dev, libgtest-dev, libboost-all-dev.

Usage

References

[1] Christian Staudt, Aleksejs Sazonovs, Henning Meyerhenke, NetworKit: A tool suite for large-scale complex network analysis, Network Science 4(4):508-530, 2016.

[2] Krzysztof Giaro, Krzysztof Ocetkiewicz, Tomasz Goluch, KOALA Graph Theory Internet Service, TASK Quarterly 19(4):455-470, 2015.

[3] Jure Leskovec, Rok Sosic, SNAP Datasets: Stanford Large Network Dataset Collection, 2014.