# K-Graph Partitioning Problem
This project is a collection of algorithms for the k-Graph Partitioning Problem. The approaches covered are:
- Backtracking
- Constraint/Integer Programming with Google OR-Tools
- Overview of Spectral Clustering with Scikit-learn
- Using Karlsruhe High Quality Graph Partitioning (KaHIP) Python module

## Problem Statement
Given an undirected edge-weighted graph $G = (V,E,w)$, we need to partition $V$ into $k$ subsets $V_1, ..., V_k$ such that:
- The $k$ subsets are of nearly equal size. The constraint takes one of two forms: 
$$\max_{i} |V_i| - \min_{i} |V_i| \le \alpha $$
or 
$$|V_i| < (\epsilon + 1) \frac{|V|}{k} \quad \forall i $$
where $\alpha \in \mathbf{Z_{++}}$ and $\epsilon > 0$ are imbalance factors. We call the two versions of the problem "$\alpha$-constrained" and "$\epsilon$-constrained", respectively. 
- The total weight of edges that connects two different subsets is minimized (the cut size), i.e. $$\text{minimize} \sum_{\{u,v\}\in C} w(\{u,v\}) \\\text{ for } C = \{\{u,v\}\in E : u \in V_i, v\in V_j, 1 \le i < j \le k \}$$
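
Both requirements can be checked directly for a candidate partition. The following is a minimal sketch, assuming the graph is given as a plain list of `(u, v, weight)` triples and the partition as a list of subset labels; the helper names and the small 5-vertex example are illustrative and not part of the original notebook.

```python
def cut_size(edges, part):
    """Sum the weights of edges whose endpoints lie in different subsets."""
    return sum(w for u, v, w in edges if part[u] != part[v])

def subset_sizes(part, k):
    """Return [|V_0|, ..., |V_{k-1}|] for a partition given as a label list."""
    sizes = [0] * k
    for label in part:
        sizes[label] += 1
    return sizes

def is_alpha_balanced(part, k, alpha):
    """Alpha-constrained version: max size minus min size is at most alpha."""
    sizes = subset_sizes(part, k)
    return max(sizes) - min(sizes) <= alpha

def is_epsilon_balanced(part, k, epsilon):
    """Epsilon-constrained version: every size is below (1 + epsilon) * |V| / k."""
    return max(subset_sizes(part, k)) < (1 + epsilon) * len(part) / k

# Hypothetical example: 5 vertices, 6 unit-weight edges, k = 2.
edges = [(0, 1, 1), (0, 4, 1), (1, 2, 1), (1, 4, 1), (2, 3, 1), (3, 4, 1)]
part = [0, 0, 1, 1, 0]

print(cut_size(edges, part))               # 2
print(is_alpha_balanced(part, 2, 1))       # True
print(is_epsilon_balanced(part, 2, 0.03))  # False
```

The last result shows that the strict $\epsilon$ bound can be infeasible on tiny instances: with $|V| = 5$, $k = 2$ and $\epsilon = 0.03$ the bound is $2.575$, so no subset may contain 3 vertices.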

## Input Description
We use the format from the KaHIP documentation to define our graph data structure. It stores the adjacency structure in compressed sparse row (CSR) format, which consists of four arrays:  
- vwgt: the vertex weight array of size n. In our problem, all vertices have equal weight.
- xadj, adjncy: adjacency arrays of size (n+1) and (2m) that describe the edges, as explained below.
- adjcwgt: the edge weight array of size 2m. 

For example, the following arrays:  
<code>
xadj = [0,2,5,7,9,12]  
adjncy = [1,4,0,2,4,1,3,2,4,0,1,3]
</code>

would describe the following graph:

<img src="https://drive.google.com/uc?export=view&id=1mcnBJoyxmOyaG5CNFyOyWjV-kyYB_5HL" 
     width="100" />

The vertices are 0-indexed. The $i$-th vertex is adjacent to the vertices in <code>adjncy[xadj[i]:xadj[i+1]]</code>. For example, vertex $1$ in the graph is adjacent to the vertices in <code>adjncy[2:5]</code>, which are $0,2,4$. Since every edge is listed once from each of its two endpoints, the <code>adjncy</code> and <code>adjcwgt</code> arrays are of size (2m).
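
As a quick illustration of the indexing rule, here is a small sketch using plain Python lists. The helper names `neighbors` and `edge_list` are hypothetical and are not part of KaHIP; they only read the arrays as described above.

```python
def neighbors(xadj, adjncy, i):
    """Return the vertices adjacent to vertex i."""
    return adjncy[xadj[i]:xadj[i + 1]]

def edge_list(xadj, adjncy, adjcwgt):
    """Recover each undirected edge once as a (u, v, weight) triple with u < v."""
    edges = []
    for u in range(len(xadj) - 1):
        for j in range(xadj[u], xadj[u + 1]):
            v = adjncy[j]
            if u < v:  # skip the mirrored copy stored at the other endpoint
                edges.append((u, v, adjcwgt[j]))
    return edges

xadj = [0, 2, 5, 7, 9, 12]
adjncy = [1, 4, 0, 2, 4, 1, 3, 2, 4, 0, 1, 3]
adjcwgt = [1] * len(adjncy)

print(neighbors(xadj, adjncy, 1))               # [0, 2, 4]
print(len(edge_list(xadj, adjncy, adjcwgt)))    # 6
```

The second value is the number of edges $m$, which is half the length of <code>adjncy</code>.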

In [1]:
import numpy as np
# Input for the problem in the above format
input_xadj       = np.array([0,2,5,7,9,12])
input_adjncy     = np.array([1,4,0,2,4,1,3,2,4,0,1,3])
input_vwgt       = np.array([1,1,1,1,1])
input_adjcwgt    = np.array([1,1,1,1,1,1,1,1,1,1,1,1])
epsilon          = 0.03 
alpha            = 1 
k                = 2
n = np.shape(input_vwgt)[0] # number of vertices
m = np.shape(input_adjcwgt)[0] // 2 # number of edges

In [2]:
# Build a dense adjacency matrix from the CSR arrays. This takes O(n^2) memory,
# so it is only suitable when n is not too large.

adjacency = [[0 for i in range(n)] for j in range(n)]
for i in range(n):
    adjacency[i][i] = 0
for i in range(n):
    s,t = input_xadj[i], input_xadj[i+1]
    for j in range(s, t):
        adjacency[i][input_adjncy[j]] = input_adjcwgt[j]

## Backtracking Approach
The backtracking algorithm enumerates the k-partitions of the set {0,...,n-1}, so it works well only for small datasets. The number of k-partitions is the Stirling number of the second kind S(n,k), which grows exponentially in n for fixed k ≥ 2, and the running time grows with it. 

We combine the enumeration with a branch-and-bound rule to reduce the number of partitions explored: a partial assignment is abandoned as soon as its cut weight exceeds that of the best complete solution found so far. 
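
To get a feel for how quickly the search space grows, S(n,k) can be computed with the standard recurrence S(n,k) = k·S(n-1,k) + S(n-1,k-1). The sketch below is only an illustration of that count; the function name `stirling2` is an assumption and does not appear elsewhere in this notebook.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def stirling2(n, k):
    """Number of ways to split n labelled items into k non-empty unlabelled subsets."""
    if n == k:
        return 1          # includes S(0, 0) = 1
    if k == 0 or k > n:
        return 0
    # Item n either joins one of the k existing subsets or opens a new one.
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

print(stirling2(5, 2))    # 15
print(stirling2(20, 4))   # 45232115901
```

For the 5-vertex example with k = 2 there are only 15 candidate partitions, but 20 vertices with k = 4 already give more than 45 billion.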

In [3]:
partition = [0 for i in range(n)]

ans_partition = [-1 for i in range(n)]

lower_bound = float('inf')


def Try(i, mx, ans=0):
    global lower_bound, ans_partition
    def check_valid_partition(setting=0):
        def check_alpha(count):
            mn = min(count)
            mx = max(count)
            return (mx - mn <= alpha)
        
        def check_epsilon(count):
            mx = max(count)
            # Every subset must satisfy |V_i| < (1 + epsilon) * |V| / k
            return mx < (1 + epsilon) * n / k
        
        count = [0 for _ in range(k)]
        for i in range(n):
            count[partition[i]] += 1
        if setting == 0:
            return check_alpha(count)
        else:
            return check_epsilon(count)
    
    if i == n:
        if check_valid_partition() and ans < lower_bound:
            ans_partition = partition[:]
            lower_bound = ans
    elif i == 0:
        partition[0] = 0
        Try(1,0)
    else:
        if mx==k-1:
            for j in range(k):
                new_ans = ans
                partition[i] = j
                # Update the weight
                for t in range(i):
                    if partition[t] != j:
                        new_ans += adjacency[i][t]
                if new_ans <= lower_bound:
                    Try(i+1, k-1, new_ans)
                else:
                    continue

        elif (i-mx)+k == n + 1:
            new_ans = ans
            partition[i] = mx+1
            for t in range(i):
                if partition[t] != mx+1:
                    new_ans += adjacency[i][t]
            if new_ans <= lower_bound:
                Try(i+1, mx+1, new_ans)
            else:
                return
        else:
            for j in range(mx+2):
                new_ans = ans
                partition[i] = j 
                for t in range(i):
                    if partition[t] != j:
                        new_ans += adjacency[i][t]
                if new_ans <= lower_bound:
                    Try(i+1, max(mx, j), new_ans)
                else:
                    continue

Try(0,0)
print(*ans_partition)

0 0 1 1 0


## Constraint/Integer Programming using OR-Tools
OR-Tools is an open-source toolkit for operations research developed by Google. It can be used for constraint programming and integer linear programming (ILP) problems. Here we give a mathematical model for both versions of the partitioning problem: the alpha-constrained problem and the epsilon-constrained problem.

Let us start by installing the OR-Tools package for Python.

In [4]:
# %pip install ortools

We begin with an integer programming formulation for the $\epsilon$-constrained version. First we introduce binary decision variables for all edges and vertices of the graph. For each edge $e=\{u,v\}\in E$, let $e_{uv} \in \{0,1\}$ indicate whether $e$ is a cut edge. Moreover, for each $v\in V$ and subset $p$, let $x_{v,p} \in \{0,1\}$ indicate whether $v$ is in subset $p$. We have a total of $|E|+k|V|$ variables. 

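The maximum size of a subset is bounded by the following quantity (note the ceiling, which makes this bound slightly looser than the one in the problem statement whenever $k$ does not divide $|V|$):
$$ M = (1+\epsilon) \left\lceil \frac{|V|}{k} \right\rceil $$ 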

To ensure a valid partition, we have:
$$ \forall \{u,v\} \in E, \forall p: e_{uv} \ge + x_{u,p} - x_{v,p} $$
$$ \forall \{u,v\} \in E, \forall p: e_{uv} \ge - x_{u,p} + x_{v,p}  $$
$$ \forall p : \sum_{v\in V} x_{v,p} \le M \quad (*)$$ 
$$ \forall v\in V: \sum_{p} x_{v,p} = 1 $$

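If we want the partition size to be constrained by alpha instead of epsilon, then $(*)$ should be replaced with the following constraints, where $mx$ and $mn$ are two auxiliary integer variables that bound the largest and the smallest subset size: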

$$ \forall p : \sum_{v \in V} x_{v,p} \le mx $$
$$ \forall p : \sum_{v \in V} x_{v,p} \ge mn $$
$$ mx - mn \le \alpha $$
The objective function is:

$$ \text{minimize } \sum_{\{u,v\}\in E}e_{uv} w(\{u,v\}) $$
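
The first two families of constraints say $e_{uv} \ge |x_{u,p} - x_{v,p}|$ for every $p$. The sketch below checks, by brute force over all block assignments of two endpoints, that the smallest feasible $e_{uv}$ is 1 exactly when the endpoints are in different subsets. It assumes non-negative edge weights, so that minimization drives each $e_{uv}$ down to this lower bound; the helper name `min_cut_indicator` is illustrative only.

```python
from itertools import product

def min_cut_indicator(block_u, block_v, k):
    """Smallest e in {0, 1} with e >= x_u[p] - x_v[p] and e >= x_v[p] - x_u[p] for all p."""
    # One-hot encodings: x[p] = 1 iff the vertex is assigned to subset p.
    x_u = [1 if p == block_u else 0 for p in range(k)]
    x_v = [1 if p == block_v else 0 for p in range(k)]
    return max(abs(x_u[p] - x_v[p]) for p in range(k))

k = 3
for block_u, block_v in product(range(k), repeat=2):
    expected = 1 if block_u != block_v else 0
    assert min_cut_indicator(block_u, block_v, k) == expected

print("e_uv equals 1 exactly for cut edges")
```

If an edge weight were negative, the solver could set $e_{uv} = 1$ for an uncut edge, so the model as written relies on $w \ge 0$.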

In [13]:
from ortools.linear_solver import pywraplp
import math
solver = pywraplp.Solver.CreateSolver('SCIP')

V_max = (1 + epsilon) * math.ceil(n / k) 
infinity = solver.infinity()
X = dict()
E = dict()
for v in range(n):
    for p in range(k):
        X[(v,p)] = solver.IntVar(0,1,'X[{},{}]'.format(v,p))

for u in range(n):
    for j in range(input_xadj[u],input_xadj[u+1]):
        v = input_adjncy[j] 
        if u < v:
            E[(u,v)] = solver.IntVar(0,1,'E[{},{}]'.format(u,v))

print("Number of Variables: {}".format(solver.NumVariables()))

for u,v in E:
    for p in range(k):
        solver.Add(E[u,v] >= X[u,p] - X[v,p])
        solver.Add(E[u,v] >= -X[u,p] + X[v,p])

# EPSILON-STYLED CONSTRAINT
# for p in range(k):
#     constraint = solver.RowConstraint(0,V_max,'')
#     for v in range(n):
#         constraint.SetCoefficient(X[v,p], 1)
#########

# ALPHA-STYLED CONSTRAINT

mx = solver.IntVar(0,n, 'mx')
mn = solver.IntVar(0,n, 'mn')


for p in range(k):
    constraint = solver.RowConstraint(0,infinity,'')
    for v in range(n):
        constraint.SetCoefficient(X[v,p], 1)
    constraint.SetCoefficient(mn, -1)
for p in range(k):
    constraint = solver.RowConstraint(0, infinity,'')
    for v in range(n):
        constraint.SetCoefficient(X[v,p], -1)
    constraint.SetCoefficient(mx, 1)
    
solver.Add(mx - mn <= alpha)
########
for p in range(k):
    constraint = solver.RowConstraint(0,infinity,'')
    for v in range(n):
        constraint.SetCoefficient(X[v,p], 1)
        
for v in range(n):
    constraint = solver.RowConstraint(1,1,'')
    for p in range(k):
        constraint.SetCoefficient(X[v,p], 1)

objective = solver.Objective()
for u,v in E:
    objective.SetCoefficient(E[(u,v)], int(adjacency[u][v]))
objective.SetMinimization()

status = solver.Solve()

if status == pywraplp.Solver.OPTIMAL:
    print('Objective Value =', solver.Objective().Value())
    for v in range(n):
        for p in range(k):
            # Compare against 0.5 rather than 1.0 to tolerate floating-point round-off
            if X[(v,p)].solution_value() > 0.5:
                print("Vertex {} belongs to Partition {}".format(v,p))
    print()
    print('Problem solved in %f milliseconds' % solver.wall_time())
    print('Problem solved in %d iterations' % solver.iterations())
    print('Problem solved in %d branch-and-bound nodes' % solver.nodes())
else:
    print('The problem does not have an optimal solution.')


Number of Variables: 16
Objective Value = 2.0
Vertex 0 belongs to Partition 0
Vertex 1 belongs to Partition 0
Vertex 2 belongs to Partition 1
Vertex 3 belongs to Partition 1
Vertex 4 belongs to Partition 0

Problem solved in 8.000000 milliseconds
Problem solved in 28 iterations
Problem solved in 1 branch-and-bound nodes


## Overview of Spectral Clustering with `scikit-learn`

Spectral clustering is a clustering algorithm that works well for nonconvex clusters. Here we apply it to an undirected edge-weighted graph, i.e. the graph of this problem.
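
A minimal sketch of how this could look with scikit-learn's `SpectralClustering`, assuming the adjacency matrix of the example graph is passed directly as a precomputed affinity matrix. Two caveats: this is one possible setup rather than a tuned method, and spectral clustering does not enforce either the $\alpha$ or the $\epsilon$ balance constraint, so the result has to be checked separately.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

# Adjacency matrix of the 5-vertex example graph (unit edge weights).
A = np.array([
    [0, 1, 0, 0, 1],
    [1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [1, 1, 0, 1, 0],
])

# affinity="precomputed" tells scikit-learn to use A as the similarity matrix
# instead of building one from feature vectors.
model = SpectralClustering(
    n_clusters=2,
    affinity="precomputed",
    assign_labels="kmeans",
    random_state=0,
)
labels = model.fit_predict(A)

# One subset label per vertex; balance is NOT guaranteed.
print(labels)
```

The label values themselves are arbitrary, so only the grouping of vertices matters when comparing with the other approaches.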




## Karlsruhe High Quality Graph Partitioning Module

KaHIP is a powerful library for the graph partitioning problem. Its multilevel algorithm incorporates many valuable ideas, including graph coarsening and local search refinement. 



To install the KaHIP package for Python, please refer to https://github.com/KaHIP/KaHIP, which has detailed instructions. Making it work in Colab requires a little tweaking. The cell below downloads the package and builds it with cmake; run it once before importing the module.

In [None]:
!python3 -m pip install pybind11
!git clone https://github.com/KaHIP/KaHIP
!KaHIP/compile_withcmake.sh BUILDPYTHONMODULE

Go to the directory <code>KaHIP/deploy</code>:

In [None]:
# Use the %cd magic: `!cd` runs in a subshell and does not change the notebook's working directory
%cd KaHIP/deploy
!pwd

Then you can import kahip in your program. Note that the import might not work on the first try; I do not fully understand why my own setup works.

In [None]:
import kahip

#build adjacency array representation of the graph
xadj           = input_xadj 
adjncy         = input_adjncy 
vwgt           = input_vwgt 
adjcwgt        = input_adjcwgt 
supress_output = 0
imbalance      = epsilon
nblocks        = k  # number of blocks, defined in the input cell
seed           = 0

# set mode 
#const int FAST           = 0;
#const int ECO            = 1;
#const int STRONG         = 2;
#const int FASTSOCIAL     = 3;
#const int ECOSOCIAL      = 4;
#const int STRONGSOCIAL   = 5;
mode = 2 

edgecut, blocks = kahip.kaffpa(vwgt, xadj, adjcwgt, 
                              adjncy,  nblocks, imbalance, 
                              supress_output, seed, mode)

print(edgecut)
print(blocks)