# Exploring: *On comparing clusterings...*

[On comparing clusterings:
an element-centric framework unifies overlaps and hierarchy](../papers/1706.06136.pdf)

[CluSim: a package for calculating clustering similarity](https://github.com/ajgates42/clusim)
- [docs](https://ajgates42.github.io/clusim/html/clusim.html)

# Notes

existing clustering comparison measures have critical biases which undermine their usefulness

no measure accommodates both overlapping and hierarchical clusterings



## Measures


## Biases

What are the critical biases? (see references: [14, 15, 3, 16, 17])


# Idea

unify the comparison of disjoint, overlapping, and hierarchically structured clusterings


# Approach

elements are compared based on the relationships induced by the cluster structure

the framework does not suffer from critical biases and naturally provides unique insights into how clusterings differ

element-centric similarity can provide detailed insights into how two clusterings differ because the similarity is calculated at the level of individual elements

simply examining individual element-wise scores reveals how consistently each element is grouped across clusterings



## Concepts

### cluster affiliation graph
An undirected bipartite graph where one vertex set corresponds to the elements, the other corresponds to the clusters, and a weighted edge exists between a cluster and each of its elements.
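A minimal sketch of the definition above, using the toy clustering from the CluSim example later in these notes. The helper name `affiliation_matrix` and the unit edge weights are assumptions made for illustration; they are not taken from the paper or from CluSim.

```python
# Element -> list of cluster ids (toy clustering 012|23|45).
elm2clu = {0: [0], 1: [0], 2: [0, 1], 3: [1], 4: [2], 5: [2]}


def affiliation_matrix(elm2clu, weight=1.0):
    """Return the N x K bipartite adjacency matrix as a list of rows.

    Entry [i][k] holds the edge weight between element i and cluster k,
    or 0.0 when the element is not a member of that cluster.
    Assumption: every membership gets the same weight.
    """
    elements = sorted(elm2clu)
    clusters = sorted({c for clus in elm2clu.values() for c in clus})
    column = {c: k for k, c in enumerate(clusters)}
    A = [[0.0] * len(clusters) for _ in elements]
    for i, element in enumerate(elements):
        for c in elm2clu[element]:
            A[i][column[c]] = weight
    return A


A = affiliation_matrix(elm2clu)
for row in A:
    print(row)
```

Element 2 belongs to clusters 0 and 1, so its row has two non-zero entries; every other row has exactly one.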

### cluster-induced element graph
Formed by projecting the cluster affiliation graph (with $N \times K_{c}$ bipartite adjacency matrix $\mathbb{A}$, entries $a_{i\gamma}$) onto the element vertices, resulting in a directed graph in which the edge $w_{ij}$ from element $v_{i}$ to element $v_{j}$ has weight:

$$w_{ij} = \sum_{\gamma}\frac{a_{i\gamma}a_{j\gamma}}{\sum_{k}a_{ik} \sum_{m}a_{m\gamma}}$$
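The formula can be checked by hand on a small case. The sketch below evaluates it in plain Python for the toy clustering 012|23|45 with unit edge weights; the function name `element_graph_weights` is hypothetical, and this is an illustration of the formula rather than CluSim's implementation.

```python
# Bipartite adjacency matrix (6 elements x 3 clusters) for 012|23|45.
A = [
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
]


def element_graph_weights(A):
    """Project the affiliation matrix onto the elements.

    w_ij = sum_gamma a_i,gamma * a_j,gamma / (sum_k a_i,k * sum_m a_m,gamma)
    """
    n, k = len(A), len(A[0])
    row_sums = [sum(A[i]) for i in range(n)]                        # sum_k a_ik
    col_sums = [sum(A[m][g] for m in range(n)) for g in range(k)]   # sum_m a_m,gamma
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if row_sums[i] == 0:
            continue  # element with no cluster has no outgoing edges
        for j in range(n):
            W[i][j] = sum(
                A[i][g] * A[j][g] / (row_sums[i] * col_sums[g])
                for g in range(k)
                if col_sums[g] > 0
            )
    return W


W = element_graph_weights(A)
print(W[0][2], W[2][0])  # the two directions differ, so the graph is directed
```

For this example $w_{02} = 1/3$ while $w_{20} = 1/6$, because element 2 splits its weight across two clusters. Each row of `W` sums to 1 for any element that belongs to at least one cluster.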

### element affinity matrix

### element-wise comparison

### clustering similarity

### clustering differences

### agreement
The *average agreement* between a reference clustering and a set of clusterings measures how regularly elements are grouped relative to the reference clustering.

### frustration
The *frustration* within a set of clusterings reflects the consistency with which elements are grouped across the clusterings in the set.

## Terms

### Types of Clusters
**partition** - a clustering in which all elements are members of one, and only one, cluster<br>
**overlapping** - a clustering that allows elements to be members of multiple clusters<br>
**hierarchical** - a clustering that captures the nested organization of clusters at different scales
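The first two definitions can be tested directly on an element-to-clusters dictionary. This is a small sketch with hypothetical helper names (`is_partition`, `is_overlapping`); hierarchical structure needs extra information about how clusters nest, so it is not covered here.

```python
def is_partition(elm2clu):
    """True if every element is a member of exactly one cluster."""
    return all(len(set(clusters)) == 1 for clusters in elm2clu.values())


def is_overlapping(elm2clu):
    """True if at least one element is a member of more than one cluster."""
    return any(len(set(clusters)) > 1 for clusters in elm2clu.values())


disjoint = {0: [0], 1: [0], 2: [1], 3: [1]}
overlapping = {0: [0], 1: [0], 2: [0, 1], 3: [1], 4: [2], 5: [2]}

print(is_partition(disjoint), is_overlapping(disjoint))
print(is_partition(overlapping), is_overlapping(overlapping))
```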


## Notation

$C$ - clustering<br>
$E$ - <br>
$K_{c}$ - number of clusters<br>
$N$ - number of elements (vertices, nodes)<br>
$V$ - elements, vertices (nodes)<br>

$c_{\beta}$ - a cluster in $C$<br>
$h(l_{\beta}) = e^{rl_{\beta}}$ - hierarchical weighting function<br>
$l_{\beta}$ - hierarchical level $\in [0, 1]$<br>
$r$ - scaling parameter<br>
$w_{ij}$ - edge weight between elements $v_{i}$ and $v_{j}$<br>
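
To get a feel for the hierarchical weighting function $h(l_{\beta}) = e^{rl_{\beta}}$, the sketch below evaluates it for a few levels and scaling parameters. The function name `hierarchical_weight` and the sample values are chosen for illustration only.

```python
import math


def hierarchical_weight(level, r=1.0):
    """Evaluate h(l) = exp(r * l) for a hierarchical level l in [0, 1]."""
    if not 0.0 <= level <= 1.0:
        raise ValueError("level must lie in [0, 1]")
    return math.exp(r * level)


levels = [0.0, 0.5, 1.0]
for r in (-2.0, 0.0, 2.0):
    weights = [hierarchical_weight(l, r) for l in levels]
    print(r, weights)
```

With $r = 0$ every level receives weight 1; a positive $r$ gives more weight to levels near 1, and a negative $r$ gives more weight to levels near 0.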

# Reproducing Results

## The convolution of meta-data in social networks

### References

[Traud, A. L., Kelsic, E. D., Mucha, P. J. & Porter, M. A. Comparing community structure to characteristics in online collegiate social networks. SIAM Review 53, 526–543 (2011).](https://arxiv.org/abs/0809.0690)

[Traud, A. L., Mucha, P. J. & Porter, M. A. Social structure of Facebook networks. Physica A: Statistical Mechanics and its Applications 391, 4165–4180 (2012).](https://arxiv.org/abs/1102.2166)

[Vincent D. Blondel, Jean-Loup Guillaume, Renaud Lambiotte, Etienne Lefebvre, *Fast unfolding of communities in large networks*, J. Stat. Mech. (2008) P10008](https://arxiv.org/abs/0803.0476)

## Facebook 100 dataset

[Porter, M., *Facebook100 Data Set*, Feb. 11, 2011](http://masonporter.blogspot.com/2011/02/facebook100-data-set.html)

[Lee, C., *Facebook100 data and a parser for it*, Blogspot, March 5, 2011](http://sociograph.blogspot.com/2011/03/facebook100-data-and-parser-for-it.html)

[*Social Structure of Facebook Networks Facebook Data Scrape*, arXiv:1102.2166v1 \[cs.SI\]](https://archive.org/details/oxford-2005-facebook-matrix)

Note: the Facebook 100 data set has been uploaded to my IU Box account, in the D699 folder.<br>
The data files are in MATLAB format (MATLAB 5.0 MAT-file); they should be readable with [`scipy.io.loadmat`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.io.loadmat.html). See also [Read .mat files in Python (Stack Overflow)](https://stackoverflow.com/questions/874461/read-mat-files-in-python).
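
A possible way to load one of these files is sketched below. It assumes SciPy is installed and that each file stores a sparse adjacency matrix under the key `A` and a node-attribute array under the key `local_info`; those key names are an assumption to verify against the actual files (for example by printing `mat.keys()`), and the helper name `load_facebook100` is hypothetical.

```python
from scipy.io import loadmat


def load_facebook100(path, adjacency_key="A", attributes_key="local_info"):
    """Load one network file and return (adjacency, attributes).

    The key names are assumptions about the file layout; pass different
    keys if the file uses other variable names.
    """
    mat = loadmat(path)
    missing = [k for k in (adjacency_key, attributes_key) if k not in mat]
    if missing:
        # Variables saved from MATLAB; keys starting with "__" are metadata.
        available = [k for k in mat if not k.startswith("__")]
        raise KeyError(f"missing {missing}; file contains {available}")
    return mat[adjacency_key], mat[attributes_key]
```

Usage would look like `A, info = load_facebook100("path/to/network.mat")`, where the path is a placeholder for wherever the data files are stored locally.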

## Software and Tools

Python<br>
NetworkX<br>
[Louvain Community Detection](https://github.com/taynaud/python-louvain)<br>
Jupyter Notebook<br>
Gephi<br>

### CluSim
```
git clone git@github.com:ajgates42/clusim.git
cd clusim
python setup.py install
```

```python
from clusim.clustering import Clustering, print_clustering
# On import, CluSim printed: DendroPY not supported.

# Map each element to the clusters it belongs to (element 2 is in clusters 0 and 1).
elm2clu_dict = {0:[0], 1:[0], 2:[0,1], 3:[1], 4:[2], 5:[2]}

clu = Clustering()
clu.from_elm2clu_dict(elm2clu_dict)

print_clustering(clu)
# Output: 012|23|45
```
