# Using CONDOR for community detection in bipartite graphs
Author: John Platig<sup>1</sup>

<sup>1</sup> Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA.

# 1. Introduction
COmplex Network Description Of Regulators (CONDOR) implements methods for clustering bipartite networks and estimating the contribution of each node to its community's modularity. For an application of this method to identifying disease-associated single nucleotide polymorphisms, see Platig et al.<sup>1</sup> (https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005033).

## Implementing the Bipartite Modularity Maximization
The code in **condorModularityMax** implements the method described in Michael Barber's paper<sup>2</sup>, **Modularity and community detection in bipartite networks** ([Phys. Rev. E 76, 066102 (2007)](https://journals.aps.org/pre/abstract/10.1103/PhysRevE.76.066102)). A few general comments:

-  Maximizing bipartite modularity is an NP-hard problem.
-  This method is heuristic, and its result can depend on the initial assignment of nodes to communities.
-  For the implementation in **condorCluster**, I use a non-bipartite community detection method from the **igraph** package to produce the initial assignment of nodes to communities, which is then passed to **condorModularityMax**.
-  Community detection is intended for networks that form a single giant connected component. All of the analysis in this package uses the giant connected component.
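
For reference, the quantity being maximized can be written out as follows. This is the bipartite modularity as defined in Barber's paper<sup>2</sup>; the symbol names are chosen here for illustration and do not correspond to identifiers in the package code. $\tilde{A}_{ij}$ is 1 if red node $i$ and blue node $j$ are connected (0 otherwise), $k_i$ and $d_j$ are the degrees of red node $i$ and blue node $j$, $m$ is the total number of edges, and $\delta(C_i, C_j)$ is 1 when the two nodes are assigned to the same community and 0 otherwise:

$$
Q = \frac{1}{m} \sum_{i,j} \left( \tilde{A}_{ij} - \frac{k_i d_j}{m} \right) \delta(C_i, C_j)
$$

The term $k_i d_j / m$ is the expected number of edges between $i$ and $j$ if edges were placed at random while preserving node degrees, so $Q$ is large when communities contain more edges than that null model predicts.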

# 2. Workflow

In [None]:
library(netZooR)

**condor** works with an edgelist (**elist** in the code below) as its input. 

In [None]:
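# Index vectors: edge i connects reds[r[i]] to blues[b[i]]
r <- c(1,1,1,2,2,2,3,3,3,4,4)
b <- c(1,2,3,1,2,4,2,3,4,3,4)
reds <- c("Alice","Sue","Janine","Mary")
blues <- c("Bob","John","Ed","Hank")
# One row per edge, with a "red" column and a "blue" column
elist <- data.frame(red=reds[r], blue=blues[b])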

In **elist**, notice that all nodes of the same type (women and men in this case) appear together in the same column. This is a requirement: **createCondorObject** will throw an error if a node appears in both columns.
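
If you want to check this requirement yourself before building the object, a minimal sketch in base R is shown below. It only illustrates the rule stated above and is not the check that the package performs internally; the names `shared`, `bad_elist` and `shared_bad` are made up for this example.

```r
# Rebuild the toy edgelist so this snippet runs on its own
r <- c(1,1,1,2,2,2,3,3,3,4,4)
b <- c(1,2,3,1,2,4,2,3,4,3,4)
reds <- c("Alice","Sue","Janine","Mary")
blues <- c("Bob","John","Ed","Hank")
elist <- data.frame(red = reds[r], blue = blues[b])

# Node names that occur in both columns (should be empty for a valid edgelist)
shared <- intersect(as.character(elist$red), as.character(elist$blue))
print(length(shared) == 0)

# Hypothetical invalid edgelist: "Bob" is added as a red node as well
bad_elist <- rbind(elist, data.frame(red = "Bob", blue = "Hank"))
shared_bad <- intersect(as.character(bad_elist$red), as.character(bad_elist$blue))
print(shared_bad)
```

An empty `shared` means every node name is confined to a single column.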

In [None]:
condor.object <- createCondorObject(elist)

A condor.object is just a list. You can look at its items using **names**:

In [None]:
names(condor.object)

**condorCluster** clusters the nodes and returns the overall modularity along with two community membership **data.frames**:

In [None]:
condor.object <- condorCluster(condor.object)
print(condor.object$red.memb)
print(condor.object$blue.memb)

Based on the modularity maximization, the nodes in the first community are {Alice, John, Bob, Sue} and the nodes in the second community are {Ed, Janine, Hank, Mary}. Here's a picture:

In [None]:
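library(igraph)  # provides graph.edgelist, V and the layout functions
gtoy <- graph.edgelist(as.matrix(elist),directed=FALSE)
# set.graph.attribute returns a modified copy, so assign the result back
gtoy <- set.graph.attribute(gtoy, "layout", layout.kamada.kawai(gtoy))
V(gtoy)[c(reds,blues)]$color <- c(rep("red",4),rep("blue",4))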

In [None]:
plot(gtoy,vertex.label.dist=2)
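
To make the modularity of this partition concrete, the sketch below scores the two communities listed above by hand in base R, using Barber's bipartite modularity: the sum over same-community red-blue pairs of (observed edge minus the expected value `k_i * d_j / m`), divided by the number of edges `m`. The variable names (`A`, `red_comm`, `blue_comm`, `same_comm`) are chosen for this example and are not fields of the condor.object, and this is not necessarily how the package computes the value internally.

```r
r <- c(1,1,1,2,2,2,3,3,3,4,4)
b <- c(1,2,3,1,2,4,2,3,4,3,4)
reds <- c("Alice","Sue","Janine","Mary")
blues <- c("Bob","John","Ed","Hank")

# Bipartite adjacency matrix: rows are red nodes, columns are blue nodes
A <- matrix(0, nrow = 4, ncol = 4, dimnames = list(reds, blues))
A[cbind(r, b)] <- 1

m <- sum(A)       # number of edges
k <- rowSums(A)   # red node degrees
d <- colSums(A)   # blue node degrees

# Community labels taken from the partition described in the text
red_comm  <- c(Alice = 1, Sue = 1, Janine = 2, Mary = 2)
blue_comm <- c(Bob = 1, John = 1, Ed = 2, Hank = 2)

# TRUE where a red node and a blue node share a community
same_comm <- outer(red_comm, blue_comm, "==")

# Observed minus expected edges, summed over same-community pairs
Q <- sum((A - outer(k, d) / m) * same_comm) / m
print(Q)  # 28/121, about 0.231
```

Each community holds 4 of the 11 edges while the null model expects 30/11 in each, which gives (8 - 60/11) / 11 = 28/121.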

To get each node's modularity contribution (as a fraction of the community's modularity), run:

In [None]:
condor.object <- condorQscore(condor.object)
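
As a rough illustration of what "a fraction of the community's modularity" means, the sketch below computes per-node contributions for the red nodes of the toy network by hand. It assumes that a node's contribution is its own row of the modularity sum, restricted to its community, and that the normalization divides by the sum of those contributions within the community. That is an assumption made for this example; the values stored by **condorQscore** may be defined or scaled differently. All variable names here are hypothetical.

```r
r <- c(1,1,1,2,2,2,3,3,3,4,4)
b <- c(1,2,3,1,2,4,2,3,4,3,4)
reds <- c("Alice","Sue","Janine","Mary")
blues <- c("Bob","John","Ed","Hank")
A <- matrix(0, nrow = 4, ncol = 4, dimnames = list(reds, blues))
A[cbind(r, b)] <- 1

# Community labels from the partition described earlier in the text
red_comm  <- c(Alice = 1, Sue = 1, Janine = 2, Mary = 2)
blue_comm <- c(Bob = 1, John = 1, Ed = 2, Hank = 2)

m <- sum(A)
# Observed minus expected edges for every red-blue pair
B <- A - outer(rowSums(A), colSums(A)) / m
same_comm <- outer(red_comm, blue_comm, "==")

# Contribution of each red node to the modularity of its own community
q_red <- rowSums(B * same_comm) / m

# Community modularity as the sum of its red nodes' contributions
q_comm <- tapply(q_red, red_comm, sum)

# Each node's share of its community's modularity
frac_red <- q_red / as.vector(q_comm[as.character(red_comm)])
print(round(frac_red, 3))
```

Under this definition Alice and Sue each account for half of the first community's modularity, while Mary accounts for 5/7 of the second community's and Janine for 2/7.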

If you have a subset of nodes that you think are more likely to lie at the cores of your communities, you can test this using **condorCoreEnrich**:

In [None]:
q_women <- condor.object$qscores$red.qscore
core_stats <- condorCoreEnrich(test_nodes=c("Alice","Mary"),
                                 q=q_women,perm=TRUE,plot.hist=TRUE)

**condor** also works on weighted bipartite networks. The package comes with a quantitative pollination network data set (Small 1976) taken from the NCEAS interaction webs database, containing interactions between 13 plants and 34 pollinators.

In [None]:
data(small1976)
condor.object <- createCondorObject(small1976)
condor.object <- condorCluster(condor.object, project=FALSE)

We can see that this graph has a modularity of 0.52, which indicates a rich community structure.

In [None]:
condorPlotHeatmap(condor.object)

This function plots the adjacency matrix of the graph as a heatmap, with the edges grouped by community. The rows are the source nodes and the columns are the target nodes, and the edge weights are represented in the heatmap. We see that CONDOR detected 8 communities in this graph.

# References

1- Platig, John, et al. "Bipartite community structure of eQTLs." PLoS computational biology 12.9 (2016): e1005033.

2- Barber, Michael J. "Modularity and community detection in bipartite networks." Physical Review E 76.6 (2007): 066102.