# CONDOR usage example

### Author: 
Genís Calderer*. 

*Kuijjer Lab (NCMM) - genis.calderer@gmail.com

## Introduction
The CONDOR method is an implementation of the BRIM algorithm for the analysis of bipartite networks. Its purpose is to find a community structure that takes the bipartite structure of the network into account, rather than treating the network as if it had no such structure.
The algorithm was first described in the paper "Modularity and community detection in bipartite networks" by Michael J. Barber. The Python implementation of CONDOR is based on the R version presented in the paper "Bipartite Community Structure of eQTLs" by John Platig, Peter J. Castaldi, Dawn DeMeo, and John Quackenbush.

This guide shows how to use CONDOR on a toy network of pollination interactions between bee species and flower species. The network is small but, as we will see, it has a well-defined modular structure.

## 1. Importing CONDOR from netZooPy

To use the CONDOR functions, import the module from netZooPy as follows:

In [1]:
from netZooPy import condor

To check the parameters and information of the methods in the condor_object class you can do the following:

In [3]:
help(condor.condor_object)

Help on class condor_object in module condor.condor:

class condor_object(builtins.object)
 |  condor_object(network_file=None, sep=',', index_col=0, header=0, dataframe=None)
 |  
 |  Methods defined here:
 |  
 |  __init__(self, network_file=None, sep=',', index_col=0, header=0, dataframe=None)
 |      Description:
 |          Initialization of the condor object. The function gets a network in edgelist format as a path to a file or encoded in a pandas dataframe.
 |          Builds a condor_object with an edgelist,an igraph network, names of the targets and regulators.
 |          
 |          Note: The edgelist is assumed to contain a bipartite network. The program will relabel the nodes so that the edgelist represents a bipartite network anyway.
 |          It is on the user to know that the network they are using is suitable for the method.
 |      Inputs:
 |          network_file: Path to file encoding an edgelist.
 |          sep: Separator used in the file.
 |          index_col

## 2.1 Create a CONDOR object from an edgelist in a file

A CONDOR object can be created from an edgelist stored in a file as follows:

In [2]:
condor_object = condor.condor_object(network_file="toynetwork.csv")

Object creation:
  Elapsed time: 0.01 sec.


By default the method expects CSV edgelists with a header row and an index column. Consult the function documentation for details about modifying the format.
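To make the default layout concrete, the following minimal sketch writes a two-row edgelist with a header row and a leading index column. The two data rows are taken from the toy network shown in section 2.2; the empty first header cell is an assumption about how the index column is labelled in the file, and the text is written to an in-memory buffer rather than to disk.

```python
import csv
import io

# Two edges from the toy network: (pollinator, plant, interactions).
rows = [
    ("Adela.purpurea", "Salix.fragilis", 20),
    ("Adela.purpurea", "Chamaedaphne.calyculata", 0),
]

buffer = io.StringIO()
writer = csv.writer(buffer)

# Header row; the first cell belongs to the index column (assumed to be unnamed).
writer.writerow(["", "pollinator", "plant", "interactions"])

# One line per edge, prefixed with a running index starting at 1.
for index, row in enumerate(rows, start=1):
    writer.writerow([index, *row])

csv_text = buffer.getvalue()
print(csv_text)
```

A file with this content matches the defaults `sep=','`, `index_col=0` and `header=0` listed in the class signature above. A file using another separator would need the `sep` argument set accordingly.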

## 2.2 Create a CONDOR object from a DataFrame

A CONDOR object can also be created from a pandas DataFrame.

In [4]:
import pandas as pd
network = pd.read_csv("toynetwork.csv",index_col=0)
network.head(5)

Unnamed: 0,pollinator,plant,interactions
1,Adela.purpurea,Salix.fragilis,20
2,Adela.purpurea,Chamaedaphne.calyculata,0
3,Adela.purpurea,Nemopanthus.mucronata,0
4,Adela.purpurea,Andromeda.glaucophylla,0
5,Adela.purpurea,Kalmia.polifolia,0


We initialize the CONDOR object with the parameter dataframe=network:

In [5]:
condor_object = condor.condor_object(dataframe=network)

Object creation:
  Elapsed time: 0.01 sec.
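The class documentation notes that it is up to the user to know whether the network is suitable for the method. One cheap sanity check before building the object is to confirm that no node name appears in both node columns. The sketch below assumes the first two columns hold the two node sets, as in the toy network. The helper `shared_node_names` is hypothetical and not part of netZooPy, and the third row of the example frame is made up for illustration.

```python
import pandas as pd

# Small edgelist in the same column layout as toynetwork.csv.
# The first two rows come from the toy network; the third is illustrative only.
network = pd.DataFrame(
    {
        "pollinator": ["Adela.purpurea", "Adela.purpurea", "Andrena.carlini"],
        "plant": ["Salix.fragilis", "Kalmia.polifolia", "Salix.fragilis"],
        "interactions": [20, 0, 3],
    }
)


def shared_node_names(edgelist):
    """Return the names that occur in both node columns (hypothetical helper)."""
    first, second = edgelist.columns[:2]
    return set(edgelist[first]) & set(edgelist[second])


overlap = shared_node_names(network)
print(overlap)  # an empty set means the two node sets do not share names
```

A non-empty result does not stop CONDOR from running, since the nodes are relabelled anyway, but it is a hint that the edgelist may not describe a genuinely bipartite network.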


## 3. Running CONDOR

The next step is to compute the initial community structure. By default the Louvain method is used. Consult the documentation of the method for other initial community structure algorithms and for the projection options.

In [6]:
condor_object.initial_community()

interactions <class 'str'>
Initial community structure without projection:
Initial modularity:  0.5253469286550468
  Elapsed time: 0.00 sec.


The condor object now has an associated community structure, but it is not specific to bipartite networks. We apply the BRIM algorithm to find the bipartite community structure.

In [8]:
condor_object.brim(deltaQmin="def")

Matrix computation:
  Elapsed time: 0.04 sec.
BRIM: 
0.5266669647502601
0.5266669647502601
  Elapsed time: 0.01 sec.
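To give an intuition for what the BRIM step does, here is a simplified pure-Python sketch of the alternating reassignment described by Barber. Regulators are moved to the community where their targets score best, then targets are moved likewise, until the modularity stops improving. It is not netZooPy's implementation. The function name `brim_sketch`, its parameters, the stopping threshold and the four-edge example are all assumptions made for illustration.

```python
from collections import defaultdict


def brim_sketch(edges, tar_comm, n_comm, delta_q_min=1e-9, max_iter=100):
    """Simplified BRIM-style iteration on a weighted bipartite edgelist."""
    m = sum(w for _, _, w in edges)  # total edge weight
    weight = defaultdict(float)      # A_ij
    k = defaultdict(float)           # regulator degrees
    d = defaultdict(float)           # target degrees
    for reg, tar, w in edges:
        weight[(reg, tar)] += w
        k[reg] += w
        d[tar] += w

    def b(reg, tar):
        # Modularity matrix entry: observed weight minus degree-based expectation.
        return weight.get((reg, tar), 0.0) - k[reg] * d[tar] / m

    tar_comm = dict(tar_comm)  # work on a copy of the initial assignment
    reg_comm = {}
    q_prev = float("-inf")
    history = []
    for _ in range(max_iter):
        # Step 1: assign each regulator to its best-scoring community.
        for reg in k:
            reg_comm[reg] = max(
                range(n_comm),
                key=lambda c: sum(b(reg, t) for t in d if tar_comm[t] == c),
            )
        # Step 2: assign each target to its best-scoring community.
        for tar in d:
            tar_comm[tar] = max(
                range(n_comm),
                key=lambda c: sum(b(r, tar) for r in k if reg_comm[r] == c),
            )
        # Bipartite modularity of the current assignment.
        q = sum(b(r, t) for r in k for t in d if reg_comm[r] == tar_comm[t]) / m
        history.append(q)
        if q - q_prev < delta_q_min:
            break
        q_prev = q
    return reg_comm, tar_comm, history


# Two disconnected blocks, started from a deliberately poor target assignment.
edges = [("a", "x", 1), ("a", "y", 1), ("b", "z", 1), ("b", "w", 1)]
start = {"x": 0, "y": 1, "z": 1, "w": 1}
reg_comm, tar_comm, history = brim_sketch(edges, start, n_comm=2)
print(history)  # [0.5, 0.5]
```

In this example the second iteration changes nothing, so the same modularity is reported twice and the loop stops. This mirrors the two identical values printed in the BRIM output above.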


The numbers in the output of ```brim``` show the bipartite modularity score at each iteration. The modularity quantifies how well the network separates into modules: values near 0 mean the modules are no better separated than expected by chance, and values approaching 1 indicate strongly separated modules. A score of 0.52 is quite high.
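For reference, Barber's bipartite modularity can be written as $Q = \frac{1}{m}\sum_{i,j}\left(A_{ij} - \frac{k_i d_j}{m}\right)\delta(c_i, c_j)$, where $A_{ij}$ is the weight of the edge between regulator $i$ and target $j$, $k_i$ and $d_j$ are their degrees, $m$ is the total edge weight, and $\delta$ is 1 when both nodes are in the same community. The following sketch evaluates this formula directly on a tiny made-up network. The function `bipartite_modularity` and the example data are illustrative assumptions, not part of netZooPy.

```python
from collections import defaultdict


def bipartite_modularity(edges, reg_comm, tar_comm):
    """Barber's bipartite modularity for a weighted edgelist (illustrative)."""
    m = sum(w for _, _, w in edges)
    reg_deg = defaultdict(float)
    tar_deg = defaultdict(float)
    within = 0.0  # observed edge weight inside communities
    for reg, tar, w in edges:
        reg_deg[reg] += w
        tar_deg[tar] += w
        if reg_comm[reg] == tar_comm[tar]:
            within += w

    # Expected within-community weight under the degree-based null model.
    expected = 0.0
    for reg, k in reg_deg.items():
        for tar, d in tar_deg.items():
            if reg_comm[reg] == tar_comm[tar]:
                expected += k * d / m
    return (within - expected) / m


edges = [("a", "x", 1), ("a", "y", 1), ("b", "z", 1), ("b", "w", 1)]

# Two perfectly separated modules of equal size.
q_split = bipartite_modularity(
    edges, {"a": 0, "b": 1}, {"x": 0, "y": 0, "z": 1, "w": 1}
)
# Everything placed in a single community.
q_single = bipartite_modularity(
    edges, {"a": 0, "b": 0}, {"x": 0, "y": 0, "z": 0, "w": 0}
)
print(q_split, q_single)  # 0.5 0.0
```

Even a perfect split into two equal modules only reaches 0.5 here, which helps put the score reported for the toy network into perspective.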

## 4. Results

After the steps above, the condor_object holds the community membership of the target and regulator nodes. These are stored in the variables ```tar_memb``` and ```reg_memb```.

For example, to see the membership of the regulator (```reg```) nodes we can do the following:

In [9]:
condor_object.reg_memb.head(5)

Unnamed: 0,reg,com
0,reg_Adela.purpurea,0
1,reg_Andrena.alleghaniensis,1
2,reg_Andrena.bradleyi,2
3,reg_Andrena.carlini,2
4,reg_Andrena.carolina,3
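A common next step is to list the members of each community. The sketch below rebuilds the five rows shown above as a stand-alone DataFrame (so it runs without netZooPy), strips the `reg_` prefix that CONDOR adds when relabelling nodes, and groups the names by community. The column name `name` and the variable `members` are introduced here for illustration only.

```python
import pandas as pd

# The five membership rows printed above, rebuilt by hand.
reg_memb = pd.DataFrame(
    {
        "reg": [
            "reg_Adela.purpurea",
            "reg_Andrena.alleghaniensis",
            "reg_Andrena.bradleyi",
            "reg_Andrena.carlini",
            "reg_Andrena.carolina",
        ],
        "com": [0, 1, 2, 2, 3],
    }
)

# Remove the "reg_" prefix to recover the original node names.
reg_memb["name"] = reg_memb["reg"].str.replace("^reg_", "", regex=True)

# Collect the node names of each community into a dictionary.
members = reg_memb.groupby("com")["name"].apply(list).to_dict()
print(members[2])  # ['Andrena.bradleyi', 'Andrena.carlini']
```

The same grouping should apply to `condor_object.reg_memb` directly, assuming it has the `reg` and `com` columns shown in the output above.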


## 5. Running the whole CONDOR process from a filename

The guide above shows how to use the method step by step. It is also possible to run the whole process automatically, starting only from the filename of the network's edgelist in CSV format.
This is done using the ```run_condor``` function, which writes the target and regulator memberships to CSV files.

In [10]:
condor.run_condor("toynetwork.csv")

Object creation:
  Elapsed time: 0.01 sec.
interactions <class 'str'>
Initial community structure without projection:
Initial modularity:  0.5253469286550468
  Elapsed time: 0.00 sec.
Matrix computation:
  Elapsed time: 0.01 sec.
BRIM: 
0.5266669647502601
0.5266669647502601
  Elapsed time: 0.00 sec.
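According to its help text, `run_condor` accepts `tar_output` and `reg_output` parameters, which default to `tar_memb.txt` and `reg_memb.txt`. When several networks are processed in the same directory, it can help to derive the output names from the input file so that one run does not overwrite another. The helper `condor_run_kwargs` below is hypothetical, and the `results` directory is an assumed location that would need to exist beforehand.

```python
from pathlib import Path


def condor_run_kwargs(network_file, out_dir="."):
    """Build run_condor keyword arguments with per-network output names."""
    stem = Path(network_file).stem  # "toynetwork.csv" -> "toynetwork"
    out = Path(out_dir)
    return {
        "network_file": str(network_file),
        "tar_output": str(out / f"{stem}_tar_memb.txt"),
        "reg_output": str(out / f"{stem}_reg_memb.txt"),
    }


kwargs = condor_run_kwargs("toynetwork.csv", out_dir="results")
print(kwargs["tar_output"])
print(kwargs["reg_output"])

# With netZooPy installed and the edgelist file present, the call would be:
# from netZooPy import condor
# condor.run_condor(**kwargs)
```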


Check the documentation of the ```run_condor``` function for the available options of the method.

In [11]:
help(condor.run_condor)

Help on function run_condor in module condor.condor:

run_condor(network_file, sep=',', index_col=0, header=0, initial_method='LCS', initial_project=False, com_num='def', deltaQmin='def', resolution=1, return_output=False, tar_output='tar_memb.txt', reg_output='reg_memb.txt')
    Description:
        Computation of the whole condor process. It creates a condor object and runs all the steps of BRIM on it. The function outputs 
        
        Note: The edgelist is assumed to contain a bipartite network. The program will relabel the nodes so that the edgelist represents a bipartite network anyway.
        It is on the user to know that the network they are using is suitable for the method.
    Inputs:
        network_file: Path to file encoding an edgelist.
        sep: Separator used in the file.
        index_col: Column that stores the index of the edgelist. E.g. None, 0...
        header: Row that stores the header of the edgelist. E.g. None, 0...
        initial_method: Method to d