Cancer - Gene Co-Expression Network Construction
====

We have a theory to explain why cancer happens: cancer arises when the entropy of the cell system increases due to external forces and the cell is pushed uphill from its free-energy minimum. Because the cell system is robust, it tries to maintain itself and regulate its energy. It does not always succeed, however, and the cell may settle into a local minimum of free energy where it becomes stable again. These local minima may be the states in which the cell becomes cancerous. We want to test whether this hypothesis is correct.

To this end, we use prostate cancer data from the NCBI Gene Expression Omnibus (GEO) dataset GDS2545. In this IPython notebook, we generate gene co-expression networks for four cell phenotypes: Normal, Adjacent, Tumor, and Metastasis. We then use the constructed networks in another notebook to explore the hypothesis.

---

First, load the data into the notebook in a convenient format.

In [53]:
import numpy as np

def read_data(csv, metacols=None):
    """
    Read a CSV file into an ndarray of expression data and an array of gene metadata.

    Parameters
    ----------
    csv : str
        Comma-separated file with a header row. The first two columns are gene
        identifiers and probe IDs, respectively; each remaining column holds the
        expression values of every gene (rows) in one sample.
    metacols : int or None
        Number of leading metadata columns (e.g. 2 for gene identifier and
        probe ID) to separate from the expression data. None means there are none.

    Returns
    -------
    expression_data : ndarray or False
        Genes x samples matrix of float64 expression values. False if an IO error occurs.
    meta_data : ndarray or None
        Structured array of the metadata columns. None if there are no
        metadata columns or an IO error occurs.
    """
    n_meta = metacols or 0
    try:
        # dtype=None lets genfromtxt infer a type per column (strings for
        # identifiers, floats for expression values); encoding=None yields str.
        raw_exp = np.genfromtxt(csv, delimiter=',', names=True, dtype=None, encoding=None)
    except IOError:
        return False, None
    data_cols = list(raw_exp.dtype.names[n_meta:])
    meta_cols = list(raw_exp.dtype.names[:n_meta])
    # Stack the per-sample fields column by column. Viewing a multi-field
    # selection as float64 is unreliable: since NumPy 1.16 the selection keeps
    # the other fields' bytes as padding.
    expression = np.column_stack([raw_exp[c] for c in data_cols]).astype(np.float64)
    meta = raw_exp[meta_cols] if meta_cols else None
    return expression, meta

In [56]:
normal_exp,normal_meta = read_data("normal.csv",metacols=2)
adjacent_exp,adjacent_meta = read_data("adjacent.csv",metacols=2)
tumor_exp,tumor_meta = read_data("tumor.csv",metacols=2)
metastasis_exp,metastasis_meta = read_data("metastasis.csv",metacols=2)
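
Each network built below labels node `i` with row `i` of its expression matrix. If the networks are compared node by node in the other notebook, the four CSV files must list the same probes in the same order. A minimal check is sketched below; `same_probe_order` is a hypothetical helper and `"probe"` is an assumed field name (with the real files, use whatever name `genfromtxt` derived from the header). The toy records stand in for `normal_meta`, `tumor_meta` and so on.

```python
import numpy as np

def same_probe_order(*metas, probe_field="probe"):
    """Return True if every metadata array lists the same probes in the same order.

    probe_field is an assumed column name, not one read from the real CSVs.
    """
    first = metas[0][probe_field]
    return all(np.array_equal(first, m[probe_field]) for m in metas[1:])

# Toy metadata records (hypothetical) standing in for the per-phenotype metadata.
meta_dtype = [("gene", "U8"), ("probe", "U12")]
a = np.array([("GENE_A", "probe_1"), ("GENE_B", "probe_2")], dtype=meta_dtype)
b = np.array([("GENE_A", "probe_1"), ("GENE_B", "probe_2")], dtype=meta_dtype)
c = np.array([("GENE_B", "probe_2"), ("GENE_A", "probe_1")], dtype=meta_dtype)

print(same_probe_order(a, b))     # True
print(same_probe_order(a, b, c))  # False
```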

In [58]:
print(normal_exp.shape[0])

12625


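Next, we use Spearman's rank correlation coefficient to construct a network, following http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0087075. Two genes are joined by an edge when their correlation is significant (p-value below `p`) and its magnitude exceeds `corr`; the edge weight is the correlation itself.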
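
Spearman's rho is the Pearson correlation of the ranks, so it captures any monotone relationship and is less sensitive to outliers than Pearson's r. The sketch below checks this on a small synthetic matrix (rows are genes, columns are samples) and then applies the same two thresholds that `network()` uses by default (`p=0.2`, `corr=0.7`). The data and gene roles are invented for illustration.

```python
import numpy as np
from scipy.stats import rankdata, spearmanr

rng = np.random.default_rng(0)
base = rng.normal(size=20)
# Synthetic expression matrix: 4 genes x 20 samples.
expression = np.vstack([
    base,                               # gene 0
    base + 0.1 * rng.normal(size=20),   # gene 1: tracks gene 0
    -base + 0.1 * rng.normal(size=20),  # gene 2: anti-correlated with gene 0
    rng.normal(size=20),                # gene 3: unrelated noise
])

# Spearman's rho equals the Pearson correlation of the ranks (no ties here).
ranks = np.apply_along_axis(rankdata, 1, expression)
rho_from_ranks = np.corrcoef(ranks)

rho, pvalue = spearmanr(expression, axis=1)
assert np.allclose(rho, rho_from_ranks)

# Keep an edge only if it is significant AND strong; zero everything else.
p_cut, rho_cut = 0.2, 0.7
adjacency = np.where((pvalue < p_cut) & (np.abs(rho) > rho_cut), rho, 0.0)
np.fill_diagonal(adjacency, 0.0)  # no self-loops
edges = [(i, j) for i in range(4) for j in range(i + 1, 4) if adjacency[i, j] != 0]
print(edges)
```

Negative correlations survive the filter because it uses the absolute value, so gene 2 is connected to gene 0 with a negative weight.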

In [63]:
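import numpy as np
import networkx as nx
from scipy.stats import spearmanr

def network(expression, pvaluefile=None, p=0.2, corr=0.7):
    """
    Build a gene co-expression network from Spearman correlations.

    expression is either a genes x samples matrix, or the path to a raw float64
    file of precomputed correlations (pvaluefile must then point to the
    matching p-values). Genes i and j are joined by an edge weighted by their
    correlation when p-value < p and |correlation| > corr.
    """
    if isinstance(expression, str):
        if not isinstance(pvaluefile, str):
            raise ValueError("Should provide pvaluefile.")
        # Precomputed square matrices stored as flat float64 arrays.
        rs = np.fromfile(expression)
        size = int(round(np.sqrt(rs.shape[0])))
        rs = rs.reshape((size, size))
        pvalue = np.fromfile(pvaluefile).reshape((size, size))
    else:
        # Correlate rows (genes) with each other across samples (columns).
        rs, pvalue = spearmanr(expression, axis=1)

    # Keep correlations that are both significant and strong; zero the rest.
    rs_sig = np.where(pvalue < p, rs, 0.0)
    rs_adj = np.where(np.absolute(rs_sig) > corr, rs_sig, 0.0)
    np.fill_diagonal(rs_adj, 0)  # no self-loops
    # from_numpy_matrix was removed in NetworkX 3.0; from_numpy_array replaces it.
    G = nx.from_numpy_array(rs_adj)
    return G

try:
    normal_G = network("normal-rs", pvaluefile="normal-rs-p", p=0.2, corr=0.7)
except IOError as e:
    print(e)
    #normal_G = network(normal_exp, p=0.2, corr=0.7)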


[Errno 2] No such file or directory: 'normal-rs'
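
The cached files `normal-rs` and `normal-rs-p` do not exist yet, hence the error above. `network()` reads them with `np.fromfile`, which by default expects raw float64 values with no shape header, the format written by `ndarray.tofile`. Precomputing once is worthwhile: with 12,625 probes each matrix holds 12,625² float64 values, about 1.3 GB. The sketch below assumes that file format; `cache_spearman` and `load_square` are hypothetical helper names, and the matrix is synthetic.

```python
import os
import tempfile

import numpy as np
from scipy.stats import spearmanr

def cache_spearman(expression, rs_path, p_path):
    """Compute gene-by-gene Spearman correlations once and store them as raw float64."""
    rs, pvalue = spearmanr(expression, axis=1)
    # ndarray.tofile writes raw bytes with no shape header; np.fromfile reads them back.
    np.asarray(rs, dtype=np.float64).tofile(rs_path)
    np.asarray(pvalue, dtype=np.float64).tofile(p_path)

def load_square(path):
    """Read a raw float64 file and reshape it into a square matrix."""
    flat = np.fromfile(path, dtype=np.float64)
    size = int(round(np.sqrt(flat.size)))
    return flat.reshape((size, size))

rng = np.random.default_rng(1)
toy = rng.normal(size=(5, 12))  # 5 genes x 12 samples (synthetic)
with tempfile.TemporaryDirectory() as tmp:
    rs_path = os.path.join(tmp, "toy-rs")
    p_path = os.path.join(tmp, "toy-rs-p")
    cache_spearman(toy, rs_path, p_path)
    rs = load_square(rs_path)
    pvalue = load_square(p_path)
print(rs.shape)  # (5, 5)
```

With the real data, calling `cache_spearman(normal_exp, "normal-rs", "normal-rs-p")` would produce the files the cell above tries to load.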


In [None]:
# split1 is not defined earlier in this notebook; it must be created before this cell runs.
spearmanr(split1, axis=1)