In [1]:
import collections

In [24]:
import csv

In [2]:
import itertools

In [3]:
import numpy as np

In [4]:
import os

In [5]:
from sklearn import metrics

---

In [6]:
os.chdir('/work/jyoung')

In [7]:
import pyuserfcn

In [8]:
os.chdir('/work/jyoung/genetic_interact/src')

In [9]:
import func_net_pred

---

**2015 September 9**

Start with *Saccharomyces cerevisiae*. For predictive seed sets (0.8 &le; AUC < 1.0), examine all possible pairs of seed sets and count the number of genetic interactions between the two sets in each pair. Repeat for each type of genetic interaction. Seed sets are from BioGRID v3.4.127 with all interactions before 2007 removed.

**2015 September 10**

Examining every count individually is not feasible. There are 240 seed gene sets with 0.8 &le; AUC < 1.0, which yields 28680 pairs of seed sets, far too many to examine one by one. Instead, try using a binomial distribution to calculate the significance of the interaction between seed sets. Suppose the number of genes in the 1<sup>st</sup> seed set is *n*<sub>1</sub>, the number of genes in the 2<sup>nd</sup> seed set is *n*<sub>2</sub>, and the number of interactions between the sets is *k*. If the total number of interacting pairs is *N*, then, treating each of the *n*<sub>1</sub>*n*<sub>2</sub> possible gene pairs as an independent trial with success probability 1/*N*, the probability mass function of the binomial distribution gives

$$ \Pr(X = k) = {n_1 n_2 \choose k} \frac{1}{N^k} \left(1 - \frac{1}{N}\right)^{n_1 n_2 - k} $$
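
The probability mass function above can be evaluated numerically. The following is a minimal sketch and not part of the original analysis: `binom_pmf` and `binom_upper_tail` are hypothetical helper names, the success probability is taken as 1/*N* exactly as written in the formula, and *N* > 1 is assumed. Summing the upper tail, Pr(*X* &ge; *k*), is one common way to turn the PMF into a significance value; whether that is the appropriate test here is an assumption.

```python
import math


def binom_pmf(k, n1, n2, N):
    """Pr(X = k) for X ~ Binomial(n1 * n2, 1 / N), computed in log space.

    Assumes N > 1 so that both log(p) and log(1 - p) are finite.
    """
    trials = n1 * n2
    if k < 0 or k > trials:
        return 0.0
    p = 1.0 / N
    # Log of the binomial coefficient (trials choose k) via log-gamma,
    # which avoids overflow when n1 * n2 is large.
    log_coeff = (math.lgamma(trials + 1) - math.lgamma(k + 1)
                 - math.lgamma(trials - k + 1))
    log_pmf = log_coeff + k * math.log(p) + (trials - k) * math.log1p(-p)
    return math.exp(log_pmf)


def binom_upper_tail(k, n1, n2, N):
    """Pr(X >= k): probability of observing at least k interactions."""
    trials = n1 * n2
    return math.fsum(binom_pmf(j, n1, n2, N)
                     for j in range(max(k, 0), trials + 1))
```

Working in log space keeps the calculation stable when the product of the two set sizes is large, where the binomial coefficient would be too large to convert to a float directly.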

In [10]:
experimentSys = 'Dosage Growth Defect'

In [11]:
node2edgewt = func_net_pred.process_func_net()
gene2idx = func_net_pred.assign_gene_indices(node2edgewt)

In [12]:
matrixPath = '/work/jyoung/genetic_interact/data/YeastNet2_adj_matrix.npy'
adjMat = np.load(matrixPath)

In [13]:
len(gene2idx)  # number of genes in functional net

5483

In [13]:
seedSets = func_net_pred.read_biogrid(experimentSys)

Number of genes in interactions: 1192


In [14]:
seedAUC, seed2interactors = func_net_pred.seed_set_predictability(gene2idx, adjMat, seedSets)

Create a list containing predictive seed genes.

In [15]:
lowerAUC = 0.8
upperAUC = 1.0

In [16]:
predictiveSeeds = list()
for p in seedAUC:  # p=(AUC, gene)
    if lowerAUC <= p[0] < upperAUC:
        predictiveSeeds.append(p[1])

In [23]:
len(predictiveSeeds)

240

Now assemble all the genetic interactions into a single set. 

In [18]:
interactPairs = set()
for seed in predictiveSeeds:
    for interactor in seed2interactors[seed]:
        interactPairs.update([(seed, interactor), (interactor, seed)])

In [19]:
len(interactPairs)  # number of interacting pairs

1624

For each pair of predictive seeds, count the number of known interactions between their two sets of interactors.

In [21]:
counts = list()
for seedPair in itertools.combinations(predictiveSeeds, 2):  # seedPair=(seed1, seed2)
    interactionCount = 0
    num1stSet = len(seed2interactors[seedPair[0]])
    num2ndSet = len(seed2interactors[seedPair[1]])
    for p in itertools.product(seed2interactors[seedPair[0]], seed2interactors[seedPair[1]]):
        if p in interactPairs:
            interactionCount += 1
    counts.append((num1stSet, num2ndSet, interactionCount))
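
The counting step can be checked on a toy example. This is a sketch under stated assumptions: the gene names are made up, `count_between` is a hypothetical helper, and the interactor collections are assumed to behave like Python sets. As in the notebook, each interaction is stored in both orientations so that membership does not depend on the order of the pair.

```python
import itertools

# Toy data with made-up gene names, mirroring the notebook's structures.
seed2interactors = {
    'A': {'g1', 'g2'},
    'B': {'g3', 'g4'},
}

# Known interactions, stored in both orientations.
known = [('g1', 'g3'), ('g2', 'g4')]
interactPairs = set()
for a, b in known:
    interactPairs.update([(a, b), (b, a)])


def count_between(set1, set2, pairs):
    """Count ordered pairs (x, y), x in set1 and y in set2, found in pairs."""
    return sum(1 for p in itertools.product(set1, set2) if p in pairs)


n_between = count_between(seed2interactors['A'],
                          seed2interactors['B'],
                          interactPairs)
```

Of the four candidate pairs, two are known interactions, so `n_between` is 2. If the two interactor sets share genes, the product can contain the same pair in both orientations, and such a pair would then be counted twice.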

In [22]:
len(counts)

28680
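
As a sanity check, this length should equal the number of unordered pairs that can be drawn from 240 seeds. A minimal sketch, assuming Python 3.8 or later for `math.comb`:

```python
import math

num_seeds = 240
num_pairs = math.comb(num_seeds, 2)  # 240 * 239 / 2
```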

Create a directory to store text files containing counts.

    cd /work/jyoung/genetic_interact/results
    mkdir yeastInteractClustCounts

Write out counts to a text file in CSV format.

In [25]:
os.chdir('/work/jyoung/genetic_interact/results/yeastInteractClustCounts')

In [27]:
with open(''.join(experimentSys.split()) + 'Counts.txt', 'w', newline='') as csvfile:
    csvwriter = csv.writer(csvfile)
    csvwriter.writerow(['# interactions in 1st seed set', 
                       '# interactions in 2nd seed set',
                       '# interactions between sets'])
    for t in counts:
        csvwriter.writerow(t)
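
A file in this layout can be read back with the same `csv` module. The sketch below uses an in-memory stand-in rather than a real file; the data rows are made-up values and `loadedCounts` is a hypothetical name.

```python
import csv
import io

# Stand-in for the contents of a counts file; the data rows are made up.
sample = io.StringIO(
    "# interactions in 1st seed set,# interactions in 2nd seed set,"
    "# interactions between sets\n"
    "12,7,3\n"
    "5,9,0\n"
)

reader = csv.reader(sample)
header = next(reader)  # the first row holds the column names
loadedCounts = [tuple(int(field) for field in row) for row in reader]
```

With a real file, `sample` would be replaced by a handle from `open(path, newline='')`, matching how the file was written.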