# 3.3 Generate tables for the multi-layer network

We limited the multi-layer network to layers 1 to 9 (10-15vs0-5 to 90-95vs80-85), because the later layers contained no query nodes.

First, the interactome was built as follows:

1. All interactions were set as undirected.
2. A confidence score of 1 was assigned to each PDI.
3. Multi-edges from the PPI table were aggregated by taking the mean of the PPI scores for each node pair.
4. The final score of an edge between a given node pair was calculated as `(2 * P + D) / 4`, where P is the mean PPI score and D is the number of PDI edges between the node pair (a missing P or D counts as 0).
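
As a worked illustration of the scoring rule above, here is a minimal sketch in plain Python. The function name `edge_score` and the input values are hypothetical and not part of the notebook; the sketch assumes at most two PDI edges per node pair (one per direction), in which case the score stays between 0 and 1.

```python
from statistics import mean

def edge_score(ppi_scores, n_pdi):
    """Combine PPI and PDI evidence for one undirected node pair.

    ppi_scores: scores of the PPI edges between the pair (may be empty).
    n_pdi: number of PDI edges between the pair (each has a score of 1).
    """
    p = mean(ppi_scores) if ppi_scores else 0.0  # a missing PPI counts as 0
    return (2 * p + n_pdi) / 4

print(edge_score([0.75], 0))      # PPI only: 0.375
print(edge_score([], 1))          # a single PDI only: 0.25
print(edge_score([0.5, 1.0], 2))  # two PPI records and two PDIs: 0.875
```

With this weighting, a pair supported by a perfect PPI score and a PDI in each direction reaches the maximum score of 1.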

Node weights were calculated as follows:

1. Node weights were computed as `-log10(Padj) * |LFC|`. Nodes absent from the RNA-sequencing dataset were filtered out.
2. Then, for each layer (column), weights were rescaled to the range 0.001 to 1, because AnatApp does not handle null weights.
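
The two steps above can be sketched on a toy layer as follows. The gene names and values are invented for illustration only; the rescaling divides by the layer maximum and then shifts the result so that the smallest possible weight is 0.001.

```python
import numpy as np
import pandas as pd

# Toy DEA results for one layer (hypothetical genes and values)
lfc = pd.Series({'geneA': 2.0, 'geneB': -1.0, 'geneC': 0.0})
padj = pd.Series({'geneA': 0.001, 'geneB': 0.01, 'geneC': 1.0})

# Raw dysregulation weight: geneA = 6, geneB = 2, geneC = 0
raw = -np.log10(padj) * lfc.abs()

# Rescale to [0.001, 1]: the layer maximum maps to 1, a raw weight of 0 maps to 0.001
low = 0.001
weight = (1 - low) * (raw / raw.max()) + low
print(weight.round(3))
```

Note that this is not a min-max normalization: the lower bound of 0.001 is reached only by nodes whose raw weight is 0.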

A node was tagged as a query for a given layer if its gene was differentially expressed (adjusted p-value <= 0.05) in that layer.

Inter-layer edge weights were computed as `( wDys1 + wDys2 ) / ( 1 + wDys1 + wDys2 )`, where wDys1 and wDys2 are the weights of the node in layers i and i+1, respectively.
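
To get a feel for the range of this formula, here is a small sketch with a hypothetical helper `inter_layer_weight`. It assumes node weights already rescaled to [0.001, 1], as described above.

```python
def inter_layer_weight(w1, w2):
    """Weight of the edge linking a node in layer i (w1) to itself in layer i+1 (w2)."""
    return (w1 + w2) / (1 + w1 + w2)

print(inter_layer_weight(0.001, 0.001))  # lowest value, about 0.002
print(inter_layer_weight(1.0, 1.0))      # highest value, 2/3
print(inter_layer_weight(1.0, 0.001))    # strongly dysregulated in one layer only, about 0.5
```

The weight increases with the summed dysregulation of the node in the two consecutive layers and, under the assumption above, never exceeds 2/3.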

In all tables, the node IDs of core cell-cycle proteins were replaced by their gene names to make them easier to identify.
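
A minimal sketch of this renaming step on a toy edge table is shown below. The mapping is illustrative and is not read from `core_cell_cycle_proteins.tsv`; only IDs present in the mapping are replaced, all others are left untouched.

```python
import pandas as pd

# Illustrative locus-name -> gene-name mapping (not taken from the input file)
conversion = {'YBR160W': 'CDC28', 'YPR119W': 'CLB2'}

edges = pd.DataFrame({'source': ['YBR160W', 'YAL001C'],
                      'target': ['YPR119W', 'YBR160W']})

# Replace whole cell values that match a key of the mapping
for col in ('source', 'target'):
    edges[col] = edges[col].replace(conversion)
print(edges)
```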

## Input

* `data-create_networks/cell_cycle_genes/core_cell_cycle_proteins.tsv`: core proteins involved in the yeast cell cycle.
* `data-create_networks/yeastract_TF/protein_DNA_interactions.tsv`: cleaned protein-DNA interactions of transcription factors from the YEASTRACT database, with locus names.
* `data-create_networks/hitpredit-03Aug2020/protein_protein_interactions.tsv`: cleaned HitPredict protein-protein interaction network.
* Files within the folder `data-create_networks/yeast_Kelliher_2016/comparisons/`: results of the differential expression analysis (DEA) for the 18 comparisons.

## Output

* `data-create_networks/yeast_multiLayerNetwork/nodeTable.tsv`: node table of the multi-layer network.
* `data-create_networks/yeast_multiLayerNetwork/intraLayerEdgeTable.tsv`: intra-layer edge table of the multi-layer network.
* `data-create_networks/yeast_multiLayerNetwork/interLayerEdgeTable.tsv`: inter-layer edge table of the multi-layer network.

In [1]:
import pandas as pd
import numpy as np
import glob

In [2]:
coreGenes_file = '../../data-create_networks/cell_cycle_genes/core_cell_cycle_proteins.tsv'
pdiTable_file = '../../data-create_networks/yeastract_TF/protein_DNA_interactions.tsv'
ppiTable_file = '../../data-create_networks/hitpredit-03Aug2020/protein_protein_interactions.tsv'
dea_folder = '../../data-create_networks/yeast_Kelliher_2016/comparisons/'
nodeTable_file = '../../data-create_networks/yeast_multiLayerNetwork/nodeTable.tsv'
intraTable_file = '../../data-create_networks/yeast_multiLayerNetwork/intraLayerEdgeTable.tsv'
interTable_file = '../../data-create_networks/yeast_multiLayerNetwork/interLayerEdgeTable.tsv'

fdrCutoff = 0.05

## Import data

In [3]:
ppi = pd.read_csv( ppiTable_file, sep='\t' )
pdi = pd.read_csv( pdiTable_file, sep='\t' )

## Build the interactome

Merge PPI and PDI edges and compute their scores.
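
As a toy illustration of this merge step, the sketch below canonicalizes undirected edges by sorting each row, counts PDIs per node pair, and combines them with PPI scores. All node names and scores are hypothetical, and `size()` is used here for brevity where the notebook counts a constant `score` column.

```python
import numpy as np
import pandas as pd

# Hypothetical directed PDI edges: A->B and B->A describe the same undirected pair
pdi_toy = pd.DataFrame({'source': ['A', 'B', 'C'], 'target': ['B', 'A', 'A']})

# Sort within each row so that every pair is written in one canonical order
pdi_toy[['source', 'target']] = np.sort(pdi_toy[['source', 'target']].values, axis=1)

# Number of PDI edges per undirected pair
counts = pdi_toy.groupby(['source', 'target']).size().reset_index(name='n_pdi')

# Hypothetical PPI table, already undirected and aggregated
ppi_toy = pd.DataFrame({'source': ['A', 'B'], 'target': ['B', 'C'], 'score': [0.5, 1.0]})

# Outer merge keeps pairs found in either table; `_merge` records their origin
merged = pd.merge(ppi_toy, counts, on=['source', 'target'], how='outer', indicator=True)
merged['score_final'] = (2 * merged['score'].fillna(0) + merged['n_pdi'].fillna(0)) / 4
print(merged)
```

In this toy case the pair (A, B) is supported by both edge types, (A, C) by a PDI only and (B, C) by a PPI only.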

In [4]:
# sort source and target within each row so that an undirected edge is written the same way in both dataframes
undiPPI = ppi.copy()
undiPPI[['source','target']] = np.sort( undiPPI[['source','target']].values )
undiPDI = pdi.copy()
undiPDI[['source','target']] = np.sort( undiPDI[['source','target']].values )

# give each PDI a score of 1, then count the PDIs per node pair
undiPDI['score'] = 1.
undiPDI = undiPDI.groupby(by=['source', 'target']).count().reset_index()

# aggregate multi-edges of the PPI table (mean score) and multiply by 2 (because 1 undirected edge = 2 directed edges)
undiPPI = undiPPI.groupby(['source', 'target']).mean().reset_index()
undiPPI['score'] = undiPPI['score'] * 2

# merge dataframes
interactome = pd.merge( undiPPI, undiPDI, on=['source', 'target'], how='outer',
                        indicator=True, suffixes=['_ppi', '_pdi']
                      ).rename({'_merge': 'edge type'}, axis=1
                      ).replace({'left_only': 'PPI', 'right_only': 'PDI', 'both': 'PPI+PDI'})

# compute final scores
interactome['score'] = interactome[['score_ppi', 'score_pdi']].fillna(0).sum(axis=1) / 4
interactome = interactome[['source', 'target', 'score', 'edge type']]


x = interactome.groupby('edge type').count()[['source']].rename({'source': 'interactions'}, axis=1).T
x['Total'] = x.sum(axis=1)
x

| edge type | PDI | PPI | PPI+PDI | Total |
|---|---|---|---|---|
| interactions | 11291 | 99042 | 151 | 110484 |


## Build the node table

Compute node weights for each layer from the results of the differential expression analysis.
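
The cells below store the DEA results in a table with two-level columns (comparison, value). The following sketch shows that layout on two invented DEA tables, together with the slicing used to keep a subset of layers and the query tagging at an adjusted p-value cutoff of 0.05. Gene names and values are hypothetical.

```python
import pandas as pd

# Two hypothetical DEA tables indexed by gene
dea1 = pd.DataFrame({'logFC': [1.2, -0.3], 'FDR': [0.01, 0.60]}, index=['geneA', 'geneB'])
dea2 = pd.DataFrame({'logFC': [0.1, 2.5], 'FDR': [0.90, 0.001]}, index=['geneA', 'geneB'])

# Side-by-side concatenation gives two-level columns: (comparison, value)
toy = pd.concat([dea1, dea2], axis=1, keys=['10-15vs0-5', '20-25vs10-15'],
                names=['comparison', 'value'])

# Keep the first layer only: each layer spans two columns (logFC and FDR)
first_layer = toy.iloc[:, :1 * 2]

# Extract one value type across all layers, then tag the queries
fdr_toy = toy.xs('FDR', level='value', axis=1)
queries_toy = fdr_toy <= 0.05
print(queries_toy.sum())
```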

In [5]:
nodes = set(interactome['source']) | set(interactome['target'])
print('interactome nodes:', len(nodes))

# get DEA results
comparisons, names = [], []
for file in sorted( glob.glob(dea_folder + '*') ):
    data = pd.read_csv(file, sep='\t', index_col=0)
    name = file.split('/')[-1].split('.')[0].split('_')[-1]
    comparisons.append( data[['logFC', 'FDR']] )
    names.append( name )
comparisons = pd.concat( comparisons, axis=1, keys=names, names=['comparison','value'] )

print('genes from the RNA-seq:', len(comparisons))

# keep only layers 1 to 9 (two columns per layer: logFC and FDR)
comparisons = comparisons.iloc[ :, :(9*2) ]

# keep the genes with RNA-seq values and which are within the interactome
genesToKeep = set(comparisons.index) & nodes
comparisons = comparisons.loc[ sorted(genesToKeep) , : ]  # a list is required: recent pandas versions reject a set as indexer
intraLayerEdgeTable = interactome[ interactome['source'].isin(genesToKeep)
                                   & interactome['target'].isin(genesToKeep)
                                 ].reset_index(drop=True)

print('genes with RNA-seq values and within the interactome:', len(genesToKeep))

print('edges filtered out:', len(interactome) - len(intraLayerEdgeTable))
    
x = intraLayerEdgeTable.groupby('edge type').count()[['source']
                      ].rename({'source': 'interactions'}, axis=1).T
x['Total'] = x.sum(axis=1)
x

interactome nodes: 5979
genes from the RNA-seq: 6506
genes with RNA-seq values and within the interactome: 5787
edges filtered out: 1048


| edge type | PDI | PPI | PPI+PDI | Total |
|---|---|---|---|---|
| interactions | 10956 | 98329 | 151 | 109436 |


In [6]:
print('meaning of layer IDs:')
for i, c in enumerate( comparisons.columns.get_level_values(0).unique() ): print(i+1, '\t', c)

meaning of layer IDs:
1 	 10-15vs0-5
2 	 20-25vs10-15
3 	 30-35vs20-25
4 	 40-45vs30-35
5 	 50-55vs40-45
6 	 60-65vs50-55
7 	 70-75vs60-65
8 	 80-85vs70-75
9 	 90-95vs80-85


In [7]:
# get values
lfc = comparisons.xs( 'logFC', level='value', axis=1 )
fdr = comparisons.xs( 'FDR', level='value', axis=1 )
fdr = fdr.replace( 0, fdr[ fdr > 0 ].min() ) # replace null FDRs with the smallest non-zero FDR of each layer, so that log10 is defined

# compute the score
dysregWeight = ( np.abs( lfc ) * -np.log10( fdr ) )

# rescale each layer to [0.001, 1], because AnatApp does not handle null weights
dysregWeight = ( 1 - 0.001 ) * ( dysregWeight / dysregWeight.max() ).fillna(0) + 0.001
dysregWeight.columns = [ 'Weight_' + str(i) for i in range(1, len(dysregWeight.columns) + 1) ]

In [8]:
# get queries per layer
queries = ( fdr <= fdrCutoff )
queries.columns = [ 'Query_' + str(i) for i in range(1, len(queries.columns) + 1) ]

In [9]:
queries.sum()

Query_1    1694
Query_2    2707
Query_3     989
Query_4     483
Query_5      90
Query_6      67
Query_7      77
Query_8      45
Query_9       5
dtype: int64

In [10]:
# build table
nodeTable = dysregWeight.reset_index().rename_axis("",axis=1).rename({'index': 'Node'}, axis=1)

# add tag for core genes
core_genes = pd.read_csv( coreGenes_file, sep='\t', header=None )
core_genes['Core'] = True
nodeTable = pd.merge( core_genes, nodeTable, left_on=1, right_on='Node', how='right' ).iloc[:, 2:]
nodeTable = nodeTable[ ['Node', 'Core'] + list(nodeTable.columns[2:]) ]
nodeTable['Core'] = nodeTable['Core'].fillna(False)

# add query columns
nodeTable = pd.merge( nodeTable, queries, left_on='Node', right_index=True, how='left' ).fillna(False)

print('core genes within the node table:', len(nodeTable[nodeTable['Core']]))

core genes within the node table: 30


## Build the inter-layer edge table

In [11]:
interLayerEdgeTable = []

# compute the inter-layer edge weights
cols = dysregWeight.columns
for i in range(1, len(cols)):
    wDys1 = dysregWeight[ cols[i-1] ]
    wDys2 = dysregWeight[ cols[i] ]
    interLayerEdgeTable.append( ( wDys1 + wDys2 ) / ( 1 + wDys1 + wDys2 ) )
    
# build the table
interLayerEdgeTable = pd.concat( interLayerEdgeTable, axis=1 )
colNames = [ f'Weight_{i}>{i+1}' for i in range(1, len(cols)) ]
interLayerEdgeTable.columns = colNames
interLayerEdgeTable = interLayerEdgeTable.reset_index().rename({'index': 'source'}, axis=1)
interLayerEdgeTable['target'] = interLayerEdgeTable['source']
interLayerEdgeTable = interLayerEdgeTable[['source', 'target'] + colNames]

## Export

In [12]:
# replace locus name with gene name for core genes
core_genes = pd.read_csv(coreGenes_file, sep='\t', header=None)
conversionDict = dict( zip( core_genes[1], core_genes[0] ) )

nodeTable['Node'] = nodeTable['Node'].replace(conversionDict)
intraLayerEdgeTable['source'] = intraLayerEdgeTable['source'].replace(conversionDict)
intraLayerEdgeTable['target'] = intraLayerEdgeTable['target'].replace(conversionDict)
interLayerEdgeTable['source'] = interLayerEdgeTable['source'].replace(conversionDict)
interLayerEdgeTable['target'] = interLayerEdgeTable['target'].replace(conversionDict)

# export
nodeTable.to_csv( nodeTable_file, sep='\t', index=False )
intraLayerEdgeTable.to_csv( intraTable_file, sep='\t', index=False )
interLayerEdgeTable.to_csv( interTable_file, sep='\t', index=False )