# 3.2 (Mouse) Generate tables for the multilayer network


First, the interactome was built as follows:

1. All interactions were set as undirected.
2. A confidence score of 1 was assigned to the PDIs.
3. Multi-edges from the PPI table were aggregated: for each node pair, the mean of its PPI scores was used.
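
The aggregation in step 3 can be illustrated with a minimal sketch. It assumes a toy edge table with a `score` column (the column name and the values are hypothetical, not taken from the HitPredict file) and treats `A-B` and `B-A` as the same undirected pair:

```python
import numpy as np
import pandas as pd

# Hypothetical toy PPI table; the 'score' column name is an assumption.
ppi_toy = pd.DataFrame({
    'source': ['A', 'B', 'A', 'C'],
    'target': ['B', 'A', 'C', 'D'],
    'score':  [0.5, 0.75, 0.25, 1.0],
})

def aggregate_undirected(edges):
    """Collapse duplicated undirected edges by averaging their scores."""
    edges = edges.copy()
    # Sort each node pair so that A-B and B-A share the same key
    pairs = np.sort(edges[['source', 'target']].to_numpy(), axis=1)
    edges['source'], edges['target'] = pairs[:, 0], pairs[:, 1]
    # One row per node pair, with the mean score of its multi-edges
    return edges.groupby(['source', 'target'], as_index=False)['score'].mean()

aggregated = aggregate_undirected(ppi_toy)
```

Here the two `A`/`B` rows collapse into a single edge whose score is the mean of 0.5 and 0.75.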

Node weights were calculated as follows:

1. Node weights were computed as `-log10(Padj) * |LFC|`. Nodes absent from the RNA-sequencing dataset were filtered out.
2. Then, for each column, weights were rescaled to the range 0.001 to 1, because AnatApp does not handle null (zero) weights.
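
As a small worked sketch of these two steps, the snippet below uses made-up log fold changes and adjusted p-values for three genes in two layers. The function name `node_weights` and the layer labels are hypothetical; the rescaling divides each column by its maximum, which is valid here because the raw scores are never negative:

```python
import numpy as np
import pandas as pd

genes = ['g1', 'g2', 'g3']
# Hypothetical DEA results (not real data)
lfc_toy = pd.DataFrame({'L1': [2.0, -1.0, 0.0], 'L2': [0.5, 0.0, -3.0]}, index=genes)
padj_toy = pd.DataFrame({'L1': [0.01, 0.1, 1.0], 'L2': [0.1, 1.0, 0.001]}, index=genes)

def node_weights(lfc, padj, floor=0.001):
    """Score nodes as -log10(Padj) * |LFC|, then rescale each column to [floor, 1]."""
    raw = -np.log10(padj) * lfc.abs()
    scaled = raw / raw.max()              # column-wise maximum becomes 1
    return (1 - floor) * scaled + floor   # a raw score of 0 becomes `floor`

weights_toy = node_weights(lfc_toy, padj_toy)
```

An adjusted p-value of exactly 0 would give an infinite score, so such values need to be replaced before taking the logarithm.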

A node was tagged as a query for a given layer if its gene was differentially expressed (adjusted p-value <= 0.05) in that layer.

Aggregated weight and query columns were added to the node table so that the same file could be used for tests with both TimeNexus and the extracting apps. Aggregated weights were the mean of the weights across layers. For the aggregated query column, a gene was tagged as a query node if it was differentially expressed in any comparison.
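
One possible way to derive such aggregated columns is sketched below on a toy node table. The output column names `Weight_agg` and `Query_agg` are assumptions made for illustration, as are the values:

```python
import pandas as pd

# Hypothetical node table with two layers
node_toy = pd.DataFrame({
    'Node':     ['g1', 'g2', 'g3'],
    'Weight_1': [1.0, 0.25, 0.001],
    'Weight_2': [0.5, 0.001, 0.001],
    'Query_1':  [True, False, False],
    'Query_2':  [False, True, False],
})

# Select the per-layer columns before adding the aggregated ones
weight_cols = [c for c in node_toy.columns if c.startswith('Weight_')]
query_cols = [c for c in node_toy.columns if c.startswith('Query_')]

# Mean weight across layers; query if differentially expressed in any layer
node_toy['Weight_agg'] = node_toy[weight_cols].mean(axis=1)
node_toy['Query_agg'] = node_toy[query_cols].any(axis=1)
```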

Inter-layer edge weights were computed as `( wDys1 + wDys2 ) / ( 1 + wDys1 + wDys2 )`, where wDys1 and wDys2 are the weights of the node in layers i and i+1, respectively.
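
To give a feel for this formula, the short sketch below evaluates it at the two extremes of the node-weight range (0.001 and 1). The function name `inter_layer_weight` is hypothetical:

```python
def inter_layer_weight(w_dys1, w_dys2):
    """Weight of the edge linking the same node in two consecutive layers."""
    total = w_dys1 + w_dys2
    return total / (1 + total)

# Node barely dysregulated in both layers: 0.002 / 1.002
lowest = inter_layer_weight(0.001, 0.001)
# Node maximally dysregulated in both layers: 2 / 3
highest = inter_layer_weight(1.0, 1.0)
```

Because the formula increases with the sum of the two node weights, the inter-layer weights stay between these two values, and a node that is strongly dysregulated in either layer gets a stronger coupling edge.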

## Input

* `data-create_networks/hitpredit-03Aug2020/mouse_ppi.tsv`: cleaned HitPredict protein-protein interaction network.
* Files within the folder `data-create_networks/mouse/comparisons/`: results of the DEA for the 18 comparisons.

## Output

* `data-create_networks/mouse_multiLayerNetwork/nodeTable.tsv`: node table of the multi-layer network.
* `data-create_networks/mouse_multiLayerNetwork/intraLayerEdgeTable.tsv`: intra-layer edge table of the multi-layer network.
* `data-create_networks/mouse_multiLayerNetwork/interLayerEdgeTable.tsv`: inter-layer edge table of the multi-layer network.

In [1]:
import pandas as pd
import numpy as np
import glob

In [2]:
ppiTable_file = '../../../data-create_networks/hitpredit-03Aug2020/mouse_ppi.tsv'
dea_folder = '../../../data-create_networks/mouse/comparisons/'
nodeTable_file = '../../../data-create_networks/mouse_multiLayerNetwork/nodeTable.tsv'
intraTable_file = '../../../data-create_networks/mouse_multiLayerNetwork/intraLayerEdgeTable.tsv'
interTable_file = '../../../data-create_networks/mouse_multiLayerNetwork/interLayerEdgeTable.tsv'

fdrCutoff = 0.05

## Import data

In [3]:
ppi = pd.read_csv( ppiTable_file, sep='\t' )

## Build the interactome

In [4]:
interactome = ppi.copy()
interactome['edge type'] = 'PPI'

interactome['source'] = interactome['source'].str.capitalize()
interactome['target'] = interactome['target'].str.capitalize()

x = interactome.groupby('edge type').count()[['source']].rename({'source': 'interactions'}, axis=1).T
x['Total'] = x.sum(axis=1)
x

edge type,PPI,Total
interactions,33565,33565


## Build the node table

Compute node weights for each layer from the results of the differential expression analysis.

In [5]:
nodes = set(interactome['source']) | set(interactome['target'])
print('interactome nodes:', len(nodes))

# get DEA results
comparisons, names = [], []
for file in sorted( glob.glob(dea_folder + '*') ):
    data = pd.read_csv(file, sep='\t', index_col=0)
    name = file.split('/')[-1].split('.')[0].split('_')[-1]
    comparisons.append( data[['logFC', 'FDR']] )
    names.append( name )
comparisons = pd.concat( comparisons, axis=1, keys=names, names=['comparison','value'] )
comparisons = comparisons.drop(['Day3vs1', 'Day30vs3'], axis=1)

print('genes from the RNA-seq:', len(comparisons))

# keep the genes with RNA-seq values and which are within the interactome
genesToKeep = set(comparisons.index) & nodes
comparisons = comparisons.loc[ sorted(genesToKeep) , : ]  # pass a list: recent pandas versions reject set indexers
intraLayerEdgeTable = interactome[ interactome['source'].isin(genesToKeep)
                                   & interactome['target'].isin(genesToKeep)
                                 ].reset_index(drop=True)

print('genes with RNA-seq values and within the interactome:', len(genesToKeep))

print('edges filtered out:', len(interactome) - len(intraLayerEdgeTable))
    
x = intraLayerEdgeTable.groupby('edge type').count()[['source']
                      ].rename({'source': 'interactions'}, axis=1).T
x['Total'] = x.sum(axis=1)
x

interactome nodes: 9509
genes from the RNA-seq: 17328
genes with RNA-seq values and within the interactome: 3994
edges filtered out: 26269


edge type,PPI,Total
interactions,7296,7296


In [6]:
print('meaning of layer IDs:')
for i, c in enumerate( comparisons.columns.get_level_values(0).unique() ): print(i+1, '\t', c)

meaning of layer IDs:
1 	 Day1vs0
2 	 Day3vs0
3 	 Day30vs0


In [7]:
# get values
lfc = comparisons.xs( 'logFC', level='value', axis=1 )
fdr = comparisons.xs( 'FDR', level='value', axis=1 )
fdr = fdr.replace( 0, fdr[ fdr > 0 ].min() ) # remove null FDR

# compute the score
dysregWeight = ( np.abs( lfc ) * -np.log10( fdr ) )

# rescale each column to [0.001, 1] by dividing by its maximum, since AnatApp does not handle null weights
dysregWeight = ( 1 - 0.001 ) * ( dysregWeight / dysregWeight.max() ).fillna(0) + 0.001
dysregWeight.columns = [ 'Weight_' + str(i) for i in range(1, len(dysregWeight.columns) + 1) ]

In [8]:
# get queries per layer
queries = ( fdr <= fdrCutoff )
queries.columns = [ 'Query_' + str(i) for i in range(1, len(queries.columns) + 1) ]

pd.DataFrame(
    [ [ queries[c].sum() ]
      for c in queries.columns ],
    index = [f'layer {str(i+1)}' for i in range(len(queries.columns))],
    columns = [ '# query nodes' ]
)

layer,# query nodes
layer 1,24
layer 2,14
layer 3,2


In [9]:
print( 'query1:', ', '.join( queries[queries['Query_1']].index.sort_values() ) )
print( 'query2:', ', '.join( queries[queries['Query_2']].index.sort_values() ) )
print( 'query3:', ', '.join( queries[queries['Query_3']].index.sort_values() ) )

query1: Bst2, Ddx58, Dtx3l, Herc6, Ifih1, Ifit1, Ifit3, Iigp1, Irf7, Irgm1, Isg15, Ly6a, Ly6e, Myh1, Oasl2, Parp9, Rsad2, Rtp4, Stat1, Tap1, Tap2, Tifa, Zbp1, Znfx1
query2: Arrb1, Bdnf, Calca, Ccnd1, Etv5, Hif3a, Ifit1, Loxl4, Matn2, Myh1, Nptx1, Oasl2, Sgk1, Snta1
query3: Apoe, Ncam2


In [10]:
# build table
nodeTable = dysregWeight.reset_index().rename_axis("",axis=1).rename({'index': 'Node'}, axis=1)

# add query columns
nodeTable = pd.merge( nodeTable, queries, left_on='Node', right_index=True, how='left' ).fillna(False)

## Build the inter-layer edge table

In [11]:
interLayerEdgeTable = []

# compute the inter-layer edge weights
cols = dysregWeight.columns
for i in range(1, len(cols)):
    wDys1 = dysregWeight[ cols[i-1] ]
    wDys2 = dysregWeight[ cols[i] ]
    interLayerEdgeTable.append( ( wDys1 + wDys2 ) / ( 1 + wDys1 + wDys2 ) )
    
# build the table
interLayerEdgeTable = pd.concat( interLayerEdgeTable, axis=1 )
colNames = [ f'Weight_{i}>{i+1}' for i in range(1, len(cols)) ]
interLayerEdgeTable.columns = colNames
interLayerEdgeTable = interLayerEdgeTable.reset_index().rename({'index': 'source'}, axis=1)
interLayerEdgeTable['target'] = interLayerEdgeTable['source']
interLayerEdgeTable = interLayerEdgeTable[['source', 'target'] + colNames]

## Export

In [12]:
# export
nodeTable.to_csv( nodeTable_file, sep='\t', index=False )
intraLayerEdgeTable.to_csv( intraTable_file, sep='\t', index=False )
interLayerEdgeTable.to_csv( interTable_file, sep='\t', index=False )