# Preprocessing of the PCNet PPI network
In this notebook, I preprocess the [PCNet network](https://www.sciencedirect.com/science/article/pii/S2405471218300954) from the [NDEx network exchange](http://www.ndexbio.org/#/networkset/e8ebbdde-86dc-11e7-a10d-0ac135e8bacf?accesskey=7fbd23635b798321954e66c63526c46397a3f45b40298cf43f22d07d4feed0fa).
The network is distributed in the CX file format, which is JSON-based and can therefore be parsed with the standard `json` module.
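
As a rough illustration of why the `json` module is enough, a CX document parses into a list of fragments, each a dict holding one aspect such as `nodes` or `edges`. The sketch below uses a made-up three-node document. The gene names and the `collect_aspect` helper are hypothetical, and the field names (`@id`, `n`, `r`, `s`, `t`) are assumed from the CX convention rather than read from the PCNet file.

```python
import json

# Hypothetical miniature CX document; a real file has more aspects and far more entries.
toy_cx = json.loads("""
[
  {"nodes": [{"@id": 0, "n": "TP53", "r": "TP53"},
             {"@id": 1, "n": "MDM2", "r": "MDM2"}]},
  {"edges": [{"@id": 0, "s": 0, "t": 1}]},
  {"nodes": [{"@id": 2, "n": "EGFR", "r": "EGFR"}]},
  {"edges": [{"@id": 1, "s": 1, "t": 2}]}
]
""")


def collect_aspect(fragments, name):
    """Gather all entries of one aspect, even if it is split over several fragments."""
    entries = []
    for fragment in fragments:
        entries.extend(fragment.get(name, []))
    return entries


toy_nodes = collect_aspect(toy_cx, "nodes")
toy_edges = collect_aspect(toy_cx, "edges")
print(len(toy_nodes), len(toy_edges))
```

Collecting with `extend` instead of plain assignment would keep every entry if an aspect were split across several fragments.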

In [2]:
import numpy as np
import pandas as pd
import json
import networkx as nx
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [8]:
pcnet_file = '../../../data/networks/pcnet.cx'
with open(pcnet_file) as cx_f:
    all_network_components = json.load(cx_f)

# the parsed CX file is a list of components; pick out the ones holding the nodes and edges
pcnet_nodes = None
pcnet_edges = None
for comp in all_network_components:
    if 'nodes' in comp:
        pcnet_nodes = comp['nodes']
    if 'edges' in comp:
        pcnet_edges = comp['edges']
print("Loaded PCNet PPI network with {} nodes and {} edges".format(len(pcnet_nodes), len(pcnet_edges)))

Loaded PCNet PPI network with 19781 nodes and 2724724 edges


In [16]:
# get DataFrames for edges and nodes
edgelist = pd.DataFrame(pcnet_edges).drop('@id', axis=1)
edgelist.columns = ['Source', 'Target']
# index nodes by their '@id' so the joins below match on node IDs rather than row positions
nodes = pd.DataFrame(pcnet_nodes).set_index('@id')
nodes.columns = ['Name', 'hgnc_symbol']

# join edgelist with the node names to have meaningful node names
edgelist_names_source = edgelist.join(nodes.drop('hgnc_symbol', axis=1), on='Source')
edgelist_names_source.columns = ['Source', 'Target', 'Source_Name']
edgelist_names = edgelist_names_source.join(nodes.drop('hgnc_symbol', axis=1), on='Target')
edgelist_names.columns = ['Source', 'Target', 'Source_Name', 'Target_Name']
edgelist_names.head()

Unnamed: 0,Source,Target
0,0,1
1,0,2
2,0,6239
3,0,7486
4,0,6851
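
The ID-to-name lookup in the cell above can also be written with an explicit mapping, which makes it easier to check that every edge endpoint resolves to a node name. This is a minimal sketch on made-up data, not the PCNet file. The toy IDs and gene names are hypothetical, and the column names `n`, `s`, and `t` are assumed from the CX convention.

```python
import pandas as pd

# Hypothetical stand-ins for the CX 'nodes' and 'edges' aspects.
toy_nodes = [
    {"@id": 10, "n": "TP53", "r": "TP53"},
    {"@id": 20, "n": "MDM2", "r": "MDM2"},
    {"@id": 30, "n": "EGFR", "r": "EGFR"},
]
toy_edges = [
    {"@id": 100, "s": 10, "t": 20},
    {"@id": 101, "s": 20, "t": 30},
]

# Series mapping node '@id' -> node name.
id_to_name = pd.DataFrame(toy_nodes).set_index("@id")["n"]

toy_edgelist = (
    pd.DataFrame(toy_edges)
    .rename(columns={"s": "Source", "t": "Target"})
    [["Source", "Target"]]
)
toy_edgelist["Source_Name"] = toy_edgelist["Source"].map(id_to_name)
toy_edgelist["Target_Name"] = toy_edgelist["Target"].map(id_to_name)

# Any NaN here would mean an edge refers to a node ID that has no name.
unresolved = toy_edgelist[["Source_Name", "Target_Name"]].isna().any(axis=1).sum()
print(toy_edgelist)
print("edges with unresolved names:", unresolved)
```

Because the toy IDs are deliberately not 0, 1, 2, the lookup only works when it matches on `@id` rather than on row position.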


In [28]:
# get networkx graph from edgelist
G = nx.from_pandas_edgelist(edgelist_names, source='Source_Name', target='Target_Name')
G.number_of_nodes(), G.number_of_edges(), nx.is_connected(G)
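
If `nx.is_connected` were to return `False`, one common follow-up is to restrict the graph to its largest connected component. The sketch below shows this on a hypothetical toy graph. It is not a step taken in this notebook, and the names `toy_G` and `largest` are made up for illustration.

```python
import networkx as nx

# Hypothetical toy graph with two components, standing in for G.
toy_G = nx.Graph()
toy_G.add_edges_from([("A", "B"), ("B", "C"), ("D", "E")])

if nx.is_connected(toy_G):
    largest = toy_G
else:
    # connected_components yields one set of nodes per component.
    biggest_nodes = max(nx.connected_components(toy_G), key=len)
    # copy() turns the read-only subgraph view into an independent graph.
    largest = toy_G.subgraph(biggest_nodes).copy()

print(largest.number_of_nodes(), largest.number_of_edges())
```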

In [32]:
edgelist_names.to_csv('../../../data/networks/pcnet_edgelist.tsv.gz', compression='gzip', sep='\t')
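
To read the saved edge list back in, something along the following lines should work. This is a sketch on a small hypothetical DataFrame written to a temporary directory, so the path and the contents are made up. The `to_csv` arguments mirror the call above.

```python
import os
import tempfile

import pandas as pd

# Hypothetical miniature edge list with the same columns as edgelist_names.
toy_edgelist = pd.DataFrame({
    "Source": [0, 1],
    "Target": [1, 2],
    "Source_Name": ["TP53", "MDM2"],
    "Target_Name": ["MDM2", "EGFR"],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_edgelist.tsv.gz")
    toy_edgelist.to_csv(path, compression="gzip", sep="\t")
    # index_col=0 restores the row index that to_csv wrote as the first, unnamed column.
    reloaded = pd.read_csv(path, sep="\t", index_col=0, compression="gzip")

print(reloaded.head())
```

Without `index_col=0`, the row index written by `to_csv` comes back as an extra `Unnamed: 0` column.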