# Constructing a Cancer-Specific Network by Extraction from the PCNet

__Introduction:__  

This notebook extracts a cancer-specific subnetwork from [__PCNet__](http://www.ndexbio.org/#/network/f93f402c-86d4-11e7-a10d-0ac135e8bacf) [1]. It is derived from the original Jupyter notebook for [__cancer subnetwork construction__](https://github.com/idekerlab/pyNBS/blob/master/Supplementary_Notebooks/Cancer%20Subnetwork%20Construction.ipynb), a pre-processing step that prepares the network used for propagation in the [__pyNBS algorithm__](https://github.com/idekerlab/pyNBS) [2].  

__The following four cancer-related gene sets are combined into a baseline cancer gene set that is used to filter PCNet:__  

|File Name|Cancer Gene Set Description|Citation|
|:---|:---|:---|
|hallmarks.txt|Genes from hallmark cancer pathways|Hanahan D and Weinberg RA (2011) Hallmarks of Cancer: The Next Generation. Cell. 144(5), 646-674.|
|vogelstein.txt|List of tumor suppressor and oncogenes from Vogelstein et al.|Vogelstein B, et al. (2013) Cancer genome landscapes. Science. 339(6127), 1546-1558.|
|sanger_CL_genes.txt|Recurrently mutated cancer genes discovered from cancer cell lines (Sanger UK)|Iorio F, et al. (2016) A Landscape of Pharmacogenomic Interactions in Cancer. Cell. 166(3), 740-754.|
|cgc.txt|Genes from the Cancer Gene Census (COSMIC v81)|Forbes SA, et al. (2017) COSMIC: somatic cancer genetics at high-resolution. Nucleic Acids Res. 45(D1), D777-D783.|


- [1] Huang, J.K., et al., Systematic evaluation of molecular networks for discovery of disease genes. Cell Systems, 2018. 6(4): p. 484-495.e5.
- [2] Huang, J.K., et al., pyNBS: a Python implementation for network-based stratification of tumor mutations. Bioinformatics, 2018. 34(16): p. 2859-2861.

## Requirement: _Python 2.7 conda environment_

- $ conda create -n (your_env_name) python=2.7

- $ conda activate (your_env_name)

- $ conda install ipykernel

- $ conda install -c anaconda jupyter

- $ conda install -c conda-forge pandas=0.19.2

- $ conda install -c anaconda networkx=2.2

## STEP 1: Get cancer gene list

In [1]:
import pandas as pd
import networkx as nx

# Import pyNBS modules
from pyNBS import gene_conversion_tools as gct  # used when exporting the network
from pyNBS import data_import_tools as dit  # used when loading the network

In [2]:
# Open table of cancer genes derived from four sources 
cancer_table = pd.read_csv("SuppTable2_GeneList_CancerSubnetwork_2322_4sources.tsv",sep='\t',header=0)

# Open table of cancer genes of interest
LUAD_table = pd.read_csv("LUAD_47drivers.tsv",sep='\t',header=0)

cancer_table.head(3)

|Unnamed: 0|HallmarkCan_1711|Vogelstein_138|Sanger_2369|COSMIC_595|
|:---|:---|:---|:---|:---|
|0|A2M|ABL1|ABCB1|ABI1|
|1|ABCB1|ACVR1B|ABL2|ABL1|
|2|ABCB11|AKT1|ACACA|ABL2|


In [3]:
# Get list of cancer genes of interest
Hallmark_genes = list(cancer_table['HallmarkCan_1711'].dropna())
Volgel_genes = list(cancer_table['Vogelstein_138'].dropna())
Sanger_genes = list(cancer_table['Sanger_2369'].dropna())
Cosmic_genes = list(cancer_table['COSMIC_595'].dropna())
interest_genes = list(LUAD_table['LUAD_47'].dropna())

# Check the number of genes
print(len(Hallmark_genes), 
len(Volgel_genes), 
len(Sanger_genes), 
len(Cosmic_genes),
len(interest_genes)) 

(1711, 138, 2369, 595, 47)
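
The four columns hold different numbers of genes (1711, 138, 2369 and 595), so the shorter columns of the table are presumably padded with missing values, which is why each column is passed through `.dropna()` before it is turned into a list. The sketch below illustrates this on a toy table; the column names and gene values are hypothetical and are not taken from the real file.

```python
import pandas as pd

# Toy table mimicking the layout of cancer_table: columns of unequal length,
# with the shorter column padded by a missing value (hypothetical data).
toy_table = pd.DataFrame({
    "SetA_3": ["A2M", "ABCB1", "ABL1"],
    "SetB_2": ["ABL1", "AKT1", None],
})

# .dropna() removes the padding, so each list keeps only real gene symbols
set_a = list(toy_table["SetA_3"].dropna())
set_b = list(toy_table["SetB_2"].dropna())

print(len(set_a), len(set_b))
```

Without `.dropna()`, the missing value would be carried into the gene list and, later on, into the combined gene set.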


In [4]:
#### Combine all cancer gene lists together

cancer_genes_ori = list(set(Hallmark_genes+Volgel_genes+Sanger_genes+Cosmic_genes))
cancer_genes_interest = list(set(Hallmark_genes+Volgel_genes+Sanger_genes+Cosmic_genes+interest_genes))

print("Number of all Cancer Genes:", len(cancer_genes_ori)) # 2322 genes
print("Number of all Cancer Genes plus LUAD drivers:", len(cancer_genes_interest))  # 2331 genes 
print("Number of genes increased :", len(cancer_genes_interest)- len(cancer_genes_ori)) # 9 genes

('Number of all Cancer Genes:', 2322)
('Number of all Cancer Genes plus LUAD drivers:', 2331)
('Number of genes increased :', 9)
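
The combined set grows by only 9 genes, which means that the remaining LUAD drivers were already covered by the four source lists. To see which genes are actually new, a set difference is enough. The following is a minimal sketch on toy lists; `genes_added` is a hypothetical helper, and the gene lists are stand-ins for `cancer_genes_ori` and `interest_genes`.

```python
# Toy stand-ins for cancer_genes_ori and interest_genes (hypothetical values,
# not the real 2322- and 47-gene lists).
baseline_toy = ["TP53", "KRAS", "EGFR", "PTEN"]
interest_toy = ["KRAS", "EGFR", "STK11", "KEAP1"]


def genes_added(baseline, extra):
    """Return the sorted genes in `extra` that are absent from `baseline`."""
    return sorted(set(extra) - set(baseline))


added = genes_added(baseline_toy, interest_toy)
print(added)
```

In the notebook, the equivalent call would be `genes_added(cancer_genes_ori, interest_genes)`.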


## STEP 2: Extract subnetwork from PCNet using cancer gene list

### STEP 2.1: Load PCNet from a network file ```'~/PCNet.txt'```

In [5]:
# Load PCNet 
PCnetwork_file = 'PCNet.txt'
PCnetwork = nx.read_edgelist(PCnetwork_file, delimiter='\t', data=True)

# Count the nodes (genes) in PCNet
PCnodes = PCnetwork.nodes 
len(PCnodes) # = 19781

19781

### STEP 2.2: Pre-check that your genes of interest are available in PCNet

In [6]:
# Collect the genes of interest that are missing from the PCNet nodes
genecheck1 = [gene for gene in interest_genes if gene not in PCnodes]  # 47 genes checked against 19781 nodes
print(genecheck1)  # an empty list means that all 47 genes exist in PCNet

[]


In [7]:
# Alternatively, check a single gene manually
'ADGRL2' in PCnodes #True

True

### STEP 2.3: Filter PCNet to only contain genes from the combined cancer gene list and the edges between those genes


In [8]:
# Filter PCNet using .subgraph()

cancer_subnetwork1 = PCnetwork.subgraph(cancer_genes_interest) # cancer_genes_interest = 2331
# dict() gives the Series one integer degree per gene (networkx 2.x returns a DegreeView of (gene, degree) pairs)
gene_degree1 = pd.Series(dict(cancer_subnetwork1.degree()), name='degree')

print ("Number of connected genes in Cancer Subnetwork:", len(cancer_subnetwork1.nodes())-len(gene_degree1[gene_degree1==0]))
print ("Number of interactions in Cancer Subnetwork:", len(cancer_subnetwork1.edges()))

#****---------Note: Genes with no edges are not removed in this step.---------*****

('Number of connected genes in Cancer Subnetwork:', 2297)
('Number of interactions in Cancer Subnetwork:', 204826)
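
`.subgraph()` returns the induced subgraph: requested genes that are not nodes of the network are silently skipped, and requested genes whose neighbours were all filtered out stay in the result as isolated nodes. The sketch below separates these cases on a toy graph. It assumes networkx 2.x or later, and all gene names and edges are hypothetical.

```python
import networkx as nx

# Toy network standing in for PCNet (hypothetical gene names and edges).
toy_net = nx.Graph()
toy_net.add_edges_from([("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")])

# "Z" is not in the network; "E" is, but its only neighbour "D" is not requested.
requested = ["A", "B", "C", "E", "Z"]
toy_sub = toy_net.subgraph(requested)

# Requested genes that never made it into the subgraph
absent = sorted(set(requested) - set(toy_sub.nodes()))
# Genes kept in the subgraph although they have no edges there
isolated = sorted(nx.isolates(toy_sub))
# Genes with at least one edge inside the subgraph
connected = sorted(n for n, d in toy_sub.degree() if d > 0)

print(absent)
print(isolated)
print(connected)
```

Counting the three groups separately makes it easier to explain any gap between the size of the gene list and the size of the subnetwork.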


## STEP 3: Export subnetwork to file

In [9]:
# Write the filtered cancer subnetwork (built from cancer_genes_interest, which includes the 47 LUAD driver genes) to file
# Note: Genes with no edges connecting them to any other gene will be removed during this step

gct.write_edgelist(cancer_subnetwork1.edges(),
                   './Output_subnetwork/CancerSubnetwork1_plusInterestLUAD47.txt', binary=True)

Edge list saved: 1.07 seconds


## STEP 4: Reload the newly created cancer subnetwork

### STEP 4.1: Load subnetwork created in the previous section

In [12]:
network_filepath1 = './Output_subnetwork/CancerSubnetwork1_plusInterestLUAD47.txt'
network1 = dit.load_network_file(network_filepath1)

network1_nodes = network1.nodes()
network1_edges = network1.edges()

print ("Number of connected genes in Cancer Subnetwork:", len(network1_nodes))
print ("Number of interactions in Cancer Subnetwork:", len(network1_edges))
print ("Number of nodes removed due to no interactions (edges) :", len(cancer_subnetwork1.nodes())-len(network1_nodes))

Network File Loaded: ./Output_subnetwork/CancerSubnetwork1_plusInterestLUAD47.txt
('Number of connected genes in Cancer Subnetwork:', 2297)
('Number of interactions in Cancer Subnetwork:', 204826)
('Number of nodes removed due to no interactions (edges) :', 7)
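
The nodes that disappear are the ones without edges: an edge-list file stores only pairs of connected genes, so a gene with no interactions has nowhere to be written and is absent after reloading. The sketch below reproduces this effect on a toy graph. It uses networkx's own `write_edgelist` and `read_edgelist` as stand-ins for the pyNBS helpers `gct.write_edgelist` and `dit.load_network_file`, whose internals are not shown here; the graph and file name are hypothetical.

```python
import os
import tempfile

import networkx as nx

# Toy subnetwork with one isolated gene (hypothetical names).
toy_graph = nx.Graph()
toy_graph.add_edges_from([("A", "B"), ("B", "C")])
toy_graph.add_node("E")  # in the graph, but part of no edge

toy_path = os.path.join(tempfile.mkdtemp(), "toy_subnetwork.txt")

# Write a two-column, tab-delimited edge list, then read it back
nx.write_edgelist(toy_graph, toy_path, delimiter="\t", data=False)
reloaded = nx.read_edgelist(toy_path, delimiter="\t")

# Genes present before the round trip but missing afterwards
lost = sorted(set(toy_graph.nodes()) - set(reloaded.nodes()))
print(lost)
```

The number of edges is unchanged by the round trip, which matches the identical interaction counts reported before and after export.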


### STEP 4.2: Check that your genes of interest are available in the extracted subnetwork

In [13]:
# Collect the genes of interest that are missing from the reloaded subnetwork
genecheck2 = [gene for gene in interest_genes if gene not in network1_nodes]  # 47 genes checked against 2297 nodes
print(genecheck2)  # an empty list means that all 47 genes are in the subnetwork

[]
