# Constructing a Cancer-Specific Network

__Introduction:__  
This notebook uses [__PCNet__](http://www.ndexbio.org/#/network/f93f402c-86d4-11e7-a10d-0ac135e8bacf) (Huang, Carlin, et al., in press) and four collections of cancer-related genes (listed below) to construct a cancer-specific subnetwork. The pyNBS algorithm can use this subnetwork to stratify patients with sparse mutational profiles. The notebook also uses a module (developed by Huang et al.) that wraps the [MyGene.info](http://mygene.info/) Python API to normalize gene names to HUGO symbols.

__Steps to construct the cancer subnetwork:__
1. Load the network.
2. Compile the genes from all cancer-related gene sets into a single list.
3. Keep only the edges that connect two cancer genes; remove all other nodes and edges from the network.
4. Write the filtered network to file as an edge list.
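
The steps above can be sketched on a toy network without any external files. This is a minimal illustration, not the notebook's implementation: the edges, the gene sets, and the helper name `induced_edges` are hypothetical, and plain Python collections stand in for the networkx graph used in the cells below.

```python
# Step 1: a toy "network" given as an edge list (hypothetical data)
toy_edges = [
    ("TP53", "MDM2"),
    ("TP53", "GENE_X"),
    ("KRAS", "BRAF"),
    ("GENE_X", "GENE_Y"),
]

# Hypothetical gene sets standing in for the four cancer gene collections
gene_sets = {
    "set_a": ["TP53", "KRAS"],
    "set_b": ["MDM2", "BRAF", "TP53"],
}


def induced_edges(edges, keep):
    """Return only the edges whose two endpoints are both in `keep`."""
    keep = set(keep)
    return [(u, v) for u, v in edges if u in keep and v in keep]


# Step 2: compile all gene sets into one de-duplicated collection
toy_cancer_genes = set()
for genes in gene_sets.values():
    toy_cancer_genes.update(genes)

# Step 3: keep only the edges that connect two cancer genes
toy_subnetwork = induced_edges(toy_edges, toy_cancer_genes)

# Step 4: format the filtered edges as tab-delimited lines
toy_lines = ["\t".join(edge) for edge in toy_subnetwork]
```

Here the two edges touching `GENE_X` are dropped because `GENE_X` is not in any gene set, which is the same filtering idea that `network.subgraph` applies later in this notebook.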

__The following is a list of the four cancer-related gene sets used to filter PCNet:__  

|File Name|Cancer Gene Set Description|Citation|
|:---|:---|:---|
|hallmarks.txt|Genes from hallmark cancer pathways|Hanahan D and Weinberg RA (2011) Hallmarks of Cancer: The Next Generation. Cell. 144(5), 646-674.|
|vogelstein.txt|List of tumor suppressor and oncogenes from Vogelstein et al.|Vogelstein B, et al. (2013) Cancer genome landscapes. Science. 339(6127), 1546-1558.|
|sanger_CL_genes.txt|Recurrently mutated cancer genes discovered from cancer cell lines (Sanger UK)|Iorio F, et al. (2016) A Landscape of Pharmacogenomic Interactions in Cancer. Cell. 166(3), 740-754.|
|cgc_v81.txt|Genes from the Cancer Gene Census (COSMIC v81)|Forbes SA, et al. (2017) COSMIC: somatic cancer genetics at high-resolution. Nucleic Acids Res. 45(D1), D777-D783.|


In [1]:
import pandas as pd
import networkx as nx
from pyNBS import gene_conversion_tools as gct

## Load Network

In [2]:
network_file = './CancerSubnetwork_Data/PCNet.txt'
network = nx.read_edgelist(network_file, delimiter='\t', data=True)

## Get all cancer-related genes


#### Get genes from all cancer hallmark pathways and convert them from Entrez to HUGO Symbols (Hanahan, Weinberg 2011)

In [3]:
# Load pathway gene sets. Each tab-delimited line holds a '|'-delimited set label,
# a second field that is skipped, and then the member genes.
with open('./Supplementary_Notebook_Data/CancerSubnetwork_Data/hallmarks.txt') as f:
    lines = f.read().splitlines()
hallmark_genesets = {}
for line in lines:
    if '\t' in line:
        fields = line.split('\t')
        hallmark_genesets[fields[0].split('|')[1]] = fields[2:]
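
The parsing logic in this cell can be checked on a single line. The sketch below assumes the layout implied by the code (a `prefix|name` label, one skipped field, then gene identifiers); the example line and the helper name `parse_geneset_line` are made up for illustration and are not taken from `hallmarks.txt`.

```python
def parse_geneset_line(line):
    """Split one tab-delimited gene set line into (set name, gene list)."""
    fields = line.split('\t')
    # The set name is the part after the '|' in the first field
    name = fields[0].split('|')[1]
    # The second field is skipped; genes start at the third field
    genes = fields[2:]
    return name, genes


# Hypothetical line in the assumed layout
example_line = "hallmark|EXAMPLE_PATHWAY\tdescription\t7157\t581\t596"
name, genes = parse_geneset_line(example_line)
```

A line whose first field has no `|` would raise an `IndexError` here, just as it would in the cell above.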

In [4]:
# Collapse all cancer-hallmark gene sets into one de-duplicated list of Entrez IDs
# (these are converted to HUGO symbols with MyGene.info in the following cells)
all_hallmark_genes_entrez = []
for hallmark in hallmark_genesets:
    all_hallmark_genes_entrez = all_hallmark_genes_entrez + hallmark_genesets[hallmark]
all_hallmark_genes_entrez = list(set(all_hallmark_genes_entrez))

In [5]:
# Get gene conversion query string
query_string, valid_genes, invalid_genes = gct.query_constructor(all_hallmark_genes_entrez)
# Set scopes (gene naming systems to search)
scopes = "entrezgene, retired"
# Set fields (gene naming systems from which to return gene names)
fields = "symbol, entrezgene"
# Query MyGene.Info
match_list = gct.query_batch(query_string, scopes=scopes, fields=fields)

1711 Valid Query Genes
0 Invalid Query Genes
1711 Matched query results
Batch query complete: 3.66 seconds


In [6]:
# Get gene conversion maps
match_table_trim, query_to_symbol, query_to_entrez = gct.construct_query_map_table(match_list, valid_genes, display_unmatched_queries=True)
# Collapse cancer-hallmark gene set genes as HUGO Symbols only
all_hallmark_genes_symbol = [str(query_to_symbol[gene]) for gene in all_hallmark_genes_entrez]

Queries with partial matching results found: 7
{u'query': u'731751', u'notfound': True}
{u'query': u'652671', u'notfound': True}
{u'query': u'651610', u'notfound': True}
{u'query': u'652799', u'notfound': True}
{u'query': u'646821', u'notfound': True}
{u'query': u'652346', u'notfound': True}
{u'query': u'650621', u'notfound': True}

0 Queries with mutliple matches found

Query mapping table/dictionary construction complete: 0.89 seconds
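
The output above reports seven Entrez IDs that MyGene.info could not resolve. How `query_to_symbol` represents such IDs is not shown in this notebook, so the sketch below assumes, hypothetically, that they map to `None`. Under that assumption, separating matched from unmatched IDs before building the symbol list might look like this; the dictionary and ID list are toy stand-ins.

```python
# Hypothetical mapping: two matched IDs and one unmatched ID (assumed to map to None)
toy_query_to_symbol = {"7157": "TP53", "3845": "KRAS", "731751": None}
toy_ids = ["7157", "3845", "731751"]

# Keep symbols only for IDs that resolved
matched_symbols = [
    toy_query_to_symbol[gene]
    for gene in toy_ids
    if toy_query_to_symbol.get(gene) is not None
]

# Track the IDs that did not resolve so they can be reviewed
unmatched_ids = [gene for gene in toy_ids if toy_query_to_symbol.get(gene) is None]

# str(None) is the literal string 'None', so converting without a filter
# would add a placeholder entry to the gene list
unfiltered = [str(toy_query_to_symbol[gene]) for gene in toy_ids]
```

If unmatched IDs are in fact stored as `None`, the unfiltered list contains a `'None'` string that is harmless for the later subgraph step only as long as no network node carries that name.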


#### Load genes classified by Vogelstein et al. as tumor suppressors or oncogenes (Vogelstein et al. 2013)

In [7]:
# Vogelstein cancer genes list (gene symbol is the first tab-delimited column)
with open('./Supplementary_Notebook_Data/CancerSubnetwork_Data/vogelstein.txt') as f:
    lines = f.read().splitlines()
Vogelstein_genes = [line.split('\t')[0] for line in lines]

#### Load genes found to be recurrently mutated across 1,001 cancer cell lines (Iorio et al. 2016)

In [8]:
# Sanger cell line cancer genes list (one gene symbol per line)
with open('./Supplementary_Notebook_Data/CancerSubnetwork_Data/sanger_CL_genes.txt') as f:
    Sanger_genes = f.read().splitlines()

#### Load genes from the Cancer Gene Census in COSMIC v81 (Forbes et al. 2017)

In [9]:
COSMIC_table = pd.read_csv('./Supplementary_Notebook_Data/CancerSubnetwork_Data/cgc_v81.txt')
COSMIC_genes = list(COSMIC_table['Gene Symbol'])

#### Combine all cancer gene lists together

In [17]:
cancer_genes = list(set(all_hallmark_genes_symbol+Vogelstein_genes+Sanger_genes+COSMIC_genes))
print("Number of HUGO Cancer Genes: {}".format(len(cancer_genes)))

Number of HUGO Cancer Genes: 2322
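
Because the combined list is de-duplicated with `set`, genes shared by several sources are counted once. The toy sketch below shows the same union and also tallies which genes appear in only one source; the source names and gene lists are hypothetical and much smaller than the real inputs.

```python
from collections import Counter

# Hypothetical gene lists standing in for the real sources
toy_sources = {
    "hallmarks": ["TP53", "KRAS", "BAX"],
    "vogelstein": ["TP53", "KRAS"],
    "sanger": ["KRAS", "BRAF"],
}

# Union of all sources, with duplicates removed
toy_union = set().union(*toy_sources.values())

# Count in how many sources each gene appears
source_counts = Counter(
    gene for genes in toy_sources.values() for gene in set(genes)
)

# Genes contributed by exactly one source
unique_to_one_source = sorted(
    gene for gene, count in source_counts.items() if count == 1
)
```

A tally like this can help judge how much each gene set contributes beyond the others.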


## Generate Cancer Gene Network
Note: The resulting network may not be **exactly** the same as the Cancer Subnetwork found in `'~/Examples/Example_Data/Network_Files/CancerSubnetwork.txt'`, because the gene name mappings returned by [MyGene.info](http://mygene.info/) may change over time.
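
One way to quantify such a difference is to compare the two edge lists as sets of undirected edges. The sketch below is an illustration under assumptions: the edges are toy data, the helper name `undirected` is hypothetical, and the Jaccard index is just one reasonable similarity measure, not something the notebook itself computes.

```python
def undirected(edges):
    """Represent each edge as a frozenset so that (A, B) equals (B, A)."""
    return set(frozenset(edge) for edge in edges)


# Hypothetical reference and regenerated edge lists
reference = undirected([("TP53", "MDM2"), ("KRAS", "BRAF")])
regenerated = undirected([("MDM2", "TP53"), ("KRAS", "RAF1")])

shared = reference & regenerated            # edges present in both networks
only_reference = reference - regenerated    # edges lost on regeneration
only_regenerated = regenerated - reference  # edges gained on regeneration

# Jaccard index: shared edges divided by all distinct edges
jaccard = len(shared) / float(len(reference | regenerated))
```

Listing `only_reference` and `only_regenerated` shows which genes changed, which can point to specific remapped symbols.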

In [13]:
# Filter PCNet to only contain genes from the combined cancer gene list and the edges between those genes
cancer_subnetwork = network.subgraph(cancer_genes)

In [21]:
# dict() keeps this working whether degree() returns a dict or a degree view
gene_degree = pd.Series(dict(cancer_subnetwork.degree()), name='degree')
n_isolated = len(gene_degree[gene_degree == 0])
print("Number of connected genes in Cancer Subnetwork: {}".format(len(cancer_subnetwork.nodes()) - n_isolated))
print("Number of interactions in Cancer Subnetwork: {}".format(len(cancer_subnetwork.edges())))

Number of connected genes in Cancer Subnetwork: 2290
Number of interactions in Cancer Subnetwork: 204373
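
The "connected genes" count is the number of nodes in the subnetwork minus those with degree zero. The same arithmetic can be sketched with plain Python on toy data; the node and edge names below are hypothetical.

```python
from collections import Counter

# Hypothetical subnetwork: five nodes, one of which has no edges
toy_nodes = ["TP53", "MDM2", "KRAS", "BRAF", "ISOLATED_GENE"]
toy_edges = [("TP53", "MDM2"), ("KRAS", "BRAF"), ("TP53", "KRAS")]

# Start every node at degree 0 so isolated nodes are represented
toy_degree = Counter({node: 0 for node in toy_nodes})
for u, v in toy_edges:
    toy_degree[u] += 1
    toy_degree[v] += 1

n_isolated = sum(1 for node in toy_nodes if toy_degree[node] == 0)
n_connected = len(toy_nodes) - n_isolated
```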


In [22]:
# Write the filtered cancer subnetwork to file
# Note: Genes with no edges connecting them to any other gene will be removed during this step
gct.write_edgelist(cancer_subnetwork.edges(), './Supplementary_Notebook_Results/CancerSubnetwork.txt', binary=True)

Edge list saved: 0.35 seconds
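
Since only edges are passed to the writer, a gene without edges cannot appear in the output file. The sketch below illustrates this with a plain tab-delimited writer and a read-back check. It is not `gct.write_edgelist`: the two-column format is an assumption about what an unweighted edge list looks like, and the file name and data are hypothetical.

```python
import os
import tempfile

# Hypothetical subnetwork: "ISOLATED_GENE" is a node but has no edges
toy_nodes = ["TP53", "MDM2", "KRAS", "BRAF", "ISOLATED_GENE"]
toy_edges = [("TP53", "MDM2"), ("KRAS", "BRAF")]

# Write one tab-delimited edge per line to a temporary file
toy_path = os.path.join(tempfile.mkdtemp(), "toy_edgelist.txt")
with open(toy_path, "w") as out:
    for u, v in toy_edges:
        out.write("{}\t{}\n".format(u, v))

# Read the file back and collect the genes it mentions
with open(toy_path) as handle:
    round_trip = [tuple(line.rstrip("\n").split("\t")) for line in handle]
genes_in_file = set(gene for edge in round_trip for gene in edge)

# Nodes that were in the subnetwork but are absent from the file
dropped = sorted(set(toy_nodes) - genes_in_file)
```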
