# Cholangiocarcinoma

## 1. What genetic aberrations are found in cholangiocarcinoma?

According to its KEGG disease entry (H00046, https://www.kegg.jp/entry/H00046), cholangiocarcinoma has several documented genetic aberrations. Below I extract the affected genes from that entry.

In [1]:
import KEGGparser as kp  # customized functions that extract data from KEGG when no default parser is available
import pubmedExtractor as pme  # customized functions that use Biopython to extract PubMed abstracts
import pandas as pd
import networkx as nx
import time

In [2]:
genes = kp.get_disease_genes('H00046')

In [3]:
# documented genetic aberrations related to cholangiocarcinoma
for entry in genes:
    print(entry)

['K-ras', 'mutation', '[HSA:3845]', '[KO:K07827]']
['p53', 'mutation', '[HSA:7157]', '[KO:K04451]']
['c-Met', 'overexpression', '[HSA:4233]', '[KO:K05099]']
['ERBB2', 'overexpression,amplification', '[HSA:2064]', '[KO:K05083]']
['p16/INK4A', 'mutation', '[HSA:1029]', '[KO:K06621]']
['COX2', 'overexpression', '[HSA:5743]', '[KO:K11987]']


In [4]:
def get_gene_network(gene_id):
    """
    Customized parser that extracts network IDs from a KEGG gene entry page.
    Input: a KEGG human gene ID without the 'hsa:' prefix (e.g. '3845')
    Output: list of network IDs of the form 'Nxxxxx', one per signaling network that contains the gene
    """
    network_ids = []
    url = 'https://www.kegg.jp/entry/hsa:' + gene_id
    tables = pd.read_html(url)  # returns a list of all tables on the page
    for table in tables:
        cell = table[0][0]  # first cell of the first column
        # keep cells that look like a network ID: start with 'N' and end with a digit
        if isinstance(cell, str) and cell.startswith('N') and cell[-1].isnumeric():
            network_ids.append(cell)
    return network_ids
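
The filter in `get_gene_network` accepts any first-cell string that starts with 'N' and ends with a digit, so a cell such as 'NCBI-GeneID: 3845' would also pass. A stricter check could look like the sketch below. It assumes that KEGG network IDs are always 'N' followed by exactly five digits, which holds for every ID printed in this notebook but is not verified here; `is_network_id` is a hypothetical helper, not part of `KEGGparser`.

```python
import re

# Assumption: a KEGG network ID is "N" followed by exactly five digits,
# as in the IDs seen in this notebook (N00001, N10024, ...).
NETWORK_ID = re.compile(r"N\d{5}")

def is_network_id(cell):
    """Return True if a table cell looks like a KEGG network ID (hypothetical helper)."""
    return isinstance(cell, str) and NETWORK_ID.fullmatch(cell) is not None

# Mixed cell values: only the well-formed IDs should survive the filter.
cells = ["N00001", "NCBI-GeneID: 3845", "N00002 extra", 3845, "N10024"]
print([c for c in cells if is_network_id(c)])
```

Inside the loop, `is_network_id(cell)` could replace the `startswith`/`isnumeric` test if the five-digit assumption is acceptable.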

In [5]:
%%time
collection = []
for entry in genes:
    ids = entry[2].replace('[','').replace(']','').replace('HSA:','')
    networks = get_gene_network(ids)
    for n in networks:
        #variant = kp.get_network_variant(n)
        collection.append([entry[0], entry[1], n])

CPU times: user 730 ms, sys: 18 ms, total: 748 ms
Wall time: 4.69 s
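
The chained `replace` calls above strip the brackets and the `HSA:` prefix but silently pass through any field that has a different shape. As a possible alternative, the sketch below uses a regular expression that fails loudly on unexpected input. `hsa_id` is a hypothetical helper, and it assumes each field holds exactly one numeric ID, as in the six entries printed earlier.

```python
import re

# Matches fields of the form "[HSA:<digits>]" and captures the digits.
HSA_ID = re.compile(r"\[HSA:(\d+)\]")

def hsa_id(field):
    """Extract the numeric gene ID from a field such as '[HSA:3845]'."""
    match = HSA_ID.fullmatch(field.strip())
    if match is None:
        # Assumption: anything else is unexpected and should not be sent to KEGG.
        raise ValueError(f"unexpected HSA field: {field!r}")
    return match.group(1)

entry = ['K-ras', 'mutation', '[HSA:3845]', '[KO:K07827]']
print(hsa_id(entry[2]))
```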


In [6]:
# the collection of genes and their related networks
collection

[['K-ras', 'mutation', 'N00001'],
 ['K-ras', 'mutation', 'N00002'],
 ['K-ras', 'mutation', 'N00003'],
 ['K-ras', 'mutation', 'N00004'],
 ['K-ras', 'mutation', 'N00005'],
 ['K-ras', 'mutation', 'N00006'],
 ['K-ras', 'mutation', 'N00007'],
 ['K-ras', 'mutation', 'N00008'],
 ['K-ras', 'mutation', 'N00009'],
 ['K-ras', 'mutation', 'N00011'],
 ['K-ras', 'mutation', 'N00012'],
 ['K-ras', 'mutation', 'N00014'],
 ['K-ras', 'mutation', 'N00015'],
 ['K-ras', 'mutation', 'N00016'],
 ['K-ras', 'mutation', 'N00018'],
 ['K-ras', 'mutation', 'N00019'],
 ['K-ras', 'mutation', 'N00020'],
 ['K-ras', 'mutation', 'N00021'],
 ['K-ras', 'mutation', 'N00022'],
 ['K-ras', 'mutation', 'N00030'],
 ['K-ras', 'mutation', 'N00031'],
 ['K-ras', 'mutation', 'N00032'],
 ['K-ras', 'mutation', 'N00041'],
 ['K-ras', 'mutation', 'N00096'],
 ['K-ras', 'mutation', 'N00097'],
 ['K-ras', 'mutation', 'N00103'],
 ['K-ras', 'mutation', 'N00104'],
 ['K-ras', 'mutation', 'N00152'],
 ['K-ras', 'mutation', 'N00157'],
 ['K-ras', 'mu
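
Because `collection` holds one row per gene and network, a grouped view can be easier to scan than the flat list. The sketch below groups network IDs by gene and aberration with a `defaultdict`; the three sample rows are copied from the start of the output above, and the variable name `networks_by_gene` is only illustrative.

```python
from collections import defaultdict

# First rows of `collection`, copied from the output above.
collection = [
    ['K-ras', 'mutation', 'N00001'],
    ['K-ras', 'mutation', 'N00002'],
    ['K-ras', 'mutation', 'N00003'],
]

# Map (gene, aberration) -> list of network IDs, preserving input order.
networks_by_gene = defaultdict(list)
for gene, aberration, network_id in collection:
    networks_by_gene[(gene, aberration)].append(network_id)

for (gene, aberration), ids in networks_by_gene.items():
    print(gene, aberration, len(ids), ids[0], "...", ids[-1])
```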

In [7]:
%%time
interim = []
for entry in collection:
    networkName = kp.get_network_name(entry[2])
    definition = kp.get_network_definition(entry[2])
    variant = kp.get_network_variant(entry[2])
    interim.append([entry[0], entry[1], entry[2], networkName, definition[0], definition[1], variant])

    parsing for now. please report this issue with the KEGG
    identifier (N00002                      Network) into github.com/bioservices. Thanks T.C.

CPU times: user 5.03 s, sys: 1.07 s, total: 6.11 s
Wall time: 1min 42s
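
The cell above makes three lookups per row and took 1 min 42 s of wall time. If several rows refer to the same network ID, caching by ID would avoid repeated requests. The sketch below shows the idea with `functools.lru_cache`. Since the internals of `KEGGparser` are not shown here, `fake_fetch` is a hypothetical stand-in that only counts calls, and the rows use placeholder gene names rather than notebook output.

```python
from functools import lru_cache

def make_cached_lookup(fetch):
    """Wrap a per-network fetch function so each network ID is requested once."""
    @lru_cache(maxsize=None)
    def lookup(network_id):
        return fetch(network_id)
    return lookup

# Hypothetical stand-in for the real KEGG lookups; it records every call.
calls = []
def fake_fetch(network_id):
    calls.append(network_id)
    return (f"name of {network_id}", f"definition of {network_id}")

lookup = make_cached_lookup(fake_fetch)

# Placeholder rows: two genes that share one network ID.
rows = [['geneA', 'mutation', 'N00067'], ['geneB', 'mutation', 'N00067']]
interim = [row + list(lookup(row[2])) for row in rows]
print(len(interim), "rows,", len(calls), "fetch")
```

With the real functions, one cached wrapper per lookup (name, definition, variant) would be needed, or a single fetch that returns all three fields.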


In [8]:
# Reference pathways: networks without a variant annotation (variant field is ['null'])
for entry in interim:
    if entry[6][0]=='null':
        print(entry[0], entry[2], entry[4])

K-ras N00001 EGF -> EGFR -> GRB2 -> SOS -> RAS -> RAF -> MEK -> ERK -> CCND1
K-ras N00015 PDGF -> PDGFR -> GRB2 -> SOS -> RAS -> RAF -> MEK -> ERK
K-ras N00019 FGF -> FGFR -> GRB2 -> SOS -> RAS -> RAF -> MEK -> ERK
K-ras N00021 EGF -> (ERBB2+EGFR) -> GRB2 -> SOS -> RAS -> RAF -> MEK -> ERK
K-ras N00030 EGF -> EGFR -> GRB2 -> SOS -> RAS -> PI3K -> PIP3 -> AKT -| BAD
K-ras N00096 EGF -> EGFR -> GRB2 -> SOS -> RAS -> (RASSF1+RASSF5) -> STK4
K-ras N00103 EGF -> EGFR -> GRB2 -> SOS -> RAS -> RALGDS -> RAL
K-ras N00152 CXCL8 -> CXCR2 -> GNB/G -> RAS -> RAF1 -> MEK -> ERK
K-ras N00157 vGPCR -> GNB/G -> RAS -> RAF1 -> MEK -> ERK -> (HIF1A,FOS,JUN) => (VEGFA,PDGFB,ANGPT2)
K-ras N00160 K1 -> RAS -> RAF -> MEK -> ERK
K-ras N00215 KITLG -> KIT -> GRB2 -> SOS -> RAS -> RAF -> MEK -> ERK
K-ras N00216 HGF -> MET -> GRB2 -> SOS -> RAS -> RAF -> MEK -> ERK
K-ras N00217 FLT3LG -> FLT3 -> GRB2 -> SOS -> RAS -> RAF -> MEK -> ERK
K-ras N00218 FLT3LG -> FLT3 -> GRB2 -> SOS -> RAS -> PI3K -> PIP3 -> AKT -> M
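
Each definition printed above is a chain of nodes joined by operators, so it can be split into edges for further analysis. The sketch below is a minimal parser under stated assumptions: it only knows the four operators that appear in this notebook's output (`->`, `-|`, `=>`, `//`), it treats a parenthesised group such as `(ERBB2+EGFR)` as one node, and it does not handle any other KEGG notation. `parse_definition` is a hypothetical helper.

```python
import re

# Operators seen in the definitions printed in this notebook.
# The capturing group makes re.split keep the operators in the result.
OPERATOR = re.compile(r"\s+(->|-\||=>|//)\s+")

def parse_definition(definition):
    """Split a definition string into (source, operator, target) triples."""
    parts = OPERATOR.split(definition.strip())
    nodes, operators = parts[0::2], parts[1::2]
    return [(nodes[i], op, nodes[i + 1]) for i, op in enumerate(operators)]

edges = parse_definition(
    "EGF -> EGFR -> GRB2 -> SOS -> RAS -> PI3K -> PIP3 -> AKT -| BAD"
)
for source, op, target in edges:
    print(source, op, target)
```

The resulting triples could be loaded into the already imported `networkx`, for example with `DiGraph.add_edge(source, target, kind=op)`, where `kind` is an arbitrary attribute name chosen for this sketch.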

In [10]:
# Variant pathways: networks that carry a variant annotation.
# The aberrant gene is marked with an asterisk. Drug/target is also indicated.
for entry in interim:
    if entry[6][0]!='null':
        print(entry[0], entry[2], entry[4])

K-ras N00002 BCR-ABL -> GRB2 -> SOS -> RAS -> RAF -> MEK -> ERK
K-ras N00003 KIT* -> GRB2 -> SOS -> RAS -> RAF -> MEK -> ERK
K-ras N00004 FLT3* -> GRB2 -> SOS -> RAS -> RAF -> MEK -> ERK
K-ras N00005 MET* -> GRB2 -> SOS -> RAS -> RAF -> MEK -> ERK -> CCND1
K-ras N00006 EGFR* -> GRB2 -> SOS -> RAS -> RAF -> MEK -> ERK
K-ras N00007 EML4-ALK -> RAS -> RAF -> MEK -> ERK -> CCND1
K-ras N00008 RET* -> RAS -> RAF -> MEK -> ERK
K-ras N00009 TRK* -> RAS -> RAF -> MEK -> ERK
K-ras N00011 FGFR3* -> GRB2 -> SOS -> RAS -> RAF -> MEK -> ERK -> MSK1 -> MYC
K-ras N00012 (KRAS*,NRAS*) -> RAF -> MEK -> ERK -> CCND1
K-ras N00014 EGFR* -> GRB2 -> SOS -> RAS -> RAF -> MEK -> ERK -> CCND1
K-ras N00016 PDGF* -> PDGFR -> GRB2 -> SOS -> RAS -> RAF -> MEK -> ERK
K-ras N00018 PDGFR* -> GRB2 -> SOS -> RAS -> RAF -> MEK -> ERK
K-ras N00020 FGFR* -> GRB2 -> SOS -> RAS -> RAF -> MEK -> ERK
K-ras N00022 EGF -> (ERBB2*+EGFR) -> GRB2 -> SOS -> RAS -> RAF -> MEK -> ERK
K-ras N00031 FLT3* -> GRB2 -> SOS -> RAS -> PI3K ->

In the output above, some interactions are not spelled out. For example, 'RB1 // E2F' is not explained in the definition given by KEGG's annotation. Next, I use PubMed abstracts to see whether such interactions can be clarified.
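
As a starting point for the literature lookup, a search term for a pair of nodes can be assembled with PubMed's `[Title/Abstract]` field tag. The sketch below only builds the query string; `pubmed_term` is a hypothetical helper, and the interface of `pubmedExtractor` is not shown in this notebook, so it is not used here.

```python
def pubmed_term(gene_a, gene_b, disease=None):
    """Build a PubMed search term requiring both genes in the title or abstract."""
    clauses = [f'"{gene_a}"[Title/Abstract]', f'"{gene_b}"[Title/Abstract]']
    if disease:
        # Optional third clause that narrows the search to one disease.
        clauses.append(f'"{disease}"[Title/Abstract]')
    return " AND ".join(clauses)

print(pubmed_term("RB1", "E2F", disease="cholangiocarcinoma"))
```

Such a string could then be passed as the `term` argument of Biopython's `Bio.Entrez.esearch(db="pubmed", term=...)`, assuming that is how the abstracts are retrieved.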

In [12]:
p16 = []
for entry in interim:
    if 'p16/INK4A' == entry[0]:
        p16.append([entry[0], entry[2], entry[4], entry[5], entry[6]])

In [13]:
p16

[['p16/INK4A',
  'N00066',
  'CDKN2A -| MDM2 -| TP53 => CDKN1A -| (CCND+CDK4/6) -> RB1 // E2F',
  '1029 -| 4193 -| 7157 => 1026 -| ((595,894,896)+(1019,1021)) -> 5925 // (1869,1870,1871)',
  ['null']],
 ['p16/INK4A',
  'N00067',
  'CDKN2A* // MDM2 -| TP53 => CDKN1A -| (CCND+CDK4/6) -> RB1 // E2F',
  '1029v1 // 4193 -| 7157 => 1026 -| ((595,894,896)+(1019,1021)) -> 5925 // (1869,1870,1871)',
  '1029v1 (CDKN2A*)  CDKN2A deletion'],
 ['p16/INK4A',
  'N00069',
  'CDKN2A -| (CCND+CDK4/6) -> RB1 // E2F',
  '1029 -| ((595,894,896)+(1019,1021)) -> 5925 // (1869,1870,1871)',
  ['null']],
 ['p16/INK4A',
  'N00070',
  'CDKN2A* // (CCND+CDK4/6) -> RB1 // E2F',
  '1029v2 // ((595,894,896)+(1019,1021)) -> 5925 // (1869,1870,1871)',
  '1029v2 (CDKN2A*)  CDKN2A mutation'],
 ['p16/INK4A',
  'N00071',
  'CDKN2A* // (CCND+CDK4/6) -> RB1 // E2F',
  '1029v1 // ((595,894,896)+(1019,1021)) -> 5925 // (1869,1870,1871)',
  '1029v1 (CDKN2A*)  CDKN2A deletion'],
 ['p16/INK4A',
  'N00076',
  'CDKN2A* // MDM2 -| T
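
In the output above, the last field is either the placeholder `['null']` or a string such as `'1029v1 (CDKN2A*)  CDKN2A deletion'`. The sketch below splits that string into its three visible parts. The field names `code`, `label`, and `description` are labels chosen for this sketch, not KEGG terminology, and `parse_variant` is a hypothetical helper that assumes the "code (label) description" layout seen here.

```python
import re

# Layout assumed from the output above: "<code> (<label>)  <description>".
VARIANT = re.compile(r"(?P<code>\S+)\s+\((?P<label>[^)]+)\)\s+(?P<description>.+)")

def parse_variant(variant):
    """Split a variant annotation into code, label, and description.

    Returns None for reference networks, which carry ['null'] instead of a string,
    and for strings that do not follow the assumed layout.
    """
    if not isinstance(variant, str):
        return None
    match = VARIANT.fullmatch(variant.strip())
    return match.groupdict() if match else None

print(parse_variant('1029v1 (CDKN2A*)  CDKN2A deletion'))
print(parse_variant(['null']))
```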

In [14]:
df = pd.DataFrame(p16, columns=['gene','networkId','definition','code','mutation'])  # the first field holds the gene name, not a pathway

In [16]:
df.to_csv('p16.csv', index=False)
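
The last field mixes the list `['null']` with plain strings, so the exported CSV will contain the literal text `['null']` for reference networks. If an empty cell is preferred, the value can be normalised before writing. The sketch below uses the standard `csv` module and an in-memory buffer in place of the file; the two rows are abridged from the `p16` output above (the numeric code column is dropped), and the header names and `normalise_variant` are chosen for this sketch only.

```python
import csv
import io

def normalise_variant(value):
    """Map the ['null'] placeholder to an empty string; keep real annotations."""
    return "" if value == ['null'] else value

# Two rows abridged from the p16 output above (numeric code column omitted).
rows = [
    ['p16/INK4A', 'N00069', 'CDKN2A -| (CCND+CDK4/6) -> RB1 // E2F', ['null']],
    ['p16/INK4A', 'N00070', 'CDKN2A* // (CCND+CDK4/6) -> RB1 // E2F',
     '1029v2 (CDKN2A*)  CDKN2A mutation'],
]

buffer = io.StringIO()  # stands in for open('p16.csv', 'w', newline='')
writer = csv.writer(buffer)
writer.writerow(['gene', 'networkId', 'definition', 'variant'])
for gene, network_id, definition, variant in rows:
    writer.writerow([gene, network_id, definition, normalise_variant(variant)])
print(buffer.getvalue())
```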