<h3><center> Code to access TNFR superfamily proteins from UniProt </center></h3>

#### <font color='brown'> Author: Kalyani Dhusia ; Contributors: Zhaoqian Su, Yinghao Wu </font> <br> <font> Dated: 03.25.2021 </font>

In [1]:
# Install a pip package in the current Jupyter kernel
#import sys
#!{sys.executable} -m pip install bioservices
#!{sys.executable} -m pip install stringdb

In [2]:
# Standard library
import csv
import glob
import io
import math
import re
import time

# Third-party packages
import matplotlib.pyplot as plt
import numpy as np
# Pandas, so we can use dataframes
import pandas as pd
import pytest
import requests
import scipy as sp
# Seaborn for graphics and plotting
import seaborn as sns
from Bio import SeqIO
# bioservices, to run remote UniProt queries
from bioservices import UniProt
from mpl_toolkits.mplot3d import Axes3D

# Show plots as part of the notebook
%matplotlib inline

#### <font color='blue'> Load TNFR superfamily protein data (sequences and other details) from a UniProt tab-delimited export </font >

In [3]:
filepath = "/Users/saheeba/Desktop/TNFR_work/TNFR_uniprot.tab"
TNF = pd.read_csv(filepath, sep='\t')
TNFR = TNF[['Entry','Protein names','Gene names', 'Length', 'Sequence']]
print(TNFR.shape)
TNFR.head()

(29, 5)


,Entry,Protein names,Gene names,Length,Sequence
0,Q9Y6Q6,Tumor necrosis factor receptor superfamily mem...,TNFRSF11A RANK,616,MAPRARRRRPLFALLLLCALLARLQVALQIAPPCTSEKHYEHLGRC...
1,Q9Y5U5,Tumor necrosis factor receptor superfamily mem...,TNFRSF18 AITR GITR UNQ319/PRO364,241,MAQHGAMGAFRALCGLALLCALSLGQRPTGGPGCGPGRLLLGTGTD...
2,Q9UNE0,Tumor necrosis factor receptor superfamily mem...,EDAR DL,448,MAHVGDCTQTPWLPVLVVSLMCSARAEYSNCGENEYYNQTTGLCQE...
3,Q9UBN6,Tumor necrosis factor receptor superfamily mem...,TNFRSF10D DCR2 TRAILR4 TRUNDD UNQ251/PRO288,386,MGLWGQSVPTASSARAGRYPGARTASGTRPWLLDPKILKFVVFIVA...
4,Q9NS68,Tumor necrosis factor receptor superfamily mem...,TNFRSF19 TAJ TROY UNQ1888/PRO4333,423,MALKVLLEQEKTFFTLLVLLGYLSCKVTCESGDCRQQEFRDRSGNC...


In [4]:
# Work on an independent copy so the column assignments below do not trigger SettingWithCopyWarning
TNFR = TNFR.dropna().copy()
# "Gene names" holds space-separated symbols; keep the first two as separate columns
new = TNFR["Gene names"].str.split(" ", n=3, expand=True)
TNFR["gene1"] = new[0]
TNFR["gene2"] = new[1]
TNFR = TNFR.drop(columns=["Gene names"])
print(TNFR.shape)
TNFR.head(2)

(29, 6)


,Entry,Protein names,Length,Sequence,gene1,gene2
0,Q9Y6Q6,Tumor necrosis factor receptor superfamily mem...,616,MAPRARRRRPLFALLLLCALLARLQVALQIAPPCTSEKHYEHLGRC...,TNFRSF11A,RANK
1,Q9Y5U5,Tumor necrosis factor receptor superfamily mem...,241,MAQHGAMGAFRALCGLALLCALSLGQRPTGGPGCGPGRLLLGTGTD...,TNFRSF18,AITR


#### <font color='green'> Retrieve TNF ligand superfamily protein data (sequences and other details) directly from UniProt with Python 3 </font >

In [5]:
# Query: reviewed human TNF family proteins that are not receptors, returned as a tab-separated table
TNFfullURL = ('http://www.uniprot.org/uniprot/?'
              'query=family:"tumor%20necrosis%20factor"%20NOT%20family:receptor'
              '&columns=id,protein names,genes,length,sequence'
              '&format=tab&fil=organism:%22Homo%20sapiens%20(Human)%20[9606]%22%20AND%20reviewed:yes')
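The URL above uses UniProt's legacy query interface. As far as we know, UniProt moved to a new REST API at `rest.uniprot.org` in 2022 and retired that interface, so the URL may no longer return a tab-separated table. Below is a minimal sketch of an equivalent query URL for the new API. It assumes the `uniprotkb/stream` endpoint, the `format=tsv` option, and the return-field names `accession`, `protein_name`, `gene_names`, `length` and `sequence`; please check these against the current UniProt documentation. The helper `build_uniprot_tsv_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumed streaming endpoint of the UniProt REST API (verify against UniProt's docs)
UNIPROT_STREAM = "https://rest.uniprot.org/uniprotkb/stream"


def build_uniprot_tsv_url(query, fields):
    """Hypothetical helper: build a UniProt REST URL that returns a TSV table."""
    params = {
        "query": query,
        "fields": ",".join(fields),  # return columns, comma-separated
        "format": "tsv",
    }
    # urlencode percent-encodes quotes, spaces, colons and commas in the parameters
    return UNIPROT_STREAM + "?" + urlencode(params)


# Reviewed human TNF family members that are not receptors (assumed query syntax)
query = ('(family:"tumor necrosis factor" NOT family:receptor) '
         'AND organism_id:9606 AND reviewed:true')
fields = ["accession", "protein_name", "gene_names", "length", "sequence"]
url = build_uniprot_tsv_url(query, fields)
print(url)
```

With `requests` (already imported in this notebook), the result could then be loaded in the same way as the legacy response, e.g. `pd.read_table(io.StringIO(requests.get(url).text))`.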

In [6]:
result = requests.get(TNFfullURL)

In [7]:
if result.ok:
    print(result.text[:200])
else:
    print('Something went wrong ', result.status_code)

Entry	Protein names	Gene names	Length	Sequence
Q9Y275	Tumor necrosis factor ligand superfamily member 13B (B lymphocyte stimulator) (BLyS) (B-cell-activating factor) (BAFF) (Dendritic cell-derived TNF


In [8]:
#Convert the last search result into a dataframe in Pandas
TNFL = pd.read_table(io.StringIO(result.text))
#View the dataframe
print(TNFL.shape)
TNFL.head()

(18, 5)


,Entry,Protein names,Gene names,Length,Sequence
0,Q9Y275,Tumor necrosis factor ligand superfamily membe...,TNFSF13B BAFF BLYS TALL1 TNFSF20 ZTNF4 UNQ401/...,285,MDDSTEREQSRLTSCLKKREEMKLKECVSILPRKESPSVRSSKDGK...
1,P01374,Lymphotoxin-alpha (LT-alpha) (TNF-beta) (Tumor...,LTA TNFB TNFSF1,205,MTPPERLFLPRVCGTTLHLLLLGLLLVLLPGAQGLPGVGLTPSAAQ...
2,P23510,Tumor necrosis factor ligand superfamily membe...,TNFSF4 TXGP1,183,MERVQPLEENVGNAARPRFERNKLLLVASVIQGLGLLLCFTYICLH...
3,Q06643,Lymphotoxin-beta (LT-beta) (Tumor necrosis fac...,LTB TNFC TNFSF3,244,MGALGLEGRGGRLQGRGSLLLAVAGATSLVTLLLAVPITVLAVLAL...
4,O95150,Tumor necrosis factor ligand superfamily membe...,TNFSF15 TL1 VEGI,251,MAEDLGLSFGETASVEMLPEHGSCRPKARSSSARWALTCCLVLLPF...


In [9]:
TNFL.dropna(inplace = True)
new = TNFL["Gene names"].str.split(" ", n = 3, expand = True)
TNFL["gene1"]= new[0]
TNFL["gene2"]= new[1]
TNFL["gene3"]= new[2]
TNFL["gene4"]= new[3]
TNFL.drop(columns =["Gene names"], inplace = True)
print(TNFL.shape)
TNFL.head(2)

(18, 8)


,Entry,Protein names,Length,Sequence,gene1,gene2,gene3,gene4
0,Q9Y275,Tumor necrosis factor ligand superfamily membe...,285,MDDSTEREQSRLTSCLKKREEMKLKECVSILPRKESPSVRSSKDGK...,TNFSF13B,BAFF,BLYS,TALL1 TNFSF20 ZTNF4 UNQ401/PRO738
1,P01374,Lymphotoxin-alpha (LT-alpha) (TNF-beta) (Tumor...,205,MTPPERLFLPRVCGTTLHLLLLGLLLVLLPGAQGLPGVGLTPSAAQ...,LTA,TNFB,TNFSF1,


<font color="blue"> PrePPI: high-confidence interactions (score > 0.5) were saved and used to extract the network for the extracellular-region protein interactome (https://honiglab.c2b2.columbia.edu/PrePPI/ref/preppi_final600.txt.tar.gz)</font>

<font color="blue"> PrePPI is a database of predicted and experimentally determined protein-protein interactions (PPIs) for yeast and human. Predicted interactions are assigned a likelihood using a Bayesian framework that combines structural, functional, evolutionary and expression information. The database contains ~2 million predictions, including 31,402 for yeast and 317,813 for human that the PrePPI authors consider high confidence.</font>

In [10]:
filepath = "/Users/saheeba/Desktop/STR_string/preppi_final.csv"
preppi = pd.read_csv(filepath)
print(preppi.shape)
preppi.head(3)

(1048575, 12)


,prot1,prot2,str_score,protpep_score,str_max_score,red_score,ort_score,phy_score,coexp_score,go_score,total_score,final_score
0,Q13131,P14625,18.59,6.44772,18.59,4.2492,0.6153,2.416,9.4687,10.8,12008.4,12008.4
1,P06400,Q96N96,1.8315,14.3222,14.3222,4.2492,0.0,2.416,2.1077,10.8,3346.93,3346.93
2,Q7Z6V5,Q8NCE0,4.5712,0.0,4.5712,0.0,0.0,1.5978,9.4687,24.11,1667.4,1667.4
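`preppi` has exactly 1,048,575 rows. Together with the header line that makes 1,048,576, the maximum number of rows in an Excel worksheet, which hints that `preppi_final.csv` may have been truncated if it was ever opened and saved in Excel. A minimal check is sketched below; the helper names are hypothetical, and a positive result only means the file is worth comparing against the original PrePPI download, not that rows are definitely missing.

```python
import csv

EXCEL_MAX_ROWS = 1_048_576  # row limit of one worksheet in Excel 2007 and later


def count_csv_rows(path):
    """Count data records (excluding the header) in a CSV file without loading it."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header line
        return sum(1 for _ in reader)


def may_be_excel_truncated(n_data_rows, has_header=True):
    """True when the table exactly fills one Excel worksheet, a hint of silent truncation."""
    return n_data_rows + (1 if has_header else 0) == EXCEL_MAX_ROWS


print(may_be_excel_truncated(1_048_575))  # the row count printed for preppi
```

In the notebook this would be `may_be_excel_truncated(preppi.shape[0])`, or `count_csv_rows(filepath)` to count the file on disk directly.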


In [11]:
TNFR_list = TNFR.iloc[:,0].tolist()
print(type(TNFR_list))
print(len(TNFR_list))
print(TNFR_list)

TNF_list = TNFL.iloc[:,0].tolist()
print(type(TNF_list))
print(len(TNF_list))
print(TNF_list)

<class 'list'>
29
['Q9Y6Q6', 'Q9Y5U5', 'Q9UNE0', 'Q9UBN6', 'Q9NS68', 'Q9NP84', 'Q9HAV5', 'Q96RJ3', 'Q969Z4', 'Q93038', 'Q92956', 'Q07011', 'Q02223', 'P43489', 'P36941', 'P28908', 'P26842', 'P25942', 'P25445', 'P20333', 'P19438', 'P08138', 'O95407', 'O75509', 'O14836', 'O14798', 'O14763', 'O00300', 'O00220']
<class 'list'>
18
['Q9Y275', 'P01374', 'P23510', 'Q06643', 'O95150', 'Q92838', 'P29965', 'P32970', 'Q9UNG2', 'O14788', 'P48023', 'P01375', 'P41273', 'O75888', 'O43557', 'P32971', 'P50591', 'O43508']


In [12]:
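# Rows whose first protein is a receptor, and within those, rows whose second protein is a ligand
tnfr_colone_in_preppi = preppi.loc[preppi['prot1'].isin(TNFR_list)]
tnfl_coltwo_with_tnfr_colone = tnfr_colone_in_preppi.loc[tnfr_colone_in_preppi['prot2'].isin(TNF_list)]
# The same selection with the roles swapped (ligand first, receptor second)
tnfl_colone_in_preppi = preppi.loc[preppi['prot1'].isin(TNF_list)]
tnfr_coltwo_with_tnfl_colone = tnfl_colone_in_preppi.loc[tnfl_colone_in_preppi['prot2'].isin(TNFR_list)]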
print(tnfr_colone_in_preppi.shape)
print(tnfl_coltwo_with_tnfr_colone.shape)
print(tnfl_colone_in_preppi.shape)
print(tnfr_coltwo_with_tnfl_colone.shape)

(508, 12)
(14, 12)
(602, 12)
(115, 12)
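The two-step filtering above can also be written as a single boolean mask over the PrePPI frame, selecting ligand-receptor pairs in either column order at once. A minimal sketch, assuming a frame with `prot1`/`prot2` accession columns as in PrePPI; the helper name `ligand_receptor_pairs` and the toy rows (illustrative scores) are hypothetical.

```python
import pandas as pd


def ligand_receptor_pairs(ppi, receptors, ligands, a="prot1", b="prot2"):
    """Keep rows where one protein is a receptor and the other a ligand, in either order."""
    receptors, ligands = set(receptors), set(ligands)
    r_then_l = ppi[a].isin(receptors) & ppi[b].isin(ligands)
    l_then_r = ppi[a].isin(ligands) & ppi[b].isin(receptors)
    # Both masks are computed on the same frame, so no index alignment is involved
    return ppi.loc[r_then_l | l_then_r].reset_index(drop=True)


# Toy rows using accessions from TNFR_list/TNF_list; scores are illustrative only
toy = pd.DataFrame({
    "prot1": ["P26842", "O14788", "P26842", "Q13131"],
    "prot2": ["O14788", "Q9UNE0", "P19438", "P14625"],
    "final_score": [12212.2, 883.7, 10.0, 12008.4],
})
pairs = ligand_receptor_pairs(toy, receptors=["P26842", "Q9UNE0", "P19438"], ligands=["O14788"])
print(pairs)
```

Applied to the real data this would be `ligand_receptor_pairs(preppi, TNFR_list, TNF_list)`, which should contain the same rows as `bigdf` in a single frame.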


In [13]:
tnfr_colone_in_preppi.reset_index(drop=True, inplace=True)
tnfl_coltwo_with_tnfr_colone.reset_index(drop=True, inplace=True)
tnfl_colone_in_preppi.reset_index(drop=True, inplace=True)
tnfr_coltwo_with_tnfl_colone.reset_index(drop=True, inplace=True)
bigdf = pd.concat([tnfl_coltwo_with_tnfr_colone,tnfr_coltwo_with_tnfl_colone], axis=0)

print(bigdf.shape)
bigdf.head()

(129, 12)


,prot1,prot2,str_score,protpep_score,str_max_score,red_score,ort_score,phy_score,coexp_score,go_score,total_score,final_score
0,P26842,O14788,468.913,3.86531,468.913,0.5125,0.0,0.0,2.1077,24.11,12212.2,12212.2
1,Q9UNE0,Q92838,22.5605,0.0,22.5605,0.7707,0.6153,0.0,3.9008,10.8,450.711,2084830.0
2,Q9UNE0,P48023,119.916,4.98492,119.916,2.3588,0.0,2.416,2.1077,0.89,1281.92,1281.92
3,Q9UNE0,O14788,119.916,4.98492,119.916,0.9127,0.0,0.0,3.9008,2.07,883.747,883.747
4,P26842,Q06643,66.1952,0.0,66.1952,0.9127,0.0,2.416,3.9008,5.86,3336.59,3336.59


In [14]:
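# Drop exact duplicate rows (a pair listed as A,B and again as B,A is not treated as a duplicate)
unique_bigdf = bigdf.drop_duplicates()
print(unique_bigdf.shape)
# to_csv writes the file and returns None, so there is nothing to assign
unique_bigdf.to_csv(r'/Users/saheeba/Desktop/TNFR_work/only_tnfr_tnfl_preppi.csv', index=False, header=True)
unique_bigdf.head(2)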

(129, 12)


,prot1,prot2,str_score,protpep_score,str_max_score,red_score,ort_score,phy_score,coexp_score,go_score,total_score,final_score
0,P26842,O14788,468.913,3.86531,468.913,0.5125,0.0,0.0,2.1077,24.11,12212.2,12212.2
1,Q9UNE0,Q92838,22.5605,0.0,22.5605,0.7707,0.6153,0.0,3.9008,10.8,450.711,2084830.0


In [15]:
tnfr_preppi_df = preppi.loc[preppi['prot1'].isin(TNFR_list)]
print(tnfr_preppi_df.shape)

(508, 12)


<br>

#### <font color="blue"> Retrieve interacting partners for TNFR from STRING with the stringdb Python package (https://github.com/gpp-rnd/stringdb), which wraps the STRING API </font>

<font color="blue"> STRING has an application programming interface (API) which enables you to get the data without using the graphical user interface of the web page. The API is convenient if you need to programmatically access some information but still do not want to download the entire dataset. There are several scenarios when it is practical to use it. For example, you might need to access some interaction from your own scripts or want to incorporate STRING network in your web page.</font>
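The `stringdb` calls used below build STRING API requests for you. To see what such a request looks like, here is a minimal sketch that assembles the URL and parameters for one method, assuming STRING's documented pattern `https://string-db.org/api/<format>/<method>` with `identifiers` and `species` parameters; the helper name `string_request` is hypothetical and it does not send anything over the network.

```python
from urllib.parse import urlencode

STRING_API = "https://string-db.org/api"  # base URL of the STRING REST API


def string_request(method, identifiers, species=9606, output_format="tsv", **extra):
    """Hypothetical helper: return (url, params) for one STRING API call."""
    url = f"{STRING_API}/{output_format}/{method}"
    params = {
        # Multiple identifiers are separated by a carriage return ("%0d" once URL-encoded)
        "identifiers": "\r".join(identifiers),
        "species": species,  # NCBI taxon ID; 9606 is human
        **extra,             # optional extras such as limit or caller_identity
    }
    return url, params


url, params = string_request("interaction_partners", ["TNFRSF1A", "CD40"], limit=10)
print(url)
print(urlencode(params))
```

The request itself could then be sent with `requests.post(url, data=params)` and the TSV response read with `pd.read_table(io.StringIO(response.text))`. STRING's documentation also asks callers to identify themselves through a `caller_identity` parameter, which can be passed via `**extra`.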

In [16]:
import stringdb
#genes = ['TP53', 'BRCA1', 'FANCD1', 'FANCL']
string_ids = stringdb.get_string_ids(TNFR_list)
enrichment_df = stringdb.get_enrichment(string_ids.queryItem)
partners = stringdb.get_interaction_partners(string_ids.queryItem)
ppi = stringdb.get_ppi_enrichment(string_ids.queryItem)
network = stringdb.get_network(string_ids.queryItem)

In [17]:
print(enrichment_df.shape)
enrichment_df.head()

(280, 10)


,category,term,number_of_genes,number_of_genes_in_background,ncbiTaxonId,inputGenes,preferredNames,p_value,fdr,description
0,COMPARTMENTS,GOCC:0005886,22,3515,9606,"Q02223,P19438,P08138,O00220,P36941,O14836,P289...","TNFRSF17,TNFRSF1A,NGFR,TNFRSF10A,LTBR,TNFRSF13...",1.74e-11,3.58e-08,Plasma membrane
1,COMPARTMENTS,GOCC:0016020,26,5640,9606,"Q02223,P19438,P08138,O00220,P36941,Q9UNE0,O148...","TNFRSF17,TNFRSF1A,NGFR,TNFRSF10A,LTBR,EDAR,TNF...",1.29e-11,3.58e-08,Membrane
2,COMPARTMENTS,GOCC:0071944,22,3658,9606,"Q02223,P19438,P08138,O00220,P36941,O14836,P289...","TNFRSF17,TNFRSF1A,NGFR,TNFRSF10A,LTBR,TNFRSF13...",3.93e-11,3.63e-08,Cell periphery
3,COMPARTMENTS,GOCC:0002947,4,6,9606,"P19438,P20333,P43489,Q07011","TNFRSF1A,TNFRSF1B,TNFRSF4,TNFRSF9",8.07e-10,5.59e-07,Tumor necrosis factor receptor superfamily com...
4,COMPARTMENTS,GOCC:0016021,12,1438,9606,"Q02223,P19438,P08138,Q9UNE0,O14836,P26842,O755...","TNFRSF17,TNFRSF1A,NGFR,EDAR,TNFRSF13B,CD27,TNF...",4.08e-07,0.00023,Integral component of membrane
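`enrichment_df` has 280 rows and likely spans several STRING annotation categories (only COMPARTMENTS appears in the first rows), including weakly enriched terms. A minimal sketch for shortlisting terms by false discovery rate, using the `category`, `term`, `description`, `number_of_genes` and `fdr` columns shown above; the 0.05 cutoff, the helper name `significant_terms` and the third demo row are assumptions, not part of the original analysis.

```python
import pandas as pd


def significant_terms(enrichment, fdr_cutoff=0.05, categories=None):
    """Keep enrichment terms below an FDR cutoff, optionally restricted to some categories."""
    keep = enrichment["fdr"] < fdr_cutoff
    if categories is not None:
        keep &= enrichment["category"].isin(categories)
    cols = ["category", "term", "description", "number_of_genes", "fdr"]
    return enrichment.loc[keep, cols].sort_values("fdr").reset_index(drop=True)


demo = pd.DataFrame({
    "category": ["COMPARTMENTS", "COMPARTMENTS", "Process"],
    "term": ["GOCC:0005886", "GOCC:0016021", "hypothetical_term"],  # third row is made up
    "description": ["Plasma membrane", "Integral component of membrane", "hypothetical"],
    "number_of_genes": [22, 12, 2],
    "fdr": [3.58e-08, 0.00023, 0.2],
})
print(significant_terms(demo))
```

On the real table this would be `significant_terms(enrichment_df)`, or `significant_terms(enrichment_df, categories=["COMPARTMENTS"])` for a single category.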


In [18]:
print(partners.shape)
# Select the columns of interest and rename them; chaining rename() on the selection
# returns a new frame, so pandas does not warn about modifying a copy
partner = partners[['preferredName_A', 'preferredName_B', 'ncbiTaxonId', 'score', 'nscore',
                    'fscore', 'pscore', 'ascore', 'escore', 'dscore', 'tscore']].rename(
    columns={'preferredName_A': 'TNFR_name', 'preferredName_B': 'partner_ID'})
partner.head()

(3618, 13)


,TNFR_name,partner_ID,ncbiTaxonId,score,nscore,fscore,pscore,ascore,escore,dscore,tscore
0,TNFRSF17,TNFSF13B,9606,0.999,0,0.0,0,0.062,0.949,0.9,0.989
1,TNFRSF17,TNFRSF13B,9606,0.989,0,0.0,0,0.14,0.0,0.0,0.987
2,TNFRSF17,TNFRSF13C,9606,0.977,0,0.0,0,0.07,0.0,0.0,0.977
3,TNFRSF17,TNFSF13,9606,0.976,0,0.0,0,0.0,0.213,0.9,0.721
4,TNFRSF17,TRAF3,9606,0.962,0,0.0,0,0.0,0.32,0.9,0.495


In [19]:
# Collapse each receptor's partners into one comma-separated string
partnerdf = partner.groupby('TNFR_name')['partner_ID'].agg(lambda x: ','.join(set(x))).reset_index()
print(partnerdf.shape)
partnerdf.head()

(29, 2)


,TNFR_name,partner_ID
0,CD27,"B2M,FASLG,STAT3,STAT5B,ITGA4,GPR126,GPR29,CCR9..."
1,CD40,"BIRC3,FASLG,GPR29,ENTPD1,CYLD,CCL19,IL7R,CD81,..."
2,EDA2R,"PHLDA3,TNFRSF12A,FAM212B,RIOK3,HEPH,TGM3,ACER2..."
3,EDAR,"FADD,DCHS2,TBX15,TNFRSF10A,NGFR,TNFRSF1A,TRAF1..."
4,FAS,"FADD,BIRC3,FASLG,TP73,TNFRSF10A,STAT3,STAT5B,S..."
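`partnerdf` joins partner names through `set`, whose iteration order is arbitrary between runs, and keeps every partner STRING returns regardless of score. A minimal sketch of an alternative that keeps only interactions at or above a score cutoff and joins names in sorted order; the 0.7 default matches what STRING calls "high confidence", while the helper name and the low-scoring `GENE_X` demo row are hypothetical.

```python
import pandas as pd


def partners_by_receptor(partner, min_score=0.7, key="TNFR_name"):
    """Collapse partners per receptor, keeping only interactions with score >= min_score."""
    strong = partner[partner["score"] >= min_score]
    return (strong.groupby(key)["partner_ID"]
            .agg(n_partners="nunique",
                 partners=lambda s: ",".join(sorted(set(s))))
            .reset_index())


# TNFRSF17 and CD27 rows taken from the outputs above, plus one hypothetical weak row
demo = pd.DataFrame({
    "TNFR_name": ["TNFRSF17"] * 6 + ["CD27"],
    "partner_ID": ["TNFSF13B", "TNFRSF13B", "TNFRSF13C", "TNFSF13", "TRAF3",
                   "GENE_X", "CD70"],
    "score": [0.999, 0.989, 0.977, 0.976, 0.962, 0.300, 0.999],
})
print(partners_by_receptor(demo))
```

The same helper works for the ligand table with `partners_by_receptor(tpartner, key="TNF_ligand")`.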


In [20]:
# Group the partner table by receptor name
g_partner = partner.groupby('TNFR_name')

# Show the first entry of each receptor group (first() already returns a DataFrame)
g_partner.first()

TNFR_name,partner_ID,ncbiTaxonId,score,nscore,fscore,pscore,ascore,escore,dscore,tscore
CD27,CD70,9606,0.999,0,0.0,0,0.0,0.675,0.8,0.989
CD40,CD40LG,9606,0.999,0,0.0,0,0.062,0.932,0.9,0.995
EDA2R,EDA,9606,0.996,0,0.0,0,0.262,0.213,0.8,0.973
EDAR,EDA,9606,0.998,0,0.0,0,0.0,0.486,0.8,0.99
FAS,FADD,9606,0.999,0,0.0,0,0.0,0.983,0.9,0.99
LTBR,TRAF3,9606,0.999,0,0.0,0,0.0,0.882,0.8,0.984
NGFR,MAGED1,9606,0.999,0,0.0,0,0.0,0.689,0.9,0.972
RELT,OXSR1,9606,0.938,0,0.0,0,0.0,0.683,0.0,0.813
TNFRSF10A,TNFSF10,9606,0.999,0,0.0,0,0.0,0.974,0.9,0.994
TNFRSF10B,TNFSF10,9606,0.999,0,0.0,0,0.0,0.982,0.9,0.994


In [21]:
ppi.head()

,number_of_nodes,number_of_edges,average_node_degree,local_clustering_coefficient,expected_number_of_edges,p_value
0,29,154,10.62,0.634,3,0.0


In [22]:
print(network.shape)
network.head()

(308, 13)


,stringId_A,stringId_B,preferredName_A,preferredName_B,ncbiTaxonId,score,nscore,fscore,pscore,ascore,escore,dscore,tscore
0,9606.ENSP00000053243,9606.ENSP00000368538,TNFRSF17,TNFRSF4,9606,0.422,0,0,0,0.0,0.0,0.0,0.422
1,9606.ENSP00000053243,9606.ENSP00000368538,TNFRSF17,TNFRSF4,9606,0.422,0,0,0,0.0,0.0,0.0,0.422
2,9606.ENSP00000053243,9606.ENSP00000478699,TNFRSF17,TNFRSF9,9606,0.459,0,0,0,0.0,0.0,0.0,0.46
3,9606.ENSP00000053243,9606.ENSP00000478699,TNFRSF17,TNFRSF9,9606,0.459,0,0,0,0.0,0.0,0.0,0.46
4,9606.ENSP00000053243,9606.ENSP00000263932,TNFRSF17,TNFRSF8,9606,0.462,0,0,0,0.063,0.0,0.0,0.45
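The first rows of `network` repeat the same TNFRSF17 edges (rows 0/1 and 2/3 are identical). Before counting edges or node degrees, it may help to keep one row per undirected pair. A minimal sketch; the helper `unique_edges` is hypothetical and treats A-B and B-A as the same edge, an assumption that fits STRING's undirected association scores.

```python
import pandas as pd


def unique_edges(network, a="preferredName_A", b="preferredName_B"):
    """Drop repeated edges, treating A-B and B-A as the same undirected edge."""
    # Order-independent key for each edge: the two names sorted into a tuple
    keys = pd.Series([tuple(sorted(pair)) for pair in zip(network[a], network[b])],
                     index=network.index)
    return network.loc[~keys.duplicated()].reset_index(drop=True)


demo = pd.DataFrame({
    "preferredName_A": ["TNFRSF17", "TNFRSF17", "TNFRSF4", "TNFRSF17"],
    "preferredName_B": ["TNFRSF4", "TNFRSF4", "TNFRSF17", "TNFRSF9"],
    "score": [0.422, 0.422, 0.422, 0.459],
})
print(unique_edges(demo))
```

For the real data this would be `unique_edges(network)` (and likewise `unique_edges(networktnf)`).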


<br>

#### <font color="blue"> Retrieve interacting partners for TNF ligands from STRING using the STRING API </font>

In [23]:
import stringdb

string_ids = stringdb.get_string_ids(TNF_list)
enrichmenttnf = stringdb.get_enrichment(string_ids.queryItem)
partnertnf = stringdb.get_interaction_partners(string_ids.queryItem)
ppitnf = stringdb.get_ppi_enrichment(string_ids.queryItem)
networktnf = stringdb.get_network(string_ids.queryItem)

In [24]:
print(partnertnf.shape)
# Select and rename the columns in one chained step to avoid SettingWithCopyWarning
tpartner = partnertnf[['preferredName_A', 'preferredName_B', 'ncbiTaxonId', 'score', 'nscore',
                       'fscore', 'pscore', 'ascore', 'escore', 'dscore', 'tscore']].rename(
    columns={'preferredName_A': 'TNF_ligand', 'preferredName_B': 'partner_ID'})
tpartner.head()

(3375, 13)


,TNF_ligand,partner_ID,ncbiTaxonId,score,nscore,fscore,pscore,ascore,escore,dscore,tscore
0,TNFSF8,TNFRSF8,9606,0.998,0,0.0,0,0.08,0.487,0.8,0.989
1,TNFSF8,CD40,9606,0.779,0,0.0,0,0.074,0.0,0.0,0.771
2,TNFSF8,TNFSF4,9606,0.697,0,0.0,0,0.063,0.0,0.0,0.689
3,TNFSF8,CD70,9606,0.687,0,0.0,0,0.062,0.0,0.0,0.681
4,TNFSF8,TNFSF9,9606,0.681,0,0.0,0,0.062,0.13,0.0,0.641


In [25]:
# Collapse each ligand's partners into one comma-separated string
tpartnerdf = tpartner.groupby('TNF_ligand')['partner_ID'].agg(lambda x: ','.join(set(x))).reset_index()
print(tpartnerdf.shape)
tpartnerdf.head()

(18, 2)


,TNF_ligand,partner_ID
0,CD40LG,"BIRC3,FASLG,GPR29,ENTPD1,GP1BA,CCL19,IL7R,C4BP..."
1,CD70,"FASLG,TNFSF18,TNFRSF1A,CD28,IL7R,BTLA,CD40LG,P..."
2,EDA,"MAP3K7,MAPK9,FN1,LRRC7,ERBB2IP,KIAA1715,FAM46D..."
3,FASLG,"FADD,BIRC3,PACSIN2,STAT3,TNFRSF10A,STAT5B,CCR2..."
4,LTA,"FADD,BIRC3,TNFRSF12A,STAT3,CCL8,IL1R1,ACOT13,C..."


In [26]:
# Save both partner tables (to_csv writes the file and returns None)
partner.to_csv(r'/Users/saheeba/Desktop/TNFR_work/TNFR_partners.csv', index=False, header=True)
tpartner.to_csv(r'/Users/saheeba/Desktop/TNFR_work/TNFligand_partners.csv', index=False, header=True)

<br>

### <font color='green'> PathBank pathway data: add the corresponding pathway information to the table </font>

In [27]:
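# low_memory=False parses the file in one pass so each column gets a single inferred dtype
# (the default chunked parsing can emit a DtypeWarning for columns with mixed types)
pathbank = pd.read_csv('/Users/saheeba/Downloads/pathbank_all_proteins.csv', low_memory=False)
print(pathbank.shape)
pathbank.head()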


(780292, 11)


,PathBank ID,Pathway Name,Pathway Subject,Species,UniProtID,Protein Name,HMDBP ID,DrugBank ID,GenBank ID,Gene Name,Locus
0,SMP0000055,Alanine Metabolism,Metabolic,Homo sapiens,P49588,"Alanine--tRNA ligase, cytoplasmic",HMDBP00625,,AC012184,AARS,16q22
1,SMP0000055,Alanine Metabolism,Metabolic,Homo sapiens,P24298,Alanine aminotransferase 1,HMDBP00850,,U70732,GPT,8q24.3
2,SMP0000055,Alanine Metabolism,Metabolic,Homo sapiens,P11498,"Pyruvate carboxylase, mitochondrial",HMDBP00019,,K02282,PC,11q13.4-q13.5
3,SMP0000055,Alanine Metabolism,Metabolic,Homo sapiens,P21549,Serine--pyruvate aminotransferase,HMDBP00789,,CH471063,AGXT,2q37.3
4,SMP0000055,Alanine Metabolism,Metabolic,Homo sapiens,Q5JTZ9,"Alanine--tRNA ligase, mitochondrial",HMDBP10671,,BC131728,AARS2,6p21.1


In [28]:
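# Combine receptor and ligand accessions into one list.
# Note: this reuses the name TNF, replacing the UniProt DataFrame loaded in In [3].
TNF = list(TNFR_list)
print(TNF)

TNF.extend(TNF_list)
print(TNF)

print(len(TNF_list))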

['Q9Y6Q6', 'Q9Y5U5', 'Q9UNE0', 'Q9UBN6', 'Q9NS68', 'Q9NP84', 'Q9HAV5', 'Q96RJ3', 'Q969Z4', 'Q93038', 'Q92956', 'Q07011', 'Q02223', 'P43489', 'P36941', 'P28908', 'P26842', 'P25942', 'P25445', 'P20333', 'P19438', 'P08138', 'O95407', 'O75509', 'O14836', 'O14798', 'O14763', 'O00300', 'O00220']
['Q9Y6Q6', 'Q9Y5U5', 'Q9UNE0', 'Q9UBN6', 'Q9NS68', 'Q9NP84', 'Q9HAV5', 'Q96RJ3', 'Q969Z4', 'Q93038', 'Q92956', 'Q07011', 'Q02223', 'P43489', 'P36941', 'P28908', 'P26842', 'P25942', 'P25445', 'P20333', 'P19438', 'P08138', 'O95407', 'O75509', 'O14836', 'O14798', 'O14763', 'O00300', 'O00220', 'Q9Y275', 'P01374', 'P23510', 'Q06643', 'O95150', 'Q92838', 'P29965', 'P32970', 'Q9UNG2', 'O14788', 'P48023', 'P01375', 'P41273', 'O75888', 'O43557', 'P32971', 'P50591', 'O43508']
18


In [29]:
tnf_pathway = pathbank.loc[pathbank['UniProtID'].isin(TNF)]
tnf_pathway.reset_index(drop=True, inplace=True)
print(tnf_pathway.shape)
tnf_pathway.head(2)

(11, 11)


,PathBank ID,Pathway Name,Pathway Subject,Species,UniProtID,Protein Name,HMDBP ID,DrugBank ID,GenBank ID,Gene Name,Locus
0,SMP0000358,Fc Epsilon Receptor I Signaling in Mast Cells,Protein,Homo sapiens,P01375,Tumor necrosis factor,HMDBP02070,,X01394,TNF,6p21.3
1,SMP0063792,TNF/Stress Related Signaling,Protein,Homo sapiens,P01375,Tumor necrosis factor,HMDBP02070,,X01394,TNF,6p21.3


In [30]:
tnf_pathway = tnf_pathway[["UniProtID", "Pathway Name"]]
print(tnf_pathway.shape)
tnf_pathway.head()

(11, 2)


,UniProtID,Pathway Name
0,P01375,Fc Epsilon Receptor I Signaling in Mast Cells
1,P01375,TNF/Stress Related Signaling
2,P19438,TNF/Stress Related Signaling
3,P01375,Cadmium Induces DNA Synthesis and Proliferatio...
4,P19438,NF-kB Signaling Pathway


In [31]:
tnf_pathway.to_csv(r'/Users/saheeba/Desktop/TNFR_work/tnf_pathbank.csv', index=False, header=True)
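The section's goal is to add pathway information to the protein table. A minimal sketch of one way to do that, assuming the `UniProtID` and `Pathway Name` columns of `tnf_pathway` and the `Entry` column of the UniProt tables; the helper name `add_pathways` is hypothetical, and pathway names are joined with "; " because some names may contain commas.

```python
import pandas as pd


def add_pathways(proteins, pathways, id_col="Entry"):
    """Attach a '; '-separated list of PathBank pathway names to each protein row."""
    per_protein = (pathways.groupby("UniProtID")["Pathway Name"]
                   .agg(lambda s: "; ".join(sorted(set(s))))
                   .rename("pathways")
                   .reset_index())
    # A left merge keeps every protein; proteins without a pathway get NaN
    merged = proteins.merge(per_protein, how="left", left_on=id_col, right_on="UniProtID")
    return merged.drop(columns="UniProtID")


# Rows taken from the tnf_pathway output above
proteins = pd.DataFrame({"Entry": ["P01375", "P19438", "Q9Y275"]})
pathways = pd.DataFrame({
    "UniProtID": ["P01375", "P01375", "P19438", "P19438"],
    "Pathway Name": ["Fc Epsilon Receptor I Signaling in Mast Cells",
                     "TNF/Stress Related Signaling",
                     "TNF/Stress Related Signaling",
                     "NF-kB Signaling Pathway"],
})
print(add_pathways(proteins, pathways))
```

In the notebook this could be applied as `add_pathways(TNFR, tnf_pathway)` and `add_pathways(TNFL, tnf_pathway)`.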

<br>

### <font color='#800000'> SignaLink pathway data: add the corresponding pathway information to the table </font>

In [32]:
signalink = pd.read_csv('/Users/saheeba/Desktop/TNFR_work/SignaLink_Aug28.csv')
print(signalink.shape)
signalink.head()

(87581, 20)


,source_name,source_uniprotAC,source_speciesID,source_species,source_topology,source_pathways,target_name,target_uniprotAC,target_speciesID,target_species,target_topology,target_pathways,layer,interaction_type,directness,effect,references,source,confidence_score,score_from_the_source
0,NFKB1,P19838,ENSG00000109320,H. sapiens,Transcription factor,,PRKCE,Q02156,ENSG00000171132,H. sapiens,Scaffold,,Transcriptional regulation,Transcriptional directed,indirect,unknown,14681366,"JASPAR(url: http://jaspar.cgb.ki.se/ ,pmid: 14...",GO Semantic Similarity: 0.563594,"Jaspar: 10.1804, Jaspar: 10.1804, Jaspar: 10.3..."
1,NFKB1,P19838,ENSG00000109320,H. sapiens,Transcription factor,,PAG1,Q9NWQ8,ENSG00000076641,H. sapiens,"Co-factor,Scaffold",RTK(non-core),Transcriptional regulation,Transcriptional directed,indirect,unknown,14681366,"JASPAR(url: http://jaspar.cgb.ki.se/ ,pmid: 14...",GO Semantic Similarity: 0.521359,"Jaspar: 10.1804, Jaspar: 10.1804"
2,NFKB1,P19838,ENSG00000109320,H. sapiens,Transcription factor,,MSTP055,Q58WW2,ENSG00000143164,H. sapiens,,,Transcriptional regulation,Transcriptional directed,indirect,unknown,14681366,"JASPAR(url: http://jaspar.cgb.ki.se/ ,pmid: 14...",GO Semantic Similarity: 0.294021,"Jaspar: 10.1804, Jaspar: 10.1804"
3,NFKB1,P19838,ENSG00000109320,H. sapiens,Transcription factor,,PRKCH,P24723,ENSG00000027075,H. sapiens,,,Transcriptional regulation,Transcriptional directed,indirect,unknown,14681366,"JASPAR(url: http://jaspar.cgb.ki.se/ ,pmid: 14...",GO Semantic Similarity: 0.961869,"Jaspar: 10.1412, Jaspar: 10.1412, Jaspar: 11.2..."
4,NFKB1,P19838,ENSG00000109320,H. sapiens,Transcription factor,,MDS033,Q9NZ42,ENSG00000205155,H. sapiens,"Co-factor,Scaffold",Notch(core),Transcriptional regulation,Transcriptional directed,indirect,unknown,14681366,"JASPAR(url: http://jaspar.cgb.ki.se/ ,pmid: 14...",GO Semantic Similarity: 0.732828,"Jaspar: 10.1804, Jaspar: 10.1804"


In [33]:
tnf_signalink = signalink.loc[signalink['source_uniprotAC'].isin(TNF)]
print(tnf_signalink.shape)
tnf_signalink.head()

(0, 20)


,source_name,source_uniprotAC,source_speciesID,source_species,source_topology,source_pathways,target_name,target_uniprotAC,target_speciesID,target_species,target_topology,target_pathways,layer,interaction_type,directness,effect,references,source,confidence_score,score_from_the_source


In [34]:
tnf_signalink2 = signalink.loc[signalink['target_uniprotAC'].isin(TNF)]
tnf_signalink2.reset_index(drop=True, inplace=True)
print(tnf_signalink2.shape)
tnf_signalink2.head()

(140, 20)


,source_name,source_uniprotAC,source_speciesID,source_species,source_topology,source_pathways,target_name,target_uniprotAC,target_speciesID,target_species,target_topology,target_pathways,layer,interaction_type,directness,effect,references,source,confidence_score,score_from_the_source
0,NFKB1,P19838,ENSG00000109320,H. sapiens,Transcription factor,,UNQ160/PRO186,O14763,ENSG00000120889,H. sapiens,Scaffold,,Transcriptional regulation,Transcriptional directed,indirect,unknown,14681366,"JASPAR(url: http://jaspar.cgb.ki.se/ ,pmid: 14...",GO Semantic Similarity: 0.620039,"Jaspar: 10.4906, Jaspar: 10.4906"
1,NFKB1,P19838,ENSG00000109320,H. sapiens,Transcription factor,,TNFRSF10A,O00220,ENSG00000104689,H. sapiens,Scaffold,,Transcriptional regulation,Transcriptional directed,indirect,unknown,14681366,"JASPAR(url: http://jaspar.cgb.ki.se/ ,pmid: 14...",GO Semantic Similarity: 0.620039,"Jaspar: 10.4906, Jaspar: 10.4906"
2,E2F4,Q16254,ENSG00000205250,H. sapiens,Transcription factor,,UNQ329/PRO509,Q92956,ENSG00000157873,H. sapiens,Scaffold,,Transcriptional regulation,Transcriptional directed,indirect,unknown,18971253,"PAZAR(url: http://www.pazar.info/ ,pmid: 18971...",GO Semantic Similarity: 0.241192,
3,E2F4,Q16254,ENSG00000205250,H. sapiens,Transcription factor,,TNFRSF8,P28908,ENSG00000120949,H. sapiens,Scaffold,,Transcriptional regulation,Transcriptional directed,indirect,unknown,18971253,"PAZAR(url: http://www.pazar.info/ ,pmid: 18971...",GO Semantic Similarity: 0.241192,
4,PPARG,P37231,ENSG00000132170,H. sapiens,"Scaffold,Transcription factor",NHR(core),NGFR,P08138,ENSG00000064300,H. sapiens,,,Transcriptional regulation,Transcriptional directed,indirect,unknown,14681366,"JASPAR(url: http://jaspar.cgb.ki.se/ ,pmid: 14...",GO Semantic Similarity: 0.519276,"Jaspar: 11.1587, Jaspar: 11.1587"


In [35]:
tnf_signalink2 = tnf_signalink2[["target_uniprotAC", "source_name" , "source_pathways"]]
print(tnf_signalink2.shape)
tnf_signalink2.to_csv(r'/Users/saheeba/Desktop/TNFR_work/tnf_signalink2.csv', index=False, header=True)
tnf_signalink2.head()

(140, 3)


,target_uniprotAC,source_name,source_pathways
0,O14763,NFKB1,
1,O00220,NFKB1,
2,Q92956,E2F4,
3,P28908,E2F4,
4,P08138,PPARG,NHR(core)


In [36]:
print(tnf_signalink2.drop_duplicates(subset=['target_uniprotAC']))

   target_uniprotAC source_name     source_pathways
0            O14763       NFKB1                 NaN
1            O00220       NFKB1                 NaN
2            Q92956        E2F4                 NaN
3            P28908        E2F4                 NaN
4            P08138       PPARG           NHR(core)
5            O14836       PLAG1                 NaN
10           Q9Y6Q6        RELA                 NaN
12           P36941        RXRA           NHR(core)
14           Q9NS68        TCF3  WNT/Wingless(core)
15           Q9HAV5        TCF3  WNT/Wingless(core)
16           Q9UBN6       HIF1A     Notch(non-core)
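`drop_duplicates(subset=['target_uniprotAC'])` keeps only the first row for each target, so it shows 11 of the 140 rows and hides any other regulator recorded for the same receptor. A minimal sketch that keeps all of them instead, counting and listing the SignaLink sources per target; the helper name and the placeholder rows (`T1`, `TF_A`, ...) are hypothetical, not real SignaLink records.

```python
import pandas as pd


def regulators_per_target(links):
    """List every SignaLink source (regulator) recorded for each target accession."""
    return (links.groupby("target_uniprotAC")
            .agg(n_regulators=("source_name", "nunique"),
                 regulators=("source_name", lambda s: ",".join(sorted(set(s)))))
            .reset_index())


# Placeholder rows with hypothetical names
demo = pd.DataFrame({
    "target_uniprotAC": ["T1", "T1", "T1", "T2"],
    "source_name": ["TF_B", "TF_A", "TF_A", "TF_A"],
})
print(regulators_per_target(demo))
```

On the real table this would be `regulators_per_target(tnf_signalink2)`.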


<br>

### <font color='#800ff'> Human Protein Atlas</font>
The Human Protein Atlas is a Swedish-based program, initiated in 2003, that aims to map all human proteins in cells, tissues and organs by integrating various omics technologies, including antibody-based imaging, mass spectrometry-based proteomics, transcriptomics and systems biology. All data in the resource are open access, so scientists in both academia and industry can freely explore the human proteome.

In [37]:
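import io
import zipfile

import requests

# Download the HPA normal-tissue table (a zipped TSV) and extract it locally
r = requests.get('https://www.proteinatlas.org/download/normal_tissue.tsv.zip')
r.raise_for_status()  # fail early if the download did not succeed
z = zipfile.ZipFile(io.BytesIO(r.content))
z.extractall("/Users/saheeba/Desktop/TNFR_work/")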

In [38]:
HP_atlas = pd.read_csv(z.open('normal_tissue.tsv'), sep = "\t")
print(HP_atlas.shape)
HP_atlas.head()

(1118517, 6)


,Gene,Gene name,Tissue,Cell type,Level,Reliability
0,ENSG00000000003,TSPAN6,adipose tissue,adipocytes,Not detected,Approved
1,ENSG00000000003,TSPAN6,adrenal gland,glandular cells,Not detected,Approved
2,ENSG00000000003,TSPAN6,appendix,glandular cells,Medium,Approved
3,ENSG00000000003,TSPAN6,appendix,lymphoid tissue,Not detected,Approved
4,ENSG00000000003,TSPAN6,bone marrow,hematopoietic cells,Not detected,Approved


In [40]:
tnflgenes = TNFL['gene1'].to_list()
tnfrgenes = TNFR['gene1'].to_list()
genes = tnflgenes + tnfrgenes
print(genes)

['TNFSF13B', 'LTA', 'TNFSF4', 'LTB', 'TNFSF15', 'EDA', 'CD40LG', 'CD70', 'TNFSF18', 'TNFSF11', 'FASLG', 'TNF', 'TNFSF9', 'TNFSF13', 'TNFSF14', 'TNFSF8', 'TNFSF10', 'TNFSF12', 'TNFRSF11A', 'TNFRSF18', 'EDAR', 'TNFRSF10D', 'TNFRSF19', 'TNFRSF12A', 'EDA2R', 'TNFRSF13C', 'RELT', 'TNFRSF25', 'TNFRSF14', 'TNFRSF9', 'TNFRSF17', 'TNFRSF4', 'LTBR', 'TNFRSF8', 'CD27', 'CD40', 'FAS', 'TNFRSF1B', 'TNFRSF1A', 'NGFR', 'TNFRSF6B', 'TNFRSF21', 'TNFRSF13B', 'TNFRSF10C', 'TNFRSF10B', 'TNFRSF11B', 'TNFRSF10A']


In [41]:
protein_atlas = HP_atlas.loc[HP_atlas['Gene name'].isin(genes)]
print(protein_atlas.shape)
protein_atlas.head()

(2507, 6)


,Gene,Gene name,Tissue,Cell type,Level,Reliability
9367,ENSG00000006327,TNFRSF12A,adipose tissue,adipocytes,Not detected,Approved
9368,ENSG00000006327,TNFRSF12A,adrenal gland,glandular cells,Medium,Approved
9369,ENSG00000006327,TNFRSF12A,appendix,glandular cells,Medium,Approved
9370,ENSG00000006327,TNFRSF12A,appendix,lymphoid tissue,Not detected,Approved
9371,ENSG00000006327,TNFRSF12A,bone marrow,hematopoietic cells,Medium,Approved


In [42]:
# Join the cell types reported for each gene at each expression level,
# then pivot so that each level becomes a column
atlas = (protein_atlas
         .groupby(['Gene name', 'Level'], as_index=False)['Cell type']
         .agg(', '.join)
         .set_index(['Gene name', 'Level'])['Cell type']
         .unstack()
         .reset_index())
atlas.to_csv(r'/Users/saheeba/Desktop/TNFR_work/tnf_human_atlas.csv', index=False, header=True)
print(atlas.shape)
atlas.head(10)

(31, 5)


Level,Gene name,High,Low,Medium,Not detected
0,CD27,,"hematopoietic cells, germinal center cells, ge...","lymphoid tissue, non-germinal center cells, ce...","adipocytes, glandular cells, glandular cells, ..."
1,CD40,"germinal center cells, cells in white pulp, ge...","lymphoid tissue, macrophages","Purkinje cells - cytoplasm/membrane, Purkinje ...","adipocytes, glandular cells, hematopoietic cel..."
2,CD40LG,,,"hematopoietic cells, non-germinal center cells...","adipocytes, glandular cells, glandular cells, ..."
3,EDAR,"glandular cells, lymphoid tissue, glandular ce...","hematopoietic cells, neuronal cells, cells in ...","glandular cells, glandular cells, myoepithelia...","adipocytes, adipocytes, glial cells, cells in ..."
4,FASLG,,"glandular cells, lymphoid tissue, neuronal cel...","hematopoietic cells, trophoblastic cells","adipocytes, glandular cells, adipocytes, gland..."
5,LTB,,"lymphoid tissue, hematopoietic cells, macropha...","germinal center cells, non-germinal center cel...","adipocytes, glandular cells, glandular cells, ..."
6,LTBR,,"glandular cells, squamous epithelial cells, gl...","hematopoietic cells, neuronal cells, non-germi...","adipocytes, glandular cells, adipocytes, gland..."
7,NGFR,"peripheral nerve/ganglion, peritubular cells","respiratory epithelial cells, squamous epithel...","glandular cells, glandular cells, glandular ce...","adipocytes, glandular cells, lymphoid tissue, ..."
8,TNF,,"hematopoietic cells, macrophages, non-germinal...","germinal center cells, non-germinal center cells","adipocytes, glandular cells, glandular cells, ..."
9,TNFRSF10A,,"glandular cells, ciliated cells (cell body), n...","glandular cells, squamous epithelial cells, gl...","adipocytes, glandular cells, lymphoid tissue, ..."
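The `atlas` table stores cell types as long joined strings, which are hard to compare across genes. Counting annotations per expression level with `pd.crosstab` gives a compact numeric summary. A minimal sketch; the helper name `level_counts` is hypothetical, and the preferred column order uses the level labels visible in the output above, with any other labels appended rather than dropped.

```python
import pandas as pd


def level_counts(atlas_rows):
    """Count HPA tissue/cell-type annotations per gene at each expression level."""
    table = pd.crosstab(atlas_rows["Gene name"], atlas_rows["Level"])
    # Known levels first, strong to weak; any other labels are kept at the end
    order = [lvl for lvl in ["High", "Medium", "Low", "Not detected"] if lvl in table.columns]
    order += [lvl for lvl in table.columns if lvl not in order]
    return table[order]


# The five TNFRSF12A rows shown in the protein_atlas output above
demo = pd.DataFrame({
    "Gene name": ["TNFRSF12A"] * 5,
    "Level": ["Not detected", "Medium", "Medium", "Not detected", "Medium"],
})
print(level_counts(demo))
```

For the full selection this would be `level_counts(protein_atlas)`, giving one row per gene and one count column per level.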
