# Get the UniProt IDs of the APID dataset

High-quality binary PPI data is needed for this project. The APID human interactome dataset, restricted to interactions proven by at least one binary method (binary interactomes), is therefore going to be used. APID was filtered on the human interactome at level 2, with inter-species interactions filtered out.

- https://academic.oup.com/database/article/doi/10.1093/database/baz005/5304002
- http://cicblade.dep.usal.es:8080/APID/init.action




In [1]:
import pandas as pd

In [5]:
df_apid = pd.read_csv('../Data/APID/HUMAN_INTACT_LVL2_FILTER_INT-SCECIES_04_05_2020.txt', sep='\t')

In [11]:
df_apid.head(3)

| | InteractionID | UniprotID_A | UniprotName_A | GeneName_A | UniprotID_B | UniprotName_B | GeneName_B | ExpEvidences | Methods | Publications | 3DStructures | CurationEvents |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1818 | P54727 | RD23B_HUMAN | RAD23B | P55036 | PSMD4_HUMAN | PSMD4 | 9 | 8 | 7 | 0 | 17 |
| 1 | 1819 | P55036 | PSMD4_HUMAN | PSMD4 | Q9UMX0 | UBQL1_HUMAN | UBQLN1 | 9 | 5 | 6 | 0 | 15 |
| 2 | 1826 | Q9UMX0 | UBQL1_HUMAN | UBQLN1 | Q16186 | ADRM1_HUMAN | ADRM1 | 3 | 4 | 2 | 0 | 8 |


In [9]:
# check for NaN values
# gene names are missing for some proteins; the UniProt IDs are complete
df_apid.isna().sum()

InteractionID       0
UniprotID_A         0
UniprotName_A       0
GeneName_A        406
UniprotID_B         0
UniprotName_B       0
GeneName_B        453
ExpEvidences        0
Methods             0
Publications        0
3DStructures        0
CurationEvents      0
dtype: int64
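The counts above show that only the gene name columns have gaps. A minimal sketch of how the affected rows could be inspected, using a small toy table instead of `df_apid` (the IDs `A0A000` and `B0B000` and the variable names `toy` and `missing` are made-up placeholders, not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Toy stand-in for df_apid; 'A0A000' and 'B0B000' are placeholder IDs.
toy = pd.DataFrame({
    'UniprotID_A': ['P54727', 'A0A000', 'Q9UMX0'],
    'GeneName_A': ['RAD23B', np.nan, 'UBQLN1'],
    'UniprotID_B': ['P55036', 'Q9UMX0', 'B0B000'],
    'GeneName_B': ['PSMD4', 'UBQLN1', np.nan],
})

# Keep rows where at least one of the two gene names is missing.
missing = toy[toy[['GeneName_A', 'GeneName_B']].isna().any(axis=1)]
print(missing[['UniprotID_A', 'GeneName_A', 'UniprotID_B', 'GeneName_B']])
```

Because the steps that follow work only on the `UniprotID_A` and `UniprotID_B` columns, rows with a missing gene name do not need to be dropped.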

In [16]:
# check for unique PPIs: sort each ID pair so that A-B and B-A count as the same interaction
df_apid_ppis = pd.DataFrame(df_apid[['UniprotID_A', 'UniprotID_B']].apply(lambda x: sorted(x), axis=1).to_list())
df_apid_ppis = df_apid_ppis.drop_duplicates()
len(df_apid_ppis)

66206
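The row-wise `apply` with `sorted` treats each interaction as an unordered pair. As a hedged alternative, the same idea can be sketched with `numpy.sort` along the row axis, which avoids calling a Python function per row. The toy table and the names `toy`, `pairs`, and `toy_ppis` are assumptions for illustration only:

```python
import numpy as np
import pandas as pd

# Toy stand-in for df_apid; rows 0 and 1 are the same pair in swapped order.
toy = pd.DataFrame({
    'UniprotID_A': ['P54727', 'P55036', 'Q9UMX0', 'P55036'],
    'UniprotID_B': ['P55036', 'P54727', 'Q16186', 'Q9UMX0'],
})

# Sort the two IDs within each row so (A, B) and (B, A) become identical.
pairs = np.sort(toy[['UniprotID_A', 'UniprotID_B']].to_numpy(), axis=1)

# Drop repeated pairs; columns are named 0 and 1, as in df_apid_ppis.
toy_ppis = pd.DataFrame(pairs).drop_duplicates().reset_index(drop=True)
print(len(toy_ppis))
```

On this toy input the swapped pair collapses into one row, so three unique interactions remain.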

In [20]:
# count the unique proteins across both interaction partners
all_unique_uniprot_ids = pd.Series(pd.concat([df_apid_ppis[0],df_apid_ppis[1]]).unique())
len(all_unique_uniprot_ids)

13346

In [21]:
# export the unique pairs and the unique IDs as headerless CSV files
df_apid_ppis.to_csv('../Data/Interactome/uniprot_ids_unique_combinations_apid.csv',
                    header=False, index=False)
all_unique_uniprot_ids.to_csv('../Data/Interactome/all_unique_uniprot_ids_apid.csv',
                              header=False, index=False)