# PLAPT affinities for the best predicted druggable proteins

PLAPT: Protein-Ligand Binding Affinity Prediction Using Pretrained Transformers

https://www.biorxiv.org/content/10.1101/2024.02.08.575577v3.full

https://github.com/trrt-good/WELP-PLAPT/tree/main

We predict the binding affinity of a protein - ligand complex from the SMILES string of the ligand and the amino acid sequence of the protein.


For each SMILES - protein sequence pair, the deep learning model PLAPT predicts the negative base-10 logarithm of the binding affinity (`neg_log10_affinity_M`) and the binding affinity itself in µM (`affinity_uM`). The model uses the pre-trained transformers ProtBERT and ChemBERTa to turn the protein sequence and the SMILES string into embeddings, which are the inputs of the affinity predictor.
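The two predicted values appear to be related by a plain unit conversion. The sketch below assumes that `neg_log10_affinity_M` is the negative base-10 logarithm of the affinity in mol/L and that `affinity_uM` is the same affinity in µM; this is consistent with the result rows shown later in this notebook (for example 9.683601 and 0.000207). The helper names are hypothetical and are not part of PLAPT.

```python
import math


def neg_log10_to_uM(neg_log10_affinity_M: float) -> float:
    """Convert -log10(affinity in mol/L) to an affinity in micromolar."""
    affinity_M = 10.0 ** (-neg_log10_affinity_M)  # affinity in mol/L
    return affinity_M * 1e6                       # 1 mol/L = 1e6 uM


def uM_to_neg_log10(affinity_uM: float) -> float:
    """Convert an affinity in micromolar back to -log10(affinity in mol/L)."""
    return -math.log10(affinity_uM * 1e-6)


# A value of 6 on the log scale corresponds to 1 uM.
print(round(neg_log10_to_uM(6.0), 6))   # 1.0
print(round(uM_to_neg_log10(1.0), 6))   # 6.0
```

A larger `neg_log10_affinity_M` therefore means a smaller `affinity_uM`, which is why the two scales are sorted in opposite directions when ranking ligands.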

Each prediction call computes the binding affinities for one protein against multiple ligands, so the work can be split into separate scripts for different target proteins.

The steps are:

- Read the SMILES and other information for the ligands.
- Read the protein sequence from a FASTA file.
- Predict the binding affinities.
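The FASTA-reading step in the list above can be sketched with the standard library only. The `read_fasta` helper is hypothetical (this notebook takes the sequences from the `V3` column of a table instead), and the toy record below is made up for illustration.

```python
import os
import tempfile


def read_fasta(path):
    """Return a list of (header, sequence) tuples from a FASTA file."""
    records, header, chunks = [], None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue                      # skip blank lines
            if line.startswith(">"):
                if header is not None:        # close the previous record
                    records.append((header, "".join(chunks)))
                header, chunks = line[1:], []
            else:
                chunks.append(line)           # sequence may span several lines
    if header is not None:                    # close the last record
        records.append((header, "".join(chunks)))
    return records


# Toy file with one made-up record split over two lines.
with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as fh:
    fh.write(">toy_protein example\nMKLV\nAAGT\n")
    toy_path = fh.name

print(read_fasta(toy_path))   # [('toy_protein example', 'MKLVAAGT')]
os.remove(toy_path)
```

The sequence string of each record is what would be passed to the model, paired with every ligand SMILES.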

Because the ligand descriptions contain commas, we use tab-separated (TSV) files rather than CSV.
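As a small illustration of this choice, the sketch below writes two ligand names taken from the table shown later (one of them contains a comma) to a tab-separated buffer and reads them back. It uses the standard `csv` module rather than pandas so that it runs on its own; the variable names are illustrative.

```python
import csv
import io

rows = [
    {"pref_name": "AMMONIA SOLUTION, STRONG", "canonical_smiles": "N"},
    {"pref_name": "WATER", "canonical_smiles": "O"},
]

# Write the rows with a tab as the field separator.
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer,
    fieldnames=["pref_name", "canonical_smiles"],
    delimiter="\t",
    lineterminator="\n",
)
writer.writeheader()
writer.writerows(rows)
tsv_text = buffer.getvalue()

# The comma inside the name is ordinary text here, so even a naive
# split on tabs gives exactly two fields per line.
naive_fields = [line.split("\t") for line in tsv_text.splitlines()]
print([len(fields) for fields in naive_fields])   # [2, 2, 2]

# Reading back with the same delimiter restores the original values.
read_back = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
print(read_back[0]["pref_name"])                  # AMMONIA SOLUTION, STRONG
```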

## Import the libraries

In [2]:
import time
import pandas as pd
import re

In [None]:
import torch

In [2]:
use_cuda = torch.cuda.is_available()
if use_cuda:
    print("GPU is available")
else:
    print("GPU is not available")

GPU is available


In [3]:
if use_cuda:
    print('__CUDNN VERSION:', torch.backends.cudnn.version())
    print('__Number CUDA Devices:', torch.cuda.device_count())
    print('__CUDA Device Name:', torch.cuda.get_device_name(0))
    print('__CUDA Device Total Memory [GB]:', torch.cuda.get_device_properties(0).total_memory/1e9)

__CUDNN VERSION: 8700
__Number CUDA Devices: 1
__CUDA Device Name: NVIDIA GeForce RTX 3090
__CUDA Device Total Memory [GB]: 25.769279488


## Settings

In [4]:
# info we need for ligands (extracted from multisdf_file)
LigandInfo = './results/approved_drugs.tsv'

# gene, protein and seq
BestPredProts = './results/InfoBestGenes.csv'

## Get protein sequences

In [5]:
df_BestProts = pd.read_csv(BestPredProts)

In [6]:
ProtSeqs = df_BestProts['V3'].tolist()
print(ProtSeqs[0])
print(len(ProtSeqs))

MELWRQCTHWLIQCRVLPPSHRVTWDGAQVCELAQALRDGVLLCQLLNNLLPHAINLREVNLRPQMSQFLCLKNIRTFLSTCCEKFGLKRSELFEAFDLFDVQDFGKVIYTLSALSWTPIAQNRGIMPFPTEEESVGDEDIYSGLSDQIDDTVEEDEDLYDCVENEEAEGDEIYEDLMRSEPVSMPPKMTEYDKRCCCLREIQQTEEKYTDTLGSIQQHFLKPLQRFLKPQDIEIIFINIEDLLRVHTHFLKEMKEALGTPGAANLYQVFIKYKERFLVYGRYCSQVESASKHLDRVAAAREDVQMKLEECSQRANNGRFTLRDLLMVPMQRVLKYHLLLQELVKHTQEAMEKENLRLALDAMRDLAQCVNEVKRDNETLRQITNFQLSIENLDQSLAHYGRPKIDGELKITSVERRSKMDRYAFLLDKALLICKRRGDSYDLKDFVNLHSFQVRDDSSGDRDNKKWSHMFLLIEDQGAQGYELFFKTRELKKKWMEQFEMAISNIYPENATANGHDFQMFSFEETTSCKACQMLLRGTFYQGYRCHRCRASAHKECLGRVPPCGRHGQDFPGTMKKDKLHRRAQDKKRNELGLPKMEVFQEYYGLPPPPGAIGPFLRLNPGDIVELTKAEAEQNWWEGRNTSTNEIGWFPCNRVKPYVHGPPQDLSVHLWYAGPMERAGAESILANRSDGTFLVRQRVKDAAEFAISIKYNVEVKHIKIMTAEGLYRITEKKAFRGLTELVEFYQQNSLKDCFKSLDTTLQFPFKEPEKRTISRPAVGSTKYFGTAKARYDFCARDRSELSLKEGDIIKILNKKGQQGWWRGEIYGRVGWFPANYVEEDYSEYC
23


In [7]:
# the proteins are not ranked; they are shown in list (file) order
df_BestProts

Unnamed: 0,gene,V1,V2,V3
0,VAV1,P15498,VAV_HUMAN Proto-oncogene vav OS=Homo sapiens O...,MELWRQCTHWLIQCRVLPPSHRVTWDGAQVCELAQALRDGVLLCQL...
1,TSC1,Q92574,TSC1_HUMAN Hamartin OS=Homo sapiens OX=9606 GN...,MAQQANVGELLAMLDSPMLGVRDDVTAVFKENLNSDRGPMLVNTLV...
2,TPR,P12270,TPR_HUMAN Nucleoprotein TPR OS=Homo sapiens OX...,MAAVLQQVLERTELNKLPKSVQNKLEKFLADQQSEIDGLKGRHEKF...
3,SMARCA4,P51532,SMCA4_HUMAN Transcription activator BRG1 OS=Ho...,MSTPDPPLGGTPRPGPSPGPGPSPGAMLGPSPGPSPGSAHSMMGPS...
4,SETD2,Q9BYW2,SETD2_HUMAN Histone-lysine N-methyltransferase...,MKQLQPQPPPKMGDFYDPEHPTPEEEENEAKIENVQKTGFIKGPMF...
5,RB1,P06400,RB_HUMAN Retinoblastoma-associated protein OS=...,MPPKTPRKTAATAAAAAAEPPAPPPPPPPEEDPEQDSGPEDLPLVR...
6,PREX2,Q70Z35,PREX2_HUMAN Phosphatidylinositol 3_4_5-trispho...,MSEDSRGDSRAESAKDLEKQLRLRVCVLSELQKTERDYVGTLEFLV...
7,PPP2R1A,P30153,2AAA_HUMAN Serine/threonine-protein phosphatas...,MAAADGDDSLYPIAVLIDELRNEDVQLRLNSIKKLSTIALALGVER...
8,NBN,O60934,NBN_HUMAN Nibrin OS=Homo sapiens OX=9606 GN=NB...,MWKLLPAAGPAGGEPYRLLTGVEYVVGRKNCAILIENDQSISRNHA...
9,MUTYH,Q9UIF7,MUTYH_HUMAN Adenine DNA glycosylase OS=Homo sa...,MTPLVSRLSRLWAIMRKPRAAVGSGHRKQAASQEGRQKHAKNNSQA...


## Get ligand SMILES

In [8]:
# Read the file with ligand SMILES
df_ligands = pd.read_csv(LigandInfo, sep='\t')

In [9]:
df_ligands

Unnamed: 0,pref_name,chembl_id,indication_class,mw_freebase,canonical_smiles
0,HELIUM,CHEMBL1796997,"Gases, Diluent for",4.00,[He]
1,"AMMONIA SOLUTION, STRONG",CHEMBL1160819,Pharmaceutic Aid (solvent and source of ammoni...,17.03,N
2,AMMONIA N 13,CHEMBL1201189,Radioactive Agent; Diagnostic Aid (cardiac ima...,17.03,[13NH3]
3,WATER,CHEMBL1098659,"Diagnostic Aid (radioactive, vascular disorder...",18.02,O
4,NITROGEN,CHEMBL142438,Pharmaceutic Aid (air displacement),28.01,N#N
...,...,...,...,...,...
3586,EXENATIDE,CHEMBL414357,,4186.64,CC[C@H](C)[C@H](NC(=O)[C@H](Cc1ccccc1)NC(=O)[C...
3587,ENFUVIRTIDE,CHEMBL525076,,4491.94,CC[C@H](C)[C@H](NC(=O)[C@H](CC(C)C)NC(=O)[C@H]...
3588,INSULIN DETEMIR,CHEMBL2104391,,5916.93,CCCCCCCCCCCCCC(=O)NCCCC[C@H](NC(=O)[C@@H]1CCCN...
3589,MIPOMERSEN SODIUM,CHEMBL502097,,7177.25,COCCO[C@@H]1[C@H](SP(=O)([O-])OC[C@H]2O[C@@H](...


In [10]:
df_ligands = df_ligands[df_ligands['mw_freebase'] >= 200]
df_ligands

Unnamed: 0,pref_name,chembl_id,indication_class,mw_freebase,canonical_smiles
473,SEVOFLURANE,CHEMBL1200694,Anesthetic (inhalation),200.05,FCOC(C(F)(F)F)C(F)(F)F
474,TEGAFUR,CHEMBL20883,Antineoplastic,200.17,O=c1[nH]c(=O)n(C2CCCO2)cc1F
475,MONOBENZONE,CHEMBL1388,Depigmentor,200.24,Oc1ccc(OCc2ccccc2)cc1
476,DEXMEDETOMIDINE HYDROCHLORIDE,CHEMBL2106195,,200.28,Cc1cccc([C@H](C)c2c[nH]cn2)c1C.Cl
477,DEXMEDETOMIDINE,CHEMBL778,Tranquilizer,200.28,Cc1cccc([C@H](C)c2c[nH]cn2)c1C
...,...,...,...,...,...
3586,EXENATIDE,CHEMBL414357,,4186.64,CC[C@H](C)[C@H](NC(=O)[C@H](Cc1ccccc1)NC(=O)[C...
3587,ENFUVIRTIDE,CHEMBL525076,,4491.94,CC[C@H](C)[C@H](NC(=O)[C@H](CC(C)C)NC(=O)[C@H]...
3588,INSULIN DETEMIR,CHEMBL2104391,,5916.93,CCCCCCCCCCCCCC(=O)NCCCC[C@H](NC(=O)[C@@H]1CCCN...
3589,MIPOMERSEN SODIUM,CHEMBL502097,,7177.25,COCCO[C@@H]1[C@H](SP(=O)([O-])OC[C@H]2O[C@@H](...


In [11]:
df_ligands = df_ligands[df_ligands['mw_freebase'] <= 500]
df_ligands

Unnamed: 0,pref_name,chembl_id,indication_class,mw_freebase,canonical_smiles
473,SEVOFLURANE,CHEMBL1200694,Anesthetic (inhalation),200.05,FCOC(C(F)(F)F)C(F)(F)F
474,TEGAFUR,CHEMBL20883,Antineoplastic,200.17,O=c1[nH]c(=O)n(C2CCCO2)cc1F
475,MONOBENZONE,CHEMBL1388,Depigmentor,200.24,Oc1ccc(OCc2ccccc2)cc1
476,DEXMEDETOMIDINE HYDROCHLORIDE,CHEMBL2106195,,200.28,Cc1cccc([C@H](C)c2c[nH]cn2)c1C.Cl
477,DEXMEDETOMIDINE,CHEMBL778,Tranquilizer,200.28,Cc1cccc([C@H](C)c2c[nH]cn2)c1C
...,...,...,...,...,...
2936,PROMETHAZINE TEOCLATE,CHEMBL3833361,,499.04,CC(CN1c2ccccc2Sc2ccccc21)N(C)C.Cn1c(=O)c2[nH]c...
2937,NIRMATRELVIR,CHEMBL4802135,,499.53,CC(C)(C)[C@H](NC(=O)C(F)(F)F)C(=O)N1C[C@H]2[C@...
2938,OSIMERTINIB MESYLATE,CHEMBL3545063,,499.62,C=CC(=O)Nc1cc(Nc2nccc(-c3cn(C)c4ccccc34)n2)c(O...
2939,OSIMERTINIB,CHEMBL3353410,,499.62,C=CC(=O)Nc1cc(Nc2nccc(-c3cn(C)c4ccccc34)n2)c(O...


In [12]:
# limit  for testing
#df_ligands = df_ligands.head(2)
#df_BestProts = df_BestProts.head(2)

In [13]:
# get only the list of SMILES
smiles = list(df_ligands['canonical_smiles'])

In [14]:
# check the number of proteins and the number of ligands / SMILES
len(df_BestProts), len(smiles)

(23, 2468)

## Calculate ligand - protein affinities

In [19]:
from plapt import Plapt

for index, row in df_BestProts.iterrows():

    xgene  = row['gene']
    xprot  = row['V1']
    xdescr = row['V2']
    xseq   = row['V3']
    print(f"\n-> {index} = Gene:{xgene}, Prot:{xprot}, Info:{xdescr}, Seq:{xseq}")

    # pair the same protein sequence with every ligand
    sequences = [xseq] * len(smiles)

    # create the model on the GPU
    plapt = Plapt(device="cuda")

    # time the prediction
    start_time = time.time()

    # predict the affinities of all protein - ligand pairs from the two parallel lists
    results = plapt.predict_affinity(sequences, smiles)

    execution_time = time.time() - start_time

    print("Execution time:", execution_time, "seconds")
    print("Exec time in hours = ", execution_time/60/60)

    # get the results as dataframe (rows are in the same order as df_ligands)
    data = {
        "smiles": smiles,
        "neg_log10_affinity_M": [d["neg_log10_affinity_M"] for d in results],
        "affinity_uM": [d["affinity_uM"] for d in results],
    }
    df_affinities = pd.DataFrame(data)

    # add ligand info columns to the affinity results
    df_affinities['pref_name']  = list(df_ligands['pref_name'])
    df_affinities['chembl_id']  = list(df_ligands['chembl_id'])
    df_affinities['indication_class'] = list(df_ligands['indication_class'])
    df_affinities['mw_freebase'] = list(df_ligands['mw_freebase'])
    df_affinities['canonical_smiles'] = list(df_ligands['canonical_smiles'])

    # sort the results by affinity (strongest predicted binding first)
    df_affinities = df_affinities.sort_values(by='affinity_uM')

    # add the protein identifiers and sequence to every row
    df_affinities['GeneID'] = xgene
    df_affinities['ProtID'] = xprot
    df_affinities['FastaDescription'] = xdescr
    df_affinities['ProtSequence'] = xseq

    # write to ./results/, where the later cells look for these files
    outFile = './results/affinities_chembl_approved_drugs_'+xgene+'-'+xprot+'.tsv'
    df_affinities.to_csv(outFile, sep='\t', na_rep='N/A', index=False)

    # free the memory before the next protein
    del sequences
    del results
    del data
    del df_affinities
    del plapt
    torch.cuda.empty_cache()


-> 0 = Gene:VAV1, Prot:P15498, Info:VAV_HUMAN Proto-oncogene vav OS=Homo sapiens OX=9606 GN=VAV1 PE=1 SV=4, Seq:MELWRQCTHWLIQCRVLPPSHRVTWDGAQVCELAQALRDGVLLCQLLNNLLPHAINLREVNLRPQMSQFLCLKNIRTFLSTCCEKFGLKRSELFEAFDLFDVQDFGKVIYTLSALSWTPIAQNRGIMPFPTEEESVGDEDIYSGLSDQIDDTVEEDEDLYDCVENEEAEGDEIYEDLMRSEPVSMPPKMTEYDKRCCCLREIQQTEEKYTDTLGSIQQHFLKPLQRFLKPQDIEIIFINIEDLLRVHTHFLKEMKEALGTPGAANLYQVFIKYKERFLVYGRYCSQVESASKHLDRVAAAREDVQMKLEECSQRANNGRFTLRDLLMVPMQRVLKYHLLLQELVKHTQEAMEKENLRLALDAMRDLAQCVNEVKRDNETLRQITNFQLSIENLDQSLAHYGRPKIDGELKITSVERRSKMDRYAFLLDKALLICKRRGDSYDLKDFVNLHSFQVRDDSSGDRDNKKWSHMFLLIEDQGAQGYELFFKTRELKKKWMEQFEMAISNIYPENATANGHDFQMFSFEETTSCKACQMLLRGTFYQGYRCHRCRASAHKECLGRVPPCGRHGQDFPGTMKKDKLHRRAQDKKRNELGLPKMEVFQEYYGLPPPPGAIGPFLRLNPGDIVELTKAEAEQNWWEGRNTSTNEIGWFPCNRVKPYVHGPPQDLSVHLWYAGPMERAGAESILANRSDGTFLVRQRVKDAAEFAISIKYNVEVKHIKIMTAEGLYRITEKKAFRGLTELVEFYQQNSLKDCFKSLDTTLQFPFKEPEKRTISRPAVGSTKYFGTAKARYDFCARDRSELSLKEGDIIKILNKKGQQGWWRGEIYGRVGWFPANYVEEDYSEYC
Execution time: 255.87954568862915 second

Execution time: 2122.4511806964874 seconds
Exec time in hours =  0.589569772415691

-> 5 = Gene:RB1, Prot:P06400, Info:RB_HUMAN Retinoblastoma-associated protein OS=Homo sapiens OX=9606 GN=RB1 PE=1 SV=2, Seq:MPPKTPRKTAATAAAAAAEPPAPPPPPPPEEDPEQDSGPEDLPLVRLEFEETEEPDFTALCQKLKIPDHVRERAWLTWEKVSSVDGVLGGYIQKKKELWGICIFIAAVDLDEMSFTFTELQKNIEISVHKFFNLLKEIDTSTKVDNAMSRLLKKYDVLFALFSKLERTCELIYLTQPSSSISTEINSALVLKVSWITFLLAKGEVLQMEDDLVISFQLMLCVLDYFIKLSPPMLLKEPYKTAVIPINGSPRTPRRGQNRSARIAKQLENDTRIIEVLCKEHECNIDEVKNVYFKNFIPFMNSLGLVTSNGLPEVENLSKRYEEIYLKNKDLDARLFLDHDKTLQTDSIDSFETQRTPRKSNLDEEVNVIPPHTPVRTVMNTIQQLMMILNSASDQPSENLISYFNNCTVNPKESILKRVKDIGYIFKEKFAKAVGQGCVEIGSQRYKLGVRLYYRVMESMLKSEEERLSIQNFSKLLNDNIFHMSLLACALEVVMATYSRSTSQNLDSGTDLSFPWILNVLNLKAFDFYKVIESFIKAEGNLTREMIKHLERCEHRIMESLAWLSDSPLFDLIKQSKDREGPTDHLESACPLNLPLQNNHTAADMYLSPVRSPKKKGSTTRVNSTANAETQATSAFQTQKPLKSTSLSLFYKKVYRLAYLRLNTLCERLLSEHPELEHIIWTLFQHTLQNEYELMRDRHLDQIMMCSMYGICKVKNIDLKFKIIVTAYKDLPHAVQETFKRVLIKEEEYDSIIVFYNSVFMQRLKTNILQYASTRPPTLSPIPHIPRSPYKF

KeyboardInterrupt: 
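Since a run like the one above can take a long time per protein and may be interrupted, one option is to skip the targets that already have a result file when restarting. The sketch below is a hypothetical helper, not part of the notebook; it assumes the output file naming used in this notebook (`affinities_chembl_approved_drugs_<gene>-<protein>.tsv`) and is demonstrated on a temporary directory.

```python
import os
import tempfile


def pending_targets(targets, out_dir, prefix="affinities_chembl_approved_drugs_"):
    """Return the (gene, protein) pairs that have no result file in out_dir yet."""
    todo = []
    for gene, prot in targets:
        out_file = os.path.join(out_dir, f"{prefix}{gene}-{prot}.tsv")
        if not os.path.exists(out_file):
            todo.append((gene, prot))
    return todo


# Demo: pretend that the VAV1 run finished and the RB1 run did not.
with tempfile.TemporaryDirectory() as tmp_dir:
    done_file = os.path.join(tmp_dir, "affinities_chembl_approved_drugs_VAV1-P15498.tsv")
    open(done_file, "w").close()
    remaining = pending_targets([("VAV1", "P15498"), ("RB1", "P06400")], tmp_dir)

print(remaining)   # [('RB1', 'P06400')]
```

In the notebook, the list of targets could be built from the `gene` and `V1` columns of `df_BestProts`, and the prediction loop would then iterate only over the remaining pairs.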

### Check all affinity files:

In [22]:
import os

for index, row in df_BestProts.iterrows():

    xgene  = row['gene']
    xprot  = row['V1']

    outFile = './results/affinities_chembl_approved_drugs_'+xgene+'-'+xprot+'.tsv'

    print(f"\n-> {index} = Gene:{xgene}, Prot:{xprot}")
    # flag the proteins whose affinity file is missing
    if not os.path.exists(outFile):
        print("---> ERROR")


-> 0 = Gene:VAV1, Prot:P15498

-> 1 = Gene:TSC1, Prot:Q92574

-> 2 = Gene:TPR, Prot:P12270

-> 3 = Gene:SMARCA4, Prot:P51532

-> 4 = Gene:SETD2, Prot:Q9BYW2

-> 5 = Gene:RB1, Prot:P06400

-> 6 = Gene:PREX2, Prot:Q70Z35

-> 7 = Gene:PPP2R1A, Prot:P30153

-> 8 = Gene:NBN, Prot:O60934

-> 9 = Gene:MUTYH, Prot:Q9UIF7

-> 10 = Gene:MARK3, Prot:P27448

-> 11 = Gene:JAG1, Prot:P78504

-> 12 = Gene:HRAS, Prot:P01112

-> 13 = Gene:DNM2, Prot:P50570

-> 14 = Gene:CDKN2C, Prot:P42773

-> 15 = Gene:CDKN2A, Prot:P42771

-> 16 = Gene:CCNE1, Prot:P24864

-> 17 = Gene:CASP8, Prot:Q14790

-> 18 = Gene:BUB1B, Prot:O60566

-> 19 = Gene:BCL10, Prot:O95999

-> 20 = Gene:ATG7, Prot:O95352

-> 21 = Gene:ASXL1, Prot:Q8IXJ9

-> 22 = Gene:ACVR1, Prot:Q04771


## Analyse the results

In [15]:
import os

fields = ['neg_log10_affinity_M', 'affinity_uM', 'pref_name', 'chembl_id', 'indication_class',
          'mw_freebase', 'canonical_smiles', 'GeneID', 'ProtID']

# collect the per-protein results, then concatenate them once
res_frames = []

for index, row in df_BestProts.iterrows():

    xgene  = row['gene']
    xprot  = row['V1']

    outFile = './results/affinities_chembl_approved_drugs_'+xgene+'-'+xprot+'.tsv'

    print(f"\n-> {index} = Gene:{xgene}, Prot:{xprot}")
    if os.path.exists(outFile):
        df_res = pd.read_csv(outFile, sep='\t')
        res_frames.append(df_res[fields])

# fall back to an empty DataFrame if no result file was found
all_res_df = pd.concat(res_frames, ignore_index=True) if res_frames else pd.DataFrame(columns=fields)


-> 0 = Gene:VAV1, Prot:P15498

-> 1 = Gene:TSC1, Prot:Q92574

-> 2 = Gene:TPR, Prot:P12270

-> 3 = Gene:SMARCA4, Prot:P51532

-> 4 = Gene:SETD2, Prot:Q9BYW2

-> 5 = Gene:RB1, Prot:P06400

-> 6 = Gene:PREX2, Prot:Q70Z35

-> 7 = Gene:PPP2R1A, Prot:P30153

-> 8 = Gene:NBN, Prot:O60934

-> 9 = Gene:MUTYH, Prot:Q9UIF7

-> 10 = Gene:MARK3, Prot:P27448

-> 11 = Gene:JAG1, Prot:P78504

-> 12 = Gene:HRAS, Prot:P01112

-> 13 = Gene:DNM2, Prot:P50570

-> 14 = Gene:CDKN2C, Prot:P42773

-> 15 = Gene:CDKN2A, Prot:P42771

-> 16 = Gene:CCNE1, Prot:P24864

-> 17 = Gene:CASP8, Prot:Q14790

-> 18 = Gene:BUB1B, Prot:O60566

-> 19 = Gene:BCL10, Prot:O95999

-> 20 = Gene:ATG7, Prot:O95352

-> 21 = Gene:ASXL1, Prot:Q8IXJ9

-> 22 = Gene:ACVR1, Prot:Q04771


In [16]:
all_res_df

Unnamed: 0,neg_log10_affinity_M,affinity_uM,pref_name,chembl_id,indication_class,mw_freebase,canonical_smiles,GeneID,ProtID
0,9.683601,0.000207,PIFLUFOLASTAT,CHEMBL4299851,,442.40,O=C(O)CC[C@H](NC(=O)N[C@@H](CCCCNC(=O)c1ccc(F)...,VAV1,P15498
1,9.660498,0.000219,PYRVINIUM PAMOATE,CHEMBL1908377,Anthelmintic,382.53,Cc1cc(/C=C/c2ccc3cc(N(C)C)ccc3[n+]2C)c(C)n1-c1...,VAV1,P15498
2,9.311357,0.000488,TEPOTINIB HYDROCHLORIDE,CHEMBL4594292,,492.58,CN1CCC(COc2cnc(-c3cccc(Cn4nc(-c5cccc(C#N)c5)cc...,VAV1,P15498
3,9.142309,0.000721,CERIVASTATIN SODIUM,CHEMBL1200563,Inhibitor (HMG-CoA reductase); Antihyperlipidemic,459.56,COCc1c(C(C)C)nc(C(C)C)c(/C=C/[C@@H](O)C[C@@H](...,VAV1,P15498
4,9.138790,0.000726,LORLATINIB,CHEMBL3286830,,406.42,C[C@H]1Oc2cc(cnc2N)-c2c(nn(C)c2C#N)CN(C)C(=O)c...,VAV1,P15498
...,...,...,...,...,...,...,...,...,...
56759,3.320902,477.636773,CETYLPYRIDINIUM,CHEMBL305906,"Anti-Infective, Topical; Pharmaceutic Aid (pre...",304.54,CCCCCCCCCCCCCCCC[n+]1ccccc1,ACVR1,Q04771
56760,3.274507,531.487766,FELBINAC,CHEMBL413965,Anti-Inflammatory,212.25,O=C(O)Cc1ccc(-c2ccccc2)cc1,ACVR1,Q04771
56761,3.074396,842.565181,ETHAMBUTOL HYDROCHLORIDE,CHEMBL3140361,,204.31,CC[C@@H](CO)NCCN[C@@H](CC)CO.Cl.Cl,ACVR1,Q04771
56762,2.618954,2404.615390,STREPTOZOCIN,CHEMBL1651906,Antineoplastic,265.22,CN(N=O)C(=O)N[C@H]1C(O)O[C@H](CO)[C@@H](O)[C@@...,ACVR1,Q04771


In [17]:
all_res_df.to_csv("./results/all_affinities_approved_drugs.tsv", sep='\t', na_rep='N/A', index=False)

In [19]:
print(list(all_res_df.columns))

['neg_log10_affinity_M', 'affinity_uM', 'pref_name', 'chembl_id', 'indication_class', 'mw_freebase', 'canonical_smiles', 'GeneID', 'ProtID']


In [57]:
cross_table = pd.pivot_table(all_res_df, values='affinity_uM', index=['chembl_id', 'pref_name'], columns=['GeneID'], aggfunc='min')
cross_table

chembl_id,pref_name,ACVR1,ASXL1,ATG7,BCL10,BUB1B,CASP8,CCNE1,CDKN2A,CDKN2C,DNM2,...,MUTYH,NBN,PPP2R1A,PREX2,RB1,SETD2,SMARCA4,TPR,TSC1,VAV1
CHEMBL1000,CETIRIZINE,1.064074,2.955100,3.319483,2.436678,5.016496,4.668501,2.744101,2.495055,0.950373,3.930842,...,4.251348,2.723156,3.517704,2.204190,3.224470,3.660538,2.811282,3.369722,3.972241,3.833917
CHEMBL100116,PENTAZOCINE,0.007201,0.017615,0.052611,0.271425,0.690279,1.166418,0.363048,0.283875,0.015221,0.300928,...,0.624485,0.010291,0.096204,0.004610,0.038712,0.143952,0.012451,0.061617,0.335268,0.232472
CHEMBL1002,LEVOSALBUTAMOL,11.941049,5.780977,5.557836,14.368045,8.009263,5.968107,12.174588,13.963600,13.469535,6.291684,...,6.759293,5.843732,5.779416,6.050853,5.473646,5.931525,5.823436,5.615014,6.352189,6.150950
CHEMBL1004,DOXYLAMINE,3.099803,7.116889,7.762661,38.663785,10.232748,92.417531,43.269752,39.411799,9.161086,8.754126,...,9.248264,6.577650,8.094548,5.073556,7.603824,8.324825,6.793725,7.846117,8.820733,8.600901
CHEMBL1005,REMIFENTANIL,0.575562,6.858335,7.707741,1.142410,12.288371,3.658769,1.234062,1.157910,0.267061,9.259530,...,10.002368,6.297096,8.171246,5.239364,7.508571,8.549039,6.507643,7.813586,9.367721,8.993213
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
CHEMBL996,CEFOXITIN,45.177675,15.027988,14.845038,31.247859,11.774041,4.585907,24.241744,29.926650,60.301936,14.483991,...,14.051906,15.152747,14.817006,14.640966,14.911151,14.793484,15.098889,14.840699,14.426282,14.623287
CHEMBL9967,PIRENZEPINE,6.551348,27.119169,28.873875,13.900062,37.777206,17.669170,14.122274,13.932920,10.152876,32.066093,...,33.798687,25.837154,29.809800,22.278888,28.485691,30.598953,26.348667,29.076656,32.287847,31.543402
CHEMBL997,IBANDRONIC ACID,72.043153,78.073113,86.816964,29.533829,127.708566,19.743140,27.531614,29.124038,62.046940,100.787349,...,108.590061,72.359781,91.464960,57.398884,84.633454,94.675427,74.608861,87.998577,101.726383,98.597580
CHEMBL998,LORATADINE,7.137180,56.817023,67.229850,0.821448,100.494816,0.253226,0.661619,0.790437,3.710404,85.622222,...,91.062391,51.247594,73.960268,39.305475,64.294692,79.262805,53.358685,68.799975,86.383635,83.833195


In [58]:
cross_table['MeanByChembl_id'] =  cross_table.mean(axis=1)
cross_table

chembl_id,pref_name,ACVR1,ASXL1,ATG7,BCL10,BUB1B,CASP8,CCNE1,CDKN2A,CDKN2C,DNM2,...,NBN,PPP2R1A,PREX2,RB1,SETD2,SMARCA4,TPR,TSC1,VAV1,MeanByChembl_id
CHEMBL1000,CETIRIZINE,1.064074,2.955100,3.319483,2.436678,5.016496,4.668501,2.744101,2.495055,0.950373,3.930842,...,2.723156,3.517704,2.204190,3.224470,3.660538,2.811282,3.369722,3.972241,3.833917,3.152657
CHEMBL100116,PENTAZOCINE,0.007201,0.017615,0.052611,0.271425,0.690279,1.166418,0.363048,0.283875,0.015221,0.300928,...,0.010291,0.096204,0.004610,0.038712,0.143952,0.012451,0.061617,0.335268,0.232472,0.245720
CHEMBL1002,LEVOSALBUTAMOL,11.941049,5.780977,5.557836,14.368045,8.009263,5.968107,12.174588,13.963600,13.469535,6.291684,...,5.843732,5.779416,6.050853,5.473646,5.931525,5.823436,5.615014,6.352189,6.150950,7.853500
CHEMBL1004,DOXYLAMINE,3.099803,7.116889,7.762661,38.663785,10.232748,92.417531,43.269752,39.411799,9.161086,8.754126,...,6.577650,8.094548,5.073556,7.603824,8.324825,6.793725,7.846117,8.820733,8.600901,15.517783
CHEMBL1005,REMIFENTANIL,0.575562,6.858335,7.707741,1.142410,12.288371,3.658769,1.234062,1.157910,0.267061,9.259530,...,6.297096,8.171246,5.239364,7.508571,8.549039,6.507643,7.813586,9.367721,8.993213,6.225562
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
CHEMBL996,CEFOXITIN,45.177675,15.027988,14.845038,31.247859,11.774041,4.585907,24.241744,29.926650,60.301936,14.483991,...,15.152747,14.817006,14.640966,14.911151,14.793484,15.098889,14.840699,14.426282,14.623287,20.074772
CHEMBL9967,PIRENZEPINE,6.551348,27.119169,28.873875,13.900062,37.777206,17.669170,14.122274,13.932920,10.152876,32.066093,...,25.837154,29.809800,22.278888,28.485691,30.598953,26.348667,29.076656,32.287847,31.543402,24.458266
CHEMBL997,IBANDRONIC ACID,72.043153,78.073113,86.816964,29.533829,127.708566,19.743140,27.531614,29.124038,62.046940,100.787349,...,72.359781,91.464960,57.398884,84.633454,94.675427,74.608861,87.998577,101.726383,98.597580,77.752807
CHEMBL998,LORATADINE,7.137180,56.817023,67.229850,0.821448,100.494816,0.253226,0.661619,0.790437,3.710404,85.622222,...,51.247594,73.960268,39.305475,64.294692,79.262805,53.358685,68.799975,86.383635,83.833195,52.347173


In [59]:
cross_table = cross_table.sort_values(by='MeanByChembl_id', ascending=True)
cross_table

chembl_id,pref_name,ACVR1,ASXL1,ATG7,BCL10,BUB1B,CASP8,CCNE1,CDKN2A,CDKN2C,DNM2,...,NBN,PPP2R1A,PREX2,RB1,SETD2,SMARCA4,TPR,TSC1,VAV1,MeanByChembl_id
CHEMBL1200563,CERIVASTATIN SODIUM,0.001838,0.000748,0.000756,0.001989,0.000674,0.001464,0.001667,0.001923,0.001677,0.000714,...,0.000749,0.000748,0.000774,0.000752,0.000736,0.000748,0.000758,0.000713,0.000721,0.001039
CHEMBL1908377,PYRVINIUM PAMOATE,0.001588,0.000134,0.000156,0.002923,0.000607,0.006253,0.003237,0.002975,0.002884,0.000245,...,0.000125,0.000177,0.000104,0.000147,0.000192,0.000129,0.000161,0.000258,0.000219,0.001061
CHEMBL186,CEFEPIME,0.000063,0.001207,0.001540,0.000626,0.005298,0.004068,0.000781,0.000651,0.000080,0.002292,...,0.001050,0.001757,0.000699,0.001440,0.001921,0.001110,0.001594,0.002357,0.002148,0.001687
CHEMBL260538,ULIPRISTAL ACETATE,0.000203,0.000683,0.001083,0.002115,0.005978,0.004845,0.002617,0.002211,0.000271,0.002244,...,0.000503,0.001364,0.000242,0.000965,0.001603,0.000568,0.001150,0.002357,0.001994,0.001807
CHEMBL2042122,FLUTEMETAMOL F 18,0.000023,0.000749,0.001622,0.000627,0.016622,0.004216,0.000796,0.000653,0.000017,0.004944,...,0.000457,0.002385,0.000161,0.001335,0.003091,0.000553,0.001794,0.005298,0.004193,0.003079
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
CHEMBL1200381,ARGININE HYDROCHLORIDE,156.386714,624.535717,664.050254,8.953040,758.650070,7.472896,8.065141,8.676421,80.241877,713.717373,...,672.871337,689.964080,770.106211,647.846000,698.069474,652.814535,672.587625,715.661823,708.723793,508.251128
CHEMBL3545985,BENZPHETAMINE,13.601062,0.973352,1.074380,6222.340414,1.542881,9310.421611,6554.923402,6272.937561,90.055136,1.262043,...,0.916016,1.137157,0.775241,1.045117,1.180964,0.939101,1.090476,1.273659,1.233527,1238.566817
CHEMBL1213136,PILOCARPINE NITRATE,133.641274,1698.390965,2108.170236,133.267033,2667.542783,192.293053,138.907310,134.230867,135.428024,2434.557690,...,1387.170010,2228.559548,825.538912,2048.441176,2299.725986,1505.957771,2139.337458,2455.132908,2385.821081,1511.381196
CHEMBL152,CIDOFOVIR ANHYDROUS,3026.748964,1850.121162,1764.458548,17291.210096,1416.567432,24278.797298,18048.297166,17414.131471,4672.583750,1621.551238,...,1909.099515,1721.567235,2147.137958,1786.553646,1687.277294,1884.970107,1753.161998,1612.718537,1643.932580,5011.821586


In [60]:
cross_table = cross_table.reindex(['MeanByChembl_id'] + [col for col in cross_table.columns if col != 'MeanByChembl_id'], axis=1)
cross_table

chembl_id,pref_name,MeanByChembl_id,ACVR1,ASXL1,ATG7,BCL10,BUB1B,CASP8,CCNE1,CDKN2A,CDKN2C,...,MUTYH,NBN,PPP2R1A,PREX2,RB1,SETD2,SMARCA4,TPR,TSC1,VAV1
CHEMBL1200563,CERIVASTATIN SODIUM,0.001039,0.001838,0.000748,0.000756,0.001989,0.000674,0.001464,0.001667,0.001923,0.001677,...,0.000701,0.000749,0.000748,0.000774,0.000752,0.000736,0.000748,0.000758,0.000713,0.000721
CHEMBL1908377,PYRVINIUM PAMOATE,0.001061,0.001588,0.000134,0.000156,0.002923,0.000607,0.006253,0.003237,0.002975,0.002884,...,0.000361,0.000125,0.000177,0.000104,0.000147,0.000192,0.000129,0.000161,0.000258,0.000219
CHEMBL186,CEFEPIME,0.001687,0.000063,0.001207,0.001540,0.000626,0.005298,0.004068,0.000781,0.000651,0.000080,...,0.002917,0.001050,0.001757,0.000699,0.001440,0.001921,0.001110,0.001594,0.002357,0.002148
CHEMBL260538,ULIPRISTAL ACETATE,0.001807,0.000203,0.000683,0.001083,0.002115,0.005978,0.004845,0.002617,0.002211,0.000271,...,0.003055,0.000503,0.001364,0.000242,0.000965,0.001603,0.000568,0.001150,0.002357,0.001994
CHEMBL2042122,FLUTEMETAMOL F 18,0.003079,0.000023,0.000749,0.001622,0.000627,0.016622,0.004216,0.000796,0.000653,0.000017,...,0.008324,0.000457,0.002385,0.000161,0.001335,0.003091,0.000553,0.001794,0.005298,0.004193
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
CHEMBL1200381,ARGININE HYDROCHLORIDE,508.251128,156.386714,624.535717,664.050254,8.953040,758.650070,7.472896,8.065141,8.676421,80.241877,...,728.534800,672.871337,689.964080,770.106211,647.846000,698.069474,652.814535,672.587625,715.661823,708.723793
CHEMBL3545985,BENZPHETAMINE,1238.566817,13.601062,0.973352,1.074380,6222.340414,1.542881,9310.421611,6554.923402,6272.937561,90.055136,...,1.351976,0.916016,1.137157,0.775241,1.045117,1.180964,0.939101,1.090476,1.273659,1.233527
CHEMBL1213136,PILOCARPINE NITRATE,1511.381196,133.641274,1698.390965,2108.170236,133.267033,2667.542783,192.293053,138.907310,134.230867,135.428024,...,2542.197179,1387.170010,2228.559548,825.538912,2048.441176,2299.725986,1505.957771,2139.337458,2455.132908,2385.821081
CHEMBL152,CIDOFOVIR ANHYDROUS,5011.821586,3026.748964,1850.121162,1764.458548,17291.210096,1416.567432,24278.797298,18048.297166,17414.131471,4672.583750,...,1558.237200,1909.099515,1721.567235,2147.137958,1786.553646,1687.277294,1884.970107,1753.161998,1612.718537,1643.932580


In [61]:
cross_table.to_csv("./results/all_affinities_approved_drugs_CrossTable.csv")
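The ranking above averages affinities in µM, where a few very weak predictions can dominate the mean: BENZPHETAMINE, for instance, is predicted at about 1 µM for most targets but ends near the bottom because of a handful of values in the thousands. Averaging on the −log10 scale is equivalent to taking a geometric mean of the affinities, which is less sensitive to such outliers. The sketch below illustrates the difference on made-up toy values; the function names are illustrative only.

```python
import math


def arithmetic_mean(values):
    """Plain average, as used for the uM cross table."""
    return sum(values) / len(values)


def geometric_mean(values):
    """Average on the log scale, mapped back to the original units."""
    return math.exp(arithmetic_mean([math.log(v) for v in values]))


def mean_neg_log10(affinities_uM):
    """Mean of -log10(affinity in mol/L) over a list of affinities in uM."""
    return arithmetic_mean([-math.log10(a * 1e-6) for a in affinities_uM])


# Toy affinities in uM: three targets at 1 uM and one very weak prediction.
toy = [1.0, 1.0, 1.0, 10000.0]

print(arithmetic_mean(toy))            # 2500.75
print(round(geometric_mean(toy), 6))   # 10.0
print(round(mean_neg_log10(toy), 6))   # 5.0, i.e. a geometric mean of 10 uM
```

This is why a ligand can rank differently depending on whether the mean is taken over `affinity_uM` or over `neg_log10_affinity_M`.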

In [66]:
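# on the -log10 scale the strongest binding is the largest value, so aggregate with 'max'
# (equivalent to 'min' on affinity_uM; it makes no difference when each ligand - gene pair occurs once)
cross_table2 = pd.pivot_table(all_res_df, values='neg_log10_affinity_M', index=['chembl_id', 'pref_name'], columns=['GeneID'], aggfunc='max')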
cross_table2

chembl_id,pref_name,ACVR1,ASXL1,ATG7,BCL10,BUB1B,CASP8,CCNE1,CDKN2A,CDKN2C,DNM2,...,MUTYH,NBN,PPP2R1A,PREX2,RB1,SETD2,SMARCA4,TPR,TSC1,VAV1
CHEMBL1000,CETIRIZINE,5.973028,5.529428,5.478930,5.613202,5.299599,5.330823,5.561600,5.602920,6.022106,5.405514,...,5.371473,5.564928,5.453741,5.656751,5.491542,5.436455,5.551096,5.472406,5.400964,5.416357
CHEMBL100116,PENTAZOCINE,8.142608,7.754127,7.278925,6.566350,6.160976,5.933146,6.440036,6.546872,7.817569,6.521538,...,6.204478,7.987525,7.016807,8.336255,7.412153,6.841781,7.904809,7.210301,6.474608,6.633629
CHEMBL1002,LEVOSALBUTAMOL,4.922958,5.237999,5.255094,4.842602,5.096407,5.224163,4.914546,4.855003,4.870647,5.201233,...,5.170099,5.233310,5.238116,5.218183,5.261723,5.226834,5.234821,5.250649,5.197077,5.211058
CHEMBL1004,DOXYLAMINE,5.508666,5.147710,5.109989,4.412696,4.990008,4.034246,4.363816,4.404374,5.038053,5.057787,...,5.033940,5.181929,5.091807,5.294688,5.118968,5.079625,5.167892,5.105345,5.054495,5.065456
CHEMBL1005,REMIFENTANIL,6.239908,5.163781,5.113073,5.942178,4.910506,5.436665,5.908663,5.936325,6.573389,5.033411,...,4.999897,5.200860,5.087712,5.280721,5.124443,5.068083,5.186576,5.107150,5.028366,5.046085
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
CHEMBL996,CEFOXITIN,4.345076,4.823099,4.828419,4.505180,4.929074,5.338575,4.615436,4.523942,4.219669,4.839112,...,4.852265,4.819509,4.829240,4.834430,4.826489,4.829930,4.821055,4.828546,4.840846,4.834955
CHEMBL9967,PIRENZEPINE,5.183669,4.566724,4.539495,4.856983,4.422770,4.752784,4.850095,4.855958,4.993411,4.493954,...,4.471100,4.587755,4.525641,4.652106,4.545373,4.514293,4.579241,4.536456,4.490961,4.501091
CHEMBL997,IBANDRONIC ACID,4.142407,4.107499,4.061395,4.529680,3.893780,4.704584,4.560168,4.535748,4.207280,3.996594,...,3.964210,4.140503,4.038745,4.241097,4.072458,4.023763,4.127210,4.055524,3.992566,4.006134
CHEMBL998,LORATADINE,5.146473,4.245522,4.172438,6.085420,3.997856,6.596491,6.179392,6.102133,5.430579,4.067414,...,4.040661,4.290327,4.131002,4.405547,4.191825,4.100931,4.272795,4.162412,4.063569,4.076584


In [67]:
cross_table2['MeanByChembl_id'] =  cross_table2.mean(axis=1)
cross_table2

chembl_id,pref_name,ACVR1,ASXL1,ATG7,BCL10,BUB1B,CASP8,CCNE1,CDKN2A,CDKN2C,DNM2,...,NBN,PPP2R1A,PREX2,RB1,SETD2,SMARCA4,TPR,TSC1,VAV1,MeanByChembl_id
CHEMBL1000,CETIRIZINE,5.973028,5.529428,5.478930,5.613202,5.299599,5.330823,5.561600,5.602920,6.022106,5.405514,...,5.564928,5.453741,5.656751,5.491542,5.436455,5.551096,5.472406,5.400964,5.416357,5.536360
CHEMBL100116,PENTAZOCINE,8.142608,7.754127,7.278925,6.566350,6.160976,5.933146,6.440036,6.546872,7.817569,6.521538,...,7.987525,7.016807,8.336255,7.412153,6.841781,7.904809,7.210301,6.474608,6.633629,7.058007
CHEMBL1002,LEVOSALBUTAMOL,4.922958,5.237999,5.255094,4.842602,5.096407,5.224163,4.914546,4.855003,4.870647,5.201233,...,5.233310,5.238116,5.218183,5.261723,5.226834,5.234821,5.250649,5.197077,5.211058,5.131381
CHEMBL1004,DOXYLAMINE,5.508666,5.147710,5.109989,4.412696,4.990008,4.034246,4.363816,4.404374,5.038053,5.057787,...,5.181929,5.091807,5.294688,5.118968,5.079625,5.167892,5.105345,5.054495,5.065456,5.001926
CHEMBL1005,REMIFENTANIL,6.239908,5.163781,5.113073,5.942178,4.910506,5.436665,5.908663,5.936325,6.573389,5.033411,...,5.200860,5.087712,5.280721,5.124443,5.068083,5.186576,5.107150,5.028366,5.046085,5.360826
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
CHEMBL996,CEFOXITIN,4.345076,4.823099,4.828419,4.505180,4.929074,5.338575,4.615436,4.523942,4.219669,4.839112,...,4.819509,4.829240,4.834430,4.826489,4.829930,4.821055,4.828546,4.840846,4.834955,4.758344
CHEMBL9967,PIRENZEPINE,5.183669,4.566724,4.539495,4.856983,4.422770,4.752784,4.850095,4.855958,4.993411,4.493954,...,4.587755,4.525641,4.652106,4.545373,4.514293,4.579241,4.536456,4.490961,4.501091,4.662329
CHEMBL997,IBANDRONIC ACID,4.142407,4.107499,4.061395,4.529680,3.893780,4.704584,4.560168,4.535748,4.207280,3.996594,...,4.140503,4.038745,4.241097,4.072458,4.023763,4.127210,4.055524,3.992566,4.006134,4.151954
CHEMBL998,LORATADINE,5.146473,4.245522,4.172438,6.085420,3.997856,6.596491,6.179392,6.102133,5.430579,4.067414,...,4.290327,4.131002,4.405547,4.191825,4.100931,4.272795,4.162412,4.063569,4.076584,4.642713


In [68]:
# Rank the ligands by their mean value, highest first
cross_table2 = cross_table2.sort_values(by='MeanByChembl_id', ascending=False)
cross_table2

chembl_id,pref_name,ACVR1,ASXL1,ATG7,BCL10,BUB1B,CASP8,CCNE1,CDKN2A,CDKN2C,DNM2,...,NBN,PPP2R1A,PREX2,RB1,SETD2,SMARCA4,TPR,TSC1,VAV1,MeanByChembl_id
CHEMBL1908377,PYRVINIUM PAMOATE,8.799242,9.871725,9.806607,8.534135,9.216911,8.203914,8.489807,8.526531,8.540057,9.610170,...,9.903667,9.752005,9.981335,9.833628,9.715685,9.890707,9.792311,9.587854,9.660498,9.358445
CHEMBL4299851,PIFLUFOLASTAT,8.802705,9.528011,9.595888,7.401912,9.843115,7.059892,7.342403,7.392747,8.681403,9.698994,...,9.482640,9.631225,9.369411,9.578927,9.655350,9.501024,9.604997,9.705453,9.683601,9.116042
CHEMBL1200563,CERIVASTATIN SODIUM,8.735739,9.126245,9.121669,8.701300,9.171481,8.834511,8.778039,8.716126,8.775395,9.146010,...,9.125573,9.125975,9.110995,9.123825,9.133326,9.126316,9.120572,9.147097,9.142309,9.021992
CHEMBL186,CEFEPIME,10.200719,8.918247,8.812552,9.203223,8.275885,8.390632,9.107193,9.186442,10.096095,8.639808,...,8.978684,8.755330,9.155558,8.841523,8.716473,8.954508,8.797568,8.627602,8.667965,8.969601
CHEMBL2042122,FLUTEMETAMOL F 18,10.632874,9.125755,8.789828,9.202556,7.779329,8.375084,9.098883,9.185359,10.779349,8.305878,...,9.339669,8.622540,9.794041,8.874453,8.509956,9.257150,8.746107,8.275862,8.377485,8.954411
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
CHEMBL3140361,ETHAMBUTOL HYDROCHLORIDE,3.074396,3.317889,3.348215,3.739979,3.469917,4.381316,3.760133,3.743645,3.156285,3.387214,...,3.292094,3.361991,3.271210,3.340726,3.371006,3.303289,3.351792,3.389613,3.381496,3.422050
CHEMBL374731,TELBIVUDINE,3.565630,3.299980,3.272919,3.841673,3.197392,3.765118,3.828990,3.839518,3.597909,3.239838,...,3.319502,3.261121,3.377930,3.278894,3.252799,3.311577,3.269830,3.238125,3.243926,3.401430
CHEMBL1213136,PILOCARPINE NITRATE,3.874059,2.769962,2.676094,3.875277,2.573889,3.716036,3.857275,3.872148,3.868291,2.613580,...,2.857870,2.651976,3.083262,2.688577,2.638324,2.822187,2.669721,2.609925,2.622362,3.039597
CHEMBL152,CIDOFOVIR ANHYDROUS,2.519024,2.732800,2.753389,1.762175,2.848763,1.614773,1.743564,1.759098,2.330443,2.790069,...,2.719171,2.764076,2.668140,2.747984,2.772814,2.724696,2.756178,2.792441,2.784116,2.543950


In [69]:
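# Move the mean column to the front; take the column list from cross_table2 itself
# so that the reordering does not depend on another DataFrame
cross_table2 = cross_table2.reindex(['MeanByChembl_id'] + [col for col in cross_table2.columns if col != 'MeanByChembl_id'], axis=1)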
cross_table2

chembl_id,pref_name,MeanByChembl_id,ACVR1,ASXL1,ATG7,BCL10,BUB1B,CASP8,CCNE1,CDKN2A,CDKN2C,...,MUTYH,NBN,PPP2R1A,PREX2,RB1,SETD2,SMARCA4,TPR,TSC1,VAV1
CHEMBL1908377,PYRVINIUM PAMOATE,9.358445,8.799242,9.871725,9.806607,8.534135,9.216911,8.203914,8.489807,8.526531,8.540057,...,9.442465,9.903667,9.752005,9.981335,9.833628,9.715685,9.890707,9.792311,9.587854,9.660498
CHEMBL4299851,PIFLUFOLASTAT,9.116042,8.802705,9.528011,9.595888,7.401912,9.843115,7.059892,7.342403,7.392747,8.681403,...,9.748395,9.482640,9.631225,9.369411,9.578927,9.655350,9.501024,9.604997,9.705453,9.683601
CHEMBL1200563,CERIVASTATIN SODIUM,9.021992,8.735739,9.126245,9.121669,8.701300,9.171481,8.834511,8.778039,8.716126,8.775395,...,9.154490,9.125573,9.125975,9.110995,9.123825,9.133326,9.126316,9.120572,9.147097,9.142309
CHEMBL186,CEFEPIME,8.969601,10.200719,8.918247,8.812552,9.203223,8.275885,8.390632,9.107193,9.186442,10.096095,...,8.535086,8.978684,8.755330,9.155558,8.841523,8.716473,8.954508,8.797568,8.627602,8.667965
CHEMBL2042122,FLUTEMETAMOL F 18,8.954411,10.632874,9.125755,8.789828,9.202556,7.779329,8.375084,9.098883,9.185359,10.779349,...,8.079646,9.339669,8.622540,9.794041,8.874453,8.509956,9.257150,8.746107,8.275862,8.377485
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
CHEMBL3140361,ETHAMBUTOL HYDROCHLORIDE,3.422050,3.074396,3.317889,3.348215,3.739979,3.469917,4.381316,3.760133,3.743645,3.156285,...,3.409177,3.292094,3.361991,3.271210,3.340726,3.371006,3.303289,3.351792,3.389613,3.381496
CHEMBL374731,TELBIVUDINE,3.401430,3.565630,3.299980,3.272919,3.841673,3.197392,3.765118,3.828990,3.839518,3.597909,...,3.226766,3.319502,3.261121,3.377930,3.278894,3.252799,3.311577,3.269830,3.238125,3.243926
CHEMBL1213136,PILOCARPINE NITRATE,3.039597,3.874059,2.769962,2.676094,3.875277,2.573889,3.716036,3.857275,3.872148,3.868291,...,2.594791,2.857870,2.651976,3.083262,2.688577,2.638324,2.822187,2.669721,2.609925,2.622362
CHEMBL152,CIDOFOVIR ANHYDROUS,2.543950,2.519024,2.732800,2.753389,1.762175,2.848763,1.614773,1.743564,1.759098,2.330443,...,2.807366,2.719171,2.764076,2.668140,2.747984,2.772814,2.724696,2.756178,2.792441,2.784116


In [70]:
cross_table2.to_csv("./results/all_neg_log10_affinities_approved_drugs_CrossTable.csv")
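
The last four cells (row mean, sort, move the mean column first, save) can be reproduced on a small table to check the logic. The sketch below is illustrative only: the ligand IDs, gene names and values are made up and are not PLAPT predictions, `toy` and `gene_cols` are hypothetical names, and an in-memory buffer stands in for the results file. It assumes the cross table has a two-level row index (`chembl_id`, `pref_name`), as the output above suggests.

```python
import io

import pandas as pd

# Toy stand-in for cross_table2: rows are (chembl_id, pref_name), columns are genes.
# All values are invented for illustration.
index = pd.MultiIndex.from_tuples(
    [("CHEMBL_A", "DRUG A"), ("CHEMBL_B", "DRUG B"), ("CHEMBL_C", "DRUG C")],
    names=["chembl_id", "pref_name"],
)
toy = pd.DataFrame(
    {"GENE1": [5.0, 9.0, 3.0], "GENE2": [6.0, 8.0, 4.0]},
    index=index,
)

# Remember the gene columns first, so that re-running the mean never
# averages in a summary column that was added earlier.
gene_cols = list(toy.columns)
toy["MeanByChembl_id"] = toy[gene_cols].mean(axis=1)

# Rank ligands by the mean, highest first, then put the mean column in front.
toy = toy.sort_values(by="MeanByChembl_id", ascending=False)
toy = toy[["MeanByChembl_id"] + gene_cols]
print(toy)

# Round trip through CSV: both index levels are written as ordinary columns,
# so they have to be named again as index columns when reading back.
buffer = io.StringIO()
toy.to_csv(buffer)
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=[0, 1])
print(restored.index.names)
```

Computing the mean over an explicit list of gene columns is a design choice of this sketch, not something the notebook does; it keeps the result stable if the cell is executed more than once.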