In [1]:
import pandas as pd

The point of this notebook is to check whether the improved all-against-all cross template-based docking
protocol produced better final poses for the self-docked cases (ligands docked onto their own template).

In [36]:
# data from template_based_docking_15_03_21 run
new_df = pd.read_csv('../data/rmsd_values_plus_smiles.csv')
# keep only the self-docked cases, where the docked ligand is its own template
new_df = new_df.loc[new_df['template'] == new_df['docked']]
new_df = new_df[['uniprot_id', 'template', 'docked', 'rmsd']]
new_df.head()

Unnamed: 0,uniprot_id,template,docked,rmsd
35,P00918,1OKN_STB,1OKN_STB,0.714
341,P00918,3IBN_O60,3IBN_O60,4.664
1273,P00918,4MLT_TM4,4MLT_TM4,0.254
1525,P00918,2GEH_NHY,2GEH_NHY,0.935
1547,P00918,1ZE8_PIU,1ZE8_PIU,0.345


In [37]:
# data from the initial template-based-docking run
old_df = pd.read_csv('../data/data_from_previous_protocol/more_than_3A.csv')
print(old_df.shape)
old_df.head()

(65, 3)


Unnamed: 0,template,docked,rmsd
0,3BIU_10U,3BIU_10U,5.299
1,1TA2_176,1TA2_176,3.971
2,6EO8_2FN,6EO8_2FN,8.017
3,1SL3_170,1SL3_170,6.377
4,3F68_91U,3F68_91U,6.17


For a fair comparison, let's exclude the targets that were not template-based docked with the first
protocol: we keep only the UniProt IDs that have at least one self-docked template listed in old_df.

In [40]:
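# templates listed in old_df; a set makes the membership test cheap
old_templates = set(old_df['template'])
# collect the targets (UniProt IDs) that have at least one of those templates
uniprots_to_keep = []
for unp_id, template in new_df[['uniprot_id', 'template']].values:
    if template in old_templates:
        uniprots_to_keep.append(unp_id)
uniprots_to_keep = list(set(uniprots_to_keep))
print(uniprots_to_keep)
# kick out the rows in new_df that don't correspond to uniprots_to_keep
new_df = new_df.loc[new_df['uniprot_id'].isin(uniprots_to_keep)]
# number of self-docked poses in the new run with an RMSD above 3 A
new_df.loc[new_df['rmsd']>3].shape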

['P00374', 'P42260', 'P27487', 'P00734', 'P53779', 'P00742', 'P04585', 'P15090', 'P04058', 'P56658', 'P00520', 'P20231', 'P47811', 'P22906', 'O43570', 'P14780', 'P08709', 'O14757', 'Q24451', 'P50097', 'P04818', 'P15121', 'P07342', 'P35968', 'P49841', 'P35557']


(117, 4)

Surprisingly, the number of poor self-docking results (RMSD > 3 Å) is even larger: 117 here versus the 65 entries in old_df.
(Maybe more compounds per target were successfully docked this time?)
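
One way to probe that hypothesis is to look at fractions per target instead of absolute counts. The sketch below is only an illustration on made-up values: `toy_new`, `per_target_summary`, and the column names `n_docked`, `n_bad`, and `frac_bad` are assumptions and not part of the original analysis.

```python
import pandas as pd

# Toy stand-in for the filtered new_df (hypothetical values, not real results)
toy_new = pd.DataFrame({
    'uniprot_id': ['P1', 'P1', 'P1', 'P1', 'P2', 'P2'],
    'template':   ['a', 'b', 'c', 'd', 'e', 'f'],
    'rmsd':       [0.5, 4.2, 6.1, 1.1, 3.5, 0.9],
})

def per_target_summary(df, cutoff=3.0):
    """Count self-docked poses and poor poses (RMSD > cutoff) per target."""
    summary = (
        df.assign(is_bad=df['rmsd'] > cutoff)
          .groupby('uniprot_id')
          .agg(n_docked=('is_bad', 'size'), n_bad=('is_bad', 'sum'))
    )
    # fraction of poor poses, which stays comparable when more compounds are docked
    summary['frac_bad'] = summary['n_bad'] / summary['n_docked']
    return summary

summary = per_target_summary(toy_new)
print(summary)
```

If the per-target counts of the first run were available, the same summary on both runs would show whether the larger number of poor poses simply reflects a larger pool of docked compounds.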
Now let's do a head-to-head comparison:

In [42]:
# RMSD of each template in the new run (first match, as before)
new_rmsd_by_template = new_df.drop_duplicates('template').set_index('template')['rmsd']
# look up every old template; templates missing from the new run get NaN
old_df['new_rmsd'] = old_df['template'].map(new_rmsd_by_template)
old_df.head()

Unnamed: 0,template,docked,rmsd,new_rmsd
0,3BIU_10U,3BIU_10U,5.299,2.661
1,1TA2_176,1TA2_176,3.971,0.3
2,6EO8_2FN,6EO8_2FN,8.017,0.806
3,1SL3_170,1SL3_170,6.377,5.396
4,3F68_91U,3F68_91U,6.17,1.882


In [51]:
print('How many compounds have an RMSD below 3 Å with the "improved" protocol?')
print(old_df.loc[old_df['new_rmsd']<3].shape)
print('How many compounds have an RMSD below 2 Å with the "improved" protocol?')
print(old_df.loc[old_df['new_rmsd']<2].shape)
print('How many compounds have an RMSD above 3 Å with the "improved" protocol?')
print(old_df.loc[old_df['new_rmsd']>3].shape)
print('And how many compounds failed?')
print(old_df.loc[old_df['new_rmsd'].isna()].shape)

How many compounds have an RMSD below 3 Å with the "improved" protocol?
(33, 4)
How many compounds have an RMSD below 2 Å with the "improved" protocol?
(25, 4)
How many compounds have an RMSD above 3 Å with the "improved" protocol?
(20, 4)
And how many compounds failed?
(12, 4)
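
The counts above overlap (every pose below 2 Å is also below 3 Å). As a possible alternative, the sketch below assigns each compound to exactly one bin with `pd.cut`. The data are made up, and `bin_rmsd` and the bin labels are hypothetical names chosen for this illustration. Note one difference from the cell above: a value of exactly 3.0 is counted in the `'>= 3'` bin here, whereas the `<3` and `>3` filters skip it.

```python
import pandas as pd

# Toy values only (hypothetical); NaN marks a compound missing from the new run
toy = pd.DataFrame({
    'template': ['A', 'B', 'C', 'D', 'E', 'F'],
    'new_rmsd': [0.3, 1.9, 2.6, 5.4, float('nan'), 3.0],
})

def bin_rmsd(rmsd):
    """Assign each RMSD to one non-overlapping bin; NaN becomes 'failed'."""
    binned = pd.cut(
        rmsd,
        bins=[0, 2, 3, float('inf')],
        labels=['< 2', '2 to 3', '>= 3'],
        right=False,  # left-closed intervals: [0, 2), [2, 3), [3, inf)
    )
    # add an explicit category for the compounds without a new RMSD
    return binned.cat.add_categories('failed').fillna('failed')

toy['rmsd_bin'] = bin_rmsd(toy['new_rmsd'])
bin_counts = toy['rmsd_bin'].value_counts().to_dict()
print(bin_counts)
```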


INCONCLUSIVE... There may be compounds that the old protocol docked correctly but the new one did not.
old_df appears to contain only the old poses with an RMSD above 3 Å, so it's not possible to rule that out based on the current analysis.
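
To rule it out, one would need the RMSD of every self-docked pose from the old protocol, not only the poor ones. Assuming such a table existed, an outer merge could classify each compound as improved, regressed, or unchanged. The sketch below runs on made-up data: `old_full`, `new_full`, `compare_runs`, and the outcome labels are hypothetical and not part of the original notebook.

```python
import pandas as pd

# Hypothetical full tables: every self-docked pose from each run (made-up values)
old_full = pd.DataFrame({'template': ['a', 'b', 'c', 'd'],
                         'rmsd': [5.3, 1.2, 0.8, 4.0]})
new_full = pd.DataFrame({'template': ['a', 'b', 'c', 'e'],
                         'rmsd': [2.7, 4.5, 0.9, 1.0]})

def compare_runs(old, new, cutoff=3.0):
    """Pair poses by template and label how each one changed between runs."""
    merged = old.merge(new, on='template', how='outer',
                       suffixes=('_old', '_new'), indicator=True)

    def classify(row):
        # the indicator column is 'both' only when the template is in both runs
        if row['_merge'] != 'both':
            return 'missing in one run'
        old_ok = row['rmsd_old'] < cutoff
        new_ok = row['rmsd_new'] < cutoff
        if old_ok and new_ok:
            return 'good in both'
        if new_ok:
            return 'improved'
        if old_ok:
            return 'regressed'
        return 'bad in both'

    merged['outcome'] = merged.apply(classify, axis=1)
    return merged

comparison = compare_runs(old_full, new_full)
outcome_by_template = dict(zip(comparison['template'], comparison['outcome']))
print(comparison[['template', 'rmsd_old', 'rmsd_new', 'outcome']])
```

The 'regressed' rows are exactly the cases the current analysis cannot see.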