# pydentate: a graph neural network tool for predicting metal-ligand coordination

The following tutorial introduces pydentate, an open-source Python package that predicts metal-ligand coordination from only SMILES string inputs and generates 3D structures of transition metal complexes with the predicted coordination.

For a detailed explanation of the tool, please refer to "Graph neural networks for predicting metal–ligand coordination of transition metal complexes" (https://doi.org/10.26434/chemrxiv-2024-nzk5q).

If you find this work useful, please consider citing the associated publication:
J. W. Toney, R. G. St. Michel, A. G. Garrison, I. Kevlishvili, H. J. Kulik, ChemRxiv 2024, 10.26434/chemrxiv-2024-nzk5q

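### Step 1: import dependencies

The cell below assumes that chemprop, pandas, and NumPy are already installed. The `chemprop.args` and `chemprop.train` modules used in Step 2 belong to the chemprop 1.x Python API.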

In [8]:
import chemprop
import pandas as pd
import numpy as np

### Step 2: predict denticity and coordinating atoms
The trained machine learning models are loaded here and used to predict denticity (total number of coordinating atoms) and coordinating atom indices from ligand SMILES strings. For illustrative purposes, a subset of the holdout data from the original paper is used here, which users should replace with their own datasets.
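
To run the models on your own ligands, the input only needs to be a .csv file with one SMILES string per row in the column passed to `--smiles_columns`. The sketch below builds such a file with pandas. The three ligands and the file name `my_ligands.csv` are illustrative assumptions, not part of the original dataset.

```python
import pandas as pd

# Hypothetical ligands chosen only for illustration; replace with your own SMILES.
ligands = [
    "N",                    # ammonia
    "NCCN",                 # ethylenediamine
    "c1ccnc(c1)-c1ccccn1",  # 2,2'-bipyridine
]

# The column name must match the value passed to --smiles_columns ('smiles' here).
df_input = pd.DataFrame({"smiles": ligands})
df_input.to_csv("my_ligands.csv", index=False)  # hypothetical file name
```

Passing `'my_ligands.csv'` as the `--test_path` value would then point the prediction cell at this file instead of `holdout_subset.csv`.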

In [9]:
# test_path: path to the .csv file you want to generate predictions for
# checkpoint_path: path to the trained model checkpoint (.pt)
# smiles_columns: name of the .csv column where SMILES are stored; assumed to be 'smiles' unless otherwise specified
# preds_path: path to the .csv file where your results will be saved

# predict denticity from SMILES
pred_dent_args_list = ['--test_path', 'holdout_subset.csv',
                       '--checkpoint_path', 'trained_models/pred_dent_model.pt',
                       '--smiles_columns', 'smiles',
                       '--preds_path', 'dent_preds.csv']

pred_dent_args = chemprop.args.PredictArgs().parse_args(args=pred_dent_args_list)
dent_preds = chemprop.train.make_predictions(args=pred_dent_args)

# predict coordinating atoms from SMILES
pred_catoms_args_list = ['--test_path', 'holdout_subset.csv',
                         '--checkpoint_path', 'trained_models/pred_catoms_model.pt',
                         '--smiles_columns', 'smiles',
                         '--preds_path', 'catom_preds.csv']

pred_catoms_args = chemprop.args.PredictArgs().parse_args(args=pred_catoms_args_list)
catom_preds = chemprop.train.make_predictions(args=pred_catoms_args)

print('Done predicting!')

Loading training args
Setting molecule featurization parameters to default.
Loading data


99it [00:00, 378174.95it/s]
100%|███████████████████████████████████████| 99/99 [00:00<00:00, 245701.83it/s]


Validating SMILES
Test size = 99


  0%|                                                     | 0/1 [00:00<?, ?it/s]

Loading pretrained parameter "encoder.encoder.0.cached_zero_vector".
Loading pretrained parameter "encoder.encoder.0.W_i.weight".
Loading pretrained parameter "encoder.encoder.0.W_h.weight".
Loading pretrained parameter "encoder.encoder.0.W_o.weight".
Loading pretrained parameter "encoder.encoder.0.W_o.bias".
Loading pretrained parameter "readout.1.weight".
Loading pretrained parameter "readout.1.bias".
Loading pretrained parameter "readout.4.weight".
Loading pretrained parameter "readout.4.bias".
Loading pretrained parameter "readout.7.weight".
Loading pretrained parameter "readout.7.bias".



  0%|                                                     | 0/2 [00:00<?, ?it/s]
 50%|██████████████████████▌                      | 1/2 [00:03<00:03,  3.57s/it]
100%|█████████████████████████████████████████████| 1/1 [00:13<00:00, 13.73s/it]


Saving predictions to dent_preds.csv
Elapsed time = 0:00:14
Loading training args
Setting molecule featurization parameters to default.
Loading data


99it [00:00, 385191.18it/s]
100%|███████████████████████████████████████| 99/99 [00:00<00:00, 245701.83it/s]


Validating SMILES
Test size = 99


  0%|                                                     | 0/1 [00:00<?, ?it/s]

Loading pretrained parameter "encoder.encoder.0.cached_zero_vector".
Loading pretrained parameter "encoder.encoder.0.W_i.weight".
Loading pretrained parameter "encoder.encoder.0.W_h.weight".
Loading pretrained parameter "encoder.encoder.0.W_o.weight".
Loading pretrained parameter "encoder.encoder.0.W_o.bias".
Loading pretrained parameter "encoder.encoder.0.W_o_b.weight".
Loading pretrained parameter "encoder.encoder.0.W_o_b.bias".
Loading pretrained parameter "readout.atom_ffn_base.0.1.weight".
Loading pretrained parameter "readout.atom_ffn_base.0.1.bias".
Loading pretrained parameter "readout.atom_ffn_base.0.4.weight".
Loading pretrained parameter "readout.atom_ffn_base.0.4.bias".
Loading pretrained parameter "readout.bond_ffn_base.0.1.weight".
Loading pretrained parameter "readout.bond_ffn_base.0.1.bias".
Loading pretrained parameter "readout.bond_ffn_base.0.4.weight".
Loading pretrained parameter "readout.bond_ffn_base.0.4.bias".
Loading pretrained parameter "readout.ffn_list.0.ffn.


  0%|                                                     | 0/2 [00:00<?, ?it/s]
 50%|██████████████████████▌                      | 1/2 [00:03<00:03,  3.41s/it]
100%|█████████████████████████████████████████████| 1/1 [00:13<00:00, 13.60s/it]

Saving predictions to catom_preds.csv
Elapsed time = 0:00:14
Done predicting!





### Step 3: process predictions
Read in the predicted denticities and coordinating atoms, then parse them into a usable format.

In [11]:
# read denticity predictions and collect the six per-class probabilities
# (classes 0-5) into one list per ligand
df_dent_preds = pd.read_csv('dent_preds.csv')
class_columns = [f'denticities_zero_index_class_{k}' for k in range(6)]
df_dent_preds['denticities_zero_index'] = df_dent_preds[class_columns].values.tolist()
df_dent_preds = df_dent_preds[['smiles', 'denticities_zero_index']]

# read coordinating atom predictions, parse into usable format
df_catom_preds = pd.read_csv('catom_preds.csv')
parsed_rows = []
for row in df_catom_preds['Padded_catoms_rdkit']:
    # strip the brackets, then split on any whitespace (spaces or newlines)
    # so that repeated spaces do not produce empty entries
    entries = row.replace('[', ' ').replace(']', ' ').split()
    parsed_rows.append([float(entry) for entry in entries])
df_catom_preds['Padded_catoms_rdkit'] = parsed_rows
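
As a small illustration of the parsing step, the sketch below converts one string-encoded array into a list of floats. It assumes the entries in `catom_preds.csv` look like a bracketed, whitespace-separated array (which is what the bracket and newline stripping in the cell above suggests). The helper name `parse_array_string` and the example values are made up for illustration.

```python
def parse_array_string(text):
    """Convert a string such as '[0.98 0.01\n 0.02]' into a list of floats."""
    cleaned = text.replace("[", " ").replace("]", " ")
    # str.split() with no argument splits on any run of whitespace,
    # so repeated spaces and embedded newlines are both handled.
    return [float(token) for token in cleaned.split()]


example = "[0.98 0.01\n 0.02  0.75]"  # made-up values for illustration
probs = parse_array_string(example)
print(probs)  # [0.98, 0.01, 0.02, 0.75]
```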


### Step 4: use models synergistically
In most instances, the two models independently arrive at the correct denticity and coordinating atoms of a ligand. When their initial predictions conflict, however, the less confident prediction is overwritten so that it is compatible with the more confident one.

For example, a ligand predicted to be monodentate but also predicted to have two coordinating atoms is a conflicting prediction. If the model were more confident in its predicted denticity than in its predicted coordinating atoms, only the single most confidently predicted coordinating atom would be returned. Conversely, if the coordinating atom prediction were the more confident one, the denticity would be overwritten with the number of predicted coordinating atoms (two).

A more detailed discussion of this synergistic use is available in Supporting Information Figures S15-S17 of the associated publication.
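
The rule can be illustrated on a single ligand with made-up probabilities. The sketch below is a simplified restatement of the logic in the notebook cell for this step, not the authors' exact implementation. The function names `uncertainty` and `reconcile` and all numeric values are assumptions introduced for illustration.

```python
import numpy as np


def uncertainty(probs):
    """Largest distance of any probability from its rounded value (0 or 1)."""
    return max(1 - p if p >= 0.5 else p for p in probs)


def reconcile(dent_probs, catom_probs):
    """Return (denticity, coordinating-atom mask) after resolving any conflict."""
    denticity = int(np.argmax(dent_probs)) + 1   # class k corresponds to denticity k+1
    mask = np.round(catom_probs).astype(int)     # 1 = atom predicted to coordinate
    n_catoms = int(mask.sum())

    if n_catoms == denticity:
        return denticity, mask.tolist()          # predictions already agree

    if uncertainty(catom_probs) > uncertainty(dent_probs):
        # Denticity is more confident: keep only the highest-scoring atoms.
        top = np.argsort(catom_probs)[-denticity:]
        new_mask = np.zeros(len(catom_probs), dtype=int)
        new_mask[top] = 1
        return denticity, new_mask.tolist()

    # Coordinating atoms are more confident: denticity follows the atom count.
    return n_catoms, mask.tolist()


# Confident monodentate prediction, but two atoms round to 1: the atom mask is overwritten.
print(reconcile([0.90, 0.08, 0.01, 0.01, 0.0, 0.0], [0.95, 0.60, 0.05, 0.02]))
# (1, [1, 0, 0, 0])

# Uncertain denticity, confident atoms: the denticity is overwritten.
print(reconcile([0.55, 0.45, 0.0, 0.0, 0.0, 0.0], [0.97, 0.96, 0.03, 0.01]))
# (2, [1, 1, 0, 0])
```

In the first call the second atom (0.60) is the least certain value, so the confident monodentate prediction wins and only the top-scoring atom is kept. In the second call the denticity probabilities are close to a coin flip, so the two confidently predicted atoms determine the denticity.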

In [13]:
# use models synergistically
new_catom_preds = []
new_dent_preds = []

for idx, smiles in enumerate(df_catom_preds['smiles']):
    catom_preds = df_catom_preds['Padded_catoms_rdkit'][idx]
    dent_preds = df_dent_preds['denticities_zero_index'][idx]

    n_catoms = int(np.sum(np.round(catom_preds)))  # number of atoms predicted to coordinate
    pred_dent = int(np.argmax(dent_preds)) + 1     # class k corresponds to denticity k+1

    if n_catoms != pred_dent:
        # uncertainty = largest distance of any probability from its rounded value (0 or 1)
        catom_uncertainty = np.max([1-pred if pred >= 0.5 else pred for pred in catom_preds])
        dent_uncertainty = np.max([1-pred if pred >= 0.5 else pred for pred in dent_preds])

        if catom_uncertainty > dent_uncertainty:
            # denticity is more confident: keep only the pred_dent highest-scoring atoms
            top_indices = np.argsort(catom_preds)[-pred_dent:]
            new_catom_preds.append([1 if i in top_indices else 0 for i in range(len(catom_preds))])
            new_dent_preds.append(dent_preds)

        else:
            # coordinating atoms are more confident: set denticity to the atom count
            # new_dent_pred = max(np.sum(np.round(catom_preds)),1)
            new_dent_preds.append(n_catoms)
            new_catom_preds.append(catom_preds)

    else:
        # predictions already agree: keep both unchanged
        new_catom_preds.append(catom_preds)
        new_dent_preds.append(dent_preds)


df_dent_preds['denticities_zero_index'] = new_dent_preds
df_catom_preds['Padded_catoms_rdkit'] = new_catom_preds

### Step 5: round predictions and save results

In [16]:
# round predictions, save results
df_results = pd.DataFrame({'smiles': df_catom_preds['smiles'],
                           'predicted_denticity': df_dent_preds['denticities_zero_index'].apply(lambda x: np.argmax(x)+1 if isinstance(x, list) else x),
                           'predicted_coordinating_atoms': df_catom_preds['Padded_catoms_rdkit'].apply(lambda x: np.round(x))})

df_results.to_csv('combined_ligand_preds.csv', index=False)
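
Each entry of `predicted_coordinating_atoms` is a 0/1 mask with one position per atom. A minimal sketch of turning such a mask into a list of atom indices is shown below. It assumes that mask positions follow the RDKit atom ordering of the input SMILES and that trailing positions are zero padding (inferred from the column name `Padded_catoms_rdkit`, not confirmed by the source). The mask values are made up for illustration.

```python
import numpy as np

# Made-up rounded prediction for one ligand (illustrative values only).
predicted_mask = np.array([0., 1., 0., 0., 1., 0.])

# Positions holding a 1 are the predicted coordinating atom indices
# (assumed here to be zero-based RDKit atom indices).
coordinating_atom_indices = np.flatnonzero(predicted_mask).tolist()
print(coordinating_atom_indices)  # [1, 4]
```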