I was chatting with Pat Walters at the CADD GRC and he brought up point 7 from his famous [14 points LinkedIn post](https://www.linkedin.com/posts/wpwalters_i-thoroughly-enjoyed-the-first-day-of-the-activity-7316447084209991682-Yuki). The usual justification given for using mol2 files is that they are the only file format we have that allows including partial charges. This is a widely held opinion, but it's not really true. It's actually quite easy to store arbitrary atom (or bond) properties in SD files and the RDKit has a simple mechanism for doing this. This post is a short tutorial on how to use this functionality.

The handling of atomic and bond properties in SD files is described in more detail in [the docs](https://www.rdkit.org/docs/RDKit_Book.html#atom-properties-and-sdf-files).

Rather than doing the usual thing and generating Gasteiger-Marsili charges, here I'm going to use the DASH-props tree that we developed in the Riniker lab to rapidly generate approximate AM1BCC charges. DASH uses a hierarchical tree of substructures to compute high-quality estimates of AM1BCC charges for a molecule. Because it's based on substructures, the method is fast and conformation independent. If you want to learn more about the algorithm, take a look at [the open-access paper](https://pubs.aip.org/aip/jcp/article/161/7/074103/3308079/DASH-properties-Estimating-atomic-and-molecular).


If you want to run the code in this notebook, you'll need to install DASH props. This requires adding some packages to your conda environment. There's an [environment file](https://github.com/rinikerlab/DASH-tree/blob/main/tree_only_env.yml) in the DASH-tree repo that you can use, but if you already have a working RDKit environment, I think this is sufficient:
```
% conda install -c conda-forge pytables tqdm
```
and then pip install the DASH-tree package:
```
% python -m pip install git+https://github.com/rinikerlab/DASH-tree
```

> An aside: the way these properties are stored in the SD file is very easy to parse, so adding support for this format to other software or toolkits that can already read SD files wouldn't be much of a lift. (The format is actually the result of some discussions I had with a couple of cheminformatics software vendors back in 2019; I think I ended up being the only one to implement what we had discussed.) If you're a developer and have questions, please just ask.
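
To give a feel for how little parsing is involved, here's a rough pure-Python sketch of a reader for the double-valued atom property lists. It is not the RDKit implementation. It assumes what I understand from the RDKit Book: values are whitespace-separated and in atom order, long lists may wrap over several lines, missing values are written as `n/a`, and a leading token in square brackets overrides that marker. The function names and the tiny hand-written record (with made-up numbers) are purely illustrative.

```python
import re

# Matches an SD data header such as ">  <atom.dprop.demo_charge>  (1)"
HEADER = re.compile(r"^>.*?<([^>]+)>")
PREFIX = "atom.dprop."


def read_sd_data_items(record):
    """Collect the data items of one SD record as {field name: text}."""
    items = {}
    name = None
    for line in record.splitlines():
        match = HEADER.match(line)
        if match:
            name = match.group(1)
            items[name] = []
        elif name is not None:
            # a blank line (or the record terminator) ends the current item
            if line.strip() == "" or line.startswith("$$$$"):
                name = None
            else:
                items[name].append(line)
    # wrapped lists are simply joined back together
    return {key: " ".join(lines) for key, lines in items.items()}


def parse_atom_dprops(record, missing="n/a"):
    """Return {property name: [float or None, ...]}, one entry per atom."""
    props = {}
    for key, text in read_sd_data_items(record).items():
        if not key.startswith(PREFIX):
            continue
        tokens = text.split()
        marker = missing
        # assumed convention: "[?] ..." switches the missing-value marker to "?"
        if tokens and re.fullmatch(r"\[.+\]", tokens[0]):
            marker = tokens[0][1:-1]
            tokens = tokens[1:]
        props[key[len(PREFIX):]] = [
            None if tok == marker else float(tok) for tok in tokens
        ]
    return props


# Hand-written three-atom record with made-up values, for illustration only
SAMPLE = """\
example
  hand-written

  3  2  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0
    1.5000    0.0000    0.0000 C   0  0
    3.0000    0.0000    0.0000 O   0  0
  1  2  1  0
  2  3  1  0
M  END
>  <atom.dprop.demo_charge>  (1)
-0.25 0.10
n/a

$$$$
"""

print(parse_atom_dprops(SAMPLE))
```

The same approach should carry over to the integer, boolean, and string variants by swapping the prefix and the conversion function, though I haven't sketched those here.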

In [1]:
from rdkit import Chem
from rdkit.Chem import Draw
import rdkit
print(rdkit.__version__)

# import the required stuff from DASH:
from serenityff.charge.tree.dash_tree import DASHTree, TreeType
from serenityff.charge.data import dash_props_tree_path


2025.03.4


In [2]:
# Load the property tree.
# Note that the files will be automatically downloaded from the ETHZ Research Collection the first time the tree is loaded.
tree = DASHTree(tree_folder_path=dash_props_tree_path, tree_type=TreeType.FULL)


Loading DASH tree data
Loaded 122 trees and data


Construct a sample molecule and compute the charges:

In [3]:
esomeprazole = Chem.MolFromSmiles('CC1=C(C(=C(C=C1)N)C(=O)N)C(=O)N')
esomep_h = Chem.AddHs(esomeprazole)

charges = tree.get_molecules_partial_charges(esomep_h, chg_key="AM1BCC", chg_std_key="AM1BCC_std")["charges"]

Set the charges as atomic properties:

In [4]:
for atom in esomep_h.GetAtoms():
    atom.SetDoubleProp("AM1BCC_charge", charges[atom.GetIdx()])


In [5]:

# this is the one extra call you have to make in order to 
# convert the atom properties to a molecule property which will 
# be written to the SD file:
Chem.CreateAtomDoublePropertyList(esomep_h,'AM1BCC_charge')

# write the SD data to a string so that we can look at it:
from io import StringIO
sio = StringIO()
with Chem.SDWriter(sio) as w:
    w.write(esomep_h)

sdf = sio.getvalue()
print(sdf)



     RDKit          2D

 25 25  0  0  0  0  0  0  0  0999 V2000
    3.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.7500   -1.2990    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.7500   -1.2990    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.7500    1.2990    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.7500    1.2990    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -3.0000    0.0000    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   -1.5000   -2.5981    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.1155   -3.5040    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   -3.0000   -2.5981    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    1.5000   -2.5981    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.0000   -2.5981    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    0.7500   -3.8971    0

You can see that the property `atom.dprop.AM1BCC_charge` has been added to the SD string. You can also see that we're writing way too many significant figures; it's probably worth adding an option to control that.
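
Until there is such an option, one possible workaround is to build the property text yourself with the precision you want, instead of calling `CreateAtomDoublePropertyList`. The helper below is a hypothetical sketch, not part of the RDKit: `format_atom_dprop_value`, its `ndigits` argument, and the 190-character wrap are my own choices (the wrap is there to keep data lines comfortably short). I'm assuming that the reader accepts the shorter numbers, which I haven't tested across RDKit versions.

```python
def format_atom_dprop_value(values, ndigits=4, line_size=190):
    """Build the text of an atom.dprop.* field with fixed-precision values.

    `values` holds one float per atom, in atom order; None marks a missing
    value and is written as "n/a".
    """
    lines, current = [], ""
    for value in values:
        token = "n/a" if value is None else f"{value:.{ndigits}f}"
        candidate = token if not current else f"{current} {token}"
        if current and len(candidate) > line_size:
            # start a new line rather than exceeding the line length
            lines.append(current)
            current = token
        else:
            current = candidate
    if current:
        lines.append(current)
    return "\n".join(lines)


# The first three charges from this post, trimmed to four decimal places
charges = [-0.057020346264131966, -0.05498106388730991, -0.1417576117737365]
value_text = format_atom_dprop_value(charges)
print(value_text)

# With a molecule in hand, the text would then be attached as a molecule
# property before writing (assumption: it is parsed like the generated one):
#   mol.SetProp("atom.dprop.AM1BCC_charge", value_text)
```

For the three values above this prints `-0.0570 -0.0550 -0.1418`. The obvious trade-off is that the rounded charges are no longer guaranteed to sum exactly to the net charge of the molecule.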

When that SD string is parsed, the atom properties are extracted automatically and assigned to the atoms:

In [6]:
suppl = Chem.SDMolSupplier()
suppl.SetData(sdf,removeHs=False)

m = next(suppl)
for atom in m.GetAtoms():
    print(atom.GetIdx(), atom.GetDoubleProp("AM1BCC_charge"))


0 -0.057020346264131966
1 -0.05498106388730991
2 -0.1417576117737365
3 -0.24649839652124084
4 0.19183992335343653
5 -0.14084077400735293
6 -0.13325346021838766
7 -0.8537314886918032
8 0.6906610922169911
9 -0.6194815833806008
10 -0.6381003462641319
11 0.6751982937456793
12 -0.5850875925315991
13 -0.6722436259414334
14 0.054944129070337355
15 0.054944129070337355
16 0.054944129070337355
17 0.1595973470895479
18 0.1595973470895479
19 0.416229653735868
20 0.416229653735868
21 0.31563710234748005
22 0.31563710234748005
23 0.3187681933044086
24 0.3187681933044086
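
A quick sanity check on values like these is that the partial charges should add up to the net formal charge of the molecule, which is zero for the neutral molecule used here. The sketch below does that check on the numbers copied from the output above. The function name and the tolerance are arbitrary choices of mine.

```python
import math

# Values copied from the output above, in atom order
round_tripped = [
    -0.057020346264131966, -0.05498106388730991, -0.1417576117737365,
    -0.24649839652124084, 0.19183992335343653, -0.14084077400735293,
    -0.13325346021838766, -0.8537314886918032, 0.6906610922169911,
    -0.6194815833806008, -0.6381003462641319, 0.6751982937456793,
    -0.5850875925315991, -0.6722436259414334, 0.054944129070337355,
    0.054944129070337355, 0.054944129070337355, 0.1595973470895479,
    0.1595973470895479, 0.416229653735868, 0.416229653735868,
    0.31563710234748005, 0.31563710234748005, 0.3187681933044086,
    0.3187681933044086,
]


def total_charge_matches(charges, formal_charge=0, tol=1e-6):
    """True if the partial charges sum to the expected net charge."""
    # fsum avoids accumulating floating-point error over many atoms
    return math.isclose(math.fsum(charges), formal_charge, abs_tol=tol)


print(len(round_tripped), total_charge_matches(round_tripped))
```

In a real workflow you would pull the values from the atoms with `GetDoubleProp` and compare against the molecule's formal charge rather than hard-coding either.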


That's it... hopefully it's useful.