#  Modeling Protein-Ligand Interactions with Atomic Convolutions
By [Nathan C. Frey](https://ncfrey.github.io/) | [Twitter](https://twitter.com/nc_frey) and [Bharath Ramsundar](https://rbharath.github.io/) | [Twitter](https://twitter.com/rbhar90)

This DeepChem tutorial introduces the [Atomic Convolutional Neural Network](https://arxiv.org/pdf/1703.10603.pdf). We'll see the structure of the `AtomicConvModel` and write a simple program to run Atomic Convolutions.

### ACNN Architecture
ACNNs directly exploit the local three-dimensional structure of molecules to hierarchically learn more complex chemical features by optimizing both the model and featurization simultaneously in an end-to-end fashion.

The atom type convolution makes use of a neighbor-listed distance matrix to extract features encoding local chemical environments from an input representation (Cartesian atomic coordinates) that does not necessarily contain spatial locality. The following methods are used to build the ACNN architecture:

- __Distance Matrix__  
The distance matrix $R$ is computed from the Cartesian atomic coordinates $X$, an $(N, 3)$ matrix, by way of the distance tensor $D$, which holds the displacement vectors between each atom and its neighbors. $R$ is “neighbor listed”: instead of the full $(N, N)$ matrix, it is an $(N, M)$ matrix that keeps only the distances from each atom to at most $M$ neighbors.

```python
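def distance_matrix(D):
    """Reduce the distance tensor D (per-neighbor displacement vectors) to distances."""
    R = tf.reduce_sum(tf.multiply(D, D), 3)     # D: Distance Tensor
    R = tf.sqrt(R)                              # R: Distance Matrix
    return R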
```

- **Atom type convolution**  
The output of the atom type convolution is constructed from the distance matrix $R$ and atomic number matrix $Z$. The matrix $R$ is fed into a (1x1) filter with stride 1 and depth of $N_{at}$, where $N_{at}$ is the number of unique atomic numbers (atom types) present in the molecular system. The atom type convolution kernel is a step function that operates on the neighbor distance matrix $R$.

- **Radial Pooling layer**  
Radial pooling is a dimensionality reduction step that down-samples the output of the atom type convolutions. It helps prevent overfitting by binning features into a more abstract representation and by reducing the number of parameters learned.
Mathematically, radial pooling layers pool over tensor slices (receptive fields) of size (1x$M$x1) with stride 1 and a depth of $N_r$, where $N_r$ is the number of desired radial filters and $M$ is the maximum number of neighbors.

- **Atomistic fully connected network**  
Atomic Convolution layers are stacked by feeding the flattened ($N$, $N_{at}$ $\cdot$ $N_r$) output of the radial pooling layer into the atom type convolution operation. Finally, we feed the tensor row-wise (per-atom) into a fully-connected network. The same fully connected weights and biases are used for each atom in a given molecule.
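
The first three building blocks in the list above can be sketched in plain NumPy. This is only an illustration under stated assumptions, not DeepChem's implementation: the function names, the choice of "nearest $M$ atoms" as the neighbor list, and the radial filter (a Gaussian centered at `r_s` multiplied by a smooth cosine cutoff, which is our reading of the paper) are all assumptions, and the learnable scaling and bias terms are left out.

```python
import numpy as np

def neighbor_distance_matrix(coords, max_neighbors):
    """Return (R, nbr_idx): distances to, and indices of, the M nearest atoms."""
    diff = coords[:, None, :] - coords[None, :, :]          # (N, N, 3) displacements
    dist = np.sqrt((diff ** 2).sum(axis=-1))                # (N, N) full distance matrix
    np.fill_diagonal(dist, np.inf)                          # an atom is not its own neighbor
    nbr_idx = np.argsort(dist, axis=1)[:, :max_neighbors]   # (N, M)
    R = np.take_along_axis(dist, nbr_idx, axis=1)           # (N, M) neighbor-listed
    return R, nbr_idx

def atom_type_convolution(R, nbr_idx, Z, atom_types):
    """Step-function kernel: one copy of R per atom type, zeroed elsewhere."""
    nbr_Z = Z[nbr_idx]                                      # (N, M) neighbor atomic numbers
    return np.stack([np.where(nbr_Z == a, R, 0.0) for a in atom_types], axis=-1)

def radial_pooling(E, r_s, sigma_s, cutoff):
    """Pool over the M neighbors with N_r radial filters (assumed filter form)."""
    r = E[..., None]                                        # (N, M, N_at, 1)
    inside = (r > 0) & (r < cutoff)                         # r == 0 marks masked entries
    fc = np.where(inside, 0.5 * (np.cos(np.pi * r / cutoff) + 1.0), 0.0)
    K = np.exp(-((r - r_s) ** 2) / sigma_s ** 2) * fc       # (N, M, N_at, N_r)
    return K.sum(axis=1)                                    # (N, N_at, N_r)

# Toy system: four atoms (C, O, C, N) with made-up coordinates.
coords = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]])
Z = np.array([6, 8, 6, 7])

R, nbr_idx = neighbor_distance_matrix(coords, max_neighbors=2)
E = atom_type_convolution(R, nbr_idx, Z, atom_types=[6, 7, 8])
P = radial_pooling(E, r_s=np.array([1.0, 2.0]), sigma_s=np.array([0.5, 0.5]), cutoff=4.0)
flat = P.reshape(len(coords), -1)                           # (N, N_at * N_r)
```

Each row of `flat` is the per-atom feature vector that would be handed to the atomistic fully connected network.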

Now that we have seen the structural overview of ACNNs, we'll try to get deeper into the model and see how we can train it and what we expect as the output.

For the training, we will use the publicly available PDBbind dataset. In this example, every row reflects a protein-ligand complex and the target is the binding affinity ($K_i$) of the ligand to the protein in the complex.

## Colab

This tutorial and the rest in this sequence are designed to be done in Google Colab. If you'd like to open this notebook in Colab, you can use the following link.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepchem/deepchem/blob/master/examples/tutorials/Modeling_Protein_Ligand_Interactions_With_Atomic_Convolutions.ipynb)

## Setup

To run DeepChem within Colab, you'll need to run the following cell of installation commands. This will take about 5 minutes to run to completion and install your environment.

In [1]:
!pip install -q condacolab
import condacolab
condacolab.install()
!/usr/local/bin/conda info -e

✨🍰✨ Everything looks OK!

# conda environments:
#
base                   /usr/local



In [2]:
!/usr/local/bin/conda install -c conda-forge pycosat mdtraj pdbfixer openmm -y -q  # needed for AtomicConvs


In [3]:
!pip install --pre deepchem
import deepchem
deepchem.__version__


'2.8.1.dev'

In [4]:
import deepchem as dc
import os

import numpy as np
import tensorflow as tf

import matplotlib.pyplot as plt

from rdkit import Chem

from deepchem.molnet import load_pdbbind
from deepchem.models import AtomicConvModel
from deepchem.feat import AtomicConvFeaturizer

### Getting protein-ligand data
If you worked through [Tutorial 13](https://github.com/deepchem/deepchem/blob/master/examples/tutorials/Modeling_Protein_Ligand_Interactions.ipynb) on modeling protein-ligand interactions, you'll already be familiar with how to obtain a set of data from PDBbind for training our model. Since we explored molecular complexes in detail in the [previous tutorial](https://github.com/deepchem/deepchem/blob/master/examples/tutorials/Modeling_Protein_Ligand_Interactions.ipynb), this time we'll simply initialize an `AtomicConvFeaturizer` and load the PDBbind dataset directly using MolNet.

In [5]:
f1_num_atoms = 100  # maximum number of atoms to consider in the ligand
f2_num_atoms = 1000  # maximum number of atoms to consider in the protein
max_num_neighbors = 12  # maximum number of spatial neighbors for an atom

acf = AtomicConvFeaturizer(frag1_num_atoms=f1_num_atoms,
                      frag2_num_atoms=f2_num_atoms,
                      complex_num_atoms=f1_num_atoms+f2_num_atoms,
                      max_num_neighbors=max_num_neighbors,
                      neighbor_cutoff=4)

`load_pdbbind` allows us to specify if we want to use the entire protein or only the binding pocket (`pocket=True`) for featurization. Using only the pocket saves memory and speeds up the featurization. We can also use the "core" dataset of ~200 high-quality complexes for rapidly testing our model, or the larger "refined" set of nearly 5000 complexes for more datapoints and more robust training/validation. On Colab, it takes only a minute to featurize the core PDBbind set! This is pretty incredible, and it means you can quickly experiment with different featurizations and model architectures.

In [None]:
%%time
tasks, datasets, transformers = load_pdbbind(featurizer=acf,
                                             save_dir='.',
                                             data_dir='.',
                                             pocket=True,
                                             reload=False,
                                             set_name='core')



Unfortunately, if you try to use the "refined" dataset, some complexes cannot be featurized. To work around this, rather than increasing `complex_num_atoms`, simply drop the rows of the dataset whose `x` value is `None`.

In [None]:
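class MyTransformer(dc.trans.Transformer):
  # No `self` here: the class itself (not an instance) is passed to
  # `transform` below, so this method is called as a plain function.
  def transform_array(x, y, w, ids):
    # Elementwise comparison on the NumPy array; `x is not None` would test
    # the array object as a whole instead of its rows.
    kept_rows = x != None
    return x[kept_rows], y[kept_rows], w[kept_rows], ids[kept_rows]

datasets = [d.transform(MyTransformer) for d in datasets]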
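
To make the row-dropping step concrete, here is a minimal NumPy-only sketch of the same idea on toy data. It assumes the features sit in a one-dimensional object array with `None` marking a failed featurization; the helper name `drop_failed_rows` and the four IDs are made up for illustration.

```python
import numpy as np

def drop_failed_rows(X, y, w, ids):
    """Keep only the rows whose feature entry is not None."""
    keep = np.array([x is not None for x in X], dtype=bool)
    return X[keep], y[keep], w[keep], ids[keep]

# Toy dataset: entries 1 and 3 stand in for complexes that failed to featurize.
X = np.empty(4, dtype=object)
X[0], X[1], X[2], X[3] = np.zeros(3), None, np.ones(3), None
y = np.array([6.1, 4.2, 7.3, 5.0])
w = np.ones(4)
ids = np.array(["1abc", "2def", "3ghi", "4jkl"])   # hypothetical IDs

X_kept, y_kept, w_kept, ids_kept = drop_failed_rows(X, y, w, ids)
```

The same boolean mask is applied to all four arrays so that features, labels, weights, and IDs stay aligned.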

In [None]:
datasets

In [None]:
train, val, test = datasets

### Training the model

Now that we've got our dataset, let's go ahead and initialize an `AtomicConvModel` to train. Keep the input parameters the same as those used in `AtomicConvFeaturizer`, or else we'll get errors. `layer_sizes` controls the number of layers and the size of each dense layer in the network. We choose these hyperparameters to be the same as those used in the [original paper](https://arxiv.org/pdf/1703.10603.pdf).

In [None]:
acm = AtomicConvModel(n_tasks=1,
                      frag1_num_atoms=f1_num_atoms,
                      frag2_num_atoms=f2_num_atoms,
                      complex_num_atoms=f1_num_atoms+f2_num_atoms,
                      max_num_neighbors=max_num_neighbors,
                      batch_size=12,
                      layer_sizes=[32, 32, 16],
                      learning_rate=0.003,
                      )
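
As a rough picture of what `layer_sizes=[32, 32, 16]` describes, the sketch below applies one small dense network, with shared weights, to every atom's feature vector and sums the per-atom outputs. It is a NumPy illustration only: the ReLU activation, the random initialization, the final sum, and both function names are assumptions rather than a description of `AtomicConvModel` internals.

```python
import numpy as np

def init_atomistic_network(n_features, layer_sizes, seed=0):
    """Create weights for dense layers n_features -> layer_sizes... -> 1."""
    rng = np.random.default_rng(seed)
    sizes = [n_features, *layer_sizes, 1]
    weights = [rng.normal(0.0, 0.1, size=(a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
    biases = [np.zeros(b) for b in sizes[1:]]
    return weights, biases

def atomistic_energy(features, weights, biases):
    """Apply the same network to each atom (row) and sum the scalar outputs."""
    h = features
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(h @ W + b, 0.0)            # hidden layers with ReLU (assumed)
    per_atom = h @ weights[-1] + biases[-1]       # (N, 1) per-atom contribution
    return float(per_atom.sum())

features = np.random.default_rng(1).normal(size=(5, 8))   # 5 atoms, 8 features each
weights, biases = init_atomistic_network(8, [32, 32, 16])

e = atomistic_energy(features, weights, biases)
e_perm = atomistic_energy(features[::-1], weights, biases)  # same atoms, reversed order
```

Because every atom passes through the same weights and the outputs are summed, the result does not depend on the order in which atoms are listed.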

In [None]:
losses, val_losses = [], []

In [None]:
%%time
max_epochs = 50

metric = dc.metrics.Metric(dc.metrics.score_function.rms_score)
step_cutoff = len(train)//12  # number of full batches per epoch (batch_size=12)
def val_cb(model, step):
  # record the losses only every `step_cutoff` steps, i.e. roughly once per epoch
  if step%step_cutoff!=0:
    return
  val_losses.append(model.evaluate(val, metrics=[metric])['rms_score']**2)  # L2 Loss
  losses.append(model.evaluate(train, metrics=[metric])['rms_score']**2)  # L2 Loss

acm.fit(train, nb_epoch=max_epochs, max_checkpoints_to_keep=1,
        callbacks=[val_cb])

The loss curves are not exactly smooth, which is unsurprising because we are using 154 training and 19 validation datapoints. Increasing the dataset size may help with this, but will also require greater computational resources.
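
The bookkeeping in the training cell relies on two small ideas: a callback that does work only every few steps, and the fact that squaring an RMS score gives the mean squared (L2) error. The NumPy sketch below shows both on toy numbers. It assumes, as the cell's `val_cb(model, step)` signature suggests, that the callback is invoked with the model and the current step; `make_periodic_callback` and `rms` are hypothetical helpers.

```python
import numpy as np

def rms(y_true, y_pred):
    """Root-mean-square error between two arrays."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def make_periodic_callback(every_n_steps, on_fire):
    """Build a callback(model, step) that acts only on multiples of every_n_steps."""
    def callback(model, step):
        if step % every_n_steps != 0:
            return
        on_fire(step)
    return callback

# Simulate 39 training steps with a callback that fires every 12 steps.
fired = []
cb = make_periodic_callback(12, fired.append)
for step in range(1, 40):
    cb(None, step)

# Squaring the RMS score recovers the mean squared error.
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.0, 2.0, 5.0])
mse = rms(y_true, y_pred) ** 2
```

With 154 training points and a batch size of 12, `len(train)//12` is 12, so losses are recorded about once per pass over the data.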

In [None]:
f, ax = plt.subplots()
ax.scatter(range(len(losses)), losses, label='train loss')
ax.scatter(range(len(val_losses)), val_losses, label='val loss')
plt.legend(loc='upper right');

The [ACNN paper](https://arxiv.org/pdf/1703.10603.pdf) reported Pearson $R^2$ scores of 0.912 and 0.448 on the train and test sets, respectively, for a random 80/20 split of the PDBbind core set. Here, we've used an 80/10/10 training/validation/test split and achieved similar performance for the training set (0.943). We can see from the performance on the training, validation, and test sets (and from the results in the paper) that the ACNN can learn chemical interactions from small training datasets, but struggles to generalize. Still, it is pretty amazing that we can train an `AtomicConvModel` with only a few lines of code and start predicting binding affinities!  
From here, you can experiment with different hyperparameters, more challenging splits, and the "refined" set of PDBbind to see if you can reduce overfitting and come up with a more robust model.

In [None]:
score = dc.metrics.Metric(dc.metrics.score_function.pearson_r2_score)
for tvt, ds in zip(['train', 'val', 'test'], datasets):
  print(tvt, acm.evaluate(ds, metrics=[score]))
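
For readers unfamiliar with the metric, the sketch below computes a Pearson $R^2$ by hand, taking it to mean the squared Pearson correlation coefficient (our understanding of what `pearson_r2_score` reports). The function name `pearson_r2` and the toy affinities are made up.

```python
import numpy as np

def pearson_r2(y_true, y_pred):
    """Squared Pearson correlation between true and predicted values."""
    r = np.corrcoef(np.ravel(y_true), np.ravel(y_pred))[0, 1]
    return float(r ** 2)

y_true = np.array([4.0, 5.5, 6.1, 7.8, 9.0])

perfect = pearson_r2(y_true, y_true)                 # identical predictions
shifted = pearson_r2(y_true, 0.5 * y_true + 3.0)     # systematically offset and rescaled
noisy = pearson_r2(y_true, np.array([9.0, 4.0, 7.0, 5.0, 6.0]))
```

Note that `shifted` scores as well as `perfect`: this metric rewards predictions that are linearly related to the truth even when their absolute values are off, so it is worth looking at an error metric such as RMSE alongside it.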

### Further reading
We have explored the ACNN architecture and used the PDBbind dataset to train an ACNN to predict protein-ligand binding energies. For more information, read the original paper that introduced ACNNs: Gomes, Joseph, et al. "Atomic convolutional networks for predicting protein-ligand binding affinity." [arXiv preprint arXiv:1703.10603](https://arxiv.org/abs/1703.10603) (2017). There are many other methods and papers on predicting binding affinities. Here are a few interesting ones to check out: predictions using [only ligands or proteins](https://www.frontiersin.org/articles/10.3389/fphar.2020.00069/full), [molecular docking with deep learning](https://chemrxiv.org/articles/preprint/GNINA_1_0_Molecular_Docking_with_Deep_Learning/13578140), and [AtomNet](https://arxiv.org/abs/1510.02855).

## Quantum variational regression on the featurized data

In [None]:
# 1) IMPORTS & DATA LOADING
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import MinMaxScaler
from deepchem.molnet import load_pdbbind
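# Qiskit ML pieces
from qiskit.circuit.library import ZFeatureMap
from qiskit_machine_learning.algorithms import VQR
from qiskit.primitives import Estimator
# L_BFGS_B is a Qiskit optimizer class, not part of scipy.optimize; the import
# path below assumes the `qiskit-algorithms` package is installed
from qiskit_algorithms.optimizers import L_BFGS_B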

# 2) LOAD DEEPCHEM DATA
# (assumes you've already run your AtomicConvFeaturizer load & MyTransformer steps)
tasks, datasets, transformers = load_pdbbind(
    featurizer=acf,
    save_dir='.',
    data_dir='.',
    pocket=True,
    reload=False,
    set_name='core'
)
train, val, test = datasets
X_train_dc, y_train = train.X, train.y
X_val_dc,   y_val   = val.X,   val.y
X_test_dc,  y_test  = test.X,  test.y

# 3) FLATTEN DeepChem FEATURES → CLASSICAL MATRIX
def flatten_dc_features(X_dc):
    flat = []
    for sample in X_dc:
        # each sample is a tuple/list of 9 arrays (one per feature type)
        arrays = [arr.flatten() for arr in sample]
        flat.append(np.concatenate(arrays))
    return np.vstack(flat)

X_train_flat = flatten_dc_features(X_train_dc)
X_val_flat   = flatten_dc_features(X_val_dc)
X_test_flat  = flatten_dc_features(X_test_dc)

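# 4) SCALE + PCA → N‐QUBIT INPUT VECTORS
n_qubits = 6  # pick based on your simulator capacity
# put all raw features on a common scale before PCA
fm_scaler = MinMaxScaler(feature_range=(-np.pi, np.pi))
X_all = fm_scaler.fit_transform(
    np.vstack([X_train_flat, X_val_flat, X_test_flat])
)
pca = PCA(n_components=n_qubits)
X_all_pca = pca.fit_transform(X_all)
# PCA output is centered but unbounded, so rescale the components into [−π,π]
# to keep the feature_map rotation angles in range
X_all_pca = MinMaxScaler(feature_range=(-np.pi, np.pi)).fit_transform(X_all_pca)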

# split back
N_train = X_train_flat.shape[0]
N_val   = X_val_flat.shape[0]
X_train_pca = X_all_pca[:N_train]
X_val_pca   = X_all_pca[N_train:N_train+N_val]
X_test_pca  = X_all_pca[N_train+N_val:]

# scale targets y into [−1,+1] for single‐qubit range
y_scaler = MinMaxScaler(feature_range=(-1, 1))
y_all = y_scaler.fit_transform(
    np.vstack([y_train.reshape(-1,1), y_val.reshape(-1,1), y_test.reshape(-1,1)])
)
y_train_scaled = y_all[:N_train].ravel()
y_val_scaled   = y_all[N_train:N_train+N_val].ravel()
y_test_scaled  = y_all[N_train+N_val:].ravel()

# 5) BUILD THE QCNN CIRCUIT
# 5.1 Feature map
feature_map = ZFeatureMap(num_qubits=n_qubits, reps=1)

# 5.2 A simple “1‐conv + 1‐pool” QCNN block for illustration
from qiskit import QuantumCircuit
from qiskit.circuit import ParameterVector

def conv_instruction(n, prefix):
    params = ParameterVector(prefix, length=3*n)
    qc = QuantumCircuit(n, name="Conv")
    idx = 0
    for q1,q2 in zip(range(0,n,2), range(1,n,2)):
        sub = QuantumCircuit(2)
        sub.rz(-np.pi/2, 1)
        sub.cx(1,0)
        sub.rz(params[idx],   0)
        sub.ry(params[idx+1], 1)
        sub.cx(0,1)
        sub.ry(params[idx+2], 1)
        qc.compose(sub, [q1,q2], inplace=True)
        qc.barrier()
        idx += 3
    return qc.to_instruction()

def pool_instruction(srcs, sinks, prefix):
    params = ParameterVector(prefix, length=3*len(srcs))
    n = len(srcs)+len(sinks)
    qc = QuantumCircuit(n, name="Pool")
    idx = 0
    for s,t in zip(srcs, sinks):
        sub = QuantumCircuit(2)
        sub.rz(-np.pi/2, 1)
        sub.cx(1,0)
        sub.rz(params[idx],   0)
        sub.ry(params[idx+1], 1)
        sub.cx(0,1)
        sub.ry(params[idx+2], 1)
        qc.compose(sub, [s,t], inplace=True)
        qc.barrier()
        idx += 3
    return qc.to_instruction()

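# Assemble the trainable ansatz: conv → pool (Z⊗… is measured at the end).
# The feature map is not composed in here: VQR receives it separately via
# `feature_map=`, and composing it as well would encode the inputs twice.
qc_cnn = QuantumCircuit(n_qubits)
qc_cnn.append(conv_instruction(n_qubits, "c1"), range(n_qubits))
# example pooling over pairs: even qubits are sources, odd qubits are sinks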
src = list(range(0,n_qubits,2))
snk = list(range(1,n_qubits,2))
qc_cnn.append(pool_instruction(src, snk, "p1"), range(n_qubits))
# now only n_qubits/2 logical remain; you’d repeat conv+pool until 1 remains

# 6) WRAP IN A VQR & TRAIN
vqr = VQR(
    feature_map=feature_map,
    ansatz=qc_cnn,
    optimizer=L_BFGS_B(maxiter=150),
    estimator=Estimator(),
)

# fit on train, validate on val
vqr.fit(X_train_pca, y_train_scaled)
y_val_pred_scaled = vqr.predict(X_val_pca)
y_val_pred = y_scaler.inverse_transform(y_val_pred_scaled.reshape(-1,1)).ravel()

# 7) EVALUATE ON TEST
y_test_pred_scaled = vqr.predict(X_test_pca)
y_test_pred = y_scaler.inverse_transform(y_test_pred_scaled.reshape(-1,1)).ravel()

# Now you have y_test_pred in the original binding‐energy units!
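
The target scaling in step 4 exists because the expectation value of a Pauli-Z-type observable always lies in $[-1, 1]$, so the regressor cannot output anything outside that interval. The NumPy-only sketch below mimics that scaling and its inverse, fitting on the training targets only. `RangeScaler` is a hypothetical stand-in written for illustration, not the scikit-learn class, and the affinities are made up.

```python
import numpy as np

class RangeScaler:
    """Min-max scaler mapping the fitted range onto [low, high]."""

    def __init__(self, low=-1.0, high=1.0):
        self.low, self.high = low, high

    def fit(self, y):
        y = np.asarray(y, dtype=float)
        self.y_min, self.y_max = y.min(), y.max()
        return self

    def transform(self, y):
        unit = (np.asarray(y, dtype=float) - self.y_min) / (self.y_max - self.y_min)
        return self.low + unit * (self.high - self.low)

    def inverse_transform(self, y_scaled):
        unit = (np.asarray(y_scaled, dtype=float) - self.low) / (self.high - self.low)
        return self.y_min + unit * (self.y_max - self.y_min)

y_train = np.array([4.0, 6.0, 10.0])   # toy training targets
y_test = np.array([12.0])              # toy test target above the training range

scaler = RangeScaler().fit(y_train)    # fit on training targets only
scaled_train = scaler.transform(y_train)
scaled_test = scaler.transform(y_test)
roundtrip = scaler.inverse_transform(scaled_train)
```

A test target outside the training range maps outside $[-1, 1]$ and is therefore unreachable for the circuit. Fitting the scaler on all splits, as the cell above does, avoids that but lets validation and test statistics leak into preprocessing, so the choice is a trade-off.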
