Predicting NMR chemical shifts and acid dissociation constants (pKa) for titrable groups from protein structure using learned atomic descriptors (MACE, ORB, AIMNet2, LoCO‑HD) and a lightweight neural network classifier/regressor.
Reference paper: Representing local protein environments with atomistic foundation models
Authors: Meital Bojan, Sanketh Vedula, Advaith Maddipatla, Nadav Bojan Sellam, Federico Napoli, Paul Schanda, Alex M. Bronstein.
Use the minimal environment.
conda env create -f environment.yml
conda activate forcefields
force_fields-production/
├── main.py # Entry point for descriptor generation & experiments
├── cs_predict.py # End-to-end chemical shift prediction on a PDB/CIF
├── pka_predict.py # End-to-end pKa prediction on a PDB/CIF for titrable residues
├── experiment.py # Experiment wrapper (dataset, model, trainer, wandb)
├── configs/ # YAML configs (descriptor + experiments)
│ ├── mace_config.yaml
│ ├── orb_config.yaml
│ ├── aimnet_config.yaml
│ └── locohd_config.yaml
├── data/ # Dataset handling and input files
├── descriptors/ # Descriptor implementations
├── models/ # Model architectures and training scripts
└── utils/ # Helper scripts (losses, preprocessing, etc.)
Run descriptor generation using the chosen configuration file with mode="descriptor":
python main.py --config [config_file]
This step computes descriptors for all structures under the configured data directory.
Train models on the precomputed descriptors using the chosen configuration file with mode="experiments":
python main.py --config [config_file]
The results and checkpoints will be saved in models/checkpoints/.
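After training, it can be handy to see which checkpoints were written before passing them to the prediction scripts. The following is a minimal sketch, not part of the repository: the helper name `list_checkpoints` is hypothetical, and the `*.ckpt` pattern is an assumption based on the checkpoint file names used in the prediction examples.

```python
from pathlib import Path


def list_checkpoints(checkpoint_dir="models/checkpoints", pattern="*.ckpt"):
    """Return checkpoint files under checkpoint_dir, newest first.

    Assumption: checkpoints use the .ckpt extension; adjust `pattern`
    if the trainer writes a different suffix.
    """
    root = Path(checkpoint_dir)
    if not root.is_dir():
        # Nothing trained yet, or the directory lives elsewhere.
        return []
    # Search recursively in case checkpoints are grouped per experiment.
    files = [p for p in root.rglob(pattern) if p.is_file()]
    # Sort by modification time so the most recent checkpoint comes first.
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)
```

Calling `list_checkpoints()` from the repository root returns a list of `Path` objects that can be converted with `str()` and passed on the command line.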
To predict NMR chemical shifts for a given structure, use:
python cs_predict.py \
--pdb path/to/structure.pdb \
--rmax 5.0 \
--output outputs/predictions.csv \
--device cuda:0 \
--save_dir outputs/ \
--prefix "" \
--mace path/to/mace_model.pt \
--cs_models path/to/model1.ckpt path/to/model2.ckpt
Note: Update the paths to the MACE model and the trained chemical shift prediction models (--cs_models) before running predictions.
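To run the same prediction over many structures, the command above can be assembled programmatically. This is a hedged sketch that uses only the flags shown in the example; the helper names `build_cs_command` and `run_batch` and the per-structure output file name are assumptions made for illustration, not part of the repository.

```python
import subprocess
import sys
from pathlib import Path


def build_cs_command(pdb_path, mace_path, cs_models,
                     out_dir="outputs", device="cuda:0", rmax=5.0):
    """Build the argument list for one cs_predict.py run."""
    pdb_path = Path(pdb_path)
    out_dir = Path(out_dir)
    return [
        sys.executable, "cs_predict.py",
        "--pdb", str(pdb_path),
        "--rmax", str(rmax),
        # Assumption: one CSV per structure, named after the input file.
        "--output", str(out_dir / f"{pdb_path.stem}_predictions.csv"),
        "--device", device,
        "--save_dir", str(out_dir),
        "--prefix", "",
        "--mace", str(mace_path),
        # --cs_models accepts several checkpoint paths after one flag.
        "--cs_models", *[str(m) for m in cs_models],
    ]


def run_batch(structure_dir, mace_path, cs_models, **kwargs):
    """Run cs_predict.py on every .pdb file in structure_dir."""
    for pdb_path in sorted(Path(structure_dir).glob("*.pdb")):
        cmd = build_cs_command(pdb_path, mace_path, cs_models, **kwargs)
        # check=True stops the batch at the first failing structure.
        subprocess.run(cmd, check=True)
```

`run_batch` is meant to be called from the repository root, so that `cs_predict.py` resolves relative to the working directory.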
To predict pKa for a given structure, use:
python pka_predict.py \
--pdb path/to/structure.pdb \
--atoms N CA C H HA CB \
--rmax 5.0 \
--output outputs/predictions.csv \
--device cuda:0 \
--save_dir outputs/ \
--prefix "" \
--mace path/to/mace_model.pt \
--pka_model_paths LYS=path/to/model1.ckpt HIS=path/to/model2.ckpt
Note: Update the path to the MACE model (optionally) and the keys and paths of the trained pKa prediction models (--pka_model_paths) before running predictions.
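The `--pka_model_paths` flag takes `KEY=path` pairs, one per residue type. A small helper can build these arguments from a dictionary, which keeps the mapping readable when several residue types are involved. This is a sketch under assumptions: the function name `format_pka_model_args` is hypothetical, and upper-casing the keys is a guess based on the `LYS` and `HIS` keys in the example.

```python
def format_pka_model_args(models):
    """Turn {"LYS": "a.ckpt"} into ["--pka_model_paths", "LYS=a.ckpt"].

    Assumption: residue keys are upper-case names such as LYS or HIS,
    as in the command-line example.
    """
    args = ["--pka_model_paths"]
    for residue, path in models.items():
        key = str(residue).strip().upper()
        # A key containing "=" or whitespace would produce an ambiguous pair.
        if not key or "=" in key or any(ch.isspace() for ch in key):
            raise ValueError(f"invalid residue key: {residue!r}")
        if not str(path):
            raise ValueError(f"empty model path for residue {key}")
        args.append(f"{key}={path}")
    return args
```

The returned list can be appended to the rest of the `pka_predict.py` argument list before handing it to `subprocess.run`.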
- Download: Weights can be found at the following Dropbox link
- Place data under:
./data/
If you use this repository, please cite the following paper:
@misc{bojan2025representinglocalproteinenvironments,
title={Representing local protein environments with atomistic foundation models},
author={Meital Bojan and Sanketh Vedula and Advaith Maddipatla and Nadav Bojan Sellam and Federico Napoli and Paul Schanda and Alex M. Bronstein},
year={2025},
eprint={2505.23354},
archivePrefix={arXiv},
primaryClass={q-bio.BM},
url={https://arxiv.org/abs/2505.23354},
}
- Make sure dssp is installed if using secondary-structure features and that DSSP_PATH is correct.
- Check and modify the configs under configs/ to switch descriptors or tune hyperparameters.
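A quick way to confirm that a DSSP executable is reachable before enabling secondary-structure features is sketched below. It is not taken from the repository: the helper name `find_dssp` is hypothetical, reading `DSSP_PATH` from the environment is an assumption (in this codebase it may instead be a config entry or a constant), and `mkdssp` and `dssp` are the common names of the DSSP binary.

```python
import os
import shutil


def find_dssp(dssp_path=None):
    """Return the path of a usable DSSP executable, or None if not found."""
    candidates = [
        dssp_path,                     # explicit path supplied by the caller
        os.environ.get("DSSP_PATH"),   # assumption: may be set as an env var
        "mkdssp",                      # common binary name
        "dssp",                        # alternative binary name
    ]
    for candidate in candidates:
        if not candidate:
            continue
        # shutil.which accepts both bare names and explicit paths and
        # checks that the target exists and is executable.
        resolved = shutil.which(candidate)
        if resolved:
            return resolved
    return None
```

If `find_dssp()` returns `None`, either DSSP is not installed or the configured path does not point to an executable file.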