Representing local protein environments with machine learning force fields

Predicting NMR chemical shifts and acid dissociation constant (pKa) for titrable groups from protein structure using learned atomic descriptors (MACE, ORB, AIMNet2, LoCO‑HD) and a lightweight neural network classifier/regressor.

Reference paper: Representing local protein environments with atomistic foundation models

Authors: Meital Bojan, Sanketh Vedula, Advaith Maddipatla, Nadav Bojan Sellam, Federico Napoli, Paul Schanda, Alex M. Bronstein.

1) Environment

Use the minimal environment.

conda env create -f environment.yml
conda activate forcefields

2) Project layout

force_fields-production/
├── main.py                    # Entry point for descriptor generation & experiments
├── cs_predict.py              # End-to-end chemical shift prediction on a PDB/CIF
├── pka_predict.py             # End-to-end pKa prediction on a PDB/CIF for titrable residues
├── experiment.py              # Experiment wrapper (dataset, model, trainer, wandb)
├── configs/                   # YAML configs (descriptor + experiments)
│   ├── mace_config.yaml
│   ├── orb_config.yaml
│   ├── aimnet_config.yaml
│   └── locohd_config.yaml
├── data/                      # Dataset handling and input files
├── descriptors/               # Descriptor implementations
├── models/                    # Model architectures and training scripts
└── utils/                     # Helper scripts (losses, preprocessing, etc.)

3) Usage overview

A) Generate descriptors

Run the descriptor generation according to the chosen configuration file with mode="descriptor":

python main.py --config [config_file]

This step will compute descriptors for all structures under the configured data.

B) Train and evaluate

Train models based on precomputed descriptors according to the chosen configuration file with mode="experiments":

python main.py --config [config_file]

The results and checkpoints will be saved in models/checkpoints/.

C) Predict chemical shifts on a PDB/CIF

To predict NMR chemical shifts for a given structure, use:

python cs_predict.py \
  --pdb path/to/structure.pdb \
  --rmax 5.0 \
  --output outputs/predictions.csv \
  --device cuda:0 \
  --save_dir outputs/ \
  --prefix "" \
  --mace path/to/mace_model.pt \
  --cs_models path/to/model1.ckpt path/to/model2.ckpt

Note: Update the paths to the mace model and the trained chemical shift prediction models (--cs_models) before running predictions.

D) Predict pKa on a PDB/CIF

To predict pKa for a given structure, use:

python pka_predict.py \
  --pdb path/to/structure.pdb \
  --atoms N CA C H HA CB
  --rmax 5.0 \
  --output outputs/predictions.csv \
  --device cuda:0 \
  --save_dir outputs/ \
  --prefix "" \
  --mace path/to/mace_model.pt \
  --pka_model_paths LYS=path/to/model1.ckpt HIS=path/to/model2.ckpt

Note: Update the paths and keys to the mace model (optionally) and the trained pKa prediction models (--pka_model_paths) before running predictions.

4) Data & models

Download: Weights can be found in following Dropbox link
Place data under: ./data/

5) Citation

If you use this repository, please cite the following paper:

@misc{bojan2025representinglocalproteinenvironments,
      title={Representing local protein environments with atomistic foundation models}, 
      author={Meital Bojan and Sanketh Vedula and Advaith Maddipatla and Nadav Bojan Sellam and Federico Napoli and Paul Schanda and Alex M. Bronstein},
      year={2025},
      eprint={2505.23354},
      archivePrefix={arXiv},
      primaryClass={q-bio.BM},
      url={https://arxiv.org/abs/2505.23354}, 
}

6) Tips

Make sure dssp is installed if using secondary-structure features and the DSSP_PATH is correct.
Check and modify configs under configs/ to switch descriptors or tune hyperparameters.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Representing local protein environments with machine learning force fields

1) Environment

2) Project layout

3) Usage overview

A) Generate descriptors

B) Train and evaluate

C) Predict chemical shifts on a PDB/CIF

D) Predict pKa on a PDB/CIF

4) Data & models

5) Citation

6) Tips

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
configs		configs
data		data
descriptors		descriptors
models		models
utils		utils
.gitignore		.gitignore
README.md		README.md
cs_predict.py		cs_predict.py
environment.yml		environment.yml
experiment.py		experiment.py
main.py		main.py
pka_predict.py		pka_predict.py

mb012/MLFF_representation

Folders and files

Latest commit

History

Repository files navigation

Representing local protein environments with machine learning force fields

1) Environment

2) Project layout

3) Usage overview

A) Generate descriptors

B) Train and evaluate

C) Predict chemical shifts on a PDB/CIF

D) Predict pKa on a PDB/CIF

4) Data & models

5) Citation

6) Tips

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages